Astro-HPC Research Team, Summer 2010: [From Robert] Installing the software Part 1

Installing the software proved to be just as difficult to set up as the hardware, if not more so. The specifications on NVIDIA's website state that the CUDA software will run on Fedora 10 (one of many flavours of Linux).
Because of this, we decided to work with Fedora 10 64-bit (even though the latest release at the time was 13). However, it had trouble recognizing our new video cards (GTX 480), so we were not able to load the Graphical User Interface (GUI) properly. This prompted us to install Fedora 11 64-bit, but there the NVIDIA drivers would not install because of an incompatible kernel. After so much frustration, we decided to take a chance with Fedora 12 64-bit. Sure enough, everything started to work, and the installation of the NVIDIA and CUDA software was on its way.
--
First came the NVIDIA drivers, for which we decided to use Linux x86_64 Display Driver Version 256.25 Beta.
Because the NVIDIA driver installer cannot run while X11 is running, the computer first has to be taken out of the graphical session. To do so, a Terminal window was opened and we logged in as root:
# su
Next, we moved to /etc/, where the OS initialization file called 'inittab' lives:
# cd /etc/
There, inittab was opened in a text editor and the last line of the file was changed from
id:5:initdefault:
to
id:3:initdefault:
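The same edit can be scripted rather than done by hand. The following is a minimal sketch, assuming a sysvinit-style inittab like the one described here; set_runlevel is a hypothetical helper name of our own, and the demonstration runs on a scratch file rather than on the real /etc/inittab:

```shell
#!/bin/sh
# Hypothetical helper: set the default runlevel in an inittab-style file.
# Usage: set_runlevel FILE LEVEL
set_runlevel() {
    file=$1
    level=$2
    # Keep a backup copy, then rewrite only the initdefault line.
    cp "$file" "$file.bak" &&
    sed "s/^id:[0-9]:initdefault:/id:${level}:initdefault:/" "$file.bak" > "$file"
}

# Demonstration on a scratch copy, not the real /etc/inittab.
demo=$(mktemp)
printf '%s\n' '# sample inittab' 'id:5:initdefault:' > "$demo"
set_runlevel "$demo" 3
cat "$demo"
```

On a real system this would be run as root, as set_runlevel /etc/inittab 3 before installing the driver and set_runlevel /etc/inittab 5 afterwards.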
The file was saved and the computer was rebooted. Because of the change made in the file, the OS now booted to a text-based log-in screen instead of the GUI. We logged in as root and moved to the location of the NVIDIA driver to run the installation file. In our case:
# cd /Downloads/
# sh NVIDIA-Linux-x86_64-256.25.run
After installing, the inittab file needed to be reverted to 'id:5:initdefault:' so that the GUI would load on boot. Once the file was saved, the computer was restarted.
--
With the NVIDIA drivers installed, the CUDA toolkit (which provides the compiler) needed to be installed. In our case:
# cd /Downloads/
# sudo sh cudatoolkit_3.0_linux_64_fedora10.run.sh
This was installed in the default path (/usr/local/cuda).
To make the compiler command (nvcc) work, .bash_profile and .bashrc (both located in ~/) needed to be edited. In .bash_profile, PATH needs to point to the CUDA binaries and LD_LIBRARY_PATH to the CUDA libraries:
#
PATH=/usr/local/cuda/bin:$PATH
LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH # on a 32-bit machine, use lib instead of lib64
export PATH LD_LIBRARY_PATH
#
For the .bashrc file, the following needed to be added:
#
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
#

The system then needed to be restarted for the changes to take effect.
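Adding these lines by hand more than once can leave duplicate entries in the paths. As a rough sketch, assuming the default install path /usr/local/cuda, a small shell function (prepend_path is a hypothetical name, not part of CUDA) could add a directory only when it is not already present, and then report whether nvcc is visible:

```shell
#!/bin/sh
# Hypothetical helper: print LIST with DIR prepended, unless DIR is already in it.
# Usage: prepend_path DIR LIST
prepend_path() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;      # already present: leave unchanged
        "::")     printf '%s\n' "$1" ;;      # empty list: just the directory
        *)        printf '%s\n' "$1:$2" ;;   # otherwise prepend
    esac
}

CUDA_HOME=/usr/local/cuda   # assumed default install path
PATH=$(prepend_path "$CUDA_HOME/bin" "$PATH")
LD_LIBRARY_PATH=$(prepend_path "$CUDA_HOME/lib64" "${LD_LIBRARY_PATH:-}")
export PATH LD_LIBRARY_PATH

# Report whether nvcc is now visible to the shell.
if command -v nvcc >/dev/null 2>&1; then
    echo "nvcc found"
else
    echo "nvcc not found in PATH"
fi
```

Sourcing a snippet like this from .bash_profile has the same effect as the lines above, but stays harmless if the file is read more than once.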
--
The only thing left now was installing the SDK, which includes sample programs, benchmarks, and diagnostics for the hardware. In our case:
# cd /Downloads/
# sh gpucomputingsdk_3.0_linux.run.sh
This was then installed in the default path (~/NVIDIA_GPU_Computing_SDK/), although it could have been installed in a non-user path as well.

Once this was installed, make had to be run on the SDK in ~/NVIDIA_GPU_Computing_SDK/C/
On our machine, make did not complete, complaining about "cannot find -lglut".
To fix this, we needed to install the GLUT packages:
# sudo yum install freeglut
and
# sudo yum install freeglut-devel
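The error means the linker could not find a libglut library to satisfy -lglut. A quick way to check for it before re-running make is sketched below; find_lib is a hypothetical helper, and the directories searched (/usr/lib64 and /usr/lib) are assumptions about where the library would be placed:

```shell
#!/bin/sh
# Hypothetical helper: print the first DIR/libNAME.so that exists.
# Usage: find_lib NAME DIR...
find_lib() {
    name=$1
    shift
    for dir in "$@"; do
        if [ -e "$dir/lib$name.so" ]; then
            printf '%s\n' "$dir/lib$name.so"
            return 0
        fi
    done
    return 1
}

# Assumed library directories; adjust for the system at hand.
if hit=$(find_lib glut /usr/lib64 /usr/lib); then
    echo "found $hit"
else
    echo "libglut.so not found; linking with -lglut will fail"
fi
```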

Once these were installed, make was run again and it succeeded, placing the SDK examples in ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release/
From there, numerous executables are available, most notably deviceQuery and bandwidthTest.

# ./bandwidthTest --device=all
[bandwidthTest]
./bandwidthTest Starting...

!!!!!Cumulative Bandwidth to be computed from all the devices !!!!!!

Running on...

Device 0: GeForce GTX 480
Device 1: GeForce GTX 480
Device 2: GeForce GTX 480
Quick Mode

Host to Device Bandwidth, 3 Device(s), Paged memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 5155.2

Device to Host Bandwidth, 3 Device(s), Paged memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 4267.7

Device to Device Bandwidth, 3 Device(s)
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 120670.9

[bandwidthTest] - Test results:
PASSED
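
The banner in this output indicates that the figures are cumulative over all devices, so a per-card figure takes one more step. A small sketch of that arithmetic follows, with the numbers copied from the run above. It assumes the three cards contribute equally, which the output itself does not confirm, and per_device is a hypothetical helper name:

```shell
#!/bin/sh
# Average per-device bandwidth from the cumulative figures printed above.
# Assumption: each device contributes an equal share of the total.
per_device() {
    # $1 = cumulative bandwidth in MB/s, $2 = number of devices
    awk -v total="$1" -v n="$2" 'BEGIN { printf "%.1f\n", total / n }'
}

echo "Host to Device:   $(per_device 5155.2 3) MB/s"
echo "Device to Host:   $(per_device 4267.7 3) MB/s"
echo "Device to Device: $(per_device 120670.9 3) MB/s"
```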

# ./deviceQuery --all
./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

There are 3 devices supporting CUDA

Device 0: "GeForce GTX 480"
CUDA Driver Version: 3.00
CUDA Runtime Version: 3.00
CUDA Capability Major revision number: 2
CUDA Capability Minor revision number: 0
Total amount of global memory: 1609760768 bytes
Number of multiprocessors: 15
Number of cores: 480
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Clock rate: 1.45 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: Yes
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)

Device 1: "GeForce GTX 480"
CUDA Driver Version: 3.00
CUDA Runtime Version: 3.00
CUDA Capability Major revision number: 2
CUDA Capability Minor revision number: 0
Total amount of global memory: 1610285056 bytes
Number of multiprocessors: 15
Number of cores: 480
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Clock rate: 1.45 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)

Device 2: "GeForce GTX 480"
CUDA Driver Version: 3.00
CUDA Runtime Version: 3.00
CUDA Capability Major revision number: 2
CUDA Capability Minor revision number: 0
Total amount of global memory: 1610285056 bytes
Number of multiprocessors: 15
Number of cores: 480
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Clock rate: 1.45 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 4246847, CUDA Runtime Version = 3.00, NumDevs = 3, Device = GeForce GTX 480, Device = GeForce GTX 480

PASSED
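
The byte and core counts in this listing are easier to read after a little arithmetic. The sketch below is only an illustration: it feeds a few lines copied from the Device 0 output above to awk, converts global memory to MiB, and divides the number of cores by the number of multiprocessors. The function name summarise is our own:

```shell
#!/bin/sh
# Summarise selected deviceQuery lines read from standard input.
summarise() {
    awk -F': *' '
        /Total amount of global memory/ {
            sub(/ bytes/, "", $2)          # strip the unit, keep the number
            printf "global memory: %.1f MiB\n", $2 / 1048576
        }
        /Number of multiprocessors/ { sm = $2 }
        /Number of cores/ {
            printf "cores per multiprocessor: %d\n", $2 / sm
        }
    '
}

# Lines copied from the Device 0 section of the listing above.
summarise <<'EOF'
Total amount of global memory: 1609760768 bytes
Number of multiprocessors: 15
Number of cores: 480
EOF
```

The full output of ./deviceQuery could be piped into summarise in the same way.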

At this point, NVIDIA CUDA was installed and working.
To be continued...

Saturday, June 26, 2010
