Wednesday, May 26, 2010

[from Anthony] some CUDA examples

There's a website I found with a handful of CUDA examples, such as adding matrices and solving Laplace's equation.

https://visualization.hpc.mil/wiki/GPGPU

I find it useful for CUDA beginners like myself, since they explain what is happening at each step of their code. They also compare CPU and GPU versions of each program (written in C++, by the way), both in how they are written and in how fast they execute.
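To give a flavor of that CPU-vs-GPU comparison, here is my own minimal sketch of adding two arrays both ways. This is not their code - the names and sizes are just placeholders I picked:

#include <cstdio>
#include <cuda_runtime.h>

// GPU version: each thread adds one element.
__global__ void add(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// CPU version: a plain loop over the same data.
void addCPU(const float *a, const float *b, float *c, int n)
{
    for (int i = 0; i < n; ++i) c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *a = new float[n], *b = new float[n], *c = new float[n];
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    // The GPU pays for allocation and transfers before any math happens.
    float *da, *db, *dc;
    cudaMalloc((void**)&da, bytes);
    cudaMalloc((void**)&db, bytes);
    cudaMalloc((void**)&dc, bytes);
    cudaMemcpy(da, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, b, bytes, cudaMemcpyHostToDevice);

    add<<<(n + 255) / 256, 256>>>(da, db, dc, n);
    cudaMemcpy(c, dc, bytes, cudaMemcpyDeviceToHost);

    printf("c[0] = %f\n", c[0]);  // expect 3.0
    cudaFree(da); cudaFree(db); cudaFree(dc);
    delete[] a; delete[] b; delete[] c;
    return 0;
}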

For instance, in the Laplace 2D solver they discuss how, the smaller the grid (the smaller the NxN array) and the fewer the iterations, the more likely the CPU wins on speed, because the GPU spends relatively more time on setup (allocation and data transfer). Here are some of my results with different N and iterations...

n=10  iterations=1000  time_gpu=6586.000  time_cpu=1601.000
n=10  iterations=100   time_gpu=615.000   time_cpu=150.000
n=60  iterations=100   time_gpu=617.000   time_cpu=5980.000

I won't be posting their codes - they're pretty lengthy. They also have a plot comparing the CPU and GPU execution times.
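That said, the core of a 2D Laplace/Jacobi solver is short enough to sketch. This is my own minimal version, not theirs, and the name and index layout are just one reasonable choice:

// One Jacobi sweep for Laplace's equation on an n x n grid:
// each interior point becomes the average of its four neighbors.
__global__ void jacobiStep(const float *u, float *unew, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // column
    int j = blockIdx.y * blockDim.y + threadIdx.y;  // row
    if (i > 0 && i < n - 1 && j > 0 && j < n - 1)
        unew[j * n + i] = 0.25f * (u[j * n + i - 1] + u[j * n + i + 1] +
                                   u[(j - 1) * n + i] + u[(j + 1) * n + i]);
}
// Host side: launch this repeatedly, swapping u and unew between iterations.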


By the way, the devices I have in my MacBook Pro are...


Device 0: "GeForce 9600M GT"
  CUDA Driver Version:                        3.0
  CUDA Runtime Version:                       3.0
  CUDA Capability Major revision number:      1
  CUDA Capability Minor revision number:      1
  Total amount of global memory:              268107776 bytes
  Number of multiprocessors:                  4
  Number of cores:                            32
  Total amount of constant memory:            65536 bytes
  Total amount of shared memory per block:    16384 bytes
  Total number of registers available per block: 8192
  Warp size:                                  32
  Maximum number of threads per block:        512
  Maximum sizes of each dimension of a block: 512 x 512 x 64
  Maximum sizes of each dimension of a grid:  65535 x 65535 x 1
  Maximum memory pitch:                       2147483647 bytes
  Texture alignment:                          256 bytes
  Clock rate:                                 1.25 GHz
  Concurrent copy and execution:              Yes
  Run time limit on kernels:                  Yes
  Integrated:                                 No
  Support host page-locked memory mapping:    No
  Compute mode:                               Default (multiple host threads can use this device simultaneously)


Device 1: "GeForce 9400M"
  CUDA Driver Version:                        3.0
  CUDA Runtime Version:                       3.0
  CUDA Capability Major revision number:      1
  CUDA Capability Minor revision number:      1
  Total amount of global memory:              266010624 bytes
  Number of multiprocessors:                  2
  Number of cores:                            16
  Total amount of constant memory:            65536 bytes
  Total amount of shared memory per block:    16384 bytes
  Total number of registers available per block: 8192
  Warp size:                                  32
  Maximum number of threads per block:        512
  Maximum sizes of each dimension of a block: 512 x 512 x 64
  Maximum sizes of each dimension of a grid:  65535 x 65535 x 1
  Maximum memory pitch:                       2147483647 bytes
  Texture alignment:                          256 bytes
  Clock rate:                                 1.10 GHz
  Concurrent copy and execution:              No
  Run time limit on kernels:                  Yes
  Integrated:                                 Yes
  Support host page-locked memory mapping:    Yes
  Compute mode:                               Default (multiple host threads can use this device simultaneously)
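(The listing above is the kind of output the SDK's deviceQuery sample produces. A stripped-down version using the runtime API could look like this - a sketch that prints only a few of the fields:)

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("Device %d: \"%s\"\n", dev, prop.name);
        printf("  Global memory:     %lu bytes\n", (unsigned long)prop.totalGlobalMem);
        printf("  Multiprocessors:   %d\n", prop.multiProcessorCount);
        printf("  Warp size:         %d\n", prop.warpSize);
        printf("  Max threads/block: %d\n", prop.maxThreadsPerBlock);
    }
    return 0;
}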



It's not as great in comparison to what we have in our new machines, where our group installed the new Fermi cards (I think three per node or something?). So if I were to carry out some massive, intense computations and graphics, it would make sense to use the new machines - they also have a better cooling system.

Just now I tried the code in Dr. Dobb's CUDA tutorial, part 1. It compiled with nvcc without any problems. The article also suggests adding printf statements to track what is happening in the program. I will try that and see what kind of results I get.
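Something like this is what I have in mind - host-side printf calls around the main steps (device-side printf isn't available on my compute capability 1.1 cards, so the tracking happens on the CPU; the increment-array example here is my own stand-in for the article's code):

#include <cstdio>
#include <cuda_runtime.h>

__global__ void incrementArray(float *a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] = a[i] + 1.0f;
}

int main()
{
    const int n = 256;
    float h[n];
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    float *d;
    cudaMalloc((void**)&d, n * sizeof(float));
    printf("copying %d floats to the device...\n", n);
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);

    printf("launching kernel...\n");
    incrementArray<<<(n + 63) / 64, 64>>>(d, n);

    printf("copying results back...\n");
    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("h[0] = %f (expect 1.0)\n", h[0]);

    cudaFree(d);
    return 0;
}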

For our CPU programs, most of us have written our N*(2-body) simulations; I have done it in both Cartesian and polar coordinates. I'm still working on the visuals, which I'm trying to do with OpenGL. Josh, you should post your OpenGL classes, codes, whatnot - it will save *all* of us a lot of time. For anyone following along, the gravity step in my CPU code boils down to the sketch below.
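This is a minimal version in Cartesian coordinates; the names and the softening term are my own choices, not necessarily what everyone else is doing:

#include <cmath>

struct Body { double x, y, vx, vy, ax, ay, m; };

// Accumulate gravitational accelerations over all pairs (O(N^2)).
// eps is a small softening length to avoid blow-ups at tiny separations.
void computeAccelerations(Body *b, int n, double G, double eps)
{
    for (int i = 0; i < n; ++i) { b[i].ax = 0.0; b[i].ay = 0.0; }
    for (int i = 0; i < n; ++i) {
        for (int j = i + 1; j < n; ++j) {
            double dx = b[j].x - b[i].x;
            double dy = b[j].y - b[i].y;
            double r2 = dx * dx + dy * dy + eps * eps;
            double inv_r3 = 1.0 / (r2 * std::sqrt(r2));
            // Newton's third law: apply equal and opposite accelerations.
            b[i].ax += G * b[j].m * dx * inv_r3;
            b[i].ay += G * b[j].m * dy * inv_r3;
            b[j].ax -= G * b[i].m * dx * inv_r3;
            b[j].ay -= G * b[i].m * dy * inv_r3;
        }
    }
}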

I already have MathGL in place, but apparently it uses KuickShow as its image viewer, and I'm having difficulties getting KuickShow onto my Mac, so right now MathGL can't display any image. It compiles without problems, though, so I'm pretty sure I installed it correctly.

As of now, I am (as are most of us) modifying our codes to include radiation pressure as a physical interaction on the particles, and I'm still figuring out how to work it into our codes.
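One common way to fold it in (an assumption on my part, not something we've settled on as a group) is through the ratio beta = F_rad / F_grav for each particle, since radiation pressure from a central star acts along the same radial line as its gravity and just rescales it:

// Radiation pressure from a central star opposes its gravity along the
// same radial direction, so both can be combined with one parameter
// beta = F_rad / F_grav (which depends on particle size and material):
//
//   a = -G * M_star * (1 - beta) / r^2   (radially)
//
// Sketch reusing the Body struct above, star assumed at the origin:
void centralAcceleration(Body *b, int n, double G, double Mstar, double beta)
{
    for (int i = 0; i < n; ++i) {
        double r2 = b[i].x * b[i].x + b[i].y * b[i].y;
        double inv_r3 = 1.0 / (r2 * std::sqrt(r2));
        double k = -G * Mstar * (1.0 - beta) * inv_r3;
        b[i].ax += k * b[i].x;
        b[i].ay += k * b[i].y;
    }
}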


Anthony

