tutorial from Nvidia GTC

October 16, 2009

My talk is attached in *pdf. Overall the conference felt like a rock’n'roll show, my first non-academic meeting of this nature. Thoroughly enjoyable although a little difficult to tell what people are doing since it’s mostly proprietary work. The announcement of fermi was certainly exciting; ~8x times the double prec performance, ECC and many more cores/memory…

Talk outline:

  • Company Background

  • CUDA accelerates Geophysics:
    • Data Processing w/ Linear Algebra

    • Sparse Matrices
    • Seismic Imaging:Kirchhoff Time Migration
      • Frequency Filtering

      • Traveltime/Weight Calculations
      • GPU Migration
  • Summary

nvidiaSummit0909


CUDA + OCTAVE

September 30, 2009

I’ve been experimenting with CUDA and OCTAVE; there is at least one company who have produced GPU enabled MEX functions. The big difficulty is of course that there is no support for internal floats within OCTAVE (afaik) and similarly with Matlab. However if one can leave the data and work with it on the device for some time, then there are only two explicit conversions btwn float <-> double needed. Or you could sacrifice some performance in CUDA in return for using doubles. At any rate here’s an example Makefile, happy experimenting. For this example I gutted the matrix Mul example from the CUDA sdk; the wrapper *oct source code (*cc, really C++ with octave extensions) contains an extern C section which references the cuda kernel (*cu). Don’t forget to indent instructions under ‘all’ with a single tab for make.


#! /usr/bin/env make
#make file for octfile/cuda
#Mac OSX 10.5.8 intel core 2 duo
#cuda include/lib
CUDA_INC_PATH=/usr/local/cuda/include
CUDA_LIB_PATH=/usr/local/cuda/lib
#octfile compiler
CC=mkoctfile
#basic flags
CFLAGS= -I$(CUDA_INC_PATH)
LDFLAGS= -L$(CUDA_LIB_PATH) -lcudart -lcuda

all:
$(CC) $(CFLAGS) -c cudaMatrixMul.cc -o cudaMatrixMul.o
nvcc  -c matMul_kernel.cu -o matMul_kernel.o -Wall
$(CC) $(LDFLAGS) cudaMatrixMul.o matMul_kernel.o -o cudaMatrixMul.oct

#clean:
rm -f  cudaMatrixMul.o matMul_kernel.o


CUDA crash course

August 10, 2009

I’m really excited to be going to the Nvidia Research Summit in my new capacity as Senior Physicist for Stone Ridge Technology. Nvidia provide a remarkable product, with exceptional service and support for the scientist, made all the more possible by the majority of money coming from gaming. I’d encourage anyone with a view to doing affordable HPC to start at the CUDA zone, pick up a card from your favorite electronics store, or apply for one via the academic program, and start programming. Here’s a short course below, compiled from notes I made while at PSU, for a rapid introduction, more to come…
cuda crash course


Split-Step Algorithm

June 30, 2009

This is a fast method for the solution of parabolic PDE’s, relying on the FFT implementation of the Fourier Transform to speed things up. Here it’s applied to an acoustic/seismic problem, following the development of Kuperman/Jackson in “Imaging of Complex Media with Acoustic and Seismic Waves”. Consider first the Helmholtz equation for a point source at r’,z’:
eq1_ssa
where G is the Green’s function, K the wavenumber (function of the frequency omega and sound speed c). Assuming azimuthal symmetry, G may be expressed as a product of two functions:
eq2_ssa
and similarly K, now a product of (constant) K0 and index of refraction n. Substituting into the Helmholtz equation gives two PDE’s:
eq3_ssa
The first PDE has Bessel functions as solutions; taking the assymptotic outgoing Hankel function solution and substituting it into the second, with the narrow angle approximation (second derivative of psi with respect to r much smaller than first derivative wrt r), one finds for the second (parabolic) PDE:
eq4_ssa
where chi is the fourier transform of psi. Assuming the variation of n is insignificant, in the wave-space domain the PDE and solution are:
eq5_ssa
Finally, the inverse FT gives the field as a function of depth (Delta r = r-r0, r0 the boundary value):
eq6_ssa

The following is a little hack in octave, to demonstrate the solution method:

function [Psi,s]=test_ssa(N,Dr,P0,n,K0)

% to test split step algo on parabolic PDE:
%
% \frac{\partial^2 \psi}{\partial r^2}
% +2iK0\frac{\partial \psi}{\partial r}
% +K0^2(n^2-1)\psi = 0
%
% K0*n^2 = (const.) wave number
% N=marching steps in field
% P0=boundary field data
% Dr=field step size

% wjb 0609

%initialize
m=length(P0); Psi=zeros(N,m); Psi(1,:)=P0;

%Nyquist thm, s (wave space) range

s=fftshift(-m/2:m/2-1)./Dr;

for k=2:N

Psi(k,:)=exp((i*K0/2)*(n^2-1)*Dr).*ifft(exp(-(i*Dr*s.^2)./(2*K0)).*fft(Psi(k-1,:)));

end

ssa_test


CUDA/GPU install

April 15, 2009

assuming you’re installing GPU + CUDA from scratch, here’s the steps
for a linux i686:

1. Find and download appropriate driver from Nvidia. With the old GPU card still in place, switch to terminal ‘ctrl-alt-F1′ and as root stop the X-server by issuing ‘init 3′. Then install via ’sh NV*run’, answering yes to all the prompts. Issue ‘init 5′ and shut down… when done, replace video card

2. Reboot and install the CUDA toolkit, don’t forget to ‘chmod +x cud*run’ before issuing ’sh cud*run’. After answering yes to all the prompts & installing, add to .bash_profile:

PATH=$PATH:/usr/local/cuda/bin
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib

export PATH
export LD_LIBRARY_PATH

3. Install the CUDA SDK, you should be good to go, but may also need to disable SELinux (eg., temporarily: ‘echo 0 > /selinux/enforce’)


CUDA part I

April 10, 2009

NMR can provide unparalleled insight into local atomic structural details, although in the solid state interpretation of lineshapes is hindered by anisotropic broadening, eg., the attached MQMAS spectra of two distinct chemical sites. The situation grows much worse for disordered materials where parameters take on a stochastic nature. I’ve developed some HPC software to enable the simulation of MQMAS spectra for these scenarios, and now my attention is turned to implementation using general purpose graphical processor unit (GPGPU) programming. For the paltry sum of 70USD you can have a little super computer of your own in the form the NVIDIA GeForce 8400 GPU. I installed today and have started playing…

foo