tutorial from Nvidia GTC

October 16, 2009

My talk is attached in *pdf. Overall the conference felt like a rock’n'roll show, my first non-academic meeting of this nature. Thoroughly enjoyable although a little difficult to tell what people are doing since it’s mostly proprietary work. The announcement of fermi was certainly exciting; ~8x times the double prec performance, ECC and many more cores/memory…

Talk outline:

  • Company Background

  • CUDA accelerates Geophysics:
    • Data Processing w/ Linear Algebra

    • Sparse Matrices
    • Seismic Imaging:Kirchhoff Time Migration
      • Frequency Filtering

      • Traveltime/Weight Calculations
      • GPU Migration
  • Summary

nvidiaSummit0909


CUDA + OCTAVE

September 30, 2009

I’ve been experimenting with CUDA and OCTAVE; there is at least one company who have produced GPU enabled MEX functions. The big difficulty is of course that there is no support for internal floats within OCTAVE (afaik) and similarly with Matlab. However if one can leave the data and work with it on the device for some time, then there are only two explicit conversions btwn float <-> double needed. Or you could sacrifice some performance in CUDA in return for using doubles. At any rate here’s an example Makefile, happy experimenting. For this example I gutted the matrix Mul example from the CUDA sdk; the wrapper *oct source code (*cc, really C++ with octave extensions) contains an extern C section which references the cuda kernel (*cu). Don’t forget to indent instructions under ‘all’ with a single tab for make.


#! /usr/bin/env make
#make file for octfile/cuda
#Mac OSX 10.5.8 intel core 2 duo
#cuda include/lib
CUDA_INC_PATH=/usr/local/cuda/include
CUDA_LIB_PATH=/usr/local/cuda/lib
#octfile compiler
CC=mkoctfile
#basic flags
CFLAGS= -I$(CUDA_INC_PATH)
LDFLAGS= -L$(CUDA_LIB_PATH) -lcudart -lcuda

all:
$(CC) $(CFLAGS) -c cudaMatrixMul.cc -o cudaMatrixMul.o
nvcc  -c matMul_kernel.cu -o matMul_kernel.o -Wall
$(CC) $(LDFLAGS) cudaMatrixMul.o matMul_kernel.o -o cudaMatrixMul.oct

#clean:
rm -f  cudaMatrixMul.o matMul_kernel.o


CUDA/GPU install

April 15, 2009

assuming you’re installing GPU + CUDA from scratch, here’s the steps
for a linux i686:

1. Find and download appropriate driver from Nvidia. With the old GPU card still in place, switch to terminal ‘ctrl-alt-F1′ and as root stop the X-server by issuing ‘init 3′. Then install via ’sh NV*run’, answering yes to all the prompts. Issue ‘init 5′ and shut down… when done, replace video card

2. Reboot and install the CUDA toolkit, don’t forget to ‘chmod +x cud*run’ before issuing ’sh cud*run’. After answering yes to all the prompts & installing, add to .bash_profile:

PATH=$PATH:/usr/local/cuda/bin
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib

export PATH
export LD_LIBRARY_PATH

3. Install the CUDA SDK, you should be good to go, but may also need to disable SELinux (eg., temporarily: ‘echo 0 > /selinux/enforce’)


Expression Templates in C++

April 15, 2009

High performance C++ has traditionally been a misnomer, particularly with regards to linear algebra classes and templates. Handcoded C and FORTRAN are the usual approaches taken in the physical sciences, but these lack the portability, maintainability and overall elegance of (object oriented) C++. It was brought to my attention recently that there are really clever ways of using the compiler for performance hacks in C++. Amongst these methods are expression templates by Todd Veldhuizen et al.

The difficulty in vanilla C++ array classes is the overhead associated with operations on arrays,  eg., addition and multiplication. Temporary objects are created, requiring precious memory bandwidth and unnecessary loops.  Expression templates is an ingenious way around this, by essentially nesting templates, and in conjunction with overloaded operators, creating parse trees at compile time, which effectively inline expressions. The following is pseudocode from Todd Veldhuizen’s notes:

struct plus{}; //overloaded operator
class Array{}; //the class

//X is a node in the parse tree; method
//works b/c class can take itself as template
//parameter

template<typename Left, typename Op, typename Right>
class X{};

//overloaded operator ‘+’ for use in nesting templated
//expressions

template<class T>
X<T, plus, Array> operator+(T,Array)
{

return X<T,plus, Array>()

}

If A,B,C represent class instances, then “D=A+B+C” looks like:

D=A+B+C;
=X<Array,plus,Array>() + C;
X<X<Array,plus,Array>,plus,Array>();

which gives performance comparable to handcoded C. For more information see the article.


CUDA part I

April 10, 2009

NMR can provide unparalleled insight into local atomic structural details, although in the solid state interpretation of lineshapes is hindered by anisotropic broadening, eg., the attached MQMAS spectra of two distinct chemical sites. The situation grows much worse for disordered materials where parameters take on a stochastic nature. I’ve developed some HPC software to enable the simulation of MQMAS spectra for these scenarios, and now my attention is turned to implementation using general purpose graphical processor unit (GPGPU) programming. For the paltry sum of 70USD you can have a little super computer of your own in the form the NVIDIA GeForce 8400 GPU. I installed today and have started playing…

foo


CP2k install howto

April 8, 2009

There is the possibility that I’ll be using CP2k in the near future, so I’ve been getting better acquainted. By all accounts this is a distribution that makes you work for its love, which is fine b/c I’m not needy. It also represents a new domain for me, mixed Gaussian/plane waves and molecular dynamics, I look forward to what it can do. In the meantime, Lady and Gentleman, I give you 5 READMEs and 7 hours later the complete steps for a full install on a fresh Linux i686 OS


0.0 Get Coffee

1.0 Install g95:

# wget -O – http://ftp.g95.org/g95-x86-linux.tgz | tar xvfz -

//check g95-install/bin for binary name, make symbolic link, eg.,

# ln -s /root/g95-install/bin/i686-unknown-linux-gnu-g95 /bin/g95

2.0 Install Lapack:

# wget -O – http://www.netlib.org/lapack/lapack.tgz | tar xvfz -

# cd lap*

//cp & edit the make.inc file to use g95 (edit FORTRAN & LOADER lines)

# cp make.inc.example make.inc
# vim make.inc

//make

# make lib

2.1 Lunch

3.0 Install & run ATLAS:

//get file from sourceforge & gunzip etc, cd into ATLAS,
//make a dir

# mkdir MyObj
# cd MyObj

//run configure specifying compiler and path to LAPACK eg.,

# /root/ATLAS/configure -C if /bin/g95 –with-netlib-lapack=/root/lapack-3.2/lapack_LINUX.a

//run all variants of make (will take quite a while :) )

# make
# make check
# make ptcheck
# make time
# make install

3.1 Coffee

4. Install FFTW3:

# wget -O – ftp://ftp.fftw.org/pub/fftw/fftw-3.2.1.tar.gz | tar xvfz -
# cd fft*
# ./configure
# make
# make install

5. Install cp2k

//download eg., via cvs:

# cvs -d :pserver:anonymous@cvs.cp2k.berlios.de:/cvsroot/cp2k co cp2k

// cd into cp2k/arch, create/edit arch file, eg., for Linux 686 w/ g95, serial;

# cp Linux-i686-g95.sopt foo.sopt

// edit with vim, making sure order of libraries is lapack/blas/cblas/atlas:

LIBS = /usr/local/atlas/lib/liblapack.a /usr/local/atlas/lib/libf77blas.a /
usr/local/atlas/lib/libcblas.a /usr/local/atlas/lib/libatlas.a

// cd into cp2k/makefiles,

# make ARCH=foo VERSION=sopt
# make install

// if build succesful, should have executable in ~/cp2k/exe/foo, make link eg.,

# ln -s /root/cp2k/exe/foo/cp2k.sopt /bin/cp2k

5.1 Dinner

5.2 Coffee

6. Watch tv for a bit

7. Edit & run ~cp2k/tools/do_regtest in future


emergence via connectedness

March 19, 2009

Cellular Automata have proved fertile research for at least 50 years, providing models for everything ranging from crystal growth to fluid dynamics. Discrete rules are applied at each time step to evolve a particular cellular state; ordered patterns emerge from random initial configurations. The example given here using the connected components labeling (CCL) algorithm demonstrates how topology also can give rise to order from randomness.

in_test2 out_test1

CCL in OCTAVE:

lab=1; freq=1; x=0; y=0;
% main loop(s)

for i=2:c-1
for j=2:d-1

if ( a(i,j)!=0 & label(i,j)==0)

x=i;
y=j;

label(i,j)=lab;
while (x!=1) & (y!=1) & (x!=c) & (y!=d)

[mm,x,nn,y]=pop_s(x,y);

if (a(mm-1,nn-1)==1 && label(mm-1,nn-1)==0 )

[x,y]=push_s(mm-1,x,nn-1,y);
label(mm-1,nn-1)=lab;
freq(lab)++;
end

if (a(mm-1,nn)==1 && label(mm-1,nn)==0)

[x,y]=push_s(mm-1,x,nn,y);
label(mm-1,nn)=lab;
freq(lab)++;
end

if (a(mm-1,nn+1)==1 && label(mm-1,nn+1)==0)

[x,y]=push_s(mm-1,x,nn+1,y);
label(mm-1,nn+1)=lab;
freq(lab)++;
end

if (a(mm,nn-1)==1 && label(mm,nn-1)==0)

[x,y]=push_s(mm,x,nn-1,y);
label(mm,nn-1)=lab;
freq(lab)++;
end

if (a(mm,nn+1)==1 && label(mm,nn+1)==0)

[x,y]=push_s(mm,x,nn+1,y);
label(mm,nn+1)=lab;
freq(lab)++;
end

if (a(mm+1,nn-1)==1 && label(mm+1,nn-1)==0)

[x,y]=push_s(mm+1,x,nn-1,y);
label(mm+1,nn-1)=lab;
freq(lab)++;
end

if (a(mm+1,nn)==1 && label(mm+1,nn)==0)

[x,y]=push_s(mm+1,x,nn,y);
label(mm+1,nn)=lab;
freq(lab)++;
end

if (a(mm+1,nn+1)==1 && label(mm+1,nn+1)==0)

[x,y]=push_s(mm+1,x,nn+1,y);
label(mm+1,nn+1)=lab;
freq(lab)++;
end

end %endwhile

lab++;
freq=[freq, 1];

end %endif

end
end


CPMG simulation/modeling

February 23, 2009

A script for simulating the CPMG sequence using the density matrix, with Gauss/Gauss envelope/spikelet broadening, for a forthcoming paper.


function [tot, gbb,t]=d_cpmg(N,tau,n,delta,r,rr)

%cpmg experiment evolution wjb 02/09
%simple/ideal pulse sequence:

%90y-n*[-180x-]

%N time steps
%tau
%n 180 loops
%delta chem shift
%r envelope gauss br^2
%rr spikelet gauss br^2

%matrix for I_x & I_y

a=[0 1/2; 1/2 0]; b=[0 -i/2; i/2 0];

%time step & initial rho

t=tau/(N-1); rho=a;

for k=1:n
sig(1)=trace(rho*a); sigi(1)=trace(rho*b);

for j=2:N/2

%iterate; free precession for tau/2

rho = [exp(-i*t*delta) 0; 0 exp(i*t*delta)]*rho*[exp(i*t*delta) 0; 0 exp(-i*t*delta)];

sig(j)=trace(rho*a); sigi(j)=trace(rho*b);

end

%apply 180x

rho = [0 exp(i*pi/2); exp(i*pi/2) 0]*rho*[0 exp(-i*pi/2); exp(-i*pi/2) 0];

%iterate; free precession for tau/2

sig(N/2+1)=trace(rho*a); sigi(N/2+1)=trace(rho*b);

for j=2:N/2

rho = [exp(-i*t*delta) 0; 0 exp(i*t*delta)]*rho*[exp(i*t*delta) 0; 0 exp(-i*t*delta)];

sig(j+N/2)=trace(rho*a); sigi(j+N/2)=trace(rho*b);
end

if k>1

tot=[tot (sig+i*sigi)];

else

tot=(sig+i*sigi);
end

end

tt=-tau/2:t:t*(N-1); gb=exp(-rr.*tt.^2); gb = [gb(N/2+1:N) gb(1:N/2)]; gbb=gb;

for i=1:n-1

gbb=[gbb gb];

end

t=0:t:t*(n*N-1); gbb=gbb.*exp(-r.*t.^2);


cpmg1


Glorious Gawk part II

February 18, 2009

Here’s a snapshot from a shell script to extract various important segments from a *ps file, after conversion from *pdf. It uses various gawk/awk tricks including using patterns for brackets, checking lengths of records to discriminate lines/polylines. Your mileage may vary a little, but if you check the *ps file preamble, you should be able to translate this to your specific tasks


#!/bin/sh
# line/text extract from pdf wjb 12/08, 02/08

NOFILE=64
[ -z $1 ] && echo “foo.sh <filename>” && exit $NOFILE

myfile=${1}

#convert to ps

echo “converting pdf -> ps…”

pdf2ps $myfile gHjLz.ps

echo “…done”

#take out line drawing sections w/ line numbers

echo “extracting lines & text…”

awk ‘$4==”scale”,$1==”Q” {print NR > “gHjLq.txt”}’ gHjLz.ps
awk ‘$4==”scale”,$1==”Q” {print $0 > “gHjLp.txt”}’ gHjLz.ps
awk ‘$1==”q” {print NR > “test.txt”}’ gHjLz.ps

# _p == polylines,  _l == lines

awk ‘BEGIN { RS = “q” } ; {if (NF > 12) print NR,$0 > “gHjLp_p.txt”; else print NR,$0 > “gHjLp_l.txt”}’  gHjLp.txt

awk ‘$4==”scale” {print $1 > “gHjLp_foo.txt”}’ gHjLp_p.txt

awk ‘$4==”scale”,$1==”Q” {print NR,$0 > “gHjLp_mol.txt”}’ gHjLz.ps

#take out text w/ line numbers

awk ‘$4 == “,” {print NR,$0 > “gHjLp_text.txt”}; $1 == “$C” {print NR,x > “gHjLp_text.txt”}; {x=$0}; $1 == “$C”, $1 == “,” {print NR,$0> “gHjLp_text.txt”}’ gHjLz.ps


Building a distributed PSE

February 17, 2009

Over the last couple of years, I’ve been invested in building a distributed problem solving environment (PSE). I offer a simplified application here in the hopes that the general ideas can be useful to someone.

What has also initiated a desire to see this released is my wife’s iPhone with its addictive applications. On the downside, there is a serious lack of real estate and unless I convert to Mac I can’t get my hands on the SDK. However, coupling the iPhone’s portability with a personal server running traditional applications makes for a cheap and powerful PSE. Computational tasks and data storage can take place server side with the iPhone serving as a very pleasing web portal.

pse_fig

There are plenty of applications of these ideas to the sciences, for someone with a modicum of talent and vision. For the purposes of this example I’m going to use free financial data and the computational task will be options pricing under the Black Scholes model for European contracts. I’ve taken a data source at random, and I have no idea if there is a European style options contract available. Also realize that there is potentially unlimited risk associated with options, and it is widely held that the traditional theory is naïve and at worst downright wrong, particularly when using historical data to calculate volatility. So use this example under advisement…and as always, please be careful with curl, don’t get yourself banned :)

  1. Set up MySQL + Apache + PHP on a server
  2. Create Database + table on server:
  3. //invoke mysql (as root)
    >mysql –u root –p
    //create database
    >create database tech_stock;
    //extend permission to user bill with password *****
    >GRANT ALL ON tech_stock.* TO bill@localhost IDENTIFIED BY “*****“;
    //quit;
    >quit;
    //as user, invoke mysql as before; select database
    >mysql tech_stock –u bill –p
    //create a table
    >create table tech_A( n_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY, date VARCHAR(8), open FLOAT,
    high FLOAT, low FLOAT, close FLOAT, volume FLOAT);
    //check it exists
    >DESCRIBE tech_A;
    //logout
    >quit;

  4. Write a shell script to grab data:
  5. #!/bin/sh
    # data update/download wjb 02/08
    Month=$(date +%b)
    Day=$(date +%d)
    Year=$(date +%G)
    str1=”http://finance.google.com/finance/historical?cid=659815&startdate=Feb+10%2C+2008&enddate=”
    str2=”%2C+”$Year”&output=csv”
    curl $str1$Month”+”$Day$str2 > tech_A.csv
    exit #cheerio

  6. Make a html portal:
  7. <html><body style=”font-family:verdana”>
    <title>PSE Demo</title>
    <form action=”main.php” method=”post” enctype=”multipart/form-data”>
    <font size=”5″ style=”color:blue”>Data Processing Portal<br>
    <table style=”background-color:blue”>
    <tr><td><font color=”white”>Password:</font> </td>
    <td><input type=”password” name=”pswd” /> </td></tr>
    <tr><td><font color=”white”>Email: </font></td>
    <td><input type=”text” name=”email” /> </td></tr>
    <tr><td><input type=”submit” name=”cmdupload” value=”Submit” /></tr></td>
    </table>
    </form>
    <br>
    <font size=”1″>
    Collaboratory for SDE data modeling <br>
    </font>
    </body></html>

more to come…