Hagedoorn Construction

November 25, 2009

I’ve been writing GPU code for Kirchhoff Time Migration, a technique applied in seismic imaging. Among the tools used by geophysicists and other workers in oil + gas to perform seismic imaging, the most ubiquitous is probably KTM. This method is directly preceeded by the seismic image construction technique developed primarily by Hagedoorn, although both approaches essentially represent Greens functions solutions to the scalar wave equation. Data surveys are constructed using an array of pulsed sources and corresponding receivers. Data recorded at receiver locations (referred to as ‘traces’) generally correspond to a particular source-receiver pair, and in the simplest case source and receiver are coincident. An image into the earth is constructed by first calculating the traveltime for a given src/rec position and image point, using these values as well as the one way traveltime and velocity at the depth in question. This traveltime is then used as an index into the trace data, to find the amplitude point (energy) which contributes to the image point. Many points from different traces constructively and destructively interfere to create the subsequent image. The initial approach developed by Hagedoorn used little more than a compass and ruler to perform this imaging task, which is to quote the vernacular ‘old school’ and provides great pedagogical insight into KTM. There is a fantastic article by Bleistein here, and below is an example of a Hagedoorn construction performed in octave; the surface is the superposition of the arcs, like figure 2 of the article.



black angels

November 17, 2009

my new favorite band, great psychedelic rock


UNIX redirect,pipes & reconfigurable data processing

November 12, 2009

In various sciences one is often invested in processing voluminous data which follows a given format, with many examples following (hopefully) the same format. A typical data file might have a line of environment variables, followed by raw data, which repeats in columns and/or rows eg.,

4,23,45,56,78
12,23,45,56,67,78,89,87,…,72
32,34,23,21,23,56,43,23,…,34
34,54,32,45,89,76,54,98,…,58
67,67,88,32,34,21,22,97,…,51

A typical section of C code which would use this might look like:

float config[8];
scanf(“%f,%f,%f,%f,%f”,&config[1],&config[2],&config[3],
&config[4],&config[5]);

printf(“%f,%f,%f,%f,%f\n”,config[1],config[2],config[3],
config[4],config[5]);

int RECORDS = (int) config[1];
int i;
if ( RECORDS < 100){
int SIZE = sizeof(float)*RECORDS;

float *deltaCSMean = malloc(SIZE);
float *deltaCSSigma = malloc(SIZE);
float *CQMean = malloc(SIZE);
float *CQSigma = malloc(SIZE);
float *etaMean = malloc(SIZE);
float *etaSigma = malloc(SIZE);
float *brdF2 = malloc(SIZE);
float *brdF1 = malloc(SIZE);
float *amplitude = malloc(SIZE);

for (i=0; i< RECORDS; i++){
scanf(“%f,%f,%f,%f,%f,%f,%f,%f,%f”,&deltaCSMean[i],
&deltaCSSigma[i],&CQMean[i],&CQSigma[i],
&etaMean[i],&etaSigma[i],
&brdF2[i],&brdF1[i],&amplitude[i]);
printf(“%f,%f,%f,%f,%f,%f,%f,%f,%f\n”,deltaCSMean[i],
deltaCSSigma[i],CQMean[i],CQSigma[i],
etaMean[i],etaSigma[i],brdF2[i],brdF1[i],amplitude[i]);
}
}

Using redirect a simple invocation of executable foo using input foo1.txt and output foo2.txt would of course be:

./foo < foo1.txt > foo2.txt

which is fine until one needs different data from the file. Supposing in this example data repeats along columns, then one can use awk and a pipe to reshape the data input, removing the need to edit and recompile the C source for different data subsets:

gawk ‘BEGIN{FS=”,”; print “4,12,23,44,56″; getline;} {for(i=lowBnd;i<upBnd;i++){printf(“%f,”,$i) }; printf(“%f\n”,$upBnd) }’ foo1.txt | ./foo > foo2.txt

where lowBnd and upBnd correspond to the data column limits in the input file. Note that the environment variables are written during the BEGIN block, allowing the specification of (for instance) a different job for the new data set.


tutorial from Nvidia GTC

October 16, 2009

My talk is attached in *pdf. Overall the conference felt like a rock’n'roll show, my first non-academic meeting of this nature. Thoroughly enjoyable although a little difficult to tell what people are doing since it’s mostly proprietary work. The announcement of fermi was certainly exciting; ~8x times the double prec performance, ECC and many more cores/memory…

Talk outline:

  • Company Background

  • CUDA accelerates Geophysics:
    • Data Processing w/ Linear Algebra

    • Sparse Matrices
    • Seismic Imaging:Kirchhoff Time Migration
      • Frequency Filtering

      • Traveltime/Weight Calculations
      • GPU Migration
  • Summary

nvidiaSummit0909


CUDA + OCTAVE

September 30, 2009

I’ve been experimenting with CUDA and OCTAVE; there is at least one company who have produced GPU enabled MEX functions. The big difficulty is of course that there is no support for internal floats within OCTAVE (afaik) and similarly with Matlab. However if one can leave the data and work with it on the device for some time, then there are only two explicit conversions btwn float <-> double needed. Or you could sacrifice some performance in CUDA in return for using doubles. At any rate here’s an example Makefile, happy experimenting. For this example I gutted the matrix Mul example from the CUDA sdk; the wrapper *oct source code (*cc, really C++ with octave extensions) contains an extern C section which references the cuda kernel (*cu). Don’t forget to indent instructions under ‘all’ with a single tab for make.


#! /usr/bin/env make
#make file for octfile/cuda
#Mac OSX 10.5.8 intel core 2 duo
#cuda include/lib
CUDA_INC_PATH=/usr/local/cuda/include
CUDA_LIB_PATH=/usr/local/cuda/lib
#octfile compiler
CC=mkoctfile
#basic flags
CFLAGS= -I$(CUDA_INC_PATH)
LDFLAGS= -L$(CUDA_LIB_PATH) -lcudart -lcuda

all:
$(CC) $(CFLAGS) -c cudaMatrixMul.cc -o cudaMatrixMul.o
nvcc  -c matMul_kernel.cu -o matMul_kernel.o -Wall
$(CC) $(LDFLAGS) cudaMatrixMul.o matMul_kernel.o -o cudaMatrixMul.oct

#clean:
rm -f  cudaMatrixMul.o matMul_kernel.o


MCMC paper finished

September 30, 2009

Abstract:

Many nuclei probed by NMR are relatively insensitive to detection, requiring methods such as the Carr-Purcell Meiboom-Gill (CPMG) pulse sequence. Experiments which follow this general approach are composed of pulse trains, giving rise to characteristic spikelet patterns in the frequency domain. In the presence of multiple underlying chemical sites, each spikelet intensity is a sum of some unknown proportion of contributions from each site. This work outlines a modeling approach based around Markov Chain Monte Carlo (MCMC), which negates the need for intensive simulations using density matrix formalism. In support of this technique, a spikelet pattern is produced using the density matrix formalism for an ensemble of spin 1/2 nuclei, and the underlying chemical shifts and intensities reproduced using the method outlined. Finally, MCMC is used to model the CPMG spectrum of a (3,3,3-trifluoropropyl)dimethylchlorosilane (TFS) treated aluminosilicate, providing evidence in support of a particular model of silanol group surface attachment to the bulk.


first IPA

August 10, 2009

I finally got it together and brewed recently, the results were superb.

I used a fairly traditional IPA recipe, leaving most of the cascade malt in tact, ie., very little steeping was actually done. I also used about 2 pounds of black raspberries for an overall result that has strong citrus notes, very sweet and pleasant aftertaste. Next: extra stout in time for fall
first_ipa


CUDA crash course

August 10, 2009

I’m really excited to be going to the Nvidia Research Summit in my new capacity as Senior Physicist for Stone Ridge Technology. Nvidia provide a remarkable product, with exceptional service and support for the scientist, made all the more possible by the majority of money coming from gaming. I’d encourage anyone with a view to doing affordable HPC to start at the CUDA zone, pick up a card from your favorite electronics store, or apply for one via the academic program, and start programming. Here’s a short course below, compiled from notes I made while at PSU, for a rapid introduction, more to come…
cuda crash course


Bachelor City

August 6, 2009

A picture of me, an XE ford falcon, and a dishwasher circa 1996 where I lived with four other friends in Brisbane. Incidentally, the tin foil over my housemate’s window was placed there in order to allow him to sleep in. Till noon.
Bill crushing washing machine


Split-Step Algorithm

June 30, 2009

This is a fast method for the solution of parabolic PDE’s, relying on the FFT implementation of the Fourier Transform to speed things up. Here it’s applied to an acoustic/seismic problem, following the development of Kuperman/Jackson in “Imaging of Complex Media with Acoustic and Seismic Waves”. Consider first the Helmholtz equation for a point source at r’,z’:
eq1_ssa
where G is the Green’s function, K the wavenumber (function of the frequency omega and sound speed c). Assuming azimuthal symmetry, G may be expressed as a product of two functions:
eq2_ssa
and similarly K, now a product of (constant) K0 and index of refraction n. Substituting into the Helmholtz equation gives two PDE’s:
eq3_ssa
The first PDE has Bessel functions as solutions; taking the assymptotic outgoing Hankel function solution and substituting it into the second, with the narrow angle approximation (second derivative of psi with respect to r much smaller than first derivative wrt r), one finds for the second (parabolic) PDE:
eq4_ssa
where chi is the fourier transform of psi. Assuming the variation of n is insignificant, in the wave-space domain the PDE and solution are:
eq5_ssa
Finally, the inverse FT gives the field as a function of depth (Delta r = r-r0, r0 the boundary value):
eq6_ssa

The following is a little hack in octave, to demonstrate the solution method:

function [Psi,s]=test_ssa(N,Dr,P0,n,K0)

% to test split step algo on parabolic PDE:
%
% \frac{\partial^2 \psi}{\partial r^2}
% +2iK0\frac{\partial \psi}{\partial r}
% +K0^2(n^2-1)\psi = 0
%
% K0*n^2 = (const.) wave number
% N=marching steps in field
% P0=boundary field data
% Dr=field step size

% wjb 0609

%initialize
m=length(P0); Psi=zeros(N,m); Psi(1,:)=P0;

%Nyquist thm, s (wave space) range

s=fftshift(-m/2:m/2-1)./Dr;

for k=2:N

Psi(k,:)=exp((i*K0/2)*(n^2-1)*Dr).*ifft(exp(-(i*Dr*s.^2)./(2*K0)).*fft(Psi(k-1,:)));

end

ssa_test