UNIX redirect,pipes & reconfigurable data processing

November 12, 2009

In various sciences one is often invested in processing voluminous data which follows a given format, with many examples following (hopefully) the same format. A typical data file might have a line of environment variables, followed by raw data, which repeats in columns and/or rows eg.,

4,23,45,56,78
12,23,45,56,67,78,89,87,…,72
32,34,23,21,23,56,43,23,…,34
34,54,32,45,89,76,54,98,…,58
67,67,88,32,34,21,22,97,…,51

A typical section of C code which would use this might look like:

float config[8];
scanf(“%f,%f,%f,%f,%f”,&config[1],&config[2],&config[3],
&config[4],&config[5]);

printf(“%f,%f,%f,%f,%f\n”,config[1],config[2],config[3],
config[4],config[5]);

int RECORDS = (int) config[1];
int i;
if ( RECORDS < 100){
int SIZE = sizeof(float)*RECORDS;

float *deltaCSMean = malloc(SIZE);
float *deltaCSSigma     = malloc(SIZE);
float *CQMean           = malloc(SIZE);
float *CQSigma           = malloc(SIZE);
float *etaMean           = malloc(SIZE);
float *etaSigma         = malloc(SIZE);
float *brdF2             = malloc(SIZE);
float *brdF1             = malloc(SIZE);
float *amplitude         = malloc(SIZE);

for (i=0; i< RECORDS; i++){
scanf(“%f,%f,%f,%f,%f,%f,%f,%f,%f”,&deltaCSMean[i],
&deltaCSSigma[i],&CQMean[i],&CQSigma[i],
&etaMean[i],&etaSigma[i],
&brdF2[i],&brdF1[i],&amplitude[i]);
printf(“%f,%f,%f,%f,%f,%f,%f,%f,%f\n”,deltaCSMean[i],
deltaCSSigma[i],CQMean[i],CQSigma[i],
etaMean[i],etaSigma[i],brdF2[i],brdF1[i],amplitude[i]);
}
}

Using redirect a simple invocation of executable foo using input foo1.txt and output foo2.txt would of course be:

./foo < foo1.txt > foo2.txt

which is fine until one needs different data from the file. Supposing in this example data repeats along columns, then one can use awk and a pipe to reshape the data input, removing the need to edit and recompile the C source for different data subsets:

gawk ‘BEGIN{FS=”,”; print “4,12,23,44,56″; getline;} {for(i=lowBnd;i<upBnd;i++){printf(“%f,”,$i) }; printf(“%f\n”,$upBnd) }’ foo1.txt | ./foo > foo2.txt

where lowBnd and upBnd correspond to the data column limits in the input file. Note that the environment variables are written during the BEGIN block, allowing the specification of (for instance) a different job for the new data set.


tutorial from Nvidia GTC

October 16, 2009

My talk is attached in *pdf. Overall the conference felt like a rock’n'roll show, my first non-academic meeting of this nature. Thoroughly enjoyable although a little difficult to tell what people are doing since it’s mostly proprietary work. The announcement of fermi was certainly exciting; ~8x times the double prec performance, ECC and many more cores/memory…

Talk outline:

  • Company Background

  • CUDA accelerates Geophysics:
    • Data Processing w/ Linear Algebra

    • Sparse Matrices
    • Seismic Imaging:Kirchhoff Time Migration
      • Frequency Filtering

      • Traveltime/Weight Calculations
      • GPU Migration
  • Summary

nvidiaSummit0909


CUDA + OCTAVE

September 30, 2009

I’ve been experimenting with CUDA and OCTAVE; there is at least one company who have produced GPU enabled MEX functions. The big difficulty is of course that there is no support for internal floats within OCTAVE (afaik) and similarly with Matlab. However if one can leave the data and work with it on the device for some time, then there are only two explicit conversions btwn float <-> double needed. Or you could sacrifice some performance in CUDA in return for using doubles. At any rate here’s an example Makefile, happy experimenting. For this example I gutted the matrix Mul example from the CUDA sdk; the wrapper *oct source code (*cc, really C++ with octave extensions) contains an extern C section which references the cuda kernel (*cu). Don’t forget to indent instructions under ‘all’ with a single tab for make.


#! /usr/bin/env make
#make file for octfile/cuda
#Mac OSX 10.5.8 intel core 2 duo
#cuda include/lib
CUDA_INC_PATH=/usr/local/cuda/include
CUDA_LIB_PATH=/usr/local/cuda/lib
#octfile compiler
CC=mkoctfile
#basic flags
CFLAGS= -I$(CUDA_INC_PATH)
LDFLAGS= -L$(CUDA_LIB_PATH) -lcudart -lcuda

all:
$(CC) $(CFLAGS) -c cudaMatrixMul.cc -o cudaMatrixMul.o
nvcc  -c matMul_kernel.cu -o matMul_kernel.o -Wall
$(CC) $(LDFLAGS) cudaMatrixMul.o matMul_kernel.o -o cudaMatrixMul.oct

#clean:
rm -f  cudaMatrixMul.o matMul_kernel.o


MCMC paper finished

September 30, 2009

Abstract:

Many nuclei probed by NMR are relatively insensitive to detection, requiring methods such as the Carr-Purcell Meiboom-Gill (CPMG) pulse sequence. Experiments which follow this general approach are composed of pulse trains, giving rise to characteristic spikelet patterns in the frequency domain. In the presence of multiple underlying chemical sites, each spikelet intensity is a sum of some unknown proportion of contributions from each site. This work outlines a modeling approach based around Markov Chain Monte Carlo (MCMC), which negates the need for intensive simulations using density matrix formalism. In support of this technique, a spikelet pattern is produced using the density matrix formalism for an ensemble of spin 1/2 nuclei, and the underlying chemical shifts and intensities reproduced using the method outlined. Finally, MCMC is used to model the CPMG spectrum of a (3,3,3-trifluoropropyl)dimethylchlorosilane (TFS) treated aluminosilicate, providing evidence in support of a particular model of silanol group surface attachment to the bulk.


Split-Step Algorithm

June 30, 2009

This is a fast method for the solution of parabolic PDE’s, relying on the FFT implementation of the Fourier Transform to speed things up. Here it’s applied to an acoustic/seismic problem, following the development of Kuperman/Jackson in “Imaging of Complex Media with Acoustic and Seismic Waves”. Consider first the Helmholtz equation for a point source at r’,z’:
eq1_ssa
where G is the Green’s function, K the wavenumber (function of the frequency omega and sound speed c). Assuming azimuthal symmetry, G may be expressed as a product of two functions:
eq2_ssa
and similarly K, now a product of (constant) K0 and index of refraction n. Substituting into the Helmholtz equation gives two PDE’s:
eq3_ssa
The first PDE has Bessel functions as solutions; taking the assymptotic outgoing Hankel function solution and substituting it into the second, with the narrow angle approximation (second derivative of psi with respect to r much smaller than first derivative wrt r), one finds for the second (parabolic) PDE:
eq4_ssa
where chi is the fourier transform of psi. Assuming the variation of n is insignificant, in the wave-space domain the PDE and solution are:
eq5_ssa
Finally, the inverse FT gives the field as a function of depth (Delta r = r-r0, r0 the boundary value):
eq6_ssa

The following is a little hack in octave, to demonstrate the solution method:

function [Psi,s]=test_ssa(N,Dr,P0,n,K0)

% to test split step algo on parabolic PDE:
%
% \frac{\partial^2 \psi}{\partial r^2}
% +2iK0\frac{\partial \psi}{\partial r}
% +K0^2(n^2-1)\psi = 0
%
% K0*n^2 = (const.) wave number
% N=marching steps in field
% P0=boundary field data
% Dr=field step size

% wjb 0609

%initialize
m=length(P0); Psi=zeros(N,m); Psi(1,:)=P0;

%Nyquist thm, s (wave space) range

s=fftshift(-m/2:m/2-1)./Dr;

for k=2:N

Psi(k,:)=exp((i*K0/2)*(n^2-1)*Dr).*ifft(exp(-(i*Dr*s.^2)./(2*K0)).*fft(Psi(k-1,:)));

end

ssa_test


Expression Templates in C++

April 15, 2009

High performance C++ has traditionally been a misnomer, particularly with regards to linear algebra classes and templates. Handcoded C and FORTRAN are the usual approaches taken in the physical sciences, but these lack the portability, maintainability and overall elegance of (object oriented) C++. It was brought to my attention recently that there are really clever ways of using the compiler for performance hacks in C++. Amongst these methods are expression templates by Todd Veldhuizen et al.

The difficulty in vanilla C++ array classes is the overhead associated with operations on arrays,  eg., addition and multiplication. Temporary objects are created, requiring precious memory bandwidth and unnecessary loops.  Expression templates is an ingenious way around this, by essentially nesting templates, and in conjunction with overloaded operators, creating parse trees at compile time, which effectively inline expressions. The following is pseudocode from Todd Veldhuizen’s notes:

struct plus{}; //overloaded operator
class Array{}; //the class

//X is a node in the parse tree; method
//works b/c class can take itself as template
//parameter

template<typename Left, typename Op, typename Right>
class X{};

//overloaded operator ‘+’ for use in nesting templated
//expressions

template<class T>
X<T, plus, Array> operator+(T,Array)
{

return X<T,plus, Array>()

}

If A,B,C represent class instances, then “D=A+B+C” looks like:

D=A+B+C;
=X<Array,plus,Array>() + C;
X<X<Array,plus,Array>,plus,Array>();

which gives performance comparable to handcoded C. For more information see the article.


DocEng09 paper (almost done)

April 7, 2009

Abstract: For at least fifty years, liquid state Nuclear Magnetic Resonance (NMR) has served as an important analytical technique in studying local atomic bonding information. Thus, a vast amount of data of interest to the chemist and crystallographer resides in archived documents, containing liquid state NMR spectra and accompanying molecular structures. These structures are determined on the basis of chemical shift information from spectra, using well established empirical rules. The combined wealth of information represented visually in the spectra and molecules precludes straightforward inclusion in a traditional database. Given its value to the researcher, work by this group is being dedicated to automatic extraction of spectral and molecular information from documents, for conversion to the Chemical Markup Language (CML) and incorporation into a database. Preliminary results are presented here, as well as details of future work.

Keywords: Information Extraction, Chemical Markup Language, Nuclear Magnetic Resonance, Computer Assisted Structure Determination, PostScript

doceng09fig


MQMAS sim paper in press

March 20, 2009

Abstract: The majority of nuclei available for study in solid state Nuclear Magnetic Resonance have half-integer spin I > 1/2, with corresponding electric quadrupole moment. As such, they may couple with a surrounding electric field gradient. This effect introduces anisotropic line broadening to spectra, arising from distinct chemical species within polycrystalline solids. In Multiple Quantum Magic Angle Spinning (MQMAS) experiments, a second frequency dimension is created, devoid of quadrupolar anisotropy. As a result, the center of gravity of peaks in the high resolution dimension are functions of isotropic quadrupole and chemical shifts alone. However, for complex materials, these parameters take on a stochastic nature due in turn to structural and chemical disorder. Lineshapes may still overlap in the isotropic dimension, complicating the task of assignment and interpretation. A distributed computational approach is presented here which permits simulation of the MQMAS spectrum, generated by random variates from model distributions of isotropic chemical and quadrupole shifts. Owing to the non-convex nature of the least squared cost function between experimental and simulated spectra, simulated annealing is used to optimize the simulation parameters. In this manner, local chemical environments for disordered materials may be characterized, and via a re-sampling approach, error estimates for parameters produced.

Key words: Nuclear Magnetic Resonance, Multiple Quantum Magic Angle Spinning, OpenMP, Sobol sequence, quasi-random numbers, simulated annealing, distribution functions, quadrupole interaction.


sim_ex


emergence via connectedness

March 19, 2009

Cellular Automata have proved fertile research for at least 50 years, providing models for everything ranging from crystal growth to fluid dynamics. Discrete rules are applied at each time step to evolve a particular cellular state; ordered patterns emerge from random initial configurations. The example given here using the connected components labeling (CCL) algorithm demonstrates how topology also can give rise to order from randomness.

in_test2 out_test1

CCL in OCTAVE:

lab=1; freq=1; x=0; y=0;
% main loop(s)

for i=2:c-1
for j=2:d-1

if ( a(i,j)!=0 & label(i,j)==0)

x=i;
y=j;

label(i,j)=lab;
while (x!=1) & (y!=1) & (x!=c) & (y!=d)

[mm,x,nn,y]=pop_s(x,y);

if (a(mm-1,nn-1)==1 && label(mm-1,nn-1)==0 )

[x,y]=push_s(mm-1,x,nn-1,y);
label(mm-1,nn-1)=lab;
freq(lab)++;
end

if (a(mm-1,nn)==1 && label(mm-1,nn)==0)

[x,y]=push_s(mm-1,x,nn,y);
label(mm-1,nn)=lab;
freq(lab)++;
end

if (a(mm-1,nn+1)==1 && label(mm-1,nn+1)==0)

[x,y]=push_s(mm-1,x,nn+1,y);
label(mm-1,nn+1)=lab;
freq(lab)++;
end

if (a(mm,nn-1)==1 && label(mm,nn-1)==0)

[x,y]=push_s(mm,x,nn-1,y);
label(mm,nn-1)=lab;
freq(lab)++;
end

if (a(mm,nn+1)==1 && label(mm,nn+1)==0)

[x,y]=push_s(mm,x,nn+1,y);
label(mm,nn+1)=lab;
freq(lab)++;
end

if (a(mm+1,nn-1)==1 && label(mm+1,nn-1)==0)

[x,y]=push_s(mm+1,x,nn-1,y);
label(mm+1,nn-1)=lab;
freq(lab)++;
end

if (a(mm+1,nn)==1 && label(mm+1,nn)==0)

[x,y]=push_s(mm+1,x,nn,y);
label(mm+1,nn)=lab;
freq(lab)++;
end

if (a(mm+1,nn+1)==1 && label(mm+1,nn+1)==0)

[x,y]=push_s(mm+1,x,nn+1,y);
label(mm+1,nn+1)=lab;
freq(lab)++;
end

end %endwhile

lab++;
freq=[freq, 1];

end %endif

end
end