Nvidia GTC 2013 Accepted Talks

I’m fortunate to work with some very talented people 🙂 This work was supported in part by a generous donation of 50 M2070 Nvidia Tesla cards from HP.

Accelerating and Scaling Lanczos Diagonalization with GPGPU

W. J. Brouwer, Sreejith G.J., F. Spiga

Lanczos diagonalization (LD) is an important algorithm for calculating eigenvalues and eigenvectors of large matrices. An example performed countless times throughout the world on a daily basis is the latent semantic analysis of documents for search and retrieval. This presentation details work devoted to exploiting the massive parallelism and scalability of GPUs in order to enhance LD for key aspects of condensed matter physics. One significant application area is the diagonalization of the Hamiltonian for the large, dense matrices encountered in studies of the fractional quantum Hall effect. A second application discussed in this work is the Self Consistent Field (SCF) cycle of a Density Functional Theory (DFT) code, Quantum Espresso. Initial results are promising, demonstrating an 18x speedup on the GPU over an optimized CPU implementation. Further, the use of MPI in conjunction with NVIDIA GPUDirect allows for scaling to all GPUs across the cluster used in this work.

Estimating Fracture Size in Gas Shale Formations using Smart Proppants and GPGPU

M. K. Hassan, W. J. Brouwer, R. Mittra

This presentation considers a system containing specially engineered particles called smart proppants, which have the potential to serve as sensors for estimating the effective length of fractures in gas shale formations, a significant factor in predicting the yield of a reservoir. The proppants are suspended and randomly distributed in a background medium, the fracturing fluid. A Monte Carlo method is used to generate the properties and positions of smart proppants within a volume that approximates the fracture zone. Modeling the particles as dipoles, one can construct a large and unwieldy matrix equation, simplified by the application of characteristic basis functions (CBF). The CBF method applies LU decomposition and SVD based methods to matrix sub-blocks in order to produce subsequent solutions. This presentation will discuss the overall application and solution method, as well as the results of using GPU implementations of the key algorithms, which effectively provide a 30-40x performance improvement over a single CPU.
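As a rough illustration of the SVD piece only (not the actual CBF code; the particle positions and tolerance below are made up for the sketch), here is how a smooth off-diagonal interaction block between two well-separated groups of dipole-like sources compresses to low numerical rank:

```python
import numpy as np

# Hypothetical stand-in for one off-diagonal sub-block: a 1/r coupling
# between two well-separated groups of particles (a smooth kernel, hence
# numerically low rank).
rng = np.random.default_rng(0)
src = rng.random((300, 3))            # one group, inside a unit cube
obs = rng.random((300, 3)) + 10.0     # a distant group
Z = 1.0 / np.linalg.norm(obs[:, None, :] - src[None, :, :], axis=2)

# SVD truncation: keep only singular values above a relative tolerance
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
k = int(np.sum(s > 1e-8 * s[0]))
Zk = U[:, :k] * s[:k] @ Vt[:k]        # rank-k approximation

rel = np.linalg.norm(Z - Zk) / np.linalg.norm(Z)
print(k, rel)   # small rank, tiny relative error
```

Storing and solving with the factors `U[:, :k]`, `s[:k]`, `Vt[:k]` instead of the full block is where the memory and flop savings come from.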

Large Scale Lanczos

This is an important algorithm for finding eigenvalues of large matrices, e.g., for latent semantic indexing in search. I’m currently porting this to GPU for a colleague’s research into topological excitons in the FQHE (utilizing diagonalization of the Hamiltonian); the matrices involved have side N > 1e6. The Lanczos algorithm is relatively simple, although it does require restarts/re-orthogonalization throughout. The implementation uses, among others, the sgemv routine from Intel MKL and/or cuBLAS. Initial results are promising: a 3x speedup using the GPU over 8 CPU threads.
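For the curious, the basic restart-free iteration is short. A rough NumPy sketch with full re-orthogonalization (the matrix-vector product is the sgemv step that dominates the cost; the diagonal test matrix is just for illustration):

```python
import numpy as np

def lanczos(A, k, rng=np.random.default_rng(0)):
    """Basic Lanczos iteration with full re-orthogonalization.

    Returns the k x k symmetric tridiagonal matrix T whose extreme
    eigenvalues approximate those of the symmetric matrix A.
    """
    n = A.shape[0]
    Q = np.zeros((n, k))
    alpha = np.zeros(k)
    beta = np.zeros(k - 1)
    q = rng.standard_normal(n)
    Q[:, 0] = q / np.linalg.norm(q)
    for j in range(k):
        w = A @ Q[:, j]                 # the dominant cost: sgemv territory
        alpha[j] = Q[:, j] @ w
        # full re-orthogonalization against all previous Lanczos vectors
        w -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)
        if j < k - 1:
            beta[j] = np.linalg.norm(w)
            Q[:, j + 1] = w / beta[j]
    return np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)

# toy usage: extreme eigenvalue of a 200 x 200 diagonal test matrix
A = np.diag(np.arange(200, dtype=float))
T = lanczos(A, 100)
print(np.linalg.eigvalsh(T)[-1])   # close to 199, the largest eigenvalue of A
```

Diagonalizing the small tridiagonal T is cheap; all the heavy lifting is in the matrix-vector products, which is exactly why the GPU helps.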

APS March talk


Workers in various scientific disciplines seek to develop chemical models for extended and molecular systems. The modeling process revolves around the gradual refinement of model assumptions, through comparison of experimental and computational results. Solid state Nuclear Magnetic Resonance (NMR) is one such experimental technique, providing great insight into chemical order over Angstrom length scales. However, interpretation of spectra for complex materials is difficult, often requiring intensive simulations. Similarly, working forward from the model to produce experimental quantities via ab initio calculation is computationally demanding. The work involved in these two significant steps, compounded by the need to iterate back and forth, drastically slows the discovery process for new materials. There is thus great motivation for deriving structural models directly from complex experimental data, the subject of this work. Using solid state NMR experimental datasets, in conjunction with ab initio calculations of measurable NMR parameters, a network of machine learning kernels is trained to rapidly yield structural details on the basis of input NMR spectra. Results for an environmentally relevant material will be presented, along with directions for future work.

xml call tree

To be sure, there are many great profilers out there, each with a different learning curve and costs ranging from free to a kidney. But if you face a new distribution, or one you would like to develop with, there is no substitute for rolling your own. Follow this bash script, rebuild, and filter the output, and you have an XML call tree with times for calls and pointers to source code and line numbers. It is applied here to the Fortran code(s) in Quantum Espresso; you can readily adapt it to your own needs. Caveat: there are a variety of filter steps in the pipeline; taking into account Fortran syntax and different programming styles is difficult 🙂 Once you have filtered your stdout to proper XML, you can view it in something like Eclipse, e.g., the attached image. Be aware also that you can and will write very large files; I exclude many I/O and related functions, which you can poke with strace or similar anyway.


#set source and destination paths

#set include/exclude strings; these i/o related function calls won't be timed later

#do once
#mkdir -p $PROFILE_BAK_PATH

declare -a file_list

#filenames to array
file_list=$(ls -l ${PROFILE_BAK_PATH} | gawk '/f90/ {print $9}')


#parse files & pretty up, inserting write statements and timers
cnt=0
for x in $file_list
do
    let "cnt+=1"
    sed 's/\,\&/\,\ \&/g' $PROFILE_BAK_PATH/$x | \
    sed 's/)/)\ /g' | \
    gawk 'BEGIN{IGNORECASE=1; FS="&"} (NF<=1) || ($0~"#") || !($1~"if") || ($0~"!") || '$INCLUDE' {print $0}; \
    (NF>1) && !($0~"#") && ($1~"if") && !($0~"!") && '$EXCLUDE' {printf("%s ",$1)}' |\
    gawk 'BEGIN{IGNORECASE=1;} /if/ { if (($0~"&") && !($0~"!") &&!($0~"#") \
    && !($NF~"&") && !($0~"error")) {gsub(/&/," ");print} else print $0}; \
    !/if/ {print $0}' |\
    gawk 'BEGIN{IGNORECASE=1; } (($1~"if")||($1~"else")) && (length($0)>100) \
    {{for (i=1;i<=NF;i++) printf("%s",$i);} printf("\n")} !(($1~"if")||($1~"else")) || (length($0)<=100) {print $0}' |\
    gawk 'BEGIN{IGNORECASE=1; FS="&"} (NF==1) || ($0~"#") || !($0~"call") || '$INCLUDE' {print $0}; \
    (NF>1) && !($0~"#") && ($0 ~ "call") && '$EXCLUDE' {printf("%s ",$1)}' |\
    gawk 'BEGIN{IGNORECASE=1; FS="&"} (NF==1) || ($0~"#") || !($0~"call") || '$INCLUDE' {print $0}; \
    (NF>1) && !($0~"#") && ($0 ~ "call") && '$EXCLUDE' {printf("%s ",$1)}' |\
    gawk 'BEGIN{IGNORECASE=1;} !/call/{print $0}; /call/ {for (i=1; i<=NF; i++) printf("%s",$i); printf("\n")}' | \
    sed 's/call/\ call\ /g' | \
    sed 's/CALL/\ call\ /g' |\
    sed 's/(/\ (/g' | \
    gawk 'BEGIN{IGNORECASE=1; ch=1;} !/call/ || '$INCLUDE' {print $0}; \
    /call/ && '$EXCLUDE' {pos=pos+1; print " call date_and_time(values=time_array_0) "; \
    for (i=1; i<=NF; i++) {if ($i=="call") ch=i+1;} ;\
    print " start_time = time_array_0(6)*60 + time_array_0(7) + 0.001*time_array_0(8) "; \
    print " write(*,*)\" <",$ch,">\" "; print $0; \
    print " call date_and_time(values=time_array_1) "; \
    print " finish_time=time_array_1(6)*60 + time_array_1(7) + 0.001*time_array_1(8) "; \
    print " write(*,*)\" <meta> \",1000*(finish_time-start_time),\" : ",NR," : ",'$cnt'," : ",$ch," </meta>\" " ;\
    print " write(*,*)\" <source> ~/espresso-PRACE/PW/'$x' </source> \" " ;  \
    print " write(*,*)\"</",$ch,">\" " ; }' | \
    gawk 'BEGIN{IGNORECASE=1; } {print $0;} /IMPLICIT NONE/{ print "INTEGER :: time_array_0(8), time_array_1(8)"; \
    print "REAL :: start_time, finish_time"; }' |\
    gawk 'BEGIN{FS=","} ($0~"call" && (NF <=5)) || ($0 ~ "!") || !/call/{print $0} \
    $0~"call" && (NF>5) && !($0 ~ "!") {print $1,",",$2,",",$3,",",$4,",&"; for (i=5;i<NF;i++) \
    printf("%s,",$i); printf("%s \n",$NF)}' > $PROFILE_DIR_PATH/$x
done

#hard cases: just copy these in unmodified, don't bother parsing them
cp $PROFILE_BAK_PATH/dynamics_module.f90 $PROFILE_DIR_PATH
cp $PROFILE_BAK_PATH/compute_becsum.f90 $PROFILE_DIR_PATH
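Filtering the stdout down to proper XML can then be as simple as keeping the tag lines and tightening the blanks that Fortran list-directed output inserts around names. A rough Python sketch, assuming the write(*,*) formats inserted by the script above (the sample output lines here are hypothetical):

```python
import re
import xml.etree.ElementTree as ET

def filter_calltree(lines):
    """Reduce instrumented stdout to well-formed XML.

    Assumes the write(*,*) formats inserted by the script above:
    an opening "< name >" line, "<meta> ... </meta>" and
    "<source> ... </source>" lines, and a closing "</ name >" line.
    """
    out = ['<calltree>']
    for line in lines:
        s = line.strip()
        if s.startswith('<') and s.endswith('>'):
            # tighten blanks: "< foo >" -> "<foo>", "</ foo >" -> "</foo>"
            s = re.sub(r'<\s+', '<', s)
            s = re.sub(r'/\s+', '/', s)
            s = re.sub(r'\s+>', '>', s)
            out.append(s)
    out.append('</calltree>')
    return out

# example on a few lines of hypothetical instrumented output
stdout = ['     Program PWSCF starts',
          ' < cdiaghg >',
          ' <meta>  12.5  :  45  :  3  :  cdiaghg  </meta>',
          ' <source> ~/espresso-PRACE/PW/cdiaghg.f90 </source> ',
          ' </ cdiaghg >']
xml_lines = filter_calltree(stdout)
ET.fromstring('\n'.join(xml_lines))   # parses cleanly
```

The ordinary program output (anything not shaped like a tag) is dropped, and the result nests naturally because the open/close tags were emitted around each call.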


MCMC paper finished


Many nuclei probed by NMR are relatively insensitive to detection, requiring methods such as the Carr-Purcell Meiboom-Gill (CPMG) pulse sequence. Experiments that follow this general approach are composed of pulse trains, giving rise to characteristic spikelet patterns in the frequency domain. In the presence of multiple underlying chemical sites, each spikelet intensity is a sum of some unknown proportion of contributions from each site. This work outlines a modeling approach based on Markov Chain Monte Carlo (MCMC), which obviates the need for intensive simulations using the density matrix formalism. In support of this technique, a spikelet pattern is produced using the density matrix formalism for an ensemble of spin 1/2 nuclei, and the underlying chemical shifts and intensities are reproduced using the method outlined. Finally, MCMC is used to model the CPMG spectrum of a (3,3,3-trifluoropropyl)dimethylchlorosilane (TFS) treated aluminosilicate, providing evidence in support of a particular model of silanol group surface attachment to the bulk.
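Not the paper's model, but the MCMC machinery it rests on fits in a few lines: a random-walk Metropolis sampler recovering a single unknown mixing weight between two "site" intensity patterns (all names and numbers below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "spikelet" intensities: two sites with known patterns but an
# unknown mixing weight w, which the sampler is asked to recover.
site_a = np.array([1.0, 0.8, 0.5, 0.2, 0.1])
site_b = np.array([0.1, 0.3, 0.6, 0.9, 1.0])
w_true = 0.7
data = w_true * site_a + (1 - w_true) * site_b + 0.01 * rng.standard_normal(5)

def log_post(w):
    if not 0.0 <= w <= 1.0:          # flat prior on [0, 1]
        return -np.inf
    resid = data - (w * site_a + (1 - w) * site_b)
    return -0.5 * np.sum(resid**2) / 0.01**2   # Gaussian likelihood

# random-walk Metropolis
w, samples = 0.5, []
lp = log_post(w)
for _ in range(20000):
    w_new = w + 0.05 * rng.standard_normal()   # symmetric proposal
    lp_new = log_post(w_new)
    if np.log(rng.random()) < lp_new - lp:     # accept/reject
        w, lp = w_new, lp_new
    samples.append(w)

print(np.mean(samples[5000:]))   # posterior mean, close to w_true
```

The appeal is that each step only needs a cheap forward evaluation of the model, rather than a full density matrix simulation.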

CPMG simulation/modeling

A script for simulating the CPMG sequence using the density matrix, with Gaussian envelope and spikelet broadening, for a forthcoming paper.

function [tot, gbb, t] = d_cpmg(N, tau, n, delta, r, rr)

%cpmg experiment evolution wjb 02/09
%simple/ideal pulse sequence:

%N     time steps per echo
%n     180 loops
%delta chem shift
%r     envelope gauss br^2
%rr    spikelet gauss br^2

%matrices for I_x & I_y
a = [0 1/2; 1/2 0]; b = [0 -1i/2; 1i/2 0];

%time step & initial rho
t = tau/(N-1); rho = a;

for k = 1:n
  sig(1) = trace(rho*a); sigi(1) = trace(rho*b);

  %iterate; free precession for tau/2
  for j = 2:N/2
    rho = [exp(-1i*t*delta) 0; 0 exp(1i*t*delta)]*rho*[exp(1i*t*delta) 0; 0 exp(-1i*t*delta)];
    sig(j) = trace(rho*a); sigi(j) = trace(rho*b);
  end

  %apply 180x
  rho = [0 exp(1i*pi/2); exp(1i*pi/2) 0]*rho*[0 exp(-1i*pi/2); exp(-1i*pi/2) 0];

  sig(N/2+1) = trace(rho*a); sigi(N/2+1) = trace(rho*b);

  %iterate; free precession for tau/2
  for j = 2:N/2
    rho = [exp(-1i*t*delta) 0; 0 exp(1i*t*delta)]*rho*[exp(1i*t*delta) 0; 0 exp(-1i*t*delta)];
    sig(j+N/2) = trace(rho*a); sigi(j+N/2) = trace(rho*b);
  end

  %accumulate the echo train
  if k > 1
    tot = [tot (sig + 1i*sigi)];
  else
    tot = sig + 1i*sigi;
  end
end

%spikelet (gauss) broadening: one period, circularly shifted, tiled per echo
tt = -tau/2:t:t*(N-1); gb = exp(-rr.*tt.^2); gb = [gb(N/2+1:N) gb(1:N/2)]; gbb = gb;

for m = 1:n-1
  gbb = [gbb gb];
end

%overall envelope (gauss) broadening across the whole train
t = 0:t:t*(n*N-1); gbb = gbb.*exp(-r.*t.^2);


Quick tour through Hilbert space

Before part III of forecasting with RSS+SVM+wavelets, I thought it would help to give some useful concepts from Hilbert space. Attached is a very rough look, and also an application using orthogonal functions to model a stationary signal (Fourier series). This will contrast nicely with wavelets, which are most useful for non-stationary signals, e.g., stock indices.
hilbert space overview
Fourier series example
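As a taste of the Fourier series example, a minimal Python version: project a square wave onto the orthogonal sin/cos basis and watch the L2 error fall as terms are added (completeness of the basis at work):

```python
import numpy as np

# sample a periodic, "stationary" signal on [0, 2*pi)
N = 1024
x = np.linspace(0.0, 2 * np.pi, N, endpoint=False)
f = np.sign(np.sin(x))            # square wave

def fourier_series(f, x, n_terms):
    """Partial Fourier sum via inner products with the orthogonal basis."""
    a0 = np.mean(f)
    approx = np.full_like(x, a0)
    for k in range(1, n_terms + 1):
        ak = 2 * np.mean(f * np.cos(k * x))   # projection coefficients
        bk = 2 * np.mean(f * np.sin(k * x))
        approx += ak * np.cos(k * x) + bk * np.sin(k * x)
    return approx

# more terms -> smaller L2 (root-mean-square) error
err = [np.sqrt(np.mean((f - fourier_series(f, x, n)) ** 2)) for n in (1, 5, 25)]
print(err)
```

The square wave is a deliberately hard case (slow 1/k coefficient decay, Gibbs ringing at the jumps); wavelets will handle localized, non-stationary features far more economically.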

Forecasting w/ RSS feeds + SVM + Wavelets

You know the words, but probably haven’t seen them in the same sentence. Simply put, I’m going to wax lyrical about the causal relationship between news and certain types of stock index, illustrated thus:
There are lots of assumptions here, including the efficiency and transparency of the market, and the assumption that the investor doesn’t have inside knowledge. I also assume that the investor is informed only this way; he/she has no conception of real intrinsic value until such time as he/she is informed via company reports and the like. So a good candidate index would be a tech stock, where the actual commodity might be ambiguous and the index is heavily manipulated by opinion over real worth. To summarize, the figure relates to a public company whose index is a strong function of investor feedback from news. We would like to exploit this fact, for the class of companies for which this might be true, by using machine learning to parse news and generate a signal which is some function of the company’s index. We are relying on the power of the written word to influence events far into the future.

For example, consider a fictitious tech consultancy (offering the somewhat ambiguous commodity of ‘useful information’) which features regularly in your favorite tech RSS feed. Let’s naïvely assign a value of +1 to a positive word, -1 to a negative word. You may read at alternate times:

That’s preposterous, what they offer is useful
That’s useful, what they offer is preposterous

Both statements convey positive and pejorative messages directed at different parties, yet are composed of the same words. So to classify a statement using machine learning on your PS3 Yellow Dog cluster, we must come up with both a useful dictionary and a means to encode context, even before assigning value to words. A naïve assignment of value to words gives each statement a sum total of 0, even though they convey drastically different opinions. Further, frequency can be useful; consider:
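To see the problem concretely, here's a toy bag-of-words scorer (a deliberately naïve illustration, not a proposal):

```python
# naive word-value scoring: both sentences sum to zero, though their
# sentiments are opposite -- word order (context) is lost entirely
values = {"preposterous": -1, "useful": +1}

def naive_score(sentence):
    words = sentence.lower().replace(",", "").split()
    return sum(values.get(w, 0) for w in words)

s1 = "That's preposterous, what they offer is useful"
s2 = "That's useful, what they offer is preposterous"
print(naive_score(s1), naive_score(s2))  # 0 0
```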

That’s preposterous, what they offer is very, very useful

Obviously ‘very’ is used to provide emphasis and therefore frequency of words has bearing also. Consider finally the statement:

CEO Joe Blo will have neurosurgery on June 11

Since the company is basically selling information, the CEO’s power to provide information is going to change drastically after 06/11. Thus there is some uncertainty, and we would expect sentiment to sway between negative and positive in the interim, i.e., words may take a ‘value’ in the continuum between +/- 1.

Finally, some measure of the reliability or impact of the news source proves helpful in weighting this source against others. This ‘impact factor’ could be simply determined from website volume, or from something more complicated like a Markov Chain ranking algorithm. To summarize then, in order to classify text and generate ‘signal’ from a RSS feed, at the very least we require:

  • a dictionary
  • a means to encode word:
    • value
    • context
    • frequency
  • impact of the news source
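The Markov Chain ranking mentioned for the impact factor could be as simple as power iteration on a damped link matrix, PageRank-style. The four sources and their link counts below are entirely hypothetical:

```python
import numpy as np

# hypothetical link counts among four news sources: links[i, j] = 1 means
# source i links to source j; impact = stationary distribution of the
# random-surfer Markov chain (damping 0.85)
links = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 1],
                  [0, 0, 0, 1],
                  [0, 1, 0, 0]], dtype=float)

n = links.shape[0]
P = links / links.sum(axis=1, keepdims=True)   # row-stochastic transitions
G = 0.85 * P + 0.15 / n                        # damped "Google matrix"

rank = np.full(n, 1.0 / n)
for _ in range(100):                           # power iteration
    rank = rank @ G
rank /= rank.sum()
print(rank)   # one impact weight per source
```

Website volume would be the cheap alternative; the Markov chain version rewards being linked to by sources that are themselves well linked.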

How we classify the very large amount of information produced after this manner to produce useful signal is a trickier topic requiring a little geometry and perhaps Bayes. To get the creative juices bubbling, here’s a couple of lines of code to play with in bash:

# date | cut -c12-19 > foo.txt

# curl --silent '…' | awk '{for (i=1;i<=NF;i++) { if ($i=="is") {print NR,i}}}' >> foo.txt

Next time I’ll provide a more rigorous example and show how we can produce useful signal from an ensemble of SVMs. (In the meantime you may want to check out SVM Light.) Last but not least, I’ll go over ideas/objects from Hilbert space, including completeness and wavelets, and how we can ultimately use some math with our signal for robust time series prediction.

ED: go easy with curl, you don’t want to come across as a robot and get your IP/domain banned 🙂


Here’s an extract from an old essay of mine, from way the heck back in 1995, submitted in PH237 at UQ, document link follows:

Bohr’s well documented opposition to Einstein’s corpuscular theory of light abated in light of the Bothe-Geiger experiments, in which the particle nature of radiative phenomena manifested itself. These results were in blatant contradiction with the Bohr-Kramers-Slater interpretation of the interaction between atomic systems. The B-K-S paper assumed that the radiative aspects of atomic transitions were solely describable in terms of the “wave picture”. It was this descriptive contrast which prompted Bohr to seek a harmonious relationship between the particle and wave natures of the radiative aspects of quantum interactions.