Category: Ab Initio

qe vc-relax to poscar

We’re doing some comparisons between structural optimizations in vasp versus qe. Here’s a little script for converting intermediate structures in a qe vc-relax run to poscar format; I was able to open files in vmd and create simulations from output. You may need to modify the assumptions made with regards to structure format etc


#a little script to get all the structures from a qe file & write them to separate poscar files

if [ $# -lt 1 ]
	echo "need to supply input qe filename, with extension .out"

if [ "$ext" != "out" ]
	echo "filename must end in .out"

base=$(basename $1 .out)

echo $output

#number of atoms
atoms=$(more $file | awk '/number of atoms/ {print $5}')

#multiplier in angstrom
mult=$(echo "$(more $file | awk 'BEGIN{i=1} /CELL_PARAMETERS/{if (i==1) {print $0; i=2} }' \
| sed 's/)/\ )/g' | awk '{print $3}') * 0.5291772083" | bc)

#atomic symbols
declare -a symb
symb=$(more $file | awk '/PseudoPot. #/{print $5; }')

#all array objects in same string

#find the frequency of each
declare -a freq

for x in $symb
	freq[cnt]=$(more $file | awk 'BEGIN{i=1; j=0; } /site n./{ {while (i<='$atoms') {getline; {if ($2=="'$x'") j++;}; i++}}} END{print j;}')
	symbols+=$x" "
	let "cnt+=1";

#write out intermediate structures
awk "BEGIN{i=0;} /CELL_PARAMETERS/{{str=\"$base\"i\".poscar\"; print \"$symbols\",\"   #generated by qe2pos \",\"$dt\" > str; \
print \"$mult\" >> str;\
getline; print \$0 >> str;\
getline; print \$0 >> str;\
getline; print \$0 >> str;\
print \"${freq[*]}\" >> str;\
print \"Direct\" >> str;\
getline; getline;\
for (j=0; j<=$atoms; j++){\
getline; print \$2,\$3,\$4 >> str;}\
} \
i++;}" $file


Large Scale Lanczos

This is an important algorithm for finding eigenvalues of large matrices eg., for latent semantic indexing in search. I’m currently porting this to GPU for a colleague’s research into topological exitons in FQHE (utilizing diagonalization of the Hamiltonian); matrices involved have side N > 1e6. The Lanczos algorithm is relatively simple, although it does require restart/re-orthogonalization throughout. The implementation uses, among others, the sgemv routine from Intel MKL and/or cuBLAS, initial results promising; 3x speedup using GPU, over 8 CPU threads.

materials informatics

Attached is a recent poster from the snowbird machine learning conference, using support vector machines to learn the complex mapping between NMR spectra and underlying structure. By using machine learning to solve this inverse problem, the hope is of course to present spectra for new and interesting materials to the ML network in order to learn the underlying structure automatically without need for intensive ab initio calculations and experimental simulation.

APS March talk


Workers in various scientific disciplines seek to develop chemical models for extended and molecular systems. The modeling process revolves around the gradual refinement of model assumptions, through comparison of experimental and computational results. Solid state Nuclear Magnetic Resonance (NMR) is one such experimental technique, providing great insight into chemical order over Angstrom length scales. However, interpretation of spectra for complex materials is difficult, often requiring intensive simulations. Similarly, working forward from the model in order to produce experimental quantities via ab initio is computationally demanding. The work involved in these two significant steps, compounded by the need to iterate back and forth, drastically slows the discovery process for new materials. There is thus great motivation for the derivation of structural models directly from complex experimental data, the subject of this work. Using solid state NMR experimental datasets, in conjunction with ab initio calculations of measurable NMR parameters, a network of machine learning kernels are trained to rapidly yield structural details, on the basis of input NMR spectra. Results for an environmentally relevant material will be presented, and directions for future work.

aps talk

tidy profile data

here’s a bash/octave script for tidying up the aforementioned profiling output. The script pulls relevant meta data, collecting time and frequency for same call(s) and producing a neat, sorted *csv which can be used for generating nice graphs, here again an example from the lovely quantum espresso

 #cleanup and sort profile data 
gawk '($1~"meta"){print $0}' | \ 
sed 's/\ //g' | \ 
sed 's/meta//g' |\ 
sed 's/<//g'|\ 
sed 's/\///g'|\ 
sed 's/>//g'|\ 
tee prof_data_raw.txt |\ 
gawk 'BEGIN{FS=":"}{print NR,",",$1,",",$2$3}' > foo.txt 
gawk 'BEGIN{FS=":"}{print $4,",",$2,"#",$3,","}' < prof_data_raw.txt > names.txt 

cat > temp.m << %end 

load foo.txt; 

% collect the profile data for unique calls 
% the call column 3 is a unique cat of call line + file number 
[dat ind]=sort(foo(:,3));
 out = foo(ind,:); 
%determine total unique call keys (col 3) 
uniq = unique(out(:,3)); 
num = length(uniq); 

%output profile data matrix 
profl = zeros(num,4); 

%collect call frequency information 
for i=1:num
         ind = find(out(:,3)==uniq(i));
         %call frequency
         %total call time
         %average call time
         %line number
         ln = out(ind,1);
         profl(i,1) = ln(1); 

%sort by total time 
[dat ind]=sort(profl(:,3)); 
out = profl(ind,:); 
[a b]=fscanf(fid,'%s'); 
str = strsplit(a,","); 
fid = fopen('profile_output.csv','w');  
fprintf(fid,"call, hash, total time (ms), average time (ms), frequency\n"); 

for i=1:num
         index = out(i,1);


octave --silent --eval temp 

#rm temp.m prof_data_raw.txt foo.txt names.txt 

xml call tree

To be sure, there are many great profilers out there; each with different learning curves and costs ranging from free to a kidney. If you face a new distribution or one you would like to develop with, there is no substitute for rolling your own; follow this bash script, rebuild and filter the output and you have an xml call tree with times for calls, and pointers to source code and line number. This is applied to the fortran code(s) in quantum espresso, you can readily adapt to your own needs. Caveat: There are a variety of filter steps in the pipeline; taking into account fortran syntax and different programming styles is difficult ūüôā Once you have your stdout filtered to proper xml, you can view in something like eclipse eg,. attached image. Be aware also that you can and will write very large files; I exclude many i/o and related functions, these you can poke with strace or similar anyway.


#set source and destination paths

#set include/exclude strings; these i/o related function calls won't be timed later

#do once
#mkdir  -p $PROF_BAK_PATH

declare -a file_list

#filenames to array
file_list=$(ls -l ${PROFILE_BAK_PATH} | gawk '/f90/ {print $9}')


#parse files & pretty up, inserting write statements and timers
for x in $file_list
    let "cnt+=1"
    sed 's/\,\&/\,\ \&/g' $PROFILE_BAK_PATH/$x | \
    sed 's/)/)\ /g' | \
    gawk 'BEGIN{IGNORECASE=1; FS="&"} (NF<=1) || ($0~"#") || !($1~"if") || ($0~"!") || '$INCLUDE' {print $0}; \
    (NF>1) && !($0~"#") && ($1~"if") && !($0~"!") && '$EXCLUDE' {printf("%s ",$1)}' |\
    gawk 'BEGIN{IGNORECASE=1;} /if/ { if (($0~"&") && !($0~"!") &&!($0~"#") \
    && !($NF~"&") && !($0~"error")) {gsub(/&/," ");print} else print $0}; \
    !/if/ {print $0}' |\
    gawk 'BEGIN{IGNORECASE=1; } (($1~"if")||($1~"else")) && (length($0)>100) \
    {{for (i=1;i<=NF;i++) printf("%s",$i);} printf("\n")} !(($1~"if")||($1~"else")) || (length($0)<=100) {print $0}' |\
    gawk 'BEGIN{IGNORECASE=1; FS="&"} (NF==1) || ($0~"#") || !($0~"call") || '$INCLUDE' {print $0}; \
    (NF>1) && !($0~"#") && ($0 ~ "call") && '$EXCLUDE' {printf("%s ",$1)}' |\
    gawk 'BEGIN{IGNORECASE=1; FS="&"} (NF==1) || ($0~"#") || !($0~"call") || '$INCLUDE' {print $0}; \
    (NF>1) && !($0~"#") && ($0 ~ "call") && '$EXCLUDE' {printf("%s ",$1)}' |\
    gawk 'BEGIN{IGNORECASE=1;} !/call/{print $0}; /call/ {for (i=1; i<=NF; i++) printf("%s",$i); printf("\n")}' | \
    sed 's/call/\ call\ /g' | \
    sed 's/CALL/\ call\ /g' |\
    sed 's/(/\ (/g' | \
    gawk 'BEGIN{IGNORECASE=1; ch=1;} !/call/ || '$INCLUDE' {print $0}; \
    /call/ && '$EXCLUDE' {pos=pos+1; print " call date_and_time(values=time_array_0) "; \
    for (i=1; i<=NF; i++) {if ($i=="call") ch=i+1;} ;\
    print " start_time = time_array_0(6)*60 + time_array_0(7) + 0.001*time_array_0(8) "; \
    print " write(*,*)\" <",$ch,">\" "; print $0; \
    print " call date_and_time(values=time_array_1) "; \
    print " finish_time=time_array_1(6)*60 + time_array_1(7) + 0.001*time_array_1(8) "; \
    print " write(*,*)\" <meta> \",1000*(finish_time-start_time),\" : ",NR," : ",'$cnt'," : ",$ch," </meta>\" " ;\
    print " write(*,*)\" <source> ~/espresso-PRACE/PW/'$x' </source> \" " ;  \
    print " write(*,*)\"</",$ch,">\" " ; }' | \
    gawk 'BEGIN{IGNORECASE=1; } {print $0;} /IMPLICIT NONE/{ print "INTEGER :: time_array_0(8), time_array_1(8)"; \
    print "REAL :: start_time, finish_time"; }' |\
    gawk 'BEGIN{FS=","} ($0~"call" && (NF <=5)) || ($0 ~ "!") || !/call/{print $0} \
    $0~"call" && (NF>5) && !($0 ~ "!") {print $1,",",$2,",",$3,",",$4,",&"; for (i=5;i<NF;i++) \
    printf("%s,",$i); printf("%s \n",$NF)}' > $PROFILE_DIR_PATH/$x

#copy in hard cases; don't bother
cp $PROFILE_BAK_PATH/dynamics_module.f90 $PROFILE_DIR_PATH
cp $PROFILE_BAK_PATH/compute_becsum.f90 $PROFILE_DIR_PATH


CP2k install howto

There is the possibility that I’ll be using CP2k in the near future, so I’ve been getting better acquainted. By all accounts this is a distribution that makes you work for its love, which is fine b/c I’m not needy. It also represents a new domain for me, mixed Gaussian/plane waves and molecular dynamics, I look forward to what it can do. In the meantime, Lady and Gentleman, I give you 5 READMEs and 7 hours later the complete steps for a full install on a fresh Linux i686 OS
EDit 07/13: if you can, you should substitute AMD’s ACML or Intel’s MKL (depending on your machine of course) for the Lapack/ATLAS steps below. Back when I wrote this there was no academic version of MKL…

0.0 Get Coffee

1.0 Install g95:

# wget -O – | tar xvfz –

//check g95-install/bin for binary name, make symbolic link, eg.,

# ln -s /root/g95-install/bin/i686-unknown-linux-gnu-g95 /bin/g95

2.0 Install Lapack:

# wget -O – | tar xvfz –

# cd lap*

//cp & edit the file to use g95 (edit FORTRAN & LOADER lines)

# cp
# vim


# make lib

2.1 Lunch

3.0 Install & run ATLAS:

//get file from sourceforge & gunzip etc, cd into ATLAS,
//make a dir

# mkdir MyObj
# cd MyObj

//run configure specifying compiler and path to LAPACK eg.,

# /root/ATLAS/configure -C if /bin/g95 –with-netlib-lapack=/root/lapack-3.2/lapack_LINUX.a

//run all variants of make (will take quite a while ūüôā )

# make
# make check
# make ptcheck
# make time
# make install

3.1 Coffee

4. Install FFTW3:

# wget -O – | tar xvfz –
# cd fft*
# ./configure
# make
# make install

5. Install cp2k

//download eg., via cvs:

# cvs -d co cp2k

// cd into cp2k/arch, create/edit arch file, eg., for Linux 686 w/ g95, serial;

# cp Linux-i686-g95.sopt foo.sopt

// edit with vim, making sure order of libraries is lapack/blas/cblas/atlas:

LIBS = /usr/local/atlas/lib/liblapack.a /usr/local/atlas/lib/libf77blas.a /
usr/local/atlas/lib/libcblas.a /usr/local/atlas/lib/libatlas.a

// cd into cp2k/makefiles,

# make ARCH=foo VERSION=sopt
# make install

// if build succesful, should have executable in ~/cp2k/exe/foo, make link eg.,

# ln -s /root/cp2k/exe/foo/cp2k.sopt /bin/cp2k

5.1 Dinner

5.2 Coffee

6. Watch tv for a bit

7. Edit & run ~cp2k/tools/do_regtest in future

Norm Conserving Pseudopotentials I

Excellent description by Eberhard Engel here

I’m in the process of trying to create a large number of NCPP’s for the calculation of magnetic properties, using GIPAW. The end goal is to try, in conjunction with machine learning and NMR, to do structure determination for complicated materials.

This is somewhat related to the work presented at M’soft eScience, with the slightly different goal of going from a large database of calculated values, and via clustering and comparison to NMR simulations, back-out chemical structures.