Category: GPGPU

SCSSC conference

I was privileged to speak at the Southern California Simulations in Science conference this week, held at UCSB in beautiful Santa Barbara. I had some great interactions and was really impressed by the level of work presented in the poster session. Overall, a very enjoyable day, and a big thanks to Paul, Fuz and Burak for the invitation! (my talk)

Nvidia Jetson

Finally splashed out on the TK1 dev kit; color me impressed! The buyer is first greeted by a neat-looking, minimalist cardboard box containing the board, the power supply and a micro-USB cable for flashing the device. I also picked up a serial-to-USB cable for debugging, as well as a USB hub, from the friendly local Fry's.

The board booted rapidly; I then decided to install a more recent OS, following an excellent description (with links) that I found here at the time of writing. Flashing the device took less than 30 seconds, a new record I think. Shortly thereafter, I installed OpenCV optimized specifically for the Tegra, along with a few other things, via Nvidia JetPack. After installing a few more dependencies with sudo apt-get, I built and ran the plot2txt software on a key benchmark: an image with more than 2M pixels. All data series were extracted in just under 2 seconds, only slightly slower than on an Intel i7. The high relative performance in this case has much to do with OpenCV, which can take advantage of the CUDA cores.

I tested a number of other benchmarks and libraries, including DGEMM by way of ATLAS for ARM, and measured at least 6 GFLOP/s in many cases. I also find the Jetson incredibly stable compared with several other dev kits and ARM offerings; it will happily run at full tilt all day long, barely breaking a sweat, courtesy of its superior design and generous fan.
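DGEMM rates like the 6 GFLOP/s above are conventionally computed from an operation count of 2mnk for an (m x k) by (k x n) product, divided by wall or CPU time. A minimal plain-C sketch of that bookkeeping follows, assuming a naive triple loop in place of ATLAS; dgemm_naive, dgemm_flops and dgemm_gflops are hypothetical names, not the ATLAS or CBLAS API.

```c
#include <stddef.h>
#include <time.h>

/* Naive reference DGEMM (row-major): C = alpha*A*B + beta*C, with A m x k,
   B k x n, C m x n. Illustration only; ATLAS or any tuned BLAS is far faster. */
void dgemm_naive(size_t m, size_t n, size_t k, double alpha,
                 const double *A, const double *B, double beta, double *C)
{
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double acc = 0.0;
            for (size_t p = 0; p < k; ++p)
                acc += A[i * k + p] * B[p * n + j];
            /* BLAS convention: when beta == 0, C is not read on input. */
            C[i * n + j] = (beta == 0.0) ? alpha * acc
                                         : alpha * acc + beta * C[i * n + j];
        }
    }
}

/* Conventional DGEMM operation count: one multiply and one add per term of
   each of the m*n inner products (the alpha/beta scaling is ignored). */
double dgemm_flops(size_t m, size_t n, size_t k)
{
    return 2.0 * (double)m * (double)n * (double)k;
}

/* Time one C = A*B call and return the achieved rate in GFLOP/s.
   clock() measures CPU time, adequate for this single-threaded loop. */
double dgemm_gflops(size_t m, size_t n, size_t k,
                    const double *A, const double *B, double *C)
{
    clock_t t0 = clock();
    dgemm_naive(m, n, k, 1.0, A, B, 0.0, C);
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
    return secs > 0.0 ? dgemm_flops(m, n, k) / secs / 1e9 : 0.0;
}
```

Timing a few hundred by few hundred product this way on the Jetson's ARM cores gives a baseline; expect the naive loop to land well below what a tuned library such as ATLAS reports with the same 2mnk count.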

For more, check out the embedded computing section of the Nvidia Developer Zone; you most definitely will not be disappointed!

new GPU book

Numerical Computations with GPUs comes out later this year; Pierre-Yves and I were able to contribute a chapter on LU and QR decomposition (the latter using Givens rotations) for batches of dense matrices. We saw some impressive performance improvements for specific problem sizes. QR will particularly benefit from CUDA 6 and the availability of the fast, safe reciprocal hypotenuse function rhypot(x,y); more details here.

nvidia gtc 2014

GTC was a superlative conference once again this year in San Jose. Jen-Hsun Huang never fails to deliver in his rousing keynotes, this year announcing the exciting prospect of NVLink and Pascal, with superior I/O bandwidth, coming down the pipeline for IBM POWER systems. Special bonus this year: everyone received the terrific new SHIELD Android gaming device, already put to good use by this aging nerd. There were too many great talks to mention; Pierre-Yves did a particularly fine job presenting a new GPU approach for batch QR decomposition, as well as his MD work, and I was able to report on our year as a CUDA Research Center. It was wonderful to meet new friends and old, with great conversation over dinner, at BoFs and in frequent collisions in the hallways (for a large convention center, it seems awfully small at times). Officially my favorite conference; thank you Nvidia!

gpu fsolve

We are working with a colleague at PSU to scale a chemical kinetic model for metabolic networks from hundreds to, hopefully, thousands of equations. We have obtained good results for the first step, porting the model solution to the GPU, which essentially means making Newton-Raphson work on what amounts to a sparse matrix equation. More details are in the fsolve folder at lion-codes.

lion codes

My friend and colleague Pierre-Yves and I are gradually releasing our GPU codes here. He has done particularly fine work on a new implementation of QR decomposition. We also see good results for other algorithms on the GPU, including a Newton-Raphson-based approach for solving large, sparse nonlinear systems of equations.

find the fpe

Some macros and code snippets for seeking and destroying floating-point exceptions (FPEs) in your CUDA code:

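#include <cstdio>   // device-side printf
// isnan/isinf come from the CUDA math API, pulled in automatically by nvcc

#ifndef _FPE_DEBUG_INFO_
#define _FPE_DEBUG_INFO_ printf("fpe detected in %s around line %d, dump follows:\n", __FILE__, __LINE__);
#endif

// true if x is NaN or +/-Inf
#define fpe(x) (isnan(x) || isinf(x))

// hypothetical example kernel: place the check right after a suspect value is computed
__global__ void my_kernel(const float *data)
{
    float my_variable = data[threadIdx.x];
#ifdef _VERBOSE_DEBUG_
    if (fpe(my_variable)) {
        _FPE_DEBUG_INFO_
        printf("my_variable : %f threadIdx %u\n", my_variable, threadIdx.x);
        asm("trap;");   // abort the kernel; the host sees an error at the next sync
    }
#endif
}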
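CUDA device code does not set IEEE-754 exception flags or trap on them (the CUDA Programming Guide describes operations as behaving as if the exceptions were always masked), which is why explicit NaN/Inf checks like fpe() are needed. A complementary host-side sketch in plain C: after copying a suspect buffer back to the host, scan it for non-finite values. first_fpe and count_fpe are hypothetical helper names, not part of any CUDA API.

```c
#include <math.h>
#include <stddef.h>

/* Host-side companion to the fpe() macro: scan a buffer (e.g. one copied back
   from the device) and return the index of the first NaN or +/-Inf entry,
   or -1 if every entry is finite. */
long first_fpe(const double *a, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        if (isnan(a[i]) || isinf(a[i]))
            return (long)i;
    return -1;
}

/* Count the non-finite entries, e.g. for a one-line summary after a kernel. */
size_t count_fpe(const double *a, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i)
        if (!isfinite(a[i]))
            ++count;
    return count;
}
```

The index returned by first_fpe tells you which thread or element to instrument with the device-side macro; note that compiling with fast-math style flags can make isnan and isinf unreliable.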