new GPU book
Numerical Computations with GPUs comes out later this year; Pierre-Yves and I were able to contribute a chapter on LU &QR decomposition (the latter using Givens rotations) for batches of dense matrices. We saw some impressive performance improvements for specific problem sizes. QR will benefit particularly from CUDA 6 and the availability of the fast/safe reciprocal hypotenuse function rhypot(x,y), more details here .
Advertisements