I’ve been experimenting with CUDA and Octave; at least one company has produced GPU-enabled MEX functions. The big difficulty, of course, is that Octave has no internal support for single-precision floats (as far as I know), and the same is true of MATLAB. However, if you can move the data to the device and work with it there for some time, then only two explicit conversions between float and double are needed. Alternatively, you can sacrifice some performance in CUDA in return for using doubles. At any rate, here’s an example Makefile; happy experimenting. For this example I gutted the matrixMul example from the CUDA SDK; the wrapper .oct source (a .cc file, really C++ with Octave extensions) contains an extern "C" section which references the CUDA kernel (.cu). Don’t forget to indent the instructions under each target (e.g. ‘all’) with a single tab, as make requires.
ED: here are some more recent CUDA notes: HPC Essentials IV

#! /usr/bin/env make
# make file for octfile/cuda
# Mac OS X 10.5.8, Intel Core 2 Duo

# cuda include/lib (adjust paths to your installation)
CUDA_INC_PATH = /usr/local/cuda/include
CUDA_LIB_PATH = /usr/local/cuda/lib

# octfile compiler
CC = mkoctfile

# basic flags
CFLAGS  = -I$(CUDA_INC_PATH)
LDFLAGS = -L$(CUDA_LIB_PATH) -lcudart -lcuda

all: cudaMatrixMul.oct

cudaMatrixMul.oct: cudaMatrixMul.cc matMul_kernel.cu
	$(CC) $(CFLAGS) -c cudaMatrixMul.cc -o cudaMatrixMul.o
	nvcc -c matMul_kernel.cu -o matMul_kernel.o
	$(CC) $(LDFLAGS) cudaMatrixMul.o matMul_kernel.o -o cudaMatrixMul.oct

clean:
	rm -f cudaMatrixMul.o matMul_kernel.o



  1. bbrouwer

    This is very cool, thank you for letting me know! And I was interested to read that floats are supported as of 3.2; I didn’t realize. Although the arrival of Fermi will offer better double-precision support, I think we’ll see many more people make the switch to GPGPU.

    I’m in the process of writing seismic imaging code which is OCTAVE + CUDA, the real issue seems to be the difference in licensing, have you had any problems in this regard?

    I’d like to read more of your work too, will you be posting?

  2. yonesur

    Hi Bill, I guess you are writing your own functions in CUDA and compiling them to make them work in Octave. That’s a lot of work! I’ve been working exclusively within Octave for the last year, so I wanted to stay away from C programming and interfacing issues, at least for now.

    I’m working on code to restore astronomical (solar, in particular) images, and I had to calculate ~40 orders of the Zernike aberration functions over an array of ~100×100, 10⁵ or 10⁶ times. Therefore I implemented that function in CUDA. Later I discovered I was being naive in my implementation, because I could calculate a related quantity just once and reuse it in every iteration without too much overhead 8-/

    Then, I stopped my CUDA development “until further notice”.

    However, I want to tell you that I used the GPULib libraries.
    For non-commercial use, they are free and they supply the *source code* for the MATLAB interfacing functions. They have a binary blob, which does all the hard work in CUDA, but that library is exposed through mostly-compatible Octave .m functions. I managed to compile and install it in Octave, after some tweaks. I discovered they use a function that is missing from Octave, but I rewrote that code in simpler terms and had the library working under Octave. My CUDA code was based on that library.

    They just released version 1.2.2; I don’t know if they fixed the problems I had to work around for Octave. If you want to give it a try, let me know if you need advice. They told me explicitly not to post those instructions on a public website, but I can email them to you.


  3. bbrouwer

    Awesome, I will check out the libraries; that would certainly remove most of the overhead. As you say, hand-coding everything is intensive. Life is too short 🙂

    I have done a little image analysis work; from memory there was no really fast way of calculating Zernike moments, so I appreciate that you have a tough problem there!

    I was using moments for machine learning and optical character recognition, although that has also been delayed until further notice 🙂 I will email you; I’d love to read more about your work.


  4. Carlos

    I am finishing college and my thesis, which focuses on the reconstruction of astronomical images using Zernike polynomials.
    I am very interested in your experience with CUDA.
    Was it good? Or why did you leave it?
    Thanks in advance


    • bbrouwer

      cool; definitely try CUDA, and think of it as a set of extensions to C — it isn’t much worse 🙂 You will need to think in parallel, although as you can imagine, image processing tasks map very well: one pixel per thread. Start at the CUDA Zone and the CUDA SDK; there are lots of examples and code to try. If you have specific questions, I’m happy to help!
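      The pixel-per-thread mapping mentioned above is just index arithmetic, so it can be sketched in plain host C++: the loop variables below stand in for CUDA’s blockIdx/threadIdx/blockDim, and the 16×16 block shape and the invert operation are illustrative assumptions, not anything from a real kernel.

```cpp
#include <cassert>
#include <vector>

// Host-side emulation of a one-pixel-per-thread CUDA image kernel.
// In real CUDA the four loops disappear: each (block, thread) pair is
// launched by the runtime and computes only its own pixel.
void invert_pixels(std::vector<unsigned char>& img, int width, int height) {
    const int blockDimX = 16, blockDimY = 16;                // typical block shape
    const int gridX = (width  + blockDimX - 1) / blockDimX;  // ceil-divide
    const int gridY = (height + blockDimY - 1) / blockDimY;
    for (int by = 0; by < gridY; ++by)
        for (int bx = 0; bx < gridX; ++bx)
            for (int ty = 0; ty < blockDimY; ++ty)
                for (int tx = 0; tx < blockDimX; ++tx) {
                    int x = bx * blockDimX + tx;  // same formula a kernel uses
                    int y = by * blockDimY + ty;
                    if (x < width && y < height)  // guard the ragged edge
                        img[y * width + x] =
                            255 - img[y * width + x];
                }
}
```

      The edge guard matters: the grid is rounded up to whole blocks, so threads past the image border must do nothing.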
