Friday, October 26, 2012

CUDA notes 1

At work I am getting the chance to experiment with CUDA to speed up some of our computationally expensive tasks. It is both an exciting and an intimidating prospect, as we have already invested in a Tesla C2075 board for these tests. While I have played with CUDA in the past, there is still a lot to pick up: there have been significant changes since then, and I was unable to delve too deeply into it at the time.

To get up to speed quickly I am reading through "CUDA Application Design and Development". One thing that is not covered in the first chapter, and that I am very interested in, is how to time the execution of a kernel. I did find some info on Stack Overflow about CUDA events (cudaEvent_t). Here is a simple example:

#include <iostream>
using namespace std;
#include <thrust/reduce.h>
#include <thrust/sequence.h>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>

int main(){
 const int N=50000;
 float elapsedTime=0.0f;
 cudaEvent_t start, stop;
 cudaEventCreate(&start);
 cudaEventCreate(&stop);

 cudaEventRecord(start, 0);//start event timer running
 thrust::device_vector<int> a(N);//note: the timed region includes the device allocation
 thrust::sequence(a.begin(), a.end(), 0);//fill with 0,1,...,N-1
 int sumA=thrust::reduce(a.begin(), a.end(), 0);//sum on the device
 cudaEventRecord(stop, 0);
 cudaEventSynchronize(stop);//remember kernels run asynchronously
 cudaEventElapsedTime(&elapsedTime, start, stop);//time between the two events, in milliseconds

 //compute the same sum on the host to check the result
 int sumCheck=0;
 for (int i=0; i<N; i++)sumCheck+=i;

 if (sumCheck==sumA)cout<<"Test Succeeded in "<<elapsedTime<<" milliseconds!"<<endl;
 else cout<<"Test FAILED"<<endl;

 cudaEventDestroy(start);
 cudaEventDestroy(stop);
 return(0);
}
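
For comparison, it can help to time the same sum on the host. The following is a minimal sketch, assuming plain C++11 with std::chrono rather than CUDA events; the names sumSequence and timeMilliseconds are made up for this illustration and are not part of CUDA or Thrust. It measures host wall-clock time only, so it is a CPU baseline and not a substitute for the cudaEvent timing above.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: host-side sum of 0..n-1, mirroring the sumCheck loop.
long long sumSequence(int n) {
    assert(n >= 0);
    long long sum = 0;
    for (int i = 0; i < n; ++i) sum += i;
    return sum;
}

// Hypothetical helper: runs f once and returns the elapsed host time
// in milliseconds, using a monotonic clock.
template <typename F>
double timeMilliseconds(F f) {
    auto begin = std::chrono::steady_clock::now();
    f();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - begin).count();
}
```

Wrapping the call in a lambda, for example timeMilliseconds([&]{ result = sumSequence(50000); }), gives a host-side number to set next to the GPU time. An optimizing compiler may simplify such a small loop, so treat the figure as a rough baseline only.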




Saturday, October 20, 2012

Making Things See, pt. 1

I have recently been reading the book Making Things See, which is a tour of Kinect programming using the Processing language. I have played around a little with Processing in the past, both for some artistic programming and because I do a lot of programming in a similar project, Arduino.

Processing itself is a language that runs on the JVM, with a subset that can run in the browser using Processing.js. It is a pretty minimalist language, which makes it easy to 'sketch' programs out; 'sketch' is in fact what Processing calls a program. I find it very nice to work in because, like Arduino, you have two main pieces: a setup method and, in this case, a draw method, which can be thought of as a game loop (Arduino has loop instead of draw). This seems like a very convenient language for simple visual code, and even complex OpenGL code is possible with it. Because of the simple and clear syntax, I think the author has made a fine choice of language for exploring the Kinect, as this is going to be a very visual journey.
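
The setup/draw structure can be sketched in plain C++ (the language Arduino sketches are written in). This is only an illustration of the control flow, under the assumption that the runtime calls setup once and then draw repeatedly; the Sketch struct and runSketch function are hypothetical names, not part of Processing or Arduino, and the real runtimes keep looping until the program exits rather than for a fixed number of frames.

```cpp
#include <cassert>

// Hypothetical stand-in for a sketch: some state plus the two entry points.
struct Sketch {
    int setupCalls = 0;  // how many times setup() ran
    int frameCount = 0;  // how many times draw() ran
    int x = 0;           // example state updated each frame

    // One-time initialisation, like setup() in Processing or Arduino.
    void setup() {
        ++setupCalls;
        x = 0;
    }

    // Per-frame work, like draw() in Processing or loop() in Arduino.
    void draw() {
        ++frameCount;
        x += 2;  // e.g. move something two pixels per frame
    }
};

// Hypothetical driver: a bounded version of the loop the runtime provides.
void runSketch(Sketch& sketch, int frames) {
    assert(frames >= 0);
    sketch.setup();
    for (int i = 0; i < frames; ++i) sketch.draw();
}
```

Running runSketch on a fresh Sketch for three frames calls setup once and draw three times, which is the same split between one-time initialisation and per-frame work that a game loop has.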

The first few projects introduce the basics of the Kinect: how to install the libraries and drivers needed for later projects, the physical hardware, and the underlying principles behind it. The first few coding projects will feel very simple to those with significant development backgrounds, especially the initial in-depth explanations of the listings. However, I still find it very exciting to get such cool results from interfacing with the Kinect so easily.

So far I have gotten up to project 7. This is the first project where I feel any truly experienced developer will want to rewrite the code significantly. The author uses a procedural style, which makes it very easy to follow and to focus on the Kinect/computer vision concepts. However, there is a significant amount of duplicate code here, and I found that an object made short work of cleaning it up.

Wednesday, October 3, 2012

GPGPU thoughts

Lately I have been looking into GPGPU computing for work, and I keep running across interesting things. I have been interested in GPGPU computing for a while now, and have played around with CUDA some in the past. I think that it is a great platform, but to me the future really lies with heterogeneous solutions like OpenCL that can bridge between the GPU and CPU more easily. One big issue that I currently see with these solutions, however, is that they all use a C-like syntax. While in many ways this makes sense, as the field is still in its early years of evolution, I think eventually we will need to develop languages that are more high level yet still give us the flexibility and power to work on the GPU. I ran across MC# today, which is a small step in the right direction. It brings a C#-like syntax to heterogeneous computing, instead of the C-like syntax of OpenCL. I would honestly prefer a more Python-like syntax, as I think there is great power in simple expressiveness. In many ways this is critical on the GPU, since the problems solved there are likely more algorithm heavy than, say, those on a web server. It would be great to express the essence of those algorithms more easily.