Parallel performance study of scientific codes using CUDA and OpenCL

Matthew A Nunes

04/05/2014

Supervised by David W Walker; Moderated by Xianfang Sun

This project aims to compare the performance of OpenCL and CUDA across a variety of algorithms. The algorithms to be implemented are:

1) The Matrix Multiplication algorithm
2) The Laplace equation
3) An image blurring algorithm
4) A molecular dynamics algorithm
5) A cellular automata algorithm

The main variable being measured and compared is the runtime. Data will be gathered by varying the problem size and language-specific parameters (such as chunk size and number of threads per block) to see how they affect the runtime. All experiments will be carried out on the same machine so that hardware differences do not skew the results. The data will be stored in a spreadsheet (such as Excel or OpenOffice Calc), and graphs will be generated to make trends in the data easy to assess.
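The measurement procedure described above can be sketched as a small parameter sweep. The following is only an illustrative outline in Python, not the project's actual harness: the function names (`tiled_matmul`, `time_run`, `sweep`, `to_csv`) are hypothetical, the CPU matrix multiplication is a stand-in for a CUDA or OpenCL kernel launch, and the `tile` parameter is assumed here as an analogue of a tunable such as threads per block. The CSV output is chosen so the results can be opened directly in a spreadsheet.

```python
import csv
import io
import time


def tiled_matmul(a, b, tile):
    """Square matrix multiply processed in tile x tile blocks.

    Stand-in for the kernel under test; `tile` plays the role of a
    language-specific tunable (an assumption made for illustration).
    """
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, n)):
                        aik = a[i][k]
                        for j in range(jj, min(jj + tile, n)):
                            c[i][j] += aik * b[k][j]
    return c


def time_run(fn, *args, repeats=3):
    """Return the best wall-clock time in seconds over several repeats."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


def sweep(sizes, tiles, repeats=3):
    """Time every (problem size, tile) combination on the same machine."""
    rows = []
    for n in sizes:
        a = [[float(i + j) for j in range(n)] for i in range(n)]
        b = [[float(i - j) for j in range(n)] for i in range(n)]
        for tile in tiles:
            runtime = time_run(tiled_matmul, a, b, tile, repeats=repeats)
            rows.append({"problem_size": n, "tile": tile, "runtime_s": runtime})
    return rows


def to_csv(rows):
    """Serialise the results as CSV text suitable for a spreadsheet."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["problem_size", "tile", "runtime_s"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(sweep(sizes=[16, 32, 64], tiles=[4, 8, 16])))
```

Taking the best of several repeats is one common way to reduce noise from other processes; averaging the repeats would be an equally reasonable choice. For a real GPU kernel, the timed region would also need to account for device synchronisation and, depending on what is being compared, host-to-device transfers.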

Time permitting, the algorithms will then be further optimised using techniques such as making use of shared memory. Furthermore, given that OpenCL is designed to run on devices from many different vendors, its performance for one of the algorithms could be analysed when running on a phone. Again, graphs will be plotted to visualise the effect these changes have on the runtime.

Initial Plan (03/02/2014) [Zip Archive]

1-Initial_Plan.pdf

Final Report (04/05/2014) [Zip Archive]
