Initial prototype for GPU backend

The GPU backend implements two important abstract classes:

org.apache.sysml.runtime.controlprogram.context.GPUContext
org.apache.sysml.runtime.controlprogram.context.GPUObject

The GPUContext is responsible for GPU memory management and for the initialization and destruction of CUDA handles. Currently, a single active instance of the GPUContext class is made available globally and stores handles to the blocks allocated on the GPU. For each block, a count is kept of the number of instructions that still need it. When the count is 0, the block may be evicted on a call to GPUObject.evict().
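The per-block counting described above can be illustrated with a minimal sketch. The class RefCountedBlockTable and its methods are hypothetical names chosen for illustration; they are not the GPUContext or GPUObject API, and no device memory is touched.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the bookkeeping described above;
// this is NOT the actual GPUContext API.
class RefCountedBlockTable {
    // Maps a block name to the number of instructions that still need it.
    private final Map<String, Integer> refCounts = new LinkedHashMap<>();

    // Register a block once it has been allocated on the device.
    void allocate(String block) {
        refCounts.putIfAbsent(block, 0);
    }

    // An instruction starts using the block (registers it if unknown).
    void acquire(String block) {
        refCounts.merge(block, 1, Integer::sum);
    }

    // An instruction is done with the block.
    void release(String block) {
        Integer count = refCounts.get(block);
        if (count == null || count == 0) {
            throw new IllegalStateException("release without matching acquire: " + block);
        }
        refCounts.put(block, count - 1);
    }

    // Evict every block whose count is 0 and return the evicted names.
    List<String> evict() {
        List<String> evicted = new ArrayList<>();
        refCounts.entrySet().removeIf(e -> {
            if (e.getValue() == 0) {
                evicted.add(e.getKey());
                return true;
            }
            return false;
        });
        return evicted;
    }

    // True while the block is still tracked as allocated.
    boolean isResident(String block) {
        return refCounts.containsKey(block);
    }
}
```

In this sketch a block that is still held by at least one instruction survives a call to evict(), and becomes eligible only after every acquire has been matched by a release.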

A GPUObject (like RDDObject and BroadcastObject) is stored in a CacheableData object. It receives callbacks from SystemML's buffer pool through the following methods:

void acquireDeviceRead()
void acquireDeviceModifyDense()
void acquireDeviceModifySparse()
void acquireHostRead()
void acquireHostModify()
void releaseInput()
void releaseOutput()

Sparse matrices on the GPU are represented in CSR format. In the SystemML runtime, they are represented in MCSR (modified CSR) format. A conversion cost is therefore incurred whenever sparse matrices are transferred between host and device memory.
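To make the conversion cost concrete, here is a minimal sketch of flattening a row-wise sparse layout into the three CSR arrays. It assumes that the MCSR-style input keeps a separate column-index array and value array per row, with null for an empty row. The class CsrMatrix and the method fromRows are hypothetical and are not part of SystemML or JCuda.

```java
// Hypothetical sketch: flattens per-row index/value arrays (an MCSR-like
// layout, assumed here) into contiguous CSR arrays.
class CsrMatrix {
    final int[] rowPtr;     // length numRows + 1; row r spans [rowPtr[r], rowPtr[r+1])
    final int[] colInd;     // column index of each non-zero
    final double[] values;  // value of each non-zero

    CsrMatrix(int[] rowPtr, int[] colInd, double[] values) {
        this.rowPtr = rowPtr;
        this.colInd = colInd;
        this.values = values;
    }

    // rowCols[r] and rowVals[r] hold the non-zeros of row r; null means an empty row.
    static CsrMatrix fromRows(int[][] rowCols, double[][] rowVals) {
        int numRows = rowCols.length;

        // First pass: prefix sum of row lengths gives the row pointers.
        int[] rowPtr = new int[numRows + 1];
        for (int r = 0; r < numRows; r++) {
            int len = (rowCols[r] == null) ? 0 : rowCols[r].length;
            rowPtr[r + 1] = rowPtr[r] + len;
        }

        // Second pass: copy each row into its slot in the flat arrays.
        int nnz = rowPtr[numRows];
        int[] colInd = new int[nnz];
        double[] values = new double[nnz];
        for (int r = 0; r < numRows; r++) {
            if (rowCols[r] == null) {
                continue;
            }
            System.arraycopy(rowCols[r], 0, colInd, rowPtr[r], rowCols[r].length);
            System.arraycopy(rowVals[r], 0, values, rowPtr[r], rowVals[r].length);
        }
        return new CsrMatrix(rowPtr, colInd, values);
    }
}
```

Both passes touch every row, and the second copies every non-zero, so the cost grows with the number of rows plus the number of non-zeros on each transfer in this sketch.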

The concrete classes JCudaContext and JCudaObject (which extend GPUContext and GPUObject respectively) contain references to org.jcuda.*.

The LibMatrixCUDA class contains methods to invoke CUDA libraries (where available) and invoke custom kernels. Runtime classes (that extend GPUInstruction) redirect calls to functions in this class. Some functions in LibMatrixCUDA need finer control over GPU memory management primitives. These are provided by JCudaObject.

Setup instructions:

Follow the instructions at https://developer.nvidia.com/cuda-downloads to install CUDA 8.0.
Follow the instructions at https://developer.nvidia.com/cudnn to install cuDNN v5.1.

To use SystemML's GPU backend with the jar or uber-jar:

Add JCuda's jar to the classpath.
Use the -gpu flag.

For example, to use the GPU backend in standalone mode:

java -classpath $JAR_PATH:systemml-1.0.0-SNAPSHOT-standalone.jar org.apache.sysml.api.DMLScript -f MyDML.dml -gpu -exec singlenode ...