The GPU backend implements two important abstract classes:
org.apache.sysml.runtime.controlprogram.context.GPUContext
org.apache.sysml.runtime.controlprogram.context.GPUObject
The GPUContext is responsible for GPU memory management and for the initialization and destruction of CUDA handles. Currently, an active instance of the GPUContext class is made available globally and is used to store handles of the allocated blocks on the GPU. A count is kept per block for the number of instructions that need it. When the count is 0, the block may be evicted on a call to GPUObject.evict().
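The per-block counting described above can be sketched as follows. This is a minimal illustration, not SystemML's actual code: the class BlockRefCounter, its method names, and the use of string block names are all hypothetical, and no real GPU memory is touched.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: tracks how many instructions still need each GPU block. */
class BlockRefCounter {
    // Block name -> number of instructions that still need the block.
    private final Map<String, Integer> counts = new HashMap<>();

    /** An instruction announces that it needs the block. */
    void acquire(String block) {
        counts.merge(block, 1, Integer::sum);
    }

    /** An instruction signals that it no longer needs the block. */
    void release(String block) {
        Integer c = counts.get(block);
        if (c == null || c == 0) {
            throw new IllegalStateException("release without acquire: " + block);
        }
        counts.put(block, c - 1);
    }

    /** A tracked block may be evicted only when its count has dropped to 0. */
    boolean isEvictable(String block) {
        Integer c = counts.get(block);
        return c != null && c == 0;
    }

    /** Drops the bookkeeping entry if the block is evictable; returns true if it was evicted. */
    boolean evict(String block) {
        if (!isEvictable(block)) {
            return false;
        }
        counts.remove(block);
        return true;
    }
}
```

In this sketch a block that two instructions have acquired stays resident until both have released it; only then does evict succeed.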
A GPUObject (like RDDObject and BroadcastObject) is stored in a CacheableData object. It receives callbacks from SystemML's buffer pool through a set of callback methods.
Sparse matrices on GPU are represented in CSR format. In the SystemML runtime, they are represented in MCSR or modified CSR format. A conversion cost is incurred when sparse matrices are sent back and forth between host and device memory.
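To make the conversion cost concrete, here is a minimal sketch of packing a per-row sparse layout into CSR. It assumes that the MCSR-like host layout keeps one column-index array and one value array per row; the class CsrMatrix and the method fromRows are hypothetical names and not part of SystemML.

```java
/** Hypothetical sketch: CSR arrays built from per-row sparse arrays (an MCSR-like layout). */
class CsrMatrix {
    final int[] rowPtr;    // length rows + 1; row i occupies [rowPtr[i], rowPtr[i + 1])
    final int[] colInd;    // column index of each non-zero
    final double[] values; // value of each non-zero

    CsrMatrix(int[] rowPtr, int[] colInd, double[] values) {
        this.rowPtr = rowPtr;
        this.colInd = colInd;
        this.values = values;
    }

    /** Packs per-row arrays into three contiguous CSR arrays; a null row is treated as empty. */
    static CsrMatrix fromRows(int[][] rowCols, double[][] rowVals) {
        int rows = rowCols.length;
        int[] rowPtr = new int[rows + 1];
        // First pass: prefix sum of the row lengths gives the row pointers.
        for (int i = 0; i < rows; i++) {
            int len = (rowCols[i] == null) ? 0 : rowCols[i].length;
            rowPtr[i + 1] = rowPtr[i] + len;
        }
        int nnz = rowPtr[rows];
        int[] colInd = new int[nnz];
        double[] values = new double[nnz];
        // Second pass: copy every row into its slot of the contiguous arrays.
        for (int i = 0; i < rows; i++) {
            int len = rowPtr[i + 1] - rowPtr[i];
            if (len > 0) {
                System.arraycopy(rowCols[i], 0, colInd, rowPtr[i], len);
                System.arraycopy(rowVals[i], 0, values, rowPtr[i], len);
            }
        }
        return new CsrMatrix(rowPtr, colInd, values);
    }
}
```

Both passes touch every row, and the second copies every non-zero, so in this sketch the packing work grows with the number of rows plus the number of non-zeros. The three contiguous arrays are what would then be copied to the device.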
Concrete classes JCudaContext and JCudaObject (which extend GPUContext & GPUObject respectively) contain references to org.jcuda.*.
The LibMatrixCUDA class contains methods to invoke CUDA libraries (where available) and invoke custom kernels. Runtime classes (that extend GPUInstruction) redirect calls to functions in this class. Some functions in LibMatrixCUDA need finer control over GPU memory management primitives. These are provided by JCudaObject.
Follow the instructions at https://developer.nvidia.com/cuda-downloads and install CUDA 8.0.
Follow the instructions at https://developer.nvidia.com/cudnn and install cuDNN v5.1.
To use SystemML's GPU backend when using the jar or uber-jar, pass the -gpu flag.
For example, to use the GPU backend in standalone mode:
java -classpath $JAR_PATH:systemml-1.0.0-SNAPSHOT-standalone.jar org.apache.sysml.api.DMLScript -f MyDML.dml -gpu -exec singlenode ...