<!--
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to you under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
{% endcomment %}
-->
# Initial prototype for GPU backend
The GPU backend implements two important abstract classes:
1. `org.apache.sysml.runtime.controlprogram.context.GPUContext`
2. `org.apache.sysml.runtime.controlprogram.context.GPUObject`
The `GPUContext` is responsible for GPU memory management and for the initialization and destruction of CUDA handles.
Currently, an active instance of the `GPUContext` class is made available globally and is used to store handles
to the blocks allocated on the GPU. A count is kept per block for the number of instructions that need it.
When the count is 0, the block may be evicted on a call to `GPUObject.evict()`.
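
The per-block count described above can be illustrated with a minimal sketch. This is not the `GPUContext` implementation: the class `BlockRefCounter`, its method names, and the use of string block IDs are all hypothetical, chosen only to show how a count of pending instructions could gate eviction.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: names are illustrative and not part of the SystemDS API.
class BlockRefCounter {
    // Number of instructions that still need each block.
    private final Map<String, Integer> pending = new HashMap<>();

    // An instruction declares that it needs the block.
    void acquire(String blockId) {
        pending.merge(blockId, 1, Integer::sum);
    }

    // An instruction signals that it is done with the block.
    void release(String blockId) {
        Integer count = pending.get(blockId);
        if (count == null || count == 0) {
            throw new IllegalStateException("release without acquire: " + blockId);
        }
        pending.put(blockId, count - 1);
    }

    // A block may be evicted only when no instruction needs it.
    boolean isEvictable(String blockId) {
        return pending.getOrDefault(blockId, 0) == 0;
    }
}
```

In this sketch a block that was never acquired is treated as evictable, which is one possible reading of "when the count is 0".
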
A `GPUObject` (like `RDDObject` and `BroadcastObject`) is stored in a `CacheableData` object. It receives callbacks from SystemDS's buffer pool on the following methods:
1. `void acquireDeviceRead()`
2. `void acquireDeviceModifyDense()`
3. `void acquireDeviceModifySparse()`
4. `void acquireHostRead()`
5. `void acquireHostModify()`
6. `void releaseInput()`
7. `void releaseOutput()`
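
As a rough illustration of how these callbacks might pair up around a single GPU operation, the sketch below acquires an input for reading and an output for dense modification, runs some work, and then releases both. Only the four method names are taken from the list above. The `DeviceBlock` interface, `TracingBlock`, `CallbackOrderDemo`, and the ordering itself are assumptions made for illustration and may not match how the buffer pool actually drives a `GPUObject`.

```java
import java.util.List;

// Hypothetical stand-in that declares a subset of the callbacks listed above.
interface DeviceBlock {
    void acquireDeviceRead();
    void acquireDeviceModifyDense();
    void releaseInput();
    void releaseOutput();
}

// Records each callback in a shared trace so the order can be inspected.
class TracingBlock implements DeviceBlock {
    private final String name;
    private final List<String> trace;

    TracingBlock(String name, List<String> trace) {
        this.name = name;
        this.trace = trace;
    }

    @Override public void acquireDeviceRead() { trace.add(name + ".acquireDeviceRead"); }
    @Override public void acquireDeviceModifyDense() { trace.add(name + ".acquireDeviceModifyDense"); }
    @Override public void releaseInput() { trace.add(name + ".releaseInput"); }
    @Override public void releaseOutput() { trace.add(name + ".releaseOutput"); }
}

class CallbackOrderDemo {
    // Assumed pattern: acquire input and output, run the work, always release afterwards.
    static void runUnaryOp(DeviceBlock in, DeviceBlock out, Runnable kernel) {
        in.acquireDeviceRead();
        out.acquireDeviceModifyDense();
        try {
            kernel.run();
        } finally {
            in.releaseInput();
            out.releaseOutput();
        }
    }
}
```

Releasing in a `finally` block is a design choice of this sketch: it keeps the acquire and release calls balanced even if the work in between fails.
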
Sparse matrices on GPU are represented in `CSR` format. In the SystemDS runtime, they are represented in `MCSR` or modified `CSR` format.
A conversion cost is incurred when sparse matrices are sent back and forth between host and device memory.
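
To make the conversion cost concrete, here is a minimal sketch of packing a row-wise sparse layout into the three CSR arrays (row pointers, column indices, values). It assumes the host-side layout keeps a separate pair of column-index and value arrays per row, with `null` for empty rows. The class `CsrMatrix` and the method `fromRows` are hypothetical and are not SystemDS or JCuda classes.

```java
// Hypothetical sketch: class and method names are illustrative only.
class CsrMatrix {
    final int[] rowPtr;    // entries of row i live in [rowPtr[i], rowPtr[i + 1])
    final int[] colInd;    // column index of each stored value
    final double[] values; // the non-zero values, row by row

    CsrMatrix(int[] rowPtr, int[] colInd, double[] values) {
        this.rowPtr = rowPtr;
        this.colInd = colInd;
        this.values = values;
    }

    // Packs per-row arrays (null means an empty row) into contiguous CSR arrays.
    static CsrMatrix fromRows(int[][] rowCols, double[][] rowVals) {
        int nRows = rowCols.length;
        int[] rowPtr = new int[nRows + 1];
        // First pass: a prefix sum of row lengths gives the row pointers.
        for (int i = 0; i < nRows; i++) {
            int len = (rowCols[i] == null) ? 0 : rowCols[i].length;
            rowPtr[i + 1] = rowPtr[i] + len;
        }
        int nnz = rowPtr[nRows];
        int[] colInd = new int[nnz];
        double[] values = new double[nnz];
        // Second pass: copy each row into its slot.
        for (int i = 0; i < nRows; i++) {
            int len = rowPtr[i + 1] - rowPtr[i];
            if (len > 0) {
                System.arraycopy(rowCols[i], 0, colInd, rowPtr[i], len);
                System.arraycopy(rowVals[i], 0, values, rowPtr[i], len);
            }
        }
        return new CsrMatrix(rowPtr, colInd, values);
    }
}
```

The two passes over all rows, plus the copy of every non-zero, are the kind of work that a host-to-device transfer of a sparse matrix would have to do under these assumptions.
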
The concrete classes `JCudaContext` and `JCudaObject` (which extend `GPUContext` and `GPUObject`, respectively) contain references to `org.jcuda.*`.
The `LibMatrixCUDA` class contains methods to invoke CUDA libraries (where available) and invoke custom kernels.
Runtime classes (that extend `GPUInstruction`) redirect calls to functions in this class.
Some functions in `LibMatrixCUDA` need finer control over GPU memory management primitives. These are provided by `JCudaObject`.
### Setup instructions:
1. Follow the instructions from `https://developer.nvidia.com/cuda-downloads` and install CUDA 8.0.
2. Follow the instructions from `https://developer.nvidia.com/cudnn` and install cuDNN v5.1.
To use SystemDS's GPU backend with the jar or uber-jar:
1. Add JCuda's jar to the classpath.
2. Use the `-gpu` flag.
For example, to use the GPU backend in standalone mode:
```bash
java -classpath $JAR_PATH:systemml-1.0.0-SNAPSHOT-standalone.jar org.apache.sysml.api.DMLScript -f MyDML.dml -gpu -exec singlenode ...
```