<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
# Software Stack
SINGA's software stack includes three major components: core, IO and
model. Figure 1 illustrates these components together with the hardware.
The core component provides memory management and tensor operations;
the IO component has classes for reading (and writing) data from (to) disk and network; the
model component provides data structures and algorithms for machine learning models,
e.g., layers for neural network models, and optimizers, initializers, metrics and losses for
general machine learning models.
<img src="../_static/images/singav1-sw.png" align="center" width="500px"/>
<br/>
<span><strong>Figure 1 - SINGA V1 software stack.</strong></span>
## Core
[Tensor](tensor.html) and [Device](device.html) are the two core abstractions in SINGA. The Tensor class represents a
multi-dimensional array, which stores model variables and provides linear algebra
operations for machine learning
algorithms, including matrix multiplication and random functions. Each tensor
instance (i.e. a tensor) is allocated on a Device instance.
Each Device instance (i.e. a device) is created against one hardware device,
e.g. a GPU card or a CPU core. A device manages the memory of its tensors and executes
tensor operations on its execution units, e.g. CPU threads or CUDA streams.
Depending on the hardware and the programming language, SINGA has implemented
the following specific device classes:
* **CudaGPU** represents an Nvidia GPU card. The execution units are the CUDA streams.
* **CppCPU** represents a normal CPU. The execution units are the CPU threads.
* **OpenclGPU** represents a normal GPU card from either Nvidia or AMD.
The execution units are the CommandQueues. Given that OpenCL is compatible with
many hardware devices, e.g. FPGA and ARM, the OpenclGPU has the potential to be
extended for other devices.
Different types of devices use different programming languages to write the kernel
functions for tensor operations:
* CppMath (tensor_math_cpp.h) implements the tensor operations using C++ for CppCPU
* CudaMath (tensor_math_cuda.h) implements the tensor operations using CUDA for CudaGPU
* OpenclMath (tensor_math_opencl.h) implements the tensor operations using OpenCL for OpenclGPU
In addition, different types of data, such as float32 and float16, could be supported by adding
the corresponding tensor functions.
Typically, users create a device instance and pass it to the constructors of multiple
tensor instances. When users call the Tensor functions, these functions invoke
the corresponding implementation (CppMath/CudaMath/OpenclMath) automatically. In
other words, the implementation of Tensor operations is transparent to users.
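
The dispatch described above can be illustrated with a small, self-contained sketch. It is not SINGA's API: `ToyCpuMath`, `ToyDevice` and `ToyTensor` are hypothetical stand-ins written in plain Python, assuming a 1-D tensor and a single CPU-like backend. The point is only that the tensor asks its device for the math implementation rather than naming one itself.

```python
# Toy illustration only: these classes are hypothetical, not SINGA classes.

class ToyCpuMath:
    """Stand-in for a backend such as CppMath: plain Python loops."""

    @staticmethod
    def add(x, y):
        return [a + b for a, b in zip(x, y)]

    @staticmethod
    def dot(x, y):
        return sum(a * b for a, b in zip(x, y))


class ToyDevice:
    """Stand-in for a Device: allocates tensor storage and owns a math backend."""

    def __init__(self, name, math):
        self.name = name
        self.math = math

    def alloc(self, size):
        return [0.0] * size


class ToyTensor:
    """Stand-in for a Tensor: a flat 1-D array that lives on one device."""

    def __init__(self, device, values):
        self.device = device
        self.data = device.alloc(len(values))
        self.data[:] = values

    def __add__(self, other):
        # The tensor never names a backend; it uses whatever its device provides.
        out = self.device.math.add(self.data, other.data)
        return ToyTensor(self.device, out)

    def dot(self, other):
        return self.device.math.dot(self.data, other.data)


# One device is created and passed to several tensors.
cpu = ToyDevice("toy-cpu", ToyCpuMath)
a = ToyTensor(cpu, [1.0, 2.0, 3.0])
b = ToyTensor(cpu, [4.0, 5.0, 6.0])
c = a + b
```

User code only touches `ToyTensor`; swapping the backend would mean constructing the device with a different math class, with no change to the calls on `a`, `b` and `c`.
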
Most machine learning algorithms can be expressed using (dense or sparse) tensors.
Therefore, with the Tensor abstraction, SINGA can run a wide range of models,
including deep learning models and other traditional machine learning models.
The Tensor and Device abstractions are extensible to support a wide range of hardware devices
using different programming languages. A new hardware device can be supported by
adding a new Device subclass and the corresponding implementation of the Tensor
operations (xxxMath).
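
As a rough sketch of this extension pattern, the example below pairs each device subclass with its own math class. All names (`ToyMath`, `LoopMath`, `MapMath`, `LoopDevice`, `MapDevice`, `scale_on`) are hypothetical and written in plain Python; the real SINGA classes are implemented differently. The second device plays the role of the newly added hardware.

```python
# Toy illustration only: hypothetical names, not SINGA classes.

class ToyMath:
    """Contract that every hypothetical xxxMath backend must satisfy."""

    def scale(self, x, alpha):
        raise NotImplementedError


class LoopMath(ToyMath):
    """Existing backend: a list comprehension."""

    def scale(self, x, alpha):
        return [alpha * v for v in x]


class MapMath(ToyMath):
    """Backend for the 'new hardware': same contract, different implementation."""

    def scale(self, x, alpha):
        return list(map(lambda v: alpha * v, x))


class ToyDevice:
    """Base device: instantiates the math class named by the subclass."""

    math_cls = None

    def __init__(self):
        self.math = self.math_cls()


class LoopDevice(ToyDevice):
    math_cls = LoopMath


class MapDevice(ToyDevice):
    # Supporting the new device takes one subclass plus one math class.
    math_cls = MapMath


def scale_on(device, values, alpha):
    # Caller code is identical regardless of which device is passed in.
    return device.math.scale(values, alpha)


results = [scale_on(dev(), [1.0, 2.0], 3.0) for dev in (LoopDevice, MapDevice)]
```

Both devices return the same result, and `scale_on` needed no change to support the second one.
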
Optimizations in terms of speed and memory can be implemented by Device, which
manages both operation execution and memory allocation and deallocation. More optimization details
are described in the [Device page](device.html).
## Model
On top of the Tensor and Device abstractions, SINGA provides some higher level
classes for machine learning modules.
* [Layer](layer.html) and its subclasses are specific to neural networks. Every layer provides
functions for forward propagating features and backward propagating gradients w.r.t. the training loss functions.
They wrap the complex layer operations so that users can easily create neural nets
by connecting a set of layers.
* [Initializer](initializer.html) and its subclasses provide various methods for initializing
model parameters (stored in Tensor instances), e.g. sampling from Uniform or Gaussian distributions.
* [Loss](loss.html) and its subclasses define the training objective loss functions.
They implement both the function for computing the loss values and the function for computing the gradient of the
objective loss w.r.t. the prediction. Example loss functions include squared error and cross entropy.
* [Metric](metric.html) and its subclasses provide functions to measure the
performance of the model, e.g., the accuracy.
* [Optimizer](optimizer.html) and its subclasses implement the methods for updating
model parameter values using parameter gradients, including SGD, AdaGrad, RMSProp, etc.
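
To make the division of labour between these classes concrete, here is a minimal pure-Python sketch of one training loop. It assumes scalar inputs and hypothetical classes `ToyLinear`, `ToySquaredError` and `ToySGD`; none of these are SINGA classes, and their method names are chosen for illustration only. Initializer and Metric are left out to keep the example short.

```python
# Toy illustration only: hypothetical classes, not SINGA's Layer/Loss/Optimizer.

class ToyLinear:
    """Layer computing y = w * x + b for scalar inputs."""

    def __init__(self):
        self.params = {"w": 0.0, "b": 0.0}
        self.grads = {"w": 0.0, "b": 0.0}

    def forward(self, xs):
        self.xs = xs  # cache the inputs for the backward pass
        return [self.params["w"] * x + self.params["b"] for x in xs]

    def backward(self, dys):
        # dys holds dLoss/dy per sample; derive dLoss/dw and dLoss/db from it.
        self.grads["w"] = sum(dy * x for dy, x in zip(dys, self.xs))
        self.grads["b"] = sum(dys)


class ToySquaredError:
    """Loss: mean of (prediction - target)^2 / 2."""

    def forward(self, preds, targets):
        n = len(preds)
        return sum((p - t) ** 2 for p, t in zip(preds, targets)) / (2 * n)

    def backward(self, preds, targets):
        # Gradient of the loss with respect to each prediction.
        n = len(preds)
        return [(p - t) / n for p, t in zip(preds, targets)]


class ToySGD:
    """Optimizer: plain gradient descent with a fixed learning rate."""

    def __init__(self, lr):
        self.lr = lr

    def apply(self, layer):
        for name in layer.params:
            layer.params[name] -= self.lr * layer.grads[name]


xs = [0.0, 1.0, 2.0, 3.0]
ts = [1.0, 3.0, 5.0, 7.0]  # targets generated by t = 2x + 1
layer, loss, opt = ToyLinear(), ToySquaredError(), ToySGD(lr=0.1)
losses = []
for _ in range(500):
    preds = layer.forward(xs)                  # Layer: forward propagation
    losses.append(loss.forward(preds, ts))     # Loss: objective value
    layer.backward(loss.backward(preds, ts))   # Loss gradient, then Layer backward
    opt.apply(layer)                           # Optimizer: parameter update
```

After training, `layer.params` should be close to `w = 2` and `b = 1`, and `losses` should have decreased from its initial value.
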
## IO
The IO module consists of classes for data loading, data preprocessing and message passing.
* Reader and its subclasses load string records from disk files.
* Writer and its subclasses write string records to disk files.
* Encoder and its subclasses encode Tensor instances into string records.
* Decoder and its subclasses decode string records into Tensor instances.
* Endpoint represents a communication endpoint, which provides functions for passing messages to other endpoints.
* Message represents communication messages between Endpoint instances. It carries both metadata and payload.
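
The encode, write, read, decode round trip can be sketched as follows. This is an assumption-laden illustration using only Python's standard library: the functions `encode`, `decode`, `write_records` and `read_records` are hypothetical, the record layout (a 4-byte count followed by float32 values) and the length-prefixed file format are invented for the example, and byte strings stand in for the string records mentioned above. SINGA's own Reader/Writer/Encoder/Decoder classes may use different formats.

```python
# Toy illustration only: hypothetical functions and file format.
import os
import struct
import tempfile


def encode(values):
    """Toy Encoder: pack a list of floats into one binary record."""
    return struct.pack("<I%df" % len(values), len(values), *values)


def decode(record):
    """Toy Decoder: inverse of encode()."""
    (n,) = struct.unpack_from("<I", record, 0)
    return list(struct.unpack_from("<%df" % n, record, 4))


def write_records(path, records):
    """Toy Writer: store each record with a 4-byte length prefix."""
    with open(path, "wb") as f:
        for rec in records:
            f.write(struct.pack("<I", len(rec)))
            f.write(rec)


def read_records(path):
    """Toy Reader: yield the records written by write_records()."""
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if not header:
                break
            (size,) = struct.unpack("<I", header)
            yield f.read(size)


# Values are chosen to be exactly representable as float32.
tensors = [[1.0, 2.0, 3.0], [0.5, -0.25]]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "records.bin")
    write_records(path, [encode(t) for t in tensors])
    restored = [decode(r) for r in read_records(path)]
```

Keeping encoding separate from file access means the same records could be sent over a network connection instead of written to disk, without changing `encode` or `decode`.
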