content/en/_sources/docs/software_stack.md.txt - singa-site - Git at Google

 <!--
     Licensed to the Apache Software Foundation (ASF) under one
     or more contributor license agreements.  See the NOTICE file
     distributed with this work for additional information
     regarding copyright ownership.  The ASF licenses this file
     to you under the Apache License, Version 2.0 (the
     "License"); you may not use this file except in compliance
     with the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

     Unless required by applicable law or agreed to in writing,
     software distributed under the License is distributed on an
     "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
     KIND, either express or implied.  See the License for the
     specific language governing permissions and limitations
     under the License.
 -->
 # Software Stack

 SINGA's software stack includes three major components, namely, core, IO and
 model. Figure 1 illustrates these components together with the hardware.
 The core component provides memory management and tensor operations;
 IO has classes for reading (and writing) data from (to) disk and network; The
 model component provides data structures and algorithms for machine learning models,
 e.g., layers for neural network models, optimizers/initializer/metric/loss for
 general machine learning models.


 <img src="../_static/images/singav1-sw.png" align="center" width="500px"/>
 <br/>
 <span><strong>Figure 1 - SINGA V1 software stack.</strong></span>

 ## Core

 [Tensor](tensor.html) and [Device](device.html) are two core abstractions in SINGA. Tensor class represents a
 multi-dimensional array, which stores model variables and provides linear algebra
 operations for machine learning
 algorithms, including matrix multiplication and random functions. Each tensor
 instance (i.e. a tensor) is allocated on a Device instance.
 Each Device instance (i.e. a device) is created against one hardware device,
 e.g. a GPU card or a CPU core. Devices manage the memory of tensors and execute
 tensor operations on its execution units, e.g. CPU threads or CUDA streams.

 Depending on the hardware and the programming language, SINGA have implemented
 the following specific device classes:

 * **CudaGPU** represents an Nvidia GPU card. The execution units are the CUDA streams.
 * **CppCPU** represents a normal CPU. The execution units are the CPU threads.
 * **OpenclGPU** represents normal GPU card from both Nvidia and AMD.
   The execution units are the CommandQueues. Given that OpenCL is compatible with
   many hardware devices, e.g. FPGA and ARM, the OpenclGPU has the potential to be
   extended for other devices.

 Different types of devices use different programming languages to write the kernel
 functions for tensor operations,

 * CppMath (tensor_math_cpp.h) implements the tensor operations using Cpp for CppCPU
 * CudaMath (tensor_math_cuda.h) implements the tensor operations using CUDA for CudaGPU
 * OpenclMath (tensor_math_opencl.h) implements the tensor operations using OpenCL for OpenclGPU

 In addition, different types of data, such as float32 and float16, could be supported by adding
 the corresponding tensor functions.

 Typically, users would create a device instance and pass it to create multiple
 tensor instances. When users call the Tensor functions, these function would invoke
 the corresponding implementation (CppMath/CudaMath/OpenclMath) automatically. In
 other words, the implementation of Tensor operations is transparent to users.

 Most machine learning algorithms could be expressed using (dense or sparse) tensors.
 Therefore, with the Tensor abstraction, SINGA would be able to run a wide range of models,
 including deep learning models and other traditional machine learning models.

 The Tensor and Device abstractions are extensible to support a wide range of hardware device
 using different programming languages. A new hardware device would be supported by
 adding a new Device subclass and the corresponding implementation of the Tensor
 operations (xxxMath).

 Optimizations in terms of speed and memory could be implemented by Device, which
 manages both operation execution and memory malloc/free. More optimization details
 would be described in the [Device page](device.html).


 ## Model

 On top of the Tensor and Device abstractions, SINGA provides some higher level
 classes for machine learning modules.

 * [Layer](layer.html) and its subclasses are specific for neural networks. Every layer provides
   functions for forward propagating features and backward propagating gradients w.r.t the training loss functions.
   They wraps the complex layer operations so that users can easily create neural nets
   by connecting a set of layers.

 * [Initializer](initializer.html) and its subclasses provide variant methods of initializing
   model parameters (stored in Tensor instances), following Uniform, Gaussian, etc.

 * [Loss](loss.html) and its subclasses defines the training objective loss functions.
   Both functions of computing the loss values and computing the gradient of the prediction w.r.t the
   objective loss are implemented. Example loss functions include squared error and cross entropy.

 * [Metric](metric.html) and its subclasses provide the function to measure the
   performance of the model, e.g., the accuracy.

 * [Optimizer](optimizer.html) and its subclasses implement the methods for updating
   model parameter values using parameter gradients, including SGD, AdaGrad, RMSProp etc.


 ## IO

 The IO module consists of classes for data loading, data preprocessing and message passing.

 * Reader and its subclasses load string records from disk files
 * Writer and its subclasses write string records to disk files
 * Encoder and its subclasses encode Tensor instances into string records
 * Decoder and its subclasses decodes string records into Tensor instances
 * Endpoint represents a communication endpoint which provides functions for passing messages to each other.
 * Message represents communication messages between Endpoint instances. It carries both meta data and payload.
	<!--
	Licensed to the Apache Software Foundation (ASF) under one
	or more contributor license agreements. See the NOTICE file
	distributed with this work for additional information
	regarding copyright ownership. The ASF licenses this file
	to you under the Apache License, Version 2.0 (the
	"License"); you may not use this file except in compliance
	with the License. You may obtain a copy of the License at

	http://www.apache.org/licenses/LICENSE-2.0

	Unless required by applicable law or agreed to in writing,
	software distributed under the License is distributed on an
	"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
	KIND, either express or implied. See the License for the
	specific language governing permissions and limitations
	under the License.
	-->
	# Software Stack

	SINGA's software stack includes three major components, namely, core, IO and
	model. Figure 1 illustrates these components together with the hardware.
	The core component provides memory management and tensor operations;
	IO has classes for reading (and writing) data from (to) disk and network; The
	model component provides data structures and algorithms for machine learning models,
	e.g., layers for neural network models, optimizers/initializer/metric/loss for
	general machine learning models.


	<img src="../_static/images/singav1-sw.png" align="center" width="500px"/>
	<br/>
	<span><strong>Figure 1 - SINGA V1 software stack.</strong></span>

	## Core

	[Tensor](tensor.html) and [Device](device.html) are two core abstractions in SINGA. Tensor class represents a
	multi-dimensional array, which stores model variables and provides linear algebra
	operations for machine learning
	algorithms, including matrix multiplication and random functions. Each tensor
	instance (i.e. a tensor) is allocated on a Device instance.
	Each Device instance (i.e. a device) is created against one hardware device,
	e.g. a GPU card or a CPU core. Devices manage the memory of tensors and execute
	tensor operations on its execution units, e.g. CPU threads or CUDA streams.

	Depending on the hardware and the programming language, SINGA have implemented
	the following specific device classes:

	* CudaGPU represents an Nvidia GPU card. The execution units are the CUDA streams.
	* CppCPU represents a normal CPU. The execution units are the CPU threads.
	* OpenclGPU represents normal GPU card from both Nvidia and AMD.
	The execution units are the CommandQueues. Given that OpenCL is compatible with
	many hardware devices, e.g. FPGA and ARM, the OpenclGPU has the potential to be
	extended for other devices.

	Different types of devices use different programming languages to write the kernel
	functions for tensor operations,

	* CppMath (tensor_math_cpp.h) implements the tensor operations using Cpp for CppCPU
	* CudaMath (tensor_math_cuda.h) implements the tensor operations using CUDA for CudaGPU
	* OpenclMath (tensor_math_opencl.h) implements the tensor operations using OpenCL for OpenclGPU

	In addition, different types of data, such as float32 and float16, could be supported by adding
	the corresponding tensor functions.

	Typically, users would create a device instance and pass it to create multiple
	tensor instances. When users call the Tensor functions, these function would invoke
	the corresponding implementation (CppMath/CudaMath/OpenclMath) automatically. In
	other words, the implementation of Tensor operations is transparent to users.

	Most machine learning algorithms could be expressed using (dense or sparse) tensors.
	Therefore, with the Tensor abstraction, SINGA would be able to run a wide range of models,
	including deep learning models and other traditional machine learning models.

	The Tensor and Device abstractions are extensible to support a wide range of hardware device
	using different programming languages. A new hardware device would be supported by
	adding a new Device subclass and the corresponding implementation of the Tensor
	operations (xxxMath).

	Optimizations in terms of speed and memory could be implemented by Device, which
	manages both operation execution and memory malloc/free. More optimization details
	would be described in the [Device page](device.html).


	## Model

	On top of the Tensor and Device abstractions, SINGA provides some higher level
	classes for machine learning modules.

	* [Layer](layer.html) and its subclasses are specific for neural networks. Every layer provides
	functions for forward propagating features and backward propagating gradients w.r.t the training loss functions.
	They wraps the complex layer operations so that users can easily create neural nets
	by connecting a set of layers.

	* [Initializer](initializer.html) and its subclasses provide variant methods of initializing
	model parameters (stored in Tensor instances), following Uniform, Gaussian, etc.

	* [Loss](loss.html) and its subclasses defines the training objective loss functions.
	Both functions of computing the loss values and computing the gradient of the prediction w.r.t the
	objective loss are implemented. Example loss functions include squared error and cross entropy.

	* [Metric](metric.html) and its subclasses provide the function to measure the
	performance of the model, e.g., the accuracy.

	* [Optimizer](optimizer.html) and its subclasses implement the methods for updating
	model parameter values using parameter gradients, including SGD, AdaGrad, RMSProp etc.


	## IO

	The IO module consists of classes for data loading, data preprocessing and message passing.

	* Reader and its subclasses load string records from disk files
	* Writer and its subclasses write string records to disk files
	* Encoder and its subclasses encode Tensor instances into string records
	* Decoder and its subclasses decodes string records into Tensor instances
	* Endpoint represents a communication endpoint which provides functions for passing messages to each other.
	* Message represents communication messages between Endpoint instances. It carries both meta data and payload.