| # Layers |
| |
| --- |
| |
| Layer is a core abstraction in SINGA. It performs a variety of feature |
| transformations for extracting high-level features, e.g., loading raw features, |
| parsing RGB values, doing convolution transformation, etc. |
| |
| The *Basic user guide* section introduces the configuration of a built-in |
| layer. *Advanced user guide* explains how to extend the base Layer class to |
| implement users' functions. |
| |
| ## Basic user guide |
| |
| ### Layer configuration |
| |
| The configurations of two example layers are shown below: |
| |
| layer { |
| name: "data" |
| type: kCSVInput |
| store_conf { } |
| } |
| layer{ |
| name: "fc1" |
| type: kInnerProduct |
| srclayers: "data" |
| innerproduct_conf{ } |
| param{ } |
| } |
| |
| There are some common fields for all kinds of layers: |
| |
| * `name`: a string used to differentiate two layers in a neural net. |
| * `type`: an integer used for identifying a specific Layer subclass. The types of built-in |
| layers are listed in LayerType (defined in job.proto). |
| For user-defined layer subclasses, `user_type` should be used instead of `type`. |
| * `srclayers`: names of the source layers. |
| In SINGA, all connections are [converted](neural-net.html) to directed connections. |
| * `param`: configuration for a [Param](param.html) instance. |
| There can be multiple Param objects in one layer. |
| |
| Different layers may have different configurations. These configurations |
| are defined in `<type>_conf`. E.g., "fc1" layer has |
| `innerproduct_conf`. The subsequent sections |
| explain the functionality of each built-in layer and how to configure it. |
| |
| ### Built-in Layer subclasses |
| SINGA provides many built-in layers, which can be used directly to create neural nets. |
| These layers are categorized according to their functionality: |
| |
| * Input layers for loading records (e.g., images) from disk files, HDFS or network into memory. |
| * Neuron layers for feature transformation, e.g., [convolution](../api/classsinga_1_1ConvolutionLayer.html), [pooling](../api/classsinga_1_1PoolingLayer.html), dropout, etc. |
| * Loss layers for measuring the training objective loss, e.g., Cross Entropy loss or Euclidean loss. |
| * Output layers for outputting the prediction results (e.g., probabilities of each category) or features into persistent storage, e.g., disk or HDFS. |
| * Connection layers for connecting layers when the neural net is partitioned. |
| |
| #### Input layers |
| |
| Input layers load training/test data from disk or other places (e.g., HDFS or network) |
| into memory. |
| |
| ##### StoreInputLayer |
| |
| [StoreInputLayer](../api/classsinga_1_1StoreInputLayer.html) is a base layer for |
| loading data from a data store. The data store can be a KVFile or TextFile (LMDB, |
| LevelDB, HDFS, etc., will be supported later). Its `ComputeFeature` function reads |
| `batchsize` (string:key, string:value) tuples. Each tuple is parsed by a `Parse` function |
| implemented by its subclasses. |
| |
| The configuration for this layer is in `store_conf`, |
| |
| store_conf { |
| backend: # "kvfile" or "textfile" |
| path: # path to the data store |
| batchsize : 32 |
| prefetching: true #default value is false |
| ... |
| } |
| |
| ##### SingleLabelRecordLayer |
| |
| It is a subclass of StoreInputLayer. It assumes the (key, value) tuple loaded |
| from a data store contains a feature vector (and a label) for one data instance. |
| All feature vectors are of the same fixed length. The shape of one instance |
| is configured through the `shape` field, e.g., the following configuration |
| specifies the shape for the CIFAR10 images. |
| |
| store_conf { |
| shape: 3 #channels |
| shape: 32 #height |
| shape: 32 #width |
| } |
| |
| It may do some preprocessing like [standardization](http://ufldl.stanford.edu/wiki/index.php/Data_Preprocessing). |
| The data required for preprocessing is loaded and parsed in a virtual function, which is implemented by |
| its subclasses. |
| |
| ##### RecordInputLayer |
| |
| It is a subclass of SingleLabelRecordLayer. It parses the value field from one |
| tuple into a RecordProto, which is generated by Google Protobuf according |
| to common.proto. It can be used to store features for images (e.g., using the pixel field) |
| or other objects (using the data field). The key field is not parsed. |
| |
| type: kRecordInput |
| store_conf { |
| has_label: # default is true |
| ... |
| } |
| |
| ##### CSVInputLayer |
| |
| It is a subclass of SingleLabelRecordLayer. The value field from one tuple is parsed |
| as a CSV line (separated by comma). The first number would be parsed as a label if |
| `has_label` is configured in `store_conf`. Otherwise, all numbers would be parsed |
| into one row of the `data_` Blob. |
| |
| type: kCSVInput |
| store_conf { |
| has_label: # default is true |
| ... |
| } |
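
The CSV parsing rule described above can be sketched in a few lines of C++. This is only an illustration under stated assumptions: `ParseCsvLine` is a hypothetical helper, not part of SINGA's API, and it returns the feature row as a plain `std::vector<float>` instead of writing into the `data_` Blob.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not SINGA's API): parses one comma-separated line.
// If has_label is true, the first number is stored in *label and the
// remaining numbers form the feature row; otherwise every number is a feature.
std::vector<float> ParseCsvLine(const std::string& line, bool has_label,
                                int* label) {
  assert(!has_label || label != nullptr);
  std::vector<float> row;
  std::stringstream ss(line);
  std::string field;
  bool first = true;
  while (std::getline(ss, field, ',')) {
    if (first && has_label) {
      *label = std::stoi(field);  // first field is the label
    } else {
      row.push_back(std::stof(field));  // all other fields are features
    }
    first = false;
  }
  return row;
}
```

For the line `3,0.5,1.5` with `has_label` set, the sketch yields label 3 and a two-element feature row; with `has_label` unset, all three numbers become features.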
| |
| ##### ImagePreprocessLayer |
| |
| This layer does image preprocessing, e.g., cropping, mirroring and scaling, on |
| the data Blob from its source layer. It replaces the deprecated RGBImageLayer, which |
| works on the Record from ShardDataLayer, but it still uses the same configuration as |
| RGBImageLayer: |
| |
| type: kImagePreprocess |
| rgbimage_conf { |
| scale: float |
| cropsize: int # cropping each image to keep the central part with this size |
| mirror: bool # mirror the image by setting image[i,j]=image[i,len-j] |
| meanfile: "Image_Mean_File_Path" |
| } |
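
As a rough illustration of the `cropsize`, `mirror` and `scale` fields, the sketch below processes a single-channel image stored in row-major order. The function name `CropMirrorScale` is hypothetical, mean-file subtraction is omitted, and the mirror index is written as `cropsize - 1 - j` because the code uses zero-based indices; how SINGA orders these steps internally is not specified here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch (not SINGA's implementation): keeps the central
// cropsize x cropsize part of a height x width image, optionally mirrors it
// horizontally, and multiplies every pixel by scale.
std::vector<float> CropMirrorScale(const std::vector<float>& img, int height,
                                   int width, int cropsize, bool mirror,
                                   float scale) {
  assert(cropsize > 0 && cropsize <= height && cropsize <= width);
  assert(img.size() == static_cast<std::size_t>(height) * width);
  const int top = (height - cropsize) / 2;   // first row of the central crop
  const int left = (width - cropsize) / 2;   // first column of the central crop
  std::vector<float> out(static_cast<std::size_t>(cropsize) * cropsize);
  for (int i = 0; i < cropsize; ++i) {
    for (int j = 0; j < cropsize; ++j) {
      // When mirroring, output column j reads from the opposite column.
      const int src_j = mirror ? (cropsize - 1 - j) : j;
      out[i * cropsize + j] = scale * img[(top + i) * width + left + src_j];
    }
  }
  return out;
}
```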
| |
| ##### ShardDataLayer (Deprecated) |
| Deprecated! Please use RecordInputLayer or CSVInputLayer. |
| |
| [ShardDataLayer](../api/classsinga_1_1ShardDataLayer.html) is a subclass of DataLayer, |
| which reads Records from a disk file. The file should be created using the |
| [DataShard](../api/classsinga_1_1DataShard.html) |
| class. With the data file prepared, users configure the layer as |
| |
| type: kShardData |
| sharddata_conf { |
| path: "path to data shard folder" |
| batchsize: int |
| random_skip: int |
| } |
| |
| `batchsize` specifies the number of records to be trained for one mini-batch. |
| The first `rand() % random_skip` `Record`s will be skipped at the first |
| iteration. This is to enforce that different workers work on different Records. |
| |
| ##### LMDBDataLayer (Deprecated) |
| Deprecated! Please use RecordInputLayer or CSVInputLayer. |
| |
| LMDBDataLayer is similar to ShardDataLayer, except that the Records are |
| loaded from LMDB. |
| |
| type: kLMDBData |
| lmdbdata_conf { |
| path: "path to LMDB folder" |
| batchsize: int |
| random_skip: int |
| } |
| |
| ##### ParserLayer (Deprecated) |
| Deprecated! Please use RecordInputLayer or CSVInputLayer. |
| |
| It gets a vector of Records from DataLayer and parses features into |
| a Blob. |
| |
| virtual void ParseRecords(Phase phase, const vector<Record>& records, Blob<float>* blob) = 0; |
| |
| |
| ##### LabelLayer (Deprecated) |
| Deprecated! Please use RecordInputLayer or CSVInputLayer. |
| |
| [LabelLayer](../api/classsinga_1_1LabelLayer.html) is a subclass of ParserLayer. |
| It parses a single label from each Record. Consequently, it |
| will put $b$ (mini-batch size) values into the Blob. It has no specific configuration fields. |
| |
| |
| ##### MnistImageLayer (Deprecated) |
| Deprecated! Please use RecordInputLayer or CSVInputLayer. |
| |
| MnistImageLayer is a subclass of ParserLayer. It parses the pixel values of |
| each image from the MNIST dataset. The pixel |
| values may be normalized as `x/norm_a - norm_b`. For example, if `norm_a` is |
| set to 255 and `norm_b` is set to 0, then every pixel will be normalized into |
| [0, 1]. |
| |
| type: kMnistImage |
| mnistimage_conf { |
| norm_a: float |
| norm_b: float |
| } |
| |
| ##### RGBImageLayer (Deprecated) |
| Deprecated! Please use the ImagePreprocessLayer. |
| |
| [RGBImageLayer](../api/classsinga_1_1RGBImageLayer.html) is a subclass of ParserLayer. |
| It parses the RGB values of one image from each Record. It may also |
| apply some transformations, e.g., cropping, mirroring operations. If the |
| `meanfile` is specified, it should point to a path that contains one Record for |
| the mean of each pixel over all training images. |
| |
| type: kRGBImage |
| rgbimage_conf { |
| scale: float |
| cropsize: int # cropping each image to keep the central part with this size |
| mirror: bool # mirror the image by setting image[i,j]=image[i,len-j] |
| meanfile: "Image_Mean_File_Path" |
| } |
| |
| ##### PrefetchLayer |
| |
| [PrefetchLayer](../api/classsinga_1_1PrefetchLayer.html) embeds other input layers |
| to do data prefetching. It launches a thread that calls the embedded layers to load and extract features, |
| so that the I/O task and the computation task can proceed simultaneously. |
| An example PrefetchLayer configuration is: |
| |
| layer { |
| name: "prefetch" |
| type: kPrefetch |
| sublayers { |
| name: "data" |
| type: kShardData |
| sharddata_conf { } |
| } |
| sublayers { |
| name: "rgb" |
| type: kRGBImage |
| srclayers:"data" |
| rgbimage_conf { } |
| } |
| sublayers { |
| name: "label" |
| type: kLabel |
| srclayers: "data" |
| } |
| exclude:kTest |
| } |
| |
| The layers on top of the PrefetchLayer should use the names of the embedded |
| layers as their source layers. For example, "rgb" and "label" should be |
| listed in the `srclayers` of other layers. |
| |
| |
| #### Output Layers |
| |
| Output layers get data from their source layers and write them to persistent storage, |
| e.g., disk files or HDFS (to be supported). |
| |
| ##### RecordOutputLayer |
| |
| This layer gets data (and label if it is available) from its source layer and converts it into records of type |
| RecordProto. Records are written as (key = instance No., value = serialized record) tuples into Store, e.g., KVFile. The configuration of this layer |
| should include the specifics of the Store backend via `store_conf`. |
| |
| layer { |
| name: "output" |
| type: kRecordOutput |
| srclayers: |
| store_conf { |
| backend: "kvfile" |
| path: |
| } |
| } |
| |
| ##### CSVOutputLayer |
| This layer gets data (and label if it is available) from its source layer and converts it into |
| a string per instance with fields separated by commas (i.e., CSV format). The shape information |
| is not kept in the string. All strings are written into |
| Store, e.g., text file. The configuration of this layer should include the specifics of the Store backend via `store_conf`. |
| |
| layer { |
| name: "output" |
| type: kCSVOutput |
| srclayers: |
| store_conf { |
| backend: "textfile" |
| path: |
| } |
| } |
| |
| #### Neuron Layers |
| |
| Neuron layers conduct feature transformations. |
| |
| ##### ActivationLayer |
| |
| type: kActivation |
| activation_conf { |
| type: {RELU, SIGMOID, TANH, STANH} |
| } |
| |
| ##### ConvolutionLayer |
| |
| [ConvolutionLayer](../api/classsinga_1_1ConvolutionLayer.html) conducts convolution transformation. |
| |
| type: kConvolution |
| convolution_conf { |
| num_filters: int |
| kernel: int |
| stride: int |
| pad: int |
| } |
| param { } # weight/filter matrix |
| param { } # bias vector |
| |
| `num_filters` is the number of filters applied; `kernel` is the size of the |
| convolution kernel (equal width and height); `stride` is the distance, in |
| pixels, between successive applications of the filter; `pad` is the width of |
| the border of zeros added to each side of the input. |
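
To make the interplay of `kernel`, `stride` and `pad` concrete, the following sketch computes the output side length with the formula commonly used for convolution layers. The helper name `ConvOutputSize` is hypothetical, and it is an assumption that SINGA's ConvolutionLayer uses exactly this rounding.

```cpp
#include <cassert>

// Hypothetical helper: output side length of a convolution over an input of
// side length `input`, using the common formula
//   (input + 2 * pad - kernel) / stride + 1   (integer division).
int ConvOutputSize(int input, int kernel, int stride, int pad) {
  assert(kernel > 0 && stride > 0 && pad >= 0);
  assert(input + 2 * pad >= kernel);
  return (input + 2 * pad - kernel) / stride + 1;
}
```

For example, a 32x32 input with `kernel: 5`, `stride: 1` and `pad: 2` keeps its 32x32 size, while `kernel: 3`, `stride: 2`, `pad: 0` gives 15x15 under this formula.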
| |
| ##### InnerProductLayer |
| |
| [InnerProductLayer](../api/classsinga_1_1InnerProductLayer.html) is fully connected with its (single) source layer. |
| Typically, it has two parameter fields, one for the weight matrix and the other |
| for the bias vector. It applies a linear transformation to the features of the source layer (by multiplying them with the weight matrix) and |
| shifts the result (by adding the bias vector). |
| |
| type: kInnerProduct |
| innerproduct_conf { |
| num_output: int |
| } |
| param { } # weight matrix |
| param { } # bias vector |
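
The multiply-then-shift computation can be sketched as follows. This is a plain-loop illustration, not SINGA's implementation: the function name `InnerProduct` is hypothetical, and the row-major `in_dim x num_output` weight layout is an assumption made only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: y = x * weight + bias for a mini-batch.
// x is batch x in_dim, weight is in_dim x num_output (assumed layout),
// bias has num_output entries; all matrices are stored row-major.
std::vector<float> InnerProduct(const std::vector<float>& x,
                                const std::vector<float>& weight,
                                const std::vector<float>& bias, int batch,
                                int in_dim, int num_output) {
  assert(x.size() == static_cast<std::size_t>(batch) * in_dim);
  assert(weight.size() == static_cast<std::size_t>(in_dim) * num_output);
  assert(bias.size() == static_cast<std::size_t>(num_output));
  std::vector<float> y(static_cast<std::size_t>(batch) * num_output);
  for (int n = 0; n < batch; ++n) {
    for (int o = 0; o < num_output; ++o) {
      float sum = bias[o];  // start from the bias (the "shift")
      for (int i = 0; i < in_dim; ++i) {
        sum += x[n * in_dim + i] * weight[i * num_output + o];
      }
      y[n * num_output + o] = sum;
    }
  }
  return y;
}
```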
| |
| |
| ##### PoolingLayer |
| |
| [PoolingLayer](../api/classsinga_1_1PoolingLayer.html) down-samples the |
| feature maps from the source layer by taking the maximum or the average of each filtering area. |
| |
| type: kPooling |
| pooling_conf { |
| pool: AVE|MAX # choose Average Pooling or Max Pooling |
| kernel: int # size of the kernel filter |
| pad: int # the padding size |
| stride: int # the step length of the filter |
| } |
| |
| The pooling layer has two methods: Average Pooling and Max Pooling. |
| Use the enum AVE and MAX to choose the method. |
| |
| * Max Pooling selects the max value for each filtering area as a point of the |
| result feature blob. |
| * Average Pooling averages all values for each filtering area at a point of the |
| result feature blob. |
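
The two pooling methods can be illustrated on a single-channel feature map. The sketch below is a simplified stand-in rather than SINGA's code: `Pool2D` and `PoolMethod` are hypothetical names, and padding is left out to keep the example short.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

enum class PoolMethod { kMax, kAve };  // hypothetical stand-in for MAX / AVE

// Hypothetical sketch: pools a height x width map (row-major, no padding)
// with a kernel x kernel window moved by `stride` pixels.
std::vector<float> Pool2D(const std::vector<float>& in, int height, int width,
                          int kernel, int stride, PoolMethod method) {
  assert(kernel > 0 && stride > 0 && kernel <= height && kernel <= width);
  assert(in.size() == static_cast<std::size_t>(height) * width);
  const int out_h = (height - kernel) / stride + 1;
  const int out_w = (width - kernel) / stride + 1;
  std::vector<float> out(static_cast<std::size_t>(out_h) * out_w);
  for (int oy = 0; oy < out_h; ++oy) {
    for (int ox = 0; ox < out_w; ++ox) {
      float max_val = std::numeric_limits<float>::lowest();
      float sum = 0.0f;
      for (int ky = 0; ky < kernel; ++ky) {
        for (int kx = 0; kx < kernel; ++kx) {
          const float v = in[(oy * stride + ky) * width + ox * stride + kx];
          max_val = std::max(max_val, v);
          sum += v;
        }
      }
      out[oy * out_w + ox] =
          method == PoolMethod::kMax ? max_val : sum / (kernel * kernel);
    }
  }
  return out;
}
```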
| |
| ##### ReLULayer |
| |
| [ReLULayer](../api/classsinga_1_1ReLULayer.html) has rectified linear neurons, which conduct the following |
| transformation, `f(x) = max(0, x)`. It has no specific configuration fields. |
| |
| ##### STanhLayer |
| |
| [STanhLayer](../api/classsinga_1_1TanhLayer.html) uses the scaled tanh as activation function, i.e., `f(x)=1.7159047* tanh(0.6666667 * x)`. |
| It has no specific configuration fields. |
| |
| ##### SigmoidLayer |
| |
| SigmoidLayer uses the sigmoid (or logistic) as activation function, i.e., |
| `f(x)=sigmoid(x)`. It has no specific configuration fields. |
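
The three activation functions given for ReLULayer, STanhLayer and SigmoidLayer can be written as element-wise C++ functions. The function names are hypothetical; the formulas are the ones stated above, with `sigmoid(x)` taken to be the logistic function `1 / (1 + exp(-x))`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Rectified linear unit: f(x) = max(0, x).
inline float Relu(float x) { return std::max(0.0f, x); }

// Scaled tanh: f(x) = 1.7159047 * tanh(0.6666667 * x).
inline float STanh(float x) { return 1.7159047f * std::tanh(0.6666667f * x); }

// Logistic sigmoid: f(x) = 1 / (1 + exp(-x)).
inline float Sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }
```

A layer would apply one of these functions to every element of the source layer's feature blob.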
| |
| |
| ##### DropoutLayer |
| [DropoutLayer](../api/classsinga_1_1DropoutLayer.html) is a layer that randomly drops some inputs. |
| This scheme helps prevent deep learning models from over-fitting. |
| |
| type: kDropout |
| dropout_conf { |
| dropout_ratio: float # dropout probability |
| } |
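
A minimal sketch of the training-time behaviour is given below. It assumes the widely used "inverted dropout" variant, in which kept inputs are divided by `1 - dropout_ratio` so that the expected output equals the input; whether SINGA scales in exactly this way is an assumption, and the function name `Dropout` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical sketch: each input is dropped (set to 0) with probability
// dropout_ratio; kept inputs are rescaled by 1 / (1 - dropout_ratio).
std::vector<float> Dropout(const std::vector<float>& in, float dropout_ratio,
                           std::mt19937* rng) {
  assert(rng != nullptr);
  assert(dropout_ratio >= 0.0f && dropout_ratio < 1.0f);
  const float keep = 1.0f - dropout_ratio;
  std::bernoulli_distribution keep_unit(keep);  // true means "keep this input"
  std::vector<float> out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) {
    out[i] = keep_unit(*rng) ? in[i] / keep : 0.0f;
  }
  return out;
}
```

At test time a dropout layer is normally bypassed, which corresponds to calling the sketch with a ratio of 0.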
| |
| ##### LRNLayer |
| [LRNLayer](../api/classsinga_1_1LRNLayer.html) (Local Response Normalization) normalizes over the channels. |
| |
| type: kLRN |
| lrn_conf { |
| local_size: int |
| alpha: float # scaling parameter |
| beta: float # exponential number |
| } |
| |
| `local_size` specifies the number of adjacent channels that are summed up. |
| For `WITHIN_CHANNEL`, it is the side length of the spatial region that is summed up. |
| |
| |
| |
| #### CuDNN layers |
| |
| SINGA supports CuDNN v3 and v4 through the following layers: |
| |
| * CudnnActivationLayer (activation functions are SIGMOID, TANH, RELU) |
| * CudnnConvLayer |
| * CudnnLRNLayer |
| * CudnnPoolLayer |
| * CudnnSoftmaxLayer |
| |
| These layers have the same configuration as the corresponding CPU layers. |
| For CuDNN v4, the batch normalization layer is added, which is named as |
| `CudnnBMLayer`. |
| |
| |
| #### Loss Layers |
| |
| Loss layers measure the objective training loss. |
| |
| ##### SoftmaxLossLayer |
| |
| [SoftmaxLossLayer](../api/classsinga_1_1SoftmaxLossLayer.html) is a combination of the Softmax transformation and |
| Cross-Entropy loss. It first applies Softmax to get a prediction probability |
| for each output unit (neuron) and then computes the cross-entropy against the ground truth. |
| It is generally used as the final layer to generate labels for classification tasks. |
| |
| type: kSoftmaxLoss |
| softmaxloss_conf { |
| topk: int |
| } |
| |
| The configuration field `topk` selects the `topk` labels with the highest |
| probabilities as the prediction results, since it is tedious for users to view the |
| prediction probability of every label. |
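
The Softmax, cross-entropy and `topk` steps can be sketched for a single instance as follows. All three function names are hypothetical, and `InTopK` reflects one plausible reading of `topk` (the ground-truth label counts as predicted if fewer than `topk` labels have a strictly higher probability); it is not taken from SINGA's source.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Softmax over the raw outputs of one instance; the maximum is subtracted
// first for numerical stability.
std::vector<float> Softmax(const std::vector<float>& logits) {
  assert(!logits.empty());
  const float max_logit = *std::max_element(logits.begin(), logits.end());
  std::vector<float> prob(logits.size());
  float sum = 0.0f;
  for (std::size_t i = 0; i < logits.size(); ++i) {
    prob[i] = std::exp(logits[i] - max_logit);
    sum += prob[i];
  }
  for (float& p : prob) p /= sum;
  return prob;
}

// Cross-entropy loss against a single ground-truth label.
float CrossEntropy(const std::vector<float>& prob, int label) {
  return -std::log(prob[label]);
}

// Assumed top-k rule: true if fewer than topk labels beat the true label.
bool InTopK(const std::vector<float>& prob, int label, int topk) {
  int higher = 0;
  for (float p : prob) {
    if (p > prob[label]) ++higher;
  }
  return higher < topk;
}
```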
| |
| #### ConnectionLayer |
| |
| Subclasses of ConnectionLayer are utility layers that connect other layers because |
| of neural net partitioning or for other reasons. |
| |
| ##### ConcateLayer |
| |
| [ConcateLayer](../api/classsinga_1_1ConcateLayer.html) connects to more than one source layer and concatenates their feature |
| blobs along a given dimension. |
| |
| type: kConcate |
| concate_conf { |
| concate_dim: int # the dimension to concatenate along |
| } |
| |
| ##### SliceLayer |
| |
| [SliceLayer](../api/classsinga_1_1SliceLayer.html) connects to more than one destination layer and slices its feature |
| blob along a given dimension. |
| |
| type: kSlice |
| slice_conf { |
| slice_dim: int |
| } |
| |
| ##### SplitLayer |
| |
| [SplitLayer](../api/classsinga_1_1SplitLayer.html) connects to more than one destination layer and replicates its |
| feature blob for each of them. |
| |
| type: kSplit |
| split_conf { |
| num_splits: int |
| } |
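
The difference between concatenating, slicing and splitting can be illustrated on a two-dimensional blob. The sketch uses a nested `std::vector` as a stand-in for SINGA's Blob; `Blob2D`, `Concat`, `Slice` and `Split` are hypothetical names, and requiring the sliced dimension to divide evenly is a simplification made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Blob2D = std::vector<std::vector<float>>;  // rows x columns stand-in

// Concatenates blobs along dim 0 (rows) or dim 1 (columns).
Blob2D Concat(const std::vector<Blob2D>& parts, int dim) {
  assert(!parts.empty() && (dim == 0 || dim == 1));
  Blob2D out = parts[0];
  for (std::size_t p = 1; p < parts.size(); ++p) {
    if (dim == 0) {
      out.insert(out.end(), parts[p].begin(), parts[p].end());
    } else {
      assert(parts[p].size() == out.size());
      for (std::size_t r = 0; r < out.size(); ++r) {
        out[r].insert(out[r].end(), parts[p][r].begin(), parts[p][r].end());
      }
    }
  }
  return out;
}

// Slices a blob into `num` equal parts along dim 0 or dim 1.
std::vector<Blob2D> Slice(const Blob2D& in, int dim, int num) {
  assert(num > 0 && (dim == 0 || dim == 1) && !in.empty());
  const std::size_t n = static_cast<std::size_t>(num);
  const std::size_t rows = in.size();
  const std::size_t cols = in[0].size();
  assert((dim == 0 ? rows : cols) % n == 0);
  std::vector<Blob2D> parts(n);
  for (std::size_t p = 0; p < n; ++p) {
    if (dim == 0) {
      const std::size_t step = rows / n;
      parts[p].assign(in.begin() + p * step, in.begin() + (p + 1) * step);
    } else {
      const std::size_t step = cols / n;
      for (const auto& row : in) {
        parts[p].emplace_back(row.begin() + p * step,
                              row.begin() + (p + 1) * step);
      }
    }
  }
  return parts;
}

// Replicates the whole blob once per destination.
std::vector<Blob2D> Split(const Blob2D& in, int num_splits) {
  assert(num_splits > 0);
  return std::vector<Blob2D>(static_cast<std::size_t>(num_splits), in);
}
```

Slicing a blob and concatenating the parts along the same dimension gives back the original blob, whereas splitting hands every destination a full copy.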
| |
| ##### BridgeSrcLayer & BridgeDstLayer |
| |
| [BridgeSrcLayer](../api/classsinga_1_1BridgeSrcLayer.html) & |
| [BridgeDstLayer](../api/classsinga_1_1BridgeDstLayer.html) are utility layers assisting data (e.g., feature or |
| gradient) transferring due to neural net partitioning. These two layers are |
| added implicitly. Users typically do not need to configure them in their neural |
| net configuration. |
| |
| ### OutputLayer |
| |
| Output layers write the prediction results or the extracted features into files, HTTP streams |
| or other places. The built-in ones, RecordOutputLayer and CSVOutputLayer, are described under *Output Layers* above. |
| |
| ## Advanced user guide |
| |
| The base Layer class is introduced in this section, followed by how to |
| implement a new Layer subclass. |
| |
| ### Base Layer class |
| |
| #### Members |
| |
| LayerProto layer_conf_; |
| vector<Blob<float>> datavec_, gradvec_; |
| vector<AuxType> aux_data_; |
| |
| The base layer class keeps the user configuration in `layer_conf_`. |
| `datavec_` stores the features associated with this layer. |
| There are layers without feature vectors; instead, they share the data from |
| source layers. |
| The `gradvec_` is for storing the gradients of the |
| objective loss w.r.t. the `datavec_`. The `aux_data_` stores the auxiliary data, e.g., image label (set `AuxType` to int). |
| If images have a variable number of labels, `AuxType` can be defined as `vector<int>`. |
| Currently, we hard code `AuxType` to int. It will be added as a template argument of Layer class later. |
| |
| If a layer has parameters, these parameters are declared using type |
| [Param](param.html). Since some layers do not have |
| parameters, we do not declare any `Param` in the base layer class. |
| |
| #### Functions |
| |
| virtual void Setup(const LayerProto& conf, const vector<Layer*>& srclayers); |
| virtual void ComputeFeature(int flag, const vector<Layer*>& srclayers) = 0; |
| virtual void ComputeGradient(int flag, const vector<Layer*>& srclayers) = 0; |
| |
| The `Setup` function reads user configuration, i.e. `conf`, and information |
| from source layers, e.g., mini-batch size, to set the |
| shape of the `data_` (and `grad_`) field as well |
| as some other layer specific fields. |
| Memory will not be allocated until computation over the data structure happens. |
| |
| The `ComputeFeature` function evaluates the feature blob by transforming (e.g. |
| convolution and pooling) features from the source layers. `ComputeGradient` |
| computes the gradients of parameters associated with this layer. These two |
| functions are invoked by the [TrainOneBatch](train-one-batch.html) |
| function during training. Hence, they should be consistent with the |
| `TrainOneBatch` function. In particular, feed-forward and RNN models are |
| trained using the [BP algorithm](train-one-batch.html#back-propagation), |
| which requires each layer's `ComputeFeature` |
| function to compute `data_` based on source layers, and requires each layer's |
| `ComputeGradient` to compute gradients of parameters and source layers' |
| `grad_`. Energy models, e.g., RBM, are trained using the |
| [CD algorithm](train-one-batch.html#contrastive-divergence), which |
| requires each layer's `ComputeFeature` function to compute the feature vectors |
| for the positive phase or negative phase depending on the `phase` argument, and |
| requires the `ComputeGradient` function to only compute parameter gradients. |
| Some layers, e.g., loss layers or output layers, can put the loss or |
| prediction result into the `metric` argument, which will be averaged and |
| displayed periodically. |
| |
| ### Implementing a new Layer subclass |
| |
| Users can extend the Layer class or other subclasses to implement their own feature transformation |
| logic as long as the two virtual functions are overridden to be consistent with |
| the `TrainOneBatch` function. The `Setup` function may also be overridden to |
| read specific layer configuration. |
| |
| The [RNNLM](rnn.html) provides a couple of user-defined layers. You can refer to them as examples. |
| |
| #### Layer specific protocol message |
| |
| To implement a new layer, the first step is to define the layer specific |
| configuration. Suppose the new layer is `FooLayer`, the layer specific |
| google protocol message `FooLayerProto` should be defined as |
| |
| // in user.proto |
| package singa; |
| import "job.proto"; |
| message FooLayerProto { |
| optional int32 a = 1; // specific fields to the FooLayer |
| } |
| |
| In addition, users need to extend the original `LayerProto` (defined in job.proto of SINGA) |
| to include the `foo_conf` as follows. |
| |
| extend LayerProto { |
| optional FooLayerProto foo_conf = 101; // unique field id, reserved for extensions |
| } |
| |
| If there are multiple new layers, then each layer that has specific |
| configurations would have a `<type>_conf` field and takes one unique extension number. |
| SINGA has reserved enough extension numbers, e.g., starting from 101 to 1000. |
| |
| // job.proto of SINGA |
| message LayerProto { |
| ... |
| extensions 101 to 1000; |
| } |
| |
| With user.proto defined, users can use |
| [protoc](https://developers.google.com/protocol-buffers/) to generate the `user.pb.cc` |
| and `user.pb.h` files. In users' code, the extension fields can be accessed via, |
| |
| auto conf = layer_conf_.GetExtension(foo_conf); |
| int a = conf.a(); |
| |
| When defining configurations of the new layer (in job.conf), users should use |
| `user_type` for its layer type instead of `type`. In addition, `foo_conf` |
| should be enclosed in brackets. |
| |
| layer { |
| name: "foo" |
| user_type: "kFooLayer" # Note user_type of user-defined layers is string |
| [foo_conf] { # Note there is a pair of [] for extension fields |
| a: 10 |
| } |
| } |
| |
| #### New Layer subclass declaration |
| |
| The new layer subclass can be implemented like the built-in layer subclasses. |
| |
| class FooLayer : public singa::Layer { |
| public: |
| void Setup(const LayerProto& conf, const vector<Layer*>& srclayers) override; |
| void ComputeFeature(int flag, const vector<Layer*>& srclayers) override; |
| void ComputeGradient(int flag, const vector<Layer*>& srclayers) override; |
| |
| private: |
| // members |
| }; |
| |
| Users must override the two virtual functions to be called by the |
| `TrainOneBatch` for either BP or CD algorithm. Typically, the `Setup` function |
| will also be overridden to initialize some members. The user configured fields |
| can be accessed through `layer_conf_` as shown in the above paragraphs. |
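
To show how the three functions fit together under the BP algorithm, here is a self-contained toy version. It does not use SINGA's headers: `Layer`, `LayerConf` and `InputLayer` are simplified stand-ins defined only for this sketch (with flat `std::vector<float>` members instead of Blobs), and `FooLayer` is assumed to compute `y = a * x` purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct LayerConf { int a = 1; };  // stand-in for LayerProto with foo_conf.a

// Simplified stand-in for the base class; NOT singa::Layer.
class Layer {
 public:
  virtual ~Layer() = default;
  virtual void Setup(const LayerConf&, const std::vector<Layer*>&) {}
  virtual void ComputeFeature(int flag, const std::vector<Layer*>& src) = 0;
  virtual void ComputeGradient(int flag, const std::vector<Layer*>& src) = 0;
  std::vector<float> data_, grad_;  // public only to keep the sketch short
};

// Stand-in source layer that just holds data.
class InputLayer : public Layer {
 public:
  void ComputeFeature(int, const std::vector<Layer*>&) override {}
  void ComputeGradient(int, const std::vector<Layer*>&) override {}
};

class FooLayer : public Layer {
 public:
  void Setup(const LayerConf& conf, const std::vector<Layer*>& src) override {
    assert(src.size() == 1);
    a_ = conf.a;  // read the layer-specific configuration
    data_.assign(src[0]->data_.size(), 0.0f);  // shape follows the source
    grad_.assign(data_.size(), 0.0f);
  }
  // Forward pass: compute data_ from the source layer's data_.
  void ComputeFeature(int, const std::vector<Layer*>& src) override {
    for (std::size_t i = 0; i < data_.size(); ++i) {
      data_[i] = a_ * src[0]->data_[i];
    }
  }
  // Backward pass: compute the source layer's grad_ from this layer's grad_.
  void ComputeGradient(int, const std::vector<Layer*>& src) override {
    for (std::size_t i = 0; i < grad_.size(); ++i) {
      src[0]->grad_[i] = a_ * grad_[i];
    }
  }

 private:
  int a_ = 1;
};
```

A layer with parameters would additionally compute the parameter gradients in `ComputeGradient`; this toy layer has none.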
| |
| #### New Layer subclass registration |
| |
| The newly defined layer should be registered in [main.cc](http://singa.incubator.apache.org/docs/programming-guide) by adding |
| |
| driver.RegisterLayer<FooLayer, std::string>("kFooLayer"); // "kFooLayer" should be matched to layer configurations in job.conf. |
| |
| After that, the [NeuralNet](neural-net.html) can create instances of the new Layer subclass. |
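
Registration by string identifier is commonly implemented with a factory that maps each identifier to a creator function. The sketch below illustrates that general pattern only; `LayerFactory`, its methods and the toy `Layer` and `FooLayer` types are hypothetical and do not describe how SINGA's driver is actually implemented.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

// Toy types for the sketch; NOT singa::Layer.
struct Layer {
  virtual ~Layer() = default;
  virtual std::string Name() const = 0;
};
struct FooLayer : Layer {
  std::string Name() const override { return "foo"; }
};

// Hypothetical factory: maps a type string to a function creating the layer.
class LayerFactory {
 public:
  template <typename T>
  void Register(const std::string& type) {
    creators_[type] = [] { return std::unique_ptr<Layer>(new T()); };
  }
  // Returns a new layer of the registered type, or nullptr if unknown.
  std::unique_ptr<Layer> Create(const std::string& type) const {
    auto it = creators_.find(type);
    if (it == creators_.end()) return nullptr;
    return it->second();
  }

 private:
  std::map<std::string, std::function<std::unique_ptr<Layer>()>> creators_;
};
```

With such a factory, the string given at registration must match the `user_type` string in the job configuration, which is why the two are required to agree.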