 # Layers

 ---

 Layer is a core abstraction in SINGA. It performs a variety of feature
 transformations for extracting high-level features, e.g., loading raw features,
 parsing RGB values, doing convolution transformation, etc.

 The *Basic user guide* section introduces the configuration of a built-in
 layer. *Advanced user guide* explains how to extend the base Layer class to
 implement users' functions.

 ## Basic user guide

 ### Layer configuration

 The configurations of two example layers are shown below:

     layer {
       name: "data"
       type: kCSVRecord
       store_conf { }
     }
     layer{
       name: "fc1"
       type: kInnerProduct
       srclayers: "data"
       innerproduct_conf{ }
       param{ }
     }

 There are some common fields for all kinds of layers:

   * `name`: a string used to differentiate two layers in a neural net.
   * `type`: an integer used for identifying a specific Layer subclass. The types of built-in
   layers are listed in LayerType (defined in job.proto).
   For user-defined layer subclasses, `user_type` should be used instead of `type`.
   * `srclayers`: names of the source layers.
   In SINGA, all connections are [converted](neural-net.html) to directed connections.
   * `param`: configuration for a [Param](param.html) instance.
   There can be multiple Param objects in one layer.

 Different layers may have different configurations. These layer-specific
 configurations are defined in `<type>_conf`. For example, the "fc1" layer has
 `innerproduct_conf`. The subsequent sections
 explain the functionality of each built-in layer and how to configure it.

 ### Built-in Layer subclasses
 SINGA provides many built-in layers, which can be used directly to create neural nets.
 These layers are categorized according to their functionality:

   * Input layers for loading records (e.g., images) from disk files, HDFS or network into memory.
   * Neuron layers for feature transformation, e.g., [convolution](../api/classsinga_1_1ConvolutionLayer.html), [pooling](../api/classsinga_1_1PoolingLayer.html), dropout, etc.
   * Loss layers for measuring the training objective loss, e.g., Cross Entropy loss or Euclidean loss.
   * Output layers for outputting the prediction results (e.g., probabilities of each category) or features into persistent storage, e.g., disk or HDFS.
   * Connection layers for connecting layers when the neural net is partitioned.

 #### Input layers

 Input layers load training/test data from disk or other places (e.g., HDFS or network)
 into memory.

 ##### StoreInputLayer

 [StoreInputLayer](../api/classsinga_1_1StoreInputLayer.html) is a base layer for
 loading data from a data store. The data store can be a KVFile or TextFile (LMDB,
 LevelDB, HDFS, etc., will be supported later). Its `ComputeFeature` function reads
 `batchsize` (string:key, string:value) tuples. Each tuple is parsed by a `Parse` function
 implemented by its subclasses.

 The configuration for this layer is in `store_conf`:

     store_conf {
       backend: # "kvfile" or "textfile"
       path: # path to the data store
       batchsize : 32
       prefetching: true #default value is false
       ...
     }

 ##### SingleLabelRecordLayer

 It is a subclass of StoreInputLayer. It assumes the (key, value) tuple loaded
 from a data store contains a feature vector (and a label) for one data instance.
 All feature vectors are of the same fixed length. The shape of one instance
 is configured through the `shape` field, e.g., the following configuration
 specifies the shape for the CIFAR10 images.

     store_conf {
       shape: 3  #channels
       shape: 32 #height
       shape: 32 #width
     }

 It may do some preprocessing like [standardization](http://ufldl.stanford.edu/wiki/index.php/Data_Preprocessing).
 The data needed for preprocessing is loaded and parsed in a virtual function, which is implemented by
 its subclasses.

 ##### RecordInputLayer

 It is a subclass of SingleLabelRecordLayer. It parses the value field from one
 tuple into a RecordProto, which is generated by Google Protobuf according
 to common.proto.  It can be used to store features for images (e.g., using the pixel field)
 or other objects (using the data field). The key field is not parsed.

     type: kRecordInput
     store_conf {
       has_label: # default is true
       ...
     }

 ##### CSVInputLayer

 It is a subclass of SingleLabelRecordLayer. The value field from one tuple is parsed
 as a CSV line (fields separated by commas). The first number is parsed as the label if
 `has_label` is set in `store_conf`. Otherwise, all numbers are parsed
 into one row of the `data_` Blob.

     type: kCSVInput
     store_conf {
       has_label: # default is true
       ...
     }
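
The parsing rule above can be sketched in a few lines of standalone C++. This is only an illustration, not SINGA's implementation: `CsvInstance` and `ParseCsvLine` are hypothetical names, and the sketch assumes that every field is a plain number.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for one parsed instance; not a SINGA type.
struct CsvInstance {
  int label = -1;               // stays -1 when has_label is false
  std::vector<float> features;  // one row of the data blob
};

// Parses one comma-separated line. When has_label is true, the first
// number is taken as the label; all remaining numbers are features.
CsvInstance ParseCsvLine(const std::string& line, bool has_label) {
  CsvInstance inst;
  std::istringstream in(line);
  std::string field;
  bool first = true;
  while (std::getline(in, field, ',')) {
    if (first && has_label) {
      inst.label = std::stoi(field);
    } else {
      inst.features.push_back(std::stof(field));
    }
    first = false;
  }
  assert(!first && "the line must contain at least one field");
  return inst;
}
```

With `has_label` set to false, the same line yields one more feature and no label.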

 ##### ImagePreprocessLayer

 This layer does image preprocessing, e.g., cropping, mirroring and scaling, on
 the data Blob from its source layer. It supersedes the RGBImageLayer, which
 works on the Record from ShardDataLayer. It still uses the same configuration as
 RGBImageLayer:

     type: kImagePreprocess
     rgbimage_conf {
       scale: float
       cropsize: int  # cropping each image to keep the central part with this size
       mirror: bool  # mirror the image by setting image[i,j]=image[i,len-j]
       meanfile: "Image_Mean_File_Path"
     }
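
As a rough illustration of central cropping, mirroring and scaling, the sketch below processes a single channel. It is not SINGA's code: the function name `CropMirrorScale`, the order of the operations and the omission of mean subtraction are all assumptions made to keep the example small.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Crops the central crop x crop window from one channel stored row-major
// (height x width), optionally mirrors it left-to-right, and scales values.
std::vector<float> CropMirrorScale(const std::vector<float>& img, int height,
                                   int width, int crop, bool mirror,
                                   float scale) {
  assert(crop > 0 && crop <= height && crop <= width);
  assert(img.size() == static_cast<std::size_t>(height) * width);
  const int top = (height - crop) / 2;
  const int left = (width - crop) / 2;
  std::vector<float> out(static_cast<std::size_t>(crop) * crop);
  for (int i = 0; i < crop; ++i) {
    for (int j = 0; j < crop; ++j) {
      // With 0-based indices the mirrored column is crop - 1 - j.
      const int src_col = mirror ? crop - 1 - j : j;
      out[i * crop + j] = img[(top + i) * width + left + src_col] * scale;
    }
  }
  return out;
}
```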

 ##### ShardDataLayer (Deprecated)
 Deprecated! Please use RecordInputLayer or CSVInputLayer.

 [ShardDataLayer](../api/classsinga_1_1ShardDataLayer.html) is a subclass of DataLayer,
 which reads Records from disk file. The file should be created using
 [DataShard](../api/classsinga_1_1DataShard.html)
 class. With the data file prepared, users configure the layer as

     type: kShardData
     sharddata_conf {
       path: "path to data shard folder"
       batchsize: int
       random_skip: int
     }

 `batchsize` specifies the number of records to be trained for one mini-batch.
 The first `rand() % random_skip` `Record`s will be skipped at the first
 iteration. This is to enforce that different workers work on different Records.

 ##### LMDBDataLayer (Deprecated)
 Deprecated! Please use RecordInputLayer or CSVInputLayer.

 LMDBDataLayer is similar to ShardDataLayer, except that the Records are
 loaded from LMDB.

     type: kLMDBData
     lmdbdata_conf {
       path: "path to LMDB folder"
       batchsize: int
       random_skip: int
     }

 ##### ParserLayer (Deprecated)
 Deprecated! Please use RecordInputLayer or CSVInputLayer.

 It gets a vector of Records from DataLayer and parses the features into
 a Blob.

     virtual void ParseRecords(Phase phase, const vector<Record>& records, Blob<float>* blob) = 0;


 ##### LabelLayer (Deprecated)
 Deprecated! Please use RecordInputLayer or CSVInputLayer.

 [LabelLayer](../api/classsinga_1_1LabelLayer.html) is a subclass of ParserLayer.
 It parses a single label from each Record. Consequently, it
 will put $b$ (mini-batch size) values into the Blob. It has no specific configuration fields.


 ##### MnistImageLayer (Deprecated)
 Deprecated! Please use RecordInputLayer or CSVInputLayer.

 MnistImageLayer is a subclass of ParserLayer. It parses the pixel values of
 each image from the MNIST dataset. The pixel
 values may be normalized as `x/norm_a - norm_b`. For example, if `norm_a` is
 set to 255 and `norm_b` is set to 0, then every pixel will be normalized into
 [0, 1].

     type: kMnistImage
     mnistimage_conf {
       norm_a: float
       norm_b: float
     }
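
The normalization formula `x/norm_a - norm_b` can be written directly as a small function. `NormalizePixels` is a hypothetical helper for illustration; it assumes 8-bit pixel values.

```cpp
#include <cassert>
#include <vector>

// Applies x / norm_a - norm_b to every 8-bit pixel value.
std::vector<float> NormalizePixels(const std::vector<unsigned char>& pixels,
                                   float norm_a, float norm_b) {
  assert(norm_a != 0.0f);
  std::vector<float> out;
  out.reserve(pixels.size());
  for (unsigned char p : pixels) out.push_back(p / norm_a - norm_b);
  return out;
}
```

With `norm_a = 255` and `norm_b = 0`, the pixel values 0 and 255 map to 0 and 1 respectively.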

 ##### RGBImageLayer (Deprecated)
 Deprecated! Please use the ImagePreprocessLayer.

 [RGBImageLayer](../api/classsinga_1_1RGBImageLayer.html) is a subclass of ParserLayer.
 It parses the RGB values of one image from each Record. It may also
 apply some transformations, e.g., cropping and mirroring. If the
 `meanfile` is specified, it should point to a path that contains one Record for
 the mean of each pixel over all training images.

     type: kRGBImage
     rgbimage_conf {
       scale: float
       cropsize: int  # cropping each image to keep the central part with this size
       mirror: bool  # mirror the image by setting image[i,j]=image[i,len-j]
       meanfile: "Image_Mean_File_Path"
     }

 ##### PrefetchLayer

 [PrefetchLayer](../api/classsinga_1_1PrefetchLayer.html) embeds other input layers
 to do data prefetching. It launches a thread that calls the embedded layers to load and extract features,
 so that the I/O task and the computation task can run simultaneously.
 One example PrefetchLayer configuration is:

     layer {
       name: "prefetch"
       type: kPrefetch
       sublayers {
         name: "data"
         type: kShardData
         sharddata_conf { }
       }
       sublayers {
         name: "rgb"
         type: kRGBImage
         srclayers:"data"
         rgbimage_conf { }
       }
       sublayers {
         name: "label"
         type: kLabel
         srclayers: "data"
       }
       exclude:kTest
     }

 The layers on top of the PrefetchLayer should use the names of the embedded
 layers as their source layers. For example, "rgb" and "label" should be
 listed in the `srclayers` of other layers.


 #### Output Layers

 Output layers get data from their source layers and write them to persistent storage,
 e.g., disk files or HDFS (to be supported).

 ##### RecordOutputLayer

 This layer gets data (and label if it is available) from its source layer and converts it into records of type
 RecordProto. Records are written as (key = instance No., value = serialized record) tuples into Store, e.g., KVFile. The configuration of this layer
 should include the specifics of the Store backend via `store_conf`.

     layer {
       name: "output"
       type: kRecordOutput
       srclayers:
       store_conf {
         backend: "kvfile"
         path:
       }
     }

 ##### CSVOutputLayer
 This layer gets data (and label if it is available) from its source layer and converts it into
 a string per instance with fields separated by commas (i.e., CSV format). The shape information
 is not kept in the string. All strings are written into
 Store, e.g., text file. The configuration of this layer should include the specifics of the Store backend via `store_conf`.

     layer {
       name: "output"
       type: kCSVOutput
       srclayers:
       store_conf {
         backend: "textfile"
         path:
       }
     }

 #### Neuron Layers

 Neuron layers conduct feature transformations.

 ##### ActivationLayer

     type: kActivation
     activation_conf {
       type: {RELU, SIGMOID, TANH, STANH}
     }

 ##### ConvolutionLayer

 [ConvolutionLayer](../api/classsinga_1_1ConvolutionLayer.html) conducts convolution transformation.

     type: kConvolution
     convolution_conf {
       num_filters: int
       kernel: int
       stride: int
       pad: int
     }
     param { } # weight/filter matrix
     param { } # bias vector

 `num_filters` is the number of filters applied; `kernel` is the size of the
 convolution kernel (equal width and height); `stride` is the distance between
 successive filter positions; `pad` is the width, in pixels, of the border of
 zeros added around each input.
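
To see how these fields interact, the output side length can be estimated with the convention commonly used for convolution layers. The text above does not state SINGA's exact rounding rule, so the formula in `ConvOutputSize` (a hypothetical helper) is an assumption.

```cpp
#include <cassert>

// Output side length for a square input of side `in`, under the common
// convention out = (in + 2 * pad - kernel) / stride + 1 (integer division).
int ConvOutputSize(int in, int kernel, int stride, int pad) {
  assert(kernel > 0 && stride > 0 && pad >= 0);
  assert(in + 2 * pad >= kernel);
  return (in + 2 * pad - kernel) / stride + 1;
}
```

For example, a 32x32 input with `kernel: 5`, `stride: 1` and `pad: 2` keeps its 32x32 size under this convention.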

 ##### InnerProductLayer

 [InnerProductLayer](../api/classsinga_1_1InnerProductLayer.html) is fully connected with its (single) source layer.
 Typically, it has two parameter fields, one for the weight matrix and the other
 for the bias vector. It applies a linear transformation to the features of the source layer
 (by multiplying them with the weight matrix) and shifts the result (by adding the bias vector).

     type: kInnerProduct
     innerproduct_conf {
       num_output: int
     }
     param { } # weight matrix
     param { } # bias vector
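
For a single instance, the transformation amounts to `y = x * W + b`. The sketch below is a plain-loop illustration, not SINGA's implementation; the function name and the row-major weight layout are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// y = x * W + b for one instance. W is row-major with shape
// (x.size() x bias.size()); this layout is an assumption of the sketch.
std::vector<float> InnerProduct(const std::vector<float>& x,
                                const std::vector<float>& weight,
                                const std::vector<float>& bias) {
  const std::size_t in = x.size(), out = bias.size();
  assert(weight.size() == in * out);
  std::vector<float> y(bias);  // start from the bias (the shift)
  for (std::size_t i = 0; i < in; ++i) {
    for (std::size_t j = 0; j < out; ++j) {
      y[j] += x[i] * weight[i * out + j];
    }
  }
  return y;
}
```

Here `bias.size()` plays the role of `num_output`.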


 ##### PoolingLayer

 [PoolingLayer](../api/classsinga_1_1PoolingLayer.html) down-samples the
 features from the source layer by summarizing each local region (e.g., by its maximum or its average).

     type: kPooling
     pooling_conf {
       pool: AVE|MAX # choose between Average Pooling and Max Pooling
       kernel: int   # size of the kernel filter
       pad: int      # the padding size
       stride: int   # the step length of the filter
     }

 The pooling layer has two methods: Average Pooling and Max Pooling.
 Use the enum values AVE and MAX to choose the method.

   * Max Pooling selects the max value of each filtering area as a point of the
   result feature blob.
   * Average Pooling averages all values of each filtering area as a point of the
   result feature blob.
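
The two methods can be compared with a small standalone sketch over one square feature map. It is illustrative only: `PoolMethod` and `Pool2D` are hypothetical names, and padding is left out for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

enum class PoolMethod { kMax, kAve };  // hypothetical names for this sketch

// Pools one square feature map (side x side, row-major) without padding.
std::vector<float> Pool2D(const std::vector<float>& in, int side, int kernel,
                          int stride, PoolMethod method) {
  assert(kernel > 0 && stride > 0 && side >= kernel);
  assert(in.size() == static_cast<std::size_t>(side) * side);
  const int out_side = (side - kernel) / stride + 1;
  std::vector<float> out;
  out.reserve(static_cast<std::size_t>(out_side) * out_side);
  for (int r = 0; r < out_side; ++r) {
    for (int c = 0; c < out_side; ++c) {
      float best = in[r * stride * side + c * stride];
      float sum = 0.0f;
      for (int i = 0; i < kernel; ++i) {
        for (int j = 0; j < kernel; ++j) {
          const float v = in[(r * stride + i) * side + c * stride + j];
          best = std::max(best, v);
          sum += v;
        }
      }
      out.push_back(method == PoolMethod::kMax ? best
                                               : sum / (kernel * kernel));
    }
  }
  return out;
}
```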

 ##### ReLULayer

 [ReLULayer](../api/classsinga_1_1ReLULayer.html) has rectified linear neurons, which conduct the following
 transformation: `f(x) = max(0, x)`. It has no specific configuration fields.

 ##### STanhLayer

 [STanhLayer](../api/classsinga_1_1TanhLayer.html) uses the scaled tanh as activation function, i.e., `f(x) = 1.7159047 * tanh(0.6666667 * x)`.
 It has no specific configuration fields.

 ##### SigmoidLayer

 SigmoidLayer uses the sigmoid (or logistic) function as activation function, i.e.,
 `f(x) = sigmoid(x)`. It has no specific configuration fields.
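
For reference, the three activation functions described above can be written as scalar functions. The constants are the ones quoted in the text; the function names are illustrative and the layers themselves apply the functions element-wise.

```cpp
#include <cassert>
#include <cmath>

// Rectified linear unit: f(x) = max(0, x).
float Relu(float x) { return x > 0.0f ? x : 0.0f; }

// Scaled tanh with the constants quoted above.
float STanh(float x) { return 1.7159047f * std::tanh(0.6666667f * x); }

// Logistic sigmoid: f(x) = 1 / (1 + exp(-x)).
float Sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }
```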


 ##### DropoutLayer
 [DropoutLayer](../api/classsinga_1_1DropoutLayer.html) is a layer that randomly drops some inputs.
 This scheme helps deep learning models avoid over-fitting.

     type: kDropout
     dropout_conf {
       dropout_ratio: float # dropout probability
     }
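
A common way to implement dropout is the "inverted" variant sketched below, in which the surviving inputs are rescaled during training. Whether SINGA rescales in exactly this way is not stated here, so treat the scaling as an assumption; `Dropout` is a hypothetical standalone function.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Inverted dropout: each input is zeroed with probability `ratio`, and the
// survivors are scaled by 1 / (1 - ratio) so the expected value is unchanged.
std::vector<float> Dropout(const std::vector<float>& in, float ratio,
                           std::mt19937* rng) {
  assert(ratio >= 0.0f && ratio < 1.0f);
  std::bernoulli_distribution keep(1.0 - ratio);
  const float scale = 1.0f / (1.0f - ratio);
  std::vector<float> out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) {
    out[i] = keep(*rng) ? in[i] * scale : 0.0f;
  }
  return out;
}
```

With `dropout_ratio: 0.5`, every output in this sketch is either 0 or twice the corresponding input.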

 ##### LRNLayer
 [LRNLayer](../api/classsinga_1_1LRNLayer.html) (Local Response Normalization) normalizes over the channels.

     type: kLRN
     lrn_conf {
       local_size: int
       alpha: float  # scaling parameter
       beta: float   # exponent
     }

 `local_size` specifies the number of adjacent channels that are summed up.
 For `WITHIN_CHANNEL`, it is the side length of the spatial region that is summed up.
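
The across-channel case can be illustrated with the widely used formulation `b_c = a_c / (1 + alpha / local_size * sum_j a_j^2)^beta`, where the sum runs over the `local_size` channels centered on `c`. The additive constant 1 and the division of `alpha` by `local_size` follow that common formulation and are assumptions here, not details taken from SINGA.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Normalizes the channel values `a` at one spatial position.
std::vector<float> LrnAcrossChannels(const std::vector<float>& a,
                                     int local_size, float alpha, float beta) {
  assert(local_size > 0 && local_size % 2 == 1);
  const int channels = static_cast<int>(a.size());
  const int half = local_size / 2;
  std::vector<float> b(a.size());
  for (int c = 0; c < channels; ++c) {
    float sum = 0.0f;  // sum of squares over the neighboring channels
    const int lo = std::max(0, c - half);
    const int hi = std::min(channels - 1, c + half);
    for (int j = lo; j <= hi; ++j) sum += a[j] * a[j];
    b[c] = a[c] / std::pow(1.0f + alpha / local_size * sum, beta);
  }
  return b;
}
```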


 #### CuDNN layers

 SINGA supports CuDNN v3 and v4 through the following layers:

 * CudnnActivationLayer (activation functions are SIGMOID, TANH, RELU)
 * CudnnConvLayer
 * CudnnLRNLayer
 * CudnnPoolLayer
 * CudnnSoftmaxLayer

 These layers have the same configuration as the corresponding CPU layers.
 For CuDNN v4, a batch normalization layer named
 `CudnnBMLayer` is added.


 #### Loss Layers

 Loss layers measure the objective training loss.

 ##### SoftmaxLossLayer

 [SoftmaxLossLayer](../api/classsinga_1_1SoftmaxLossLayer.html) is a combination of the Softmax transformation and
 Cross-Entropy loss. It first applies Softmax to get a prediction probability
 for each output unit (neuron) and then computes the cross-entropy against the ground truth.
 It is generally used as the final layer to generate labels for classification tasks.

     type: kSoftmaxLoss
     softmaxloss_conf {
       topk: int
     }

 The configuration field `topk` selects the `topk` labels with the highest
 probabilities as the prediction results, since it is tedious for users to view the
 prediction probability of every label.
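
The sketch below illustrates the three ingredients for a single instance: Softmax, the cross-entropy against a ground-truth label, and the selection of the `topk` most probable labels. The function names are hypothetical and the code is not taken from SINGA.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Numerically stable softmax over the outputs of one instance.
std::vector<float> Softmax(const std::vector<float>& logits) {
  assert(!logits.empty());
  const float max_logit = *std::max_element(logits.begin(), logits.end());
  std::vector<float> prob(logits.size());
  float sum = 0.0f;
  for (std::size_t i = 0; i < logits.size(); ++i) {
    prob[i] = std::exp(logits[i] - max_logit);
    sum += prob[i];
  }
  for (float& p : prob) p /= sum;
  return prob;
}

// Cross-entropy against a single ground-truth label.
float CrossEntropy(const std::vector<float>& prob, std::size_t label) {
  assert(label < prob.size());
  return -std::log(prob[label]);
}

// Returns the indices of the k most probable labels, most probable first.
std::vector<std::size_t> TopK(const std::vector<float>& prob, std::size_t k) {
  assert(k <= prob.size());
  std::vector<std::size_t> idx(prob.size());
  std::iota(idx.begin(), idx.end(), 0);
  std::partial_sort(
      idx.begin(), idx.begin() + k, idx.end(),
      [&prob](std::size_t a, std::size_t b) { return prob[a] > prob[b]; });
  idx.resize(k);
  return idx;
}
```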

 #### ConnectionLayer

 Subclasses of ConnectionLayer are utility layers that connect other layers, e.g., because
 of neural net partitioning.

 ##### ConcateLayer

 [ConcateLayer](../api/classsinga_1_1ConcateLayer.html) connects to more than one source layer and concatenates their feature
 blobs along a given dimension.

     type: kConcate
     concate_conf {
       concate_dim: int  # the dimension along which to concatenate
     }

 ##### SliceLayer

 [SliceLayer](../api/classsinga_1_1SliceLayer.html) connects to more than one destination layer and slices its feature
 blob along a given dimension.

     type: kSlice
     slice_conf {
       slice_dim: int
     }
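
Concatenating and slicing along a dimension are inverse operations, which the following sketch demonstrates on small two-dimensional blobs. The `Matrix` alias and both functions are hypothetical, and slicing into equally sized parts is an assumption made for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;  // rows x columns

// Concatenates b to a along dimension 0 (rows) or 1 (columns).
Matrix Concat(const Matrix& a, const Matrix& b, int dim) {
  assert(dim == 0 || dim == 1);
  Matrix out = a;
  if (dim == 0) {
    out.insert(out.end(), b.begin(), b.end());
  } else {
    assert(a.size() == b.size());
    for (std::size_t r = 0; r < out.size(); ++r) {
      out[r].insert(out[r].end(), b[r].begin(), b[r].end());
    }
  }
  return out;
}

// Slices m into `parts` equally sized pieces along dimension 0 or 1.
std::vector<Matrix> Slice(const Matrix& m, int dim, std::size_t parts) {
  assert(dim == 0 || dim == 1);
  assert(parts > 0 && !m.empty());
  const std::size_t extent = dim == 0 ? m.size() : m[0].size();
  assert(extent % parts == 0);
  const std::size_t step = extent / parts;
  std::vector<Matrix> out(parts);
  for (std::size_t p = 0; p < parts; ++p) {
    if (dim == 0) {
      out[p].assign(m.begin() + p * step, m.begin() + (p + 1) * step);
    } else {
      for (const auto& row : m) {
        out[p].emplace_back(row.begin() + p * step,
                            row.begin() + (p + 1) * step);
      }
    }
  }
  return out;
}
```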

 ##### SplitLayer

 [SplitLayer](../api/classsinga_1_1SplitLayer.html) connects to more than one destination layer and replicates its
 feature blob for each of them.

     type: kSplit
     split_conf {
       num_splits: int
     }

 ##### BridgeSrcLayer & BridgeDstLayer

 [BridgeSrcLayer](../api/classsinga_1_1BridgeSrcLayer.html) &
 [BridgeDstLayer](../api/classsinga_1_1BridgeDstLayer.html) are utility layers assisting data (e.g., feature or
 gradient) transferring due to neural net partitioning. These two layers are
 added implicitly. Users typically do not need to configure them in their neural
 net configuration.

 ### OutputLayer

 It writes the prediction results or the extracted features to files, HTTP streams
 or other places. Currently SINGA has not implemented any specific output layer.

 ## Advanced user guide

 The base Layer class is introduced in this section, followed by how to
 implement a new Layer subclass.

 ### Base Layer class

 #### Members

     LayerProto layer_conf_;
     vector<Blob<float>> datavec_, gradvec_;
     vector<AuxType> aux_data_;

 The base layer class keeps the user configuration in `layer_conf_`.
 `datavec_` stores the features associated with this layer.
 Some layers have no feature vectors of their own; instead, they share the data of their
 source layers.
 `gradvec_` stores the gradients of the
 objective loss w.r.t. `datavec_`. `aux_data_` stores auxiliary data, e.g., image labels (with `AuxType` set to int).
 If images have a variable number of labels, `AuxType` can be defined as `vector<int>`.
 Currently, `AuxType` is hard-coded to int. It will be added as a template argument of the Layer class later.

 If a layer has parameters, these parameters are declared using type
 [Param](param.html). Since some layers do not have
 parameters, we do not declare any `Param` in the base layer class.

 #### Functions

     virtual void Setup(const LayerProto& conf, const vector<Layer*>& srclayers);
     virtual void ComputeFeature(int flag, const vector<Layer*>& srclayers) = 0;
     virtual void ComputeGradient(int flag, const vector<Layer*>& srclayers) = 0;

 The `Setup` function reads user configuration, i.e. `conf`, and information
 from source layers, e.g., mini-batch size,  to set the
 shape of the `data_` (and `grad_`) field as well
 as some other layer specific fields.
 Memory will not be allocated until computation over the data structure happens.

 The `ComputeFeature` function evaluates the feature blob by transforming (e.g.
 convolution and pooling) features from the source layers.  `ComputeGradient`
 computes the gradients of parameters associated with this layer.  These two
 functions are invoked by the [TrainOneBatch](train-one-batch.html)
 function during training. Hence, they should be consistent with the
 `TrainOneBatch` function. Particularly, for feed-forward and RNN models, they are
 trained using [BP algorithm](train-one-batch.html#back-propagation),
 which requires each layer's `ComputeFeature`
 function to compute `data_` based on source layers, and requires each layer's
 `ComputeGradient` to compute gradients of parameters and source layers'
 `grad_`. For energy models, e.g., RBM, they are trained by
 [CD algorithm](train-one-batch.html#contrastive-divergence), which
 requires each layer's `ComputeFeature` function to compute the feature vectors
 for the positive phase or negative phase depending on the `phase` argument, and
 requires the `ComputeGradient` function to only compute parameter gradients.
 Some layers, e.g., loss layers or output layers, can put the loss or
 prediction result into the `metric` argument, which will be averaged and
 displayed periodically.

 ### Implementing a new Layer subclass

 Users can extend the Layer class or other subclasses to implement their own feature transformation
 logic, as long as the two virtual functions are overridden to be consistent with
 the `TrainOneBatch` function. The `Setup` function may also be overridden to
 read the layer-specific configuration.

 The [RNNLM](rnn.html) provides a couple of user-defined layers. You can refer to them as examples.

 #### Layer specific protocol message

 To implement a new layer, the first step is to define the layer-specific
 configuration. Suppose the new layer is `FooLayer`; then the layer-specific
 Google protocol message `FooLayerProto` should be defined as

     // in user.proto
     package singa;
     import "job.proto";
     message FooLayerProto {
       optional int32 a = 1;  // fields specific to the FooLayer
     }

 In addition, users need to extend the original `LayerProto` (defined in job.proto of SINGA)
 to include the `foo_conf` as follows.

     extend LayerProto {
       optional FooLayerProto foo_conf = 101;  // unique field id, reserved for extensions
     }

 If there are multiple new layers, then each layer that has specific
 configurations has its own `<type>_conf` field and takes one unique extension number.
 SINGA reserves the extension numbers from 101 to 1000 for this purpose.

     // job.proto of SINGA
     message LayerProto {
       ...
       extensions 101 to 1000;
     }

 With user.proto defined, users can use
 [protoc](https://developers.google.com/protocol-buffers/) to generate the `user.pb.cc`
 and `user.pb.h` files.  In users' code, the extension fields can be accessed via,

     auto conf = layer_conf_.GetExtension(foo_conf);
     int a = conf.a();

 When defining configurations of the new layer (in job.conf), users should use
 `user_type` for its layer type instead of `type`. In addition, `foo_conf`
 should be enclosed in brackets.

     layer {
       name: "foo"
       user_type: "kFooLayer"  # Note user_type of user-defined layers is string
       [foo_conf] {      # Note there is a pair of [] for extension fields
         a: 10
       }
     }

 #### New Layer subclass declaration

 The new layer subclass can be implemented like the built-in layer subclasses.

     class FooLayer : public singa::Layer {
      public:
       void Setup(const LayerProto& conf, const vector<Layer*>& srclayers) override;
       void ComputeFeature(int flag, const vector<Layer*>& srclayers) override;
       void ComputeGradient(int flag, const vector<Layer*>& srclayers) override;

      private:
       //  members
     };

 Users must override the two virtual functions to be called by the
 `TrainOneBatch` for either BP or CD algorithm. Typically, the `Setup` function
 will also be overridden to initialize some members. The user configured fields
 can be accessed through `layer_conf_` as shown in the above paragraphs.
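
To make the division of work between `Setup`, `ComputeFeature` and `ComputeGradient` concrete, here is a self-contained sketch for the BP case. It does not use SINGA's headers: `LayerProto`, `Layer` and `HolderLayer` are simplified stand-ins defined only so that the example compiles, the plain `data_` and `grad_` vectors replace Blobs, and the scaling behaviour of this `FooLayer` is invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-ins for SINGA's types so that the sketch compiles on its own.
// The real LayerProto and Layer are richer and differ in detail.
struct LayerProto {
  int a = 1;  // plays the role of the field `a` in foo_conf
};

class Layer {
 public:
  virtual ~Layer() = default;
  virtual void Setup(const LayerProto& conf, const std::vector<Layer*>&) {
    layer_conf_ = conf;
  }
  virtual void ComputeFeature(int flag, const std::vector<Layer*>& src) = 0;
  virtual void ComputeGradient(int flag, const std::vector<Layer*>& src) = 0;
  std::vector<float> data_, grad_;  // simplified feature and gradient blobs

 protected:
  LayerProto layer_conf_;
};

// A source layer that only holds data; used to feed FooLayer below.
class HolderLayer : public Layer {
 public:
  void ComputeFeature(int, const std::vector<Layer*>&) override {}
  void ComputeGradient(int, const std::vector<Layer*>&) override {}
};

// FooLayer scales the features of its single source layer by `a`.
class FooLayer : public Layer {
 public:
  void Setup(const LayerProto& conf,
             const std::vector<Layer*>& srclayers) override {
    Layer::Setup(conf, srclayers);
    assert(srclayers.size() == 1);
    data_.assign(srclayers[0]->data_.size(), 0.0f);
    grad_.assign(data_.size(), 0.0f);
  }
  void ComputeFeature(int, const std::vector<Layer*>& srclayers) override {
    const std::vector<float>& src = srclayers[0]->data_;
    for (std::size_t i = 0; i < data_.size(); ++i) {
      data_[i] = layer_conf_.a * src[i];
    }
  }
  // No parameters here, so only the gradient w.r.t. the source is computed.
  void ComputeGradient(int, const std::vector<Layer*>& srclayers) override {
    std::vector<float>& src_grad = srclayers[0]->grad_;
    src_grad.resize(grad_.size());
    for (std::size_t i = 0; i < grad_.size(); ++i) {
      src_grad[i] = layer_conf_.a * grad_[i];
    }
  }
};
```

A layer with parameters would additionally compute the parameter gradients in `ComputeGradient`.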

 #### New Layer subclass registration

 The newly defined layer should be registered in [main.cc](http://singa.incubator.apache.org/docs/programming-guide) by adding

     driver.RegisterLayer<FooLayer, std::string>("kFooLayer"); // "kFooLayer" should be matched to layer configurations in job.conf.
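
Registration of this kind is typically backed by a factory that maps a type identifier to a function creating the subclass. The sketch below shows that general pattern only; `LayerRegistry` and the minimal `Layer` and `FooLayer` types are hypothetical and do not reflect how SINGA's driver is implemented.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

// Minimal stand-in types for this sketch; not SINGA's classes.
struct Layer {
  virtual ~Layer() = default;
  virtual std::string Name() const = 0;
};
struct FooLayer : Layer {
  std::string Name() const override { return "foo"; }
};

// Maps a string identifier to a function that creates the layer.
class LayerRegistry {
 public:
  template <typename Subclass>
  void Register(const std::string& type) {
    creators_[type] = [] { return std::unique_ptr<Layer>(new Subclass()); };
  }
  std::unique_ptr<Layer> Create(const std::string& type) const {
    auto it = creators_.find(type);
    assert(it != creators_.end() && "unregistered layer type");
    return it->second();
  }

 private:
  std::map<std::string, std::function<std::unique_ptr<Layer>()>> creators_;
};
```

In this pattern, the string passed at registration time is the key that is later looked up when a layer with a matching `user_type` has to be created.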

 After that, the [NeuralNet](neural-net.html) can create instances of the new Layer subclass.