# Parameters
---
A `Param` object in SINGA represents a set of parameters, e.g., a weight matrix
or a bias vector. The *Basic user guide* describes how to configure a `Param`
object, and the *Advanced user guide* provides details on implementing custom
parameter initialization methods.
## Basic user guide
The configuration of a `Param` object is inside a layer configuration, as
`Param` objects are associated with layers. An example configuration is

    layer {
      ...
      param {
        name : "p1"
        init {
          type : kConstant
          value: 1
        }
      }
    }
The [SGD algorithm](overview.html) starts by initializing all
parameters according to the user-specified initialization method (the `init` field).
For the above example, all parameters in `Param` "p1" will be initialized to the
constant value 1. The configuration fields of a `Param` object are defined in
[ParamProto](../api/classsinga_1_1ParamProto.html):
* name, an identifier string. It is an optional field; if not provided, SINGA
will generate one based on the layer name and the `Param`'s order in the layer.
* init, field for setting the initialization method.
* share_from, name of another `Param` object, from which this `Param` will share
configurations and values.
* lr_scale, float value to be multiplied with the learning rate when
[updating the parameters](updater.html).
* wd_scale, float value to be multiplied with the weight decay when
[updating the parameters](updater.html).
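For illustration, a sketch of a `Param` configuration combining these fields
(the names "w1", "w2" and the scale values here are hypothetical, not from
SINGA's examples):

```
param {
  name : "w2"
  share_from : "w1"   # reuse the configuration and values of Param "w1"
  lr_scale : 2.0      # learning rate is multiplied by 2 for this Param
  wd_scale : 0.0      # no weight decay for this Param
}
```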
There are some other fields that are specific to initialization methods.
### Initialization methods
Users can set the `type` of `init` using one of the following built-in
initialization methods:
* `kConstant`, set all parameters of the `Param` object to a constant value

        type: kConstant
        value: float # default is 1
* `kGaussian`, initialize the parameters following a Gaussian distribution

        type: kGaussian
        mean: float  # mean of the Gaussian distribution, default is 0
        std: float   # standard deviation, default is 1
        value: float # default is 0
* `kUniform`, initialize the parameters following a uniform distribution

        type: kUniform
        low: float   # lower boundary, default is -1
        high: float  # upper boundary, default is 1
        value: float # default is 0
* `kGaussianSqrtFanIn`, initialize `Param` objects with two dimensions (i.e.,
matrices) using `kGaussian` and then multiply each parameter by
`1/sqrt(fan_in)`, where `fan_in` is the number of columns of the matrix.
* `kUniformSqrtFanIn`, the same as `kGaussianSqrtFanIn` except that the
distribution is a uniform distribution.
* `kUniformFanInOut`, initialize matrix `Param` objects using `kUniform` and
then multiply each parameter by `sqrt(6/(fan_in + fan_out))`, where `fan_in +
fan_out` is the sum of the numbers of columns and rows of the matrix.
For all initialization methods above except `kConstant`, if `value` is not
1, every parameter will be multiplied by `value`. Users can also implement
their own initialization methods following the *Advanced user guide*.
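To make the fan-in based scaling concrete, here is a small C++ sketch (not
SINGA code; the helper names `SqrtFanInScale` and `FanInOutScale` are invented
for illustration) of the multipliers applied by `kGaussianSqrtFanIn`,
`kUniformSqrtFanIn`, and `kUniformFanInOut`:

```cpp
#include <cassert>
#include <cmath>

// fan_in is the number of columns of the weight matrix,
// fan_out the number of rows.
inline float SqrtFanInScale(int fan_in) {
  // kGaussianSqrtFanIn / kUniformSqrtFanIn multiply each sampled value by this.
  return 1.0f / std::sqrt(static_cast<float>(fan_in));
}

inline float FanInOutScale(int fan_in, int fan_out) {
  // kUniformFanInOut multiplies each sampled value by this.
  return std::sqrt(6.0f / static_cast<float>(fan_in + fan_out));
}
```

For example, a 10x100 weight matrix has `fan_in = 100`, so `kGaussianSqrtFanIn`
scales each sampled value by `1/sqrt(100) = 0.1`.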
## Advanced user guide
This section describes how to implement new parameter
initialization methods.
### Base ParamGenerator
All initialization methods are implemented as
subclasses of the base `ParamGenerator` class.

    class ParamGenerator {
     public:
      virtual void Init(const ParamGenProto&);
      void Fill(Param*);

     protected:
      ParamGenProto proto_;
    };
The configuration of the initialization method is stored in `ParamGenProto`.
The `Fill` function fills the `Param` object passed in as an argument.
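As a sketch of this pattern (not SINGA's actual classes; `SimpleParam` and
`UniformGen` are simplified stand-ins for `Param` and a `ParamGenerator`
subclass), a generator samples values according to its configuration and
writes them into the parameter buffer:

```cpp
#include <cassert>
#include <random>
#include <vector>

// Simplified stand-in for Param: just a flat buffer of floats.
struct SimpleParam {
  std::vector<float> data;
};

// Simplified stand-in for a ParamGenerator subclass; the low/high bounds
// play the role of the configuration carried by ParamGenProto.
class UniformGen {
 public:
  UniformGen(float low, float high) : low_(low), high_(high) {}

  void Fill(SimpleParam* p) {
    std::mt19937 rng(42);  // fixed seed, for reproducibility in this sketch
    std::uniform_real_distribution<float> dist(low_, high_);
    for (auto& v : p->data) v = dist(rng);
  }

 private:
  float low_, high_;
};
```

Calling `Fill` on a `SimpleParam` with 100 entries would populate every entry
with a value drawn from the configured range.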
### New ParamGenerator subclass
Similar to implementing a new Layer subclass, users can define a configuration
protocol message,

    # in user.proto
    message FooParamProto {
      optional int32 x = 1;
    }
    extend ParamGenProto {
      optional FooParamProto fooparam_conf = 101;
    }
The configuration of `Param` would be

    param {
      ...
      init {
        user_type: "FooParam"  # must use user_type for user-defined methods
        [fooparam_conf] {      # must use brackets for configuring user-defined messages
          x: 10
        }
      }
    }
The subclass could be declared as,

    class FooParamGen : public ParamGenerator {
     public:
      void Fill(Param*) override;
    };
Users can access the configuration fields in `Fill` by

    int x = proto_.GetExtension(fooparam_conf).x();
To use the new initialization method, users need to register it in the
[main function](programming-guide.html):

    driver.RegisterParamGenerator<FooParamGen>("FooParam");  // must be consistent with the user_type in the configuration
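Putting the pieces together, the following sketch mimics the structure of such
a subclass. Note that this is not SINGA's actual API: `FooParamProto` is a
plain struct here rather than a generated protobuf message, `SimpleParam` is a
stand-in for `Param`, and the configuration is passed via the constructor
instead of `proto_.GetExtension(...)`:

```cpp
#include <cassert>
#include <vector>

// Stand-in for the generated FooParamProto protobuf message.
struct FooParamProto {
  int x = 0;
};

// Stand-in for Param: a flat buffer of floats.
struct SimpleParam {
  std::vector<float> data;
};

// Shape of a user-defined generator: read its own configuration field
// and fill the parameter buffer accordingly (here, every entry gets x).
class FooParamGen {
 public:
  explicit FooParamGen(const FooParamProto& conf) : conf_(conf) {}

  void Fill(SimpleParam* p) {
    for (auto& v : p->data) v = static_cast<float>(conf_.x);
  }

 private:
  FooParamProto conf_;
};
```

With `x: 10` in the configuration, every parameter of the filled buffer would
be set to 10.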
{% comment %}
### Base Param class
### Members
    int local_version_;
    int slice_start_;
    vector<int> slice_offset_, slice_size_;
    shared_ptr<Blob<float>> data_;
    Blob<float> grad_;
    ParamProto proto_;
Each Param object has a local version and a global version (stored inside the
data Blob). These two versions are used for synchronization. If multiple Param
objects share the same values, they share the same `data_` field and
consequently the same global version. The global version is updated by
[the stub thread](communication.html). The local version is updated in the
`Worker::Update` function, which assigns the global version to the local
version. The `Worker::Collect` function blocks until the global version is
larger than the local version, i.e., until `data_` has been updated. In this
way, workers sharing parameters are synchronized.
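The version check described above can be sketched as follows (a
simplification, not SINGA's code; `VersionedParam`, `CanCollect`, and `Update`
are hypothetical names):

```cpp
#include <cassert>

// In SINGA the global version lives inside the data Blob; here it is a
// plain field for illustration.
struct VersionedParam {
  int local_version = 0;
  int global_version = 0;
};

// Collect may only proceed once the global version exceeds the worker's
// local version, i.e., once data_ has been updated; otherwise it blocks.
bool CanCollect(const VersionedParam& p) {
  return p.global_version > p.local_version;
}

// Update assigns the global version to the local version.
void Update(VersionedParam* p) { p->local_version = p->global_version; }
```

A worker starting at versions (0, 0) cannot collect; after the stub thread
publishes an update (global version 1), Collect may proceed, and a subsequent
Update brings the local version up to date, blocking Collect again until the
next update.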
In deep learning models, some Param objects are 100 times larger than others.
To ensure load balance among servers, SINGA slices large Param objects. The
slicing information is recorded by the `slice_*` fields. Each slice is assigned
a unique ID starting from 0. `slice_start_` is the ID of the first slice of
this Param object. `slice_offset_[i]` is the offset of the i-th slice in this
Param object, and `slice_size_[i]` is the size of the i-th slice. This slice
information is used to create messages for transferring parameter values or
gradients to different servers.
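A sketch of how such slicing might compute `slice_offset_` and `slice_size_`
(illustrative only; `SliceParam` is an invented helper and SINGA's actual
slicing strategy may differ):

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Cut a parameter of `size` elements into `num_slices` roughly equal
// slices, returning (offset, size) pairs analogous to slice_offset_[i]
// and slice_size_[i].
std::vector<std::pair<int, int>> SliceParam(int size, int num_slices) {
  std::vector<std::pair<int, int>> slices;
  int base = size / num_slices;
  int rem = size % num_slices;
  int offset = 0;
  for (int i = 0; i < num_slices; ++i) {
    // Spread the remainder over the leading slices.
    int s = base + (i < rem ? 1 : 0);
    slices.emplace_back(offset, s);
    offset += s;
  }
  return slices;
}
```

For instance, slicing 10 values across 3 servers yields slices of sizes
4, 3, and 3 at offsets 0, 4, and 7.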
Each Param object has a `grad_` field for gradients. Param objects do not share
this Blob although they may share `data_`, because each layer containing a
Param object contributes gradients. E.g., in RNNs, the recurrent layers share
parameter values, and the gradients used for updating are averaged over these
recurrent layers. In SINGA, the stub thread aggregates local gradients for the
same Param object, and the server does a global aggregation of gradients for
the same Param object.
The `proto_` field has some meta information, e.g., name and ID. It also has a
field called `owner` which is the ID of the Param object that shares parameter
values with others.
### Functions
The base Param class implements two sets of functions,

    virtual void InitValues(int version = 0);  // initialize values according to `init_method`
    void ShareFrom(const Param& other);        // share `data_` from `other` Param
    --------------
    virtual Msg* GenGetMsg(bool copy, int slice_idx);
    virtual Msg* GenPutMsg(bool copy, int slice_idx);
    ...  // other message related functions
Besides the functions for processing the parameter values, there is a set of
functions for generating and parsing messages. These messages are for
transferring parameter values or gradients between workers and servers. Each
message corresponds to one Param slice. If `copy` is false, the receiver of
the message is in the same process as the sender; in that case, only pointers
to the memory of the parameter values (or gradients) are wrapped in the
message. Otherwise, the parameter values (or gradients) are copied into the
message.
### Implementing Param subclass
Users can extend the base Param class to implement their own parameter
initialization methods and message transferring protocols. Similar to
implementing a new Layer subclass, users can create google protocol buffer
messages for configuring the Param subclass. The subclass, denoted as FooParam,
should be registered in main.cc,

    driver.RegisterParam<FooParam>(kFooParam);  // kFooParam should be different from 0, which is for the base Param type
* type, an integer representing the `Param` type. Currently SINGA provides one
`Param` implementation with type 0 (the default type). If users want
to use their own Param implementation, they should extend the base Param
class and configure this field with `kUserParam`.
{% endcomment %}