# Hybrid Parallelism
---
## User Guide
SINGA supports several parallelism options for distributed training.
Users select the option in the job configuration.
Both `NetProto` and `LayerProto` have a field `partition_dim` to control the parallelism option:
* `partition_dim=0`: the neuralnet/layer is partitioned on the data dimension, i.e., each worker processes a subset of the data records.
* `partition_dim=1`: the neuralnet/layer is partitioned on the feature dimension, i.e., each worker maintains a subset of the feature parameters.
The `partition_dim` field in `NetProto` applies to all layers, unless a layer sets its own `partition_dim` field.
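To make the two options concrete, here is a minimal sketch of the per-worker shard shape for a feature blob of `batch_size` records by `num_features` features. The function `shard_shape` is hypothetical (it is not part of SINGA), and the sketch assumes both sizes divide evenly by the number of workers.

```python
def shard_shape(batch_size, num_features, num_workers, partition_dim):
    """Shape of the blob each worker holds (hypothetical helper, not SINGA API)."""
    if partition_dim == 0:
        # Data dimension: each worker sees a subset of the records.
        return (batch_size // num_workers, num_features)
    if partition_dim == 1:
        # Feature dimension: each worker sees all records, but a subset of features.
        return (batch_size, num_features // num_workers)
    raise ValueError("partition_dim must be 0 or 1")


data_parallel = shard_shape(256, 1024, 4, partition_dim=0)     # (64, 1024)
feature_parallel = shard_shape(256, 1024, 4, partition_dim=1)  # (256, 256)
```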
For data parallelism across the whole model, leave `partition_dim` at its default (0), or set it explicitly in job.conf:
```
neuralnet {
  partition_dim: 0
  layer {
    name: ...
    type: ...
  }
  ...
}
```
With hybrid parallelism, each layer can be partitioned on either the data dimension or the feature dimension.
For example, to partition one specific layer on the feature dimension, configure it as follows:
```
neuralnet {
  partition_dim: 0
  layer {
    name: "layer1_partition_on_data_dimension"
    type: ...
  }
  layer {
    name: "layer2_partition_on_feature_dimension"
    type: ...
    partition_dim: 1
  }
  ...
}
```
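The override rule (a layer-level `partition_dim` takes precedence over the net-level one, which defaults to 0) can be sketched as follows. Plain Python dicts stand in for the parsed `NetProto` and `LayerProto` messages, and `effective_partition_dim` is a hypothetical helper, not a SINGA function.

```python
def effective_partition_dim(net_conf, layer_conf):
    """Resolve the partition dimension that applies to one layer."""
    # A layer's own setting wins when present.
    if "partition_dim" in layer_conf:
        return layer_conf["partition_dim"]
    # Otherwise fall back to the net-level setting, whose default is 0.
    return net_conf.get("partition_dim", 0)


# Dict stand-in for the hybrid configuration shown above.
net = {
    "partition_dim": 0,
    "layer": [
        {"name": "layer1_partition_on_data_dimension"},
        {"name": "layer2_partition_on_feature_dimension", "partition_dim": 1},
    ],
}

dims = {layer["name"]: effective_partition_dim(net, layer) for layer in net["layer"]}
```

Here `dims` maps the first layer to 0 (inherited from the net) and the second layer to 1 (its own setting).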
## Developer Guide
To support hybrid parallelism, after SINGA reads the user's model and partition configuration, a set of connection layers is automatically added between layers where needed:
* `BridgeSrcLayer` & `BridgeDstLayer` are added when two connected layers are not on the same machine. They work as a pair and are responsible for sending data and gradients to the other side during each iteration.
* `ConcateLayer` is added when there are multiple source layers. It combines their feature blobs along a given dimension.
* `SliceLayer` is added when there are multiple dest layers, each of which needs only a subset (slice) of this layer's feature blob.
* `SplitLayer` is added when there are multiple dest layers, each of which needs the whole feature blob.
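The data movement performed by the slice, concate, and split layers can be illustrated on a toy feature blob stored as a list of rows. This is only a sketch of the semantics described above, not SINGA's implementation: the function names are hypothetical, slices are assumed to divide evenly, and the bridge layers are left out because they involve network transfer.

```python
def slice_blob(blob, num_slices, dim):
    """Cut a blob into equal parts along dim (0 = rows, 1 = columns)."""
    if dim == 0:
        step = len(blob) // num_slices
        return [blob[i * step:(i + 1) * step] for i in range(num_slices)]
    step = len(blob[0]) // num_slices
    return [[row[i * step:(i + 1) * step] for row in blob]
            for i in range(num_slices)]


def concate_blobs(blobs, dim):
    """Join blobs from several source layers along dim."""
    if dim == 0:
        return [row for blob in blobs for row in blob]
    # Join the r-th row of every blob into one longer row.
    return [sum((blob[r] for blob in blobs), []) for r in range(len(blobs[0]))]


def split_blob(blob, num_copies):
    """Give every dest layer the whole blob."""
    return [blob for _ in range(num_copies)]


blob = [[1, 2, 3, 4],
        [5, 6, 7, 8]]
column_slices = slice_blob(blob, 2, dim=1)
row_slices = slice_blob(blob, 2, dim=0)
restored = concate_blobs(column_slices, dim=1)
copies = split_blob(blob, 3)
```

Slicing and then concatenating along the same dimension recovers the original blob, which is why a `Slice -> Concate` pair can re-partition a blob between layers that use different partition dimensions.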
The code uses the following logic to add connection layers:
```
Add Slice, Concate, Split Layers for Hybrid Partition

All cases are as follows:
src_pdim | dst_pdim | connection_type | Action
    0    |     0    |     OneToOne    | Direct Connection
    1    |     1    |     OneToOne    | Direct Connection
    0    |     0    |     OneToAll    | Direct Connection
    1    |     0    |     OneToOne    | Slice -> Concate
    0    |     1    |     OneToOne    | Slice -> Concate
    1    |     0    |     OneToAll    | Slice -> Concate
    0    |     1    |     OneToAll    | Split -> Concate
    1    |     1    |     OneToAll    | Split -> Concate

Logic:
dst_pdim = 1 && OneToAll ?
  (YES) Split -> Concate
  (NO)  src_pdim = dst_pdim ?
          (YES) Direct Connection
          (NO)  Slice -> Concate
```
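As a cross-check, the decision logic above can be written as a small function and compared against all eight rows of the table. This is a sketch only: `connection_action` and the string constants are hypothetical names, not identifiers from the SINGA code base.

```python
def connection_action(src_pdim, dst_pdim, connection_type):
    """Pick the connection layers for one edge, following the logic above."""
    if dst_pdim == 1 and connection_type == "OneToAll":
        return "Split -> Concate"
    if src_pdim == dst_pdim:
        return "Direct Connection"
    return "Slice -> Concate"


# The eight cases from the table: (src_pdim, dst_pdim, connection_type, action).
CASES = [
    (0, 0, "OneToOne", "Direct Connection"),
    (1, 1, "OneToOne", "Direct Connection"),
    (0, 0, "OneToAll", "Direct Connection"),
    (1, 0, "OneToOne", "Slice -> Concate"),
    (0, 1, "OneToOne", "Slice -> Concate"),
    (1, 0, "OneToAll", "Slice -> Concate"),
    (0, 1, "OneToAll", "Split -> Concate"),
    (1, 1, "OneToAll", "Split -> Concate"),
]

mismatches = [case for case in CASES
              if connection_action(case[0], case[1], case[2]) != case[3]]
```

`mismatches` is empty, so the three-branch logic reproduces every row of the table.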