# Neural Net Partition
---
The purpose of partitioning a neural network is to distribute the partitions onto
different working units (e.g., threads or nodes, called workers in this article)
and parallelize the processing.
Another reason for partitioning is to handle large neural networks that cannot be
held in a single node. For instance, training models on high-resolution images
requires large neural networks (in terms of training parameters).
Since *Layer* is the first-class citizen in SINGA, we do the partition against
layers. Specifically, we support partitions at two levels. First, users can configure
the location (i.e., worker ID) of each layer. In this way, users assign one worker
to each layer. Second, within one layer, we can partition its neurons or partition
the instances (e.g., images). These are called layer partition and data partition,
respectively. We illustrate the two types of partition using a simple convolutional neural network.
<img src="../_static/images/conv-mnist.png" style="width: 220px"/>
The above figure shows a convolutional neural network without any partition. It
has 8 layers in total (one rectangle represents one layer). The first layer is
a DataLayer (data), which reads data from local disk files/databases (or HDFS). The second layer
is a MnistLayer, which parses the records from MNIST data to get the pixels of a batch
of 8 images (each image is of size 28x28). The LabelLayer (label) parses the records to get the label
of each image in the batch. The ConvolutionalLayer (conv1) transforms the input image to the
shape of 8x27x27. The ReLULayer (relu1) conducts elementwise transformations. The PoolingLayer (pool1)
sub-samples the images. The fc1 layer is fully connected to the pool1 layer. It
multiplies each image by a weight matrix to generate a 10-dimensional hidden feature, which
is then normalized by a SoftmaxLossLayer to get the prediction.
<img src="../_static/images/conv-mnist-datap.png" style="width: 1000px"/>
The above figure shows the convolutional neural network after partitioning all layers,
except the DataLayer and ParserLayers, into 3 partitions using data partition.
The red layers process 4 images of the batch; the black and blue layers process 2 images
each. Some helper layers, i.e., SliceLayer, ConcateLayer, BridgeSrcLayer,
BridgeDstLayer and SplitLayer, are added automatically by our partition algorithm.
Layers of the same color reside in the same worker. Data is transferred
between workers at the boundary layers (i.e., BridgeSrcLayer and BridgeDstLayer),
e.g., between s-slice-mnist-conv1 and d-slice-mnist-conv1.
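
The idea behind data partition can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not SINGA code: `slice_batch`, `concat_batches` and `toy_layer` are hypothetical helpers standing in for the slicing, concatenation and per-image computation described above, and the 4/2/2 split mirrors the figure.

```python
def slice_batch(batch, sizes):
    """Split a batch (a list of instances) into consecutive sub-batches."""
    if sum(sizes) != len(batch):
        raise ValueError("partition sizes must add up to the batch size")
    parts, start = [], 0
    for size in sizes:
        parts.append(batch[start:start + size])
        start += size
    return parts


def concat_batches(parts):
    """Concatenate sub-batches back into one batch, preserving order."""
    return [item for part in parts for item in part]


def toy_layer(images):
    """Hypothetical stand-in for a layer that handles each image independently."""
    return [[2 * pixel for pixel in image] for image in images]


# A batch of 8 tiny "images" (2 pixels each, for brevity).
batch = [[i, i + 1] for i in range(8)]

# Data partition: every partition runs the same layer on a subset of the batch.
parts = slice_batch(batch, [4, 2, 2])
outputs = [toy_layer(part) for part in parts]  # one sub-batch per worker
merged = concat_batches(outputs)

# The partitioned computation matches the unpartitioned one.
assert merged == toy_layer(batch)
```

Because each image is processed independently, the sub-batches need no communication until their outputs are merged.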
<img src="../_static/images/conv-mnist-layerp.png" style="width: 1000px"/>
The above figure shows the convolutional neural network after partitioning all layers,
except the DataLayer and ParserLayers, into 2 partitions using layer partition. We can
see that each layer processes all 8 images from the batch, but different partitions process
different parts of each image. For instance, the layer conv1-00 processes only 4 channels. The other
4 channels are processed by conv1-01, which resides in another worker.
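
A minimal sketch of layer partition, again in plain Python rather than SINGA code. The assumptions are loud ones: `conv_channels` is a hypothetical toy "1x1 convolution" with one scalar weight per output channel, and the two halves stand in for conv1-00 and conv1-01 from the figure.

```python
def conv_channels(images, weights):
    """Toy 1x1 'convolution': one output channel per weight, for every image."""
    return [[[w * p for p in image] for w in weights] for image in images]


weights = [1, 2, 3, 4, 5, 6, 7, 8]              # 8 output channels in total
batch = [[i, i + 1, i + 2] for i in range(8)]   # 8 tiny "images"

# Layer partition: each partition owns 4 of the 8 channels but sees all 8 images.
out_00 = conv_channels(batch, weights[:4])  # stand-in for conv1-00
out_01 = conv_channels(batch, weights[4:])  # stand-in for conv1-01

# Concatenating along the channel dimension restores the full output.
merged = [a + b for a, b in zip(out_00, out_01)]
assert merged == conv_channels(batch, weights)
```

In contrast to data partition, every partition here needs the whole batch as input, while the layer's parameters are divided between the partitions.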
Since the partition is done at the layer level, we can apply different partitions to
different layers to get a hybrid partition for the whole neural network. Moreover,
we can also specify layer locations to place different layers on different workers.
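
To make the hybrid case concrete, the sketch below describes a per-layer plan and expands it into sub-layers with worker assignments. Everything here is hypothetical: the dictionary keys (`partition`, `num_partitions`, `location`) are not SINGA's configuration syntax, and the rule that sub-layer `i` goes to worker `i` is an assumption made for illustration. Only the `-00`/`-01` suffix style is borrowed from the layer names in the figures.

```python
def expand(plans):
    """Expand per-layer plans into (sub-layer name, worker ID, partition type)."""
    placed = []
    for plan in plans:
        kind = plan.get("partition", "none")
        if kind == "none":
            # Unpartitioned layer: placed on the worker given by its location.
            placed.append((plan["name"], plan.get("location", 0), "none"))
            continue
        if kind not in ("data", "layer"):
            raise ValueError(f"unknown partition type: {kind}")
        # Assumption: sub-layer i of a partitioned layer runs on worker i.
        for i in range(plan["num_partitions"]):
            placed.append((f"{plan['name']}-{i:02d}", i, kind))
    return placed


plans = [
    {"name": "data", "location": 0},
    {"name": "conv1", "partition": "data", "num_partitions": 2},
    {"name": "fc1", "partition": "layer", "num_partitions": 2},
]

placement = expand(plans)
for name, worker, kind in placement:
    print(f"{name:10s} worker={worker} partition={kind}")
```

Here conv1 is split by instances and fc1 by neurons, while the data layer stays whole on worker 0, which is one possible hybrid arrangement.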