# Code Structure
---
<!--
### Worker Side
#### Main Classes
<img src="../_static/images/code-structure/main.jpg" style="width: 550px"/>
* **Worker**: starts the solver to conduct training, or resumes from a previous training snapshot.
* **Solver**: constructs the neural network and runs training algorithms over it. Validation and testing are also done by the solver during training.
* **TableDelegate**: delegate for the parameter table physically stored in the parameter servers.
It runs a thread that communicates with the table servers to transfer parameters.
* **Net**: the neural network, consisting of multiple layers constructed from the input configuration file.
* **Layer**: the core abstraction. A layer reads data (neurons) from connected layers and computes
its own data according to a layer-specific ComputeFeature function. Data from the bottom layer is forwarded
layer by layer to the top.
#### Data types
<img src="../_static/images/code-structure/layer.jpg" style="width: 700px"/>
* **ComputeFeature**: reads data (neurons) from in-coming layers and computes this layer's data
according to the layer type. This function can be overridden to implement different
layer types.
* **ComputeGradient**: reads gradients (and data) from in-coming layers and computes the
gradients of the parameters and data w.r.t. the learning objective (loss).
We adapt the implementations of **PoolingLayer**, **Im2colLayer** and **LRNLayer** from [Caffe](http://caffe.berkeleyvision.org/).
<img src="../_static/images/code-structure/darray.jpg" style="width: 400px"/>
* **DArray**: provides the abstraction of an array distributed over multiple nodes,
supporting array/matrix operations and element-wise operations. Users can use it as if it were a local structure.
* **LArray**: the local part of a DArray. Each LArray is treated as an
independent array and supports all array-related operations.
* **MemSpace**: manages the memory used by a DArray. Distributed memory is allocated
and managed by ARMCI. Multiple DArrays can share the same MemSpace; the memory
is released when no DArray uses it anymore.
* **Partition**: maintains both the global shape and the local partition information.
Used when two DArrays interact.
* **Shape**: basic class representing the scope of a DArray/LArray.
* **Range**: basic class representing the scope of a Partition.
### Parameter Server
#### Main classes
<img src="../_static/images/code-structure/uml.jpg" style="width: 750px"/>
* **NetworkService**: provides access to the network (sending and receiving messages). It maintains a queue for received messages, implemented by NetworkQueue.
* **RequestDispatcher**: picks up the next message (request) from the queue and invokes a method (callback) to process it.
* **TableServer**: provides access to the data table (parameters). It registers callbacks for the different request types with the RequestDispatcher.
* **GlobalTable**: implements the table. Data is partitioned into multiple Shard objects per table. User-defined consistency models are supported by extending TableServerHandler for each table.
#### Data types
<img src="../_static/images/code-structure/type.jpg" style="width: 400px"/>
Table-related messages are either of type **RequestBase**, which covers the different request types, or of type **TableData**, which contains a key-value tuple.
#### Control flow and thread model
<img src="../_static/images/code-structure/threads.jpg" alt="uml" style="width: 1000px"/>
The figure above shows how a GET request sent from a worker is processed by the
table server. The control flow for other request types is similar. On
the server side, at least three threads are running at any time: two by
NetworkService for sending and receiving messages, and at least one by the
RequestDispatcher for dispatching requests.
-->