MXNet R Tutorial on NDArray and Symbol
======================================

This vignette gives a general overview of MXNet's R package. MXNet blends
flexible and efficient components for building machine-learning
applications. This tutorial introduces two major concepts:

* [NDArray](#ndarray-vectorized-tensor-computations-on-cpus-and-gpus)
  offers matrix and tensor computations on both CPU and GPU, with automatic
  parallelization.
* [Symbol](#symbol-and-automatic-differentiation) makes defining a neural
  network extremely easy, and provides automatic differentiation.

## NDArray: Vectorized tensor computations on CPUs and GPUs

`NDArray` is the basic vectorized operation unit in MXNet for matrix and tensor computations.
Users can perform the usual calculations as on an R array, but with two additional features:

1. **multiple devices**: all operations can be run on various devices including
   CPU and GPU
2. **automatic parallelization**: all operations are automatically executed in
   parallel with each other

### Create and Initialization

Let's create an `NDArray` on either GPU or CPU:

```{r}
require(mxnet)
a <- mx.nd.zeros(c(2, 3))           # create a 2-by-3 matrix on the cpu
b <- mx.nd.zeros(c(2, 3), mx.cpu()) # create a 2-by-3 matrix on the cpu
# c <- mx.nd.zeros(c(2, 3), mx.gpu(0)) # create a 2-by-3 matrix on gpu 0, if you have CUDA enabled
```

As a side note, for CUDA-enabled devices the GPU device ids start from 0,
which is why we passed 0 as the GPU id. We can also initialize an `NDArray`
object in various ways:

```{r}
a <- mx.nd.ones(c(4, 4))
b <- mx.rnorm(c(4, 5))
c <- mx.nd.array(1:5)
```
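
Because `NDArray` mimics R arrays, you can also query the shape of each array
with `dim()`. A minimal sketch, assuming the `dim` method for `NDArray` is
available in your version of the package:

```{r, eval=FALSE}
dim(a) # 4 4, the 4-by-4 matrix of ones
dim(b) # 4 5, the 4-by-5 random matrix
dim(c) # 5, the length-5 vector
```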

To check the numbers in an `NDArray`, we can simply run:

```{r}
a <- mx.nd.ones(c(2, 3))
b <- as.array(a)
class(b)
b
```

### Basic Operations

#### Element-wise operations

You can perform element-wise operations on `NDArray` objects:

```{r}
a <- mx.nd.ones(c(2, 4)) * 2
b <- mx.nd.ones(c(2, 4)) / 8
as.array(a)
as.array(b)
c <- a + b
as.array(c)
d <- c / a - 5
as.array(d)
```

If two `NDArray`s sit on different devices, we need to explicitly move them
to the same device. For instance:

```{r, eval=FALSE}
a <- mx.nd.ones(c(2, 3)) * 2
b <- mx.nd.ones(c(2, 3), mx.gpu()) / 8
c <- mx.nd.copyto(a, mx.gpu()) * b
as.array(c)
```

#### Load and Save

You can save a list of `NDArray` objects to disk with `mx.nd.save`:

```{r}
a <- mx.nd.ones(c(2, 3))
mx.nd.save(list(a), "temp.ndarray")
```

You can also load it back easily:

```{r}
a <- mx.nd.load("temp.ndarray")
as.array(a[[1]])
```

If you want to store data on a distributed file system such as S3 or HDFS,
you can directly save to and load from it. For example:

```{r, eval=FALSE}
mx.nd.save(list(a), "s3://mybucket/mydata.bin")
mx.nd.save(list(a), "hdfs:///users/myname/mydata.bin")
```

### Automatic Parallelization

`NDArray` can automatically execute operations in parallel, which is desirable when we
use multiple resources such as CPUs, GPU cards, and CPU-to-GPU memory bandwidth.

For example, if we write `a <- a + 1` followed by `b <- b + 1`, and `a` is on the CPU while
`b` is on a GPU, then we want to execute them in parallel to improve
efficiency. Furthermore, data copies between CPU and GPU are expensive, so we
want to run them in parallel with other computations as well.

However, finding code that can be executed in parallel by eye is hard. In the
following example, `a <- a + 1` and `c <- c * 3` can be executed in parallel, but `a <- a + 1` and
`b <- b * 3` must be executed sequentially.

```{r}
a <- mx.nd.ones(c(2, 3))
b <- a                          # b shares the underlying data with a
c <- mx.nd.copyto(a, mx.cpu()) # c is an independent copy of a
a <- a + 1                      # operates on the data shared with b
b <- b * 3                      # operates on the same data, so it must wait
c <- c * 3                      # touches only the copy, can run in parallel
```

Luckily, MXNet can automatically resolve the dependencies and
execute operations in parallel with correctness guaranteed. In other words, we
can write a program as if it were single-threaded, and MXNet will
automatically dispatch it to multiple devices, such as multiple GPU cards or multiple
machines.

This is achieved by lazy evaluation. Any operation we write down is issued to an
internal engine, and then the call returns. For example, if we run `a <- a + 1`, it
returns immediately after pushing the plus operator to the engine. This
asynchrony allows us to push more operators to the engine, so it can determine
the read and write dependencies and find the best way to execute the operations in
parallel.

The actual computations are finished when we copy the results to some
other place, such as with `as.array(a)` or `mx.nd.save(a, "temp.dat")`. Therefore, to
write highly parallelized code, we only need to postpone asking for
the results.
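
To see the effect, you can time a queued sequence of operations against the
blocking copy that forces them to finish. This is a minimal sketch; the exact
timings depend on your hardware:

```{r, eval=FALSE}
a <- mx.nd.ones(c(2000, 2000))
# Returns almost immediately: the additions are only queued in the engine
system.time(for (i in 1:100) a <- a + 1)
# Copying the result back to R blocks until all queued work is done
system.time(b <- as.array(a))
```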

## Symbol and Automatic Differentiation

With the computational unit `NDArray`, we need a way to construct neural networks. MXNet provides a symbolic interface, named Symbol, to do so. Symbol combines both flexibility and efficiency.

### Basic Composition of Symbols

The following code creates a two-layer perceptron network:

```{r}
require(mxnet)
net <- mx.symbol.Variable("data")
net <- mx.symbol.FullyConnected(data=net, name="fc1", num_hidden=128)
net <- mx.symbol.Activation(data=net, name="relu1", act_type="relu")
net <- mx.symbol.FullyConnected(data=net, name="fc2", num_hidden=64)
net <- mx.symbol.SoftmaxOutput(data=net, name="out")
class(net)
```

Each symbol takes a (unique) string name. *Variable* often defines the inputs,
or free variables. Other symbols take a symbol as the input (*data*),
and may accept other hyper-parameters such as the number of hidden neurons (*num_hidden*)
or the activation type (*act_type*).

A symbol can simply be viewed as a function taking several arguments, whose
names are automatically generated and can be retrieved with

```{r}
arguments(net)
```

As can be seen, these arguments are the parameters needed by each symbol:

- *data* : input data needed by the variable *data*
- *fc1_weight* and *fc1_bias* : the weight and bias for the first fully connected layer *fc1*
- *fc2_weight* and *fc2_bias* : the weight and bias for the second fully connected layer *fc2*
- *out_label* : the label needed by the loss

We can also specify the automatically generated names explicitly:

```{r}
data <- mx.symbol.Variable("data")
w <- mx.symbol.Variable("myweight")
net <- mx.symbol.FullyConnected(data=data, weight=w, name="fc1", num_hidden=128)
arguments(net)
```
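
Given concrete input shapes, the shapes of all these arguments can be
inferred. The sketch below assumes `mx.symbol.infer.shape` is available in
your version of the package; note that shapes in the R package are
column-major, so `c(100, 64)` means 100 features and a batch size of 64:

```{r, eval=FALSE}
# Infer the shapes of all arguments from the input shape
shapes <- mx.symbol.infer.shape(net, data = c(100, 64))
shapes$arg.shapes # includes the inferred shape of myweight
shapes$out.shapes # shape of the network output
```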

### More Complicated Composition

MXNet provides well-optimized symbols for
commonly used layers in deep learning. We can also easily define new
operators. The following example first performs an element-wise add between two
symbols, then feeds the result into the fully connected operator.

```{r}
lhs <- mx.symbol.Variable("data1")
rhs <- mx.symbol.Variable("data2")
net <- mx.symbol.FullyConnected(data=lhs + rhs, name="fc1", num_hidden=128)
arguments(net)
```

We can also construct a symbol in a more flexible way than the single
forward composition shown above:

```{r}
net <- mx.symbol.Variable("data")
net <- mx.symbol.FullyConnected(data=net, name="fc1", num_hidden=128)
net2 <- mx.symbol.Variable("data2")
net2 <- mx.symbol.FullyConnected(data=net2, name="net2", num_hidden=128)
composed.net <- mx.apply(net, data=net2, name="compose")
arguments(composed.net)
```

In the above example, *net* is used as a function applied to the existing symbol
*net2*; in the resulting *composed.net*, the original argument *data* is
replaced by *net2*.

### Training a Neural Net

The [model API](../../R-package/R/model.R) is a thin wrapper around the symbolic executors to support neural net training.
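
As a minimal sketch of how this looks (not run), the following trains the
two-layer perceptron from the earlier section on a small synthetic dataset
with `mx.model.FeedForward.create`. The data, shapes, and hyper-parameters
are purely illustrative:

```{r, eval=FALSE}
# The two-layer perceptron with a softmax loss, as built earlier,
# with 2 output units to match the binary labels below
mlp <- mx.symbol.Variable("data")
mlp <- mx.symbol.FullyConnected(data=mlp, name="fc1", num_hidden=128)
mlp <- mx.symbol.Activation(data=mlp, name="relu1", act_type="relu")
mlp <- mx.symbol.FullyConnected(data=mlp, name="fc2", num_hidden=2)
mlp <- mx.symbol.SoftmaxOutput(data=mlp, name="out")

# Synthetic data: 100 samples with 10 features each, binary labels
x <- matrix(rnorm(100 * 10), nrow = 100, ncol = 10)
y <- sample(0:1, 100, replace = TRUE)

model <- mx.model.FeedForward.create(
  mlp, X = x, y = y,
  ctx = mx.cpu(),
  num.round = 5,             # number of training passes over the data
  array.batch.size = 20,
  learning.rate = 0.1,
  array.layout = "rowmajor", # rows of x are samples
  eval.metric = mx.metric.accuracy
)
```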

You are also highly encouraged to read [Symbolic Configuration and Execution in Pictures for python package](../python/symbol_in_pictures.md),
which provides a detailed explanation of the concepts in pictures.

### How Efficient Is the Symbolic API?

In short, it is designed to be very efficient in both memory and runtime.

The major reason for introducing the Symbolic API is to bring the efficient C++
operations of powerful toolkits such as cxxnet and Caffe together with the
flexible dynamic `NDArray` operations. All the memory and computation resources are
allocated statically during bind, to maximize runtime performance and memory
utilization.

The coarse-grained operators are equivalent to cxxnet layers, which are
extremely efficient. We also provide fine-grained operators for more flexible
composition. Because MXNet also does more in-place memory allocation, it can
be ***more memory efficient*** than cxxnet while achieving the same runtime
performance, with greater flexibility.