|  | # NDArray: Vectorized Tensor Computations on CPUs and GPUs | 
|  |  | 
|  | `NDArray` is the basic vectorized operation unit in MXNet for matrix and tensor computations. | 
Users can perform the usual calculations as with an R array, but with two additional features:
|  |  | 
|  |  | 
|  |  | 
|  | - Multiple devices: All operations can be run on various devices including | 
|  | CPUs and GPUs. | 
|  |  | 
|  |  | 
- Automatic parallelization: Operations are automatically executed in
parallel with each other whenever their dependencies allow.
|  |  | 
|  | ## Create and Initialize | 
|  |  | 
Let's create an `NDArray` on either a GPU or a CPU:
|  |  | 
|  |  | 
|  | ```r | 
|  | require(mxnet) | 
|  | ``` | 
|  |  | 
|  | ``` | 
|  | ## Loading required package: mxnet | 
|  | ## Loading required package: methods | 
|  | ``` | 
|  |  | 
|  | ```r | 
a <- mx.nd.zeros(c(2, 3))              # create a 2-by-3 matrix on the CPU
b <- mx.nd.zeros(c(2, 3), mx.cpu())    # create a 2-by-3 matrix on the CPU
# c <- mx.nd.zeros(c(2, 3), mx.gpu(0)) # create a 2-by-3 matrix on GPU 0, if you have CUDA enabled
|  | ``` | 
|  |  | 
For CUDA-enabled machines, GPU device ids start from 0, which is why we passed 0 as the GPU id above.
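
On a machine with more than one GPU, each device is addressed by its id. The following is a minimal sketch, assuming a CUDA-enabled build of MXNet with at least two GPUs (the lines are commented out so they are safe on a CPU-only machine):


```r
# Hypothetical multi-GPU setup; requires a CUDA-enabled build of MXNet
# gpu0 <- mx.gpu(0)                # first GPU
# gpu1 <- mx.gpu(1)                # second GPU
# d <- mx.nd.zeros(c(2, 3), gpu1)  # allocate a 2-by-3 matrix directly on the second GPU
```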
|  |  | 
|  | We can initialize an `NDArray` object in various ways: | 
|  |  | 
|  |  | 
|  | ```r | 
a <- mx.nd.ones(c(4, 4))    # 4-by-4 matrix of ones
b <- mx.rnorm(c(4, 5))      # 4-by-5 matrix of random normal draws
c <- mx.nd.array(1:5)       # convert an R vector into an NDArray
|  | ``` | 
|  |  | 
|  | To check the numbers in an `NDArray`, we can simply run: | 
|  |  | 
|  |  | 
|  | ```r | 
|  | a <- mx.nd.ones(c(2, 3)) | 
|  | b <- as.array(a) | 
|  | class(b) | 
|  | ``` | 
|  |  | 
|  | ``` | 
|  | ## [1] "matrix" | 
|  | ``` | 
|  |  | 
|  | ```r | 
|  | b | 
|  | ``` | 
|  |  | 
|  | ``` | 
|  | ##      [,1] [,2] [,3] | 
|  | ## [1,]    1    1    1 | 
|  | ## [2,]    1    1    1 | 
|  | ``` | 
|  |  | 
|  | ## Performing Basic Operations | 
|  |  | 
### Element-wise Operations
|  |  | 
You can perform element-wise operations on `NDArray` objects, as follows:
|  |  | 
|  |  | 
|  | ```r | 
|  | a <- mx.nd.ones(c(2, 4)) * 2 | 
|  | b <- mx.nd.ones(c(2, 4)) / 8 | 
|  | as.array(a) | 
|  | ``` | 
|  |  | 
|  | ``` | 
|  | ##      [,1] [,2] [,3] [,4] | 
|  | ## [1,]    2    2    2    2 | 
|  | ## [2,]    2    2    2    2 | 
|  | ``` | 
|  |  | 
|  | ```r | 
|  | as.array(b) | 
|  | ``` | 
|  |  | 
|  | ``` | 
|  | ##       [,1]  [,2]  [,3]  [,4] | 
|  | ## [1,] 0.125 0.125 0.125 0.125 | 
|  | ## [2,] 0.125 0.125 0.125 0.125 | 
|  | ``` | 
|  |  | 
|  | ```r | 
|  | c <- a + b | 
|  | as.array(c) | 
|  | ``` | 
|  |  | 
|  | ``` | 
|  | ##       [,1]  [,2]  [,3]  [,4] | 
|  | ## [1,] 2.125 2.125 2.125 2.125 | 
|  | ## [2,] 2.125 2.125 2.125 2.125 | 
|  | ``` | 
|  |  | 
|  | ```r | 
|  | d <- c / a - 5 | 
|  | as.array(d) | 
|  | ``` | 
|  |  | 
|  | ``` | 
|  | ##         [,1]    [,2]    [,3]    [,4] | 
|  | ## [1,] -3.9375 -3.9375 -3.9375 -3.9375 | 
|  | ## [2,] -3.9375 -3.9375 -3.9375 -3.9375 | 
|  | ``` | 
|  |  | 
If two `NDArray`s are located on different devices, we need to explicitly move them to the same device before operating on them. For instance:
|  |  | 
|  |  | 
|  | ```r | 
a <- mx.nd.ones(c(2, 3)) * 2            # created on the CPU
b <- mx.nd.ones(c(2, 3), mx.gpu()) / 8  # created on the GPU; requires a CUDA-enabled build
c <- mx.nd.copyto(a, mx.gpu()) * b      # copy a to the GPU, then multiply element-wise
|  | as.array(c) | 
|  | ``` | 
|  |  | 
|  | ### Loading and Saving | 
|  |  | 
You can save a list of `NDArray` objects to disk with `mx.nd.save`:
|  |  | 
|  |  | 
|  | ```r | 
|  | a <- mx.nd.ones(c(2, 3)) | 
|  | mx.nd.save(list(a), "temp.ndarray") | 
|  | ``` | 
|  |  | 
|  | You can load it back easily: | 
|  |  | 
|  |  | 
|  | ```r | 
|  | a <- mx.nd.load("temp.ndarray") | 
|  | as.array(a[[1]]) | 
|  | ``` | 
|  |  | 
|  | ``` | 
|  | ##      [,1] [,2] [,3] | 
|  | ## [1,]    1    1    1 | 
|  | ## [2,]    1    1    1 | 
|  | ``` | 
|  |  | 
You can also save data directly to, and load it from, a distributed file system such as Amazon S3 or HDFS:
|  |  | 
|  |  | 
|  | ```r | 
|  | mx.nd.save(list(a), "s3://mybucket/mydata.bin") | 
mx.nd.save(list(a), "hdfs:///users/myname/mydata.bin")
|  | ``` | 
|  |  | 
|  | ## Automatic Parallelization | 
|  |  | 
`NDArray` executes operations in parallel automatically. Automatic parallelization is useful when a program uses multiple resources, such as CPUs, GPUs, and the CPU-to-GPU memory bandwidth.
|  |  | 
For example, if we write `a <- a + 1` followed by `b <- b + 1`, where `a` is on a CPU and
`b` is on a GPU, executing them in parallel improves
efficiency. Furthermore, because copying data between CPUs and GPUs is also expensive, running those copies in parallel with other computations increases efficiency further.
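
The sketch below illustrates this example; it assumes a CUDA-enabled build of MXNet, so the GPU lines are commented out to keep the snippet runnable on a CPU-only build:


```r
a <- mx.nd.ones(c(2, 3))              # a lives on the CPU
# b <- mx.nd.ones(c(2, 3), mx.gpu())  # b lives on the GPU (CUDA-enabled build only)
a <- a + 1                            # CPU computation
# b <- b + 1                          # GPU computation; the engine can run it in parallel with the line above
```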
|  |  | 
It's hard to tell by eye which operations can be executed in parallel. In the
following example, `a <- a + 1` and `c <- c * 3` can be executed in parallel, but `a <- a + 1` and
`b <- b * 3` must run sequentially, because `b` refers to the same `NDArray` as `a`, while `c` is an independent copy.
|  |  | 
|  |  | 
|  | ```r | 
a <- mx.nd.ones(c(2, 3))
b <- a                          # b refers to the same NDArray as a
c <- mx.nd.copyto(a, mx.cpu())  # c is an independent copy of a
a <- a + 1                      # touches the data shared with b
b <- b * 3                      # depends on that shared data, so it runs after a <- a + 1
c <- c * 3                      # uses the independent copy, so it can run in parallel with a <- a + 1
|  | ``` | 
|  |  | 
Luckily, MXNet automatically resolves these dependencies and executes operations
in parallel only when it is correct to do so. This allows us to write a program as if it were single-threaded; MXNet
automatically dispatches it across multiple devices.
|  |  | 
MXNet achieves this with lazy evaluation. Each operation is issued to an
internal engine, and the call returns immediately. For example, `a <- a + 1`
returns as soon as the plus operator has been pushed to the engine. This
asynchrony lets us keep pushing operators to the engine, which tracks
their read and write dependencies and finds the best way to execute them in
parallel.
|  |  | 
The actual computations are finished only when we ask for the results, for example by copying them out with `as.array(a)` or writing them to disk with `mx.nd.save(list(a), "temp.dat")`. To write highly parallelized code, we only need to postpone the point at which we ask for
the results.
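
As a minimal illustration of this behavior, the snippet below queues an operation and only waits for it when the result is requested:


```r
a <- mx.nd.ones(c(1000, 1000))
a <- a + 1              # pushed to the engine; this call returns immediately
# ... more operators can be queued here and run in parallel where dependencies allow ...
result <- as.array(a)   # blocks until the engine finishes computing a, then copies the data to R
```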
|  |  | 
|  | ## Next Steps | 
|  | * [Symbol](http://mxnet.io/tutorials/r/symbol.html) | 
|  | * [Write and use callback functions](http://mxnet.io/tutorials/r/CallbackFunctionTutorial.html) | 
|  | * [Neural Networks with MXNet in Five Minutes](http://mxnet.io/tutorials/r/fiveMinutesNeuralNetwork.html) | 
|  | * [Classify Real-World Images with Pre-trained Model](http://mxnet.io/tutorials/r/classifyRealImageWithPretrainedModel.html) | 
|  | * [Handwritten Digits Classification Competition](http://mxnet.io/tutorials/r/mnistCompetition.html) | 
|  | * [Character Language Model using RNN](http://mxnet.io/tutorials/r/charRnnModel.html) |