# NDArray: NumPy-style Tensor Computations on CPUs and GPUs
In MXNet, `NDArray` is the basic operational unit for matrix and tensor
computations. It is similar to `numpy.ndarray`, but it has two additional
features:
- Multiple device support: All operations can be run on various devices, including CPUs and GPUs.
- Automatic parallelization: Independent operations are automatically executed in parallel.
## Creation and Initialization
We can create an `NDArray` on either a CPU or a GPU:
```python
>>> import mxnet as mx
>>> a = mx.nd.empty((2, 3)) # create a 2-by-3 matrix on cpu
>>> b = mx.nd.empty((2, 3), mx.gpu()) # create a 2-by-3 matrix on gpu 0
>>> c = mx.nd.empty((2, 3), mx.gpu(2)) # create a 2-by-3 matrix on gpu 2
>>> c.shape # get shape
(2L, 3L)
>>> c.context # get device info
gpu(2)
```
They can be initialized in various ways:
```python
>>> a = mx.nd.zeros((2, 3)) # create a 2-by-3 matrix filled with 0
>>> b = mx.nd.ones((2, 3)) # create a 2-by-3 matrix filled with 1
>>> b[:] = 2 # set all elements of b to 2
```
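Beyond these constructors, an `NDArray` can also be created from an existing Python list or `numpy.ndarray` with `mx.nd.array` (a brief sketch; the optional device argument follows the same convention as above):
```python
>>> import numpy as np
>>> a = mx.nd.array([[1, 2], [3, 4]])              # create from a nested Python list
>>> b = mx.nd.array(np.arange(6).reshape((2, 3)))  # create from a numpy.ndarray
>>> print a.asnumpy()
[[ 1.  2.]
 [ 3.  4.]]
```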
We can copy the value from one `NDArray` to another, even if they are located on different devices:
```python
>>> a = mx.nd.ones((2, 3))
>>> b = mx.nd.zeros((2, 3), mx.gpu())
>>> a.copyto(b) # copy data from cpu to gpu
```
We can also convert `NDArray` to `numpy.ndarray`:
```python
>>> a = mx.nd.ones((2, 3))
>>> b = a.asnumpy()
>>> type(b)
<type 'numpy.ndarray'>
>>> print b
[[ 1. 1. 1.]
[ 1. 1. 1.]]
```
and vice versa:
```python
>>> import numpy as np
>>> a = mx.nd.empty((2, 3))
>>> a[:] = np.random.uniform(-0.1, 0.1, a.shape)
>>> print a.asnumpy()
[[-0.06821112 -0.03704893 0.06688045]
[ 0.09947646 -0.07700162 0.07681718]]
```
## Basic Element-wise Operations
By default, `NDArray` performs element-wise operations:
```python
>>> a = mx.nd.ones((2, 3)) * 2
>>> b = mx.nd.ones((2, 3)) * 4
>>> print b.asnumpy()
[[ 4. 4. 4.]
[ 4. 4. 4.]]
>>> c = a + b
>>> print c.asnumpy()
[[ 6. 6. 6.]
[ 6. 6. 6.]]
>>> d = a * b
>>> print d.asnumpy()
[[ 8. 8. 8.]
[ 8. 8. 8.]]
```
If two `NDArray`s are located on different devices, we need to explicitly move them onto the same device. The following example performs computations on GPU 0:
```python
>>> a = mx.nd.ones((2, 3)) * 2
>>> b = mx.nd.ones((2, 3), mx.gpu()) * 3
>>> c = a.copyto(mx.gpu()) * b
>>> print c.asnumpy()
[[ 6. 6. 6.]
[ 6. 6. 6.]]
```
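When an array might already live on the target device, `copyto` still performs a copy. A convenient alternative (sketched here, assuming the `as_in_context` method, which returns the array itself when it is already on the requested device) avoids the redundant copy:
```python
>>> a = mx.nd.ones((2, 3)) * 2
>>> b = a.as_in_context(mx.gpu())  # copies to gpu 0 only if a is not already there
>>> print b.context
gpu(0)
```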
## Load and Save
There are two easy ways to save data to (and load it from) disk. The first uses
`pickle`. `NDArray` is pickle-compatible, which means that you can pickle an
`NDArray` just as you would a `numpy.ndarray`:
```python
>>> import mxnet as mx
>>> import pickle as pkl
>>> a = mx.nd.ones((2, 3)) * 2
>>> data = pkl.dumps(a)
>>> b = pkl.loads(data)
>>> print b.asnumpy()
[[ 2. 2. 2.]
[ 2. 2. 2.]]
```
The second way is to directly dump a list of `NDArray`s to disk in binary format:
```python
>>> a = mx.nd.ones((2, 3)) * 2
>>> b = mx.nd.ones((2, 3)) * 3
>>> mx.nd.save('mydata.bin', [a, b])
>>> c = mx.nd.load('mydata.bin')
>>> print c[0].asnumpy()
[[ 2. 2. 2.]
[ 2. 2. 2.]]
>>> print c[1].asnumpy()
[[ 3. 3. 3.]
[ 3. 3. 3.]]
```
We can also dump a dict:
```python
>>> mx.nd.save('mydata.bin', {'a':a, 'b':b})
>>> c = mx.nd.load('mydata.bin')
>>> print c['a'].asnumpy()
[[ 2. 2. 2.]
[ 2. 2. 2.]]
>>> print c['b'].asnumpy()
[[ 3. 3. 3.]
[ 3. 3. 3.]]
```
In addition, if we have set up distributed storage, such as Amazon S3 or HDFS, we
can save to and load from it directly. For example:
```python
>>> mx.nd.save('s3://mybucket/mydata.bin', [a, b])
>>> mx.nd.save('hdfs:///users/myname/mydata.bin', [a, b])
```
## Automatic Parallelization
`NDArray` can automatically execute operations in parallel. This is desirable when a
program uses multiple resources, such as CPUs, GPUs, and CPU-to-GPU memory bandwidth.
For example, if we write `a += 1` followed by `b += 1`, and `a` is on a CPU while
`b` is on a GPU, then we want the two statements to execute in parallel to improve
efficiency. Furthermore, data copies between CPU and GPU are expensive, so we
want to overlap them with other computations.
However, finding statements that can be executed in parallel by inspection is hard. In
the following example, `a += 1` and `c *= 3` can be executed in parallel, but `a += 1`
and `b *= 3` must be executed sequentially, because `b` refers to the same array as `a`.
```python
a = mx.nd.ones((2, 3))
b = a                     # b is an alias of a: both names refer to the same array
c = a.copyto(mx.cpu())    # c is an independent copy of a
a += 1
b *= 3                    # reads and writes the same data as `a += 1`: must run after it
c *= 3                    # touches neither a nor b: can run in parallel with `a += 1`
```
Luckily, MXNet can automatically resolve the dependencies and
execute operations in parallel while guaranteeing correctness. In other words, we
can write a program as if it were single-threaded, and MXNet will
automatically dispatch it to multiple devices, such as multiple GPU cards or multiple
computers.
MXNet achieves this by lazy evaluation. Any operation we write is issued to an
internal engine, and the call returns immediately. For example, running `a += 1`
returns right after the addition is pushed to the engine. This
asynchronism allows us to push more operations to the engine, so it can determine
the read and write dependencies and find the best way to execute operations in
parallel.
The actual computations finish when we ask for the results somewhere else, such as with `print a.asnumpy()` or `mx.nd.save('mydata.bin', [a])`. Therefore, to write highly parallelized code, we only need to postpone asking for
the results.
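As a rough illustration (a sketch, not part of the original example set; it assumes `mx.nd.dot` and the blocking `wait_to_read` method), issuing a chain of operations returns almost immediately, while most of the wall-clock time is spent in the blocking call:
```python
import time
a = mx.nd.ones((1000, 1000))
start = time.time()
for i in range(10):
    a = mx.nd.dot(a, a)   # returns as soon as the operation is pushed to the engine
print 'pushed in %.4f sec' % (time.time() - start)
a.wait_to_read()          # blocks until the pending computations on a have finished
print 'finished in %.4f sec' % (time.time() - start)
```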
## Next Steps
* [Symbol](symbol.md)
* [KVStore](kvstore.md)