devdocs/deep-learning.md - systemds - Git at Google

 <!--
 {% comment %}
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to you under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License.  You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
 {% endcomment %}
 -->

 # Initial prototype for Deep Learning

 ## Representing tensor and images in SystemDS

 In this prototype, we represent a tensor as a matrix stored in a row-major format,
 where first dimension of tensor and matrix are exactly the same. For example, a tensor (with all zeros)
 of shape [3, 2, 4, 5] can be instantiated by following DML statement:
 ```sh
 A = matrix(0, rows=3, cols=2*4*5)
 ```
 ### Tensor functions:

 #### Element-wise arithmetic operators:
 Following operators work out-of-the box when both tensors X and Y have same shape:

 * Element-wise exponentiation: `X ^ Y`
 * Element-wise unary minus: `-X`
 * Element-wise integer division: `X %/% Y`
 * Element-wise modulus operation: `X %% Y`
 * Element-wise multiplication: `X * Y`
 * Element-wise division: `X / Y`
 * Element-wise addition: `X + Y`
 * Element-wise subtraction: `X - Y`

 SystemDS does not support implicit broadcast for above tensor operations, however one can write a DML-bodied function to do so.
 For example: to perform the above operations with broadcasting on second dimensions, one can use the below `rep(Z, n)` function:
 ``` python
 rep = function(matrix[double] Z, int C) return (matrix[double] ret) {
 	ret = Z
 	for(i in 2:C) {
 		ret = cbind(ret, Z)
 	}
 }
 ```
 Using the above `rep(Z, n)` function, we can realize the element-wise arithmetic operation with broadcasting. Here are some examples:
 * X of shape [N, C, H, W] and Y of shape [1, C, H, W]: `X + Y` (Note: SystemDS does implicit broadcasting in this case because of the way
 it represents the tensor)
 * X of shape [1, C, H, W] and Y of shape [N, C, H, W]: `X + Y` (Note: SystemDS does implicit broadcasting in this case because of the way
 it represents the tensor)
 * X of shape [N, C, H, W] and Y of shape [N, 1, H, W]: `X + rep(Y, C)`
 * X of shape [N, C, H, W] and Y of shape [1, 1, H, W]: `X + rep(Y, C)`
 * X of shape [N, 1, H, W] and Y of shape [N, C, H, W]: `rep(X, C) + Y`
 * X of shape [1, 1, H, W] and Y of shape [N, C, H, W]: `rep(X, C) + Y`

 TODO: Map the NumPy tensor calls to DML expressions.

 ## Representing images in SystemDS

 The images are assumed to be stored NCHW format, where N = batch size, C = #channels, H = height of image and W = width of image.
 Hence, the images are internally represented as a matrix with dimension (N, C * H * W).

 ## Convolution and Pooling built-in functions

 This prototype also contains initial implementation of forward/backward functions for 2D convolution and pooling:
 * `conv2d(x, w, ...)`
 * `conv2d_backward_filter(x, dout, ...)` and `conv2d_backward_data(w, dout, ...)`
 * `max_pool(x, ...)` and `max_pool_backward(x, dout, ...)`

 The required arguments for all above functions are:
 * stride=[stride_h, stride_w]
 * padding=[pad_h, pad_w]
 * input_shape=[numImages, numChannels, height_image, width_image]

 The additional required argument for conv2d/conv2d_backward_filter/conv2d_backward_data functions is:
 * filter_shape=[numFilters, numChannels, height_filter, width_filter]

 The additional required argument for max_pool/avg_pool functions is:
 * pool_size=[height_pool, width_pool]

 The results of these functions are consistent with Nvidia's CuDNN library.

 ### Border mode:
 * To perform valid padding, use `padding = (input_shape-filter_shape)*(stride-1)/ 2`. (Hint: for stride length of 1, `padding = [0, 0]` performs valid padding).

 * To perform full padding, use `padding = ((stride-1)*input_shape + (stride+1)*filter_shape - 2*stride) / 2`. (Hint: for stride length of 1, `padding = [filter_h-1, filter_w-1]` performs full padding).

 * To perform same padding, use `padding = (input_shape*(stride-1) + filter_shape - stride)/2`. (Hint: for stride length of 1, `padding = [(filter_h-1)/2, (filter_w-1)/2]` performs same padding).

 ### Explanation of backward functions for conv2d

 Consider one-channel 3 X 3 image =

 | x1 | x2 | x3 |
 |----|----|----|
 | x4 | x5 | x6 |
 | x7 | x8 | x9 |

 and one 2 X 2 filter:

 | w1 | w2 |
 |----|----|
 | w3 | w4 |

 Then, `conv2d(x, w, stride=[1, 1], padding=[0, 0], input_shape=[1, 1, 3, 3], filter_shape=[1, 1, 2, 2])` produces following tensor
 of shape `[1, 1, 2, 2]`, which is represented as `1 X 4` matrix in NCHW format:

 | `w1*x1 + w2*x2 + w3*x4 + w4*x5` | `w1*x2 + w2*x3 + w3*x5 + w4*x6` | `w1*x4 + w2*x5 + w3*x7 + w4*x8` | `w1*x5 + w2*x6 + w3*x8 + w4*x9` |
 |---------------------------------|---------------------------------|---------------------------------|---------------------------------|


 Let the error propagated from above layer is

 | y1 | y2 | y3 | y4 |
 |----|----|----|----|

 Then `conv2d_backward_filter(x, y, stride=[1, 1], padding=[0, 0], input_shape=[1, 1, 3, 3], filter_shape=[1, 1, 2, 2])` produces following
 updates for the filter:

 | `y1*x1 + y2*x2 + y3*x4 + y4*x5` | `y1*x2 + y2*x3 + y3*x5 + y4*x6` |
 |---------------------------------|---------------------------------|
 | `y1*x4 + y2*x5 + y3*x7 + y4*x8` | `y1*x5 + y2*x6 + y3*x8 + y4*x9` |

 Note: since the above update is a tensor of shape [1, 1, 2, 2], it will be represented as matrix of dimension [1, 4].

 Similarly, `conv2d_backward_data(w, y, stride=[1, 1], padding=[0, 0], input_shape=[1, 1, 3, 3], filter_shape=[1, 1, 2, 2])` produces following
 updates for the image:


 | `w1*y1`         | `w2*y1 + w1*y2`                 | `w2*y2`         |
 |-----------------|---------------------------------|-----------------|
 | `w3*y1 + w1*y3` | `w4*y1 + w3*y2 + w2*y3 + w1*y4` | `w4*y2 + w2*y4` |
 | `w3*y3`         | `w4*y3 + w3*y4`                 | `w4*y4`         |

 # Caffe2DML examples

 ## Training using Caffe models on Lenet

 The below script also demonstrates how to save the trained model.

 ```python
 # Download the MNIST dataset
 from mlxtend.data import mnist_data
 import numpy as np
 from sklearn.utils import shuffle
 X, y = mnist_data()
 X, y = shuffle(X, y)
 num_classes = np.unique(y).shape[0]
 img_shape = (1, 28, 28)

 # Split the data into training and test
 n_samples = len(X)
 X_train = X[:int(.9 * n_samples)]
 y_train = y[:int(.9 * n_samples)]
 X_test = X[int(.9 * n_samples):]
 y_test = y[int(.9 * n_samples):]

 # Download the Lenet network
 import urllib
 urllib.urlretrieve('https://raw.githubusercontent.com/niketanpansare/model_zoo/master/caffe/vision/lenet/mnist/lenet.proto', 'lenet.proto')
 urllib.urlretrieve('https://raw.githubusercontent.com/niketanpansare/model_zoo/master/caffe/vision/lenet/mnist/lenet_solver.proto', 'lenet_solver.proto')

 # Train Lenet On MNIST using scikit-learn like API
 from systemml.mllearn import Caffe2DML
 lenet = Caffe2DML(sqlCtx, solver='lenet_solver.proto').set(max_iter=500, debug=True).setStatistics(True)
 print('Lenet score: %f' % lenet.fit(X_train, y_train).score(X_test, y_test))

 # Save the trained model
 lenet.save('lenet_model')
 ```

 ## Load the trained model and retrain (i.e. finetuning)

 ```python
 # Fine-tune the existing trained model
 new_lenet = Caffe2DML(sqlCtx, solver='lenet_solver.proto', weights='lenet_model').set(max_iter=500, debug=True)
 new_lenet.fit(X_train, y_train)
 new_lenet.save('lenet_model')
 ```

 ## Perform prediction using the above trained model

 ```python
 # Use the new model for prediction
 predict_lenet = Caffe2DML(sqlCtx, solver='lenet_solver.proto', weights='lenet_model')
 print('Lenet score: %f' % predict_lenet.score(X_test, y_test))
 ```

 Similarly, you can perform prediction using the pre-trained ResNet network

 ```python
 from systemml.mllearn import Caffe2DML
 from pyspark.sql import SQLContext
 import numpy as np
 import urllib, os, scipy.ndimage
 from PIL import Image
 import systemml as sml

 # ImageNet specific parameters
 img_shape = (3, 224, 224)

 # Downloads a jpg image, resizes it to 224 and return as numpy array in N X CHW format
 url = 'https://upload.wikimedia.org/wikipedia/commons/thumb/5/58/MountainLion.jpg/312px-MountainLion.jpg'
 outFile = 'test.jpg'
 urllib.urlretrieve(url, outFile)
 input_image = sml.convertImageToNumPyArr(Image.open(outFile), img_shape=img_shape)

 # Download the ResNet network
 import urllib
 urllib.urlretrieve('https://raw.githubusercontent.com/niketanpansare/model_zoo/master/caffe/vision/resnet/ilsvrc12/ResNet_50_network.proto', 'ResNet_50_network.proto')
 urllib.urlretrieve('https://raw.githubusercontent.com/niketanpansare/model_zoo/master/caffe/vision/resnet/ilsvrc12/ResNet_50_solver.proto', 'ResNet_50_solver.proto')

 # Assumes that you have cloned the model_zoo repository
 # git clone https://github.com/niketanpansare/model_zoo.git
 resnet = Caffe2DML(sqlCtx, solver='ResNet_50_solver.proto', weights='~/model_zoo/caffe/vision/resnet/ilsvrc12/ResNet_50_pretrained_weights').set(input_shape=img_shape)
 resnet.predict(input_image)
 ```
	<!--
	{% comment %}
	Licensed to the Apache Software Foundation (ASF) under one or more
	contributor license agreements. See the NOTICE file distributed with
	this work for additional information regarding copyright ownership.
	The ASF licenses this file to you under the Apache License, Version 2.0
	(the "License"); you may not use this file except in compliance with
	the License. You may obtain a copy of the License at

	http://www.apache.org/licenses/LICENSE-2.0

	Unless required by applicable law or agreed to in writing, software
	distributed under the License is distributed on an "AS IS" BASIS,
	WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
	See the License for the specific language governing permissions and
	limitations under the License.
	{% endcomment %}
	-->

	# Initial prototype for Deep Learning

	## Representing tensor and images in SystemDS

	In this prototype, we represent a tensor as a matrix stored in a row-major format,
	where first dimension of tensor and matrix are exactly the same. For example, a tensor (with all zeros)
	of shape [3, 2, 4, 5] can be instantiated by following DML statement:
	```sh
	A = matrix(0, rows=3, cols=245)
	```
	### Tensor functions:

	#### Element-wise arithmetic operators:
	Following operators work out-of-the box when both tensors X and Y have same shape:

	* Element-wise exponentiation: `X ^ Y`
	* Element-wise unary minus: `-X`
	* Element-wise integer division: `X %/% Y`
	* Element-wise modulus operation: `X %% Y`
	* Element-wise multiplication: `X * Y`
	* Element-wise division: `X / Y`
	* Element-wise addition: `X + Y`
	* Element-wise subtraction: `X - Y`

	SystemDS does not support implicit broadcast for above tensor operations, however one can write a DML-bodied function to do so.
	For example: to perform the above operations with broadcasting on second dimensions, one can use the below `rep(Z, n)` function:
	``` python
	rep = function(matrix[double] Z, int C) return (matrix[double] ret) {
	ret = Z
	for(i in 2:C) {
	ret = cbind(ret, Z)
	}
	}
	```
	Using the above `rep(Z, n)` function, we can realize the element-wise arithmetic operation with broadcasting. Here are some examples:
	* X of shape [N, C, H, W] and Y of shape [1, C, H, W]: `X + Y` (Note: SystemDS does implicit broadcasting in this case because of the way
	it represents the tensor)
	* X of shape [1, C, H, W] and Y of shape [N, C, H, W]: `X + Y` (Note: SystemDS does implicit broadcasting in this case because of the way
	it represents the tensor)
	* X of shape [N, C, H, W] and Y of shape [N, 1, H, W]: `X + rep(Y, C)`
	* X of shape [N, C, H, W] and Y of shape [1, 1, H, W]: `X + rep(Y, C)`
	* X of shape [N, 1, H, W] and Y of shape [N, C, H, W]: `rep(X, C) + Y`
	* X of shape [1, 1, H, W] and Y of shape [N, C, H, W]: `rep(X, C) + Y`

	TODO: Map the NumPy tensor calls to DML expressions.

	## Representing images in SystemDS

	The images are assumed to be stored NCHW format, where N = batch size, C = #channels, H = height of image and W = width of image.
	Hence, the images are internally represented as a matrix with dimension (N, C * H * W).

	## Convolution and Pooling built-in functions

	This prototype also contains initial implementation of forward/backward functions for 2D convolution and pooling:
	* `conv2d(x, w, ...)`
	* `conv2d_backward_filter(x, dout, ...)` and `conv2d_backward_data(w, dout, ...)`
	* `max_pool(x, ...)` and `max_pool_backward(x, dout, ...)`

	The required arguments for all above functions are:
	* stride=[stride_h, stride_w]
	* padding=[pad_h, pad_w]
	* input_shape=[numImages, numChannels, height_image, width_image]

	The additional required argument for conv2d/conv2d_backward_filter/conv2d_backward_data functions is:
	* filter_shape=[numFilters, numChannels, height_filter, width_filter]

	The additional required argument for max_pool/avg_pool functions is:
	* pool_size=[height_pool, width_pool]

	The results of these functions are consistent with Nvidia's CuDNN library.

	### Border mode:
	* To perform valid padding, use `padding = (input_shape-filter_shape)*(stride-1)/ 2`. (Hint: for stride length of 1, `padding = [0, 0]` performs valid padding).

	* To perform full padding, use `padding = ((stride-1)input_shape + (stride+1)filter_shape - 2*stride) / 2`. (Hint: for stride length of 1, `padding = [filter_h-1, filter_w-1]` performs full padding).

	* To perform same padding, use `padding = (input_shape*(stride-1) + filter_shape - stride)/2`. (Hint: for stride length of 1, `padding = [(filter_h-1)/2, (filter_w-1)/2]` performs same padding).

	### Explanation of backward functions for conv2d

	Consider one-channel 3 X 3 image =

	\| x1 \| x2 \| x3 \|
	\|----\|----\|----\|
	\| x4 \| x5 \| x6 \|
	\| x7 \| x8 \| x9 \|

	and one 2 X 2 filter:

	\| w1 \| w2 \|
	\|----\|----\|
	\| w3 \| w4 \|

	Then, `conv2d(x, w, stride=[1, 1], padding=[0, 0], input_shape=[1, 1, 3, 3], filter_shape=[1, 1, 2, 2])` produces following tensor
	of shape `[1, 1, 2, 2]`, which is represented as `1 X 4` matrix in NCHW format:

	\| `w1x1 + w2x2 + w3x4 + w4x5` \| `w1x2 + w2x3 + w3x5 + w4x6` \| `w1x4 + w2x5 + w3x7 + w4x8` \| `w1x5 + w2x6 + w3x8 + w4x9` \|
	\|---------------------------------\|---------------------------------\|---------------------------------\|---------------------------------\|


	Let the error propagated from above layer is

	\| y1 \| y2 \| y3 \| y4 \|
	\|----\|----\|----\|----\|

	Then `conv2d_backward_filter(x, y, stride=[1, 1], padding=[0, 0], input_shape=[1, 1, 3, 3], filter_shape=[1, 1, 2, 2])` produces following
	updates for the filter:

	\| `y1x1 + y2x2 + y3x4 + y4x5` \| `y1x2 + y2x3 + y3x5 + y4x6` \|
	\|---------------------------------\|---------------------------------\|
	\| `y1x4 + y2x5 + y3x7 + y4x8` \| `y1x5 + y2x6 + y3x8 + y4x9` \|

	Note: since the above update is a tensor of shape [1, 1, 2, 2], it will be represented as matrix of dimension [1, 4].

	Similarly, `conv2d_backward_data(w, y, stride=[1, 1], padding=[0, 0], input_shape=[1, 1, 3, 3], filter_shape=[1, 1, 2, 2])` produces following
	updates for the image:


	\| `w1y1` \| `w2y1 + w1y2` \| `w2y2` \|
	\|-----------------\|---------------------------------\|-----------------\|
	\| `w3y1 + w1y3` \| `w4y1 + w3y2 + w2y3 + w1y4` \| `w4y2 + w2y4` \|
	\| `w3y3` \| `w4y3 + w3y4` \| `w4y4` \|

	# Caffe2DML examples

	## Training using Caffe models on Lenet

	The below script also demonstrates how to save the trained model.

	```python
	# Download the MNIST dataset
	from mlxtend.data import mnist_data
	import numpy as np
	from sklearn.utils import shuffle
	X, y = mnist_data()
	X, y = shuffle(X, y)
	num_classes = np.unique(y).shape[0]
	img_shape = (1, 28, 28)

	# Split the data into training and test
	n_samples = len(X)
	X_train = X[:int(.9 * n_samples)]
	y_train = y[:int(.9 * n_samples)]
	X_test = X[int(.9 * n_samples):]
	y_test = y[int(.9 * n_samples):]

	# Download the Lenet network
	import urllib
	urllib.urlretrieve('https://raw.githubusercontent.com/niketanpansare/model_zoo/master/caffe/vision/lenet/mnist/lenet.proto', 'lenet.proto')
	urllib.urlretrieve('https://raw.githubusercontent.com/niketanpansare/model_zoo/master/caffe/vision/lenet/mnist/lenet_solver.proto', 'lenet_solver.proto')

	# Train Lenet On MNIST using scikit-learn like API
	from systemml.mllearn import Caffe2DML
	lenet = Caffe2DML(sqlCtx, solver='lenet_solver.proto').set(max_iter=500, debug=True).setStatistics(True)
	print('Lenet score: %f' % lenet.fit(X_train, y_train).score(X_test, y_test))

	# Save the trained model
	lenet.save('lenet_model')
	```

	## Load the trained model and retrain (i.e. finetuning)

	```python
	# Fine-tune the existing trained model
	new_lenet = Caffe2DML(sqlCtx, solver='lenet_solver.proto', weights='lenet_model').set(max_iter=500, debug=True)
	new_lenet.fit(X_train, y_train)
	new_lenet.save('lenet_model')
	```

	## Perform prediction using the above trained model

	```python
	# Use the new model for prediction
	predict_lenet = Caffe2DML(sqlCtx, solver='lenet_solver.proto', weights='lenet_model')
	print('Lenet score: %f' % predict_lenet.score(X_test, y_test))
	```

	Similarly, you can perform prediction using the pre-trained ResNet network

	```python
	from systemml.mllearn import Caffe2DML
	from pyspark.sql import SQLContext
	import numpy as np
	import urllib, os, scipy.ndimage
	from PIL import Image
	import systemml as sml

	# ImageNet specific parameters
	img_shape = (3, 224, 224)

	# Downloads a jpg image, resizes it to 224 and return as numpy array in N X CHW format
	url = 'https://upload.wikimedia.org/wikipedia/commons/thumb/5/58/MountainLion.jpg/312px-MountainLion.jpg'
	outFile = 'test.jpg'
	urllib.urlretrieve(url, outFile)
	input_image = sml.convertImageToNumPyArr(Image.open(outFile), img_shape=img_shape)

	# Download the ResNet network
	import urllib
	urllib.urlretrieve('https://raw.githubusercontent.com/niketanpansare/model_zoo/master/caffe/vision/resnet/ilsvrc12/ResNet_50_network.proto', 'ResNet_50_network.proto')
	urllib.urlretrieve('https://raw.githubusercontent.com/niketanpansare/model_zoo/master/caffe/vision/resnet/ilsvrc12/ResNet_50_solver.proto', 'ResNet_50_solver.proto')

	# Assumes that you have cloned the model_zoo repository
	# git clone https://github.com/niketanpansare/model_zoo.git
	resnet = Caffe2DML(sqlCtx, solver='ResNet_50_solver.proto', weights='~/model_zoo/caffe/vision/resnet/ilsvrc12/ResNet_50_pretrained_weights').set(input_shape=img_shape)
	resnet.predict(input_image)
	```