SystemML-NN Examples

This folder contains scripts and PySpark Jupyter notebooks that demonstrate how to use the SystemML-NN (nn) deep learning library.


Examples

MNIST Softmax Classifier

  • This example trains a softmax classifier, which is essentially a multi-class logistic regression model, on the MNIST data. The model will be trained on the training images, validated on the validation images, and tested for final performance metrics on the test images.
  • Notebook: Example - MNIST Softmax Classifier.ipynb.
  • DML Functions: mnist_softmax.dml
  • Training script: mnist_softmax-train.dml
  • Prediction script: mnist_softmax-predict.dml
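
As a rough illustration of what "multi-class logistic regression" means here, the following is a minimal pure-Python sketch of the softmax forward pass and its cross-entropy loss. It is not the code in mnist_softmax.dml; the function names (`softmax`, `predict`, `cross_entropy`) and the weight layout (one row of weights per class) are assumptions made for this example only.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(pixels, weights, biases):
    """Score one flattened image against every class.

    weights holds one row per class (hypothetical layout), so each
    score is a dot product of that row with the pixels, plus a bias.
    """
    scores = [sum(w * x for w, x in zip(row, pixels)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(scores)
    return probs.index(max(probs)), probs

def cross_entropy(probs, label):
    """Loss for one example: negative log-probability of the true class."""
    return -math.log(probs[label])

# Toy usage with 2 "pixels" and 2 classes; MNIST would use 784 pixels and 10 classes.
label, probs = predict([1.0, 0.0], [[2.0, 0.0], [0.0, 2.0]], [0.0, 0.0])
loss = cross_entropy(probs, 0)  # class 0 has the larger score, so the loss is small
```

Training adjusts the weights and biases to lower this loss across the training images.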

MNIST “LeNet” Neural Net

  • This example trains a neural network on the MNIST data using a “LeNet” architecture. The model will be trained on the training images, validated on the validation images, and tested for final performance metrics on the test images.
  • Notebook: Example - MNIST LeNet.ipynb.
  • DML Functions: mnist_lenet.dml
  • Training script: mnist_lenet-train.dml
  • Prediction script: mnist_lenet-predict.dml

Setup

Code

  • To run the examples, please first download the project from GitHub, either by using the “Clone or download” button on the project homepage and unzipping the archive, or by running the following command:

    git clone https://github.com/dusenberrymw/systemml-nn.git
    
  • Then, move into the systemml-nn folder via:

    cd systemml-nn
    

Data

  • These examples use the classic MNIST dataset, which contains labeled 28x28-pixel images of handwritten digits from 0 to 9. There are 60,000 training images and 10,000 test images. Of the 60,000 training images, 5,000 will be held out as validation images, leaving 55,000 for training.
  • Download:
    • Notebooks: The data will be automatically downloaded as a step in either of the example notebooks.
    • Training scripts: Please run get_mnist_data.sh to download the data separately.
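
The train/validation split described above can be sketched as follows. This is only an illustration of the arithmetic: the source does not say which 5,000 images are held out, so taking the first 5,000 is an assumption, and `split_train_val` is a hypothetical helper rather than part of the library.

```python
def split_train_val(n_total=60000, n_val=5000):
    """Return (train_indices, val_indices) as index ranges.

    Assumption: the first n_val images form the validation set.
    The actual scripts may choose the held-out images differently.
    """
    if not 0 <= n_val <= n_total:
        raise ValueError("n_val must be between 0 and n_total")
    val_idx = range(0, n_val)
    train_idx = range(n_val, n_total)
    return train_idx, val_idx

train_idx, val_idx = split_train_val()
# len(train_idx) is 55000 and len(val_idx) is 5000; the 10,000 test images stay separate.
```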

Execution

  • These examples contain scripts written in SystemML's R-like language (*.dml), as well as PySpark Jupyter notebooks (*.ipynb). The scripts contain the math for the algorithms, enclosed in functions, and the notebooks serve as full, end-to-end examples of reading in data, training models using the functions within the scripts, and evaluating final performance.

  • Notebooks: To run the notebook examples, please install the SystemML Python package with pip install systemml, and then start Jupyter from this directory in the following manner (for more information, please see this great blog post):

    PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark --master local[*] --driver-memory 3G --driver-class-path SystemML.jar --jars SystemML.jar
    

    Note that, for now, all printed output from the SystemML scripts, such as training statistics, is sent to the terminal in which Jupyter was started.

  • Scripts: To run the scripts from the command line using spark-submit, please see the comments located at the top of the -train and -predict scripts.