---
layout: global
displayTitle: SystemML Engine Developer Guide
title: SystemML Engine Developer Guide
description: SystemML Engine Developer Guide
---

* This will become a table of contents (this text will be scraped).
{:toc}

# Building SystemML

SystemML is built using Apache Maven. SystemML will build on Linux, macOS, or Windows, and requires Maven 3 and Java 7 (or higher). To build SystemML, run:

{% highlight bash %}
mvn clean package
{% endhighlight %}

To build the SystemML distributions, run:

{% highlight bash %}
mvn clean package -P distribution
{% endhighlight %}

# Testing SystemML

SystemML features a comprehensive set of integration tests. To perform these tests, run:

{% highlight bash %}
mvn verify
{% endhighlight %}

Note: these tests require R to be installed and available on the PATH of the machine on which you run them.

If required, please install the following packages in R:

{% highlight r %}
install.packages(c("batch", "bitops", "boot", "caTools", "data.table", "doMC", "doSNOW", "ggplot2", "glmnet", "lda", "Matrix", "matrixStats", "moments", "plotrix", "psych", "reshape", "topicmodels", "wordcloud"), dependencies=TRUE)
{% endhighlight %}

# Development Environment

SystemML itself is written in Java and is managed using Maven. As a result, SystemML can readily be imported into a standard development environment such as Eclipse or IntelliJ IDEA. The `DMLScript` class serves as the main entry point to SystemML. Executing `DMLScript` with no arguments displays usage information. A script file can be specified using the `-f` argument.

In Eclipse, a Debug Configuration can be created with `DMLScript` as the Main class and any arguments specified as Program arguments. A PyDML script requires the addition of a `-python` switch.

Suppose that we have a `hello.dml` script containing the following:

{% highlight r %}
print('hello ' + $1)
{% endhighlight %}

This SystemML script can be debugged in Eclipse using a Debug Configuration, for example with `-f hello.dml -args world` specified as the Program arguments.


## Python MLContext API

When working with the Python MLContext API (see `src/main/python/systemml/mlcontext.py`) during development, it can be useful to install the Python MLContext API in editable mode (`-e`). This allows Python updates to take effect without requiring the SystemML Python artifact to be built and installed.

{% highlight bash %}
mvn clean
pip3 install -e src/main/python
mvn clean package
PYSPARK_PYTHON=python3 pyspark --driver-class-path target/SystemML.jar
{% endhighlight %}

{% highlight python %}
Using Python version 3.5.2 (default, Jul 28 2016 21:28:07)
SparkSession available as 'spark'.
>>> from systemml import MLContext, dml
>>> ml = MLContext(sc)

Welcome to Apache SystemML!

>>> script = dml("print('hello world')")
>>> ml.execute(script)
hello world
MLResults
{% endhighlight %}


# Matrix Multiplication Operators

In the following, we give an overview of backend-specific physical matrix multiplication operators in SystemML, as well as their internally used matrix multiplication block operations.

## Basic Matrix Multiplication Operators

An `AggBinaryOp` hop can be compiled into the following physical operators.

**1. Physical Operators in CP (single node, control program)**

| Name | Description | Operation |
| ---- | ----------- | --------- |
| MM | basic matrix multiplication | mm |
| MMChain | matrix multiplication chain | mmchain |
| TSMM | transpose-self matrix multiplication | tsmm |
| PMM | permutation matrix multiplication | pmm |
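As an aside, the benefit of a dedicated tsmm operator can be illustrated with a small pure-Python sketch (not SystemML code): since `t(X) %*% X` is symmetric, only the upper triangle needs to be computed, and the lower triangle can be filled in by mirroring, roughly halving the work.

```python
def tsmm(X):
    """Transpose-self matrix multiply t(X) %*% X, exploiting symmetry."""
    n, m = len(X), len(X[0])
    C = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i, m):                      # upper triangle only
            s = sum(X[r][i] * X[r][j] for r in range(n))
            C[i][j] = s
            C[j][i] = s                            # mirror into lower triangle
    return C

X = [[1.0, 2.0], [3.0, 4.0]]
print(tsmm(X))  # → [[10.0, 14.0], [14.0, 20.0]]
```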

**2. Physical Operators in MR (distributed, MapReduce)**

| Name | Description | Operation |
| ---- | ----------- | --------- |
| MapMM | map-side matrix multiplication, w/ or w/o agg | mm |
| MapMMChain | map-side matrix chain multiplication | mmchain |
| TSMM | map-side transpose-self matrix multiplication | tsmm |
| PMM | map-side permutation matrix multiplication | pmm |
| CPMM | cross-product matrix multiplication, 2 jobs | mm |
| RMM | replication-based matrix multiplication, 1 job | mm |

**3. Physical Operators in SPARK (distributed, Spark)**

| Name | Description | Operation |
| ---- | ----------- | --------- |
| MapMM | see MR, flatmap/mappartitions/maptopair + reduce/reducebykey/no_aggregation | mm |
| MapMMChain | see MR, mapvalues/maptopair + reduce | mmchain |
| TSMM | see MR, mapvalues + reduce | tsmm |
| PMM | see MR, flatmaptopair + reducebykey | pmm |
| CPMM | see MR, 2 x maptopair + join + maptopair + reduce/reducebykey | mm |
| RMM | see MR, 2 x flatmap + join + maptopair + reducebykey | mm |
| ZIPMM | partitioning-preserving 1-1 zipping mm, join + mapvalues + reduce | mm |
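To make the CPMM data flow in the table above concrete, here is a hedged pure-Python sketch (not SystemML code, and using degenerate 1x1 scalar "blocks" for brevity) of the maptopair + join + reduce-by-key pattern: blocks of A are keyed by their column index k, blocks of B by their row index k, the join on k yields partial products keyed by (i, j), and the reduce sums them.

```python
from collections import defaultdict

def cpmm(A, B):
    """Cross-product style block matrix multiply over block dicts keyed (row, col)."""
    # "Map to pair": key A-blocks and B-blocks by the shared inner index k.
    A_by_k = defaultdict(list)
    for (i, k), a in A.items():
        A_by_k[k].append((i, a))
    B_by_k = defaultdict(list)
    for (k, j), b in B.items():
        B_by_k[k].append((j, b))
    # "Join" on k, emit partial products keyed (i, j), then "reduce by key".
    C = defaultdict(float)
    for k, a_blocks in A_by_k.items():
        for i, a in a_blocks:
            for j, b in B_by_k.get(k, []):
                C[(i, j)] += a * b
    return dict(C)

A = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}
B = {(0, 0): 5.0, (0, 1): 6.0, (1, 0): 7.0, (1, 1): 8.0}
print(cpmm(A, B))  # → {(0, 0): 19.0, (0, 1): 22.0, (1, 0): 43.0, (1, 1): 50.0}
```

With real block sizes, each partial product is itself a block matrix multiply, and the reduce step performs element-wise block addition; the two shuffles (the join on k and the reduce by (i, j)) correspond to CPMM's two jobs in the MR table.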

## Complex Matrix Multiplication Operators

A `QuaternaryOp` hop can be compiled into the following physical operators. Note that `wsloss`, `wsigmoid`, and `wdivmm` have different semantics. The main goal of these operators is to prevent the creation of dense "outer" products via selective computation over a sparse driver (sparse matrix and sparse-safe operation).
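The sparse-driver idea can be sketched in pure Python (not SystemML code) for a wsloss-style computation: the loss is summed only over the nonzeros of a sparse weight matrix W, so the dense outer product `U %*% t(V)` is never materialized.

```python
def wsloss(W_nonzeros, X, U, V):
    """Weighted squared loss sum over the nonzeros of the sparse driver W:
    sum_{(i,j) in W} W[i,j] * (X[i,j] - dot(U[i,:], V[j,:]))^2."""
    loss = 0.0
    for (i, j), w in W_nonzeros.items():
        # Compute only the single needed entry of the outer product U %*% t(V).
        uv = sum(U[i][k] * V[j][k] for k in range(len(V[j])))
        loss += w * (X[(i, j)] - uv) ** 2
    return loss

W = {(0, 0): 1.0, (1, 1): 2.0}      # sparse driver: only two nonzero weights
X = {(0, 0): 3.0, (1, 1): 1.0}      # only entries matched by W are touched
U = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
print(wsloss(W, X, U, V))  # → 4.0
```

The work is proportional to the number of nonzeros in W times the rank of the factors, rather than to the full dense dimensions of `U %*% t(V)`.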

**1. Physical Operators in CP (single node, control program)**

| Name | Description | Operation |
| ---- | ----------- | --------- |
| WSLoss | weighted squared loss | wsloss |
| WSigmoid | weighted sigmoid | wsigmoid |
| WDivMM | weighted divide matrix multiplication | wdivmm |
| WCeMM | weighted cross entropy matrix multiplication | wcemm |
| WuMM | weighted unary op matrix multiplication | wumm |

**2. Physical Operators in MR (distributed, MapReduce)**

| Name | Description | Operation |
| ---- | ----------- | --------- |
| MapWSLoss | map-side weighted squared loss | wsloss |
| RedWSLoss | reduce-side weighted squared loss | wsloss |
| MapWSigmoid | map-side weighted sigmoid | wsigmoid |
| RedWSigmoid | reduce-side weighted sigmoid | wsigmoid |
| MapWDivMM | map-side weighted divide matrix mult | wdivmm |
| RedWDivMM | reduce-side weighted divide matrix mult | wdivmm |
| MapWCeMM | map-side weighted cross entr. matrix mult | wcemm |
| RedWCeMM | reduce-side w. cross entr. matrix mult | wcemm |
| MapWuMM | map-side weighted unary op matrix mult | wumm |
| RedWuMM | reduce-side weighted unary op matrix mult | wumm |

**3. Physical Operators in SPARK (distributed, Spark)**

| Name | Description | Operation |
| ---- | ----------- | --------- |
| MapWSLoss | see MR, mappartitions + reduce | wsloss |
| RedWSLoss | see MR, 1/2x flatmaptopair + 1-3x join + maptopair + reduce | wsloss |
| MapWSigmoid | see MR, mappartitions | wsigmoid |
| RedWSigmoid | see MR, 1/2x flatmaptopair + 1/2x join + maptopair | wsigmoid |
| MapWDivMM | see MR, mappartitions + reducebykey | wdivmm |
| RedWDivMM | see MR, 1/2x flatmaptopair + 1/2x join + maptopair + reducebykey | wdivmm |
| MapWCeMM | see MR, mappartitions + reduce | wcemm |
| RedWCeMM | see MR, 1/2x flatmaptopair + 1/2x join + maptopair + reduce | wcemm |
| MapWuMM | see MR, mappartitions | wumm |
| RedWuMM | see MR, 1/2x flatmaptopair + 1/2x join + maptopair | wumm |

## Core Matrix Multiplication Primitives