SystemML is now an Apache Top Level Project! Please see the Apache SystemML website for more information.
SystemML is a flexible, scalable machine learning system. Its distinguishing characteristics are described below.
The latest version of SystemML supports: Java 8+, Scala 2.11+, Python 2.7/3.5+, Hadoop 2.6+, and Spark 2.1+.
ML algorithms in SystemML are specified in a high-level, declarative machine learning (DML) language. Algorithms can be expressed in either an R-like syntax or a Python-like syntax. DML includes linear algebra primitives, statistical functions, and additional constructs.
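As a small illustrative sketch of the R-like DML syntax (matrix names and sizes are made up here, not taken from a shipped algorithm), the following script fits an ordinary least-squares model via the normal equations:

```
# DML (R-like syntax): ordinary least squares via the normal equations
X = rand(rows=100, cols=10)         # synthetic feature matrix
y = rand(rows=100, cols=1)          # synthetic response vector
w = solve(t(X) %*% X, t(X) %*% y)   # w = (X'X)^-1 X'y
print("sum of weights: " + sum(w))
```

Note how linear algebra primitives such as `t()`, `%*%`, and `solve()` read much like their R counterparts.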
This high-level language significantly increases the productivity of data scientists as it provides (1) full flexibility in expressing custom analytics and (2) data independence from the underlying input formats and physical data representations.
SystemML computations can be executed in a variety of different modes. To begin with, SystemML can be operated in Standalone mode on a single machine, allowing data scientists to develop algorithms locally without the need for a distributed cluster. To scale up, algorithms can also be distributed across a cluster using Spark or Hadoop. This flexibility allows the utilization of an organization's existing resources and expertise. In addition, SystemML features a Spark MLContext API that allows for programmatic interaction via Scala, Python, and Java. SystemML also features an embedded API for scoring models.
Algorithms specified in DML are dynamically compiled and optimized based on data and cluster characteristics using rule-based and cost-based optimization techniques. The optimizer automatically generates hybrid runtime execution plans ranging from in-memory, single-node execution, to distributed computations on Spark or Hadoop. This ensures both efficiency and scalability. Automatic optimization reduces or eliminates the need to hand-tune distributed runtime execution plans and system configurations.
SystemML features a suite of production-level examples that can be grouped into six broad categories: Descriptive Statistics, Classification, Clustering, Regression, Matrix Factorization, and Survival Analysis. Detailed descriptions of these algorithms can be found in the SystemML Algorithms Reference. These algorithms are intended to serve as production-level examples that can be modified or used as inspiration for new custom algorithms.
Before you get started on SystemML, make sure that your environment is set up and ready to go.
macOS (Homebrew):
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

Linux (Linuxbrew):
ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Linuxbrew/install/master/install)"
brew tap caskroom/cask
brew install Caskroom/cask/java
brew tap homebrew/versions
brew install apache-spark21
Go to the SystemML Downloads page, download systemml-1.0.0-bin.zip (it should be the second download listed), and unzip it to a location of your choice.
The next step is optional, but it will make your life a lot easier: set SYSTEMML_HOME in your bash profile. Add the following line, replacing path/to/ with the location of the download from the previous step:

export SYSTEMML_HOME=path/to/systemml-1.0.0

Make sure to open a new terminal tab afterwards so that the change takes effect.
brew install python
pip install jupyter matplotlib numpy
brew install python3
pip3 install jupyter matplotlib numpy
Congrats! You can now use SystemML!
To get started, please consult the SystemML Documentation. We recommend using the Spark MLContext API to run SystemML from Scala or Python using