-------------------------------------------------------------------------------
Apache SystemML (incubating)
-------------------------------------------------------------------------------
SystemML is now an Apache Incubator project! Please see the Apache SystemML
(incubating) website at http://systemml.apache.org/ for more information. The
latest project documentation can be found at the SystemML Documentation website
on GitHub at http://apache.github.io/incubator-systemml/.
SystemML is a flexible, scalable machine learning system. SystemML's
distinguishing characteristics are:
1. Algorithm customizability via R-like and Python-like languages (DML and
PyDML).
2. Multiple execution modes, including Standalone, Spark Batch, Spark
MLContext, Hadoop Batch, and JMLC.
3. Automatic optimization based on data and cluster characteristics to ensure
both efficiency and scalability.
-------------------------------------------------------------------------------
SystemML in Standalone Mode
-------------------------------------------------------------------------------
Standalone mode runs on a single machine in a non-Hadoop environment, allowing
data scientists to develop algorithms locally without the need for a
distributed cluster. The Standalone release packages all required libraries
into a single distribution. Because all processing happens on one machine,
Standalone mode is not appropriate for large datasets.
OS X and Linux users can use the runStandaloneSystemML.sh script to run in
Standalone mode, while Windows users can use the runStandaloneSystemML.bat
script.
-------------------------------------------------------------------------------
Hello World Example
-------------------------------------------------------------------------------
The following example runs a "hello world" DML script on SystemML in
Standalone mode.
$ echo 'print("hello world");' > helloworld.dml
$ ./runStandaloneSystemML.sh helloworld.dml
-------------------------------------------------------------------------------
Running SystemML Algorithms
-------------------------------------------------------------------------------
Several algorithms are included in the scripts directory of the Standalone
distribution. In the following example, we first download Haberman's Survival
Data Set and create a metadata (.mtd) file that describes it. Next, we create a
types.csv file that specifies the type of each column (1 for scale, 2 for
nominal), along with its own metadata file. Finally, we run the Univariate
Statistics algorithm on the data in Standalone mode. The results are written
to the data/univarOut.mtx file.
$ wget -P data/ http://archive.ics.uci.edu/ml/machine-learning-databases/haberman/haberman.data
$ echo '{"rows": 306, "cols": 4, "format": "csv"}' > data/haberman.data.mtd
$ echo '1,1,1,2' > data/types.csv
$ echo '{"rows": 1, "cols": 4, "format": "csv"}' > data/types.csv.mtd
$ ./runStandaloneSystemML.sh scripts/algorithms/Univar-Stats.dml -nvargs X=data/haberman.data TYPES=data/types.csv STATS=data/univarOut.mtx
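The metadata files above hard-code the row and column counts. As a sketch,
assuming a headerless, comma-separated file with one newline-terminated record
per line, a hypothetical shell helper (write_mtd, which is not part of
SystemML) can derive those counts from the data itself and write a minimal
.mtd file with the same fields used above:

```shell
# Hypothetical helper, not shipped with SystemML.
# Writes "<file>.mtd" containing rows, cols, and format for a headerless CSV.
write_mtd() {
  csv="$1"
  # Number of newline-terminated lines; tr strips the padding some wc versions add.
  rows=$(wc -l < "$csv" | tr -d ' ')
  # Number of comma-separated fields on the first line.
  cols=$(head -n 1 "$csv" | awk -F',' '{ print NF }')
  printf '{"rows": %s, "cols": %s, "format": "csv"}\n' "$rows" "$cols" > "$csv.mtd"
}
```

For example, running write_mtd data/haberman.data should reproduce the
contents of data/haberman.data.mtd shown above, provided the downloaded file
contains exactly 306 lines and no trailing blank line.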
For more information, please see the online SystemML documentation.