| --- |
| layout: global |
| title: Invoking SystemML in Spark Batch Mode |
| description: Invoking SystemML in Spark Batch Mode |
| --- |
| <!-- |
| {% comment %} |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to you under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| {% endcomment %} |
| --> |
| |
| * This will become a table of contents (this text will be scraped). |
| {:toc} |
| |
| <br/> |
| |
| |
| # Overview |
| |
Given that a primary purpose of SystemML is to perform machine learning on large distributed data
sets, one of the most important ways to invoke SystemML is Spark Batch mode. Here, we will look at this
mode in more depth.
| |
| **NOTE:** For a programmatic API to run and interact with SystemML via Scala or Python, please see the |
| [Spark MLContext Programming Guide](spark-mlcontext-programming-guide). |
| |
| --- |
| |
| # Spark Batch Mode Invocation Syntax |
| |
SystemML can be invoked in Spark Batch mode using the following syntax:
| |
| spark-submit SystemML.jar [-? | -help | -f <filename>] (-config <config_filename>) ([-args | -nvargs] <args-list>) |
| |
The DML script to invoke is specified after the `-f` argument. Configuration settings can be passed to SystemML
using the optional `-config` argument (illustrated below). DML scripts can optionally take named arguments (`-nvargs`) or positional
arguments (`-args`). Named arguments are preferred over positional arguments, which are considered deprecated.
All the primary algorithm scripts included with SystemML use named arguments.
| |
| |
| **Example #1: DML Invocation with Named Arguments** |
| |
| spark-submit SystemML.jar -f scripts/algorithms/Kmeans.dml -nvargs X=X.mtx k=5 |
| |
| |
| **Example #2: DML Invocation with Positional Arguments** |
| |
| spark-submit SystemML.jar -f src/test/scripts/applications/linear_regression/LinearRegression.dml -args "v" "y" 0.00000001 "w" |
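
**Example #3 (illustrative): DML Invocation with a Configuration File**

The following sketch shows how a configuration file could be passed along with named arguments, following the
syntax above. The file name `SystemML-config.xml` and the argument values are assumptions for illustration only.

    spark-submit SystemML.jar -f scripts/algorithms/Kmeans.dml -config SystemML-config.xml -nvargs X=X.mtx k=5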
| |
| # Execution modes |
| |
SystemML works seamlessly with all Spark execution modes, including *local* (`--master local[*]`),
*yarn client* (`--master yarn-client`), and *yarn cluster* (`--master yarn-cluster`). More
information on Spark cluster execution modes can be found in the
[official Spark cluster deployment documentation](https://spark.apache.org/docs/latest/cluster-overview.html).
Note that Spark can easily be run on a laptop in local mode using the `--master local[*]` option described
above, which SystemML supports.
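
As an illustrative sketch, the same invocation from Example #1 could be submitted to a YARN cluster by adding
the usual `spark-submit` options. The resource sizes and file paths below are assumptions for illustration,
not recommendations; adjust them to your cluster.

    spark-submit --master yarn-client \
                 --driver-memory 5G \
                 --num-executors 4 \
                 --executor-memory 4G \
                 --executor-cores 2 \
                 SystemML.jar -f scripts/algorithms/Kmeans.dml -nvargs X=X.mtx k=5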
| |
| # Recommended Spark Configuration Settings |
| |
| For best performance, we recommend setting the following flags when running SystemML with Spark: |
| `--conf spark.driver.maxResultSize=0 --conf spark.akka.frameSize=128`. |
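
For example, a full invocation applying these settings might look like the following sketch, which reuses the
script and arguments from Example #1 for illustration:

    spark-submit --conf spark.driver.maxResultSize=0 \
                 --conf spark.akka.frameSize=128 \
                 SystemML.jar -f scripts/algorithms/Kmeans.dml -nvargs X=X.mtx k=5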
| |
| # Examples |
| |
For examples of Spark Batch mode execution that train MNIST classifiers, please see the MNIST examples in the included
[SystemML-NN](https://github.com/apache/systemml/tree/master/scripts/nn)
library:
| |
| * [MNIST Softmax Classifier](https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_softmax-train.dml) |
| * [MNIST LeNet ConvNet](https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet-train.dml) |