---
layout: page
title: Install SystemML
description: Install SystemML Page
group: nav-right
---
<!--
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to you under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
{% endcomment %}
-->
<!-- Hero -->
<!-- <section class="full-stripe full-stripe--subpage-header clear-header">
<div class="ml-container ml-container--horizontally-center">
<div class="col col-12 content-group">
<h1>Tutorials</h1>
</div>
</div>
</section> -->
<!-- Tutorial Instructions -->
<section class="full-stripe full-stripe--alternate">
<!-- Section 1 -->
<div class="ml-container content-group content-group--tutorial border">
<!-- Section Header -->
<div class="col col-12 content-group--medium-bottom-margin">
<h2>Install SystemML</h2>
</div>
<!-- Step 1 Instructions -->
<div class="col col-12">
<h3><span class="circle">1</span>Prerequisites</h3>
</div>
<!-- Step 1 Code -->
<div class="col col-12">
<p class="indent">Apache Spark 2.x</p>
<p class="indent">Set the SPARK_HOME environment variable to the directory where Spark 2.x is installed.</p>
<div id="prerequisite-tabs">
<ul>
<li><a href="#prerequisite-tabs-1">MacOS/Linux</a></li>
<li><a href="#prerequisite-tabs-2">Windows</a></li>
</ul>
<div id="prerequisite-tabs-1">
1) Java <br />
Make sure the Java version is >= 1.8 and that the JAVA_HOME environment variable is set:
{% highlight bash %}
java -version
export JAVA_HOME="$(/usr/libexec/java_home)"{% endhighlight %}
2) Spark <br />
Download Spark from <a href="https://spark.apache.org/downloads.html">https://spark.apache.org/downloads.html</a>, extract it into your home directory, and set environment variables to point to the extracted directory:
{% highlight bash %}
export SPARK_HOME="$HOME/spark-2.1.0-bin-hadoop2.7"
export HADOOP_HOME=$SPARK_HOME
export SPARK_LOCAL_IP=127.0.0.1{% endhighlight %}
3) Python and Jupyter <br />
Download and install Anaconda Python 3+ from <a href="https://www.anaconda.com/distribution/#download-section">https://www.anaconda.com/distribution/#download-section</a> (includes Jupyter and pip), then launch Jupyter via PySpark:
{% highlight bash %}
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
$SPARK_HOME/bin/pyspark --master local[*] --driver-memory 8G{% endhighlight %}
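The environment setup above can be sanity-checked from Python. A minimal sketch (the `missing_env` helper is hypothetical, not part of SystemML; the variable names are the ones exported above):

```python
import os

# Variables exported in the steps above; extend as needed.
REQUIRED = ("JAVA_HOME", "SPARK_HOME")

def missing_env(names=REQUIRED):
    """Return the variables from `names` that are unset or empty."""
    return [n for n in names if not os.environ.get(n)]

if __name__ == "__main__":
    unset = missing_env()
    if unset:
        print("Missing environment variables: " + ", ".join(unset))
    else:
        print("Environment looks good.")
```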
</div>
<div id="prerequisite-tabs-2">
1) Java <br />
Make sure the Java version is >= 1.8. Also, set the JAVA_HOME environment variable and add %JAVA_HOME%\bin to the environment variable PATH:
{% highlight bash %}
java -version
ls "%JAVA_HOME%"{% endhighlight %}
2) Spark <br />
Download Spark from <a href="https://spark.apache.org/downloads.html">https://spark.apache.org/downloads.html</a> and extract. Set the environment variable SPARK_HOME to point to the extracted directory. <br />
3) Install winutils <br />
- Download winutils.exe from <a href="http://github.com/steveloughran/winutils/raw/master/hadoop-2.6.0/bin/winutils.exe">http://github.com/steveloughran/winutils/raw/master/hadoop-2.6.0/bin/winutils.exe</a> <br />
- Place it in c:\winutils\bin <br />
- Set environment variable HADOOP_HOME to point to c:\winutils <br />
- Add c:\winutils\bin to the environment variable PATH. <br />
- Finally, modify the permissions of the hive directory that will be used by Spark, and check that Spark is correctly installed:
{% highlight bash %}
winutils.exe chmod 777 /tmp/hive
%SPARK_HOME%\bin\spark-shell
%SPARK_HOME%\bin\pyspark --master local[*] --driver-memory 8G{% endhighlight %}
4) Python and Jupyter <br />
Download and install Anaconda Python 3+ from <a href="https://www.anaconda.com/distribution/#download-section">https://www.anaconda.com/distribution/#download-section</a> (includes Jupyter and pip), then launch Jupyter via PySpark:
{% highlight bash %}
set PYSPARK_DRIVER_PYTHON=jupyter
set PYSPARK_DRIVER_PYTHON_OPTS=notebook
%SPARK_HOME%\bin\pyspark --master local[*] --driver-memory 8G{% endhighlight %}
</div>
</div>
</div>
<!-- Step 2 -->
<div class="col col-12">
<h3><span class="circle">2</span>Setup SystemML</h3>
</div>
<div id="setup-tabs">
<ul>
<li><a href="#setup-tabs-1">Python</a></li>
<li><a href="#setup-tabs-2">Scala</a></li>
<li><a href="#setup-tabs-3">Dev Python (Latest code)</a></li>
<li><a href="#setup-tabs-4">Dev Scala (Latest code)</a></li>
</ul>
<div id="setup-tabs-1">
1) Install SystemML:
{% highlight bash %}
pip install systemml{% endhighlight %}
2) For more information, please see the SystemML project documentation:<br/>
<pre>
<a href="http://systemml.apache.org/docs/{{ site.data.project.release_version }}/index.html">http://systemml.apache.org/docs/{{ site.data.project.release_version }}/index.html</a>
<a href="http://systemml.apache.org/docs/{{ site.data.project.release_version }}/beginners-guide-python">http://systemml.apache.org/docs/{{ site.data.project.release_version }}/beginners-guide-python</a>
</pre>
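3) As a quick smoke test, the Scala "hello world" shown on the Scala tab has a Python counterpart. A minimal sketch, assuming <code>pip install systemml</code> succeeded and a local Spark 2.x installation is available (this cannot run without those in place):

```python
# Minimal SystemML "hello world" via the Python MLContext API.
from pyspark.sql import SparkSession
from systemml import MLContext, dml

# Reuse or create a local Spark session.
spark = SparkSession.builder.master("local[*]").getOrCreate()
ml = MLContext(spark)

# DML is SystemML's R-like scripting language.
script = dml("print('hello world')")
ml.execute(script)
```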
</div>
<div id="setup-tabs-2">
1) Download Apache SystemML binary release (tgz or zip):<br/>
<pre><a href="http://www.apache.org/dyn/closer.lua/systemml/{{ site.data.project.release_version }}/systemml-{{ site.data.project.release_version }}-bin.tgz">http://www.apache.org/dyn/closer.lua/systemml/{{ site.data.project.release_version }}/systemml-{{ site.data.project.release_version }}-bin.tgz</a></pre>
2) Extract binary release contents:<br/>
<pre>tar -xvzf systemml-{{ site.data.project.release_version }}-bin.tgz</pre>
3) Go to the project root directory:<br/>
<pre>cd systemml-{{ site.data.project.release_version }}-bin</pre>
4) Start Spark Shell with SystemML jar file:<br/>
<pre>
spark-shell --executor-memory 4G --driver-memory 4G --jars lib/systemml-{{ site.data.project.release_version }}.jar
</pre>
5) You're all set to run SystemML on Spark:<br/>
<pre>
import org.apache.sysml.api.mlcontext._
import org.apache.sysml.api.mlcontext.ScriptFactory._
val ml = new MLContext(spark)
val helloScript = dml("print('hello world')")
ml.execute(helloScript)
</pre>
6) For more information, please see the SystemML project documentation:<br/>
<pre>
<a href="http://systemml.apache.org/docs/{{ site.data.project.release_version }}/index.html">http://systemml.apache.org/docs/{{ site.data.project.release_version }}/index.html</a>
<a href="http://systemml.apache.org/docs/{{ site.data.project.release_version }}/spark-mlcontext-programming-guide">http://systemml.apache.org/docs/{{ site.data.project.release_version }}/spark-mlcontext-programming-guide</a>
</pre>
</div>
<div id="setup-tabs-3">
1) Install the Python development build of SystemML:
{% highlight bash %}
pip install https://sparktc.ibmcloud.com/repo/latest/systemml-1.0.0-SNAPSHOT-python.tar.gz{% endhighlight %}
</div>
<div id="setup-tabs-4">
1) Download binary development build of SystemML (tgz or zip):<br/>
<pre><a href="https://sparktc.ibmcloud.com/repo/latest/systemml-1.0.0-SNAPSHOT-bin.tgz">https://sparktc.ibmcloud.com/repo/latest/systemml-1.0.0-SNAPSHOT-bin.tgz</a></pre>
2) Follow the remaining steps on the Scala tab.
</div>
</div>
<!-- Step 3 Instructions -->
<div class="col col-12">
<h3><span class="circle">3</span>Configure Jupyter Notebook (Optional)</h3>
</div>
<div id="configure-jupyter-tabs">
<ul>
<li><a href="#configure-jupyter-tabs-1">Python</a></li>
<li><a href="#configure-jupyter-tabs-2">Scala</a></li>
</ul>
<div id="configure-jupyter-tabs-1">
{% highlight bash %}
# Start Jupyter Notebook Server
PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark --master local[*] --conf "spark.driver.memory=12g" --conf spark.driver.maxResultSize=0 --conf spark.default.parallelism=100
{% endhighlight %}
</div>
<div id="configure-jupyter-tabs-2">
<h4>1) Toree Kernel Setup (Required for Scala Kernel)</h4>
1.1) Toree Installation:<br/>
For detailed instructions, visit <a href="https://github.com/apache/incubator-toree">https://github.com/apache/incubator-toree</a>.
{% highlight bash %}
pip install https://dist.apache.org/repos/dist/dev/incubator/toree/0.2.0/snapshots/dev1/toree-pip/toree-0.2.0.dev1.tar.gz
{% endhighlight %}
1.2) Installation of Toree in Jupyter:<br/>
For detailed instructions, visit <a href="https://toree.apache.org/docs/current/user/installation">https://toree.apache.org/docs/current/user/installation</a>.
{% highlight bash %}
jupyter toree install --replace --interpreters=Scala,PySpark --spark_opts="--master=local --jars <SystemML JAR File>" --spark_home=${SPARK_HOME}
{% endhighlight %}
<h4>2) Start Jupyter Notebook Server</h4>
{% highlight bash %}jupyter notebook{% endhighlight %}
<p>This will start a default browser with contents from the directory where the above command was run.
You can create your own notebook or download sample notebooks from the SystemML GitHub repository at
<a href="https://github.com/apache/systemml/tree/master/samples/jupyter-notebooks">https://github.com/apache/systemml/tree/master/samples/jupyter-notebooks</a>.</p>
<figure class="img-border"><img src="/assets/img/systemml-juypter-install.jpeg" alt="Start Jupyter Notebook Server"></figure>
<figure class="img-border"><img src="/assets/img/systemml-juypter-install-2.jpeg" alt="Start Jupyter Notebook Server"></figure>
</div>
</div>
</div>
<div class="flex-container flex-banner--horizontally-center">
<a class="button button-secondary button-center" href="get-started.html#sample-notebook">Sample Notebooks</a>
</div>
</section>
<script src="assets/js/jquery-1.12.4.min.js"></script>
<script src="assets/js/jquery-ui-1.12.1.min.js"></script>
<script>
$("#prerequisite-tabs").tabs();
$("#setup-tabs").tabs();
$("#configure-jupyter-tabs").tabs();
</script>