blob: 1232ef572b76ccc8637181b9a3874b32635e695d [file] [log] [blame]
<?xml version="1.0" encoding="iso-8859-1"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<document xmlns="http://maven.apache.org/XDOC/2.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/XDOC/2.0 http://maven.apache.org/xsd/xdoc-2.0.xsd">
<properties></properties>
<title>Getting Started</title>
<body>
<section name="Installation Instructions"></section>
<p>
Apache MRQL can run in four modes: in Map-Reduce mode using Apache Hadoop, in BSP mode (Bulk Synchronous Parallel mode) using Apache Hama, in Spark mode using Apache Spark, and in Flink mode using Apache Flink.
</p>
<p>
The MRQL MapReduce mode has been tested on Apache Hadoop MapReduce releases 1.x, and 2.x (Yarn). You can download the latest tarball from <a href="http://hadoop.apache.org/releases.html">Apache Hadoop</a>. The BSP and Spark modes are optional. The BSP mode has been tested on Apache Hama 0.6.2, 0.6.3, and 0.6.4. You can download the latest tarball from <a href="http://hama.apache.org/">Apache Hama</a>. The Spark mode has been tested on Apache Spark 1.0.0 through 1.2.0 in local, standalone deploy, and Yarn modes. You can download the latest tarball prebuilt for Hadoop1 or Hadoop2 from <a href="http://spark.apache.org/">Apache Spark</a>. The Flink mode has been tested on Apache Flink 0.6-incubating through 0.8.0 in local and Yarn modes. You can download the latest tarball prebuilt for Hadoop2 from <a href="http://flink.apache.org/downloads.html">Apache Flink</a>.
</p>
<p>
The following instructions assume that you have already installed Apache Hadoop MapReduce and you have deployed it on your cluster successfully.
</p>
<subsection name="How to install MRQL"></subsection>
<p>
Download the latest stable MRQL binary release from <a href="http://www.apache.org/dyn/closer.cgi/incubator/mrql">http://www.apache.org/dyn/closer.cgi/incubator/mrql</a> and extract the files. The scripts <code>bin/mrql</code>, <code>bin/mrql.bsp</code>, and <code>bin/mrql.spark</code> evaluate MRQL queries in Hadoop, Hama, and Spark modes, respectively.
</p>
<subsection name="How to run MRQL on a Hadoop MapReduce cluster"></subsection>
<p>
Change the configuration file <code>conf/mrql-env.sh</code> to match your Hadoop installation. For a test, run the <a href="https://wiki.apache.org/mrql/Pagerank">PageRank example</a> or the <a href="https://wiki.apache.org/mrql/Kmeans">k-means clustering example</a> on a small Hadoop MapReduce cluster.
</p>
<subsection name="How to run MRQL on a Hama cluster"></subsection>
<p>
Follow the instructions in <a href="http://hama.apache.org/getting_started_with_hama.html">Getting Started with Hama</a> to set up and start Hama.
Change the configuration file <code>conf/mrql-env.sh</code> to match your Hama installation. For a test, run the <a href="https://wiki.apache.org/mrql/Pagerank">PageRank example</a> or the <a href="https://wiki.apache.org/mrql/Kmeans">k-means clustering example</a> on a Hama cluster.
</p>
<subsection name="How to run MRQL on a Spark standalone cluster"></subsection>
<p>
Follow the instructions in <a href="http://spark.apache.org/docs/latest/spark-standalone.html">Spark Standalone Mode</a> to set up and start Apache Spark in standalone deploy mode.
Change the configuration file <code>conf/mrql-env.sh</code> to match your Spark installation.
For a test, run the <a href="https://wiki.apache.org/mrql/Pagerank">PageRank example</a> or the <a href="https://wiki.apache.org/mrql/Kmeans">k-means clustering example</a> on a Spark cluster.
</p>
<subsection name="How to run MRQL in Spark mode on a Yarn cluster"></subsection>
<p>
Set SPARK_MASTER=yarn-client in conf/mrql-env.sh (see <a href="http://spark.apache.org/docs/latest/running-on-yarn.html">Running Spark on YARN</a>).
</p>
<subsection name="How to run MRQL in Flink mode on a Yarn cluster"></subsection>
<p>
First, start the Flink application manager on Yarn using ${FLINK_HOME}/bin/yarn-session.sh -n #_of_nodes (see <a href="http://flink.apache.org/docs/0.8/yarn_setup.html">Yarn Setup</a>). Then run the <a href="https://wiki.apache.org/mrql/Pagerank">PageRank example</a> using the bin/mrql.flink script.
</p>
<section name="How to Recompile MRQL"></section>
<p>
Download the latest stable MRQL source release from <a href="http://www.apache.org/dyn/closer.cgi/incubator/mrql">http://www.apache.org/dyn/closer.cgi/incubator/mrql</a> and extract the files. You can get the latest source code using:
<pre>
git clone https://git-wip-us.apache.org/repos/asf/incubator-mrql.git
</pre>
To build MRQL on Hadoop 1.x using maven, use <code>mvn clean install</code>. To build MRQL on Yarn (Hadoop 2.x), use <code>mvn -Dyarn clean install</code>. To validate the installation use <code>mvn -DskipTests=false clean install</code>, which runs the queries in <code>tests/queries</code> in memory, local Hadoop mode, local Hama mode, local Spark mode, and local Flink mode.
</p>
</body>
</document>