| |
| <!DOCTYPE html><html><head><meta charset="utf-8"><title>Untitled Document.md</title><style></style></head><body id="preview"> |
| <p><!--<br> |
| Licensed to the Apache Software Foundation (ASF) under one<br> |
| or more contributor license agreements. See the NOTICE file<br> |
| distributed with this work for additional information<br> |
| regarding copyright ownership. The ASF licenses this file<br> |
| to you under the Apache License, Version 2.0 (the<br> |
| “License”); you may not use this file except in compliance<br> |
| with the License. You may obtain a copy of the License at</p> |
| <pre><code> http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| </code></pre> |
| <p>--></p> |
| <h1><a id="Installation_Guide_19"></a>Installation Guide</h1> |
| <p>This tutorial will guide you through the installation and configuration of CarbonData in the following two modes:</p> |
| <ul> |
| <li><a href="https://cwiki.apache.org/confluence/display/CARBONDATA/Cluster+deployment+guide">On Standalone Spark cluster</a></li> |
| <li><a href="https://cwiki.apache.org/confluence/display/CARBONDATA/Cluster+deployment+guide">On Spark on Yarn cluster</a></li> |
| </ul> |
| <p>followed by:</p> |
| <ul> |
| <li><a href="https://cwiki.apache.org/confluence/display/CARBONDATA/Cluster+deployment+guide">Query Execution using Carbon Thrift Server</a></li> |
| </ul> |
| <h2><a id="Installing_and_Configuring_CarbonData_on_Standalone_Spark_Cluster_28"></a>Installing and Configuring CarbonData on “Standalone Spark” Cluster</h2> |
| <h3><a id="Prerequisite_30"></a>Prerequisite</h3> |
| <ul> |
| <li>Hadoop HDFS and Yarn should be installed and running.</li> |
| <li>Spark should be installed and running on all the client machines.</li> |
| <li>CarbonData user should have permission to access HDFS.</li> |
| </ul> |
| <h3><a id="Procedure_35"></a>Procedure</h3> |
| <p>The following steps apply only to driver nodes. (Driver nodes are the ones that start the Spark context.)</p> |
| <ol> |
| <li> |
| <p><a href="https://cwiki.apache.org/confluence/display/CARBONDATA/Building+CarbonData+And+IDE+Configuration">Build the CarbonData</a> project and get the assembly jar from “./assembly/target/scala-2.10/carbondata_xxx.jar” and put in the “<SPARK_HOME>/carbonlib” folder.</p> |
| <p>(Note: - Create the carbonlib folder if does not exists inside “<SPARK_HOME>” path.)</p> |
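| <p>A minimal sketch of this step, assuming the commands are run from the CarbonData repository root and that $SPARK_HOME points to your Spark installation (the exact jar name varies with the version you built):</p> |
| <pre><code>mkdir -p $SPARK_HOME/carbonlib
| cp ./assembly/target/scala-2.10/carbondata_*.jar $SPARK_HOME/carbonlib/
| </code></pre> |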
| </li> |
| <li> |
| <p>The carbonlib folder path must be added to the Spark classpath. (Edit the “&lt;SPARK_HOME&gt;/conf/spark-env.sh” file and modify the value of SPARK_CLASSPATH by appending “&lt;SPARK_HOME&gt;/carbonlib/*” to the existing value.)</p> |
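| <p>A sketch of the spark-env.sh change, assuming SPARK_HOME is set when the script is sourced (otherwise write the path out in full) and that SPARK_CLASSPATH may already hold other entries:</p> |
| <pre><code># In &lt;SPARK_HOME&gt;/conf/spark-env.sh: append the carbonlib folder to the classpath
| export SPARK_CLASSPATH=$SPARK_CLASSPATH:$SPARK_HOME/carbonlib/*
| </code></pre> |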
| </li> |
| <li> |
| <p>Copy “carbon.properties.template” from the “./conf/” folder of the CarbonData repository to “&lt;SPARK_HOME&gt;/conf/carbon.properties”.</p> |
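| <p>For example (illustrative, assuming the command is run from the CarbonData repository root):</p> |
| <pre><code>cp ./conf/carbon.properties.template $SPARK_HOME/conf/carbon.properties
| </code></pre> |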
| </li> |
| <li> |
| <p>Copy the “carbonplugins” folder from the “./processing/” folder of the CarbonData repository to the “&lt;SPARK_HOME&gt;/carbonlib” folder.</p> |
| <p>(Note: The carbonplugins folder contains the .kettle folder.)</p> |
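| <p>Illustrative command for this step, again assuming it is run from the repository root:</p> |
| <pre><code>cp -r ./processing/carbonplugins $SPARK_HOME/carbonlib/
| </code></pre> |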
| </li> |
| <li> |
| <p>In the Spark node, configure the properties listed in the table below in the “&lt;SPARK_HOME&gt;/conf/spark-defaults.conf” file.</p> |
| </li> |
| </ol> |
| <table class="table table-striped table-bordered"> |
| <thead> |
| <tr> |
| <th>Property</th> |
| <th>Description</th> |
| <th>Value</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td>carbon.kettle.home</td> |
| <td>Path that will be used by CarbonData internally to create the graph for loading the data.</td> |
| <td>$SPARK_HOME/carbonlib/carbonplugins</td> |
| </tr> |
| <tr> |
| <td>spark.driver.extraJavaOptions</td> |
| <td>A string of extra JVM options to pass to the driver. For instance, GC settings or other logging.</td> |
| <td>-Dcarbon.properties.filepath=$SPARK_HOME/conf/carbon.properties</td> |
| </tr> |
| <tr> |
| <td>spark.executor.extraJavaOptions</td> |
| <td>A string of extra JVM options to pass to executors. For instance, GC settings or other logging. NOTE: You can enter multiple values separated by spaces.</td> |
| <td>-Dcarbon.properties.filepath=$SPARK_HOME/conf/carbon.properties</td> |
| </tr> |
| </tbody> |
| </table> |
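| <p>Taken together, the corresponding spark-defaults.conf entries might look like the following sketch ($SPARK_HOME is a placeholder for the actual Spark installation path, which should be written out in full in a real configuration file):</p> |
| <pre><code># &lt;SPARK_HOME&gt;/conf/spark-defaults.conf (illustrative)
| carbon.kettle.home                 $SPARK_HOME/carbonlib/carbonplugins
| spark.driver.extraJavaOptions      -Dcarbon.properties.filepath=$SPARK_HOME/conf/carbon.properties
| spark.executor.extraJavaOptions    -Dcarbon.properties.filepath=$SPARK_HOME/conf/carbon.properties
| </code></pre> |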
| <ol start="6"> |
| <li>Add the following properties to the “&lt;SPARK_HOME&gt;/conf/carbon.properties” file:</li> |
| </ol> |
| <table class="table table-striped table-bordered"> |
| <thead> |
| <tr> |
| <th>Property</th> |
| <th>Required</th> |
| <th>Description</th> |
| <th>Example</th> |
| <th>Remark</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td>carbon.storelocation</td> |
| <td>NO</td> |
| <td>Location where CarbonData will create the store and write the data in its own format.</td> |
| <td>hdfs://IP:PORT/Opt/CarbonStore</td> |
| <td>Proposed to set an HDFS directory.</td> |
| </tr> |
| <tr> |
| <td>carbon.kettle.home</td> |
| <td>YES</td> |
| <td>Path that will be used by CarbonData internally to create the graph for loading the data.</td> |
| <td>$SPARK_HOME/carbonlib/carbonplugins</td> |
| <td></td> |
| </tr> |
| </tbody> |
| </table> |
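| <p>A minimal carbon.properties sketch based on the table above (the HDFS IP, port, and store path are placeholders for your environment, and $SPARK_HOME should be written out in full since properties files do not expand environment variables):</p> |
| <pre><code># &lt;SPARK_HOME&gt;/conf/carbon.properties (illustrative)
| carbon.storelocation=hdfs://IP:PORT/Opt/CarbonStore
| carbon.kettle.home=$SPARK_HOME/carbonlib/carbonplugins
| </code></pre> |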
| <ol start="7"> |
| <li>Installation verification, for example:</li> |
| </ol> |
| <pre><code>./spark-shell --master spark://IP:PORT --total-executor-cores 2 --executor-memory 2G |
| </code></pre> |
| <p>Note: Make sure that the user has permission to access the CarbonData jars and files through which the driver and executors will start.</p> |
| <p>To get started with CarbonData: <a href="https://cwiki.apache.org/confluence/display/CARBONDATA/Quick+Start">Quick Start</a>, <a href="https://cwiki.apache.org/confluence/display/CARBONDATA/DDL+operations+on+CarbonData">DDL Operations</a></p> |
| <h2><a id="Installing_and_Configuring_Carbon_on_Spark_on_YARN_Cluster_72"></a>Installing and Configuring Carbon on “Spark on YARN” Cluster</h2> |
| <p>This section provides the procedure to install CarbonData on a “Spark on YARN” cluster.</p> |
| <h3><a id="Prerequisite_75"></a>Prerequisite</h3> |
| <ul> |
| <li>Hadoop HDFS and Yarn should be installed and running.</li> |
| <li>Spark should be installed and running on all the client machines.</li> |
| <li>CarbonData user should have permission to access HDFS.</li> |
| </ul> |
| <h3><a id="Procedure_80"></a>Procedure</h3> |
| <ol> |
| <li> |
| <p><a href="https://cwiki.apache.org/confluence/display/CARBONDATA/Building+CarbonData+And+IDE+Configuration">Build the CarbonData</a> project and get the assembly jar from “./assembly/target/scala-2.10/carbondata_xxx.jar” and put in the “<SPARK_HOME>/carbonlib” folder.</p> |
| <p>(Note: - Create the carbonlib folder if does not exists inside “<SPARK_HOME>” path.)</p> |
| </li> |
| <li> |
| <p>Copy the “carbonplugins” folder from the “./processing/” folder of the CarbonData repository to the “&lt;SPARK_HOME&gt;/carbonlib” folder.<br> |
| (Note: The carbonplugins folder contains the .kettle folder.)</p> |
| </li> |
| <li> |
| <p>Copy “carbon.properties.template” from the “./conf/” folder of the CarbonData repository to “&lt;SPARK_HOME&gt;/conf/carbon.properties”.</p> |
| </li> |
| <li> |
| <p>Modify the following parameters in “spark-defaults.conf” located in “&lt;SPARK_HOME&gt;/conf/”:</p> |
| </li> |
| </ol> |
| <table class="table table-striped table-bordered"> |
| <thead> |
| <tr> |
| <th>Property</th> |
| <th>Description</th> |
| <th>Value</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td>spark.master</td> |
| <td>Set this value to run Spark on a YARN cluster.</td> |
| <td>Set to “yarn-client” to run Spark on YARN in client mode.</td> |
| </tr> |
| <tr> |
| <td>spark.yarn.dist.files</td> |
| <td>Comma-separated list of files to be placed in the working directory of each executor.</td> |
| <td>“&lt;YOUR_SPARK_HOME_PATH&gt;”/conf/carbon.properties</td> |
| </tr> |
| <tr> |
| <td>spark.yarn.dist.archives</td> |
| <td>Comma-separated list of archives to be extracted into the working directory of each executor.</td> |
| <td>“&lt;YOUR_SPARK_HOME_PATH&gt;”/carbonlib/carbondata_xxx.jar</td> |
| </tr> |
| <tr> |
| <td>spark.executor.extraJavaOptions</td> |
| <td>A string of extra JVM options to pass to executors. NOTE: You can enter multiple values separated by spaces.</td> |
| <td>-Dcarbon.properties.filepath=carbon.properties</td> |
| </tr> |
| <tr> |
| <td>spark.executor.extraClassPath</td> |
| <td>Extra classpath entries to prepend to the classpath of executors. NOTE: If SPARK_CLASSPATH is defined in spark-env.sh, comment it out and append its values to this parameter.</td> |
| <td>“&lt;YOUR_SPARK_HOME_PATH&gt;”/carbonlib/carbondata_xxx.jar</td> |
| </tr> |
| <tr> |
| <td>spark.driver.extraClassPath</td> |
| <td>Extra classpath entries to prepend to the classpath of the driver. NOTE: If SPARK_CLASSPATH is defined in spark-env.sh, comment it out and append its values to this parameter.</td> |
| <td>“&lt;YOUR_SPARK_HOME_PATH&gt;”/carbonlib/carbondata_xxx.jar</td> |
| </tr> |
| <tr> |
| <td>spark.driver.extraJavaOptions</td> |
| <td>A string of extra JVM options to pass to the driver. For instance, GC settings or other logging.</td> |
| <td>-Dcarbon.properties.filepath="&lt;YOUR_SPARK_HOME_PATH&gt;"/conf/carbon.properties</td> |
| </tr> |
| <tr> |
| <td>carbon.kettle.home</td> |
| <td>Path that will be used by CarbonData internally to create the graph for loading the data.</td> |
| <td>“&lt;YOUR_SPARK_HOME_PATH&gt;”/carbonlib/carbonplugins</td> |
| </tr> |
| </tbody> |
| </table> |
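| <p>Put together, the corresponding spark-defaults.conf entries for YARN mode might look like the following sketch (here /opt/spark stands in for your actual Spark installation path, and the jar name varies with the build):</p> |
| <pre><code># &lt;SPARK_HOME&gt;/conf/spark-defaults.conf (illustrative)
| spark.master                       yarn-client
| spark.yarn.dist.files              /opt/spark/conf/carbon.properties
| spark.yarn.dist.archives           /opt/spark/carbonlib/carbondata_xxx.jar
| spark.executor.extraJavaOptions    -Dcarbon.properties.filepath=carbon.properties
| spark.executor.extraClassPath      /opt/spark/carbonlib/carbondata_xxx.jar
| spark.driver.extraClassPath        /opt/spark/carbonlib/carbondata_xxx.jar
| spark.driver.extraJavaOptions      -Dcarbon.properties.filepath=/opt/spark/conf/carbon.properties
| carbon.kettle.home                 /opt/spark/carbonlib/carbonplugins
| </code></pre> |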
| <ol start="5"> |
| <li>Add the following properties to the “&lt;SPARK_HOME&gt;/conf/carbon.properties” file:</li> |
| </ol> |
| <table class="table table-striped table-bordered"> |
| <thead> |
| <tr> |
| <th>Property</th> |
| <th>Required</th> |
| <th>Description</th> |
| <th>Example</th> |
| <th>Remark</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td>carbon.storelocation</td> |
| <td>NO</td> |
| <td>Location where CarbonData will create the store and write the data in its own format.</td> |
| <td>hdfs://IP:PORT/Opt/CarbonStore</td> |
| <td>Proposed to set an HDFS directory.</td> |
| </tr> |
| <tr> |
| <td>carbon.kettle.home</td> |
| <td>YES</td> |
| <td>Path that will be used by CarbonData internally to create the graph for loading the data.</td> |
| <td>$SPARK_HOME/carbonlib/carbonplugins</td> |
| <td></td> |
| </tr> |
| </tbody> |
| </table> |
| <ol start="6"> |
| <li>Installation verification, for example:</li> |
| </ol> |
| <pre><code>./bin/spark-shell --master yarn-client --driver-memory 1g --executor-cores 2 --executor-memory 2G |
| </code></pre> |
| <p>Note: Make sure that the user has permission to access the CarbonData jars and files through which the driver and executors will start.</p> |
| <p>To get started with CarbonData: <a href="https://cwiki.apache.org/confluence/display/CARBONDATA/Quick+Start">Quick Start</a>, <a href="https://cwiki.apache.org/confluence/display/CARBONDATA/DDL+operations+on+CarbonData">DDL Operations</a></p> |
| <h2><a id="Query_execution_using_Carbon_thrift_server_118"></a>Query execution using Carbon thrift server</h2> |
| <h3><a id="Start_Thrift_server_120"></a>Start Thrift server</h3> |
| <p>a. cd &lt;SPARK_HOME&gt;</p> |
| <p>b. Run the below command to start the CarbonData Thrift server:</p> |
| <pre><code>./bin/spark-submit --conf spark.sql.hive.thriftServer.singleSession=true --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer \
| $SPARK_HOME/carbonlib/$CARBON_ASSEMBLY_JAR &lt;carbon_store_path&gt;
| </code></pre> |
| <table class="table table-striped table-bordered"> |
| <thead> |
| <tr> |
| <th>Parameter</th> |
| <th>Description</th> |
| <th>Example</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td>CARBON_ASSEMBLY_JAR</td> |
| <td>CarbonData assembly jar name present in the “&lt;SPARK_HOME&gt;”/carbonlib/ folder.</td> |
| <td>carbondata_2.10-0.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar</td> |
| </tr> |
| <tr> |
| <td>carbon_store_path</td> |
| <td>This is a parameter to the CarbonThriftServer class. It is an HDFS path where CarbonData files will be kept. It is strongly recommended to set it to the same value as the carbon.storelocation parameter in carbon.properties.</td> |
| <td>hdfs://hacluster/user/hive/warehouse/carbon.store<br>hdfs://10.10.10.10:54310/user/hive/warehouse/carbon.store</td> |
| </tr> |
| </tbody> |
| </table> |
| <h3><a id="Examples_133"></a>Examples</h3> |
| <ol> |
| <li>Start with default memory and executors</li> |
| </ol> |
| <pre><code>./bin/spark-submit --conf spark.sql.hive.thriftServer.singleSession=true --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer $SPARK_HOME/carbonlib/carbondata_2.10-0.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar hdfs://hacluster/user/hive/warehouse/carbon.store |
| </code></pre> |
| <ol start="2"> |
| <li>Start with Fixed executors and resources</li> |
| </ol> |
| <pre><code>./bin/spark-submit --conf spark.sql.hive.thriftServer.singleSession=true --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer --num-executors 3 --driver-memory 20g --executor-memory 250g --executor-cores 32 /srv/OSCON/BigData/HACluster/install/spark/sparkJdbc/lib/carbondata_2.10-0.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar hdfs://hacluster/user/hive/warehouse/carbon.store |
| </code></pre> |
| <h3><a id="Connecting_to_Carbon_Thrift_Server_Using_Beeline_142"></a>Connecting to Carbon Thrift Server Using Beeline</h3> |
| <pre><code>cd &lt;SPARK_HOME&gt;
| ./bin/beeline -u jdbc:hive2://&lt;thriftserver_host&gt;:&lt;port&gt;
| 
| Example
| ./bin/beeline -u jdbc:hive2://10.10.10.10:10000
| </code></pre> |
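| <p>Once connected, standard SQL can be issued against CarbonData tables from the Beeline prompt. A small illustrative session (the table name test_table is hypothetical):</p> |
| <pre><code>0: jdbc:hive2://10.10.10.10:10000> SHOW TABLES;
| 0: jdbc:hive2://10.10.10.10:10000> SELECT COUNT(*) FROM test_table;
| </code></pre> |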
| |
| </body></html> |