| |
| <!DOCTYPE html><html><head><meta charset="utf-8"><title>Untitled Document.md</title><style> |
| p img{ max-width:100%} |
| pre { |
| width: 80% !important; |
| white-space: normal; |
| } |
| </style> |
| </head><body id="preview"> |
| <!--<p><!–<br> |
| Licensed to the Apache Software Foundation (ASF) under one<br> |
| or more contributor license agreements. See the NOTICE file<br> |
| distributed with this work for additional information<br> |
| regarding copyright ownership. The ASF licenses this file<br> |
| to you under the Apache License, Version 2.0 (the<br> |
| “License”); you may not use this file except in compliance<br> |
| with the License. You may obtain a copy of the License at</p> |
| <pre><code> http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| </code></pre> |
| <p>–></p>--> |
| <p><img src="../docs/images/format/CarbonData_logo.png?raw=true" alt="CarbonData_Logo"></p> |
| <h1><a id="Quick_Start_20"></a>Quick Start</h1> |
| <p>This tutorial provides a quick introduction to using CarbonData.</p> |
| <h2><a id="Getting_started_with_Apache_CarbonData_23"></a>Getting started with Apache CarbonData</h2> |
| <ul> |
| <li><a href="#installation">Installation</a></li> |
| <li><a href="#InteractiveAnalysis-with-Carbon-Spark-Shell">Interactive Analysis with Carbon-Spark Shell</a> |
| <ul> |
| <li><a href="#basics">Basics</a></li> |
| <li><a href="#executing-queries">Executing Queries</a> |
| <ul> |
| <li><a href="#prerequisites">Prerequisites</a></li> |
| <li><a href="#create-table">Create Table</a></li> |
| <li><a href="#load-data-to-table">Load data to Table</a></li> |
| <li><a href="#query-data-from-table">Query data from table</a></li> |
| </ul> |
| </li> |
| </ul> |
| </li> |
| <li><a href="#carbon-sql-cli">Carbon SQL CLI</a> |
| <ul> |
| <li><a href="#basics">Basics</a></li> |
| <li><a href="#execute-queries-in-cli">Execute Queries in CLI</a></li> |
| </ul> |
| </li> |
| <li><a href="#building-carbondata">Building CarbonData</a></li> |
| </ul> |
| <h2><a id="Installation_39"></a>Installation</h2> |
| <ul> |
| <li>Download a released package of <a href="http://spark.apache.org/downloads.html">Spark 1.5.0 to 1.6.2</a>.</li> |
| <li>Download and install <a href="http://thrift-tutorial.readthedocs.io/en/latest/installation.html">Apache Thrift 0.9.3</a>, and make sure that thrift is added to the system path.</li> |
| <li>Download the <a href="https://github.com/apache/incubator-carbondata">Apache CarbonData code</a> and build it. Please visit <a href="Installing-CarbonData-And-IDE-Configuartion.md">Building CarbonData And IDE Configuration</a> for more information.</li> |
| </ul> |
| <h2><a id="Interactive_Analysis_with_CarbonSpark_Shell_44"></a>Interactive Analysis with Carbon-Spark Shell</h2> |
| <p>The Carbon Spark shell is a wrapper around the Apache Spark shell. It provides a simple way to learn the API, as well as a powerful tool to analyze data interactively. Please visit <a href="http://spark.apache.org/docs/latest/">Apache Spark Documentation</a> for more details on the Spark shell.</p> |
| <h4><a id="Basics_47"></a>Basics</h4> |
| <p>Start the Carbon Spark shell by running the following command in the Carbon directory:</p> |
| <pre><code>./bin/carbon-spark-shell |
| </code></pre> |
| <p><em>Note</em>: In this shell, SparkContext is readily available as sc, and CarbonContext is available as cc.</p> |
| <p>CarbonData stores and writes data in its specified format at the default location on HDFS.<br> |
| By default, carbon.storelocation is set to:</p> |
| <pre><code>hdfs://IP:PORT/Opt/CarbonStore |
| </code></pre> |
| <p>You can provide your own store location by passing the configuration with the --conf option, for example:</p> |
| <pre><code>./bin/carbon-spark-shell --conf spark.carbon.storepath=<storelocation> |
| </code></pre> |
| <h4><a id="Executing_Queries_64"></a>Executing Queries</h4> |
| <p><strong>Prerequisites</strong></p> |
| <p>Create a sample.csv file in the CarbonData directory. This CSV file is required for loading data into CarbonData.</p> |
| <pre><code>$ cd carbondata |
| $ cat > sample.csv << EOF |
| id,name,city,age |
| 1,david,shenzhen,31 |
| 2,eason,shenzhen,27 |
| 3,jarry,wuhan,35 |
| EOF |
| </code></pre> |
| <p><strong>Create table</strong></p> |
| <pre><code>scala>cc.sql("create table if not exists test_table (id string, name string, city string, age Int) STORED BY 'carbondata'") |
| </code></pre> |
| <p><strong>Load data to table</strong></p> |
| <pre><code>scala>import java.io.File |
| scala>val dataFilePath = new File("../carbondata/sample.csv").getCanonicalPath |
| scala>cc.sql(s"load data inpath '$dataFilePath' into table test_table") |
| </code></pre> |
| <p><strong>Query data from table</strong></p> |
| <pre><code>scala>cc.sql("select * from test_table").show |
| scala>cc.sql("select city, avg(age), sum(age) from test_table group by city").show |
| </code></pre> |
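<p>As a rough cross-check of the aggregate query above, the expected per-city values can be computed locally from the rows of sample.csv. The snippet below is only a sketch that uses standard shell tools and does not touch CarbonData; the variable names agg and expected are illustrative.</p>
<pre><code># awk program: per-city average and sum of age (field 4), grouped by city (field 3).
agg='{ sum[$3] += $4; n[$3]++ } END { for (c in sum) printf "%s avg=%.1f sum=%d\n", c, sum[c] / n[c], sum[c] }'
# Data rows copied from sample.csv, without the header line.
expected=$(printf '%s\n' '1,david,shenzhen,31' '2,eason,shenzhen,27' '3,jarry,wuhan,35' | awk -F, "$agg" | sort)
echo "$expected"
</code></pre>
<p>For the three sample rows this gives an average age of 29.0 and a sum of 58 for shenzhen, and 35.0 and 35 for wuhan. The query result should contain the same values, although the column names and formatting of the Spark output may differ.</p>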
| <h2><a id="Carbon_SQL_CLI_98"></a>Carbon SQL CLI</h2> |
| <p>The Carbon Spark SQL CLI is a wrapper around the Apache Spark SQL CLI. It is a convenient tool for executing queries entered on the command line. Please visit <a href="http://spark.apache.org/docs/latest/">Apache Spark Documentation</a> for more information on the Spark SQL CLI.</p> |
| <h4><a id="Basics_101"></a>Basics</h4> |
| <p>To start the Carbon Spark SQL CLI, run the following command in the Carbon directory:</p> |
| <pre><code>./bin/carbon-spark-sql |
| </code></pre> |
| <p>CarbonData stores and writes data in its specified format at the default location on HDFS.<br> |
| By default, carbon.storelocation is set to:</p> |
| <pre><code>hdfs://IP:PORT/Opt/CarbonStore |
| </code></pre> |
| <p>You can provide your own store location by passing the configuration with the --conf option, for example:</p> |
| <pre><code>./bin/carbon-spark-sql --conf spark.carbon.storepath=/home/root/carbonstore |
| </code></pre> |
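<p>If the store location changes between environments, the launch command can be assembled from a variable. The following is a minimal sketch: CARBON_STORE is a hypothetical environment variable introduced only for this example, the fallback path is the one used above, and the command is printed rather than executed.</p>
<pre><code># CARBON_STORE is a hypothetical variable name, not a CarbonData setting.
STORE_PATH="${CARBON_STORE:-/home/root/carbonstore}"
# Build the launch command; it is only printed here so it can be reviewed first.
LAUNCH_CMD="./bin/carbon-spark-sql --conf spark.carbon.storepath=${STORE_PATH}"
echo "$LAUNCH_CMD"
</code></pre>
<p>Running the printed command from the Carbon directory starts the CLI with that store location.</p>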
| <h4><a id="Execute_Queries_in_CLI_118"></a>Execute Queries in CLI</h4> |
| <pre><code>spark-sql> create table if not exists test_table (id string, name string, city string, age Int) STORED BY 'carbondata'; |
| spark-sql> load data inpath '../sample.csv' into table test_table; |
| spark-sql> select city, avg(age), sum(age) from test_table group by city; |
| </code></pre> |
| <h2><a id="Building_CarbonData_124"></a>Building CarbonData</h2> |
| <p>To get started, get CarbonData from the <a href="">downloads</a> page on <a href="http://carbondata.incubator.apache.org">http://carbondata.incubator.apache.org</a>.<br> |
| CarbonData uses Hadoop’s client libraries for HDFS and YARN, as well as Spark’s libraries. Downloads are pre-packaged for a handful of popular Spark versions.</p> |
| <p>If you’d like to build CarbonData from source, please visit <a href="">Building CarbonData And IDE Configuration</a>.</p> |
| |
| </body></html> |