blob: 2d2ad9320a1023c20e1af933ac937855701ad3cd [file] [log] [blame]
<!DOCTYPE html><html><head><meta charset="utf-8"><title>Untitled Document.md</title><style>
p img{ max-width:100%}
pre {
width: 80% !important;
white-space: normal;
}
</style>
</head><body id="preview">
<!--<p>&lt;!–<br>
Licensed to the Apache Software Foundation (ASF) under one<br>
or more contributor license agreements. See the NOTICE file<br>
distributed with this work for additional information<br>
regarding copyright ownership. The ASF licenses this file<br>
to you under the Apache License, Version 2.0 (the<br>
“License”); you may not use this file except in compliance<br>
with the License. You may obtain a copy of the License at</p>
<pre><code> http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
&quot;AS IS&quot; BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
</code></pre>
<p>–&gt;</p>-->
<p><img src="../docs/images/format/CarbonData_logo.png?raw=true" alt="CarbonData_Logo"></p>
<h1><a id="Quick_Start_20"></a>Quick Start</h1>
<p>This tutorial provides a quick introduction to using CarbonData.</p>
<h2><a id="Getting_started_with_Apache_CarbonData_23"></a>Getting started with Apache CarbonData</h2>
<ul>
<li><a href="#installation">Installation</a></li>
<li><a href="#InteractiveAnalysis-with-Carbon-Spark-Shell">Interactive Analysis with Carbon-Spark Shell</a>
<ul>
<li><a href="#basics">Basics</a></li>
<li><a href="#executing-queries">Executing Queries</a>
<ul>
<li><a href="#prerequisites">Prerequisites</a></li>
<li><a href="#create-table">Create Table</a></li>
<li><a href="#load-data-to-table">Load data to Table</a></li>
<li><a href="#query-data-from-table">Query data from table</a></li>
</ul>
</li>
</ul>
</li>
<li><a href="#carbon-sql-cli">Carbon SQL CLI</a>
<ul>
<li><a href="#basics">Basics</a></li>
<li><a href="#execute-queries-in-cli">Execute Queries in CLI</a></li>
</ul>
</li>
<li><a href="">Building CarbonData</a></li>
</ul>
<h2><a id="Installation_39"></a>Installation</h2>
<ul>
<li>Download released package of <a href="http://spark.apache.org/downloads.html">Spark 1.5.0 to 1.6.2</a></li>
<li>Download and install <a href="http://thrift-tutorial.readthedocs.io/en/latest/installation.html">Apache Thrift 0.9.3</a>, make sure thrift is added to system path.</li>
<li>Download <a href="https://github.com/apache/incubator-carbondata">Apache CarbonData code</a> and build it. Please visit <a href="Installing-CarbonData-And-IDE-Configuartion.md">Building CarbonData And IDE Configuration</a> for more information.</li>
</ul>
<h2><a id="Interactive_Analysis_with_CarbonSpark_Shell_44"></a>Interactive Analysis with Carbon-Spark Shell</h2>
<p>Carbon Spark shell is a wrapper around Apache Spark Shell, it provides a simple way to learn the API, as well as a powerful tool to analyze data interactively. Please visit <a href="http://spark.apache.org/docs/latest/">Apache Spark Documentation</a> for more details on Spark shell.</p>
<h4><a id="Basics_47"></a>Basics</h4>
<p>Start Spark shell by running the following in the Carbon directory:</p>
<pre><code>./bin/carbon-spark-shell
</code></pre>
<p><em>Note</em>: In this shell SparkContext is readily available as sc and CarbonContext is available as cc.</p>
<p>CarbonData stores and writes the data in its specified format at the default location on the hdfs.<br>
By default carbon.storelocation is set as :</p>
<pre><code>hdfs://IP:PORT/Opt/CarbonStore
</code></pre>
<p>And you can provide your own store location by providing configuration using --conf option like:</p>
<pre><code>./bin/carbon-spark-sql --conf spark.carbon.storepath=&lt;storelocation&gt;
</code></pre>
<h4><a id="Executing_Queries_64"></a>Executing Queries</h4>
<p><strong>Prerequisites</strong></p>
<p>Create sample.csv file in CarbonData directory. The CSV is required for loading data into Carbon.</p>
<pre><code>$ cd carbondata
$ cat &gt; sample.csv &lt;&lt; EOF
id,name,city,age
1,david,shenzhen,31
2,eason,shenzhen,27
3,jarry,wuhan,35
EOF
</code></pre>
<p><strong>Create table</strong></p>
<pre><code>scala&gt;cc.sql(&quot;create table if not exists test_table (id string, name string, city string, age Int) STORED BY 'carbondata'&quot;)
</code></pre>
<p><strong>Load data to table</strong></p>
<pre><code>scala&gt;val dataFilePath = new File(&quot;../carbondata/sample.csv&quot;).getCanonicalPath
scala&gt;cc.sql(s&quot;load data inpath '$dataFilePath' into table test_table&quot;)
</code></pre>
<p><strong>Query data from table</strong></p>
<pre><code>scala&gt;cc.sql(&quot;select * from test_table&quot;).show
scala&gt;cc.sql(&quot;select city, avg(age), sum(age) from test_table group by city&quot;).show
</code></pre>
<h2><a id="Carbon_SQL_CLI_98"></a>Carbon SQL CLI</h2>
<p>The Carbon Spark SQL CLI is a wrapper around Apache Spark SQL CLI. It is a convenient tool to execute queries input from the command line. Please visit <a href="http://spark.apache.org/docs/latest/">Apache Spark Documentation</a> for more information Spark SQL CLI.</p>
<h4><a id="Basics_101"></a>Basics</h4>
<p>Start the Carbon Spark SQL CLI, run the following in the Carbon directory :</p>
<pre><code>./bin/carbon-spark-sql
</code></pre>
<p>CarbonData stores and writes the data in its specified format at the default location on the hdfs.<br>
By default carbon.storelocation is set as :</p>
<pre><code>hdfs://IP:PORT/Opt/CarbonStore
</code></pre>
<p>And you can provide your own store location by providing configuration using --conf option like:</p>
<pre><code>./bin/carbon-spark-sql --conf spark.carbon.storepath=/home/root/carbonstore
</code></pre>
<h4><a id="Execute_Queries_in_CLI_118"></a>Execute Queries in CLI</h4>
<pre><code>spark-sql&gt; create table if not exists test_table (id string, name string, city string, age Int) STORED BY 'carbondata'
spark-sql&gt; load data inpath '../sample.csv' into table test_table
spark-sql&gt; select city, avg(age), sum(age) from test_table group by city
</code></pre>
<h2><a id="Building_CarbonData_124"></a>Building CarbonData</h2>
<p>To get started, get CarbonData from the <a href="">downloads</a> on the <a href="http://carbondata.incubator.apache.org.">http://carbondata.incubator.apache.org.</a><br>
CarbonData uses Hadoop’s client libraries for HDFS and YARN and Spark’s libraries. Downloads are pre-packaged for a handful of popular Spark versions.</p>
<p>If you’d like to build CarbonData from source, Please visit <a href="">Building CarbonData And IDE Configuration</a></p>
</body></html>