src/main/webapp/documents/quickstart.html - carbondata-site - Git at Google


 <!DOCTYPE html><html><head><meta charset="utf-8"><title>Untitled Document.md</title><style>
 p img{ max-width:100%}
 pre {
     width: 80% !important;
     white-space: normal;
 }
 </style>
 </head><body id="preview">
 <!--<p>&lt;!–<br>
 Licensed to the Apache Software Foundation (ASF) under one<br>
 or more contributor license agreements.  See the NOTICE file<br>
 distributed with this work for additional information<br>
 regarding copyright ownership.  The ASF licenses this file<br>
 to you under the Apache License, Version 2.0 (the<br>
 “License”); you may not use this file except in compliance<br>
 with the License.  You may obtain a copy of the License at</p>
 <pre><code>  http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing,
 software distributed under the License is distributed on an
 &quot;AS IS&quot; BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 KIND, either express or implied.  See the License for the
 specific language governing permissions and limitations
 under the License.
 </code></pre>
 <p>–&gt;</p>-->
 <p><img src="../docs/images/format/CarbonData_logo.png?raw=true" alt="CarbonData_Logo"></p>
 <h1><a id="Quick_Start_20"></a>Quick Start</h1>
 <p>This tutorial provides a quick introduction to using CarbonData.</p>
 <h2><a id="Getting_started_with_Apache_CarbonData_23"></a>Getting started with Apache CarbonData</h2>
 <ul>
 <li><a href="#installation">Installation</a></li>
 <li><a href="#InteractiveAnalysis-with-Carbon-Spark-Shell">Interactive Analysis with Carbon-Spark Shell</a>
 <ul>
 <li><a href="#basics">Basics</a></li>
 <li><a href="#executing-queries">Executing Queries</a>
 <ul>
 <li><a href="#prerequisites">Prerequisites</a></li>
 <li><a href="#create-table">Create Table</a></li>
 <li><a href="#load-data-to-table">Load data to Table</a></li>
 <li><a href="#query-data-from-table">Query data from table</a></li>
 </ul>
 </li>
 </ul>
 </li>
 <li><a href="#carbon-sql-cli">Carbon SQL CLI</a>
 <ul>
 <li><a href="#basics">Basics</a></li>
 <li><a href="#execute-queries-in-cli">Execute Queries in CLI</a></li>
 </ul>
 </li>
 <li><a href="">Building CarbonData</a></li>
 </ul>
 <h2><a id="Installation_39"></a>Installation</h2>
 <ul>
 <li>Download released package of <a href="http://spark.apache.org/downloads.html">Spark 1.5.0 to 1.6.2</a></li>
 <li>Download and install <a href="http://thrift-tutorial.readthedocs.io/en/latest/installation.html">Apache Thrift 0.9.3</a>, make sure thrift is added to system path.</li>
 <li>Download <a href="https://github.com/apache/incubator-carbondata">Apache CarbonData code</a> and build it. Please visit <a href="Installing-CarbonData-And-IDE-Configuartion.md">Building CarbonData And IDE Configuration</a> for more information.</li>
 </ul>
 <h2><a id="Interactive_Analysis_with_CarbonSpark_Shell_44"></a>Interactive Analysis with Carbon-Spark Shell</h2>
 <p>Carbon Spark shell is a wrapper around Apache Spark Shell, it provides a simple way to learn the API, as well as a powerful tool to analyze data interactively. Please visit <a href="http://spark.apache.org/docs/latest/">Apache Spark Documentation</a> for more details on Spark shell.</p>
 <h4><a id="Basics_47"></a>Basics</h4>
 <p>Start Spark shell by running the following in the Carbon directory:</p>
 <pre><code>./bin/carbon-spark-shell
 </code></pre>
 <p><em>Note</em>: In this shell SparkContext is readily available as sc and CarbonContext is available as cc.</p>
 <p>CarbonData stores and writes the data in its specified format at the default location on the hdfs.<br>
 By default carbon.storelocation is set as :</p>
 <pre><code>hdfs://IP:PORT/Opt/CarbonStore
 </code></pre>
 <p>And you can provide your own store location by providing configuration using --conf option like:</p>
 <pre><code>./bin/carbon-spark-sql --conf spark.carbon.storepath=&lt;storelocation&gt;
 </code></pre>
 <h4><a id="Executing_Queries_64"></a>Executing Queries</h4>
 <p><strong>Prerequisites</strong></p>
 <p>Create sample.csv file in CarbonData directory. The CSV is required for loading data into Carbon.</p>
 <pre><code>$ cd carbondata
 $ cat &gt; sample.csv &lt;&lt; EOF
   id,name,city,age
   1,david,shenzhen,31
   2,eason,shenzhen,27
   3,jarry,wuhan,35
   EOF
 </code></pre>
 <p><strong>Create table</strong></p>
 <pre><code>scala&gt;cc.sql(&quot;create table if not exists test_table (id string, name string, city string, age Int) STORED BY 'carbondata'&quot;)
 </code></pre>
 <p><strong>Load data to table</strong></p>
 <pre><code>scala&gt;val dataFilePath = new File(&quot;../carbondata/sample.csv&quot;).getCanonicalPath
 scala&gt;cc.sql(s&quot;load data inpath '$dataFilePath' into table test_table&quot;)
 </code></pre>
 <p><strong>Query data from table</strong></p>
 <pre><code>scala&gt;cc.sql(&quot;select * from test_table&quot;).show
 scala&gt;cc.sql(&quot;select city, avg(age), sum(age) from test_table group by city&quot;).show
 </code></pre>
 <h2><a id="Carbon_SQL_CLI_98"></a>Carbon SQL CLI</h2>
 <p>The Carbon Spark SQL CLI is a wrapper around Apache Spark SQL CLI. It is a convenient tool to execute queries input from the command line. Please visit <a href="http://spark.apache.org/docs/latest/">Apache Spark Documentation</a> for more information Spark SQL CLI.</p>
 <h4><a id="Basics_101"></a>Basics</h4>
 <p>Start the Carbon Spark SQL CLI, run the following in the Carbon directory :</p>
 <pre><code>./bin/carbon-spark-sql
 </code></pre>
 <p>CarbonData stores and writes the data in its specified format at the default location on the hdfs.<br>
 By default carbon.storelocation is set as :</p>
 <pre><code>hdfs://IP:PORT/Opt/CarbonStore
 </code></pre>
 <p>And you can provide your own store location by providing configuration using --conf option like:</p>
 <pre><code>./bin/carbon-spark-sql --conf spark.carbon.storepath=/home/root/carbonstore
 </code></pre>
 <h4><a id="Execute_Queries_in_CLI_118"></a>Execute Queries in CLI</h4>
 <pre><code>spark-sql&gt; create table if not exists test_table (id string, name string, city string, age Int) STORED BY 'carbondata'
 spark-sql&gt; load data inpath '../sample.csv' into table test_table
 spark-sql&gt; select city, avg(age), sum(age) from test_table group by city
 </code></pre>
 <h2><a id="Building_CarbonData_124"></a>Building CarbonData</h2>
 <p>To get started, get CarbonData from the <a href="">downloads</a> on the <a href="http://carbondata.incubator.apache.org.">http://carbondata.incubator.apache.org.</a><br>
 CarbonData uses Hadoop’s client libraries for HDFS and YARN and Spark’s libraries. Downloads are pre-packaged for a handful of popular Spark versions.</p>
 <p>If you’d like to build CarbonData from source,  Please visit <a href="">Building CarbonData And IDE Configuration</a></p>

 </body></html>

	<!DOCTYPE html><html><head><meta charset="utf-8"><title>Untitled Document.md</title><style>
	p img{ max-width:100%}
	pre {
	width: 80% !important;
	white-space: normal;
	}
	</style>
	</head><body id="preview">
	<!--<p><!–<br>
	Licensed to the Apache Software Foundation (ASF) under one<br>
	or more contributor license agreements. See the NOTICE file<br>
	distributed with this work for additional information<br>
	regarding copyright ownership. The ASF licenses this file<br>
	to you under the Apache License, Version 2.0 (the<br>
	“License”); you may not use this file except in compliance<br>
	with the License. You may obtain a copy of the License at</p>
	<pre><code> http://www.apache.org/licenses/LICENSE-2.0

	Unless required by applicable law or agreed to in writing,
	software distributed under the License is distributed on an
	"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
	KIND, either express or implied. See the License for the
	specific language governing permissions and limitations
	under the License.
	</code></pre>
	<p>–></p>-->
	<p><img src="../docs/images/format/CarbonData_logo.png?raw=true" alt="CarbonData_Logo"></p>
	<h1><a id="Quick_Start_20"></a>Quick Start</h1>
	<p>This tutorial provides a quick introduction to using CarbonData.</p>
	<h2><a id="Getting_started_with_Apache_CarbonData_23"></a>Getting started with Apache CarbonData</h2>
	<ul>
	<li><a href="#installation">Installation</a></li>
	<li><a href="#InteractiveAnalysis-with-Carbon-Spark-Shell">Interactive Analysis with Carbon-Spark Shell</a>
	<ul>
	<li><a href="#basics">Basics</a></li>
	<li><a href="#executing-queries">Executing Queries</a>
	<ul>
	<li><a href="#prerequisites">Prerequisites</a></li>
	<li><a href="#create-table">Create Table</a></li>
	<li><a href="#load-data-to-table">Load data to Table</a></li>
	<li><a href="#query-data-from-table">Query data from table</a></li>
	</ul>
	</li>
	</ul>
	</li>
	<li><a href="#carbon-sql-cli">Carbon SQL CLI</a>
	<ul>
	<li><a href="#basics">Basics</a></li>
	<li><a href="#execute-queries-in-cli">Execute Queries in CLI</a></li>
	</ul>
	</li>
	<li><a href="">Building CarbonData</a></li>
	</ul>
	<h2><a id="Installation_39"></a>Installation</h2>
	<ul>
	<li>Download released package of <a href="http://spark.apache.org/downloads.html">Spark 1.5.0 to 1.6.2</a></li>
	<li>Download and install <a href="http://thrift-tutorial.readthedocs.io/en/latest/installation.html">Apache Thrift 0.9.3</a>, make sure thrift is added to system path.</li>
	<li>Download <a href="https://github.com/apache/incubator-carbondata">Apache CarbonData code</a> and build it. Please visit <a href="Installing-CarbonData-And-IDE-Configuartion.md">Building CarbonData And IDE Configuration</a> for more information.</li>
	</ul>
	<h2><a id="Interactive_Analysis_with_CarbonSpark_Shell_44"></a>Interactive Analysis with Carbon-Spark Shell</h2>
	<p>Carbon Spark shell is a wrapper around Apache Spark Shell, it provides a simple way to learn the API, as well as a powerful tool to analyze data interactively. Please visit <a href="http://spark.apache.org/docs/latest/">Apache Spark Documentation</a> for more details on Spark shell.</p>
	<h4><a id="Basics_47"></a>Basics</h4>
	<p>Start Spark shell by running the following in the Carbon directory:</p>
	<pre><code>./bin/carbon-spark-shell
	</code></pre>
	<p><em>Note</em>: In this shell SparkContext is readily available as sc and CarbonContext is available as cc.</p>
	<p>CarbonData stores and writes the data in its specified format at the default location on the hdfs.<br>
	By default carbon.storelocation is set as :</p>
	<pre><code>hdfs://IP:PORT/Opt/CarbonStore
	</code></pre>
	<p>And you can provide your own store location by providing configuration using --conf option like:</p>
	<pre><code>./bin/carbon-spark-sql --conf spark.carbon.storepath=<storelocation>
	</code></pre>
	<h4><a id="Executing_Queries_64"></a>Executing Queries</h4>
	<p><strong>Prerequisites</strong></p>
	<p>Create sample.csv file in CarbonData directory. The CSV is required for loading data into Carbon.</p>
	<pre><code>$ cd carbondata
	$ cat > sample.csv << EOF
	id,name,city,age
	1,david,shenzhen,31
	2,eason,shenzhen,27
	3,jarry,wuhan,35
	EOF
	</code></pre>
	<p><strong>Create table</strong></p>
	<pre><code>scala>cc.sql("create table if not exists test_table (id string, name string, city string, age Int) STORED BY 'carbondata'")
	</code></pre>
	<p><strong>Load data to table</strong></p>
	<pre><code>scala>val dataFilePath = new File("../carbondata/sample.csv").getCanonicalPath
	scala>cc.sql(s"load data inpath '$dataFilePath' into table test_table")
	</code></pre>
	<p><strong>Query data from table</strong></p>
	<pre><code>scala>cc.sql("select * from test_table").show
	scala>cc.sql("select city, avg(age), sum(age) from test_table group by city").show
	</code></pre>
	<h2><a id="Carbon_SQL_CLI_98"></a>Carbon SQL CLI</h2>
	<p>The Carbon Spark SQL CLI is a wrapper around Apache Spark SQL CLI. It is a convenient tool to execute queries input from the command line. Please visit <a href="http://spark.apache.org/docs/latest/">Apache Spark Documentation</a> for more information Spark SQL CLI.</p>
	<h4><a id="Basics_101"></a>Basics</h4>
	<p>Start the Carbon Spark SQL CLI, run the following in the Carbon directory :</p>
	<pre><code>./bin/carbon-spark-sql
	</code></pre>
	<p>CarbonData stores and writes the data in its specified format at the default location on the hdfs.<br>
	By default carbon.storelocation is set as :</p>
	<pre><code>hdfs://IP:PORT/Opt/CarbonStore
	</code></pre>
	<p>And you can provide your own store location by providing configuration using --conf option like:</p>
	<pre><code>./bin/carbon-spark-sql --conf spark.carbon.storepath=/home/root/carbonstore
	</code></pre>
	<h4><a id="Execute_Queries_in_CLI_118"></a>Execute Queries in CLI</h4>
	<pre><code>spark-sql> create table if not exists test_table (id string, name string, city string, age Int) STORED BY 'carbondata'
	spark-sql> load data inpath '../sample.csv' into table test_table
	spark-sql> select city, avg(age), sum(age) from test_table group by city
	</code></pre>
	<h2><a id="Building_CarbonData_124"></a>Building CarbonData</h2>
	<p>To get started, get CarbonData from the <a href="">downloads</a> on the <a href="http://carbondata.incubator.apache.org.">http://carbondata.incubator.apache.org.</a><br>
	CarbonData uses Hadoop’s client libraries for HDFS and YARN and Spark’s libraries. Downloads are pre-packaged for a handful of popular Spark versions.</p>
	<p>If you’d like to build CarbonData from source, Please visit <a href="">Building CarbonData And IDE Configuration</a></p>

	</body></html>