<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link href='images/favicon.ico' rel='shortcut icon' type='image/x-icon'>
<!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags -->
<title>CarbonData</title>
<style>
</style>
<!-- Bootstrap -->
<link rel="stylesheet" href="css/bootstrap.min.css">
<link href="css/style.css" rel="stylesheet">
<!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
<!-- WARNING: Respond.js doesn't work if you view the page via file:// -->
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.3/html5shiv.min.js"></script>
<script src="https://oss.maxcdn.scom/respond/1.4.2/respond.min.js"></script>
<![endif]-->
<script src="js/jquery.min.js"></script>
<script src="js/bootstrap.min.js"></script>
<script defer src="https://use.fontawesome.com/releases/v5.0.8/js/all.js"></script>
</head>
<body>
<header>
<nav class="navbar navbar-default navbar-custom cd-navbar-wrapper">
<div class="container">
<div class="navbar-header">
<button aria-controls="navbar" aria-expanded="false" data-target="#navbar" data-toggle="collapse"
class="navbar-toggle collapsed" type="button">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a href="index.html" class="logo">
<img src="images/CarbonDataLogo.png" alt="CarbonData logo" title="CarbocnData logo"/>
</a>
</div>
<div class="navbar-collapse collapse cd_navcontnt" id="navbar">
<ul class="nav navbar-nav navbar-right navlist-custom">
<li><a href="index.html" class="hidden-xs"><i class="fa fa-home" aria-hidden="true"></i> </a>
</li>
<li><a href="index.html" class="hidden-lg hidden-md hidden-sm">Home</a></li>
<li class="dropdown">
<a href="#" class="dropdown-toggle " data-toggle="dropdown" role="button" aria-haspopup="true"
aria-expanded="false"> Download <span class="caret"></span></a>
<ul class="dropdown-menu">
<li>
<a href="https://dist.apache.org/repos/dist/release/carbondata/2.2.0/"
target="_blank">Apache CarbonData 2.2.0</a></li>
<li>
<a href="https://dist.apache.org/repos/dist/release/carbondata/2.1.1/"
target="_blank">Apache CarbonData 2.1.1</a></li>
<li>
<a href="https://dist.apache.org/repos/dist/release/carbondata/2.1.0/"
target="_blank">Apache CarbonData 2.1.0</a></li>
<li>
<a href="https://dist.apache.org/repos/dist/release/carbondata/2.0.1/"
target="_blank">Apache CarbonData 2.0.1</a></li>
<li>
<a href="https://dist.apache.org/repos/dist/release/carbondata/2.0.0/"
target="_blank">Apache CarbonData 2.0.0</a></li>
<li>
<a href="https://dist.apache.org/repos/dist/release/carbondata/1.6.1/"
target="_blank">Apache CarbonData 1.6.1</a></li>
<li>
<a href="https://dist.apache.org/repos/dist/release/carbondata/1.6.0/"
target="_blank">Apache CarbonData 1.6.0</a></li>
<li>
<a href="https://dist.apache.org/repos/dist/release/carbondata/1.5.4/"
target="_blank">Apache CarbonData 1.5.4</a></li>
<li>
<a href="https://dist.apache.org/repos/dist/release/carbondata/1.5.3/"
target="_blank">Apache CarbonData 1.5.3</a></li>
<li>
<a href="https://dist.apache.org/repos/dist/release/carbondata/1.5.2/"
target="_blank">Apache CarbonData 1.5.2</a></li>
<li>
<a href="https://dist.apache.org/repos/dist/release/carbondata/1.5.1/"
target="_blank">Apache CarbonData 1.5.1</a></li>
<li>
<a href="https://cwiki.apache.org/confluence/display/CARBONDATA/Releases"
target="_blank">Release Archive</a></li>
</ul>
</li>
<li><a href="documentation.html" class="active">Documentation</a></li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true"
aria-expanded="false">Community <span class="caret"></span></a>
<ul class="dropdown-menu">
<li>
<a href="https://github.com/apache/carbondata/blob/master/docs/how-to-contribute-to-apache-carbondata.md"
target="_blank">Contributing to CarbonData</a></li>
<li>
<a href="https://github.com/apache/carbondata/blob/master/docs/release-guide.md"
target="_blank">Release Guide</a></li>
<li>
<a href="https://cwiki.apache.org/confluence/display/CARBONDATA/PMC+and+Committers+member+list"
target="_blank">Project PMC and Committers</a></li>
<li>
<a href="https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=66850609"
target="_blank">CarbonData Meetups</a></li>
<li><a href="security.html">Apache CarbonData Security</a></li>
<li><a href="https://issues.apache.org/jira/browse/CARBONDATA" target="_blank">Apache
Jira</a></li>
<li><a href="videogallery.html">CarbonData Videos </a></li>
</ul>
</li>
<li class="dropdown">
<a href="http://www.apache.org/" class="apache_link hidden-xs dropdown-toggle"
data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Apache</a>
<ul class="dropdown-menu">
<li><a href="http://www.apache.org/" target="_blank">Apache Homepage</a></li>
<li><a href="http://www.apache.org/licenses/" target="_blank">License</a></li>
<li><a href="http://www.apache.org/foundation/sponsorship.html"
target="_blank">Sponsorship</a></li>
<li><a href="http://www.apache.org/foundation/thanks.html" target="_blank">Thanks</a></li>
</ul>
</li>
<li class="dropdown">
<a href="http://www.apache.org/" class="hidden-lg hidden-md hidden-sm dropdown-toggle"
data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Apache</a>
<ul class="dropdown-menu">
<li><a href="http://www.apache.org/" target="_blank">Apache Homepage</a></li>
<li><a href="http://www.apache.org/licenses/" target="_blank">License</a></li>
<li><a href="http://www.apache.org/foundation/sponsorship.html"
target="_blank">Sponsorship</a></li>
<li><a href="http://www.apache.org/foundation/thanks.html" target="_blank">Thanks</a></li>
</ul>
</li>
<li>
<a href="#" id="search-icon"><i class="fa fa-search" aria-hidden="true"></i></a>
</li>
</ul>
</div><!--/.nav-collapse -->
<div id="search-box">
<form method="get" action="http://www.google.com/search" target="_blank">
<div class="search-block">
<table border="0" cellpadding="0" width="100%">
<tr>
<td style="width:80%">
<input type="text" name="q" size=" 5" maxlength="255" value=""
class="search-input" placeholder="Search...." required/>
</td>
<td style="width:20%">
<input type="submit" value="Search"/></td>
</tr>
<tr>
<td align="left" style="font-size:75%" colspan="2">
<input type="checkbox" name="sitesearch" value="carbondata.apache.org" checked/>
<span style=" position: relative; top: -3px;"> Only search for CarbonData</span>
</td>
</tr>
</table>
</div>
</form>
</div>
</div>
</nav>
</header> <!-- end Header part -->
<div class="fixed-padding"></div> <!-- top padding with fixde header -->
<section><!-- Dashboard nav -->
<div class="container-fluid q">
<div class="col-sm-12 col-md-12 maindashboard">
<div class="verticalnavbar">
<nav class="b-sticky-nav">
<div class="nav-scroller">
<div class="nav__inner">
<a class="b-nav__intro nav__item" href="./introduction.html">introduction</a>
<a class="b-nav__quickstart nav__item" href="./quick-start-guide.html">quick start</a>
<a class="b-nav__uses nav__item" href="./usecases.html">use cases</a>
<div class="nav__item nav__item__with__subs">
<a class="b-nav__docs nav__item nav__sub__anchor" href="./language-manual.html">Language Reference</a>
<a class="nav__item nav__sub__item" href="./ddl-of-carbondata.html">DDL</a>
<a class="nav__item nav__sub__item" href="./dml-of-carbondata.html">DML</a>
<a class="nav__item nav__sub__item" href="./streaming-guide.html">Streaming</a>
<a class="nav__item nav__sub__item" href="./configuration-parameters.html">Configuration</a>
<a class="nav__item nav__sub__item" href="./index-developer-guide.html">Indexes</a>
<a class="nav__item nav__sub__item" href="./supported-data-types-in-carbondata.html">Data Types</a>
</div>
<div class="nav__item nav__item__with__subs">
<a class="b-nav__datamap nav__item nav__sub__anchor" href="./index-management.html">Index Managament</a>
<a class="nav__item nav__sub__item" href="./bloomfilter-index-guide.html">Bloom Filter</a>
<a class="nav__item nav__sub__item" href="./lucene-index-guide.html">Lucene</a>
<a class="nav__item nav__sub__item" href="./secondary-index-guide.html">Secondary Index</a>
<a class="nav__item nav__sub__item" href="../spatial-index-guide.html">Spatial Index</a>
<a class="nav__item nav__sub__item" href="../mv-guide.html">MV</a>
</div>
<div class="nav__item nav__item__with__subs">
<a class="b-nav__api nav__item nav__sub__anchor" href="./sdk-guide.html">API</a>
<a class="nav__item nav__sub__item" href="./sdk-guide.html">Java SDK</a>
<a class="nav__item nav__sub__item" href="./csdk-guide.html">C++ SDK</a>
</div>
<a class="b-nav__perf nav__item" href="./performance-tuning.html">Performance Tuning</a>
<a class="b-nav__s3 nav__item" href="./s3-guide.html">S3 Storage</a>
<a class="b-nav__indexserver nav__item" href="./index-server.html">Index Server</a>
<a class="b-nav__prestodb nav__item" href="./prestodb-guide.html">PrestoDB Integration</a>
<a class="b-nav__prestosql nav__item" href="./prestosql-guide.html">PrestoSQL Integration</a>
<a class="b-nav__flink nav__item" href="./flink-integration-guide.html">Flink Integration</a>
<a class="b-nav__scd nav__item" href="./scd-and-cdc-guide.html">SCD & CDC</a>
<a class="b-nav__faq nav__item" href="./faq.html">FAQ</a>
<a class="b-nav__contri nav__item" href="./how-to-contribute-to-apache-carbondata.html">Contribute</a>
<a class="b-nav__security nav__item" href="./security.html">Security</a>
<a class="b-nav__release nav__item" href="./release-guide.html">Release Guide</a>
</div>
</div>
<div class="navindicator">
<div class="b-nav__intro navindicator__item"></div>
<div class="b-nav__quickstart navindicator__item"></div>
<div class="b-nav__uses navindicator__item"></div>
<div class="b-nav__docs navindicator__item"></div>
<div class="b-nav__datamap navindicator__item"></div>
<div class="b-nav__api navindicator__item"></div>
<div class="b-nav__perf navindicator__item"></div>
<div class="b-nav__s3 navindicator__item"></div>
<div class="b-nav__indexserver navindicator__item"></div>
<div class="b-nav__prestodb navindicator__item"></div>
<div class="b-nav__prestosql navindicator__item"></div>
<div class="b-nav__flink navindicator__item"></div>
<div class="b-nav__scd navindicator__item"></div>
<div class="b-nav__faq navindicator__item"></div>
<div class="b-nav__contri navindicator__item"></div>
<div class="b-nav__security navindicator__item"></div>
</div>
</nav>
</div>
<div class="mdcontent">
<section>
<div style="padding:10px 15px;">
<div id="viewpage" name="viewpage">
<div class="row">
<div class="col-sm-12 col-md-12">
<div>
<h1>
<a id="quick-start" class="anchor" href="#quick-start" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Quick Start</h1>
<p>This tutorial provides a quick introduction to using CarbonData. To follow along with this guide, download a packaged release of CarbonData from the <a href="https://dist.apache.org/repos/dist/release/carbondata/" target=_blank rel="nofollow">CarbonData website</a>. Alternatively, you can build the package yourself by following the <a href="https://github.com/apache/carbondata/tree/master/build" target=_blank>Building CarbonData</a> steps.</p>
<h2>
<a id="prerequisites" class="anchor" href="#prerequisites" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Prerequisites</h2>
<ul>
<li>
<p>CarbonData supports Spark versions up to 2.4. Please download the Spark package from the <a href="https://spark.apache.org/downloads.html" target=_blank rel="nofollow">Spark website</a>.</p>
</li>
<li>
<p>Create a sample.csv file using the following commands. The CSV file is required for loading data into CarbonData.</p>
<pre><code>cd carbondata
cat &gt; sample.csv &lt;&lt; EOF
id,name,city,age
1,david,shenzhen,31
2,eason,shenzhen,27
3,jarry,wuhan,35
EOF
</code></pre>
</li>
</ul>
<h2>
<a id="integration" class="anchor" href="#integration" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Integration</h2>
<h3>
<a id="integration-with-execution-engines" class="anchor" href="#integration-with-execution-engines" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Integration with Execution Engines</h3>
<p>CarbonData can be integrated with the Spark, Presto, Flink and Hive execution engines. The documentation below guides you through installing and configuring CarbonData with these execution engines.</p>
<h4>
<a id="spark" class="anchor" href="#spark" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Spark</h4>
<p><a href="#installing-and-configuring-carbondata-to-run-locally-with-spark-sql-cli">Installing and Configuring CarbonData to run locally with Spark SQL CLI</a></p>
<p><a href="#installing-and-configuring-carbondata-to-run-locally-with-spark-shell">Installing and Configuring CarbonData to run locally with Spark Shell</a></p>
<p><a href="#installing-and-configuring-carbondata-on-standalone-spark-cluster">Installing and Configuring CarbonData on Standalone Spark Cluster</a></p>
<p><a href="#installing-and-configuring-carbondata-on-spark-on-yarn-cluster">Installing and Configuring CarbonData on Spark on YARN Cluster</a></p>
<p><a href="#query-execution-using-carbondata-thrift-server">Installing and Configuring CarbonData Thrift Server for Query Execution</a></p>
<h4>
<a id="presto" class="anchor" href="#presto" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Presto</h4>
<p><a href="#installing-and-configuring-carbondata-on-presto">Installing and Configuring CarbonData on Presto</a></p>
<h4>
<a id="hive" class="anchor" href="#hive" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Hive</h4>
<p><a href="./hive-guide.html">Installing and Configuring CarbonData on Hive</a></p>
<h3>
<a id="integration-with-storage-engines" class="anchor" href="#integration-with-storage-engines" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Integration with Storage Engines</h3>
<h4>
<a id="hdfs" class="anchor" href="#hdfs" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>HDFS</h4>
<p><a href="#installing-and-configuring-carbondata-on-standalone-spark-cluster">CarbonData supports read and write with HDFS</a></p>
<h4>
<a id="s3" class="anchor" href="#s3" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>S3</h4>
<p><a href="./s3-guide.html">CarbonData supports read and write with S3</a></p>
<h4>
<a id="alluxio" class="anchor" href="#alluxio" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Alluxio</h4>
<p><a href="./alluxio-guide.html">CarbonData supports read and write with Alluxio</a></p>
<h2>
<a id="installing-and-configuring-carbondata-to-run-locally-with-spark-sql-cli" class="anchor" href="#installing-and-configuring-carbondata-to-run-locally-with-spark-sql-cli" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Installing and Configuring CarbonData to run locally with Spark SQL CLI</h2>
<p>This works with Spark 2.3 and later versions. The Spark SQL CLI uses CarbonExtensions to customize the SparkSession with CarbonData's parser, analyzer, optimizer and physical planning strategy rules in Spark.
To enable CarbonExtensions, add the following configuration.</p>
<table>
<thead>
<tr>
<th>Key</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>spark.sql.extensions</td>
<td>org.apache.spark.sql.CarbonExtensions</td>
</tr>
</tbody>
</table>
<p>Start Spark SQL CLI by running the following command in the Spark directory:</p>
<pre><code>./bin/spark-sql --conf spark.sql.extensions=org.apache.spark.sql.CarbonExtensions --jars &lt;carbondata assembly jar path&gt;
</code></pre>
<h6>
<a id="creating-a-table" class="anchor" href="#creating-a-table" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Creating a Table</h6>
<pre><code>CREATE TABLE IF NOT EXISTS test_table (
id string,
name string,
city string,
age Int)
STORED AS carbondata;
</code></pre>
<p><strong>NOTE</strong>: CarbonExtensions only supports the "STORED AS carbondata" and "USING carbondata" syntaxes.</p>
<h6>
<a id="loading-data-to-a-table" class="anchor" href="#loading-data-to-a-table" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Loading Data to a Table</h6>
<pre><code>LOAD DATA INPATH '/local-path/sample.csv' INTO TABLE test_table;
LOAD DATA INPATH 'hdfs://hdfs-path/sample.csv' INTO TABLE test_table;
</code></pre>
<pre><code>insert into table test_table select '1', 'name1', 'city1', 1;
</code></pre>
<p><strong>NOTE</strong>: Please provide the real file path of <code>sample.csv</code> for the above script.
If you get a "tablestatus.lock" issue, please refer to the <a href="faq.html">FAQ</a>.</p>
<h6>
<a id="query-data-from-a-table" class="anchor" href="#query-data-from-a-table" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Query Data from a Table</h6>
<pre><code>SELECT * FROM test_table;
</code></pre>
<pre><code>SELECT city, avg(age), sum(age)
FROM test_table
GROUP BY city;
</code></pre>
<h2>
<a id="installing-and-configuring-carbondata-to-run-locally-with-spark-shell" class="anchor" href="#installing-and-configuring-carbondata-to-run-locally-with-spark-shell" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Installing and Configuring CarbonData to run locally with Spark Shell</h2>
<p>Apache Spark Shell provides a simple way to learn the API, as well as a powerful tool to analyze data interactively. Please visit <a href="http://spark.apache.org/docs/latest/" target=_blank rel="nofollow">Apache Spark Documentation</a> for more details on the Spark shell.</p>
<h4>
<a id="basics" class="anchor" href="#basics" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Basics</h4>
<h6>
<a id="option-1-using-carbonsession-deprecated-since-20" class="anchor" href="#option-1-using-carbonsession-deprecated-since-20" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Option 1: Using CarbonSession (deprecated since 2.0)</h6>
<p>Start Spark shell by running the following command in the Spark directory:</p>
<pre><code>./bin/spark-shell --jars &lt;carbondata assembly jar path&gt;
</code></pre>
<p><strong>NOTE</strong>: Use the path where the packaged release of CarbonData was downloaded, or the assembly jar that is available after <a href="https://github.com/apache/carbondata/blob/master/build/README.md" target=_blank>building CarbonData</a>, which can be copied from <code>./assembly/target/scala-2.1x/apache-carbondata_xxx.jar</code>.</p>
<p>In this shell, SparkSession is readily available as <code>spark</code> and Spark context is readily available as <code>sc</code>.</p>
<p>In order to create a CarbonSession, we have to configure it explicitly in the following manner:</p>
<ul>
<li>Import the following:</li>
</ul>
<pre><code>import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._
</code></pre>
<ul>
<li>Create a CarbonSession:</li>
</ul>
<pre><code>val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("&lt;carbon_store_path&gt;")
</code></pre>
<p><strong>NOTE</strong></p>
<ul>
<li>By default, the metastore location points to <code>../carbon.metastore</code>. Users can provide their own metastore location to CarbonSession, for example
<code>SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("&lt;carbon_store_path&gt;", "&lt;local metastore path&gt;")</code>.</li>
<li>The data storage location can be specified by <code>&lt;carbon_store_path&gt;</code>, for example <code>/carbon/data/store</code>, <code>hdfs://localhost:9000/carbon/data/store</code> or <code>s3a://carbon/data/store</code>.</li>
</ul>
<h6>
<a id="option-2-using-sparksession-with-carbonextensions" class="anchor" href="#option-2-using-sparksession-with-carbonextensions" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Option 2: Using SparkSession with CarbonExtensions</h6>
<p>Start Spark shell by running the following command in the Spark directory:</p>
<pre><code>./bin/spark-shell --conf spark.sql.extensions=org.apache.spark.sql.CarbonExtensions --jars &lt;carbondata assembly jar path&gt;
</code></pre>
<p><strong>NOTE</strong></p>
<ul>
<li>In this flow, we can use the built-in SparkSession <code>spark</code> instead of <code>carbon</code>.
We can also create a new SparkSession instead of the built-in SparkSession <code>spark</code> if needed.
In that case, add "org.apache.spark.sql.CarbonExtensions" to the Spark configuration "spark.sql.extensions":
<pre><code>val newSpark = SparkSession
  .builder()
  .config(sc.getConf)
  .enableHiveSupport()
  .config("spark.sql.extensions", "org.apache.spark.sql.CarbonExtensions")
  .getOrCreate()
</code></pre>
</li>
<li>The data storage location can be specified by "spark.sql.warehouse.dir"; see the sketch below.</li>
</ul>
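<p>For illustration, a minimal sketch that combines the CarbonExtensions setting with an explicit warehouse location when building the session (the HDFS path is an assumption; point it at your own storage location):</p>
<pre><code>import org.apache.spark.sql.SparkSession

val newSpark = SparkSession
  .builder()
  .config(sc.getConf)
  .enableHiveSupport()
  .config("spark.sql.extensions", "org.apache.spark.sql.CarbonExtensions")
  .config("spark.sql.warehouse.dir", "hdfs://localhost:9000/carbon/warehouse") // assumed path
  .getOrCreate()
</code></pre>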
<h4>
<a id="executing-queries" class="anchor" href="#executing-queries" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Executing Queries</h4>
<h6>
<a id="creating-a-table-1" class="anchor" href="#creating-a-table-1" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Creating a Table</h6>
<pre><code>carbon.sql(
s"""
| CREATE TABLE IF NOT EXISTS test_table(
| id string,
| name string,
| city string,
| age Int)
| STORED AS carbondata
""".stripMargin)
</code></pre>
<p><strong>NOTE</strong>:
The following table lists all supported syntaxes:</p>
<table>
<thead>
<tr>
<th>create table</th>
<th>SparkSession with CarbonExtensions</th>
<th>CarbonSession</th>
</tr>
</thead>
<tbody>
<tr>
<td>STORED AS carbondata</td>
<td>yes</td>
<td>yes</td>
</tr>
<tr>
<td>USING carbondata</td>
<td>yes</td>
<td>yes</td>
</tr>
<tr>
<td>STORED BY 'carbondata'</td>
<td>no</td>
<td>yes</td>
</tr>
<tr>
<td>STORED BY 'org.apache.carbondata.format'</td>
<td>no</td>
<td>yes</td>
</tr>
</tbody>
</table>
<p>We suggest using CarbonExtensions instead of CarbonSession.</p>
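<p>For illustration, the same table can also be declared with the <code>USING</code> syntax, which both flows support (a minimal sketch; the table name is hypothetical):</p>
<pre><code>carbon.sql(
  s"""
     | CREATE TABLE IF NOT EXISTS test_table_using(
     |   id string,
     |   name string,
     |   city string,
     |   age Int)
     | USING carbondata
   """.stripMargin)
</code></pre>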
<h6>
<a id="loading-data-to-a-table-1" class="anchor" href="#loading-data-to-a-table-1" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Loading Data to a Table</h6>
<pre><code>carbon.sql("LOAD DATA INPATH '/path/to/sample.csv' INTO TABLE test_table")
</code></pre>
<p><strong>NOTE</strong>: Please provide the real file path of <code>sample.csv</code> for the above script.
If you get a "tablestatus.lock" issue, please refer to the <a href="faq.html">FAQ</a>.</p>
<h6>
<a id="query-data-from-a-table-1" class="anchor" href="#query-data-from-a-table-1" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Query Data from a Table</h6>
<pre><code>carbon.sql("SELECT * FROM test_table").show()
carbon.sql(
s"""
| SELECT city, avg(age), sum(age)
| FROM test_table
| GROUP BY city
""".stripMargin).show()
</code></pre>
<h2>
<a id="installing-and-configuring-carbondata-on-standalone-spark-cluster" class="anchor" href="#installing-and-configuring-carbondata-on-standalone-spark-cluster" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Installing and Configuring CarbonData on Standalone Spark Cluster</h2>
<h3>
<a id="prerequisites-1" class="anchor" href="#prerequisites-1" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Prerequisites</h3>
<ul>
<li>Hadoop HDFS and Yarn should be installed and running.</li>
<li>Spark should be installed and running on all the cluster nodes.</li>
<li>CarbonData user should have permission to access HDFS.</li>
</ul>
<h3>
<a id="procedure" class="anchor" href="#procedure" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Procedure</h3>
<ol>
<li>
<p><a href="https://github.com/apache/carbondata/blob/master/build/README.md" target=_blank>Build the CarbonData</a> project and get the assembly jar from <code>./assembly/target/scala-2.1x/apache-carbondata_xxx.jar</code>.</p>
</li>
<li>
<p>Copy <code>./assembly/target/scala-2.1x/apache-carbondata_xxx.jar</code> to <code>$SPARK_HOME/carbonlib</code> folder.</p>
<p><strong>NOTE</strong>: Create the carbonlib folder if it does not exist inside <code>$SPARK_HOME</code> path.</p>
</li>
<li>
<p>Add the carbonlib folder path to the Spark classpath. (Edit the <code>$SPARK_HOME/conf/spark-env.sh</code> file and modify the value of <code>SPARK_CLASSPATH</code> by appending <code>$SPARK_HOME/carbonlib/*</code> to the existing value.)</p>
</li>
<li>
<p>Copy the <code>./conf/carbon.properties.template</code> file from CarbonData repository to <code>$SPARK_HOME/conf/</code> folder and rename the file to <code>carbon.properties</code>.</p>
</li>
<li>
<p>Repeat Step 2 to Step 4 on all the nodes of the cluster.</p>
</li>
<li>
<p>On the Spark master node, configure the properties mentioned in the following table in the <code>$SPARK_HOME/conf/spark-defaults.conf</code> file.</p>
</li>
</ol>
<table>
<thead>
<tr>
<th>Property</th>
<th>Value</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>spark.driver.extraJavaOptions</td>
<td><code>-Dcarbon.properties.filepath=$SPARK_HOME/conf/carbon.properties</code></td>
<td>A string of extra JVM options to pass to the driver. For instance, GC settings or other logging.</td>
</tr>
<tr>
<td>spark.executor.extraJavaOptions</td>
<td><code>-Dcarbon.properties.filepath=$SPARK_HOME/conf/carbon.properties</code></td>
<td>A string of extra JVM options to pass to executors. For instance, GC settings or other logging. <strong>NOTE</strong>: You can enter multiple values separated by space.</td>
</tr>
</tbody>
</table>
<ol start="7">
<li>Verify the installation. For example:</li>
</ol>
<pre><code>./bin/spark-shell \
--master spark://HOSTNAME:PORT \
--total-executor-cores 2 \
--executor-memory 2G
</code></pre>
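<p>Once the shell is up, a quick way to confirm the installation is a small round trip through a CarbonData table (a minimal sketch, assuming CarbonExtensions is enabled, for example via <code>spark.sql.extensions=org.apache.spark.sql.CarbonExtensions</code> in <code>spark-defaults.conf</code>; the table name and data are illustrative only):</p>
<pre><code>// Create, load and read back a tiny CarbonData table to verify the setup.
// Assumes spark.sql.extensions=org.apache.spark.sql.CarbonExtensions is configured.
spark.sql("CREATE TABLE IF NOT EXISTS verify_carbon (id INT, name STRING) STORED AS carbondata")
spark.sql("INSERT INTO verify_carbon SELECT 1, 'carbon'")
spark.sql("SELECT * FROM verify_carbon").show()
</code></pre>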
<p><strong>NOTE</strong>:</p>
<ul>
<li>property "carbon.storelocation" is deprecated in carbondata 2.0 version. Only the users who used this property in previous versions can still use it in carbon 2.0 version.</li>
<li>Make sure you have permissions for CarbonData JARs and files through which driver and executor will start.</li>
</ul>
<h2>
<a id="installing-and-configuring-carbondata-on-spark-on-yarn-cluster" class="anchor" href="#installing-and-configuring-carbondata-on-spark-on-yarn-cluster" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Installing and Configuring CarbonData on Spark on YARN Cluster</h2>
<p>This section provides the procedure to install CarbonData on a "Spark on YARN" cluster.</p>
<h3>
<a id="prerequisites-2" class="anchor" href="#prerequisites-2" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Prerequisites</h3>
<ul>
<li>Hadoop HDFS and Yarn should be installed and running.</li>
<li>Spark should be installed and running on all the client nodes.</li>
<li>CarbonData user should have permission to access HDFS.</li>
</ul>
<h3>
<a id="procedure-1" class="anchor" href="#procedure-1" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Procedure</h3>
<p>The following steps are required only on driver nodes. (Driver nodes are the nodes that start the Spark context.)</p>
<ol>
<li>
<p><a href="https://github.com/apache/carbondata/blob/master/build/README.md" target=_blank>Build the CarbonData</a> project and get the assembly jar from <code>./assembly/target/scala-2.1x/apache-carbondata_xxx.jar</code> and copy to <code>$SPARK_HOME/carbonlib</code> folder.</p>
<p><strong>NOTE</strong>: Create the carbonlib folder if it does not exists inside <code>$SPARK_HOME</code> path.</p>
</li>
<li>
<p>Copy the <code>./conf/carbon.properties.template</code> file from CarbonData repository to <code>$SPARK_HOME/conf/</code> folder and rename the file to <code>carbon.properties</code>.</p>
</li>
<li>
<p>Create a <code>tar.gz</code> archive of the carbonlib folder and move it into the carbonlib folder.</p>
</li>
</ol>
<pre><code>cd $SPARK_HOME
tar -zcvf carbondata.tar.gz carbonlib/
mv carbondata.tar.gz carbonlib/
</code></pre>
<ol start="4">
<li>Configure the properties mentioned in the following table in <code>$SPARK_HOME/conf/spark-defaults.conf</code> file.</li>
</ol>
<table>
<thead>
<tr>
<th>Property</th>
<th>Description</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>spark.master</td>
<td>Set this value to run Spark on a YARN cluster.</td>
<td>Set to yarn-client to run Spark on YARN in client mode.</td>
</tr>
<tr>
<td>spark.yarn.dist.files</td>
<td>Comma-separated list of files to be placed in the working directory of each executor.</td>
<td><code>$SPARK_HOME/conf/carbon.properties</code></td>
</tr>
<tr>
<td>spark.yarn.dist.archives</td>
<td>Comma-separated list of archives to be extracted into the working directory of each executor.</td>
<td><code>$SPARK_HOME/carbonlib/carbondata.tar.gz</code></td>
</tr>
<tr>
<td>spark.executor.extraJavaOptions</td>
<td>A string of extra JVM options to pass to executors. For instance, GC settings or other logging. <strong>NOTE</strong>: You can enter multiple values separated by space.</td>
<td><code>-Dcarbon.properties.filepath=carbon.properties</code></td>
</tr>
<tr>
<td>spark.executor.extraClassPath</td>
<td>Extra classpath entries to prepend to the classpath of executors. <strong>NOTE</strong>: If SPARK_CLASSPATH is defined in spark-env.sh, comment it out and append its values to the parameter spark.driver.extraClassPath below.</td>
<td><code>carbondata.tar.gz/carbonlib/*</code></td>
</tr>
<tr>
<td>spark.driver.extraClassPath</td>
<td>Extra classpath entries to prepend to the classpath of the driver. <strong>NOTE</strong>: If SPARK_CLASSPATH is defined in spark-env.sh, comment it out and append its value to this parameter (spark.driver.extraClassPath).</td>
<td><code>$SPARK_HOME/carbonlib/*</code></td>
</tr>
<tr>
<td>spark.driver.extraJavaOptions</td>
<td>A string of extra JVM options to pass to the driver. For instance, GC settings or other logging.</td>
<td><code>-Dcarbon.properties.filepath=$SPARK_HOME/conf/carbon.properties</code></td>
</tr>
</tbody>
</table>
<ol start="5">
<li>Verify the installation.</li>
</ol>
<pre><code>./bin/spark-shell \
--master yarn-client \
--driver-memory 1G \
--executor-memory 2G \
--executor-cores 2
</code></pre>
<p><strong>NOTE</strong>:</p>
<ul>
<li>property "carbon.storelocation" is deprecated in carbondata 2.0 version. Only the users who used this property in previous versions can still use it in carbon 2.0 version.</li>
<li>Make sure you have permissions for CarbonData JARs and files through which driver and executor will start.</li>
<li>If use Spark + Hive 1.1.X, it needs to add carbondata assembly jar and carbondata-hive jar into parameter 'spark.sql.hive.metastore.jars' in spark-default.conf file.</li>
</ul>
<h2>
<a id="query-execution-using-carbondata-thrift-server" class="anchor" href="#query-execution-using-carbondata-thrift-server" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Query Execution Using CarbonData Thrift Server</h2>
<h3>
<a id="starting-carbondata-thrift-server" class="anchor" href="#starting-carbondata-thrift-server" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Starting CarbonData Thrift Server.</h3>
<p>a. cd <code>$SPARK_HOME</code></p>
<p>b. Run the following command to start the CarbonData thrift server.</p>
<pre><code>./bin/spark-submit \
--class org.apache.carbondata.spark.thriftserver.CarbonThriftServer \
$SPARK_HOME/carbonlib/$CARBON_ASSEMBLY_JAR
</code></pre>
<table>
<thead>
<tr>
<th>Parameter</th>
<th>Description</th>
<th>Example</th>
</tr>
</thead>
<tbody>
<tr>
<td>CARBON_ASSEMBLY_JAR</td>
<td>CarbonData assembly jar name present in the <code>$SPARK_HOME/carbonlib/</code> folder.</td>
<td>apache-carbondata-xx.jar</td>
</tr>
</tbody>
</table>
<p>c. Run the following command to work with S3 storage.</p>
<pre><code>./bin/spark-submit \
--class org.apache.carbondata.spark.thriftserver.CarbonThriftServer \
$SPARK_HOME/carbonlib/$CARBON_ASSEMBLY_JAR &lt;access_key&gt; &lt;secret_key&gt; &lt;endpoint&gt;
</code></pre>
<table>
<thead>
<tr>
<th>Parameter</th>
<th>Description</th>
<th>Example</th>
</tr>
</thead>
<tbody>
<tr>
<td>CARBON_ASSEMBLY_JAR</td>
<td>CarbonData assembly jar name present in the <code>$SPARK_HOME/carbonlib/</code> folder.</td>
<td>apache-carbondata-xx.jar</td>
</tr>
<tr>
<td>access_key</td>
<td>Access key for S3 storage</td>
<td></td>
</tr>
<tr>
<td>secret_key</td>
<td>Secret key for S3 storage</td>
<td></td>
</tr>
<tr>
<td>endpoint</td>
<td>Endpoint for connecting to S3 storage</td>
<td></td>
</tr>
</tbody>
</table>
<p><strong>NOTE</strong>: From Spark 1.6 onward, the Thrift server runs in multi-session mode by default, which means each JDBC/ODBC connection owns its own copy of the SQL configuration and temporary function registry. Cached tables are still shared. If you prefer to run the Thrift server in single-session mode and share all SQL configuration and the temporary function registry, set the option <code>spark.sql.hive.thriftServer.singleSession</code> to <code>true</code>. You may either add this option to <code>spark-defaults.conf</code>, or pass it to <code>spark-submit</code> via <code>--conf</code>:</p>
<pre><code>./bin/spark-submit \
--conf spark.sql.hive.thriftServer.singleSession=true \
--class org.apache.carbondata.spark.thriftserver.CarbonThriftServer \
$SPARK_HOME/carbonlib/$CARBON_ASSEMBLY_JAR
</code></pre>
<p><strong>But</strong> note that in single-session mode, if one user changes the database on one connection, the database of all other connections changes as well.</p>
<p><strong>Examples</strong></p>
<ul>
<li>Start with default memory and executors.</li>
</ul>
<pre><code>./bin/spark-submit \
--class org.apache.carbondata.spark.thriftserver.CarbonThriftServer \
$SPARK_HOME/carbonlib/apache-carbondata-xxx.jar
</code></pre>
<ul>
<li>Start with fixed executors and resources.</li>
</ul>
<pre><code>./bin/spark-submit \
--class org.apache.carbondata.spark.thriftserver.CarbonThriftServer \
--num-executors 3 \
--driver-memory 20G \
--executor-memory 250G \
--executor-cores 32 \
$SPARK_HOME/carbonlib/apache-carbondata-xxx.jar
</code></pre>
<h3>
<a id="connecting-to-carbondata-thrift-server-using-beeline" class="anchor" href="#connecting-to-carbondata-thrift-server-using-beeline" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Connecting to CarbonData Thrift Server Using Beeline.</h3>
<pre><code>cd $SPARK_HOME
./sbin/start-thriftserver.sh
./bin/beeline -u jdbc:hive2://&lt;thriftserver_host&gt;:port
Example
./bin/beeline -u jdbc:hive2://10.10.10.10:10000
</code></pre>
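<p>If you prefer to connect to the Thrift Server programmatically instead of through Beeline, a minimal JDBC sketch in Scala could look like the following (assumes the Hive JDBC driver is on the classpath and the server listens on the default port 10000; host, table and query are illustrative only):</p>
<pre><code>import java.sql.DriverManager

// Register the Hive JDBC driver and open a connection to the Thrift Server
Class.forName("org.apache.hive.jdbc.HiveDriver")
val conn = DriverManager.getConnection("jdbc:hive2://10.10.10.10:10000", "", "")
val stmt = conn.createStatement()

// Run a sample aggregation against a CarbonData table
val rs = stmt.executeQuery("SELECT city, avg(age) FROM test_table GROUP BY city")
while (rs.next()) {
  println(s"${rs.getString(1)}: ${rs.getDouble(2)}")
}

rs.close(); stmt.close(); conn.close()
</code></pre>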
<h2>
<a id="installing-and-configuring-carbondata-on-presto" class="anchor" href="#installing-and-configuring-carbondata-on-presto" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Installing and Configuring CarbonData on Presto</h2>
<p><strong>NOTE:</strong> <strong>CarbonData tables cannot be created or loaded from Presto. Users need to create a CarbonData table and load data into it
either with <a href="#installing-and-configuring-carbondata-to-run-locally-with-spark-shell">Spark</a>, the <a href="./sdk-guide.html">SDK</a> or the <a href="./csdk-guide.html">C++ SDK</a>.
Once the table is created, it can be queried from Presto.</strong></p>
<p>Please refer to the Presto guides linked below.</p>
<p>PrestoDB guide - <a href="./prestodb-guide.html">prestodb</a></p>
<p>PrestoSQL guide - <a href="./prestosql-guide.html">prestosql</a></p>
<p>Once Presto with CarbonData is installed as per the above guides,
you can use the Presto CLI on the coordinator to query data sources in the catalog using the Presto workers.</p>
<p>List the available schemas (databases):</p>
<pre><code>show schemas;
</code></pre>
<p>Select the schema where the CarbonData table resides:</p>
<pre><code>use carbonschema;
</code></pre>
<p>List the available tables:</p>
<pre><code>show tables;
</code></pre>
<p>Query the available tables:</p>
<pre><code>select * from carbon_table;
</code></pre>
<p><strong>Note:</strong> Table creation and data loading should be done before executing queries, as CarbonData tables cannot be created from this Presto interface.</p>
<script>
// Show selected style on nav item
$(function() { $('.b-nav__quickstart').addClass('selected'); });
</script></div>
</div>
</div>
</div>
<div class="doc-footer">
<a href="#top" class="scroll-top">Top</a>
</div>
</div>
</section>
</div>
</div>
</div>
</section><!-- End systemblock part -->
<script src="js/custom.js"></script>
</body>
</html>