blob: e4f1c915bd2ca4b6e21523b80a5c9ede63659f6b [file] [log] [blame]
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link href='images/favicon.ico' rel='shortcut icon' type='image/x-icon'>
<!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags -->
<title>CarbonData</title>
<style>
</style>
<!-- Bootstrap -->
<link rel="stylesheet" href="css/bootstrap.min.css">
<link href="css/style.css" rel="stylesheet">
<!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
<!-- WARNING: Respond.js doesn't work if you view the page via file:// -->
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.3/html5shiv.min.js"></script>
<script src="https://oss.maxcdn.scom/respond/1.4.2/respond.min.js"></script>
<![endif]-->
<script src="js/jquery.min.js"></script>
<script src="js/bootstrap.min.js"></script>
<script defer src="https://use.fontawesome.com/releases/v5.0.8/js/all.js"></script>
</head>
<body>
<header>
<nav class="navbar navbar-default navbar-custom cd-navbar-wrapper">
<div class="container">
<div class="navbar-header">
<button aria-controls="navbar" aria-expanded="false" data-target="#navbar" data-toggle="collapse"
class="navbar-toggle collapsed" type="button">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a href="index.html" class="logo">
<img src="images/CarbonDataLogo.png" alt="CarbonData logo" title="CarbocnData logo"/>
</a>
</div>
<div class="navbar-collapse collapse cd_navcontnt" id="navbar">
<ul class="nav navbar-nav navbar-right navlist-custom">
<li><a href="index.html" class="hidden-xs"><i class="fa fa-home" aria-hidden="true"></i> </a>
</li>
<li><a href="index.html" class="hidden-lg hidden-md hidden-sm">Home</a></li>
<li class="dropdown">
<a href="#" class="dropdown-toggle " data-toggle="dropdown" role="button" aria-haspopup="true"
aria-expanded="false"> Download <span class="caret"></span></a>
<ul class="dropdown-menu">
<li>
<a href="https://dist.apache.org/repos/dist/release/carbondata/1.5.0/"
target="_blank">Apache CarbonData 1.5.0</a></li>
<li>
<a href="https://dist.apache.org/repos/dist/release/carbondata/1.4.1/"
target="_blank">Apache CarbonData 1.4.1</a></li>
<li>
<a href="https://dist.apache.org/repos/dist/release/carbondata/1.4.0/"
target="_blank">Apache CarbonData 1.4.0</a></li>
<li>
<a href="https://dist.apache.org/repos/dist/release/carbondata/1.3.1/"
target="_blank">Apache CarbonData 1.3.1</a></li>
<li>
<a href="https://dist.apache.org/repos/dist/release/carbondata/1.3.0/"
target="_blank">Apache CarbonData 1.3.0</a></li>
<li>
<a href="https://cwiki.apache.org/confluence/display/CARBONDATA/Releases"
target="_blank">Release Archive</a></li>
</ul>
</li>
<li><a href="documentation.html" class="active">Documentation</a></li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true"
aria-expanded="false">Community <span class="caret"></span></a>
<ul class="dropdown-menu">
<li>
<a href="https://github.com/apache/carbondata/blob/master/docs/how-to-contribute-to-apache-carbondata.md"
target="_blank">Contributing to CarbonData</a></li>
<li>
<a href="https://github.com/apache/carbondata/blob/master/docs/release-guide.md"
target="_blank">Release Guide</a></li>
<li>
<a href="https://cwiki.apache.org/confluence/display/CARBONDATA/PMC+and+Committers+member+list"
target="_blank">Project PMC and Committers</a></li>
<li>
<a href="https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=66850609"
target="_blank">CarbonData Meetups</a></li>
<li><a href="security.html">Apache CarbonData Security</a></li>
<li><a href="https://issues.apache.org/jira/browse/CARBONDATA" target="_blank">Apache
Jira</a></li>
<li><a href="videogallery.html">CarbonData Videos </a></li>
</ul>
</li>
<li class="dropdown">
<a href="http://www.apache.org/" class="apache_link hidden-xs dropdown-toggle"
data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Apache</a>
<ul class="dropdown-menu">
<li><a href="http://www.apache.org/" target="_blank">Apache Homepage</a></li>
<li><a href="http://www.apache.org/licenses/" target="_blank">License</a></li>
<li><a href="http://www.apache.org/foundation/sponsorship.html"
target="_blank">Sponsorship</a></li>
<li><a href="http://www.apache.org/foundation/thanks.html" target="_blank">Thanks</a></li>
</ul>
</li>
<li class="dropdown">
<a href="http://www.apache.org/" class="hidden-lg hidden-md hidden-sm dropdown-toggle"
data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Apache</a>
<ul class="dropdown-menu">
<li><a href="http://www.apache.org/" target="_blank">Apache Homepage</a></li>
<li><a href="http://www.apache.org/licenses/" target="_blank">License</a></li>
<li><a href="http://www.apache.org/foundation/sponsorship.html"
target="_blank">Sponsorship</a></li>
<li><a href="http://www.apache.org/foundation/thanks.html" target="_blank">Thanks</a></li>
</ul>
</li>
<li>
<a href="#" id="search-icon"><i class="fa fa-search" aria-hidden="true"></i></a>
</li>
</ul>
</div><!--/.nav-collapse -->
<div id="search-box">
<form method="get" action="http://www.google.com/search" target="_blank">
<div class="search-block">
<table border="0" cellpadding="0" width="100%">
<tr>
<td style="width:80%">
<input type="text" name="q" size=" 5" maxlength="255" value=""
class="search-input" placeholder="Search...." required/>
</td>
<td style="width:20%">
<input type="submit" value="Search"/></td>
</tr>
<tr>
<td align="left" style="font-size:75%" colspan="2">
<input type="checkbox" name="sitesearch" value="carbondata.apache.org" checked/>
<span style=" position: relative; top: -3px;"> Only search for CarbonData</span>
</td>
</tr>
</table>
</div>
</form>
</div>
</div>
</nav>
</header> <!-- end Header part -->
<div class="fixed-padding"></div> <!-- top padding with fixde header -->
<section><!-- Dashboard nav -->
<div class="container-fluid q">
<div class="col-sm-12 col-md-12 maindashboard">
<div class="verticalnavbar">
<nav class="b-sticky-nav">
<div class="nav-scroller">
<div class="nav__inner">
<a class="b-nav__intro nav__item" href="./introduction.html">introduction</a>
<a class="b-nav__quickstart nav__item" href="./quick-start-guide.html">quick start</a>
<a class="b-nav__uses nav__item" href="./usecases.html">use cases</a>
<div class="nav__item nav__item__with__subs">
<a class="b-nav__docs nav__item nav__sub__anchor" href="./language-manual.html">Language Reference</a>
<a class="nav__item nav__sub__item" href="./ddl-of-carbondata.html">DDL</a>
<a class="nav__item nav__sub__item" href="./dml-of-carbondata.html">DML</a>
<a class="nav__item nav__sub__item" href="./streaming-guide.html">Streaming</a>
<a class="nav__item nav__sub__item" href="./configuration-parameters.html">Configuration</a>
<a class="nav__item nav__sub__item" href="./datamap-developer-guide.html">Datamaps</a>
<a class="nav__item nav__sub__item" href="./supported-data-types-in-carbondata.html">Data Types</a>
</div>
<div class="nav__item nav__item__with__subs">
<a class="b-nav__datamap nav__item nav__sub__anchor" href="./datamap-management.html">DataMaps</a>
<a class="nav__item nav__sub__item" href="./bloomfilter-datamap-guide.html">Bloom Filter</a>
<a class="nav__item nav__sub__item" href="./lucene-datamap-guide.html">Lucene</a>
<a class="nav__item nav__sub__item" href="./preaggregate-datamap-guide.html">Pre-Aggregate</a>
<a class="nav__item nav__sub__item" href="./timeseries-datamap-guide.html">Time Series</a>
</div>
<div class="nav__item nav__item__with__subs">
<a class="b-nav__api nav__item nav__sub__anchor" href="./sdk-guide.html">API</a>
<a class="nav__item nav__sub__item" href="./sdk-guide.html">Java SDK</a>
<a class="nav__item nav__sub__item" href="./CSDK-guide.html">C++ SDK</a>
</div>
<a class="b-nav__perf nav__item" href="./performance-tuning.html">Performance Tuning</a>
<a class="b-nav__s3 nav__item" href="./s3-guide.html">S3 Storage</a>
<a class="b-nav__faq nav__item" href="./faq.html">FAQ</a>
<a class="b-nav__contri nav__item" href="./how-to-contribute-to-apache-carbondata.html">Contribute</a>
<a class="b-nav__security nav__item" href="./security.html">Security</a>
<a class="b-nav__release nav__item" href="./release-guide.html">Release Guide</a>
</div>
</div>
<div class="navindicator">
<div class="b-nav__intro navindicator__item"></div>
<div class="b-nav__quickstart navindicator__item"></div>
<div class="b-nav__uses navindicator__item"></div>
<div class="b-nav__docs navindicator__item"></div>
<div class="b-nav__datamap navindicator__item"></div>
<div class="b-nav__api navindicator__item"></div>
<div class="b-nav__perf navindicator__item"></div>
<div class="b-nav__s3 navindicator__item"></div>
<div class="b-nav__faq navindicator__item"></div>
<div class="b-nav__contri navindicator__item"></div>
<div class="b-nav__security navindicator__item"></div>
</div>
</nav>
</div>
<div class="mdcontent">
<section>
<div style="padding:10px 15px;">
<div id="viewpage" name="viewpage">
<div class="row">
<div class="col-sm-12 col-md-12">
<div>
<h1>
<a id="carbondata-pre-aggregate-datamap" class="anchor" href="#carbondata-pre-aggregate-datamap" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>CarbonData Pre-aggregate DataMap</h1>
<ul>
<li><a href="#quick-example">Quick Example</a></li>
<li><a href="#datamap-management">DataMap Management</a></li>
<li><a href="#preaggregate-datamap-introduction">Pre-aggregate Table</a></li>
<li><a href="#loading-data">Loading Data</a></li>
<li><a href="#querying-data">Querying Data</a></li>
<li><a href="#compacting-pre-aggregate-tables">Compaction</a></li>
<li><a href="#data-management-with-pre-aggregate-tables">Data Management</a></li>
</ul>
<h2>
<a id="quick-example" class="anchor" href="#quick-example" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Quick example</h2>
<p>Download and unzip spark-2.2.0-bin-hadoop2.7.tgz, and export $SPARK_HOME</p>
<p>Package carbon jar, and copy assembly/target/scala-2.11/carbondata_2.11-x.x.x-SNAPSHOT-shade-hadoop2.7.2.jar to $SPARK_HOME/jars</p>
<div class="highlight highlight-source-shell"><pre>mvn clean package -DskipTests -Pspark-2.2</pre></div>
<p>Start spark-shell in new terminal, type :paste, then copy and run the following code.</p>
<div class="highlight highlight-source-scala"><pre> <span class="pl-k">import</span> <span class="pl-en">java</span>.<span class="pl-en">io</span>.<span class="pl-en">File</span>
<span class="pl-k">import</span> <span class="pl-en">org</span>.<span class="pl-en">apache</span>.<span class="pl-en">spark</span>.<span class="pl-en">sql</span>.{<span class="pl-en">CarbonEnv</span>, <span class="pl-en">SparkSession</span>}
<span class="pl-k">import</span> <span class="pl-en">org</span>.<span class="pl-en">apache</span>.<span class="pl-en">spark</span>.<span class="pl-en">sql</span>.<span class="pl-en">CarbonSession</span>.<span class="pl-en">_</span>
<span class="pl-k">import</span> <span class="pl-en">org</span>.<span class="pl-en">apache</span>.<span class="pl-en">spark</span>.<span class="pl-en">sql</span>.<span class="pl-en">streaming</span>.{<span class="pl-en">ProcessingTime</span>, <span class="pl-en">StreamingQuery</span>}
<span class="pl-k">import</span> <span class="pl-en">org</span>.<span class="pl-en">apache</span>.<span class="pl-en">carbondata</span>.<span class="pl-en">core</span>.<span class="pl-en">util</span>.<span class="pl-en">path</span>.<span class="pl-en">CarbonStorePath</span>
<span class="pl-k">val</span> <span class="pl-smi">warehouse</span> <span class="pl-k">=</span> <span class="pl-k">new</span> <span class="pl-en">File</span>(<span class="pl-s"><span class="pl-pds">"</span>./warehouse<span class="pl-pds">"</span></span>).getCanonicalPath
<span class="pl-k">val</span> <span class="pl-smi">metastore</span> <span class="pl-k">=</span> <span class="pl-k">new</span> <span class="pl-en">File</span>(<span class="pl-s"><span class="pl-pds">"</span>./metastore<span class="pl-pds">"</span></span>).getCanonicalPath
<span class="pl-k">val</span> <span class="pl-smi">spark</span> <span class="pl-k">=</span> <span class="pl-en">SparkSession</span>
.builder()
.master(<span class="pl-s"><span class="pl-pds">"</span>local<span class="pl-pds">"</span></span>)
.appName(<span class="pl-s"><span class="pl-pds">"</span>preAggregateExample<span class="pl-pds">"</span></span>)
.config(<span class="pl-s"><span class="pl-pds">"</span>spark.sql.warehouse.dir<span class="pl-pds">"</span></span>, warehouse)
.getOrCreateCarbonSession(warehouse, metastore)
spark.sparkContext.setLogLevel(<span class="pl-s"><span class="pl-pds">"</span>ERROR<span class="pl-pds">"</span></span>)
<span class="pl-c"><span class="pl-c">//</span> drop table if exists previously</span>
spark.sql(s<span class="pl-s"><span class="pl-pds">"</span>DROP TABLE IF EXISTS sales<span class="pl-pds">"</span></span>)
<span class="pl-c"><span class="pl-c">//</span> Create main table</span>
spark.sql(
s<span class="pl-s"><span class="pl-pds">"""</span></span>
<span class="pl-s"> | CREATE TABLE sales (</span>
<span class="pl-s"> | user_id string,</span>
<span class="pl-s"> | country string,</span>
<span class="pl-s"> | quantity int,</span>
<span class="pl-s"> | price bigint)</span>
<span class="pl-s"> | STORED AS carbondata</span>
<span class="pl-s"> <span class="pl-pds">"""</span></span>.stripMargin)
<span class="pl-c"><span class="pl-c">//</span> Create pre-aggregate table on the main table</span>
<span class="pl-c"><span class="pl-c">//</span> If main table already have data, following command </span>
<span class="pl-c"><span class="pl-c">//</span> will trigger one immediate load to the pre-aggregate table</span>
spark.sql(
s<span class="pl-s"><span class="pl-pds">"""</span></span>
<span class="pl-s"> | CREATE DATAMAP agg_sales</span>
<span class="pl-s"> | ON TABLE sales</span>
<span class="pl-s"> | USING "preaggregate"</span>
<span class="pl-s"> | AS</span>
<span class="pl-s"> | SELECT country, sum(quantity), avg(price)</span>
<span class="pl-s"> | FROM sales</span>
<span class="pl-s"> | GROUP BY country</span>
<span class="pl-s"> <span class="pl-pds">"""</span></span>.stripMargin)
<span class="pl-k">import</span> <span class="pl-en">spark</span>.<span class="pl-en">implicits</span>.<span class="pl-en">_</span>
<span class="pl-k">import</span> <span class="pl-en">org</span>.<span class="pl-en">apache</span>.<span class="pl-en">spark</span>.<span class="pl-en">sql</span>.<span class="pl-en">SaveMode</span>
<span class="pl-k">import</span> <span class="pl-en">scala</span>.<span class="pl-en">util</span>.<span class="pl-en">Random</span>
<span class="pl-c"><span class="pl-c">//</span> Load data to the main table, it will also</span>
<span class="pl-c"><span class="pl-c">//</span> trigger immediate load to pre-aggregate table.</span>
<span class="pl-c"><span class="pl-c">//</span> These two loading operation is carried out in a</span>
<span class="pl-c"><span class="pl-c">//</span> transactional manner, meaning that the whole </span>
<span class="pl-c"><span class="pl-c">//</span> operation will fail if one of the loading fails</span>
<span class="pl-k">val</span> <span class="pl-smi">r</span> <span class="pl-k">=</span> <span class="pl-k">new</span> <span class="pl-en">Random</span>()
spark.sparkContext.parallelize(<span class="pl-c1">1</span> to <span class="pl-c1">10</span>)
.map(x <span class="pl-k">=&gt;</span> (<span class="pl-s"><span class="pl-pds">"</span>ID.<span class="pl-pds">"</span></span> <span class="pl-k">+</span> r.nextInt(<span class="pl-c1">100000</span>), <span class="pl-s"><span class="pl-pds">"</span>country<span class="pl-pds">"</span></span> <span class="pl-k">+</span> x <span class="pl-k">%</span> <span class="pl-c1">8</span>, x <span class="pl-k">%</span> <span class="pl-c1">50</span>, x <span class="pl-k">%</span> <span class="pl-c1">60</span>))
.toDF(<span class="pl-s"><span class="pl-pds">"</span>user_id<span class="pl-pds">"</span></span>, <span class="pl-s"><span class="pl-pds">"</span>country<span class="pl-pds">"</span></span>, <span class="pl-s"><span class="pl-pds">"</span>quantity<span class="pl-pds">"</span></span>, <span class="pl-s"><span class="pl-pds">"</span>price<span class="pl-pds">"</span></span>)
.write
.format(<span class="pl-s"><span class="pl-pds">"</span>carbondata<span class="pl-pds">"</span></span>)
.option(<span class="pl-s"><span class="pl-pds">"</span>tableName<span class="pl-pds">"</span></span>, <span class="pl-s"><span class="pl-pds">"</span>sales<span class="pl-pds">"</span></span>)
.option(<span class="pl-s"><span class="pl-pds">"</span>compress<span class="pl-pds">"</span></span>, <span class="pl-s"><span class="pl-pds">"</span>true<span class="pl-pds">"</span></span>)
.mode(<span class="pl-en">SaveMode</span>.<span class="pl-en">Append</span>)
.save()
spark.sql(
s<span class="pl-s"><span class="pl-pds">"""</span></span>
<span class="pl-s"> |SELECT country, sum(quantity), avg(price)</span>
<span class="pl-s"> | from sales GROUP BY country</span>
<span class="pl-s"> <span class="pl-pds">"""</span></span>.stripMargin).show
spark.stop</pre></div>
<h4>
<a id="datamap-management" class="anchor" href="#datamap-management" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>DataMap Management</h4>
<p>DataMap can be created using following DDL</p>
<pre><code>CREATE DATAMAP [IF NOT EXISTS] datamap_name
ON TABLE main_table
USING "datamap_provider"
DMPROPERTIES ('key'='value', ...)
AS
SELECT statement
</code></pre>
<p>The string followed by USING is called DataMap Provider, in this version CarbonData supports two
kinds of DataMap:</p>
<ol>
<li>preaggregate, for pre-aggregate table. Pre-Aggregate table supports two values for DMPROPERTIES.
a. 'path' is used to specify the store location of the datamap.('path'='/location/').
b. 'partitioning' when set to false enables user to disable partitioning of the datamap.
Default value is true for this property.</li>
<li>timeseries, for timeseries roll-up table. Please refer to <a href="./timeseries-datamap-guide.html">Timeseries DataMap</a>
</li>
</ol>
<p>DataMap can be dropped using following DDL</p>
<pre><code>DROP DATAMAP [IF EXISTS] datamap_name
ON TABLE main_table
</code></pre>
<p>To show all DataMaps created, use:</p>
<pre><code>SHOW DATAMAP
ON TABLE main_table
</code></pre>
<p>It will show all DataMaps created on main table.</p>
<h2>
<a id="preaggregate-datamap-introduction" class="anchor" href="#preaggregate-datamap-introduction" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Preaggregate DataMap Introduction</h2>
<p>Pre-aggregate tables are created as DataMaps and managed as tables internally by CarbonData.
User can create as many pre-aggregate datamaps required to improve query performance,
provided the storage requirements and loading speeds are acceptable.</p>
<p>Once pre-aggregate datamaps are created, CarbonData's SparkSQL optimizer extension supports to
select the most efficient pre-aggregate datamap and rewrite the SQL to query against the selected
datamap instead of the main table. Since the data size of pre-aggregate datamap is smaller,
user queries are much faster. In our previous experience, we have seen 5X to 100X times faster
in production SQLs.</p>
<p>For instance, main table called <strong>sales</strong> which is defined as</p>
<pre><code>CREATE TABLE sales (
order_time timestamp,
user_id string,
sex string,
country string,
quantity int,
price bigint)
STORED AS carbondata
</code></pre>
<p>User can create pre-aggregate tables using the Create DataMap DDL</p>
<pre><code>CREATE DATAMAP agg_sales
ON TABLE sales
USING "preaggregate"
AS
SELECT country, sex, sum(quantity), avg(price)
FROM sales
GROUP BY country, sex
</code></pre>
<h4>
<a id="functions-supported-in-pre-aggregate-table" class="anchor" href="#functions-supported-in-pre-aggregate-table" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Functions supported in pre-aggregate table</h4>
<table>
<thead>
<tr>
<th>Function</th>
<th align="center">Rollup supported</th>
</tr>
</thead>
<tbody>
<tr>
<td>SUM</td>
<td align="center">Yes</td>
</tr>
<tr>
<td>AVG</td>
<td align="center">Yes</td>
</tr>
<tr>
<td>MAX</td>
<td align="center">Yes</td>
</tr>
<tr>
<td>MIN</td>
<td align="center">Yes</td>
</tr>
<tr>
<td>COUNT</td>
<td align="center">Yes</td>
</tr>
</tbody>
</table>
<h4>
<a id="how-pre-aggregate-tables-are-selected" class="anchor" href="#how-pre-aggregate-tables-are-selected" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>How pre-aggregate tables are selected</h4>
<p>When a user query is submitted, during query planning phase, CarbonData will collect all matched
pre-aggregate tables as candidates according to Relational Algebra transformation rules. Then, the
best pre-aggregate table for this query will be selected among the candidates based on cost.
For simplicity, current cost estimation is based on the data size of the pre-aggregate table. (We
assume that query will be faster on smaller table)</p>
<p>For the main table <strong>sales</strong> and pre-aggregate table <strong>agg_sales</strong> created above, following queries</p>
<pre><code>SELECT country, sex, sum(quantity), avg(price) from sales GROUP BY country, sex
SELECT sex, sum(quantity) from sales GROUP BY sex
SELECT avg(price), country from sales GROUP BY country
</code></pre>
<p>will be transformed by CarbonData's query planner to query against pre-aggregate table
<strong>agg_sales</strong> instead of the main table <strong>sales</strong></p>
<p>However, for following queries</p>
<pre><code>SELECT user_id, country, sex, sum(quantity), avg(price) from sales GROUP BY user_id, country, sex
SELECT sex, avg(quantity) from sales GROUP BY sex
SELECT country, max(price) from sales GROUP BY country
</code></pre>
<p>will query against main table <strong>sales</strong> only, because it does not satisfy pre-aggregate table
selection logic.</p>
<h2>
<a id="loading-data" class="anchor" href="#loading-data" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Loading data</h2>
<p>For existing table with loaded data, data load to pre-aggregate table will be triggered by the
CREATE DATAMAP statement when user creates the pre-aggregate table. For incremental loads after
aggregates tables are created, loading data to main table triggers the load to pre-aggregate tables
once main table loading is complete.</p>
<p>These loads are transactional
meaning that data on main table and pre-aggregate tables are only visible to the user after all
tables are loaded successfully, if one of these loads fails, new data are not visible in all tables
as if the load operation is not happened.</p>
<h2>
<a id="querying-data" class="anchor" href="#querying-data" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Querying data</h2>
<p>As a technique for query acceleration, Pre-aggregate tables cannot be queried directly.
Queries are to be made on main table. While doing query planning, internally CarbonData will check
associated pre-aggregate tables with the main table, and do query plan transformation accordingly.</p>
<p>User can verify whether a query can leverage pre-aggregate table or not by executing <code>EXPLAIN</code>
command, which will show the transformed logical plan, and thus user can check whether pre-aggregate
table is selected.</p>
<h2>
<a id="compacting-pre-aggregate-tables" class="anchor" href="#compacting-pre-aggregate-tables" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Compacting pre-aggregate tables</h2>
<p>Running Compaction command (<code>ALTER TABLE COMPACT</code>) on main table will <strong>not automatically</strong>
compact the pre-aggregate tables created on the main table. User need to run Compaction command
separately on each pre-aggregate table to compact them.</p>
<p>Compaction is an optional operation for pre-aggregate table. If compaction is performed on
main table but not performed on pre-aggregate table, all queries still can benefit from
pre-aggregate tables. To further improve the query performance, compaction on pre-aggregate tables
can be triggered to merge the segments and files in the pre-aggregate tables.</p>
<h2>
<a id="data-management-with-pre-aggregate-tables" class="anchor" href="#data-management-with-pre-aggregate-tables" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Data Management with pre-aggregate tables</h2>
<p>In current implementation, data consistency needs to be maintained for both main table and pre-aggregate
tables. Once there is pre-aggregate table created on the main table, following command on the main
table
is not supported:</p>
<ol>
<li>Data management command: <code>UPDATE/DELETE/DELETE SEGMENT</code>.</li>
<li>Schema management command: <code>ALTER TABLE DROP COLUMN</code>, <code>ALTER TABLE CHANGE DATATYPE</code>,
<code>ALTER TABLE RENAME</code>. Note that adding a new column is supported, and for dropping columns and
change datatype command, CarbonData will check whether it will impact the pre-aggregate table, if
not, the operation is allowed, otherwise operation will be rejected by throwing exception.</li>
<li>Partition management command: <code>ALTER TABLE ADD/DROP PARTITION</code>
</li>
<li>Complex Datatypes for preaggregate is not supported.</li>
</ol>
<p>However, there is still way to support these operations on main table, in current CarbonData
release, user can do as following:</p>
<ol>
<li>Remove the pre-aggregate table by <code>DROP DATAMAP</code> command</li>
<li>Carry out the data management operation on main table</li>
<li>Create the pre-aggregate table again by <code>CREATE DATAMAP</code> command
Basically, user can manually trigger the operation by re-building the datamap.</li>
</ol>
<script>
$(function() {
// Show selected style on nav item
$('.b-nav__datamap').addClass('selected');
if (!$('.b-nav__datamap').parent().hasClass('nav__item__with__subs--expanded')) {
// Display datamap subnav items
$('.b-nav__datamap').parent().toggleClass('nav__item__with__subs--expanded');
}
});
</script></div>
</div>
</div>
</div>
<div class="doc-footer">
<a href="#top" class="scroll-top">Top</a>
</div>
</div>
</section>
</div>
</div>
</div>
</section><!-- End systemblock part -->
<script src="js/custom.js"></script>
</body>
</html>