| <!DOCTYPE html> |
| <html lang="en"> |
| <head> |
| <meta charset="utf-8"> |
| <meta http-equiv="X-UA-Compatible" content="IE=edge"> |
| <meta name="viewport" content="width=device-width, initial-scale=1"> |
| <link href='images/favicon.ico' rel='shortcut icon' type='image/x-icon'> |
| <!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags --> |
| <title>CarbonData</title> |
| <style> |
| |
| </style> |
| <!-- Bootstrap --> |
| |
| <link rel="stylesheet" href="css/bootstrap.min.css"> |
| <link href="css/style.css" rel="stylesheet"> |
| <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries --> |
| <!-- WARNING: Respond.js doesn't work if you view the page via file:// --> |
| <!--[if lt IE 9]> |
| <script src="https://oss.maxcdn.com/html5shiv/3.7.3/html5shiv.min.js"></script> |
| <script src="https://oss.maxcdn.scom/respond/1.4.2/respond.min.js"></script> |
| <![endif]--> |
| <script src="js/jquery.min.js"></script> |
| <script src="js/bootstrap.min.js"></script> |
| <script defer src="https://use.fontawesome.com/releases/v5.0.8/js/all.js"></script> |
| |
| |
| </head> |
| <body> |
| <header> |
| <nav class="navbar navbar-default navbar-custom cd-navbar-wrapper"> |
| <div class="container"> |
| <div class="navbar-header"> |
| <button aria-controls="navbar" aria-expanded="false" data-target="#navbar" data-toggle="collapse" |
| class="navbar-toggle collapsed" type="button"> |
| <span class="sr-only">Toggle navigation</span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| </button> |
| <a href="index.html" class="logo"> |
| <img src="images/CarbonDataLogo.png" alt="CarbonData logo" title="CarbocnData logo"/> |
| </a> |
| </div> |
| <div class="navbar-collapse collapse cd_navcontnt" id="navbar"> |
| <ul class="nav navbar-nav navbar-right navlist-custom"> |
| <li><a href="index.html" class="hidden-xs"><i class="fa fa-home" aria-hidden="true"></i> </a> |
| </li> |
| <li><a href="index.html" class="hidden-lg hidden-md hidden-sm">Home</a></li> |
| <li class="dropdown"> |
| <a href="#" class="dropdown-toggle " data-toggle="dropdown" role="button" aria-haspopup="true" |
| aria-expanded="false"> Download <span class="caret"></span></a> |
| <ul class="dropdown-menu"> |
| <li> |
| <a href="https://dist.apache.org/repos/dist/release/carbondata/1.5.0/" |
| target="_blank">Apache CarbonData 1.5.0</a></li> |
| <li> |
| <a href="https://dist.apache.org/repos/dist/release/carbondata/1.4.1/" |
| target="_blank">Apache CarbonData 1.4.1</a></li> |
| <li> |
| <a href="https://dist.apache.org/repos/dist/release/carbondata/1.4.0/" |
| target="_blank">Apache CarbonData 1.4.0</a></li> |
| <li> |
| <a href="https://dist.apache.org/repos/dist/release/carbondata/1.3.1/" |
| target="_blank">Apache CarbonData 1.3.1</a></li> |
| <li> |
| <a href="https://dist.apache.org/repos/dist/release/carbondata/1.3.0/" |
| target="_blank">Apache CarbonData 1.3.0</a></li> |
| <li> |
| <a href="https://cwiki.apache.org/confluence/display/CARBONDATA/Releases" |
| target="_blank">Release Archive</a></li> |
| </ul> |
| </li> |
| <li><a href="documentation.html" class="active">Documentation</a></li> |
| <li class="dropdown"> |
| <a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" |
| aria-expanded="false">Community <span class="caret"></span></a> |
| <ul class="dropdown-menu"> |
| <li> |
| <a href="https://github.com/apache/carbondata/blob/master/docs/how-to-contribute-to-apache-carbondata.md" |
| target="_blank">Contributing to CarbonData</a></li> |
| <li> |
| <a href="https://github.com/apache/carbondata/blob/master/docs/release-guide.md" |
| target="_blank">Release Guide</a></li> |
| <li> |
| <a href="https://cwiki.apache.org/confluence/display/CARBONDATA/PMC+and+Committers+member+list" |
| target="_blank">Project PMC and Committers</a></li> |
| <li> |
| <a href="https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=66850609" |
| target="_blank">CarbonData Meetups</a></li> |
| <li><a href="security.html">Apache CarbonData Security</a></li> |
| <li><a href="https://issues.apache.org/jira/browse/CARBONDATA" target="_blank">Apache |
| Jira</a></li> |
| <li><a href="videogallery.html">CarbonData Videos </a></li> |
| </ul> |
| </li> |
| <li class="dropdown"> |
| <a href="http://www.apache.org/" class="apache_link hidden-xs dropdown-toggle" |
| data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Apache</a> |
| <ul class="dropdown-menu"> |
| <li><a href="http://www.apache.org/" target="_blank">Apache Homepage</a></li> |
| <li><a href="http://www.apache.org/licenses/" target="_blank">License</a></li> |
| <li><a href="http://www.apache.org/foundation/sponsorship.html" |
| target="_blank">Sponsorship</a></li> |
| <li><a href="http://www.apache.org/foundation/thanks.html" target="_blank">Thanks</a></li> |
| </ul> |
| </li> |
| |
| <li class="dropdown"> |
| <a href="http://www.apache.org/" class="hidden-lg hidden-md hidden-sm dropdown-toggle" |
| data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Apache</a> |
| <ul class="dropdown-menu"> |
| <li><a href="http://www.apache.org/" target="_blank">Apache Homepage</a></li> |
| <li><a href="http://www.apache.org/licenses/" target="_blank">License</a></li> |
| <li><a href="http://www.apache.org/foundation/sponsorship.html" |
| target="_blank">Sponsorship</a></li> |
| <li><a href="http://www.apache.org/foundation/thanks.html" target="_blank">Thanks</a></li> |
| </ul> |
| </li> |
| |
| <li> |
| <a href="#" id="search-icon"><i class="fa fa-search" aria-hidden="true"></i></a> |
| |
| </li> |
| |
| </ul> |
| </div><!--/.nav-collapse --> |
| <div id="search-box"> |
| <form method="get" action="http://www.google.com/search" target="_blank"> |
| <div class="search-block"> |
| <table border="0" cellpadding="0" width="100%"> |
| <tr> |
| <td style="width:80%"> |
| <input type="text" name="q" size=" 5" maxlength="255" value="" |
| class="search-input" placeholder="Search...." required/> |
| </td> |
| <td style="width:20%"> |
| <input type="submit" value="Search"/></td> |
| </tr> |
| <tr> |
| <td align="left" style="font-size:75%" colspan="2"> |
| <input type="checkbox" name="sitesearch" value="carbondata.apache.org" checked/> |
| <span style=" position: relative; top: -3px;"> Only search for CarbonData</span> |
| </td> |
| </tr> |
| </table> |
| </div> |
| </form> |
| </div> |
| </div> |
| </nav> |
| </header> <!-- end Header part --> |
| |
| <div class="fixed-padding"></div> <!-- top padding with fixde header --> |
| |
| <section><!-- Dashboard nav --> |
| <div class="container-fluid q"> |
| <div class="col-sm-12 col-md-12 maindashboard"> |
| <div class="verticalnavbar"> |
| <nav class="b-sticky-nav"> |
| <div class="nav-scroller"> |
| <div class="nav__inner"> |
| <a class="b-nav__intro nav__item" href="./introduction.html">introduction</a> |
| <a class="b-nav__quickstart nav__item" href="./quick-start-guide.html">quick start</a> |
| <a class="b-nav__uses nav__item" href="./usecases.html">use cases</a> |
| |
| <div class="nav__item nav__item__with__subs"> |
| <a class="b-nav__docs nav__item nav__sub__anchor" href="./language-manual.html">Language Reference</a> |
| <a class="nav__item nav__sub__item" href="./ddl-of-carbondata.html">DDL</a> |
| <a class="nav__item nav__sub__item" href="./dml-of-carbondata.html">DML</a> |
| <a class="nav__item nav__sub__item" href="./streaming-guide.html">Streaming</a> |
| <a class="nav__item nav__sub__item" href="./configuration-parameters.html">Configuration</a> |
| <a class="nav__item nav__sub__item" href="./datamap-developer-guide.html">Datamaps</a> |
| <a class="nav__item nav__sub__item" href="./supported-data-types-in-carbondata.html">Data Types</a> |
| </div> |
| |
| <div class="nav__item nav__item__with__subs"> |
| <a class="b-nav__datamap nav__item nav__sub__anchor" href="./datamap-management.html">DataMaps</a> |
| <a class="nav__item nav__sub__item" href="./bloomfilter-datamap-guide.html">Bloom Filter</a> |
| <a class="nav__item nav__sub__item" href="./lucene-datamap-guide.html">Lucene</a> |
| <a class="nav__item nav__sub__item" href="./preaggregate-datamap-guide.html">Pre-Aggregate</a> |
| <a class="nav__item nav__sub__item" href="./timeseries-datamap-guide.html">Time Series</a> |
| </div> |
| |
| <div class="nav__item nav__item__with__subs"> |
| <a class="b-nav__api nav__item nav__sub__anchor" href="./sdk-guide.html">API</a> |
| <a class="nav__item nav__sub__item" href="./sdk-guide.html">Java SDK</a> |
| <a class="nav__item nav__sub__item" href="./CSDK-guide.html">C++ SDK</a> |
| </div> |
| |
| <a class="b-nav__perf nav__item" href="./performance-tuning.html">Performance Tuning</a> |
| <a class="b-nav__s3 nav__item" href="./s3-guide.html">S3 Storage</a> |
| <a class="b-nav__faq nav__item" href="./faq.html">FAQ</a> |
| <a class="b-nav__contri nav__item" href="./how-to-contribute-to-apache-carbondata.html">Contribute</a> |
| <a class="b-nav__security nav__item" href="./security.html">Security</a> |
| <a class="b-nav__release nav__item" href="./release-guide.html">Release Guide</a> |
| </div> |
| </div> |
| <div class="navindicator"> |
| <div class="b-nav__intro navindicator__item"></div> |
| <div class="b-nav__quickstart navindicator__item"></div> |
| <div class="b-nav__uses navindicator__item"></div> |
| <div class="b-nav__docs navindicator__item"></div> |
| <div class="b-nav__datamap navindicator__item"></div> |
| <div class="b-nav__api navindicator__item"></div> |
| <div class="b-nav__perf navindicator__item"></div> |
| <div class="b-nav__s3 navindicator__item"></div> |
| <div class="b-nav__faq navindicator__item"></div> |
| <div class="b-nav__contri navindicator__item"></div> |
| <div class="b-nav__security navindicator__item"></div> |
| </div> |
| </nav> |
| </div> |
| <div class="mdcontent"> |
| <section> |
| <div style="padding:10px 15px;"> |
| <div id="viewpage" name="viewpage"> |
| <div class="row"> |
| <div class="col-sm-12 col-md-12"> |
| <div> |
| <h1> |
| <a id="carbondata-pre-aggregate-datamap" class="anchor" href="#carbondata-pre-aggregate-datamap" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>CarbonData Pre-aggregate DataMap</h1> |
| <ul> |
| <li><a href="#quick-example">Quick Example</a></li> |
| <li><a href="#datamap-management">DataMap Management</a></li> |
| <li><a href="#preaggregate-datamap-introduction">Pre-aggregate Table</a></li> |
| <li><a href="#loading-data">Loading Data</a></li> |
| <li><a href="#querying-data">Querying Data</a></li> |
| <li><a href="#compacting-pre-aggregate-tables">Compaction</a></li> |
| <li><a href="#data-management-with-pre-aggregate-tables">Data Management</a></li> |
| </ul> |
| <h2> |
| <a id="quick-example" class="anchor" href="#quick-example" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Quick example</h2> |
| <p>Download and unzip spark-2.2.0-bin-hadoop2.7.tgz, and export $SPARK_HOME</p> |
| <p>Package carbon jar, and copy assembly/target/scala-2.11/carbondata_2.11-x.x.x-SNAPSHOT-shade-hadoop2.7.2.jar to $SPARK_HOME/jars</p> |
| <div class="highlight highlight-source-shell"><pre>mvn clean package -DskipTests -Pspark-2.2</pre></div> |
| <p>Start spark-shell in new terminal, type :paste, then copy and run the following code.</p> |
| <div class="highlight highlight-source-scala"><pre> <span class="pl-k">import</span> <span class="pl-en">java</span>.<span class="pl-en">io</span>.<span class="pl-en">File</span> |
| <span class="pl-k">import</span> <span class="pl-en">org</span>.<span class="pl-en">apache</span>.<span class="pl-en">spark</span>.<span class="pl-en">sql</span>.{<span class="pl-en">CarbonEnv</span>, <span class="pl-en">SparkSession</span>} |
| <span class="pl-k">import</span> <span class="pl-en">org</span>.<span class="pl-en">apache</span>.<span class="pl-en">spark</span>.<span class="pl-en">sql</span>.<span class="pl-en">CarbonSession</span>.<span class="pl-en">_</span> |
| <span class="pl-k">import</span> <span class="pl-en">org</span>.<span class="pl-en">apache</span>.<span class="pl-en">spark</span>.<span class="pl-en">sql</span>.<span class="pl-en">streaming</span>.{<span class="pl-en">ProcessingTime</span>, <span class="pl-en">StreamingQuery</span>} |
| <span class="pl-k">import</span> <span class="pl-en">org</span>.<span class="pl-en">apache</span>.<span class="pl-en">carbondata</span>.<span class="pl-en">core</span>.<span class="pl-en">util</span>.<span class="pl-en">path</span>.<span class="pl-en">CarbonStorePath</span> |
| |
| <span class="pl-k">val</span> <span class="pl-smi">warehouse</span> <span class="pl-k">=</span> <span class="pl-k">new</span> <span class="pl-en">File</span>(<span class="pl-s"><span class="pl-pds">"</span>./warehouse<span class="pl-pds">"</span></span>).getCanonicalPath |
| <span class="pl-k">val</span> <span class="pl-smi">metastore</span> <span class="pl-k">=</span> <span class="pl-k">new</span> <span class="pl-en">File</span>(<span class="pl-s"><span class="pl-pds">"</span>./metastore<span class="pl-pds">"</span></span>).getCanonicalPath |
| |
| <span class="pl-k">val</span> <span class="pl-smi">spark</span> <span class="pl-k">=</span> <span class="pl-en">SparkSession</span> |
| .builder() |
| .master(<span class="pl-s"><span class="pl-pds">"</span>local<span class="pl-pds">"</span></span>) |
| .appName(<span class="pl-s"><span class="pl-pds">"</span>preAggregateExample<span class="pl-pds">"</span></span>) |
| .config(<span class="pl-s"><span class="pl-pds">"</span>spark.sql.warehouse.dir<span class="pl-pds">"</span></span>, warehouse) |
| .getOrCreateCarbonSession(warehouse, metastore) |
| |
| spark.sparkContext.setLogLevel(<span class="pl-s"><span class="pl-pds">"</span>ERROR<span class="pl-pds">"</span></span>) |
| |
| <span class="pl-c"><span class="pl-c">//</span> drop table if exists previously</span> |
| spark.sql(s<span class="pl-s"><span class="pl-pds">"</span>DROP TABLE IF EXISTS sales<span class="pl-pds">"</span></span>) |
| |
| <span class="pl-c"><span class="pl-c">//</span> Create main table</span> |
| spark.sql( |
| s<span class="pl-s"><span class="pl-pds">"""</span></span> |
| <span class="pl-s"> | CREATE TABLE sales (</span> |
| <span class="pl-s"> | user_id string,</span> |
| <span class="pl-s"> | country string,</span> |
| <span class="pl-s"> | quantity int,</span> |
| <span class="pl-s"> | price bigint)</span> |
| <span class="pl-s"> | STORED AS carbondata</span> |
| <span class="pl-s"> <span class="pl-pds">"""</span></span>.stripMargin) |
| |
| <span class="pl-c"><span class="pl-c">//</span> Create pre-aggregate table on the main table</span> |
| <span class="pl-c"><span class="pl-c">//</span> If main table already have data, following command </span> |
| <span class="pl-c"><span class="pl-c">//</span> will trigger one immediate load to the pre-aggregate table</span> |
| spark.sql( |
| s<span class="pl-s"><span class="pl-pds">"""</span></span> |
| <span class="pl-s"> | CREATE DATAMAP agg_sales</span> |
| <span class="pl-s"> | ON TABLE sales</span> |
| <span class="pl-s"> | USING "preaggregate"</span> |
| <span class="pl-s"> | AS</span> |
| <span class="pl-s"> | SELECT country, sum(quantity), avg(price)</span> |
| <span class="pl-s"> | FROM sales</span> |
| <span class="pl-s"> | GROUP BY country</span> |
| <span class="pl-s"> <span class="pl-pds">"""</span></span>.stripMargin) |
| |
| <span class="pl-k">import</span> <span class="pl-en">spark</span>.<span class="pl-en">implicits</span>.<span class="pl-en">_</span> |
| <span class="pl-k">import</span> <span class="pl-en">org</span>.<span class="pl-en">apache</span>.<span class="pl-en">spark</span>.<span class="pl-en">sql</span>.<span class="pl-en">SaveMode</span> |
| <span class="pl-k">import</span> <span class="pl-en">scala</span>.<span class="pl-en">util</span>.<span class="pl-en">Random</span> |
| |
| <span class="pl-c"><span class="pl-c">//</span> Load data to the main table, it will also</span> |
| <span class="pl-c"><span class="pl-c">//</span> trigger immediate load to pre-aggregate table.</span> |
| <span class="pl-c"><span class="pl-c">//</span> These two loading operation is carried out in a</span> |
| <span class="pl-c"><span class="pl-c">//</span> transactional manner, meaning that the whole </span> |
| <span class="pl-c"><span class="pl-c">//</span> operation will fail if one of the loading fails</span> |
| <span class="pl-k">val</span> <span class="pl-smi">r</span> <span class="pl-k">=</span> <span class="pl-k">new</span> <span class="pl-en">Random</span>() |
| spark.sparkContext.parallelize(<span class="pl-c1">1</span> to <span class="pl-c1">10</span>) |
| .map(x <span class="pl-k">=></span> (<span class="pl-s"><span class="pl-pds">"</span>ID.<span class="pl-pds">"</span></span> <span class="pl-k">+</span> r.nextInt(<span class="pl-c1">100000</span>), <span class="pl-s"><span class="pl-pds">"</span>country<span class="pl-pds">"</span></span> <span class="pl-k">+</span> x <span class="pl-k">%</span> <span class="pl-c1">8</span>, x <span class="pl-k">%</span> <span class="pl-c1">50</span>, x <span class="pl-k">%</span> <span class="pl-c1">60</span>)) |
| .toDF(<span class="pl-s"><span class="pl-pds">"</span>user_id<span class="pl-pds">"</span></span>, <span class="pl-s"><span class="pl-pds">"</span>country<span class="pl-pds">"</span></span>, <span class="pl-s"><span class="pl-pds">"</span>quantity<span class="pl-pds">"</span></span>, <span class="pl-s"><span class="pl-pds">"</span>price<span class="pl-pds">"</span></span>) |
| .write |
| .format(<span class="pl-s"><span class="pl-pds">"</span>carbondata<span class="pl-pds">"</span></span>) |
| .option(<span class="pl-s"><span class="pl-pds">"</span>tableName<span class="pl-pds">"</span></span>, <span class="pl-s"><span class="pl-pds">"</span>sales<span class="pl-pds">"</span></span>) |
| .option(<span class="pl-s"><span class="pl-pds">"</span>compress<span class="pl-pds">"</span></span>, <span class="pl-s"><span class="pl-pds">"</span>true<span class="pl-pds">"</span></span>) |
| .mode(<span class="pl-en">SaveMode</span>.<span class="pl-en">Append</span>) |
| .save() |
| |
| spark.sql( |
| s<span class="pl-s"><span class="pl-pds">"""</span></span> |
| <span class="pl-s"> |SELECT country, sum(quantity), avg(price)</span> |
| <span class="pl-s"> | from sales GROUP BY country</span> |
| <span class="pl-s"> <span class="pl-pds">"""</span></span>.stripMargin).show |
| |
| spark.stop</pre></div> |
| <h4> |
| <a id="datamap-management" class="anchor" href="#datamap-management" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>DataMap Management</h4> |
| <p>DataMap can be created using following DDL</p> |
| <pre><code>CREATE DATAMAP [IF NOT EXISTS] datamap_name |
| ON TABLE main_table |
| USING "datamap_provider" |
| DMPROPERTIES ('key'='value', ...) |
| AS |
| SELECT statement |
| </code></pre> |
| <p>The string followed by USING is called DataMap Provider, in this version CarbonData supports two |
| kinds of DataMap:</p> |
| <ol> |
| <li>preaggregate, for pre-aggregate table. Pre-Aggregate table supports two values for DMPROPERTIES. |
| a. 'path' is used to specify the store location of the datamap.('path'='/location/'). |
| b. 'partitioning' when set to false enables user to disable partitioning of the datamap. |
| Default value is true for this property.</li> |
| <li>timeseries, for timeseries roll-up table. Please refer to <a href="./timeseries-datamap-guide.html">Timeseries DataMap</a> |
| </li> |
| </ol> |
| <p>DataMap can be dropped using following DDL</p> |
| <pre><code>DROP DATAMAP [IF EXISTS] datamap_name |
| ON TABLE main_table |
| </code></pre> |
| <p>To show all DataMaps created, use:</p> |
| <pre><code>SHOW DATAMAP |
| ON TABLE main_table |
| </code></pre> |
| <p>It will show all DataMaps created on main table.</p> |
| <h2> |
| <a id="preaggregate-datamap-introduction" class="anchor" href="#preaggregate-datamap-introduction" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Preaggregate DataMap Introduction</h2> |
| <p>Pre-aggregate tables are created as DataMaps and managed as tables internally by CarbonData. |
| User can create as many pre-aggregate datamaps required to improve query performance, |
| provided the storage requirements and loading speeds are acceptable.</p> |
| <p>Once pre-aggregate datamaps are created, CarbonData's SparkSQL optimizer extension supports to |
| select the most efficient pre-aggregate datamap and rewrite the SQL to query against the selected |
| datamap instead of the main table. Since the data size of pre-aggregate datamap is smaller, |
| user queries are much faster. In our previous experience, we have seen 5X to 100X times faster |
| in production SQLs.</p> |
| <p>For instance, main table called <strong>sales</strong> which is defined as</p> |
| <pre><code>CREATE TABLE sales ( |
| order_time timestamp, |
| user_id string, |
| sex string, |
| country string, |
| quantity int, |
| price bigint) |
| STORED AS carbondata |
| </code></pre> |
| <p>User can create pre-aggregate tables using the Create DataMap DDL</p> |
| <pre><code>CREATE DATAMAP agg_sales |
| ON TABLE sales |
| USING "preaggregate" |
| AS |
| SELECT country, sex, sum(quantity), avg(price) |
| FROM sales |
| GROUP BY country, sex |
| </code></pre> |
| <h4> |
| <a id="functions-supported-in-pre-aggregate-table" class="anchor" href="#functions-supported-in-pre-aggregate-table" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Functions supported in pre-aggregate table</h4> |
| <table> |
| <thead> |
| <tr> |
| <th>Function</th> |
| <th align="center">Rollup supported</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td>SUM</td> |
| <td align="center">Yes</td> |
| </tr> |
| <tr> |
| <td>AVG</td> |
| <td align="center">Yes</td> |
| </tr> |
| <tr> |
| <td>MAX</td> |
| <td align="center">Yes</td> |
| </tr> |
| <tr> |
| <td>MIN</td> |
| <td align="center">Yes</td> |
| </tr> |
| <tr> |
| <td>COUNT</td> |
| <td align="center">Yes</td> |
| </tr> |
| </tbody> |
| </table> |
| <h4> |
| <a id="how-pre-aggregate-tables-are-selected" class="anchor" href="#how-pre-aggregate-tables-are-selected" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>How pre-aggregate tables are selected</h4> |
| <p>When a user query is submitted, during query planning phase, CarbonData will collect all matched |
| pre-aggregate tables as candidates according to Relational Algebra transformation rules. Then, the |
| best pre-aggregate table for this query will be selected among the candidates based on cost. |
| For simplicity, current cost estimation is based on the data size of the pre-aggregate table. (We |
| assume that query will be faster on smaller table)</p> |
| <p>For the main table <strong>sales</strong> and pre-aggregate table <strong>agg_sales</strong> created above, following queries</p> |
| <pre><code>SELECT country, sex, sum(quantity), avg(price) from sales GROUP BY country, sex |
| |
| SELECT sex, sum(quantity) from sales GROUP BY sex |
| |
| SELECT avg(price), country from sales GROUP BY country |
| </code></pre> |
| <p>will be transformed by CarbonData's query planner to query against pre-aggregate table |
| <strong>agg_sales</strong> instead of the main table <strong>sales</strong></p> |
| <p>However, for following queries</p> |
| <pre><code>SELECT user_id, country, sex, sum(quantity), avg(price) from sales GROUP BY user_id, country, sex |
| |
| SELECT sex, avg(quantity) from sales GROUP BY sex |
| |
| SELECT country, max(price) from sales GROUP BY country |
| </code></pre> |
| <p>will query against main table <strong>sales</strong> only, because it does not satisfy pre-aggregate table |
| selection logic.</p> |
| <h2> |
| <a id="loading-data" class="anchor" href="#loading-data" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Loading data</h2> |
| <p>For existing table with loaded data, data load to pre-aggregate table will be triggered by the |
| CREATE DATAMAP statement when user creates the pre-aggregate table. For incremental loads after |
| aggregates tables are created, loading data to main table triggers the load to pre-aggregate tables |
| once main table loading is complete.</p> |
| <p>These loads are transactional |
| meaning that data on main table and pre-aggregate tables are only visible to the user after all |
| tables are loaded successfully, if one of these loads fails, new data are not visible in all tables |
| as if the load operation is not happened.</p> |
| <h2> |
| <a id="querying-data" class="anchor" href="#querying-data" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Querying data</h2> |
| <p>As a technique for query acceleration, Pre-aggregate tables cannot be queried directly. |
| Queries are to be made on main table. While doing query planning, internally CarbonData will check |
| associated pre-aggregate tables with the main table, and do query plan transformation accordingly.</p> |
| <p>User can verify whether a query can leverage pre-aggregate table or not by executing <code>EXPLAIN</code> |
| command, which will show the transformed logical plan, and thus user can check whether pre-aggregate |
| table is selected.</p> |
| <h2> |
| <a id="compacting-pre-aggregate-tables" class="anchor" href="#compacting-pre-aggregate-tables" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Compacting pre-aggregate tables</h2> |
| <p>Running Compaction command (<code>ALTER TABLE COMPACT</code>) on main table will <strong>not automatically</strong> |
| compact the pre-aggregate tables created on the main table. User need to run Compaction command |
| separately on each pre-aggregate table to compact them.</p> |
| <p>Compaction is an optional operation for pre-aggregate table. If compaction is performed on |
| main table but not performed on pre-aggregate table, all queries still can benefit from |
| pre-aggregate tables. To further improve the query performance, compaction on pre-aggregate tables |
| can be triggered to merge the segments and files in the pre-aggregate tables.</p> |
| <h2> |
| <a id="data-management-with-pre-aggregate-tables" class="anchor" href="#data-management-with-pre-aggregate-tables" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Data Management with pre-aggregate tables</h2> |
| <p>In current implementation, data consistency needs to be maintained for both main table and pre-aggregate |
| tables. Once there is pre-aggregate table created on the main table, following command on the main |
| table |
| is not supported:</p> |
| <ol> |
| <li>Data management command: <code>UPDATE/DELETE/DELETE SEGMENT</code>.</li> |
| <li>Schema management command: <code>ALTER TABLE DROP COLUMN</code>, <code>ALTER TABLE CHANGE DATATYPE</code>, |
| <code>ALTER TABLE RENAME</code>. Note that adding a new column is supported, and for dropping columns and |
| change datatype command, CarbonData will check whether it will impact the pre-aggregate table, if |
| not, the operation is allowed, otherwise operation will be rejected by throwing exception.</li> |
| <li>Partition management command: <code>ALTER TABLE ADD/DROP PARTITION</code> |
| </li> |
| <li>Complex Datatypes for preaggregate is not supported.</li> |
| </ol> |
| <p>However, there is still way to support these operations on main table, in current CarbonData |
| release, user can do as following:</p> |
| <ol> |
| <li>Remove the pre-aggregate table by <code>DROP DATAMAP</code> command</li> |
| <li>Carry out the data management operation on main table</li> |
| <li>Create the pre-aggregate table again by <code>CREATE DATAMAP</code> command |
| Basically, user can manually trigger the operation by re-building the datamap.</li> |
| </ol> |
| <script> |
| $(function() { |
| // Show selected style on nav item |
| $('.b-nav__datamap').addClass('selected'); |
| |
| if (!$('.b-nav__datamap').parent().hasClass('nav__item__with__subs--expanded')) { |
| // Display datamap subnav items |
| $('.b-nav__datamap').parent().toggleClass('nav__item__with__subs--expanded'); |
| } |
| }); |
| </script></div> |
| </div> |
| </div> |
| </div> |
| <div class="doc-footer"> |
| <a href="#top" class="scroll-top">Top</a> |
| </div> |
| </div> |
| </section> |
| </div> |
| </div> |
| </div> |
| </section><!-- End systemblock part --> |
| <script src="js/custom.js"></script> |
| </body> |
| </html> |