content/preaggregate-datamap-guide.html - carbondata-site - Git at Google

 <!DOCTYPE html>
 <html lang="en">
 <head>
     <meta charset="utf-8">
     <meta http-equiv="X-UA-Compatible" content="IE=edge">
     <meta name="viewport" content="width=device-width, initial-scale=1">
     <link href='images/favicon.ico' rel='shortcut icon' type='image/x-icon'>
     <!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags -->
     <title>CarbonData</title>
     <style>

     </style>
     <!-- Bootstrap -->

     <link rel="stylesheet" href="css/bootstrap.min.css">
     <link href="css/style.css" rel="stylesheet">
     <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
     <!-- WARNING: Respond.js doesn't work if you view the page via file:// -->
     <!--[if lt IE 9]>
     <script src="https://oss.maxcdn.com/html5shiv/3.7.3/html5shiv.min.js"></script>
     <script src="https://oss.maxcdn.scom/respond/1.4.2/respond.min.js"></script>
     <![endif]-->
     <script src="js/jquery.min.js"></script>
     <script src="js/bootstrap.min.js"></script>
     <script defer src="https://use.fontawesome.com/releases/v5.0.8/js/all.js"></script>


 </head>
 <body>
 <header>
     <nav class="navbar navbar-default navbar-custom cd-navbar-wrapper">
         <div class="container">
             <div class="navbar-header">
                 <button aria-controls="navbar" aria-expanded="false" data-target="#navbar" data-toggle="collapse"
                         class="navbar-toggle collapsed" type="button">
                     <span class="sr-only">Toggle navigation</span>
                     <span class="icon-bar"></span>
                     <span class="icon-bar"></span>
                     <span class="icon-bar"></span>
                 </button>
                 <a href="index.html" class="logo">
                     <img src="images/CarbonDataLogo.png" alt="CarbonData logo" title="CarbocnData logo"/>
                 </a>
             </div>
             <div class="navbar-collapse collapse cd_navcontnt" id="navbar">
                 <ul class="nav navbar-nav navbar-right navlist-custom">
                     <li><a href="index.html" class="hidden-xs"><i class="fa fa-home" aria-hidden="true"></i> </a>
                     </li>
                     <li><a href="index.html" class="hidden-lg hidden-md hidden-sm">Home</a></li>
                     <li class="dropdown">
                         <a href="#" class="dropdown-toggle " data-toggle="dropdown" role="button" aria-haspopup="true"
                            aria-expanded="false"> Download <span class="caret"></span></a>
                         <ul class="dropdown-menu">
                             <li>
                                 <a href="https://dist.apache.org/repos/dist/release/carbondata/1.5.0/"
                                    target="_blank">Apache CarbonData 1.5.0</a></li>
                             <li>
                                 <a href="https://dist.apache.org/repos/dist/release/carbondata/1.4.1/"
                                    target="_blank">Apache CarbonData 1.4.1</a></li>
 							<li>
                                 <a href="https://dist.apache.org/repos/dist/release/carbondata/1.4.0/"
                                    target="_blank">Apache CarbonData 1.4.0</a></li>
                             <li>
                                 <a href="https://dist.apache.org/repos/dist/release/carbondata/1.3.1/"
                                    target="_blank">Apache CarbonData 1.3.1</a></li>
                             <li>
                                 <a href="https://dist.apache.org/repos/dist/release/carbondata/1.3.0/"
                                    target="_blank">Apache CarbonData 1.3.0</a></li>
                             <li>
                                 <a href="https://cwiki.apache.org/confluence/display/CARBONDATA/Releases"
                                    target="_blank">Release Archive</a></li>
                         </ul>
                     </li>
                     <li><a href="documentation.html" class="active">Documentation</a></li>
                     <li class="dropdown">
                         <a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true"
                            aria-expanded="false">Community <span class="caret"></span></a>
                         <ul class="dropdown-menu">
                             <li>
                                 <a href="https://github.com/apache/carbondata/blob/master/docs/how-to-contribute-to-apache-carbondata.md"
                                    target="_blank">Contributing to CarbonData</a></li>
                             <li>
                                 <a href="https://github.com/apache/carbondata/blob/master/docs/release-guide.md"
                                    target="_blank">Release Guide</a></li>
                             <li>
                                 <a href="https://cwiki.apache.org/confluence/display/CARBONDATA/PMC+and+Committers+member+list"
                                    target="_blank">Project PMC and Committers</a></li>
                             <li>
                                 <a href="https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=66850609"
                                    target="_blank">CarbonData Meetups</a></li>
                             <li><a href="security.html">Apache CarbonData Security</a></li>
                             <li><a href="https://issues.apache.org/jira/browse/CARBONDATA" target="_blank">Apache
                                 Jira</a></li>
                             <li><a href="videogallery.html">CarbonData Videos </a></li>
                         </ul>
                     </li>
                     <li class="dropdown">
                         <a href="http://www.apache.org/" class="apache_link hidden-xs dropdown-toggle"
                            data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Apache</a>
                         <ul class="dropdown-menu">
                             <li><a href="http://www.apache.org/" target="_blank">Apache Homepage</a></li>
                             <li><a href="http://www.apache.org/licenses/" target="_blank">License</a></li>
                             <li><a href="http://www.apache.org/foundation/sponsorship.html"
                                    target="_blank">Sponsorship</a></li>
                             <li><a href="http://www.apache.org/foundation/thanks.html" target="_blank">Thanks</a></li>
                         </ul>
                     </li>

                     <li class="dropdown">
                         <a href="http://www.apache.org/" class="hidden-lg hidden-md hidden-sm dropdown-toggle"
                            data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Apache</a>
                         <ul class="dropdown-menu">
                             <li><a href="http://www.apache.org/" target="_blank">Apache Homepage</a></li>
                             <li><a href="http://www.apache.org/licenses/" target="_blank">License</a></li>
                             <li><a href="http://www.apache.org/foundation/sponsorship.html"
                                    target="_blank">Sponsorship</a></li>
                             <li><a href="http://www.apache.org/foundation/thanks.html" target="_blank">Thanks</a></li>
                         </ul>
                     </li>

                     <li>
                         <a href="#" id="search-icon"><i class="fa fa-search" aria-hidden="true"></i></a>

                     </li>

                 </ul>
             </div><!--/.nav-collapse -->
             <div id="search-box">
                 <form method="get" action="http://www.google.com/search" target="_blank">
                     <div class="search-block">
                         <table border="0" cellpadding="0" width="100%">
                             <tr>
                                 <td style="width:80%">
                                     <input type="text" name="q" size=" 5" maxlength="255" value=""
                                            class="search-input"  placeholder="Search...."    required/>
                                 </td>
                                 <td style="width:20%">
                                     <input type="submit" value="Search"/></td>
                             </tr>
                             <tr>
                                 <td align="left" style="font-size:75%" colspan="2">
                                     <input type="checkbox" name="sitesearch" value="carbondata.apache.org" checked/>
                                     <span style=" position: relative; top: -3px;"> Only search for CarbonData</span>
                                 </td>
                             </tr>
                         </table>
                     </div>
                 </form>
             </div>
         </div>
     </nav>
 </header> <!-- end Header part -->

 <div class="fixed-padding"></div> <!--  top padding with fixde header  -->

 <section><!-- Dashboard nav -->
     <div class="container-fluid q">
         <div class="col-sm-12  col-md-12 maindashboard">
             <div class="verticalnavbar">
                 <nav class="b-sticky-nav">
                     <div class="nav-scroller">
                         <div class="nav__inner">
                             <a class="b-nav__intro nav__item" href="./introduction.html">introduction</a>
                             <a class="b-nav__quickstart nav__item" href="./quick-start-guide.html">quick start</a>
                             <a class="b-nav__uses nav__item" href="./usecases.html">use cases</a>

                             <div class="nav__item nav__item__with__subs">
                                 <a class="b-nav__docs nav__item nav__sub__anchor" href="./language-manual.html">Language Reference</a>
                                 <a class="nav__item nav__sub__item" href="./ddl-of-carbondata.html">DDL</a>
                                 <a class="nav__item nav__sub__item" href="./dml-of-carbondata.html">DML</a>
                                 <a class="nav__item nav__sub__item" href="./streaming-guide.html">Streaming</a>
                                 <a class="nav__item nav__sub__item" href="./configuration-parameters.html">Configuration</a>
                                 <a class="nav__item nav__sub__item" href="./datamap-developer-guide.html">Datamaps</a>
                                 <a class="nav__item nav__sub__item" href="./supported-data-types-in-carbondata.html">Data Types</a>
                             </div>

                             <div class="nav__item nav__item__with__subs">
                                 <a class="b-nav__datamap nav__item nav__sub__anchor" href="./datamap-management.html">DataMaps</a>
                                 <a class="nav__item nav__sub__item" href="./bloomfilter-datamap-guide.html">Bloom Filter</a>
                                 <a class="nav__item nav__sub__item" href="./lucene-datamap-guide.html">Lucene</a>
                                 <a class="nav__item nav__sub__item" href="./preaggregate-datamap-guide.html">Pre-Aggregate</a>
                                 <a class="nav__item nav__sub__item" href="./timeseries-datamap-guide.html">Time Series</a>
                             </div>

                             <div class="nav__item nav__item__with__subs">
                                 <a class="b-nav__api nav__item nav__sub__anchor" href="./sdk-guide.html">API</a>
                                 <a class="nav__item nav__sub__item" href="./sdk-guide.html">Java SDK</a>
                                 <a class="nav__item nav__sub__item" href="./CSDK-guide.html">C++ SDK</a>
                             </div>

                             <a class="b-nav__perf nav__item" href="./performance-tuning.html">Performance Tuning</a>
                             <a class="b-nav__s3 nav__item" href="./s3-guide.html">S3 Storage</a>
                             <a class="b-nav__faq nav__item" href="./faq.html">FAQ</a>
                             <a class="b-nav__contri nav__item" href="./how-to-contribute-to-apache-carbondata.html">Contribute</a>
                             <a class="b-nav__security nav__item" href="./security.html">Security</a>
                             <a class="b-nav__release nav__item" href="./release-guide.html">Release Guide</a>
                         </div>
                     </div>
                     <div class="navindicator">
                         <div class="b-nav__intro navindicator__item"></div>
                         <div class="b-nav__quickstart navindicator__item"></div>
                         <div class="b-nav__uses navindicator__item"></div>
                         <div class="b-nav__docs navindicator__item"></div>
                         <div class="b-nav__datamap navindicator__item"></div>
                         <div class="b-nav__api navindicator__item"></div>
                         <div class="b-nav__perf navindicator__item"></div>
                         <div class="b-nav__s3 navindicator__item"></div>
                         <div class="b-nav__faq navindicator__item"></div>
                         <div class="b-nav__contri navindicator__item"></div>
                         <div class="b-nav__security navindicator__item"></div>
                     </div>
                 </nav>
             </div>
             <div class="mdcontent">
                 <section>
                     <div style="padding:10px 15px;">
                         <div id="viewpage" name="viewpage">
                             <div class="row">
                                 <div class="col-sm-12  col-md-12">
                                     <div>
 <h1>
 <a id="carbondata-pre-aggregate-datamap" class="anchor" href="#carbondata-pre-aggregate-datamap" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>CarbonData Pre-aggregate DataMap</h1>
 <ul>
 <li><a href="#quick-example">Quick Example</a></li>
 <li><a href="#datamap-management">DataMap Management</a></li>
 <li><a href="#preaggregate-datamap-introduction">Pre-aggregate Table</a></li>
 <li><a href="#loading-data">Loading Data</a></li>
 <li><a href="#querying-data">Querying Data</a></li>
 <li><a href="#compacting-pre-aggregate-tables">Compaction</a></li>
 <li><a href="#data-management-with-pre-aggregate-tables">Data Management</a></li>
 </ul>
 <h2>
 <a id="quick-example" class="anchor" href="#quick-example" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Quick example</h2>
 <p>Download and unzip spark-2.2.0-bin-hadoop2.7.tgz, and export $SPARK_HOME</p>
 <p>Package carbon jar, and copy assembly/target/scala-2.11/carbondata_2.11-x.x.x-SNAPSHOT-shade-hadoop2.7.2.jar to $SPARK_HOME/jars</p>
 <div class="highlight highlight-source-shell"><pre>mvn clean package -DskipTests -Pspark-2.2</pre></div>
 <p>Start spark-shell in new terminal, type :paste, then copy and run the following code.</p>
 <div class="highlight highlight-source-scala"><pre> <span class="pl-k">import</span> <span class="pl-en">java</span>.<span class="pl-en">io</span>.<span class="pl-en">File</span>
  <span class="pl-k">import</span> <span class="pl-en">org</span>.<span class="pl-en">apache</span>.<span class="pl-en">spark</span>.<span class="pl-en">sql</span>.{<span class="pl-en">CarbonEnv</span>, <span class="pl-en">SparkSession</span>}
  <span class="pl-k">import</span> <span class="pl-en">org</span>.<span class="pl-en">apache</span>.<span class="pl-en">spark</span>.<span class="pl-en">sql</span>.<span class="pl-en">CarbonSession</span>.<span class="pl-en">_</span>
  <span class="pl-k">import</span> <span class="pl-en">org</span>.<span class="pl-en">apache</span>.<span class="pl-en">spark</span>.<span class="pl-en">sql</span>.<span class="pl-en">streaming</span>.{<span class="pl-en">ProcessingTime</span>, <span class="pl-en">StreamingQuery</span>}
  <span class="pl-k">import</span> <span class="pl-en">org</span>.<span class="pl-en">apache</span>.<span class="pl-en">carbondata</span>.<span class="pl-en">core</span>.<span class="pl-en">util</span>.<span class="pl-en">path</span>.<span class="pl-en">CarbonStorePath</span>

  <span class="pl-k">val</span> <span class="pl-smi">warehouse</span> <span class="pl-k">=</span> <span class="pl-k">new</span> <span class="pl-en">File</span>(<span class="pl-s"><span class="pl-pds">"</span>./warehouse<span class="pl-pds">"</span></span>).getCanonicalPath
  <span class="pl-k">val</span> <span class="pl-smi">metastore</span> <span class="pl-k">=</span> <span class="pl-k">new</span> <span class="pl-en">File</span>(<span class="pl-s"><span class="pl-pds">"</span>./metastore<span class="pl-pds">"</span></span>).getCanonicalPath

  <span class="pl-k">val</span> <span class="pl-smi">spark</span> <span class="pl-k">=</span> <span class="pl-en">SparkSession</span>
    .builder()
    .master(<span class="pl-s"><span class="pl-pds">"</span>local<span class="pl-pds">"</span></span>)
    .appName(<span class="pl-s"><span class="pl-pds">"</span>preAggregateExample<span class="pl-pds">"</span></span>)
    .config(<span class="pl-s"><span class="pl-pds">"</span>spark.sql.warehouse.dir<span class="pl-pds">"</span></span>, warehouse)
    .getOrCreateCarbonSession(warehouse, metastore)

  spark.sparkContext.setLogLevel(<span class="pl-s"><span class="pl-pds">"</span>ERROR<span class="pl-pds">"</span></span>)

  <span class="pl-c"><span class="pl-c">//</span> drop table if exists previously</span>
  spark.sql(s<span class="pl-s"><span class="pl-pds">"</span>DROP TABLE IF EXISTS sales<span class="pl-pds">"</span></span>)

  <span class="pl-c"><span class="pl-c">//</span> Create main table</span>
  spark.sql(
    s<span class="pl-s"><span class="pl-pds">"""</span></span>
 <span class="pl-s">      | CREATE TABLE sales (</span>
 <span class="pl-s">      | user_id string,</span>
 <span class="pl-s">      | country string,</span>
 <span class="pl-s">      | quantity int,</span>
 <span class="pl-s">      | price bigint)</span>
 <span class="pl-s">      | STORED AS carbondata</span>
 <span class="pl-s">    <span class="pl-pds">"""</span></span>.stripMargin)

  <span class="pl-c"><span class="pl-c">//</span> Create pre-aggregate table on the main table</span>
  <span class="pl-c"><span class="pl-c">//</span> If main table already have data, following command </span>
  <span class="pl-c"><span class="pl-c">//</span> will trigger one immediate load to the pre-aggregate table</span>
  spark.sql(
    s<span class="pl-s"><span class="pl-pds">"""</span></span>
 <span class="pl-s">      | CREATE DATAMAP agg_sales</span>
 <span class="pl-s">      | ON TABLE sales</span>
 <span class="pl-s">      | USING "preaggregate"</span>
 <span class="pl-s">      | AS</span>
 <span class="pl-s">      | SELECT country, sum(quantity), avg(price)</span>
 <span class="pl-s">      | FROM sales</span>
 <span class="pl-s">      | GROUP BY country</span>
 <span class="pl-s">    <span class="pl-pds">"""</span></span>.stripMargin)

   <span class="pl-k">import</span> <span class="pl-en">spark</span>.<span class="pl-en">implicits</span>.<span class="pl-en">_</span>
   <span class="pl-k">import</span> <span class="pl-en">org</span>.<span class="pl-en">apache</span>.<span class="pl-en">spark</span>.<span class="pl-en">sql</span>.<span class="pl-en">SaveMode</span>
   <span class="pl-k">import</span> <span class="pl-en">scala</span>.<span class="pl-en">util</span>.<span class="pl-en">Random</span>

   <span class="pl-c"><span class="pl-c">//</span> Load data to the main table, it will also</span>
   <span class="pl-c"><span class="pl-c">//</span> trigger immediate load to pre-aggregate table.</span>
   <span class="pl-c"><span class="pl-c">//</span> These two loading operation is carried out in a</span>
   <span class="pl-c"><span class="pl-c">//</span> transactional manner, meaning that the whole </span>
   <span class="pl-c"><span class="pl-c">//</span> operation will fail if one of the loading fails</span>
   <span class="pl-k">val</span> <span class="pl-smi">r</span> <span class="pl-k">=</span> <span class="pl-k">new</span> <span class="pl-en">Random</span>()
   spark.sparkContext.parallelize(<span class="pl-c1">1</span> to <span class="pl-c1">10</span>)
    .map(x <span class="pl-k">=&gt;</span> (<span class="pl-s"><span class="pl-pds">"</span>ID.<span class="pl-pds">"</span></span> <span class="pl-k">+</span> r.nextInt(<span class="pl-c1">100000</span>), <span class="pl-s"><span class="pl-pds">"</span>country<span class="pl-pds">"</span></span> <span class="pl-k">+</span> x <span class="pl-k">%</span> <span class="pl-c1">8</span>, x <span class="pl-k">%</span> <span class="pl-c1">50</span>, x <span class="pl-k">%</span> <span class="pl-c1">60</span>))
    .toDF(<span class="pl-s"><span class="pl-pds">"</span>user_id<span class="pl-pds">"</span></span>, <span class="pl-s"><span class="pl-pds">"</span>country<span class="pl-pds">"</span></span>, <span class="pl-s"><span class="pl-pds">"</span>quantity<span class="pl-pds">"</span></span>, <span class="pl-s"><span class="pl-pds">"</span>price<span class="pl-pds">"</span></span>)
    .write
    .format(<span class="pl-s"><span class="pl-pds">"</span>carbondata<span class="pl-pds">"</span></span>)
    .option(<span class="pl-s"><span class="pl-pds">"</span>tableName<span class="pl-pds">"</span></span>, <span class="pl-s"><span class="pl-pds">"</span>sales<span class="pl-pds">"</span></span>)
    .option(<span class="pl-s"><span class="pl-pds">"</span>compress<span class="pl-pds">"</span></span>, <span class="pl-s"><span class="pl-pds">"</span>true<span class="pl-pds">"</span></span>)
    .mode(<span class="pl-en">SaveMode</span>.<span class="pl-en">Append</span>)
    .save()

   spark.sql(
     s<span class="pl-s"><span class="pl-pds">"""</span></span>
 <span class="pl-s">       |SELECT country, sum(quantity), avg(price)</span>
 <span class="pl-s">       | from sales GROUP BY country</span>
 <span class="pl-s">     <span class="pl-pds">"""</span></span>.stripMargin).show

   spark.stop</pre></div>
 <h4>
 <a id="datamap-management" class="anchor" href="#datamap-management" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>DataMap Management</h4>
 <p>DataMap can be created using following DDL</p>
 <pre><code>CREATE DATAMAP [IF NOT EXISTS] datamap_name
 ON TABLE main_table
 USING "datamap_provider"
 DMPROPERTIES ('key'='value', ...)
 AS
   SELECT statement
 </code></pre>
 <p>The string followed by USING is called DataMap Provider, in this version CarbonData supports two
 kinds of DataMap:</p>
 <ol>
 <li>preaggregate, for pre-aggregate table. Pre-Aggregate table supports two values for DMPROPERTIES.
 a. 'path' is used to specify the store location of the datamap.('path'='/location/').
 b. 'partitioning' when set to false enables user to disable partitioning of the datamap.
 Default value is true for this property.</li>
 <li>timeseries, for timeseries roll-up table. Please refer to <a href="./timeseries-datamap-guide.html">Timeseries DataMap</a>
 </li>
 </ol>
 <p>DataMap can be dropped using following DDL</p>
 <pre><code>DROP DATAMAP [IF EXISTS] datamap_name
 ON TABLE main_table
 </code></pre>
 <p>To show all DataMaps created, use:</p>
 <pre><code>SHOW DATAMAP
 ON TABLE main_table
 </code></pre>
 <p>It will show all DataMaps created on main table.</p>
 <h2>
 <a id="preaggregate-datamap-introduction" class="anchor" href="#preaggregate-datamap-introduction" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Preaggregate DataMap Introduction</h2>
 <p>Pre-aggregate tables are created as DataMaps and managed as tables internally by CarbonData.
 User can create as many pre-aggregate datamaps required to improve query performance,
 provided the storage requirements and loading speeds are acceptable.</p>
 <p>Once pre-aggregate datamaps are created, CarbonData's SparkSQL optimizer extension supports to
 select the most efficient pre-aggregate datamap and rewrite the SQL to query against the selected
 datamap instead of the main table. Since the data size of pre-aggregate datamap is smaller,
 user queries are much faster. In our previous experience, we have seen 5X to 100X times faster
 in production SQLs.</p>
 <p>For instance, main table called <strong>sales</strong> which is defined as</p>
 <pre><code>CREATE TABLE sales (
   order_time timestamp,
   user_id string,
   sex string,
   country string,
   quantity int,
   price bigint)
 STORED AS carbondata
 </code></pre>
 <p>User can create pre-aggregate tables using the Create DataMap DDL</p>
 <pre><code>CREATE DATAMAP agg_sales
 ON TABLE sales
 USING "preaggregate"
 AS
   SELECT country, sex, sum(quantity), avg(price)
   FROM sales
   GROUP BY country, sex
 </code></pre>
 <h4>
 <a id="functions-supported-in-pre-aggregate-table" class="anchor" href="#functions-supported-in-pre-aggregate-table" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Functions supported in pre-aggregate table</h4>
 <table>
 <thead>
 <tr>
 <th>Function</th>
 <th align="center">Rollup supported</th>
 </tr>
 </thead>
 <tbody>
 <tr>
 <td>SUM</td>
 <td align="center">Yes</td>
 </tr>
 <tr>
 <td>AVG</td>
 <td align="center">Yes</td>
 </tr>
 <tr>
 <td>MAX</td>
 <td align="center">Yes</td>
 </tr>
 <tr>
 <td>MIN</td>
 <td align="center">Yes</td>
 </tr>
 <tr>
 <td>COUNT</td>
 <td align="center">Yes</td>
 </tr>
 </tbody>
 </table>
 <h4>
 <a id="how-pre-aggregate-tables-are-selected" class="anchor" href="#how-pre-aggregate-tables-are-selected" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>How pre-aggregate tables are selected</h4>
 <p>When a user query is submitted, during query planning phase, CarbonData will collect all matched
 pre-aggregate tables as candidates according to Relational Algebra transformation rules. Then, the
 best pre-aggregate table for this query will be selected among the candidates based on cost.
 For simplicity, current cost estimation is based on the data size of the pre-aggregate table. (We
 assume that query will be faster on smaller table)</p>
 <p>For the main table <strong>sales</strong> and pre-aggregate table <strong>agg_sales</strong> created above, following queries</p>
 <pre><code>SELECT country, sex, sum(quantity), avg(price) from sales GROUP BY country, sex

 SELECT sex, sum(quantity) from sales GROUP BY sex

 SELECT avg(price), country from sales GROUP BY country
 </code></pre>
 <p>will be transformed by CarbonData's query planner to query against pre-aggregate table
 <strong>agg_sales</strong> instead of the main table <strong>sales</strong></p>
 <p>However, for following queries</p>
 <pre><code>SELECT user_id, country, sex, sum(quantity), avg(price) from sales GROUP BY user_id, country, sex

 SELECT sex, avg(quantity) from sales GROUP BY sex

 SELECT country, max(price) from sales GROUP BY country
 </code></pre>
 <p>will query against main table <strong>sales</strong> only, because it does not satisfy pre-aggregate table
 selection logic.</p>
 <h2>
 <a id="loading-data" class="anchor" href="#loading-data" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Loading data</h2>
 <p>For existing table with loaded data, data load to pre-aggregate table will be triggered by the
 CREATE DATAMAP statement when user creates the pre-aggregate table. For incremental loads after
 aggregates tables are created, loading data to main table triggers the load to pre-aggregate tables
 once main table loading is complete.</p>
 <p>These loads are transactional
 meaning that data on main table and pre-aggregate tables are only visible to the user after all
 tables are loaded successfully, if one of these loads fails, new data are not visible in all tables
 as if the load operation is not happened.</p>
 <h2>
 <a id="querying-data" class="anchor" href="#querying-data" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Querying data</h2>
 <p>As a technique for query acceleration, Pre-aggregate tables cannot be queried directly.
 Queries are to be made on main table. While doing query planning, internally CarbonData will check
 associated pre-aggregate tables with the main table, and do query plan transformation accordingly.</p>
 <p>User can verify whether a query can leverage pre-aggregate table or not by executing <code>EXPLAIN</code>
 command, which will show the transformed logical plan, and thus user can check whether pre-aggregate
 table is selected.</p>
 <h2>
 <a id="compacting-pre-aggregate-tables" class="anchor" href="#compacting-pre-aggregate-tables" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Compacting pre-aggregate tables</h2>
 <p>Running Compaction command (<code>ALTER TABLE COMPACT</code>) on main table will <strong>not automatically</strong>
 compact the pre-aggregate tables created on the main table. User need to run Compaction command
 separately on each pre-aggregate table to compact them.</p>
 <p>Compaction is an optional operation for pre-aggregate table. If compaction is performed on
 main table but not performed on pre-aggregate table, all queries still can benefit from
 pre-aggregate tables. To further improve the query performance, compaction on pre-aggregate tables
 can be triggered to merge the segments and files in the pre-aggregate tables.</p>
 <h2>
 <a id="data-management-with-pre-aggregate-tables" class="anchor" href="#data-management-with-pre-aggregate-tables" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Data Management with pre-aggregate tables</h2>
 <p>In current implementation, data consistency needs to be maintained for both main table and pre-aggregate
 tables. Once there is pre-aggregate table created on the main table, following command on the main
 table
 is not supported:</p>
 <ol>
 <li>Data management command: <code>UPDATE/DELETE/DELETE SEGMENT</code>.</li>
 <li>Schema management command: <code>ALTER TABLE DROP COLUMN</code>, <code>ALTER TABLE CHANGE DATATYPE</code>,
 <code>ALTER TABLE RENAME</code>. Note that adding a new column is supported, and for dropping columns and
 change datatype command, CarbonData will check whether it will impact the pre-aggregate table, if
 not, the operation is allowed, otherwise operation will be rejected by throwing exception.</li>
 <li>Partition management command: <code>ALTER TABLE ADD/DROP PARTITION</code>
 </li>
 <li>Complex Datatypes for preaggregate is not supported.</li>
 </ol>
 <p>However, there is still way to support these operations on main table, in current CarbonData
 release, user can do as following:</p>
 <ol>
 <li>Remove the pre-aggregate table by <code>DROP DATAMAP</code> command</li>
 <li>Carry out the data management operation on main table</li>
 <li>Create the pre-aggregate table again by <code>CREATE DATAMAP</code> command
 Basically, user can manually trigger the operation by re-building the datamap.</li>
 </ol>
 <script>
 $(function() {
   // Show selected style on nav item
   $('.b-nav__datamap').addClass('selected');

   if (!$('.b-nav__datamap').parent().hasClass('nav__item__with__subs--expanded')) {
     // Display datamap subnav items
     $('.b-nav__datamap').parent().toggleClass('nav__item__with__subs--expanded');
   }
 });
 </script></div>
 </div>
 </div>
 </div>
 <div class="doc-footer">
     <a href="#top" class="scroll-top">Top</a>
 </div>
 </div>
 </section>
 </div>
 </div>
 </div>
 </section><!-- End systemblock part -->
 <script src="js/custom.js"></script>
 </body>
 </html>