<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link href='images/favicon.ico' rel='shortcut icon' type='image/x-icon'>
<!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags -->
<title>CarbonData</title>
<style>
</style>
<!-- Bootstrap -->
<link rel="stylesheet" href="css/bootstrap.min.css">
<link href="css/style.css" rel="stylesheet">
<!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
<!-- WARNING: Respond.js doesn't work if you view the page via file:// -->
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.3/html5shiv.min.js"></script>
<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
<![endif]-->
<script src="js/jquery.min.js"></script>
<script src="js/bootstrap.min.js"></script>
<script defer src="https://use.fontawesome.com/releases/v5.0.8/js/all.js"></script>
</head>
<body>
<header>
<nav class="navbar navbar-default navbar-custom cd-navbar-wrapper">
<div class="container">
<div class="navbar-header">
<button aria-controls="navbar" aria-expanded="false" data-target="#navbar" data-toggle="collapse"
class="navbar-toggle collapsed" type="button">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a href="index.html" class="logo">
<img src="images/CarbonDataLogo.png" alt="CarbonData logo" title="CarbonData logo"/>
</a>
</div>
<div class="navbar-collapse collapse cd_navcontnt" id="navbar">
<ul class="nav navbar-nav navbar-right navlist-custom">
<li><a href="index.html" class="hidden-xs"><i class="fa fa-home" aria-hidden="true"></i> </a>
</li>
<li><a href="index.html" class="hidden-lg hidden-md hidden-sm">Home</a></li>
<li class="dropdown">
<a href="#" class="dropdown-toggle " data-toggle="dropdown" role="button" aria-haspopup="true"
aria-expanded="false"> Download <span class="caret"></span></a>
<ul class="dropdown-menu">
<li>
<a href="https://dist.apache.org/repos/dist/release/carbondata/1.5.0/"
target="_blank">Apache CarbonData 1.5.0</a></li>
<li>
<a href="https://dist.apache.org/repos/dist/release/carbondata/1.4.1/"
target="_blank">Apache CarbonData 1.4.1</a></li>
<li>
<a href="https://dist.apache.org/repos/dist/release/carbondata/1.4.0/"
target="_blank">Apache CarbonData 1.4.0</a></li>
<li>
<a href="https://dist.apache.org/repos/dist/release/carbondata/1.3.1/"
target="_blank">Apache CarbonData 1.3.1</a></li>
<li>
<a href="https://dist.apache.org/repos/dist/release/carbondata/1.3.0/"
target="_blank">Apache CarbonData 1.3.0</a></li>
<li>
<a href="https://cwiki.apache.org/confluence/display/CARBONDATA/Releases"
target="_blank">Release Archive</a></li>
</ul>
</li>
<li><a href="documentation.html" class="active">Documentation</a></li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true"
aria-expanded="false">Community <span class="caret"></span></a>
<ul class="dropdown-menu">
<li>
<a href="https://github.com/apache/carbondata/blob/master/docs/how-to-contribute-to-apache-carbondata.md"
target="_blank">Contributing to CarbonData</a></li>
<li>
<a href="https://github.com/apache/carbondata/blob/master/docs/release-guide.md"
target="_blank">Release Guide</a></li>
<li>
<a href="https://cwiki.apache.org/confluence/display/CARBONDATA/PMC+and+Committers+member+list"
target="_blank">Project PMC and Committers</a></li>
<li>
<a href="https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=66850609"
target="_blank">CarbonData Meetups</a></li>
<li><a href="security.html">Apache CarbonData Security</a></li>
<li><a href="https://issues.apache.org/jira/browse/CARBONDATA" target="_blank">Apache
Jira</a></li>
<li><a href="videogallery.html">CarbonData Videos </a></li>
</ul>
</li>
<li class="dropdown">
<a href="http://www.apache.org/" class="apache_link hidden-xs dropdown-toggle"
data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Apache</a>
<ul class="dropdown-menu">
<li><a href="http://www.apache.org/" target="_blank">Apache Homepage</a></li>
<li><a href="http://www.apache.org/licenses/" target="_blank">License</a></li>
<li><a href="http://www.apache.org/foundation/sponsorship.html"
target="_blank">Sponsorship</a></li>
<li><a href="http://www.apache.org/foundation/thanks.html" target="_blank">Thanks</a></li>
</ul>
</li>
<li class="dropdown">
<a href="http://www.apache.org/" class="hidden-lg hidden-md hidden-sm dropdown-toggle"
data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Apache</a>
<ul class="dropdown-menu">
<li><a href="http://www.apache.org/" target="_blank">Apache Homepage</a></li>
<li><a href="http://www.apache.org/licenses/" target="_blank">License</a></li>
<li><a href="http://www.apache.org/foundation/sponsorship.html"
target="_blank">Sponsorship</a></li>
<li><a href="http://www.apache.org/foundation/thanks.html" target="_blank">Thanks</a></li>
</ul>
</li>
<li>
<a href="#" id="search-icon"><i class="fa fa-search" aria-hidden="true"></i></a>
</li>
</ul>
</div><!--/.nav-collapse -->
<div id="search-box">
<form method="get" action="http://www.google.com/search" target="_blank">
<div class="search-block">
<table border="0" cellpadding="0" width="100%">
<tr>
<td style="width:80%">
<input type="text" name="q" size=" 5" maxlength="255" value=""
class="search-input" placeholder="Search...." required/>
</td>
<td style="width:20%">
<input type="submit" value="Search"/></td>
</tr>
<tr>
<td align="left" style="font-size:75%" colspan="2">
<input type="checkbox" name="sitesearch" value="carbondata.apache.org" checked/>
<span style=" position: relative; top: -3px;"> Only search for CarbonData</span>
</td>
</tr>
</table>
</div>
</form>
</div>
</div>
</nav>
</header> <!-- end Header part -->
<div class="fixed-padding"></div> <!-- top padding with fixed header -->
<section><!-- Dashboard nav -->
<div class="container-fluid q">
<div class="col-sm-12 col-md-12 maindashboard">
<div class="verticalnavbar">
<nav class="b-sticky-nav">
<div class="nav-scroller">
<div class="nav__inner">
<a class="b-nav__intro nav__item" href="./introduction.html">introduction</a>
<a class="b-nav__quickstart nav__item" href="./quick-start-guide.html">quick start</a>
<a class="b-nav__uses nav__item" href="./usecases.html">use cases</a>
<div class="nav__item nav__item__with__subs">
<a class="b-nav__docs nav__item nav__sub__anchor" href="./language-manual.html">Language Reference</a>
<a class="nav__item nav__sub__item" href="./ddl-of-carbondata.html">DDL</a>
<a class="nav__item nav__sub__item" href="./dml-of-carbondata.html">DML</a>
<a class="nav__item nav__sub__item" href="./streaming-guide.html">Streaming</a>
<a class="nav__item nav__sub__item" href="./configuration-parameters.html">Configuration</a>
<a class="nav__item nav__sub__item" href="./datamap-developer-guide.html">Datamaps</a>
<a class="nav__item nav__sub__item" href="./supported-data-types-in-carbondata.html">Data Types</a>
</div>
<div class="nav__item nav__item__with__subs">
<a class="b-nav__datamap nav__item nav__sub__anchor" href="./datamap-management.html">DataMaps</a>
<a class="nav__item nav__sub__item" href="./bloomfilter-datamap-guide.html">Bloom Filter</a>
<a class="nav__item nav__sub__item" href="./lucene-datamap-guide.html">Lucene</a>
<a class="nav__item nav__sub__item" href="./preaggregate-datamap-guide.html">Pre-Aggregate</a>
<a class="nav__item nav__sub__item" href="./timeseries-datamap-guide.html">Time Series</a>
</div>
<div class="nav__item nav__item__with__subs">
<a class="b-nav__api nav__item nav__sub__anchor" href="./sdk-guide.html">API</a>
<a class="nav__item nav__sub__item" href="./sdk-guide.html">Java SDK</a>
<a class="nav__item nav__sub__item" href="./CSDK-guide.html">C++ SDK</a>
</div>
<a class="b-nav__perf nav__item" href="./performance-tuning.html">Performance Tuning</a>
<a class="b-nav__s3 nav__item" href="./s3-guide.html">S3 Storage</a>
<a class="b-nav__faq nav__item" href="./faq.html">FAQ</a>
<a class="b-nav__contri nav__item" href="./how-to-contribute-to-apache-carbondata.html">Contribute</a>
<a class="b-nav__security nav__item" href="./security.html">Security</a>
<a class="b-nav__release nav__item" href="./release-guide.html">Release Guide</a>
</div>
</div>
<div class="navindicator">
<div class="b-nav__intro navindicator__item"></div>
<div class="b-nav__quickstart navindicator__item"></div>
<div class="b-nav__uses navindicator__item"></div>
<div class="b-nav__docs navindicator__item"></div>
<div class="b-nav__datamap navindicator__item"></div>
<div class="b-nav__api navindicator__item"></div>
<div class="b-nav__perf navindicator__item"></div>
<div class="b-nav__s3 navindicator__item"></div>
<div class="b-nav__faq navindicator__item"></div>
<div class="b-nav__contri navindicator__item"></div>
<div class="b-nav__security navindicator__item"></div>
</div>
</nav>
</div>
<div class="mdcontent">
<section>
<div style="padding:10px 15px;">
<div id="viewpage" name="viewpage">
<div class="row">
<div class="col-sm-12 col-md-12">
<div>
<h1>
<a id="carbondata-datamap-management" class="anchor" href="#carbondata-datamap-management" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>CarbonData DataMap Management</h1>
<ul>
<li><a href="#overview">Overview</a></li>
<li><a href="#datamap-management">DataMap Management</a></li>
<li><a href="#automatic-refresh">Automatic Refresh</a></li>
<li><a href="#manual-refresh">Manual Refresh</a></li>
<li><a href="#datamap-catalog">DataMap Catalog</a></li>
<li>
<a href="#datamap-related-commands">DataMap Related Commands</a>
<ul>
<li><a href="#explain">Explain</a></li>
<li><a href="#show-datamap">Show DataMap</a></li>
<li><a href="#compaction-on-datamap">Compaction on DataMap</a></li>
</ul>
</li>
</ul>
<h2>
<a id="overview" class="anchor" href="#overview" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Overview</h2>
<p>A DataMap can be created using the following DDL:</p>
<pre><code> CREATE DATAMAP [IF NOT EXISTS] datamap_name
[ON TABLE main_table]
USING "datamap_provider"
[WITH DEFERRED REBUILD]
DMPROPERTIES ('key'='value', ...)
AS
SELECT statement
</code></pre>
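<p>As an illustration, the syntax above could be instantiated as follows for a pre-aggregate datamap. This is a minimal sketch: the main table <code>sales</code>, its columns, and the datamap name <code>agg_sales</code> are hypothetical.</p>
<pre><code>  -- Hypothetical main table: sales(country, sex, quantity, price)
  CREATE DATAMAP agg_sales
  ON TABLE sales
  USING "preaggregate"
  AS
    SELECT country, sex, sum(quantity), avg(price)
    FROM sales
    GROUP BY country, sex
</code></pre>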
<p>Currently, there are 5 DataMap implementations in CarbonData.</p>
<table>
<thead>
<tr>
<th>DataMap Provider</th>
<th>Description</th>
<th>DMPROPERTIES</th>
<th>Management</th>
</tr>
</thead>
<tbody>
<tr>
<td>preaggregate</td>
<td>single table pre-aggregate table</td>
<td>No DMPROPERTY is required</td>
<td>Automatic</td>
</tr>
<tr>
<td>timeseries</td>
<td>time dimension rollup table</td>
<td>event_time and xx_granularity; see <a href="./timeseries-datamap-guide.html">Timeseries DataMap</a>
</td>
<td>Automatic</td>
</tr>
<tr>
<td>mv</td>
<td>multi-table pre-aggregate table</td>
<td>No DMPROPERTY is required</td>
<td>Manual</td>
</tr>
<tr>
<td>lucene</td>
<td>lucene indexing for text column</td>
<td>index_columns, which specifies the columns to index</td>
<td>Automatic</td>
</tr>
<tr>
<td>bloomfilter</td>
<td>bloom filter for high cardinality column, geospatial column</td>
<td>index_columns, which specifies the columns to index</td>
<td>Automatic</td>
</tr>
</tbody>
</table>
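<p>For an index datamap, the DMPROPERTIES clause carries the columns to index. A sketch for a bloom filter datamap, assuming a hypothetical main table <code>sales</code> with a high cardinality column <code>order_id</code>:</p>
<pre><code>  -- index_columns names the column(s) to index; table, column and datamap names are hypothetical
  CREATE DATAMAP dm_order_id
  ON TABLE sales
  USING "bloomfilter"
  DMPROPERTIES ('index_columns'='order_id')
</code></pre>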
<h2>
<a id="datamap-management" class="anchor" href="#datamap-management" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>DataMap Management</h2>
<p>There are two kinds of management semantics for a DataMap.</p>
<ol>
<li>Automatic Refresh: create the datamap without <code>WITH DEFERRED REBUILD</code> in the statement. This is the default.</li>
<li>Manual Refresh: create the datamap with <code>WITH DEFERRED REBUILD</code> in the statement.</li>
</ol>
<p><strong>CAUTION:</strong>
If the user creates an MV datamap without specifying <code>WITH DEFERRED REBUILD</code>, CarbonData gives a warning and treats the datamap as deferred rebuild.</p>
<h3>
<a id="automatic-refresh" class="anchor" href="#automatic-refresh" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Automatic Refresh</h3>
<p>When the user creates a datamap on the main table without the <code>WITH DEFERRED REBUILD</code> syntax, the datamap is managed by the system automatically.
For every data load to the main table, the system immediately triggers a load to the datamap. The two loads (to the main table and to the datamap) are executed in a transactional manner: either both succeed or neither does.</p>
<p>Data loading to the datamap is incremental, based on the Segment concept, which avoids an expensive total rebuild.</p>
<p>If the user runs any of the following commands on the main table, the system rejects the operation and returns a failure.</p>
<ol>
<li>Data management commands: <code>UPDATE/DELETE/DELETE SEGMENT</code>.</li>
<li>Schema management commands: <code>ALTER TABLE DROP COLUMN</code>, <code>ALTER TABLE CHANGE DATATYPE</code>,
<code>ALTER TABLE RENAME</code>. Note that adding a new column is supported. For the drop column and
change datatype commands, CarbonData checks whether the change affects the pre-aggregate table; if
not, the operation is allowed, otherwise it is rejected by throwing an exception.</li>
<li>Partition management commands: <code>ALTER TABLE ADD/DROP PARTITION</code>.</li>
</ol>
<p>To perform any of the above operations on the main table, the user can first drop the datamap, perform the operation, and then re-create the datamap.</p>
<p>If the user drops the main table, the datamap is dropped immediately as well.</p>
<p>This management mode is recommended for index datamaps.</p>
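<p>The drop, operate, and re-create sequence described above might look like the following sketch. It assumes a hypothetical main table <code>sales</code> with a pre-aggregate datamap <code>agg_sales</code>; the <code>DROP DATAMAP ... ON TABLE</code> form follows the syntax written in the individual DataMap guides.</p>
<pre><code>  -- 1. Drop the datamap so that the main table accepts the operation
  DROP DATAMAP IF EXISTS agg_sales ON TABLE sales

  -- 2. Perform the operation that is rejected while the datamap exists
  DELETE FROM sales WHERE country = 'unknown'

  -- 3. Re-create the datamap
  CREATE DATAMAP agg_sales
  ON TABLE sales
  USING "preaggregate"
  AS
    SELECT country, sum(quantity)
    FROM sales
    GROUP BY country
</code></pre>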
<h3>
<a id="manual-refresh" class="anchor" href="#manual-refresh" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Manual Refresh</h3>
<p>When the user creates a datamap with manual refresh semantics, the datamap is created with status <em>disabled</em>, and queries will NOT use it until the user issues a REBUILD DATAMAP command to build it. Every REBUILD DATAMAP command triggers a full rebuild of the datamap. After the rebuild is done, the system changes the datamap status to <em>enabled</em>, so that it can be used in query rewrite.</p>
<p>After every new data load, update, or delete, the related datamap is set to <em>disabled</em>,
which means that subsequent queries will not benefit from the datamap until it becomes <em>enabled</em> again.</p>
<p>If the main table is dropped by the user, the related datamap is dropped immediately.</p>
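<p>A manual refresh cycle could be sketched as follows. The tables <code>sales</code> and <code>products</code>, their columns, and the datamap name are hypothetical, and the exact form of the REBUILD DATAMAP statement is assumed here (the command name followed by the datamap name).</p>
<pre><code>  -- Create an MV datamap with manual refresh; it starts with status "disabled"
  CREATE DATAMAP sales_by_category
  USING "mv"
  WITH DEFERRED REBUILD
  AS
    SELECT p.category, sum(s.quantity)
    FROM sales s JOIN products p ON s.product_id = p.product_id
    GROUP BY p.category

  -- Trigger a full rebuild; afterwards the datamap is "enabled" and usable in query rewrite
  REBUILD DATAMAP sales_by_category
</code></pre>
<p>After a later data load the datamap becomes disabled again, so the REBUILD DATAMAP statement would need to be repeated.</p>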
<p><strong>Note</strong>:</p>
<ul>
<li>If you are creating a datamap on an external table, you need to manage the datamap manually.</li>
<li>For an index datamap such as the BloomFilter datamap, there is no need to do a manual refresh.
By default it uses automatic refresh,
which means its data is refreshed immediately after the datamap is created or the main table is loaded.
A manual refresh on this kind of datamap has no effect.</li>
</ul>
<h2>
<a id="datamap-catalog" class="anchor" href="#datamap-catalog" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>DataMap Catalog</h2>
<p>Currently, when the user creates a datamap, the system stores the datamap metadata in a configurable <em>system</em> folder in HDFS or S3.</p>
<p>This <em>system</em> folder contains:</p>
<ul>
<li>DataMapSchema file. It is a JSON file containing the schema for one datamap. See the DataMapSchema class. If the user creates 100 datamaps (on different tables), there will be 100 files in the <em>system</em> folder.</li>
<li>DataMapStatus file. There is only one such file, in JSON format, and each entry in it represents one datamap. See the DataMapStatusDetail class.</li>
</ul>
<p>The DataMapCatalog interface retrieves the schemas of all datamaps; the optimizer can use it to get datamap metadata.</p>
<h2>
<a id="datamap-related-commands" class="anchor" href="#datamap-related-commands" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>DataMap Related Commands</h2>
<h3>
<a id="explain" class="anchor" href="#explain" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Explain</h3>
<p>How can the user know whether a datamap is used in a query?</p>
<p>The user can set <code>enable.query.statistics = true</code> and use the EXPLAIN command, which prints output like the following:</p>
<pre lang="text"><code>== CarbonData Profiler ==
Hit mv DataMap: datamap1
Scan Table: default.datamap1_table
+- filter:
+- pruning by CG DataMap
+- all blocklets: 1
skipped blocklets: 0
</code></pre>
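<p>Output of this kind would come from prefixing a query with EXPLAIN, for example as below. The query and the table <code>sales</code> are hypothetical, and <code>enable.query.statistics</code> is assumed to have been set to <code>true</code> beforehand.</p>
<pre><code>  EXPLAIN
  SELECT country, sum(quantity)
  FROM sales
  GROUP BY country
</code></pre>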
<h3>
<a id="show-datamap" class="anchor" href="#show-datamap" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Show DataMap</h3>
<p>There is a SHOW DATAMAPS command. When it is issued, the system reads all datamaps from the <em>system</em> folder and prints their information on screen. The current information includes:</p>
<ul>
<li>DataMapName</li>
<li>DataMapProviderName, such as mv, preaggregate, timeseries, etc.</li>
<li>Associated Table</li>
</ul>
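<p>As a sketch, listing the datamaps of a hypothetical main table <code>sales</code> might look like the following. The singular <code>SHOW DATAMAP ... ON TABLE</code> form is the one written in the individual DataMap guides; verify the exact syntax against your CarbonData version.</p>
<pre><code>  SHOW DATAMAP
  ON TABLE sales
</code></pre>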
<h3>
<a id="compaction-on-datamap" class="anchor" href="#compaction-on-datamap" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Compaction on DataMap</h3>
<p>This feature applies to the preaggregate datamap only.</p>
<p>Running the Compaction command (<code>ALTER TABLE COMPACT</code>) on the main table will <strong>not automatically</strong> compact the pre-aggregate tables created on it. The user needs to run the Compaction command separately on each pre-aggregate table.</p>
<p>Compaction is an optional operation for a pre-aggregate table. If compaction is performed on the main table but not on the pre-aggregate table, all queries can still benefit from the pre-aggregate tables. To further improve query performance, compaction can be triggered on the pre-aggregate tables to merge their segments and files.</p>
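<p>For example, compacting both a main table and one of its pre-aggregate tables takes two separate statements. This sketch assumes a hypothetical main table <code>sales</code> with a preaggregate datamap <code>agg_sales</code>, and assumes that the pre-aggregate table is named <code>sales_agg_sales</code> (main table name, underscore, datamap name); check the actual table name before running it.</p>
<pre><code>  -- Compacts the main table only; pre-aggregate tables are left untouched
  ALTER TABLE sales COMPACT 'MINOR'

  -- Separately compact the pre-aggregate table
  ALTER TABLE sales_agg_sales COMPACT 'MINOR'
</code></pre>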
<script>
$(function() {
// Show selected style on nav item
$('.b-nav__datamap').addClass('selected');
if (!$('.b-nav__datamap').parent().hasClass('nav__item__with__subs--expanded')) {
// Display datamap subnav items
$('.b-nav__datamap').parent().toggleClass('nav__item__with__subs--expanded');
}
});
</script></div>
</div>
</div>
</div>
<div class="doc-footer">
<a href="#top" class="scroll-top">Top</a>
</div>
</div>
</section>
</div>
</div>
</div>
</section><!-- End systemblock part -->
<script src="js/custom.js"></script>
</body>
</html>