<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link href='images/favicon.ico' rel='shortcut icon' type='image/x-icon'>
<!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags -->
<title>CarbonData</title>
<style>
</style>
<!-- Bootstrap -->
<link rel="stylesheet" href="css/bootstrap.min.css">
<link href="css/style.css" rel="stylesheet">
<!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
<!-- WARNING: Respond.js doesn't work if you view the page via file:// -->
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.3/html5shiv.min.js"></script>
<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
<![endif]-->
<script src="js/jquery.min.js"></script>
<script src="js/bootstrap.min.js"></script>
<script defer src="https://use.fontawesome.com/releases/v5.0.8/js/all.js"></script>
</head>
<body>
<header>
<nav class="navbar navbar-default navbar-custom cd-navbar-wrapper">
<div class="container">
<div class="navbar-header">
<button aria-controls="navbar" aria-expanded="false" data-target="#navbar" data-toggle="collapse"
class="navbar-toggle collapsed" type="button">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a href="index.html" class="logo">
<img src="images/CarbonDataLogo.png" alt="CarbonData logo" title="CarbonData logo"/>
</a>
</div>
<div class="navbar-collapse collapse cd_navcontnt" id="navbar">
<ul class="nav navbar-nav navbar-right navlist-custom">
<li><a href="index.html" class="hidden-xs"><i class="fa fa-home" aria-hidden="true"></i> </a>
</li>
<li><a href="index.html" class="hidden-lg hidden-md hidden-sm">Home</a></li>
<li class="dropdown">
<a href="#" class="dropdown-toggle " data-toggle="dropdown" role="button" aria-haspopup="true"
aria-expanded="false"> Download <span class="caret"></span></a>
<ul class="dropdown-menu">
<li>
<a href="https://dist.apache.org/repos/dist/release/carbondata/2.2.0/"
target="_blank">Apache CarbonData 2.2.0</a></li>
<li>
<a href="https://dist.apache.org/repos/dist/release/carbondata/2.1.1/"
target="_blank">Apache CarbonData 2.1.1</a></li>
<li>
<a href="https://dist.apache.org/repos/dist/release/carbondata/2.1.0/"
target="_blank">Apache CarbonData 2.1.0</a></li>
<li>
<a href="https://dist.apache.org/repos/dist/release/carbondata/2.0.1/"
target="_blank">Apache CarbonData 2.0.1</a></li>
<li>
<a href="https://dist.apache.org/repos/dist/release/carbondata/2.0.0/"
target="_blank">Apache CarbonData 2.0.0</a></li>
<li>
<a href="https://dist.apache.org/repos/dist/release/carbondata/1.6.1/"
target="_blank">Apache CarbonData 1.6.1</a></li>
<li>
<a href="https://dist.apache.org/repos/dist/release/carbondata/1.6.0/"
target="_blank">Apache CarbonData 1.6.0</a></li>
<li>
<a href="https://dist.apache.org/repos/dist/release/carbondata/1.5.4/"
target="_blank">Apache CarbonData 1.5.4</a></li>
<li>
<a href="https://dist.apache.org/repos/dist/release/carbondata/1.5.3/"
target="_blank">Apache CarbonData 1.5.3</a></li>
<li>
<a href="https://dist.apache.org/repos/dist/release/carbondata/1.5.2/"
target="_blank">Apache CarbonData 1.5.2</a></li>
<li>
<a href="https://dist.apache.org/repos/dist/release/carbondata/1.5.1/"
target="_blank">Apache CarbonData 1.5.1</a></li>
<li>
<a href="https://cwiki.apache.org/confluence/display/CARBONDATA/Releases"
target="_blank">Release Archive</a></li>
</ul>
</li>
<li><a href="documentation.html" class="active">Documentation</a></li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true"
aria-expanded="false">Community <span class="caret"></span></a>
<ul class="dropdown-menu">
<li>
<a href="https://github.com/apache/carbondata/blob/master/docs/how-to-contribute-to-apache-carbondata.md"
target="_blank">Contributing to CarbonData</a></li>
<li>
<a href="https://github.com/apache/carbondata/blob/master/docs/release-guide.md"
target="_blank">Release Guide</a></li>
<li>
<a href="https://cwiki.apache.org/confluence/display/CARBONDATA/PMC+and+Committers+member+list"
target="_blank">Project PMC and Committers</a></li>
<li>
<a href="https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=66850609"
target="_blank">CarbonData Meetups</a></li>
<li><a href="security.html">Apache CarbonData Security</a></li>
<li><a href="https://issues.apache.org/jira/browse/CARBONDATA" target="_blank">Apache
Jira</a></li>
<li><a href="videogallery.html">CarbonData Videos </a></li>
</ul>
</li>
<li class="dropdown">
<a href="http://www.apache.org/" class="apache_link hidden-xs dropdown-toggle"
data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Apache</a>
<ul class="dropdown-menu">
<li><a href="http://www.apache.org/" target="_blank">Apache Homepage</a></li>
<li><a href="http://www.apache.org/licenses/" target="_blank">License</a></li>
<li><a href="http://www.apache.org/foundation/sponsorship.html"
target="_blank">Sponsorship</a></li>
<li><a href="http://www.apache.org/foundation/thanks.html" target="_blank">Thanks</a></li>
</ul>
</li>
<li class="dropdown">
<a href="http://www.apache.org/" class="hidden-lg hidden-md hidden-sm dropdown-toggle"
data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Apache</a>
<ul class="dropdown-menu">
<li><a href="http://www.apache.org/" target="_blank">Apache Homepage</a></li>
<li><a href="http://www.apache.org/licenses/" target="_blank">License</a></li>
<li><a href="http://www.apache.org/foundation/sponsorship.html"
target="_blank">Sponsorship</a></li>
<li><a href="http://www.apache.org/foundation/thanks.html" target="_blank">Thanks</a></li>
</ul>
</li>
<li>
<a href="#" id="search-icon"><i class="fa fa-search" aria-hidden="true"></i></a>
</li>
</ul>
</div><!--/.nav-collapse -->
<div id="search-box">
<form method="get" action="http://www.google.com/search" target="_blank">
<div class="search-block">
<table border="0" cellpadding="0" width="100%">
<tr>
<td style="width:80%">
<input type="text" name="q" size="5" maxlength="255" value=""
class="search-input" placeholder="Search..." required/>
</td>
<td style="width:20%">
<input type="submit" value="Search"/></td>
</tr>
<tr>
<td align="left" style="font-size:75%" colspan="2">
<input type="checkbox" name="sitesearch" value="carbondata.apache.org" checked/>
<span style=" position: relative; top: -3px;"> Only search for CarbonData</span>
</td>
</tr>
</table>
</div>
</form>
</div>
</div>
</nav>
</header> <!-- end Header part -->
<div class="fixed-padding"></div> <!-- top padding with fixed header -->
<section><!-- Dashboard nav -->
<div class="container-fluid q">
<div class="col-sm-12 col-md-12 maindashboard">
<div class="verticalnavbar">
<nav class="b-sticky-nav">
<div class="nav-scroller">
<div class="nav__inner">
<a class="b-nav__intro nav__item" href="./introduction.html">introduction</a>
<a class="b-nav__quickstart nav__item" href="./quick-start-guide.html">quick start</a>
<a class="b-nav__uses nav__item" href="./usecases.html">use cases</a>
<div class="nav__item nav__item__with__subs">
<a class="b-nav__docs nav__item nav__sub__anchor" href="./language-manual.html">Language Reference</a>
<a class="nav__item nav__sub__item" href="./ddl-of-carbondata.html">DDL</a>
<a class="nav__item nav__sub__item" href="./dml-of-carbondata.html">DML</a>
<a class="nav__item nav__sub__item" href="./streaming-guide.html">Streaming</a>
<a class="nav__item nav__sub__item" href="./configuration-parameters.html">Configuration</a>
<a class="nav__item nav__sub__item" href="./index-developer-guide.html">Indexes</a>
<a class="nav__item nav__sub__item" href="./supported-data-types-in-carbondata.html">Data Types</a>
</div>
<div class="nav__item nav__item__with__subs">
<a class="b-nav__datamap nav__item nav__sub__anchor" href="./index-management.html">Index Management</a>
<a class="nav__item nav__sub__item" href="./bloomfilter-index-guide.html">Bloom Filter</a>
<a class="nav__item nav__sub__item" href="./lucene-index-guide.html">Lucene</a>
<a class="nav__item nav__sub__item" href="./secondary-index-guide.html">Secondary Index</a>
<a class="nav__item nav__sub__item" href="./spatial-index-guide.html">Spatial Index</a>
<a class="nav__item nav__sub__item" href="./mv-guide.html">MV</a>
</div>
<div class="nav__item nav__item__with__subs">
<a class="b-nav__api nav__item nav__sub__anchor" href="./sdk-guide.html">API</a>
<a class="nav__item nav__sub__item" href="./sdk-guide.html">Java SDK</a>
<a class="nav__item nav__sub__item" href="./csdk-guide.html">C++ SDK</a>
</div>
<a class="b-nav__perf nav__item" href="./performance-tuning.html">Performance Tuning</a>
<a class="b-nav__s3 nav__item" href="./s3-guide.html">S3 Storage</a>
<a class="b-nav__indexserver nav__item" href="./index-server.html">Index Server</a>
<a class="b-nav__prestodb nav__item" href="./prestodb-guide.html">PrestoDB Integration</a>
<a class="b-nav__prestosql nav__item" href="./prestosql-guide.html">PrestoSQL Integration</a>
<a class="b-nav__flink nav__item" href="./flink-integration-guide.html">Flink Integration</a>
<a class="b-nav__scd nav__item" href="./scd-and-cdc-guide.html">SCD & CDC</a>
<a class="b-nav__faq nav__item" href="./faq.html">FAQ</a>
<a class="b-nav__contri nav__item" href="./how-to-contribute-to-apache-carbondata.html">Contribute</a>
<a class="b-nav__security nav__item" href="./security.html">Security</a>
<a class="b-nav__release nav__item" href="./release-guide.html">Release Guide</a>
</div>
</div>
<div class="navindicator">
<div class="b-nav__intro navindicator__item"></div>
<div class="b-nav__quickstart navindicator__item"></div>
<div class="b-nav__uses navindicator__item"></div>
<div class="b-nav__docs navindicator__item"></div>
<div class="b-nav__datamap navindicator__item"></div>
<div class="b-nav__api navindicator__item"></div>
<div class="b-nav__perf navindicator__item"></div>
<div class="b-nav__s3 navindicator__item"></div>
<div class="b-nav__indexserver navindicator__item"></div>
<div class="b-nav__prestodb navindicator__item"></div>
<div class="b-nav__prestosql navindicator__item"></div>
<div class="b-nav__flink navindicator__item"></div>
<div class="b-nav__scd navindicator__item"></div>
<div class="b-nav__faq navindicator__item"></div>
<div class="b-nav__contri navindicator__item"></div>
<div class="b-nav__security navindicator__item"></div>
</div>
</nav>
</div>
<div class="mdcontent">
<section>
<div style="padding:10px 15px;">
<div id="viewpage" name="viewpage">
<div class="row">
<div class="col-sm-12 col-md-12">
<div>
<h1>
<a id="faqs" class="anchor" href="#faqs" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>FAQs</h1>
<ul>
<li><a href="#what-are-bad-records">What are Bad Records?</a></li>
<li><a href="#where-are-bad-records-stored-in-carbondata">Where are Bad Records Stored in CarbonData?</a></li>
<li><a href="#how-to-enable-bad-record-logging">How to enable Bad Record Logging?</a></li>
<li><a href="#how-to-ignore-the-bad-records">How to ignore the Bad Records?</a></li>
<li><a href="#how-to-specify-store-location-while-creating-carbon-session">How to specify store location while creating carbon session?</a></li>
<li><a href="#what-is-carbon-lock-type">What is Carbon Lock Type?</a></li>
<li><a href="#how-to-resolve-abstract-method-error">How to resolve Abstract Method Error?</a></li>
<li><a href="#how-carbon-will-behave-when-execute-insert-operation-in-abnormal-scenarios">How will Carbon behave when executing an insert operation in abnormal scenarios?</a></li>
<li><a href="#why-all-executors-are-showing-success-in-spark-ui-even-after-dataload-command-failed-at-driver-side">Why do all executors show success in the Spark UI even after the data load command failed on the driver side?</a></li>
<li><a href="#why-different-time-zone-result-for-select-query-output-when-query-sdk-writer-output">Why is the time zone of the select query output different when querying SDK writer output?</a></li>
<li><a href="#how-to-check-lru-cache-memory-footprint">How to check LRU cache memory footprint?</a></li>
<li><a href="#how-to-deal-with-the-trailing-task-in-query">How to deal with the trailing task in query?</a></li>
</ul>
<h1>
<a id="troubleshooting" class="anchor" href="#troubleshooting" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Troubleshooting</h1>
<ul>
<li><a href="#getting-tablestatuslock-issues-when-loading-data">Getting tablestatus.lock issues when loading data</a></li>
<li><a href="#failed-to-load-thrift-libraries">Failed to load thrift libraries</a></li>
<li><a href="#failed-to-launch-the-spark-shell">Failed to launch the Spark Shell</a></li>
<li><a href="#failed-to-execute-load-query-on-cluster">Failed to execute load query on cluster</a></li>
<li><a href="#failed-to-execute-insert-query-on-cluster">Failed to execute insert query on cluster</a></li>
<li><a href="#failed-to-connect-to-hiveuser-with-thrift">Failed to connect to hiveuser with thrift</a></li>
<li><a href="#failed-to-read-the-metastore-db-during-table-creation">Failed to read the metastore db during table creation</a></li>
<li><a href="#failed-to-load-data-on-the-cluster">Failed to load data on the cluster</a></li>
<li><a href="#failed-to-insert-data-on-the-cluster">Failed to insert data on the cluster</a></li>
<li><a href="#failed-to-execute-concurrent-operations-on-table-by-multiple-workers">Failed to execute Concurrent Operations (Load, Insert, Update) on table by multiple workers</a></li>
<li><a href="#failed-to-create-index-and-drop-index-is-also-not-working">Failed to create_index and drop index is also not working</a></li>
</ul>
<h2>
<a id="what-are-bad-records" class="anchor" href="#what-are-bad-records" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>What are Bad Records?</h2>
<p>Records that fail to get loaded into CarbonData because of data type incompatibility, empty values, or an incompatible format are classified as Bad Records.</p>
<h2>
<a id="where-are-bad-records-stored-in-carbondata" class="anchor" href="#where-are-bad-records-stored-in-carbondata" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Where are Bad Records Stored in CarbonData?</h2>
<p>The bad records are stored at the location configured by the carbon.badRecords.location property in the carbon.properties file.
By default, <strong>carbon.badRecords.location</strong> points to <code>/opt/Carbon/Spark/badrecords</code>.</p>
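<p>For example, to redirect bad records to a shared HDFS location instead, the property can be set in carbon.properties as below (the path shown is illustrative):</p>
<pre><code>carbon.badRecords.location=hdfs://localhost:9000/carbon/badrecords
</code></pre>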
<h2>
<a id="how-to-enable-bad-record-logging" class="anchor" href="#how-to-enable-bad-record-logging" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>How to enable Bad Record Logging?</h2>
<p>While loading data, we can specify how Bad Records should be handled. To analyse the cause of the Bad Records, the parameter <code>BAD_RECORDS_LOGGER_ENABLE</code> must be set to <code>TRUE</code>. The handling approach is chosen with the parameter <code>BAD_RECORDS_ACTION</code>.</p>
<ul>
<li>To replace the incorrect values in the CSV rows with NULL and still load the data into CarbonData, set the following in the query :</li>
</ul>
<pre><code>'BAD_RECORDS_ACTION'='FORCE'
</code></pre>
<ul>
<li>To write the Bad Records to the raw CSV at the location set in the parameter <strong>carbon.badRecords.location</strong>, instead of loading them with NULL substitution, set the following in the query :</li>
</ul>
<pre><code>'BAD_RECORDS_ACTION'='REDIRECT'
</code></pre>
<h2>
<a id="how-to-ignore-the-bad-records" class="anchor" href="#how-to-ignore-the-bad-records" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>How to ignore the Bad Records?</h2>
<p>To skip the Bad Records entirely, without loading them or writing them to the raw CSV, set the following in the query :</p>
<pre><code>'BAD_RECORDS_ACTION'='IGNORE'
</code></pre>
<h2>
<a id="how-to-specify-store-location-while-creating-carbon-session" class="anchor" href="#how-to-specify-store-location-while-creating-carbon-session" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>How to specify store location while creating carbon session?</h2>
<p>The store location specified while creating the carbon session is used by CarbonData to store metadata such as the schema, dictionary files, dictionary metadata and sort indexes.</p>
<p>Try creating <code>carbonsession</code> with <code>storepath</code> specified in the following manner :</p>
<pre><code>val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession(&lt;carbon_store_path&gt;)
</code></pre>
<p>Example:</p>
<pre><code>val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("hdfs://localhost:9000/carbon/store")
</code></pre>
<h2>
<a id="what-is-carbon-lock-type" class="anchor" href="#what-is-carbon-lock-type" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>What is Carbon Lock Type?</h2>
<p>Apache CarbonData acquires locks on files to prevent concurrent operations from modifying the same files. The lock type depends on the storage location; for HDFS it should be set to HDFSLOCK. By default it is set to LOCALLOCK.
The carbon.lock.type property specifies the type of lock to be acquired during concurrent operations on a table. It can be set to the following values :</p>
<ul>
<li>
<strong>LOCALLOCK</strong> : This lock is created as a file on the local file system. It is useful when only one Spark driver (thrift server) runs on a machine and no other CarbonData Spark application is launched concurrently.</li>
<li>
<strong>HDFSLOCK</strong> : This lock is created as a file on HDFS. It is useful when multiple CarbonData Spark applications are launched, no ZooKeeper is running on the cluster, and HDFS supports file-based locking.</li>
</ul>
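<p>The lock type can also be configured statically in the carbon.properties file; a minimal sketch for an HDFS-backed store:</p>
<pre><code>carbon.lock.type=HDFSLOCK
</code></pre>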
<h2>
<a id="how-to-resolve-abstract-method-error" class="anchor" href="#how-to-resolve-abstract-method-error" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>How to resolve Abstract Method Error?</h2>
<p>To build the CarbonData project it is necessary to specify the Spark profile, which sets the Spark version. You need to specify the <code>spark version</code> while using Maven to build the project.</p>
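<p>For example, a Maven build that selects a Spark profile and version might look like the following (the profile name and version shown are illustrative; use the ones matching your cluster):</p>
<pre><code>mvn -Pspark-2.4 -Dspark.version=2.4.5 clean package
</code></pre>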
<h2>
<a id="how-carbon-will-behave-when-execute-insert-operation-in-abnormal-scenarios" class="anchor" href="#how-carbon-will-behave-when-execute-insert-operation-in-abnormal-scenarios" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>How will Carbon behave when executing an insert operation in abnormal scenarios?</h2>
<p>Carbon supports the insert operation; you can refer to the syntax mentioned in <a href="./dml-of-carbondata.html">DML Operations on CarbonData</a>.
First, create a source table in spark-sql and load data into it.</p>
<pre><code>CREATE TABLE source_table(
id String,
name String,
city String)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";
</code></pre>
<pre><code>SELECT * FROM source_table;
id name city
1 jack beijing
2 erlu hangzhou
3 davi shenzhen
</code></pre>
<p><strong>Scenario 1</strong> :</p>
<p>Suppose the column order in the carbon table differs from that of the source table. If you query with "SELECT * FROM carbon_table", the values appear in the source table's order rather than mapped to the carbon table's columns as expected, because the insert fills columns by position, not by name.</p>
<pre><code>CREATE TABLE IF NOT EXISTS carbon_table(
id String,
city String,
name String)
STORED AS carbondata;
</code></pre>
<pre><code>INSERT INTO TABLE carbon_table SELECT * FROM source_table;
</code></pre>
<pre><code>SELECT * FROM carbon_table;
id city name
1 jack beijing
2 erlu hangzhou
3 davi shenzhen
</code></pre>
<p>As the result shows, the second column of the carbon table is city, but it contains name values such as jack. The same behaviour occurs when inserting data into a Hive table.</p>
<p>If you want to insert data into the corresponding columns of the carbon table, you have to list the columns in the insert statement in the carbon table's order.</p>
<pre><code>INSERT INTO TABLE carbon_table SELECT id, city, name FROM source_table;
</code></pre>
<p><strong>Scenario 2</strong> :</p>
<p>The insert operation fails when the number of columns in the carbon table differs from the number of columns specified in the select statement. The following insert operation will fail.</p>
<pre><code>INSERT INTO TABLE carbon_table SELECT id, city FROM source_table;
</code></pre>
<p><strong>Scenario 3</strong> :</p>
<p>When a column type in the carbon table differs from that of the corresponding column in the select statement, the insert operation still succeeds, but you may get NULL in the result, because NULL is substituted when the type conversion fails.</p>
<h2>
<a id="why-all-executors-are-showing-success-in-spark-ui-even-after-dataload-command-failed-at-driver-side" class="anchor" href="#why-all-executors-are-showing-success-in-spark-ui-even-after-dataload-command-failed-at-driver-side" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Why do all executors show success in the Spark UI even after the data load command failed on the driver side?</h2>
<p>Normally, a Spark executor marks a task as failed only after the maximum number of retry attempts. However, when loading data that contains bad records with BAD_RECORDS_ACTION (carbon.bad.records.action) set to "FAIL", the task is attempted only once: it sends a failure signal to the driver instead of throwing an exception to trigger a retry, since there is no point in retrying once a bad record is found. Hence the Spark executor displays this single attempt as successful even though the command has actually failed to execute. Check the task attempts or executor logs to find the failure reason.</p>
<h2>
<a id="why-different-time-zone-result-for-select-query-output-when-query-sdk-writer-output" class="anchor" href="#why-different-time-zone-result-for-select-query-output-when-query-sdk-writer-output" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Why is the time zone of the select query output different when querying SDK writer output?</h2>
<p>The SDK writer is an independent entity, so it can generate carbondata files on a non-cluster machine with a different time zone. When those files are read on the cluster, the cluster's time zone is always applied. Hence, the values of timestamp and date datatype fields are not the original values.
To control the time zone of the data while writing, set the cluster's time zone in the SDK writer by calling the API below.</p>
<pre><code>TimeZone.setDefault(timezoneValue)
</code></pre>
<p><strong>Example:</strong></p>
<pre><code>cluster timezone is Asia/Shanghai
TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"))
</code></pre>
<h2>
<a id="how-to-check-lru-cache-memory-footprint" class="anchor" href="#how-to-check-lru-cache-memory-footprint" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>How to check LRU cache memory footprint?</h2>
<p>To observe the LRU cache memory footprint in the logs, configure the below properties in log4j.properties file.</p>
<pre><code>log4j.logger.org.apache.carbondata.core.cache.CarbonLRUCache = DEBUG
</code></pre>
<p>This property enables the DEBUG log for CarbonLRUCache and UnsafeMemoryManager, which prints the memory consumed, from which a suitable LRU cache size can be decided. <strong>Note:</strong> Enabling the DEBUG log will degrade query performance. Ensure carbon.max.driver.lru.cache.size is configured so that the current cache size is reported.</p>
<p><strong>Example:</strong></p>
<pre><code>18/09/26 15:05:29 DEBUG CarbonLRUCache: main Required size for entry /home/target/store/default/stored_as_carbondata_table/Fact/Part0/Segment_0/0_1537954529044.carbonindexmerge :: 181 Current cache size :: 0
18/09/26 15:05:30 INFO CarbonLRUCache: main Removed entry from InMemory lru cache :: /home/target/store/default/stored_as_carbondata_table/Fact/Part0/Segment_0/0_1537954529044.carbonindexmerge
</code></pre>
<p><strong>Note:</strong> If <code>Removed entry from InMemory lru cache</code> messages are frequently observed in the logs, you may have to increase the configured LRU cache size.</p>
<p>To observe the LRU cache from heap dump, check the heap used by CarbonLRUCache class.</p>
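<p>One way to inspect this without a full heap dump is a live class histogram from the JDK tools, filtered to the cache class (replace the pid with the driver JVM's process id):</p>
<pre><code>jmap -histo &lt;pid&gt; | grep CarbonLRUCache
</code></pre>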
<h2>
<a id="how-to-deal-with-the-trailing-task-in-query" class="anchor" href="#how-to-deal-with-the-trailing-task-in-query" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>How to deal with the trailing task in query?</h2>
<p>When tuning query performance, a user may find that a few trailing tasks slow down the overall query progress. To improve performance in such cases, set spark.locality.wait and spark.speculation=true to enable speculation in Spark, which launches duplicate task attempts and takes the result of whichever finishes first. Besides these, the following configurations can further improve performance in this case.</p>
<p><strong>Example:</strong></p>
<pre><code>spark.locality.wait = 500
spark.speculation = true
spark.speculation.quantile = 0.75
spark.speculation.multiplier = 5
spark.blacklist.enabled = false
</code></pre>
<p><strong>Note:</strong></p>
<p>spark.locality.wait controls how long Spark waits for data locality; a value of 500 (milliseconds) shortens the waiting time.</p>
<p>spark.speculation is a group of configurations that monitors trailing tasks and starts new task attempts when the conditions are met.</p>
<p>spark.blacklist.enabled is disabled to avoid a reduction in the number of available executors due to the blacklist mechanism.</p>
<h2>
<a id="getting-tablestatuslock-issues-when-loading-data" class="anchor" href="#getting-tablestatuslock-issues-when-loading-data" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Getting tablestatus.lock issues when loading data</h2>
<p><strong>Symptom</strong></p>
<pre><code>17/11/11 16:48:13 ERROR LocalFileLock: main hdfs:/localhost:9000/carbon/store/default/hdfstable/tablestatus.lock (No such file or directory)
java.io.FileNotFoundException: hdfs:/localhost:9000/carbon/store/default/hdfstable/tablestatus.lock (No such file or directory)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.&lt;init&gt;(FileOutputStream.java:213)
at java.io.FileOutputStream.&lt;init&gt;(FileOutputStream.java:101)
</code></pre>
<p><strong>Possible Cause</strong>
If you use <code>&lt;hdfs path&gt;</code> as the store path when creating the carbon session, you may get this error, because the default lock type is LOCALLOCK.</p>
<p><strong>Procedure</strong>
Before creating the carbon session, set the lock type as below:</p>
<pre><code>import org.apache.carbondata.core.util.CarbonProperties
import org.apache.carbondata.core.constants.CarbonCommonConstants
CarbonProperties.getInstance().addProperty(CarbonCommonConstants.LOCK_TYPE, "HDFSLOCK")
</code></pre>
<h2>
<a id="failed-to-load-thrift-libraries" class="anchor" href="#failed-to-load-thrift-libraries" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Failed to load thrift libraries</h2>
<p><strong>Symptom</strong></p>
<p>Thrift throws following exception :</p>
<pre><code>thrift: error while loading shared libraries:
libthriftc.so.0: cannot open shared object file: No such file or directory
</code></pre>
<p><strong>Possible Cause</strong></p>
<p>The complete path to the directory containing the libraries is not configured correctly.</p>
<p><strong>Procedure</strong></p>
<p>Follow the Apache thrift docs at <a href="https://thrift.apache.org/docs/install" target=_blank rel="nofollow">https://thrift.apache.org/docs/install</a> to install thrift correctly.</p>
<h2>
<a id="failed-to-launch-the-spark-shell" class="anchor" href="#failed-to-launch-the-spark-shell" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Failed to launch the Spark Shell</h2>
<p><strong>Symptom</strong></p>
<p>The shell prompts the following error :</p>
<pre><code>org.apache.spark.sql.CarbonContext$$anon$$apache$spark$sql$catalyst$analysis
$OverrideCatalog$_setter_$org$apache$spark$sql$catalyst$analysis
$OverrideCatalog$$overrides_$e
</code></pre>
<p><strong>Possible Cause</strong></p>
<p>The Spark Version and the selected Spark Profile do not match.</p>
<p><strong>Procedure</strong></p>
<ol>
<li>
<p>Ensure your spark version and selected profile for spark are correct.</p>
</li>
<li>
<p>Use the following command :</p>
</li>
</ol>
<pre><code>mvn -Pspark-2.4 -Dspark.version={yourSparkVersion} clean package
</code></pre>
<p>Note : Refrain from using "mvn clean package" without specifying the profile.</p>
<h2>
<a id="failed-to-execute-load-query-on-cluster" class="anchor" href="#failed-to-execute-load-query-on-cluster" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Failed to execute load query on cluster</h2>
<p><strong>Symptom</strong></p>
<p>Load query failed with the following exception:</p>
<pre><code>Dictionary file is locked for updation.
</code></pre>
<p><strong>Possible Cause</strong></p>
<p>The carbon.properties file is not identical in all the nodes of the cluster.</p>
<p><strong>Procedure</strong></p>
<p>Follow the steps to ensure the carbon.properties file is consistent across all the nodes:</p>
<ol>
<li>
<p>Copy the carbon.properties file from the master node to all the other nodes in the cluster.
For example, you can use scp to copy this file to all the nodes.</p>
</li>
<li>
<p>For the changes to take effect, restart the Spark cluster.</p>
</li>
</ol>
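<p>Step 1 can be scripted with scp; the sketch below assumes hypothetical worker host names and a common configuration directory:</p>
<pre><code>for node in worker1 worker2 worker3; do
  scp $SPARK_HOME/conf/carbon.properties $node:$SPARK_HOME/conf/
done
</code></pre>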
<h2>
<a id="failed-to-execute-insert-query-on-cluster" class="anchor" href="#failed-to-execute-insert-query-on-cluster" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Failed to execute insert query on cluster</h2>
<p><strong>Symptom</strong></p>
<p>Load query failed with the following exception:</p>
<pre><code>Dictionary file is locked for updation.
</code></pre>
<p><strong>Possible Cause</strong></p>
<p>The carbon.properties file is not identical in all the nodes of the cluster.</p>
<p><strong>Procedure</strong></p>
<p>Follow the steps to ensure the carbon.properties file is consistent across all the nodes:</p>
<ol>
<li>
<p>Copy the carbon.properties file from the master node to all the other nodes in the cluster.
For example, you can use scp to copy this file to all the nodes.</p>
</li>
<li>
<p>For the changes to take effect, restart the Spark cluster.</p>
</li>
</ol>
<h2>
<a id="failed-to-connect-to-hiveuser-with-thrift" class="anchor" href="#failed-to-connect-to-hiveuser-with-thrift" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Failed to connect to hiveuser with thrift</h2>
<p><strong>Symptom</strong></p>
<p>We get the following exception :</p>
<pre><code>Cannot connect to hiveuser.
</code></pre>
<p><strong>Possible Cause</strong></p>
<p>The external process does not have permission to access the hiveuser account.</p>
<p><strong>Procedure</strong></p>
<p>Ensure that the hiveuser account in MySQL allows access from external processes.</p>
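<p>A typical fix is to grant the user remote access in MySQL; the statement below is a sketch (the host pattern and password are assumptions for illustration):</p>
<pre><code>GRANT ALL PRIVILEGES ON *.* TO 'hiveuser'@'%' IDENTIFIED BY 'password';
FLUSH PRIVILEGES;
</code></pre>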
<h2>
<a id="failed-to-read-the-metastore-db-during-table-creation" class="anchor" href="#failed-to-read-the-metastore-db-during-table-creation" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Failed to read the metastore db during table creation</h2>
<p><strong>Symptom</strong></p>
<p>We get the following exception on trying to connect :</p>
<pre><code>Cannot read the metastore db
</code></pre>
<p><strong>Possible Cause</strong></p>
<p>The metastore db is dysfunctional.</p>
<p><strong>Procedure</strong></p>
<p>Remove the metastore db from the carbon.metastore path in the Spark directory.</p>
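<p>A cautious way to do this is to back up the metastore_db directory before removing it, since Spark recreates a fresh metastore on restart. The block below demonstrates the backup-then-remove sequence on a scratch copy; the real location (commonly <code>metastore_db</code> under the directory Spark was launched from, or the path configured via carbon.metastore) is an assumption you must verify first.</p>

```shell
# Demonstrated on a scratch directory; point METASTORE_DIR at the real
# metastore_db location (verify it first!) when applying this for real.
METASTORE_DIR=/tmp/demo-spark/metastore_db
mkdir -p "$METASTORE_DIR"

# Back up the directory before removing it, so it can be restored if needed.
tar -czf /tmp/metastore_db.backup.tgz -C "$(dirname "$METASTORE_DIR")" metastore_db

# Remove it; a fresh metastore is created on the next Spark restart.
rm -rf "$METASTORE_DIR"
```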
<h2>
<a id="failed-to-load-data-on-the-cluster" class="anchor" href="#failed-to-load-data-on-the-cluster" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Failed to load data on the cluster</h2>
<p><strong>Symptom</strong></p>
<p>Data loading fails with the following exception:</p>
<pre><code>Data Load failure exception
</code></pre>
<p><strong>Possible Cause</strong></p>
<p>Either of the following issues can cause the failure:</p>
<ol>
<li>
<p>The core-site.xml, hive-site.xml, yarn-site.xml, and carbon.properties files are not consistent across all nodes of the cluster.</p>
</li>
<li>
<p>The path to the HDFS DDL is not configured correctly in carbon.properties.</p>
</li>
</ol>
<p><strong>Procedure</strong></p>
<p>Follow these steps to ensure that the configuration files are consistent across all nodes:</p>
<ol>
<li>
<p>Copy the core-site.xml, hive-site.xml, yarn-site.xml, and carbon.properties files from the master node to all the other nodes in the cluster.
For example, you can use scp to copy these files to all the nodes.</p>
<p>Note: Set the path to the HDFS DDL in carbon.properties on the master node.</p>
</li>
<li>
<p>For the changes to take effect, restart the Spark cluster.</p>
</li>
</ol>
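<p>The copy step above can be sketched as a nested loop over files and nodes. The hostnames and conf directories below are assumptions; this dry-run only prints the commands so they can be reviewed before anything is copied.</p>

```shell
# Dry-run: print a copy command for each config file and each worker node.
# Hostnames and conf directories are assumptions -- adjust for your cluster.
NODES="node1 node2"
HADOOP_CONF="${HADOOP_CONF_DIR:-/etc/hadoop/conf}"
SPARK_CONF="${SPARK_HOME:-/opt/spark}/conf"
FILES="$HADOOP_CONF/core-site.xml $HADOOP_CONF/yarn-site.xml \
$SPARK_CONF/hive-site.xml $SPARK_CONF/carbon.properties"

for host in $NODES; do
  for f in $FILES; do
    echo "scp $f $host:$f"
  done
done > /tmp/sync-cluster-conf.txt

# Review, then execute with: sh /tmp/sync-cluster-conf.txt
cat /tmp/sync-cluster-conf.txt
```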
<h2>
<a id="failed-to-insert-data-on-the-cluster" class="anchor" href="#failed-to-insert-data-on-the-cluster" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Failed to insert data on the cluster</h2>
<p><strong>Symptom</strong></p>
<p>Insertion fails with the following exception:</p>
<pre><code>Data Load failure exception
</code></pre>
<p><strong>Possible Cause</strong></p>
<p>Either of the following issues can cause the failure:</p>
<ol>
<li>
<p>The core-site.xml, hive-site.xml, yarn-site.xml, and carbon.properties files are not consistent across all nodes of the cluster.</p>
</li>
<li>
<p>The path to the HDFS DDL is not configured correctly in carbon.properties.</p>
</li>
</ol>
<p><strong>Procedure</strong></p>
<p>Follow these steps to ensure that the configuration files are consistent across all nodes:</p>
<ol>
<li>
<p>Copy the core-site.xml, hive-site.xml, yarn-site.xml, and carbon.properties files from the master node to all the other nodes in the cluster.
For example, you can use scp to copy these files to all the nodes.</p>
<p>Note: Set the path to the HDFS DDL in carbon.properties on the master node.</p>
</li>
<li>
<p>For the changes to take effect, restart the Spark cluster.</p>
</li>
</ol>
<h2>
<a id="failed-to-execute-concurrent-operations-on-table-by-multiple-workers" class="anchor" href="#failed-to-execute-concurrent-operations-on-table-by-multiple-workers" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Failed to execute Concurrent Operations on table by multiple workers</h2>
<p><strong>Symptom</strong></p>
<p>Execution fails with the following exception:</p>
<pre><code>Table is locked for updation.
</code></pre>
<p><strong>Possible Cause</strong></p>
<p>Concurrent operations on the same table are not supported.</p>
<p><strong>Procedure</strong></p>
<p>The worker must wait for the current query execution to complete and for the table lock to be released before another query on that table can succeed.</p>
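<p>One way to implement the wait is a bounded retry loop around the query submission. This is a hypothetical sketch: <code>run_query</code> stands in for the real spark-sql or beeline invocation, and here simply simulates the lock being released on the third attempt.</p>

```shell
# Bounded retry loop: keep re-submitting until the table lock is released
# or the retry budget is exhausted. run_query is a stand-in that simulates
# success on the third attempt.
MAX_RETRIES=5
attempt=1
run_query() { [ "$attempt" -ge 3 ]; }

until run_query; do
  attempt=$((attempt + 1))
  if [ "$attempt" -gt "$MAX_RETRIES" ]; then
    echo "giving up after $MAX_RETRIES attempts"
    break
  fi
  sleep 1   # in practice, back off before re-submitting the query
done
echo "finished on attempt $attempt" > /tmp/retry-result.txt
cat /tmp/retry-result.txt
```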
<h2>
<a id="failed-to-create-index-and-drop-index-is-also-not-working" class="anchor" href="#failed-to-create-index-and-drop-index-is-also-not-working" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Failed to create index and drop index is also not working</h2>
<p><strong>Symptom</strong></p>
<p>Execution fails with the following exception:</p>
<pre><code>HDFS Quota Exceeded
</code></pre>
<p><strong>Possible Cause</strong></p>
<p>An HDFS quota is set, and it prevents CarbonData from writing or modifying any files.</p>
<p><strong>Procedure</strong></p>
<p>Drop that particular index using the DROP INDEX command to clear the stale folders.</p>
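<p>Before dropping the index, it can help to confirm that a quota is actually the blocker. The commands below are a dry-run sketch: the store path is an assumption (use the CarbonData store location from your carbon.properties), and clearing a quota requires HDFS admin rights.</p>

```shell
# Dry-run: print the HDFS commands for inspecting and clearing the quota
# on the store path. STORE is an assumption -- use your actual store path.
STORE=/user/hive/warehouse/carbon.store
{
  echo "hdfs dfs -count -q -h $STORE"        # show quota and current usage
  echo "hdfs dfsadmin -clrSpaceQuota $STORE" # clear the space quota (admin)
} > /tmp/hdfs-quota-commands.txt
cat /tmp/hdfs-quota-commands.txt
```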
<script>
// Show selected style on nav item
$(function() { $('.b-nav__faq').addClass('selected'); });
</script></div>
</div>
</div>
</div>
<div class="doc-footer">
<a href="#top" class="scroll-top">Top</a>
</div>
</div>
</section>
</div>
</div>
</div>
</section><!-- End systemblock part -->
<script src="js/custom.js"></script>
</body>
</html>