blob: b89b982738499e5872eddadd308a003f566e04b4 [file] [log] [blame]
<!DOCTYPE html><html><head><meta charset="utf-8"><title>Untitled Document.md</title><style></style></head><body id="preview">
<p>&lt;!–<br>
Licensed to the Apache Software Foundation (ASF) under one<br>
or more contributor license agreements. See the NOTICE file<br>
distributed with this work for additional information<br>
regarding copyright ownership. The ASF licenses this file<br>
to you under the Apache License, Version 2.0 (the<br>
“License”); you may not use this file except in compliance<br>
with the License. You may obtain a copy of the License at</p>
<pre><code> http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
&quot;AS IS&quot; BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
</code></pre>
<p>–&gt;</p>
<h1><a id="DML_Operations_on_CarbonData_19"></a>DML Operations on CarbonData</h1>
<p>This tutorial will guide you through the data manipulation language support provided by CarbonData.</p>
<h2><a id="Overview_22"></a>Overview</h2>
<p>The following DML operations are supported in CarbonData :</p>
<ul>
<li><a href="#load-data">LOAD DATA</a></li>
<li><a href="#show-segements">SHOW SEGMENTS</a></li>
<li><a href="#delete-segment-by-id">DELETE SEGMENT BY ID</a></li>
<li><a href="#delete-segment-by-date">DELETE SEGMENT BY DATE</a></li>
</ul>
<h2><a id="LOAD_DATA_30"></a>LOAD DATA</h2>
<p>This command loads the user data in raw format to the Carbon specific data format store, this allows Carbon to provides good performance while querying the data.<br>
Please visit <a href="">Data Management</a> for more details on LOAD.</p>
<h3><a id="Syntax_35"></a>Syntax</h3>
<pre><code>LOAD DATA [LOCAL] INPATH 'folder_path' INTO TABLE [db_name.]table_name OPTIONS(property_name=property_value, ...)
</code></pre>
<p>OPTIONS are not mandatory for data loading process. Inside OPTIONS user can provide either of any options like DELIMITER,QUOTECHAR, ESCAPECHAR,MULTILINE as per requirement.</p>
<p>Note: The path shall be canonical path.</p>
<h3><a id="Parameter_Description_45"></a>Parameter Description</h3>
<table class="table table-striped table-bordered">
<thead>
<tr>
<th>Parameter</th>
<th>Description</th>
<th>Optional</th>
</tr>
</thead>
<tbody>
<tr>
<td>folder_path</td>
<td>Path of raw csv data folder or file.</td>
<td>NO</td>
</tr>
<tr>
<td>db_name</td>
<td>Database name, if it is not specified then it uses current database.</td>
<td>YES</td>
</tr>
<tr>
<td>table_name</td>
<td>The name of the table in provided database.</td>
<td>NO</td>
</tr>
<tr>
<td>OPTIONS</td>
<td>Extra options provided to Load</td>
<td>YES</td>
</tr>
</tbody>
</table>
<h3><a id="Usage_Guideline_55"></a>Usage Guideline</h3>
<p>Following are the options that can be used for loading data:</p>
<ul>
<li>
<p><strong>DELIMITER:</strong> Delimiters can be provided in the load command.</p>
<pre><code>OPTIONS('DELIMITER'=',')
</code></pre>
</li>
<li>
<p><strong>QUOTECHAR:</strong> Quote Characters can be provided in the load command.</p>
<pre><code>OPTIONS('QUOTECHAR'='&quot;')
</code></pre>
</li>
<li>
<p><strong>COMMENTCHAR:</strong> Comment Characters can be provided in the load command if user want to comment lines.</p>
<pre><code>OPTIONS('COMMENTCHAR'='#')
</code></pre>
</li>
<li>
<p><strong>FILEHEADER:</strong> Headers can be provided in the LOAD DATA command if headers are missing in the source files.</p>
<pre><code>OPTIONS('FILEHEADER'='column1,column2')
</code></pre>
</li>
<li>
<p><strong>MULTILINE:</strong> CSV with new line character in quotes.</p>
<pre><code>OPTIONS('MULTILINE'='true')
</code></pre>
</li>
<li>
<p><strong>ESCAPECHAR:</strong> Escape char can be provided if user want strict validation of escape character on CSV.</p>
<pre><code>OPTIONS('ESCAPECHAR'='\')
</code></pre>
</li>
<li>
<p><strong>COMPLEX_DELIMITER_LEVEL_1:</strong> Split the complex type data column in a row (eg., a$b$c --&gt; Array = {a,b,c}).</p>
<pre><code>OPTIONS('COMPLEX_DELIMITER_LEVEL_1'='$')
</code></pre>
</li>
<li>
<p><strong>COMPLEX_DELIMITER_LEVEL_2:</strong> Split the complex type nested data column in a row. Applies level_1 delimiter &amp; applies level_2 based on complex data type (eg., a:b$c:d --&gt; Array&gt; = {{a,b},{c,d}}).</p>
<pre><code>OPTIONS('COMPLEX_DELIMITER_LEVEL_2'=':')
</code></pre>
</li>
<li>
<p><strong>ALL_DICTIONARY_PATH:</strong> All dictionary files path.</p>
<pre><code>OPTIONS('ALL_DICTIONARY_PATH'='/opt/alldictionary/data.dictionary')
</code></pre>
</li>
<li>
<p><strong>COLUMNDICT:</strong> Dictionary file path for specified column.</p>
<pre><code>OPTIONS('COLUMNDICT'='column1:dictionaryFilePath1, column2:dictionaryFilePath2')
</code></pre>
<p>Note: ALL_DICTIONARY_PATH and COLUMNDICT can’t be used together.</p>
</li>
<li>
<p><strong>DATEFORMAT:</strong> Date format for specified column.</p>
<pre><code>OPTIONS('DATEFORMAT'='column1:dateFormat1, column2:dateFormat2')
</code></pre>
<p>Note: Date formats are specified by date pattern strings. The date pattern letters in Carbon are<br>
the same as in JAVA. Refer to <a href="http://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html">SimpleDateFormat</a>.</p>
</li>
</ul>
<h3><a id="Example_119"></a>Example:</h3>
<pre><code>LOAD DATA local inpath '/opt/rawdata/data.csv' INTO table carbontable
options('DELIMITER'=',', 'QUOTECHAR'='&quot;', 'COMMENTCHAR'='#',
'FILEHEADER'='empno,empname,
designation,doj,workgroupcategory,
workgroupcategoryname,deptno,deptname,projectcode,
projectjoindate,projectenddate,attendance,utilization,salary',
'MULTILINE'='true', 'ESCAPECHAR'='\',
'COMPLEX_DELIMITER_LEVEL_1'='$',
'COMPLEX_DELIMITER_LEVEL_2'=':',
'ALL_DICTIONARY_PATH'='/opt/alldictionary/data.dictionary'
)
</code></pre>
<h2><a id="SHOW_SEGMENTS_136"></a>SHOW SEGMENTS</h2>
<p>This command is used to get the segments of carbon table.</p>
<pre><code>SHOW SEGMENTS FOR TABLE [db_name.]table_name LIMIT number_of_segments;
</code></pre>
<h3><a id="Parameter_Description_144"></a>Parameter Description</h3>
<table class="table table-striped table-bordered">
<thead>
<tr>
<th>Parameter</th>
<th>Description</th>
<th>Optional</th>
</tr>
</thead>
<tbody>
<tr>
<td>db_name</td>
<td>Database name, if it is not specified then it uses current database.</td>
<td>YES</td>
</tr>
<tr>
<td>table_name</td>
<td>The name of the table in provided database.</td>
<td>NO</td>
</tr>
<tr>
<td>number_of_segments</td>
<td>limit the output to this number.</td>
<td>YES</td>
</tr>
</tbody>
</table>
<h3><a id="Example_152"></a>Example:</h3>
<pre><code>SHOW SEGMENTS FOR TABLE CarbonDatabase.CarbonTable LIMIT 4;
</code></pre>
<h2><a id="DELETE_SEGMENT_BY_ID_157"></a>DELETE SEGMENT BY ID</h2>
<p>This command is used to delete segment by using the segment ID. Each segment has a unique segment ID associated with it.<br>
Using this segment ID, you can remove the segment.</p>
<p>The following command will get the segmentID.</p>
<pre><code>SHOW SEGMENTS FOR Table dbname.tablename LIMIT number_of_segments
</code></pre>
<p>After you retrieve the segment ID of the segment that you want to delete, execute the following command to delete the selected segment.</p>
<pre><code>DELETE SEGMENT segment_sequence_id1, segments_sequence_id2, .... FROM TABLE tableName
</code></pre>
<h3><a id="Parameter_Description_171"></a>Parameter Description</h3>
<table class="table table-striped table-bordered">
<thead>
<tr>
<th>Parameter</th>
<th>Description</th>
<th>Optional</th>
</tr>
</thead>
<tbody>
<tr>
<td>segment_id</td>
<td>Segment Id of the load.</td>
<td>NO</td>
</tr>
<tr>
<td>db_name</td>
<td>Database name, if it is not specified then it uses current database.</td>
<td>YES</td>
</tr>
<tr>
<td>table_name</td>
<td>The name of the table in provided database.</td>
<td>NO</td>
</tr>
</tbody>
</table>
<h3><a id="Example_178"></a>Example:</h3>
<pre><code>DELETE SEGMENT 0 FROM TABLE CarbonDatabase.CarbonTable;
DELETE SEGMENT 0.1,5,8 FROM TABLE CarbonDatabase.CarbonTable;
</code></pre>
<p>Note: Here 0.1 is compacted segment sequence id.</p>
<h2><a id="DELETE_SEGMENT_BY_DATE_185"></a>DELETE SEGMENT BY DATE</h2>
<p>This command will allow to delete the Carbon segment(s) from the store based on the date provided by the user in the DML command.<br>
The segment created before the particular date will be removed from the specific stores.</p>
<pre><code>DELETE FROM TABLE [schema_name.]table_name WHERE[DATE_FIELD]BEFORE [DATE_VALUE]
</code></pre>
<h3><a id="Parameter_Description_194"></a>Parameter Description</h3>
<table class="table table-striped table-bordered">
<thead>
<tr>
<th>Parameter</th>
<th>Description</th>
<th>Optional</th>
</tr>
</thead>
<tbody>
<tr>
<td>DATE_VALUE</td>
<td>Valid segement load start time value. All the segments before this specified date will be deleted.</td>
<td>NO</td>
</tr>
<tr>
<td>db_name</td>
<td>Database name, if it is not specified then it uses current database.</td>
<td>YES</td>
</tr>
<tr>
<td>table_name</td>
<td>The name of the table in provided database.</td>
<td>NO</td>
</tr>
</tbody>
</table>
<h3><a id="Example_202"></a>Example:</h3>
<pre><code>DELETE SEGMENTS FROM TABLE CarbonDatabase.CarbonTable WHERE STARTTIME BEFORE '2017-06-01 12:05:06';
</code></pre>
</body></html>