src/main/webapp/documents/dml.html - carbondata-site - Git at Google


 <!DOCTYPE html><html><head><meta charset="utf-8"><title>Untitled Document.md</title><style></style></head><body id="preview">
 <p>&lt;!–<br>
 Licensed to the Apache Software Foundation (ASF) under one<br>
 or more contributor license agreements.  See the NOTICE file<br>
 distributed with this work for additional information<br>
 regarding copyright ownership.  The ASF licenses this file<br>
 to you under the Apache License, Version 2.0 (the<br>
 “License”); you may not use this file except in compliance<br>
 with the License.  You may obtain a copy of the License at</p>
 <pre><code>  http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing,
 software distributed under the License is distributed on an
 &quot;AS IS&quot; BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 KIND, either express or implied.  See the License for the
 specific language governing permissions and limitations
 under the License.
 </code></pre>
 <p>–&gt;</p>
 <h1><a id="DML_Operations_on_CarbonData_19"></a>DML Operations on CarbonData</h1>
 <p>This tutorial will guide you through the data manipulation language support provided by CarbonData.</p>
 <h2><a id="Overview_22"></a>Overview</h2>
 <p>The following DML operations are supported in CarbonData :</p>
 <ul>
 <li><a href="#load-data">LOAD DATA</a></li>
 <li><a href="#show-segements">SHOW SEGMENTS</a></li>
 <li><a href="#delete-segment-by-id">DELETE SEGMENT BY ID</a></li>
 <li><a href="#delete-segment-by-date">DELETE SEGMENT BY DATE</a></li>
 </ul>
 <h2><a id="LOAD_DATA_30"></a>LOAD DATA</h2>
 <p>This command loads the user data in raw format to the Carbon specific data format store, this allows Carbon to provides good performance while querying the data.<br>
 Please visit <a href="">Data Management</a> for more details on LOAD.</p>
 <h3><a id="Syntax_35"></a>Syntax</h3>
 <pre><code>LOAD DATA [LOCAL] INPATH 'folder_path' INTO TABLE [db_name.]table_name OPTIONS(property_name=property_value, ...)
 </code></pre>
 <p>OPTIONS are not mandatory for data loading process. Inside OPTIONS user can provide either of any options like DELIMITER,QUOTECHAR, ESCAPECHAR,MULTILINE as per requirement.</p>
 <p>Note: The path shall be canonical path.</p>
 <h3><a id="Parameter_Description_45"></a>Parameter Description</h3>
 <table class="table table-striped table-bordered">
 <thead>
 <tr>
 <th>Parameter</th>
 <th>Description</th>
 <th>Optional</th>
 </tr>
 </thead>
 <tbody>
 <tr>
 <td>folder_path</td>
 <td>Path of raw csv data folder or file.</td>
 <td>NO</td>
 </tr>
 <tr>
 <td>db_name</td>
 <td>Database name, if it is not specified then it uses current database.</td>
 <td>YES</td>
 </tr>
 <tr>
 <td>table_name</td>
 <td>The name of the table in provided database.</td>
 <td>NO</td>
 </tr>
 <tr>
 <td>OPTIONS</td>
 <td>Extra options provided to Load</td>
 <td>YES</td>
 </tr>
 </tbody>
 </table>
 <h3><a id="Usage_Guideline_55"></a>Usage Guideline</h3>
 <p>Following are the options that can be used for loading data:</p>
 <ul>
 <li>
 <p><strong>DELIMITER:</strong> Delimiters can be provided in the load command.</p>
 <pre><code>OPTIONS('DELIMITER'=',')
 </code></pre>
 </li>
 <li>
 <p><strong>QUOTECHAR:</strong> Quote Characters can be provided in the load command.</p>
 <pre><code>OPTIONS('QUOTECHAR'='&quot;')
 </code></pre>
 </li>
 <li>
 <p><strong>COMMENTCHAR:</strong> Comment Characters can be provided in the load command if user want to comment lines.</p>
 <pre><code>OPTIONS('COMMENTCHAR'='#')
 </code></pre>
 </li>
 <li>
 <p><strong>FILEHEADER:</strong> Headers can be provided in the LOAD DATA command if headers are missing in the source files.</p>
 <pre><code>OPTIONS('FILEHEADER'='column1,column2')
 </code></pre>
 </li>
 <li>
 <p><strong>MULTILINE:</strong> CSV with new line character in quotes.</p>
 <pre><code>OPTIONS('MULTILINE'='true')
 </code></pre>
 </li>
 <li>
 <p><strong>ESCAPECHAR:</strong> Escape char can be provided if user want strict validation of escape character on CSV.</p>
 <pre><code>OPTIONS('ESCAPECHAR'='\')
 </code></pre>
 </li>
 <li>
 <p><strong>COMPLEX_DELIMITER_LEVEL_1:</strong> Split the complex type data column in a row (eg., a$b$c --&gt; Array = {a,b,c}).</p>
 <pre><code>OPTIONS('COMPLEX_DELIMITER_LEVEL_1'='$')
 </code></pre>
 </li>
 <li>
 <p><strong>COMPLEX_DELIMITER_LEVEL_2:</strong> Split the complex type nested data column in a row. Applies level_1 delimiter &amp; applies level_2 based on complex data type (eg., a:b$c:d --&gt; Array&gt; = {{a,b},{c,d}}).</p>
 <pre><code>OPTIONS('COMPLEX_DELIMITER_LEVEL_2'=':')
 </code></pre>
 </li>
 <li>
 <p><strong>ALL_DICTIONARY_PATH:</strong> All dictionary files path.</p>
 <pre><code>OPTIONS('ALL_DICTIONARY_PATH'='/opt/alldictionary/data.dictionary')
 </code></pre>
 </li>
 <li>
 <p><strong>COLUMNDICT:</strong> Dictionary file path for specified column.</p>
 <pre><code>OPTIONS('COLUMNDICT'='column1:dictionaryFilePath1, column2:dictionaryFilePath2')
 </code></pre>
 <p>Note: ALL_DICTIONARY_PATH and COLUMNDICT can’t be used together.</p>
 </li>
 <li>
 <p><strong>DATEFORMAT:</strong> Date format for specified column.</p>
 <pre><code>OPTIONS('DATEFORMAT'='column1:dateFormat1, column2:dateFormat2')
 </code></pre>
 <p>Note: Date formats are specified by date pattern strings. The date pattern letters in Carbon are<br>
 the same as in JAVA. Refer to <a href="http://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html">SimpleDateFormat</a>.</p>
 </li>
 </ul>
 <h3><a id="Example_119"></a>Example:</h3>
 <pre><code>LOAD DATA local inpath '/opt/rawdata/data.csv' INTO table carbontable
                          options('DELIMITER'=',', 'QUOTECHAR'='&quot;', 'COMMENTCHAR'='#',
                                  'FILEHEADER'='empno,empname,
  designation,doj,workgroupcategory,
  workgroupcategoryname,deptno,deptname,projectcode,
  projectjoindate,projectenddate,attendance,utilization,salary',
                                  'MULTILINE'='true', 'ESCAPECHAR'='\',
  'COMPLEX_DELIMITER_LEVEL_1'='$',
                                  'COMPLEX_DELIMITER_LEVEL_2'=':',
                                  'ALL_DICTIONARY_PATH'='/opt/alldictionary/data.dictionary'
                                  )

 </code></pre>
 <h2><a id="SHOW_SEGMENTS_136"></a>SHOW SEGMENTS</h2>
 <p>This command is used to get the segments of carbon table.</p>
 <pre><code>SHOW SEGMENTS FOR TABLE [db_name.]table_name LIMIT number_of_segments;
 </code></pre>
 <h3><a id="Parameter_Description_144"></a>Parameter Description</h3>
 <table class="table table-striped table-bordered">
 <thead>
 <tr>
 <th>Parameter</th>
 <th>Description</th>
 <th>Optional</th>
 </tr>
 </thead>
 <tbody>
 <tr>
 <td>db_name</td>
 <td>Database name, if it is not specified then it uses current database.</td>
 <td>YES</td>
 </tr>
 <tr>
 <td>table_name</td>
 <td>The name of the table in provided database.</td>
 <td>NO</td>
 </tr>
 <tr>
 <td>number_of_segments</td>
 <td>limit the output to this number.</td>
 <td>YES</td>
 </tr>
 </tbody>
 </table>
 <h3><a id="Example_152"></a>Example:</h3>
 <pre><code>SHOW SEGMENTS FOR TABLE CarbonDatabase.CarbonTable LIMIT 4;
 </code></pre>
 <h2><a id="DELETE_SEGMENT_BY_ID_157"></a>DELETE SEGMENT BY ID</h2>
 <p>This command is used to delete segment by using the segment ID. Each segment has a unique segment ID associated with it.<br>
 Using this segment ID, you can remove the segment.</p>
 <p>The following command will get the segmentID.</p>
 <pre><code>SHOW SEGMENTS FOR Table dbname.tablename LIMIT number_of_segments
 </code></pre>
 <p>After you retrieve the segment ID of the segment that you want to delete, execute the following command to delete the selected segment.</p>
 <pre><code>DELETE SEGMENT segment_sequence_id1, segments_sequence_id2, .... FROM TABLE tableName
 </code></pre>
 <h3><a id="Parameter_Description_171"></a>Parameter Description</h3>
 <table class="table table-striped table-bordered">
 <thead>
 <tr>
 <th>Parameter</th>
 <th>Description</th>
 <th>Optional</th>
 </tr>
 </thead>
 <tbody>
 <tr>
 <td>segment_id</td>
 <td>Segment Id of the load.</td>
 <td>NO</td>
 </tr>
 <tr>
 <td>db_name</td>
 <td>Database name, if it is not specified then it uses current database.</td>
 <td>YES</td>
 </tr>
 <tr>
 <td>table_name</td>
 <td>The name of the table in provided database.</td>
 <td>NO</td>
 </tr>
 </tbody>
 </table>
 <h3><a id="Example_178"></a>Example:</h3>
 <pre><code>DELETE SEGMENT 0 FROM TABLE CarbonDatabase.CarbonTable;
 DELETE SEGMENT 0.1,5,8 FROM TABLE CarbonDatabase.CarbonTable;
 </code></pre>
 <p>Note: Here 0.1 is compacted segment sequence id.</p>
 <h2><a id="DELETE_SEGMENT_BY_DATE_185"></a>DELETE SEGMENT BY DATE</h2>
 <p>This command will allow to delete the Carbon segment(s) from the store based on the date provided by the user in the DML command.<br>
 The segment created before the particular date will be removed from the specific stores.</p>
 <pre><code>DELETE FROM TABLE [schema_name.]table_name WHERE[DATE_FIELD]BEFORE [DATE_VALUE]
 </code></pre>
 <h3><a id="Parameter_Description_194"></a>Parameter Description</h3>
 <table class="table table-striped table-bordered">
 <thead>
 <tr>
 <th>Parameter</th>
 <th>Description</th>
 <th>Optional</th>
 </tr>
 </thead>
 <tbody>
 <tr>
 <td>DATE_VALUE</td>
 <td>Valid segement load start time value. All the segments before this specified date will be deleted.</td>
 <td>NO</td>
 </tr>
 <tr>
 <td>db_name</td>
 <td>Database name, if it is not specified then it uses current database.</td>
 <td>YES</td>
 </tr>
 <tr>
 <td>table_name</td>
 <td>The name of the table in provided database.</td>
 <td>NO</td>
 </tr>
 </tbody>
 </table>
 <h3><a id="Example_202"></a>Example:</h3>
 <pre><code>DELETE SEGMENTS FROM TABLE CarbonDatabase.CarbonTable WHERE STARTTIME BEFORE '2017-06-01 12:05:06';
 </code></pre>

 </body></html>

	<!DOCTYPE html><html><head><meta charset="utf-8"><title>Untitled Document.md</title><style></style></head><body id="preview">
	<p><!–<br>
	Licensed to the Apache Software Foundation (ASF) under one<br>
	or more contributor license agreements. See the NOTICE file<br>
	distributed with this work for additional information<br>
	regarding copyright ownership. The ASF licenses this file<br>
	to you under the Apache License, Version 2.0 (the<br>
	“License”); you may not use this file except in compliance<br>
	with the License. You may obtain a copy of the License at</p>
	<pre><code> http://www.apache.org/licenses/LICENSE-2.0

	Unless required by applicable law or agreed to in writing,
	software distributed under the License is distributed on an
	"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
	KIND, either express or implied. See the License for the
	specific language governing permissions and limitations
	under the License.
	</code></pre>
	<p>–></p>
	<h1><a id="DML_Operations_on_CarbonData_19"></a>DML Operations on CarbonData</h1>
	<p>This tutorial will guide you through the data manipulation language support provided by CarbonData.</p>
	<h2><a id="Overview_22"></a>Overview</h2>
	<p>The following DML operations are supported in CarbonData :</p>
	<ul>
	<li><a href="#load-data">LOAD DATA</a></li>
	<li><a href="#show-segements">SHOW SEGMENTS</a></li>
	<li><a href="#delete-segment-by-id">DELETE SEGMENT BY ID</a></li>
	<li><a href="#delete-segment-by-date">DELETE SEGMENT BY DATE</a></li>
	</ul>
	<h2><a id="LOAD_DATA_30"></a>LOAD DATA</h2>
	<p>This command loads the user data in raw format to the Carbon specific data format store, this allows Carbon to provides good performance while querying the data.<br>
	Please visit <a href="">Data Management</a> for more details on LOAD.</p>
	<h3><a id="Syntax_35"></a>Syntax</h3>
	<pre><code>LOAD DATA [LOCAL] INPATH 'folder_path' INTO TABLE [db_name.]table_name OPTIONS(property_name=property_value, ...)
	</code></pre>
	<p>OPTIONS are not mandatory for data loading process. Inside OPTIONS user can provide either of any options like DELIMITER,QUOTECHAR, ESCAPECHAR,MULTILINE as per requirement.</p>
	<p>Note: The path shall be canonical path.</p>
	<h3><a id="Parameter_Description_45"></a>Parameter Description</h3>
	<table class="table table-striped table-bordered">
	<thead>
	<tr>
	<th>Parameter</th>
	<th>Description</th>
	<th>Optional</th>
	</tr>
	</thead>
	<tbody>
	<tr>
	<td>folder_path</td>
	<td>Path of raw csv data folder or file.</td>
	<td>NO</td>
	</tr>
	<tr>
	<td>db_name</td>
	<td>Database name, if it is not specified then it uses current database.</td>
	<td>YES</td>
	</tr>
	<tr>
	<td>table_name</td>
	<td>The name of the table in provided database.</td>
	<td>NO</td>
	</tr>
	<tr>
	<td>OPTIONS</td>
	<td>Extra options provided to Load</td>
	<td>YES</td>
	</tr>
	</tbody>
	</table>
	<h3><a id="Usage_Guideline_55"></a>Usage Guideline</h3>
	<p>Following are the options that can be used for loading data:</p>
	<ul>
	<li>
	<p><strong>DELIMITER:</strong> Delimiters can be provided in the load command.</p>
	<pre><code>OPTIONS('DELIMITER'=',')
	</code></pre>
	</li>
	<li>
	<p><strong>QUOTECHAR:</strong> Quote Characters can be provided in the load command.</p>
	<pre><code>OPTIONS('QUOTECHAR'='"')
	</code></pre>
	</li>
	<li>
	<p><strong>COMMENTCHAR:</strong> Comment Characters can be provided in the load command if user want to comment lines.</p>
	<pre><code>OPTIONS('COMMENTCHAR'='#')
	</code></pre>
	</li>
	<li>
	<p><strong>FILEHEADER:</strong> Headers can be provided in the LOAD DATA command if headers are missing in the source files.</p>
	<pre><code>OPTIONS('FILEHEADER'='column1,column2')
	</code></pre>
	</li>
	<li>
	<p><strong>MULTILINE:</strong> CSV with new line character in quotes.</p>
	<pre><code>OPTIONS('MULTILINE'='true')
	</code></pre>
	</li>
	<li>
	<p><strong>ESCAPECHAR:</strong> Escape char can be provided if user want strict validation of escape character on CSV.</p>
	<pre><code>OPTIONS('ESCAPECHAR'='\')
	</code></pre>
	</li>
	<li>
	<p><strong>COMPLEX_DELIMITER_LEVEL_1:</strong> Split the complex type data column in a row (eg., a$b$c --> Array = {a,b,c}).</p>
	<pre><code>OPTIONS('COMPLEX_DELIMITER_LEVEL_1'='$')
	</code></pre>
	</li>
	<li>
	<p><strong>COMPLEX_DELIMITER_LEVEL_2:</strong> Split the complex type nested data column in a row. Applies level_1 delimiter & applies level_2 based on complex data type (eg., a:b$c:d --> Array> = {{a,b},{c,d}}).</p>
	<pre><code>OPTIONS('COMPLEX_DELIMITER_LEVEL_2'=':')
	</code></pre>
	</li>
	<li>
	<p><strong>ALL_DICTIONARY_PATH:</strong> All dictionary files path.</p>
	<pre><code>OPTIONS('ALL_DICTIONARY_PATH'='/opt/alldictionary/data.dictionary')
	</code></pre>
	</li>
	<li>
	<p><strong>COLUMNDICT:</strong> Dictionary file path for specified column.</p>
	<pre><code>OPTIONS('COLUMNDICT'='column1:dictionaryFilePath1, column2:dictionaryFilePath2')
	</code></pre>
	<p>Note: ALL_DICTIONARY_PATH and COLUMNDICT can’t be used together.</p>
	</li>
	<li>
	<p><strong>DATEFORMAT:</strong> Date format for specified column.</p>
	<pre><code>OPTIONS('DATEFORMAT'='column1:dateFormat1, column2:dateFormat2')
	</code></pre>
	<p>Note: Date formats are specified by date pattern strings. The date pattern letters in Carbon are<br>
	the same as in JAVA. Refer to <a href="http://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html">SimpleDateFormat</a>.</p>
	</li>
	</ul>
	<h3><a id="Example_119"></a>Example:</h3>
	<pre><code>LOAD DATA local inpath '/opt/rawdata/data.csv' INTO table carbontable
	options('DELIMITER'=',', 'QUOTECHAR'='"', 'COMMENTCHAR'='#',
	'FILEHEADER'='empno,empname,
	designation,doj,workgroupcategory,
	workgroupcategoryname,deptno,deptname,projectcode,
	projectjoindate,projectenddate,attendance,utilization,salary',
	'MULTILINE'='true', 'ESCAPECHAR'='\',
	'COMPLEX_DELIMITER_LEVEL_1'='$',
	'COMPLEX_DELIMITER_LEVEL_2'=':',
	'ALL_DICTIONARY_PATH'='/opt/alldictionary/data.dictionary'
	)

	</code></pre>
	<h2><a id="SHOW_SEGMENTS_136"></a>SHOW SEGMENTS</h2>
	<p>This command is used to get the segments of carbon table.</p>
	<pre><code>SHOW SEGMENTS FOR TABLE [db_name.]table_name LIMIT number_of_segments;
	</code></pre>
	<h3><a id="Parameter_Description_144"></a>Parameter Description</h3>
	<table class="table table-striped table-bordered">
	<thead>
	<tr>
	<th>Parameter</th>
	<th>Description</th>
	<th>Optional</th>
	</tr>
	</thead>
	<tbody>
	<tr>
	<td>db_name</td>
	<td>Database name, if it is not specified then it uses current database.</td>
	<td>YES</td>
	</tr>
	<tr>
	<td>table_name</td>
	<td>The name of the table in provided database.</td>
	<td>NO</td>
	</tr>
	<tr>
	<td>number_of_segments</td>
	<td>limit the output to this number.</td>
	<td>YES</td>
	</tr>
	</tbody>
	</table>
	<h3><a id="Example_152"></a>Example:</h3>
	<pre><code>SHOW SEGMENTS FOR TABLE CarbonDatabase.CarbonTable LIMIT 4;
	</code></pre>
	<h2><a id="DELETE_SEGMENT_BY_ID_157"></a>DELETE SEGMENT BY ID</h2>
	<p>This command is used to delete segment by using the segment ID. Each segment has a unique segment ID associated with it.<br>
	Using this segment ID, you can remove the segment.</p>
	<p>The following command will get the segmentID.</p>
	<pre><code>SHOW SEGMENTS FOR Table dbname.tablename LIMIT number_of_segments
	</code></pre>
	<p>After you retrieve the segment ID of the segment that you want to delete, execute the following command to delete the selected segment.</p>
	<pre><code>DELETE SEGMENT segment_sequence_id1, segments_sequence_id2, .... FROM TABLE tableName
	</code></pre>
	<h3><a id="Parameter_Description_171"></a>Parameter Description</h3>
	<table class="table table-striped table-bordered">
	<thead>
	<tr>
	<th>Parameter</th>
	<th>Description</th>
	<th>Optional</th>
	</tr>
	</thead>
	<tbody>
	<tr>
	<td>segment_id</td>
	<td>Segment Id of the load.</td>
	<td>NO</td>
	</tr>
	<tr>
	<td>db_name</td>
	<td>Database name, if it is not specified then it uses current database.</td>
	<td>YES</td>
	</tr>
	<tr>
	<td>table_name</td>
	<td>The name of the table in provided database.</td>
	<td>NO</td>
	</tr>
	</tbody>
	</table>
	<h3><a id="Example_178"></a>Example:</h3>
	<pre><code>DELETE SEGMENT 0 FROM TABLE CarbonDatabase.CarbonTable;
	DELETE SEGMENT 0.1,5,8 FROM TABLE CarbonDatabase.CarbonTable;
	</code></pre>
	<p>Note: Here 0.1 is compacted segment sequence id.</p>
	<h2><a id="DELETE_SEGMENT_BY_DATE_185"></a>DELETE SEGMENT BY DATE</h2>
	<p>This command will allow to delete the Carbon segment(s) from the store based on the date provided by the user in the DML command.<br>
	The segment created before the particular date will be removed from the specific stores.</p>
	<pre><code>DELETE FROM TABLE [schema_name.]table_name WHERE[DATE_FIELD]BEFORE [DATE_VALUE]
	</code></pre>
	<h3><a id="Parameter_Description_194"></a>Parameter Description</h3>
	<table class="table table-striped table-bordered">
	<thead>
	<tr>
	<th>Parameter</th>
	<th>Description</th>
	<th>Optional</th>
	</tr>
	</thead>
	<tbody>
	<tr>
	<td>DATE_VALUE</td>
	<td>Valid segement load start time value. All the segments before this specified date will be deleted.</td>
	<td>NO</td>
	</tr>
	<tr>
	<td>db_name</td>
	<td>Database name, if it is not specified then it uses current database.</td>
	<td>YES</td>
	</tr>
	<tr>
	<td>table_name</td>
	<td>The name of the table in provided database.</td>
	<td>NO</td>
	</tr>
	</tbody>
	</table>
	<h3><a id="Example_202"></a>Example:</h3>
	<pre><code>DELETE SEGMENTS FROM TABLE CarbonDatabase.CarbonTable WHERE STARTTIME BEFORE '2017-06-01 12:05:06';
	</code></pre>

	</body></html>