| |
| <!DOCTYPE html><html><head><meta charset="utf-8"><title>Untitled Document.md</title><style></style></head><body id="preview"> |
| <p><!–<br> |
| Licensed to the Apache Software Foundation (ASF) under one<br> |
| or more contributor license agreements. See the NOTICE file<br> |
| distributed with this work for additional information<br> |
| regarding copyright ownership. The ASF licenses this file<br> |
| to you under the Apache License, Version 2.0 (the<br> |
| “License”); you may not use this file except in compliance<br> |
| with the License. You may obtain a copy of the License at</p> |
| <pre><code> http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| </code></pre> |
| <p>–></p> |
| <h1><a id="DML_Operations_on_CarbonData_19"></a>DML Operations on CarbonData</h1> |
| <p>This tutorial will guide you through the data manipulation language support provided by CarbonData.</p> |
| <h2><a id="Overview_22"></a>Overview</h2> |
| <p>The following DML operations are supported in CarbonData :</p> |
| <ul> |
| <li><a href="#load-data">LOAD DATA</a></li> |
| <li><a href="#show-segements">SHOW SEGMENTS</a></li> |
| <li><a href="#delete-segment-by-id">DELETE SEGMENT BY ID</a></li> |
| <li><a href="#delete-segment-by-date">DELETE SEGMENT BY DATE</a></li> |
| </ul> |
| <h2><a id="LOAD_DATA_30"></a>LOAD DATA</h2> |
| <p>This command loads the user data in raw format to the Carbon specific data format store, this allows Carbon to provides good performance while querying the data.<br> |
| Please visit <a href="">Data Management</a> for more details on LOAD.</p> |
| <h3><a id="Syntax_35"></a>Syntax</h3> |
| <pre><code>LOAD DATA [LOCAL] INPATH 'folder_path' INTO TABLE [db_name.]table_name OPTIONS(property_name=property_value, ...) |
| </code></pre> |
| <p>OPTIONS are not mandatory for data loading process. Inside OPTIONS user can provide either of any options like DELIMITER,QUOTECHAR, ESCAPECHAR,MULTILINE as per requirement.</p> |
| <p>Note: The path shall be canonical path.</p> |
| <h3><a id="Parameter_Description_45"></a>Parameter Description</h3> |
| <table class="table table-striped table-bordered"> |
| <thead> |
| <tr> |
| <th>Parameter</th> |
| <th>Description</th> |
| <th>Optional</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td>folder_path</td> |
| <td>Path of raw csv data folder or file.</td> |
| <td>NO</td> |
| </tr> |
| <tr> |
| <td>db_name</td> |
| <td>Database name, if it is not specified then it uses current database.</td> |
| <td>YES</td> |
| </tr> |
| <tr> |
| <td>table_name</td> |
| <td>The name of the table in provided database.</td> |
| <td>NO</td> |
| </tr> |
| <tr> |
| <td>OPTIONS</td> |
| <td>Extra options provided to Load</td> |
| <td>YES</td> |
| </tr> |
| </tbody> |
| </table> |
| <h3><a id="Usage_Guideline_55"></a>Usage Guideline</h3> |
| <p>Following are the options that can be used for loading data:</p> |
| <ul> |
| <li> |
| <p><strong>DELIMITER:</strong> Delimiters can be provided in the load command.</p> |
| <pre><code>OPTIONS('DELIMITER'=',') |
| </code></pre> |
| </li> |
| <li> |
| <p><strong>QUOTECHAR:</strong> Quote Characters can be provided in the load command.</p> |
| <pre><code>OPTIONS('QUOTECHAR'='"') |
| </code></pre> |
| </li> |
| <li> |
| <p><strong>COMMENTCHAR:</strong> Comment Characters can be provided in the load command if user want to comment lines.</p> |
| <pre><code>OPTIONS('COMMENTCHAR'='#') |
| </code></pre> |
| </li> |
| <li> |
| <p><strong>FILEHEADER:</strong> Headers can be provided in the LOAD DATA command if headers are missing in the source files.</p> |
| <pre><code>OPTIONS('FILEHEADER'='column1,column2') |
| </code></pre> |
| </li> |
| <li> |
| <p><strong>MULTILINE:</strong> CSV with new line character in quotes.</p> |
| <pre><code>OPTIONS('MULTILINE'='true') |
| </code></pre> |
| </li> |
| <li> |
| <p><strong>ESCAPECHAR:</strong> Escape char can be provided if user want strict validation of escape character on CSV.</p> |
| <pre><code>OPTIONS('ESCAPECHAR'='\') |
| </code></pre> |
| </li> |
| <li> |
| <p><strong>COMPLEX_DELIMITER_LEVEL_1:</strong> Split the complex type data column in a row (eg., a$b$c --> Array = {a,b,c}).</p> |
| <pre><code>OPTIONS('COMPLEX_DELIMITER_LEVEL_1'='$') |
| </code></pre> |
| </li> |
| <li> |
| <p><strong>COMPLEX_DELIMITER_LEVEL_2:</strong> Split the complex type nested data column in a row. Applies level_1 delimiter & applies level_2 based on complex data type (eg., a:b$c:d --> Array> = {{a,b},{c,d}}).</p> |
| <pre><code>OPTIONS('COMPLEX_DELIMITER_LEVEL_2'=':') |
| </code></pre> |
| </li> |
| <li> |
| <p><strong>ALL_DICTIONARY_PATH:</strong> All dictionary files path.</p> |
| <pre><code>OPTIONS('ALL_DICTIONARY_PATH'='/opt/alldictionary/data.dictionary') |
| </code></pre> |
| </li> |
| <li> |
| <p><strong>COLUMNDICT:</strong> Dictionary file path for specified column.</p> |
| <pre><code>OPTIONS('COLUMNDICT'='column1:dictionaryFilePath1, column2:dictionaryFilePath2') |
| </code></pre> |
| <p>Note: ALL_DICTIONARY_PATH and COLUMNDICT can’t be used together.</p> |
| </li> |
| <li> |
| <p><strong>DATEFORMAT:</strong> Date format for specified column.</p> |
| <pre><code>OPTIONS('DATEFORMAT'='column1:dateFormat1, column2:dateFormat2') |
| </code></pre> |
| <p>Note: Date formats are specified by date pattern strings. The date pattern letters in Carbon are<br> |
| the same as in JAVA. Refer to <a href="http://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html">SimpleDateFormat</a>.</p> |
| </li> |
| </ul> |
| <h3><a id="Example_119"></a>Example:</h3> |
| <pre><code>LOAD DATA local inpath '/opt/rawdata/data.csv' INTO table carbontable |
| options('DELIMITER'=',', 'QUOTECHAR'='"', 'COMMENTCHAR'='#', |
| 'FILEHEADER'='empno,empname, |
| designation,doj,workgroupcategory, |
| workgroupcategoryname,deptno,deptname,projectcode, |
| projectjoindate,projectenddate,attendance,utilization,salary', |
| 'MULTILINE'='true', 'ESCAPECHAR'='\', |
| 'COMPLEX_DELIMITER_LEVEL_1'='$', |
| 'COMPLEX_DELIMITER_LEVEL_2'=':', |
| 'ALL_DICTIONARY_PATH'='/opt/alldictionary/data.dictionary' |
| ) |
| |
| </code></pre> |
| <h2><a id="SHOW_SEGMENTS_136"></a>SHOW SEGMENTS</h2> |
| <p>This command is used to get the segments of carbon table.</p> |
| <pre><code>SHOW SEGMENTS FOR TABLE [db_name.]table_name LIMIT number_of_segments; |
| </code></pre> |
| <h3><a id="Parameter_Description_144"></a>Parameter Description</h3> |
| <table class="table table-striped table-bordered"> |
| <thead> |
| <tr> |
| <th>Parameter</th> |
| <th>Description</th> |
| <th>Optional</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td>db_name</td> |
| <td>Database name, if it is not specified then it uses current database.</td> |
| <td>YES</td> |
| </tr> |
| <tr> |
| <td>table_name</td> |
| <td>The name of the table in provided database.</td> |
| <td>NO</td> |
| </tr> |
| <tr> |
| <td>number_of_segments</td> |
| <td>limit the output to this number.</td> |
| <td>YES</td> |
| </tr> |
| </tbody> |
| </table> |
| <h3><a id="Example_152"></a>Example:</h3> |
| <pre><code>SHOW SEGMENTS FOR TABLE CarbonDatabase.CarbonTable LIMIT 4; |
| </code></pre> |
| <h2><a id="DELETE_SEGMENT_BY_ID_157"></a>DELETE SEGMENT BY ID</h2> |
| <p>This command is used to delete segment by using the segment ID. Each segment has a unique segment ID associated with it.<br> |
| Using this segment ID, you can remove the segment.</p> |
| <p>The following command will get the segmentID.</p> |
| <pre><code>SHOW SEGMENTS FOR Table dbname.tablename LIMIT number_of_segments |
| </code></pre> |
| <p>After you retrieve the segment ID of the segment that you want to delete, execute the following command to delete the selected segment.</p> |
| <pre><code>DELETE SEGMENT segment_sequence_id1, segments_sequence_id2, .... FROM TABLE tableName |
| </code></pre> |
| <h3><a id="Parameter_Description_171"></a>Parameter Description</h3> |
| <table class="table table-striped table-bordered"> |
| <thead> |
| <tr> |
| <th>Parameter</th> |
| <th>Description</th> |
| <th>Optional</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td>segment_id</td> |
| <td>Segment Id of the load.</td> |
| <td>NO</td> |
| </tr> |
| <tr> |
| <td>db_name</td> |
| <td>Database name, if it is not specified then it uses current database.</td> |
| <td>YES</td> |
| </tr> |
| <tr> |
| <td>table_name</td> |
| <td>The name of the table in provided database.</td> |
| <td>NO</td> |
| </tr> |
| </tbody> |
| </table> |
| <h3><a id="Example_178"></a>Example:</h3> |
| <pre><code>DELETE SEGMENT 0 FROM TABLE CarbonDatabase.CarbonTable; |
| DELETE SEGMENT 0.1,5,8 FROM TABLE CarbonDatabase.CarbonTable; |
| </code></pre> |
| <p>Note: Here 0.1 is compacted segment sequence id.</p> |
| <h2><a id="DELETE_SEGMENT_BY_DATE_185"></a>DELETE SEGMENT BY DATE</h2> |
| <p>This command will allow to delete the Carbon segment(s) from the store based on the date provided by the user in the DML command.<br> |
| The segment created before the particular date will be removed from the specific stores.</p> |
| <pre><code>DELETE FROM TABLE [schema_name.]table_name WHERE[DATE_FIELD]BEFORE [DATE_VALUE] |
| </code></pre> |
| <h3><a id="Parameter_Description_194"></a>Parameter Description</h3> |
| <table class="table table-striped table-bordered"> |
| <thead> |
| <tr> |
| <th>Parameter</th> |
| <th>Description</th> |
| <th>Optional</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td>DATE_VALUE</td> |
| <td>Valid segement load start time value. All the segments before this specified date will be deleted.</td> |
| <td>NO</td> |
| </tr> |
| <tr> |
| <td>db_name</td> |
| <td>Database name, if it is not specified then it uses current database.</td> |
| <td>YES</td> |
| </tr> |
| <tr> |
| <td>table_name</td> |
| <td>The name of the table in provided database.</td> |
| <td>NO</td> |
| </tr> |
| </tbody> |
| </table> |
| <h3><a id="Example_202"></a>Example:</h3> |
| <pre><code>DELETE SEGMENTS FROM TABLE CarbonDatabase.CarbonTable WHERE STARTTIME BEFORE '2017-06-01 12:05:06'; |
| </code></pre> |
| |
| </body></html> |