Each load into CarbonData is written into a separate folder called Segment.Segments is a powerful concept which helps to maintain consistency of data and easy transaction management.CarbonData provides DML (Data Manipulation Language) commands to maintain the segments.
This command is used to list the segments of CarbonData table.
SHOW [HISTORY] SEGMENTS FOR TABLE [db_name.]table_name LIMIT number_of_segments
Example: Show visible segments
SHOW SEGMENTS FOR TABLE CarbonDatabase.CarbonTable LIMIT 4
Show all segments, include invisible segments
SHOW HISTORY SEGMENTS FOR TABLE CarbonDatabase.CarbonTable LIMIT 4
This command is used to delete segment by using the segment ID. Each segment has a unique segment ID associated with it. Using this segment ID, you can remove the segment.
The following command will get the segmentID.
SHOW SEGMENTS FOR TABLE [db_name.]table_name LIMIT number_of_segments
After you retrieve the segment ID of the segment that you want to delete, execute the following command to delete the selected segment.
DELETE FROM TABLE [db_name.]table_name WHERE SEGMENT.ID IN (segment_id1, segments_id2, ...)
Example:
DELETE FROM TABLE CarbonDatabase.CarbonTable WHERE SEGMENT.ID IN (0) DELETE FROM TABLE CarbonDatabase.CarbonTable WHERE SEGMENT.ID IN (0,5,8)
This command will allow to delete the CarbonData segment(s) from the store based on the date provided by the user in the DML command. The segment created before the particular date will be removed from the specific stores.
DELETE FROM TABLE [db_name.]table_name WHERE SEGMENT.STARTTIME BEFORE DATE_VALUE
Example:
DELETE FROM TABLE CarbonDatabase.CarbonTable WHERE SEGMENT.STARTTIME BEFORE '2017-06-01 12:05:06'
This command is used to read data from specified segments during CarbonScan.
Get the Segment ID:
SHOW SEGMENTS FOR TABLE [db_name.]table_name LIMIT number_of_segments
Set the segment IDs for table
SET carbon.input.segments.<database_name>.<table_name> = <list of segment IDs>
NOTE: carbon.input.segments: Specifies the segment IDs to be queried. This property allows you to query specified segments of the specified table. The CarbonScan will read data from specified segments only.
If user wants to query with segments reading in multi threading mode, then CarbonSession. threadSet can be used instead of SET query.
CarbonSession.threadSet ("carbon.input.segments.<database_name>.<table_name>","<list of segment IDs>");
Reset the segment IDs
SET carbon.input.segments.<database_name>.<table_name> = *;
If user wants to query with segments reading in multi threading mode, then CarbonSession. threadSet can be used instead of SET query.
CarbonSession.threadSet ("carbon.input.segments.<database_name>.<table_name>","*");
Examples:
SHOW SEGMENTS FOR carbontable1; SET carbon.input.segments.db.carbontable1 = 1,3,9;
CarbonSession.threadSet ("carbon.input.segments.db.carbontable_Multi_Thread","1,3");
def main(args: Array[String]) { Future { CarbonSession.threadSet ("carbon.input.segments.db.carbontable_Multi_Thread","1") spark.sql("select count(empno) from carbon.input.segments.db.carbontable_Multi_Thread").show(); } }