| --- |
| layout: doc_page |
| title: "DumpSegment tool" |
| --- |
| |
| <!-- |
| ~ Licensed to the Apache Software Foundation (ASF) under one |
| ~ or more contributor license agreements. See the NOTICE file |
| ~ distributed with this work for additional information |
| ~ regarding copyright ownership. The ASF licenses this file |
| ~ to you under the Apache License, Version 2.0 (the |
| ~ "License"); you may not use this file except in compliance |
| ~ with the License. You may obtain a copy of the License at |
| ~ |
| ~ http://www.apache.org/licenses/LICENSE-2.0 |
| ~ |
| ~ Unless required by applicable law or agreed to in writing, |
| ~ software distributed under the License is distributed on an |
| ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| ~ KIND, either express or implied. See the License for the |
| ~ specific language governing permissions and limitations |
| ~ under the License. |
| --> |
| |
| # DumpSegment tool |
| |
| The DumpSegment tool can be used to dump the metadata or contents of an Apache Druid (incubating) segment for debugging purposes. Note that the |
| dump is not necessarily a full-fidelity translation of the segment. In particular, not all metadata is included, and |
| complex metric values may not be complete. |
| |
| To run the tool, point it at a segment directory and provide a file for writing output: |
| |
| ``` |
| java -classpath "/my/druid/lib/*" org.apache.druid.cli.Main tools dump-segment \ |
| --directory /home/druid/path/to/segment/ \ |
| --out /home/druid/output.txt |
| ``` |
| |
| ### Output format |
| |
| #### Data dumps |
| |
| By default, or with `--dump rows`, this tool dumps the rows of the segment as newline-separated JSON objects, with one |
| object per line, using the default serialization for each column. All columns are included unless you limit the dump |
| to specific columns with `--column name`, which can be given more than once. |
| |
| For example, one line might look like this when pretty-printed: |
| |
| ``` |
| { |
| "__time": 1442018818771, |
| "added": 36, |
| "channel": "#en.wikipedia", |
| "cityName": null, |
| "comment": "added project", |
| "count": 1, |
| "countryIsoCode": null, |
| "countryName": null, |
| "deleted": 0, |
| "delta": 36, |
| "isAnonymous": "false", |
| "isMinor": "false", |
| "isNew": "false", |
| "isRobot": "false", |
| "isUnpatrolled": "false", |
| "iuser": "00001553", |
| "metroCode": null, |
| "namespace": "Talk", |
| "page": "Talk:Oswald Tilghman", |
| "regionIsoCode": null, |
| "regionName": null, |
| "user": "GELongstreet" |
| } |
| ``` |
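
As a sketch of limiting a row dump to a few columns, the helper below repeats `--column` once per column name and leaves out `--out` so that rows go to stdout. The function name `dump_columns` is hypothetical, the classpath is the placeholder used in the command above, and the column names in the usage comment are taken from the sample row.

```shell
# Hypothetical helper: dump only the named columns of a segment to stdout.
# Usage: dump_columns <segment-dir> <column> [<column> ...]
dump_columns() {
  dir="$1"
  shift
  # Turn each remaining argument into a "--column <name>" pair.
  n=$#
  while [ "$n" -gt 0 ]; do
    set -- "$@" --column "$1"
    shift
    n=$((n - 1))
  done
  # No --out flag is passed, so the rows are written to stdout.
  java -classpath "/my/druid/lib/*" org.apache.druid.cli.Main tools dump-segment \
    --directory "$dir" \
    "$@"
}

# Example call (assumes the segment has these columns):
# dump_columns /home/druid/path/to/segment/ channel page
```

Because each row is one JSON object on its own line, the output can be piped into line-oriented tools, for example `wc -l` to count rows.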
| |
| #### Metadata dumps |
| |
| With `--dump metadata`, this tool dumps metadata instead of rows. Metadata dumps generated by this tool are in the same |
| format as returned by the [SegmentMetadata query](../querying/segmentmetadataquery.html). |
| |
| #### Bitmap dumps |
| |
| With `--dump bitmaps`, this tool dumps bitmap indexes instead of rows. Bitmap dumps generated by this tool include |
| dictionary-encoded string columns only. The output contains a field `bitmapSerdeFactory` describing the type of bitmaps |
| used in the segment, and a field `bitmaps` containing the bitmaps for each value of each column. These are base64 |
| encoded by default, but you can also dump them as lists of row numbers with `--decompress-bitmaps`. |
| |
| Normally all columns are included, but if you like, you can limit the dump to specific columns with `--column name`. |
| |
| Sample output: |
| |
| ``` |
| { |
| "bitmapSerdeFactory": { |
| "type": "concise" |
| }, |
| "bitmaps": { |
| "isRobot": { |
| "false": "//aExfu+Nv3X...", |
| "true": "gAl7OoRByQ..." |
| } |
| } |
| } |
| ``` |
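
To inspect which rows each value of one column occurs in, the bitmap options can be combined as in the sketch below. The function name `dump_column_bitmaps` and the output path in the usage comment are hypothetical, and the classpath is the placeholder from the first command on this page.

```shell
# Hypothetical helper: dump one column's bitmaps as lists of row numbers.
# Usage: dump_column_bitmaps <segment-dir> <column> <output-file>
dump_column_bitmaps() {
  java -classpath "/my/druid/lib/*" org.apache.druid.cli.Main tools dump-segment \
    --directory "$1" \
    --dump bitmaps \
    --column "$2" \
    --decompress-bitmaps \
    --out "$3"
}

# Example call (the column must be a dictionary-encoded string column):
# dump_column_bitmaps /home/druid/path/to/segment/ isRobot /home/druid/bitmaps.json
```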
| |
| ### Command line arguments |
| |
| |argument|description|required?| |
| |--------|-----------|---------| |
| |--directory file|Directory containing segment data. This could be generated by unzipping an "index.zip" from deep storage.|yes| |
| |--out file|File to write to, or omit to write to stdout.|no| |
| |--dump TYPE|Type of dump to produce: 'rows' (default), 'metadata', or 'bitmaps'.|no| |
| |--column columnName|Column to include. Specify multiple times for multiple columns, or omit to include all columns.|no| |
| |--filter json|JSON-encoded [query filter](../querying/filters.html). Omit to include all rows. Only used if dumping rows.|no| |
| |--time-iso8601|Write the `__time` column as an ISO 8601 string rather than a long. Only used if dumping rows.|no| |
| |--decompress-bitmaps|Dump bitmaps as arrays rather than base64-encoded compressed bitmaps. Only used if dumping bitmaps.|no| |
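
The row-only options can be combined, as in this sketch that keeps only rows for one channel and writes `__time` in ISO 8601 form. The function name `dump_channel_rows` is hypothetical, the `channel` column comes from the sample row earlier on this page, and the filter is assumed to be a `selector` filter as described in the query filter documentation.

```shell
# Hypothetical helper: dump rows whose "channel" column equals a given value,
# with __time written in ISO 8601 form.
# Usage: dump_channel_rows <segment-dir> <channel>
dump_channel_rows() {
  # Build a selector filter. The value is pasted into the JSON as-is, so it
  # must not contain double quotes or backslashes.
  filter=$(printf '{"type":"selector","dimension":"channel","value":"%s"}' "$2")
  java -classpath "/my/druid/lib/*" org.apache.druid.cli.Main tools dump-segment \
    --directory "$1" \
    --dump rows \
    --filter "$filter" \
    --time-iso8601
}

# Example call:
# dump_channel_rows /home/druid/path/to/segment/ '#en.wikipedia'
```

Quoting the filter as a single shell argument keeps the JSON intact when it reaches the tool.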