| --- |
| id: thrift |
| title: "Thrift" |
| --- |
| |
| <!-- |
| ~ Licensed to the Apache Software Foundation (ASF) under one |
| ~ or more contributor license agreements. See the NOTICE file |
| ~ distributed with this work for additional information |
| ~ regarding copyright ownership. The ASF licenses this file |
| ~ to you under the Apache License, Version 2.0 (the |
| ~ "License"); you may not use this file except in compliance |
| ~ with the License. You may obtain a copy of the License at |
| ~ |
| ~ http://www.apache.org/licenses/LICENSE-2.0 |
| ~ |
| ~ Unless required by applicable law or agreed to in writing, |
| ~ software distributed under the License is distributed on an |
| ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| ~ KIND, either express or implied. See the License for the |
| ~ specific language governing permissions and limitations |
| ~ under the License. |
| --> |
| |
| |
| To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `druid-thrift-extensions` in the extensions load list. |
| |
| This extension enables Druid to ingest Thrift compact data both online (as a `ByteBuffer`) and offline (as a SequenceFile of type `<Writable, BytesWritable>` or as an LzoThriftBlock file). |
| |
| If you want to use a different version of Thrift, change the dependency in the `pom.xml` and compile the extension yourself. |
| |
| ## LZO Support |
| |
| If you plan to read LZO-compressed Thrift files, you will need to download version 0.4.19 of the [hadoop-lzo JAR](https://mvnrepository.com/artifact/com.hadoop.gplcompression/hadoop-lzo/0.4.19) and place it in your `extensions/druid-thrift-extensions` directory. |
| |
| ## Thrift Parser |
| |
| |
| | Field | Type | Description | Required | |
| | ----------- | ----------- | ---------------------------------------- | -------- | |
| | type | String | This should say `thrift` | yes | |
| | parseSpec | JSON Object | Specifies the timestamp and dimensions of the data. Should be a JSON parseSpec. | yes | |
| thriftJar | String | Path to the Thrift JAR. If not provided, the parser looks for the Thrift class on the classpath. For batch ingestion, upload the Thrift JAR to HDFS first and set `"tmpjars":"/path/to/your/thrift.jar"` in `jobProperties`. | no |
| thriftClass | String | Fully qualified class name of the Thrift class. | yes |
| |
| - Batch ingestion example: `inputFormat` and `tmpjars` should be set. |
| |
| This example is for batch ingestion using the HadoopDruidIndexer. The `inputFormat` of `inputSpec` in `ioConfig` can be either `org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat` or `com.twitter.elephantbird.mapreduce.input.LzoThriftBlockInputFormat`. Note that when `LzoThriftBlockInputFormat` is used, the Thrift class must be provided twice: once as `thriftClass` in the parser and once as `elephantbird.class.for.MultiInputFormat` in `jobProperties`. |
| |
| ```json |
| { |
|   "type": "index_hadoop", |
|   "spec": { |
|     "dataSchema": { |
|       "dataSource": "book", |
|       "parser": { |
|         "type": "thrift", |
|         "thriftJar": "book.jar", |
|         "thriftClass": "org.apache.druid.data.input.thrift.Book", |
|         "protocol": "compact", |
|         "parseSpec": { |
|           "format": "json", |
|           ... |
|         } |
|       }, |
|       "metricsSpec": [], |
|       "granularitySpec": {} |
|     }, |
|     "ioConfig": { |
|       "type": "hadoop", |
|       "inputSpec": { |
|         "type": "static", |
|         "inputFormat": "org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat", |
|         // "inputFormat": "com.twitter.elephantbird.mapreduce.input.LzoThriftBlockInputFormat", |
|         "paths": "/user/to/some/book.seq" |
|       } |
|     }, |
|     "tuningConfig": { |
|       "type": "hadoop", |
|       "jobProperties": { |
|         "tmpjars":"/user/h_user_profile/du00/druid/test/book.jar", |
|         // "elephantbird.class.for.MultiInputFormat" : "${YOUR_THRIFT_CLASS_NAME}" |
|       } |
|     } |
|   } |
| } |
| ``` |
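| |
| As a rough sketch of the LZO case, the fragment below shows how the `ioConfig` and `tuningConfig` sections of the spec above might look with the commented-out lines enabled. The input path and the `tmpjars` path are placeholders, and the `elephantbird.class.for.MultiInputFormat` value assumes the same `Book` class that the parser names in `thriftClass`; adjust all three for your own data. |
| |
| ```json |
| { |
|   "ioConfig": { |
|     "type": "hadoop", |
|     "inputSpec": { |
|       "type": "static", |
|       "inputFormat": "com.twitter.elephantbird.mapreduce.input.LzoThriftBlockInputFormat", |
|       "paths": "/path/to/your/book.lzo" |
|     } |
|   }, |
|   "tuningConfig": { |
|     "type": "hadoop", |
|     "jobProperties": { |
|       "tmpjars": "/path/to/your/thrift.jar", |
|       "elephantbird.class.for.MultiInputFormat": "org.apache.druid.data.input.thrift.Book" |
|     } |
|   } |
| } |
| ``` |
| |
| The rest of the spec, including the `parser` block with its own `thriftClass` entry, stays as in the SequenceFile example. |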