| <!-- |
| |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| |
| --> |
# Apache Hive (TsFile)
| |
| ## 1. About Hive-TsFile-Connector |
| |
The Hive-TsFile-Connector implements Hive support for external data sources of the TsFile type, enabling users to operate on TsFiles through Hive.
| |
With this connector, you can

* load a single TsFile, from either the local file system or HDFS, into Hive
* load all TsFiles in a specific directory, from either the local file system or HDFS, into Hive
* query a TsFile through HQL

Note that hive-connector does not yet support write operations, so `INSERT` statements in HQL are not allowed when operating on TsFiles through Hive.
| |
| ## 2. System Requirements |
| |
|Hadoop Version |Hive Version | Java Version | TsFile Version |
| |------------- |------------ | ------------ |------------ | |
| | `2.7.3` or `3.2.1` | `2.3.6` or `3.1.2` | `1.8` | `1.0.0`| |
| |
| > Note: For more information about how to download and use TsFile, please see the following link: https://github.com/apache/iotdb/tree/master/tsfile. |
| |
| ## 3. Data Type Correspondence |
| |
| | TsFile data type | Hive field type | |
| | ---------------- | --------------- | |
| BOOLEAN          | BOOLEAN         |
| INT32            | INT             |
| INT64            | BIGINT          |
| FLOAT            | FLOAT           |
| DOUBLE           | DOUBLE          |
| TEXT             | STRING          |
| |
| |
| ## 4. Add Dependency For Hive |
| |
To use hive-connector in Hive, we need to add the hive-connector jar to Hive's classpath.
| |
After downloading the source code of IoTDB from <https://github.com/apache/iotdb>, you can build a `hive-connector-X.X.X-jar-with-dependencies.jar` with the command `mvn clean package -pl iotdb-connector/hive-connector -am -Dmaven.test.skip=true -P get-jar-with-dependencies`.
| |
Then, in Hive, use the `add jar XXX` command to add the dependency. For example:
| |
| ``` |
| hive> add jar /Users/hive/iotdb/hive-connector/target/hive-connector-1.0.0-jar-with-dependencies.jar; |
| |
| Added [/Users/hive/iotdb/hive-connector/target/hive-connector-1.0.0-jar-with-dependencies.jar] to class path |
| Added resources: [/Users/hive/iotdb/hive-connector/target/hive-connector-1.0.0-jar-with-dependencies.jar] |
| ``` |
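Note that `add jar` registers the jar for the current session only. If you want the connector to be available in every Hive session, one option is to point Hive's standard `hive.aux.jars.path` property at the jar in `hive-site.xml` (the path below is just the example path used above):

```
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///Users/hive/iotdb/hive-connector/target/hive-connector-1.0.0-jar-with-dependencies.jar</value>
</property>
```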
| |
| |
## 5. Create TsFile-backed Hive tables
| |
To create a TsFile-backed table, specify the `serde` as `org.apache.iotdb.hive.TsFileSerDe`,
the `inputformat` as `org.apache.iotdb.hive.TSFHiveInputFormat`,
and the `outputformat` as `org.apache.iotdb.hive.TSFHiveOutputFormat`.
| |
Also provide a schema for the table that contains only two fields: `time_stamp` and `sensor_id`.
`time_stamp` is the timestamp of the time series,
and `sensor_id` is the name of the sensor to extract from the TsFile into Hive, such as `sensor_1`.
The table name can be any valid Hive table name.
| |
In addition, a location must be provided for hive-connector to pull the most current data for the table.
| |
The location should be a specific directory on your local file system or, if you have Hadoop set up, on HDFS.
If it is on your local file system, the location should look like `file:///data/data/sequence/root.baic2.WWS.leftfrontdoor/`; on HDFS, it should look like `hdfs://<namenode-host>:<port>/data/data/sequence/root.baic2.WWS.leftfrontdoor/`.
| |
Finally, set `device_id` in `TBLPROPERTIES` to the name of the device you want to analyze.
| |
| For example: |
| |
| ``` |
| CREATE EXTERNAL TABLE IF NOT EXISTS only_sensor_1( |
| time_stamp TIMESTAMP, |
| sensor_1 BIGINT) |
| ROW FORMAT SERDE 'org.apache.iotdb.hive.TsFileSerDe' |
| STORED AS |
| INPUTFORMAT 'org.apache.iotdb.hive.TSFHiveInputFormat' |
| OUTPUTFORMAT 'org.apache.iotdb.hive.TSFHiveOutputFormat' |
| LOCATION '/data/data/sequence/root.baic2.WWS.leftfrontdoor/' |
| TBLPROPERTIES ('device_id'='root.baic2.WWS.leftfrontdoor.plc1'); |
| ``` |
In this example, the data of `root.baic2.WWS.leftfrontdoor.plc1.sensor_1` is pulled from the directory `/data/data/sequence/root.baic2.WWS.leftfrontdoor/`.
Describing this table gives the following result:
| |
| ``` |
| hive> describe only_sensor_1; |
| OK |
| time_stamp timestamp from deserializer |
| sensor_1 bigint from deserializer |
| Time taken: 0.053 seconds, Fetched: 2 row(s) |
| ``` |
At this point, the TsFile-backed table can be worked with in Hive like any other table.
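For instance, because the table is declared `EXTERNAL`, dropping it removes only the table metadata from Hive, while the TsFiles in the `LOCATION` directory are left untouched (standard Hive behavior for external tables, not specific to the connector):

```
-- removes the Hive metadata only; the underlying TsFiles remain on disk
DROP TABLE IF EXISTS only_sensor_1;
```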
| |
| ## 6. Query from TsFile-backed Hive tables |
| |
Before running any queries, we should set `hive.input.format` in Hive by executing the following command.
| |
| ``` |
| hive> set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; |
| ``` |
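To confirm that the setting took effect, run `set` with just the property name and the Hive CLI will print its current value:

```
hive> set hive.input.format;
hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat
```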
| |
We now have an external table named `only_sensor_1` in Hive,
and we can analyze it with any HQL query operations.
| |
| For example: |
| |
| ### 6.1 Select Clause Example |
| |
| ``` |
| hive> select * from only_sensor_1 limit 10; |
| OK |
| 1 1000000 |
| 2 1000001 |
| 3 1000002 |
| 4 1000003 |
| 5 1000004 |
| 6 1000005 |
| 7 1000006 |
| 8 1000007 |
| 9 1000008 |
| 10 1000009 |
| Time taken: 1.464 seconds, Fetched: 10 row(s) |
| ``` |
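Predicates work as on any other Hive table. For instance, the following query (a sketch assuming the sample data above) filters on the sensor value:

```
hive> select * from only_sensor_1 where sensor_1 >= 1000005 limit 5;
```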
| |
| ### 6.2 Aggregate Clause Example |
| |
| ``` |
| hive> select count(*) from only_sensor_1; |
| WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. |
| Query ID = jackietien_20191016202416_d1e3e233-d367-4453-b39a-2aac9327a3b6 |
| Total jobs = 1 |
| Launching Job 1 out of 1 |
| Number of reduce tasks determined at compile time: 1 |
| In order to change the average load for a reducer (in bytes): |
| set hive.exec.reducers.bytes.per.reducer=<number> |
| In order to limit the maximum number of reducers: |
| set hive.exec.reducers.max=<number> |
| In order to set a constant number of reducers: |
| set mapreduce.job.reduces=<number> |
| Job running in-process (local Hadoop) |
| 2019-10-16 20:24:18,305 Stage-1 map = 0%, reduce = 0% |
| 2019-10-16 20:24:27,443 Stage-1 map = 100%, reduce = 100% |
| Ended Job = job_local867757288_0002 |
| MapReduce Jobs Launched: |
| Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS |
| Total MapReduce CPU Time Spent: 0 msec |
| OK |
| 1000000 |
| Time taken: 11.334 seconds, Fetched: 1 row(s) |
| ``` |
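Other aggregate functions can be used in the same way; for example, a sketch computing the minimum and maximum values of the sensor column:

```
hive> select min(sensor_1), max(sensor_1) from only_sensor_1;
```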
| |