blob: eea8d152d45dfee87ca56d054cdc1ec741bc2f7b [file] [log] [blame] [view]
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
# Spark Tsfile connector
## aim of design
* Use Spark SQL to read the data of the specified Tsfile and return it to the client in the form of a Spark DataFrame
* Generate Tsfile with data from Spark Dataframe
## Supported formats
Wide table structure: Tsfile native format, IOTDB native path format
| time | root.ln.wf02.wt02.temperature | root.ln.wf02.wt02.status | root.ln.wf02.wt02.hardware | root.ln.wf01.wt01.temperature | root.ln.wf01.wt01.status | root.ln.wf01.wt01.hardware |
|------|-------------------------------|--------------------------|----------------------------|-------------------------------|--------------------------|----------------------------|
| 1 | null | true | null | 2.2 | true | null |
| 2 | null | false | aaa | 2.2 | null | null |
| 3 | null | null | null | 2.1 | true | null |
| 4 | null | true | bbb | null | null | null |
| 5 | null | null | null | null | false | null |
| 6 | null | null | ccc | null | null | null |
Narrow table structure: Relational database schema, IOTDB align by device format
| time | device_name | status | hardware | temperature |
|------|-------------------------------|--------------------------|----------------------------|-------------------------------|
| 1 | root.ln.wf02.wt01 | true | null | 2.2 |
| 1 | root.ln.wf02.wt02 | true | null | null |
| 2 | root.ln.wf02.wt01 | null | null | 2.2 |
| 2 | root.ln.wf02.wt02 | false | aaa | null |
| 3 | root.ln.wf02.wt01 | true | null | 2.1 |
| 4 | root.ln.wf02.wt02 | true | bbb | null |
| 5 | root.ln.wf02.wt01 | false | null | null |
| 6 | root.ln.wf02.wt02 | null | ccc | null |
## Query process steps
#### 1. Table structure inference and generation
This step is to make the table structure of the DataFrame match the table structure of the Tsfile to be queried.
The main logic is inferSchema function in src / main / scala / org / apache / iotdb / spark / tsfile / DefaultSource.scala
#### 2. SQL parsing
The purpose of this step is to transform user SQL statements into Tsfile native query expressions.
The main logic is the buildReader function in src / main / scala / org / apache / iotdb / spark / tsfile / DefaultSource.scala. SQL parsing wide table structure and narrow table structure
#### 3. Wide table structure
The main logic of the SQL analysis of the wide table structure is in src / main / scala / org / apache / iotdb / spark / tsfile / WideConverter.scala. This structure is basically the same as the Tsfile native query structure. No special processing is required, and the SQL statement is directly converted into Corresponding query expression
#### 4. Narrow table structure
The main logic of the SQL analysis of the wide table structure is src / main / scala / org / apache / iotdb / spark / tsfile / NarrowConverter.scala.
Firstly we use required schema to decide which timeseries we should get from time file
```
requiredSchema.foreach((field: StructField) => {
if (field.name != QueryConstant.RESERVED_TIME
&& field.name != NarrowConverter.DEVICE_NAME) {
measurementNames += field.name
}
})
```
After the SQL is converted to an expression, the narrow table structure is different from the Tsfile native query structure. The expression is converted into a disjunction expression related to the device before it can be converted into a query of Tsfile. The conversion code is in src / main / java / org / apache / iotdb / spark / tsfile / qp
example:
```
select time, device_name, s1 from tsfile_table where time > 1588953600000 and time < 1589040000000 and device_name = 'root.group1.d1'
```
Obviously we only need timeseries 'root.group1.d1.s1' and our expression is [time > 1588953600000] and [time < 1589040000000]
#### 5. Query execution
The actual data query execution is performed by the Tsfile native component, see:
* [Tsfile native query process](../TsFile/Read.md)
## Write step flow
Writing is mainly to convert the data in the Dataframe structure into Tsfile's RowRecord, and write using Tsfile Writer
#### Wide table structure
The main conversion code is in the following two files:
* src/main/scala/org/apache/iotdb/spark/tsfile/WideConverter.scala responsible for structural transformation
* src/main/scala/org/apache/iotdb/spark/tsfile/WideTsFileOutputWriter.scala responsible for matching the spark interface and performing writes, which will call the structure conversion function in the previous file
#### Narrow table structure
The main conversion code is in the following two files:
* src/main/scala/org/apache/iotdb/spark/tsfile/NarrowConverter.scala responsible for structural transformation
* src/main/scala/org/apache/iotdb/spark/tsfile/NarrowTsFileOutputWriter.scala responsible for matching the spark interface and performing writes, which will call the structure conversion function in the previous file