Spark IoTDB connector

Design goal

  • Use Spark SQL to read IoTDB data and return it to the client as a Spark DataFrame
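
As a usage sketch (assuming a spark-shell session with the connector on the classpath and an IoTDB server running locally; the URL, credentials, and SQL below are placeholders to adjust for your deployment):

```scala
// Hedged sketch: read IoTDB data into a Spark DataFrame through this connector.
// Assumes `spark` is the SparkSession provided by spark-shell and that an
// IoTDB server is reachable at the given URL.
val df = spark.read
  .format("org.apache.iotdb.spark.db")
  .option("url", "jdbc:iotdb://127.0.0.1:6667/")
  .option("user", "root")
  .option("password", "root")
  .option("sql", "select * from root.ln.**")
  .load()

df.printSchema()
df.show()
```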

main idea

Because IoTDB can parse and execute SQL itself, the connector simply forwards the SQL statement to the IoTDB process for execution and then converts the result set into an RDD.

Implementation process

1. Entry point

  • src/main/scala/org/apache/iotdb/spark/db/DefaultSource.scala

2. Building Relation

The Relation mainly holds the RDD's meta-information, such as column names and the partitioning strategy. Calling the Relation's buildScan method creates the RDD.

  • src/main/scala/org/apache/iotdb/spark/db/IoTDBRelation.scala
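
As a simplified, Spark-free analog of the Relation's role (the types below are toy stand-ins, not Spark's actual BaseRelation API):

```scala
// Toy analog of a Spark Relation: it holds schema metadata, and
// buildScan() produces the rows. All names here are illustrative only.
final case class ToyRow(values: Seq[Any])

final case class ToyRelation(columnNames: Seq[String], data: Seq[ToyRow]) {
  // In the real connector, buildScan constructs an IoTDBRDD that forwards
  // the SQL to IoTDB; here we simply expose the stored rows.
  def buildScan(): Iterator[ToyRow] = data.iterator
}
```

The real IoTDBRelation additionally carries the IoTDB connection parameters so the RDD it builds can open its own session on each executor.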

3. Building RDD

The RDD sends the SQL query to IoTDB and holds the result cursor

  • The compute method in src/main/scala/org/apache/iotdb/spark/db/IoTDBRDD.scala

4. Iterating the RDD

Because of Spark's lazy evaluation, the RDD is actually iterated only when the user traverses it; each iteration step fetches the next result from IoTDB

  • The getNext method in src/main/scala/org/apache/iotdb/spark/db/IoTDBRDD.scala
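
The lazy fetch pattern can be sketched without Spark as follows (the plain iterator here stands in for the IoTDB session's result cursor; the class name is illustrative):

```scala
// Hedged sketch: rows are pulled from the "cursor" only when the caller
// iterates, mirroring how Spark's lazy evaluation defers the IoTDB fetch
// until the RDD is actually traversed.
class LazyCursorIterator[T](cursor: Iterator[T]) extends Iterator[T] {
  var fetched = 0 // counts how many rows were actually pulled so far

  override def hasNext: Boolean = cursor.hasNext

  // Analogous to IoTDBRDD's getNext: fetch one row on demand.
  override def next(): T = {
    fetched += 1
    cursor.next()
  }
}
```

Constructing the iterator fetches nothing; rows are only pulled as the consumer advances it.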

Wide and narrow table structure conversion

Wide table structure: IoTDB native path format

| time | root.ln.wf02.wt02.temperature | root.ln.wf02.wt02.status | root.ln.wf02.wt02.hardware | root.ln.wf01.wt01.temperature | root.ln.wf01.wt01.status | root.ln.wf01.wt01.hardware |
|------|------|------|------|------|------|------|
| 1 | null | true | null | 2.2 | true | null |
| 2 | null | false | aaa | 2.2 | null | null |
| 3 | null | null | null | 2.1 | true | null |
| 4 | null | true | bbb | null | null | null |
| 5 | null | null | null | null | false | null |
| 6 | null | null | ccc | null | null | null |

Narrow table structure: relational database style schema, the IoTDB align by device format

| time | device_name | status | hardware | temperature |
|------|-------------|--------|----------|-------------|
| 1 | root.ln.wf02.wt01 | true | null | 2.2 |
| 1 | root.ln.wf02.wt02 | true | null | null |
| 2 | root.ln.wf02.wt01 | null | null | 2.2 |
| 2 | root.ln.wf02.wt02 | false | aaa | null |
| 3 | root.ln.wf02.wt01 | true | null | 2.1 |
| 4 | root.ln.wf02.wt02 | true | bbb | null |
| 5 | root.ln.wf02.wt01 | false | null | null |
| 6 | root.ln.wf02.wt02 | null | ccc | null |

Because IoTDB query results default to the wide table structure, a wide-to-narrow table conversion is needed. There are two ways to implement it:

1. Use the IoTDB align by device clause

This returns the narrow table structure directly; the computation is done by IoTDB
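
As a hedged sketch of this push-down approach (assuming the same spark-shell setup as above; whether the clause is accepted verbatim depends on the IoTDB version):

```scala
// Hedged sketch: ask IoTDB itself for the narrow table by including the
// "align by device" clause in the SQL passed to the connector.
val narrowDf = spark.read
  .format("org.apache.iotdb.spark.db")
  .option("url", "jdbc:iotdb://127.0.0.1:6667/")
  .option("sql", "select * from root.ln.** align by device")
  .load()
```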

2. Use Transformer

The Transformer converts between the wide and narrow table structures; the computation is done by Spark.

  • src/main/scala/org/apache/iotdb/spark/db/Transformer.scala
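
The wide-to-narrow direction can be sketched without Spark as follows (row shapes and names are illustrative; the real Transformer operates on DataFrames):

```scala
// Hedged sketch of wide-to-narrow: split each full column path into a
// device prefix and a measurement name, then emit one narrow row per
// (time, device). A wide row is modeled as time -> Map(fullPath -> value).
object WideToNarrow {
  // "root.ln.wf02.wt02.status" -> device "root.ln.wf02.wt02"
  def device(column: String): String =
    column.substring(0, column.lastIndexOf('.'))

  // "root.ln.wf02.wt02.status" -> measurement "status"
  def measurement(column: String): String =
    column.substring(column.lastIndexOf('.') + 1)

  // One narrow row per (time, device), keyed by measurement name.
  def convert(wide: Seq[(Long, Map[String, Any])]): Seq[(Long, String, Map[String, Any])] =
    for {
      (time, cols) <- wide
      (dev, devCols) <- cols.groupBy { case (c, _) => device(c) }.toSeq.sortBy(_._1)
    } yield (time, dev, devCols.map { case (c, v) => measurement(c) -> v })
}
```

Each device's columns are handled independently, which is why this direction parallelizes without a shuffle.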

The wide-to-narrow conversion traverses the device list and generates the corresponding narrow rows per device, so it parallelizes well and requires no shuffle. The narrow-to-wide conversion uses a timestamp-based join, which may introduce a shuffle and therefore potential performance issues.
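
The narrow-to-wide direction can be sketched the same way: group the narrow rows by timestamp and re-expand each measurement under its full device path (the grouping step is what corresponds to a shuffle boundary in Spark; names are illustrative):

```scala
// Hedged sketch of narrow-to-wide: rows sharing a timestamp are merged
// back into one wide row keyed by "<device>.<measurement>".
object NarrowToWide {
  def convert(narrow: Seq[(Long, String, Map[String, Any])]): Seq[(Long, Map[String, Any])] =
    narrow
      .groupBy { case (time, _, _) => time } // in Spark, this is the shuffle
      .toSeq
      .sortBy(_._1)
      .map { case (time, rows) =>
        val wideCols = rows.flatMap { case (_, dev, measurements) =>
          measurements.map { case (m, v) => s"$dev.$m" -> v }
        }.toMap
        (time, wideCols)
      }
}
```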