Apache TsFile

TsFile Document

Abstract

TsFile is a columnar storage file format designed for time series data. It supports efficient compression, high read and write throughput, and compatibility with various frameworks, such as Spark and Flink. It is easy to integrate TsFile into IoT big data processing frameworks.

Click for More Information

Motivation

Time series data is becoming increasingly important in a wide range of applications, including IoT, intelligent control, finance, log analysis, and monitoring systems.

TsFile is the first standard file format for time series data. Without such a standard, companies usually write time series data in their own ad hoc formats or in general-purpose columnar file formats, which complicates data collection and processing. With TsFile, organizations can write data as TsFiles on end devices or gateways, then transfer the TsFiles to the cloud for unified management in IoTDB and other systems. This reduces both network transmission and computing resource consumption in the cloud.

TsFile is a specially designed file format rather than a database. Users can open, write, read, and close a TsFile as easily as they would a normal file. TsFile also provides additional interfaces beyond these basic operations.

TsFile offers several distinctive features and benefits:

  • Efficient Storage and Compression: TsFile employs advanced compression techniques to minimize storage requirements, resulting in reduced disk space consumption and improved system efficiency.

  • Flexible Schema and Metadata Management: TsFile allows data to be written directly without predefining the schema, which keeps data acquisition flexible.

  • High Query Performance over Time Ranges: TsFile indexes devices, sensors, and the time dimension to accelerate queries, enabling fast filtering and retrieval of time series data.

  • Seamless Integration: TsFile is designed to integrate seamlessly with existing time series databases, such as IoTDB, and with data processing frameworks, such as Spark and Flink.
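
To make the time range indexing idea above more concrete, here is a minimal Python sketch of how an index over devices, sensors, and time can skip data that a query does not need. It is only an illustration of the general technique and does not reflect the actual TsFile layout or API: the names `Chunk`, `ToySeriesIndex`, `write`, `query`, and `chunks_scanned` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[int, float]  # (timestamp, value)


@dataclass
class Chunk:
    """Hypothetical unit of storage: points of one series, sorted by time."""
    points: List[Point]

    @property
    def start_time(self) -> int:
        return self.points[0][0]

    @property
    def end_time(self) -> int:
        return self.points[-1][0]


class ToySeriesIndex:
    """Toy index keyed by (device, sensor); NOT the real TsFile structure."""

    def __init__(self) -> None:
        self._chunks: Dict[Tuple[str, str], List[Chunk]] = {}
        self.chunks_scanned = 0  # counts chunks actually read by queries

    def write(self, device: str, sensor: str, points: List[Point]) -> None:
        if not points:
            raise ValueError("a chunk needs at least one point")
        # No schema is declared up front: a series exists once it is written.
        self._chunks.setdefault((device, sensor), []).append(Chunk(sorted(points)))

    def query(self, device: str, sensor: str, start: int, end: int) -> List[Point]:
        result: List[Point] = []
        for chunk in self._chunks.get((device, sensor), []):
            # Skip chunks whose time span does not overlap [start, end].
            if chunk.end_time < start or chunk.start_time > end:
                continue
            self.chunks_scanned += 1
            result.extend(p for p in chunk.points if start <= p[0] <= end)
        return result
```

In this sketch, a query first narrows the search to one device and sensor, then uses each chunk's start and end time to decide whether the chunk needs to be read at all. Only overlapping chunks are scanned point by point.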