<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
# tsfile-cli — TsFile Command-Line Tool
`tsfile-cli` is a single, pipe-friendly C++ command-line tool for inspecting **and**
importing Apache TsFile (`.tsfile`) files from the shell — the TsFile analogue of
`parquet-cli` / `pqrs`. Read commands print data to **stdout** and diagnostics to
**stderr**, so they compose with `awk`, `jq`, `sort`, and friends; the `write` command
imports CSV/TSV into a new `.tsfile`. It is built on the public `storage::TsFileReader`
and `storage::TsFileTableWriter` APIs and does not modify the storage engine.
## Building from source
The CLI is part of the C++ module and is built by default (CMake option `BUILD_TOOLS=ON`).
The CMake target is `tsfile_cli`; the produced executable is named `tsfile-cli`.
**Prerequisites:** a C++11 compiler (GCC / Clang / MSVC) and CMake ≥ 3.11. The third-party
dependencies (ANTLR4, Snappy, LZ4, LZOKAY, Zlib, GoogleTest) are bundled under
`cpp/third_party/` and built automatically — no separate install step needed.
Choose any one of the following.
**1. Build script (recommended).** From `cpp/`:
```bash
bash build.sh -t=Debug # -> cpp/build/Debug/bin/tsfile-cli
bash build.sh # Release (default) -> cpp/build/Release/bin/tsfile-cli
bash build.sh install # Release build, then run make install
```
**2. Maven (builds the whole C++ module).** From the repository root:
```bash
./mvnw clean package -P with-cpp # -> cpp/target/build/bin/tsfile-cli
```
**3. Plain CMake.** From `cpp/`:
```bash
mkdir -p build/Debug && cd build/Debug
cmake ../.. -DCMAKE_BUILD_TYPE=Debug
make -j tsfile_cli # -> build/Debug/bin/tsfile-cli
```
> **CMake 4.x note.** The bundled ANTLR4 runtime sets old CMake policies that CMake 4
> rejects (`Policy CMP00xx may not be set to OLD behavior`). The reader and CLI do not use
> ANTLR4, so disable it — `--disable-antlr4` for the build script, or `-DENABLE_ANTLR4=OFF`
> for plain CMake:
>
> ```bash
> bash build.sh -t=Debug --disable-antlr4
> ```
Verify the binary:
```bash
./build/Debug/bin/tsfile-cli --version # -> tsfile-cli (Apache TsFile C++) <version>
./build/Debug/bin/tsfile-cli --help
```
The executable links the `tsfile` shared library built alongside it. You can run it in
place by its full path, or install it explicitly with `bash build.sh install`,
`cmake --install .`, or `make install`, which places the binary under `<prefix>/bin` and
`libtsfile` under `<prefix>/lib`. The build script does not install by default.
## Usage
```
tsfile-cli <command> [options] <file.tsfile>
tsfile-cli --help | --version | help
```
Exit codes: `0` success, `1` usage/argument error, `2` file open/corrupt,
`3` query/runtime error.
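
Scripts that wrap the tool can branch on these exit codes. A minimal sketch, assuming a hypothetical helper named `describe_exit` (not part of `tsfile-cli`) that maps a status to the categories listed above:

```bash
# describe_exit: translate a tsfile-cli exit status into its documented category.
# The helper name is an assumption made for this example.
describe_exit() {
  case "$1" in
    0) echo "success" ;;
    1) echo "usage/argument error" ;;
    2) echo "file open/corrupt" ;;
    3) echo "query/runtime error" ;;
    *) echo "unknown exit code $1" ;;
  esac
}

# Typical use in a script (commented out so the sketch runs without the binary):
#   tsfile-cli meta data.tsfile > meta.txt
#   status=$?
#   [ "$status" -eq 0 ] || echo "meta failed: $(describe_exit "$status")" >&2
describe_exit 2   # prints: file open/corrupt
```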
### Reading
| Command | Description |
|---|---|
| `ls` | List devices (tree model) or tables (table model), one name per line |
| `schema` | Per-series `target, measurement, datatype, encoding, compression` |
| `meta` | File summary: model, device/table/series counts, time range, file size |
| `stats` | Per-series `count, start_time, end_time, min, max, first, last, sum` |
| `count` | Per-series row counts plus a `total` row (from statistics, no page scan) |
| `head` | First N rows (default 10; use `-n`) |
| `cat` | All matching rows, streamed (`table` format buffers to align columns) |
| `sample` | Reproducible reservoir sample (default 10; `-n`, `--seed`) |
The metadata commands (`ls` / `schema` / `meta` / `stats` / `count`) answer most questions
without decoding data pages.
Shared options:
| Option | Meaning |
|---|---|
| `-f, --format csv\|tsv\|json\|table` | Output format; defaults to `table` on a TTY, `tsv` when piped |
| `-d, --device <id>` / `-t, --table <name>` | Scope to one device / table (mutually exclusive) |
| `-m, --measurements a,b,c` | Column projection (`schema`, `stats`, `count`, `head`, `cat`, `sample`) |
| `-n, --limit N` / `--offset N` | Max rows (`head`, `cat`, `sample`) / rows to skip (`head`, `cat`; `--offset` is not valid for `sample`) |
| `--start <ms>` / `--end <ms>` | Inclusive epoch-millisecond time range (`head`, `cat`, `sample`) |
| `--seed N` | Reproducible sampling seed (`sample` only) |
| `--tag-filter C OP V` / `--tag-between C L U` / `--tag-not-between C L U` | Table TAG predicate for `head`, `cat`, `sample`; `OP` is `eq`, `neq`, `lt`, `lteq`, `gt`, `gteq`, `regexp`, or `not-regexp` |
| `--no-header` | Omit the header row |
| `--model tree\|table` | Force the model (otherwise auto-detected) |
`json` output is NDJSON (one object per line; numbers/booleans bare, other values quoted,
nulls as `null`; non-finite floats — NaN/Inf — become `null`). CSV output follows RFC 4180.
Timestamps are raw epoch milliseconds. The `table` format buffers all rows in memory to
align columns, so prefer `csv`/`tsv`/`json` when dumping large files.
```bash
BIN=cpp/build/Debug/bin/tsfile-cli
$BIN ls -f tsv data.tsfile # list tables / devices
$BIN meta data.tsfile # quick file overview
$BIN count -t table1 -f tsv data.tsfile # row counts, no page scan
$BIN cat -t table1 --tag-filter device eq dev_1 -m temp -f tsv data.tsfile
$BIN cat -m temp,humidity --start 1700000000000 -f csv data.tsfile | head
$BIN sample -m temp -n 20 --seed 42 -f json data.tsfile | jq .
```
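`--start` and `--end` take epoch milliseconds, while most shell tools (for example `date +%s`) produce epoch seconds. A small sketch of the conversion, using a hypothetical helper `to_ms`; the resulting flags can be passed to `head`, `cat`, or `sample`:

```bash
# to_ms: convert epoch seconds to epoch milliseconds.
# The helper name is an assumption made for this example.
to_ms() {
  echo "$(( $1 * 1000 ))"
}

start_ms=$(to_ms 1700000000)               # 2023-11-14T22:13:20Z
# Both bounds are inclusive, so a one-hour window ends one millisecond early.
end_ms=$(( start_ms + 3600 * 1000 - 1 ))
echo "--start $start_ms --end $end_ms"     # prints: --start 1700000000000 --end 1700003599999
```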
### Writing (import)
`tsfile-cli write` imports CSV/TSV rows into a **new table-model** `.tsfile` (an existing
output file is overwritten). The first input column is the timestamp (epoch milliseconds); the remaining
columns are declared explicitly with `--columns` — there is no type inference.
Timestamps must be **strictly increasing per device**, where a device is identified by its
`tag` column values (rows that share the same tags form one device's timeline). Rows for
different tag combinations may freely interleave and reuse timestamps. Out-of-order input is
rejected with the offending line number, and a failed import leaves no output file behind.
`--output` must differ from the input file.
```
tsfile-cli write --table <name> --columns <spec> -o <out.tsfile> \
[-f csv|tsv] [--no-header] [--header-match] [-v] [<input> | -]
```
`--columns` is a comma-separated list of `name:TYPE:category`, where `category` (case-insensitive)
is `tag` or `field` and `TYPE` (case-insensitive) is one of `BOOLEAN, INT32, INT64, FLOAT, DOUBLE,
STRING, TEXT, TIMESTAMP, DATE, BLOB` — for example `--columns "id1:STRING:tag,s1:INT64:field"`.
`DATE` cells are written as `YYYY-MM-DD`; `TIMESTAMP` cells as epoch milliseconds. Each column is
stored with the engine's default encoding and compression for its type.
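
The `--columns` grammar above can be checked in a script before a long import starts. The sketch below is an illustration only: `check_columns` is a hypothetical helper, it checks just the three-part shape, the type names, and the category, and the CLI's own validation remains authoritative.

```bash
# check_columns: validate a spec of the form name:TYPE:category[,name:TYPE:category...].
# Returns 0 when every entry is well-formed, 1 otherwise (problems go to stderr).
check_columns() {
  printf '%s\n' "$1" | tr ',' '\n' | awk -F: '
    BEGIN {
      bad = 0
      n = split("BOOLEAN INT32 INT64 FLOAT DOUBLE STRING TEXT TIMESTAMP DATE BLOB", t, " ")
      for (i = 1; i <= n; i++) types[t[i]] = 1
    }
    # Each entry needs exactly three colon-separated parts and a non-empty name.
    NF != 3 || $1 == "" { printf "bad entry: %s\n", $0; bad = 1; next }
    # TYPE and category are matched case-insensitively.
    !(toupper($2) in types) { printf "unknown type in: %s\n", $0; bad = 1 }
    tolower($3) != "tag" && tolower($3) != "field" { printf "unknown category in: %s\n", $0; bad = 1 }
    END { exit bad }
  ' >&2
}

check_columns "id1:STRING:tag,s1:INT64:field" && echo "spec ok"   # prints: spec ok
```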
| Option | Meaning |
|---|---|
| `--table <name>` | Output table name (lower-cased) |
| `--columns <spec>` | Ordered data columns (excludes the leading timestamp column) |
| `-o, --output <path>` | Output `.tsfile` (required; overwritten) |
| `<input>` / `-` | Input file, or `-` / omitted for stdin |
| `-f csv\|tsv` | Input delimiter (default csv; `json` / `table` are rejected) |
| `--no-header` | Input has no header row (default: first line is a header and is skipped) |
| `--header-match` | Validate header names against `--columns` |
| `-v, --verbose` | Print `wrote N rows to <out>` to stderr (otherwise silent on success) |
An empty cell is written as null. The command is silent on success (Unix-style); pass `-v`
for a one-line summary.
```bash
# round-trip through a pipe
printf 'time,id1,s1\n0,dev,0\n1,dev,10\n' \
| tsfile-cli write --table t1 --columns "id1:STRING:tag,s1:INT64:field" -o out.tsfile -
tsfile-cli count -f tsv out.tsfile # -> t1.dev s1 2
```
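The per-device ordering rule described earlier in this section can be checked before an import with standard tools. A rough sketch, assuming a hypothetical helper `check_order` that takes the 1-based positions of the tag columns in the input, reads headered CSV on stdin, and sees no quoted fields containing commas (it uses a plain comma split, not a full RFC 4180 parser):

```bash
# check_order: verify that timestamps (column 1) are strictly increasing
# within each combination of tag values. Usage: check_order 2   or   check_order 2,3
check_order() {
  awk -F, -v cols="$1" '
    BEGIN { n = split(cols, c, ","); bad = 0 }
    NR == 1 { next }                         # skip the header row
    {
      # Build a key from the tag columns; rows sharing a key form one timeline.
      key = ""
      for (i = 1; i <= n; i++) key = key SUBSEP $(c[i])
      if ((key in last) && ($1 + 0) <= (last[key] + 0)) {
        printf "line %d: timestamp %s is not greater than %s\n", NR, $1, last[key]
        bad = 1
        exit
      }
      last[key] = $1
    }
    END { exit bad }
  ' >&2
}

# Different tag values may interleave and reuse timestamps, so this input passes.
printf 'time,id1,s1\n0,a,1\n0,b,1\n1,a,2\n' | check_order 2 && echo "order ok"   # prints: order ok
```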
For tree-model writes, JSON input, or programmatic use, use the C++ SDK directly — see
`cpp/examples/cpp_examples/demo_write.cpp` (`TsFileTableWriter` / `TsFileWriter` + `Tablet`).
## Using the skill with an AI assistant
`cpp/tools/skills/tsfile-cli/SKILL.md` is a machine-readable reference that teaches AI
coding assistants (e.g. Claude Code) how to drive `tsfile-cli` correctly. Such assistants
auto-discover skills from a `.claude/skills/` directory at session start, so "installing"
the skill just means placing it there — either project-level or user-level:
```bash
# project-level (this repository only)
mkdir -p .claude/skills/tsfile-cli
cp cpp/tools/skills/tsfile-cli/SKILL.md .claude/skills/tsfile-cli/SKILL.md
# or user-level (available in all your projects)
mkdir -p ~/.claude/skills/tsfile-cli
cp cpp/tools/skills/tsfile-cli/SKILL.md ~/.claude/skills/tsfile-cli/SKILL.md
```
> The installed `SKILL.md` must begin with its YAML front-matter (`--- … ---`) for the
> assistant to detect it. The in-repo copy carries an Apache license header comment above
> the front-matter; if discovery fails, delete that leading `<!-- … -->` block from the
> installed copy so `---` is the first line.
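
If you need to strip the license comment, one way is to keep everything from the first `---` line onward. A minimal sketch, assuming a hypothetical helper `strip_to_front_matter` and that no line before the front-matter consists solely of `---`; the demo file contents are made up:

```bash
# strip_to_front_matter: print a file starting at its first line that is exactly `---`.
strip_to_front_matter() {
  awk 'found || /^---[ \t]*$/ { found = 1; print }' "$1"
}

# Demonstration on a throwaway file that mimics the in-repo layout.
tmp=$(mktemp)
printf '%s\n' '<!--' 'license text' '-->' '---' 'name: example' '---' '# Body' > "$tmp"
strip_to_front_matter "$tmp" | head -n 1   # prints: ---
rm -f "$tmp"

# To apply it to a real install (commented out; paths as used above):
#   strip_to_front_matter cpp/tools/skills/tsfile-cli/SKILL.md > .claude/skills/tsfile-cli/SKILL.md
```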
Start a new assistant session afterward. The skill then activates automatically when you
ask to inspect or import a `.tsfile`; you can also invoke it explicitly (e.g. "use the
tsfile-cli skill").
## Source layout
```text
cpp/tools/
├── tools_main.cc # main(): forwards argv to run_cli
├── cli/ # argument parsing, top-level dispatch, exit codes
├── format/ # csv/tsv/json/table output + CSV/TSV input parsing
├── commands/ # one file per command + shared row-query / statistics helpers
└── skills/tsfile-cli/ # model-facing skill reference (for AI assistants)
```