Add this to your Cargo.toml:
[dependencies] parquet = "^4"
and this to your crate root:
extern crate parquet;
Example usage of reading data:
use std::fs::File; use std::path::Path; use parquet::file::reader::{FileReader, SerializedFileReader}; let file = File::open(&Path::new("/path/to/file")).unwrap(); let reader = SerializedFileReader::new(file).unwrap(); let mut iter = reader.get_row_iter(None).unwrap(); while let Some(record) = iter.next() { println!("{}", record); }
See crate documentation on available API.
If you are upgrading from version 3.0 or previous of this crate, you likely need to change your code to use [ConvertedType
] rather than [LogicalType
] to preserve existing behaviour in your code.
Version 2.4.0 of the Parquet format introduced a LogicalType
to replace the existing ConvertedType
. This crate used parquet::basic::LogicalType
to map to the ConvertedType
, but this has been renamed to parquet::basic::ConvertedType
from version 4.0 of this crate.
The ConvertedType
is deprecated in the format, but is still written to preserve backward compatibility. It is preferred that LogicalType
is used, as it supports nanosecond precision timestamps without using the deprecated Int96
Parquet type.
To update Parquet format to a newer version, check if parquet-format version is available. Then simply update version of parquet-format
crate in Cargo.toml.
Parquet requires LLVM. Our windows CI image includes LLVM but to build the libraries locally windows users will have to install LLVM. Follow this link for info.
Run cargo build
or cargo build --release
to build in release mode. Some features take advantage of SSE4.2 instructions, which can be enabled by adding RUSTFLAGS="-C target-feature=+sse4.2"
before the cargo build
command.
Run cargo test
for unit tests. To also run tests related to the binaries, use cargo test --features cli
.
The following binaries are provided (use cargo install --features cli
to install them):
parquet-schema for printing Parquet file schema and metadata. Usage: parquet-schema <file-path>
, where file-path
is the path to a Parquet file. Use -v/--verbose
flag to print full metadata or schema only (when not specified only schema will be printed).
parquet-read for reading records from a Parquet file. Usage: parquet-read <file-path> [num-records]
, where file-path
is the path to a Parquet file, and num-records
is the number of records to read from a file (when not specified all records will be printed). Use -j/--json
to print records in JSON lines format.
parquet-rowcount for reporting the number of records in one or more Parquet files. Usage: parquet-rowcount <file-paths>...
, where <file-paths>...
is a space separated list of one or more files to read.
If you see Library not loaded
error, please make sure LD_LIBRARY_PATH
is set properly:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$(rustc --print sysroot)/lib
Run cargo bench
for benchmarks.
To build documentation, run cargo doc --no-deps
. To compile and view in the browser, run cargo doc --no-deps --open
.
Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0.