Package index

Read datasets

Open multi-file datasets as Arrow Dataset objects.

  • open_dataset() : Open a multi-file dataset
  • open_delim_dataset() open_csv_dataset() open_tsv_dataset() : Open a multi-file dataset of CSV or other delimiter-separated format
  • csv_read_options() : CSV Reading Options
  • csv_parse_options() : CSV Parsing Options
  • csv_convert_options() : CSV Convert Options
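
A minimal sketch of the round trip: write a partitioned copy of mtcars to a temporary directory so open_dataset() has something to read, then query the Dataset lazily with dplyr verbs. The partition column is just an example.

    library(arrow)
    library(dplyr)

    path <- tempfile()
    write_dataset(mtcars, path, partitioning = "cyl")

    ds <- open_dataset(path)        # format defaults to "parquet"
    ds %>%
      filter(cyl == 6) %>%
      select(mpg, cyl) %>%
      collect()                     # pull the filtered result into R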

Write datasets

Write multi-file datasets to disk.

  • write_dataset() : Write a dataset
  • write_delim_dataset() write_csv_dataset() write_tsv_dataset() : Write a dataset into partitioned flat files
  • csv_write_options() : CSV Writing Options
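
For example, partition columns become Hive-style directories on disk (the partition columns here are illustrative):

    library(arrow)

    path <- tempfile()
    write_dataset(mtcars, path, format = "parquet",
                  partitioning = c("cyl", "gear"))
    list.files(path, recursive = TRUE)   # e.g. cyl=4/gear=3/part-0.parquet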

Read files

Read files in a variety of formats into tibbles or Arrow Tables.

  • read_delim_arrow() read_csv_arrow() read_csv2_arrow() read_tsv_arrow() : Read a CSV or other delimited file with Arrow
  • read_parquet() : Read a Parquet file
  • read_feather() read_ipc_file() : Read a Feather file (an Arrow IPC file)
  • read_ipc_stream() : Read Arrow IPC stream format
  • read_json_arrow() : Read a JSON file
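
For example, round-tripping a data frame through Parquet; as_data_frame = FALSE returns an Arrow Table instead of a tibble:

    library(arrow)

    tf <- tempfile(fileext = ".parquet")
    write_parquet(mtcars, tf)

    df  <- read_parquet(tf)                        # a tibble by default
    tbl <- read_parquet(tf, as_data_frame = FALSE) # an Arrow Table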

Write files

Write to files in a variety of formats.

  • write_csv_arrow() : Write CSV file to disk
  • write_parquet() : Write Parquet file to disk
  • write_feather() write_ipc_file() : Write a Feather file (an Arrow IPC file)
  • write_ipc_stream() : Write Arrow IPC stream format
  • write_to_raw() : Write Arrow data to a raw vector
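
A few illustrative calls:

    library(arrow)

    write_feather(mtcars, tempfile(fileext = ".feather"))
    write_csv_arrow(mtcars, tempfile(fileext = ".csv"))

    # Serialize to a raw vector in the IPC stream format
    raw_ipc <- write_to_raw(mtcars, format = "stream")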

Creating Arrow data containers

Classes and functions for creating Arrow data containers.

  • scalar() : Create an Arrow Scalar
  • arrow_array() : Create an Arrow Array
  • chunked_array() : Create a Chunked Array
  • record_batch() : Create a RecordBatch
  • arrow_table() : Create an Arrow Table
  • buffer() : Create a Buffer
  • vctrs_extension_array() vctrs_extension_type() : Extension type for generic typed vectors
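
Each constructor in brief:

    library(arrow)

    scalar(42L)                             # Scalar
    arrow_array(c("x", "y", NA))            # Array (with a null)
    chunked_array(1:3, 4:6)                 # ChunkedArray with two chunks
    record_batch(i = 1:3, s = letters[1:3]) # RecordBatch
    arrow_table(i = 1:3, s = letters[1:3])  # Table
    buffer(as.raw(c(1, 2, 3)))              # Buffer wrapping raw bytes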

Working with Arrow data containers

Functions for converting R objects to Arrow data containers and combining Arrow data containers.

  • as_arrow_array() : Convert an object to an Arrow Array
  • as_chunked_array() : Convert an object to an Arrow ChunkedArray
  • as_record_batch() : Convert an object to an Arrow RecordBatch
  • as_arrow_table() : Convert an object to an Arrow Table
  • concat_arrays() c(<Array>) : Concatenate zero or more Arrays
  • concat_tables() : Concatenate one or more Tables
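
For example:

    library(arrow)

    t1 <- as_arrow_table(mtcars[1:16, ])
    t2 <- as_arrow_table(mtcars[17:32, ])
    concat_tables(t1, t2)                          # a 32-row Table

    concat_arrays(as_arrow_array(1:3), as_arrow_array(4:6))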

Arrow data types

  • int8() int16() int32() int64() uint8() uint16() uint32() uint64() float16() halffloat() float32() float() float64() boolean() bool() utf8() large_utf8() binary() large_binary() fixed_size_binary() string() date32() date64() time32() time64() duration() null() timestamp() decimal() decimal32() decimal64() decimal128() decimal256() struct() list_of() large_list_of() fixed_size_list_of() map_of() : Create Arrow data types
  • dictionary() : Create a dictionary type
  • new_extension_type() new_extension_array() register_extension_type() reregister_extension_type() unregister_extension_type() : Extension types
  • vctrs_extension_array() vctrs_extension_type() : Extension type for generic typed vectors
  • as_data_type() : Convert an object to an Arrow DataType
  • infer_type() type() : Infer the Arrow Array type from an R object
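
A few illustrative type constructors, plus a type inferred from an R object:

    library(arrow)

    timestamp("ms", timezone = "UTC")   # parameterized scalar type
    list_of(float64())                  # nested type
    dictionary(int32(), utf8())         # dictionary (factor-like) type
    infer_type(Sys.Date())              # date32, inferred from an R vector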

Fields and schemas

  • field() : Create a Field
  • schema() : Create a schema or extract one from an object
  • unify_schemas() : Combine and harmonize schemas
  • as_schema() : Convert an object to an Arrow Schema
  • infer_schema() : Extract a schema from an object
  • read_schema() : Read a Schema from a stream
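
For example, unify_schemas() merges overlapping schemas into one:

    library(arrow)

    s1 <- schema(a = int32(), b = utf8())
    s2 <- schema(field("b", utf8()), field("c", float64()))
    unify_schemas(s1, s2)        # fields a, b, c
    infer_schema(mtcars)         # schema inferred from a data frame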

Computation

Functionality for computing values on Arrow data objects.

  • acero arrow-functions arrow-verbs arrow-dplyr : Functions available in Arrow dplyr queries
  • call_function() : Call an Arrow compute function
  • match_arrow() is_in() : Value matching for Arrow objects
  • value_counts() : table() for Arrow objects
  • list_compute_functions() : List available Arrow C++ compute functions
  • register_scalar_function() : Register user-defined functions
  • show_exec_plan() : Show the details of an Arrow Execution Plan
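
A short sketch of the compute interface (the values are illustrative):

    library(arrow)

    a <- arrow_array(c(1L, 2L, 2L, 3L, NA))
    call_function("unique", a)              # invoke a compute kernel by name
    value_counts(a)                         # like base::table()
    match_arrow(arrow_array(c(2L, 9L)), a)  # positions, like base::match()
    head(list_compute_functions("^add"))    # discover available kernels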

DuckDB

Pass data to and from DuckDB.

  • to_arrow() : Create an Arrow object from a DuckDB connection
  • to_duckdb() : Create a (virtual) DuckDB table from an Arrow object
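
A sketch of handing a dplyr pipeline off to DuckDB and back; this assumes the duckdb and dplyr packages are installed:

    library(arrow)
    library(dplyr)

    arrow_table(mtcars) %>%
      to_duckdb() %>%            # zero-copy handoff to a virtual DuckDB table
      group_by(cyl) %>%
      summarise(mpg = mean(mpg, na.rm = TRUE)) %>%
      to_arrow() %>%             # hand the result back to Arrow
      collect()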

File systems

Functions for working with files on S3 and GCS.

  • s3_bucket() : Connect to an AWS S3 bucket
  • gs_bucket() : Connect to a Google Cloud Storage (GCS) bucket
  • copy_files() : Copy files between FileSystems
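
For example, connecting anonymously to a public bucket; the bucket name and path are illustrative, and this requires an S3-enabled libarrow build:

    library(arrow)

    bucket <- s3_bucket("voltrondata-labs-datasets", anonymous = TRUE)
    ds <- open_dataset(bucket$path("nyc-taxi/year=2019/month=6"))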

Flight

  • load_flight_server() : Load a Python Flight server
  • flight_connect() : Connect to a Flight server
  • flight_disconnect() : Explicitly close a Flight client
  • flight_get() : Get data from a Flight server
  • flight_put() : Send data to a Flight server
  • list_flights() flight_path_exists() : See available resources on a Flight server
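
A sketch of a Flight session; Flight support relies on pyarrow via reticulate, and the port and path here are illustrative:

    library(arrow)

    client <- flight_connect(port = 8089)
    flight_put(client, iris, path = "iris_demo")   # upload a data frame
    list_flights(client)                           # see what the server holds
    tbl <- flight_get(client, "iris_demo")         # fetch it back
    flight_disconnect(client)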

Arrow Configuration

  • arrow_info() arrow_available() arrow_with_acero() arrow_with_dataset() arrow_with_substrait() arrow_with_parquet() arrow_with_s3() arrow_with_gcs() arrow_with_json() : Report information on the package's capabilities
  • cpu_count() set_cpu_count() : Manage the global CPU thread pool in libarrow
  • io_thread_count() set_io_thread_count() : Manage the global I/O thread pool in libarrow
  • install_arrow() : Install or upgrade the Arrow library
  • install_pyarrow() : Install pyarrow for use with reticulate
  • create_package_with_all_dependencies() : Create a source bundle that includes all third-party dependencies
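
For example:

    library(arrow)

    arrow_info()            # build features, codecs, runtime settings
    arrow_with_s3()         # TRUE if libarrow was built with S3 support
    cpu_count()
    set_cpu_count(4)        # e.g. cap the compute thread pool at 4 threads
    io_thread_count()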

Input/Output

  • InputStream RandomAccessFile MemoryMappedFile ReadableFile BufferReader : InputStream classes
  • read_message() : Read a Message from a stream
  • mmap_open() : Open a memory mapped file
  • mmap_create() : Create a new read/write memory mapped file of a given size
  • OutputStream FileOutputStream BufferOutputStream : OutputStream classes
  • Message : Message class
  • MessageReader : MessageReader class
  • compression CompressedOutputStream CompressedInputStream : Compressed stream classes
  • Codec : Compression Codec class
  • codec_is_available() : Check whether a compression codec is available
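
For example, checking codec support and memory-mapping a Feather file:

    library(arrow)

    codec_is_available("gzip")   # check build support before using a codec

    tf <- tempfile()
    write_feather(mtcars, tf)
    f <- mmap_open(tf)           # memory-map the file
    tbl <- read_feather(f)       # read from the resulting InputStream
    f$close()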

File reader/writer interface

  • ParquetFileReader : ParquetFileReader class
  • ParquetReaderProperties : ParquetReaderProperties class
  • ParquetArrowReaderProperties : ParquetArrowReaderProperties class
  • ParquetFileWriter : ParquetFileWriter class
  • ParquetWriterProperties : ParquetWriterProperties class
  • FeatherReader : FeatherReader class
  • CsvTableReader JsonTableReader : Arrow CSV and JSON table reader classes
  • CsvReadOptions CsvWriteOptions CsvParseOptions TimestampParser CsvConvertOptions JsonReadOptions JsonParseOptions : File reader options
  • RecordBatchReader RecordBatchStreamReader RecordBatchFileReader : RecordBatchReader classes
  • RecordBatchWriter RecordBatchStreamWriter RecordBatchFileWriter : RecordBatchWriter classes
  • as_record_batch_reader() : Convert an object to an Arrow RecordBatchReader
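
A sketch of the lower-level Parquet reader and the RecordBatchReader interface:

    library(arrow)

    tf <- tempfile(fileext = ".parquet")
    write_parquet(mtcars, tf)

    reader <- ParquetFileReader$create(tf)
    reader$GetSchema()
    tbl <- reader$ReadTable()        # read the whole file as a Table

    # Stream any tabular object batch by batch
    rbr <- as_record_batch_reader(arrow_table(mtcars))
    batch <- rbr$read_next_batch()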

Low-level C++ wrappers

Low-level R6 class representations of Arrow C++ objects intended for advanced users.

  • Buffer : Buffer class
  • Scalar : Arrow scalars
  • Array DictionaryArray StructArray ListArray LargeListArray FixedSizeListArray MapArray : Array Classes
  • ChunkedArray : ChunkedArray class
  • RecordBatch : RecordBatch class
  • Schema : Schema class
  • Field : Field class
  • Table : Table class
  • DataType : DataType class
  • ArrayData : ArrayData class
  • DictionaryType : DictionaryType class
  • FixedWidthType : FixedWidthType class
  • ExtensionType : ExtensionType class
  • ExtensionArray : ExtensionArray class
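
These R6 objects are usually created indirectly, but they can be used directly; for example:

    library(arrow)

    a <- Array$create(c(1L, NA, 3L))
    a$type                  # int32
    a$null_count            # 1
    a$Slice(1, 2)           # zero-copy slice (0-based offset)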

Dataset and Filesystem R6 classes and helper functions

R6 classes and helper functions useful when working with multi-file datasets in Arrow.

  • Dataset FileSystemDataset UnionDataset InMemoryDataset DatasetFactory FileSystemDatasetFactory : Multi-file datasets
  • dataset_factory() : Create a DatasetFactory
  • Partitioning DirectoryPartitioning HivePartitioning DirectoryPartitioningFactory HivePartitioningFactory : Define Partitioning for a Dataset
  • Expression : Arrow expressions
  • Scanner ScannerBuilder : Scan the contents of a dataset
  • FileFormat ParquetFileFormat IpcFileFormat : Dataset file formats
  • CsvFileFormat : CSV dataset file format
  • JsonFileFormat : JSON dataset file format
  • FileWriteOptions : Format-specific write options
  • FragmentScanOptions CsvFragmentScanOptions ParquetFragmentScanOptions JsonFragmentScanOptions : Format-specific scan options
  • hive_partition() : Construct Hive partitioning
  • map_batches() : Apply a function to a stream of RecordBatches
  • FileSystem LocalFileSystem S3FileSystem GcsFileSystem SubTreeFileSystem : FileSystem classes
  • FileInfo : FileSystem entry info
  • FileSelector : File selector
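
A sketch of the scan machinery that open_dataset() builds on; the path and partition column are illustrative:

    library(arrow)

    path <- tempfile()
    write_dataset(mtcars, path, partitioning = "cyl")

    ds <- open_dataset(path, partitioning = hive_partition(cyl = int32()))
    scanner <- Scanner$create(ds)   # lower-level alternative to dplyr verbs
    tbl <- scanner$ToTable()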