docs/source/cpp/tables.rst - arrow - Git at Google

 .. Licensed to the Apache Software Foundation (ASF) under one
 .. or more contributor license agreements.  See the NOTICE file
 .. distributed with this work for additional information
 .. regarding copyright ownership.  The ASF licenses this file
 .. to you under the Apache License, Version 2.0 (the
 .. "License"); you may not use this file except in compliance
 .. with the License.  You may obtain a copy of the License at

 ..   http://www.apache.org/licenses/LICENSE-2.0

 .. Unless required by applicable law or agreed to in writing,
 .. software distributed under the License is distributed on an
 .. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 .. KIND, either express or implied.  See the License for the
 .. specific language governing permissions and limitations
 .. under the License.

 .. default-domain:: cpp
 .. highlight:: cpp

 ============
 Tabular Data
 ============

 .. seealso::
    :doc:`Table and RecordBatch API reference <api/table>`.

 While arrays and chunked arrays represent a one-dimensional sequence of
 homogeneous values, data often comes in the form of two-dimensional sets of
 heterogeneous data (such as database tables, CSV files...).  Arrow provides
 several abstractions to handle such data conveniently and efficiently.

 Fields
 ======

 Fields are used to denote the particular columns of a table (and also
 the particular members of a nested data type such as :class:`arrow::StructType`).
 A field, i.e. an instance of :class:`arrow::Field`, holds together a data
 type, a field name and some optional metadata.

 The recommended way to create a field is to call the :func:`arrow::field`
 factory function.

 Schemas
 =======

 A schema describes the overall structure of a two-dimensional dataset such
 as a table.  It holds a sequence of fields together with some optional
 schema-wide metadata (in addition to per-field metadata).  The recommended
 way to create a schema is to call one the :func:`arrow::schema` factory
 function overloads::

    // Create a schema describing datasets with two columns:
    // a int32 column "A" and a utf8-encoded string column "B"
    std::shared_ptr<arrow::Field> field_a, field_b;
    std::shared_ptr<arrow::Schema> schema;

    field_a = arrow::field("A", arrow::int32());
    field_b = arrow::field("B", arrow::utf8());
    schema = arrow::schema({field_a, field_b});

 Tables
 ======

 A :class:`arrow::Table` is a two-dimensional dataset with chunked arrays for
 columns, together with a schema providing field names.  Also, each chunked
 column must have the same logical length in number of elements (although each
 column can be chunked in a different way).

 Record Batches
 ==============

 A :class:`arrow::RecordBatch` is a two-dimensional dataset of a number of
 contiguous arrays, each the same length.  Like a table, a record batch also
 has a schema which must match its arrays' datatypes.

 Record batches are a convenient unit of work for various serialization
 and computation functions, possibly incremental.

 A table can be streamed as an arbitrary number of record batches using
 a :class:`arrow::TableBatchReader`.  Conversely, a logical sequence of
 record batches can be assembled to form a table using one of the
 :func:`arrow::Table::FromRecordBatches` factory function overloads.
	.. Licensed to the Apache Software Foundation (ASF) under one
	.. or more contributor license agreements. See the NOTICE file
	.. distributed with this work for additional information
	.. regarding copyright ownership. The ASF licenses this file
	.. to you under the Apache License, Version 2.0 (the
	.. "License"); you may not use this file except in compliance
	.. with the License. You may obtain a copy of the License at

	.. http://www.apache.org/licenses/LICENSE-2.0

	.. Unless required by applicable law or agreed to in writing,
	.. software distributed under the License is distributed on an
	.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
	.. KIND, either express or implied. See the License for the
	.. specific language governing permissions and limitations
	.. under the License.

	.. default-domain:: cpp
	.. highlight:: cpp

	============
	Tabular Data
	============

	.. seealso::
	:doc:`Table and RecordBatch API reference <api/table>`.

	While arrays and chunked arrays represent a one-dimensional sequence of
	homogeneous values, data often comes in the form of two-dimensional sets of
	heterogeneous data (such as database tables, CSV files...). Arrow provides
	several abstractions to handle such data conveniently and efficiently.

	Fields
	======

	Fields are used to denote the particular columns of a table (and also
	the particular members of a nested data type such as :class:`arrow::StructType`).
	A field, i.e. an instance of :class:`arrow::Field`, holds together a data
	type, a field name and some optional metadata.

	The recommended way to create a field is to call the :func:`arrow::field`
	factory function.

	Schemas
	=======

	A schema describes the overall structure of a two-dimensional dataset such
	as a table. It holds a sequence of fields together with some optional
	schema-wide metadata (in addition to per-field metadata). The recommended
	way to create a schema is to call one the :func:`arrow::schema` factory
	function overloads::

	// Create a schema describing datasets with two columns:
	// a int32 column "A" and a utf8-encoded string column "B"
	std::shared_ptr<arrow::Field> field_a, field_b;
	std::shared_ptr<arrow::Schema> schema;

	field_a = arrow::field("A", arrow::int32());
	field_b = arrow::field("B", arrow::utf8());
	schema = arrow::schema({field_a, field_b});

	Tables
	======

	A :class:`arrow::Table` is a two-dimensional dataset with chunked arrays for
	columns, together with a schema providing field names. Also, each chunked
	column must have the same logical length in number of elements (although each
	column can be chunked in a different way).

	Record Batches
	==============

	A :class:`arrow::RecordBatch` is a two-dimensional dataset of a number of
	contiguous arrays, each the same length. Like a table, a record batch also
	has a schema which must match its arrays' datatypes.

	Record batches are a convenient unit of work for various serialization
	and computation functions, possibly incremental.

	A table can be streamed as an arbitrary number of record batches using
	a :class:`arrow::TableBatchReader`. Conversely, a logical sequence of
	record batches can be assembled to form a table using one of the
	:func:`arrow::Table::FromRecordBatches` factory function overloads.