 .. Licensed to the Apache Software Foundation (ASF) under one
 .. or more contributor license agreements.  See the NOTICE file
 .. distributed with this work for additional information
 .. regarding copyright ownership.  The ASF licenses this file
 .. to you under the Apache License, Version 2.0 (the
 .. "License"); you may not use this file except in compliance
 .. with the License.  You may obtain a copy of the License at

 ..   http://www.apache.org/licenses/LICENSE-2.0

 .. Unless required by applicable law or agreed to in writing,
 .. software distributed under the License is distributed on an
 .. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 .. KIND, either express or implied.  See the License for the
 .. specific language governing permissions and limitations
 .. under the License.


 .. _arrow-pycapsule-interface:

 =============================
 The Arrow PyCapsule Interface
 =============================

 Rationale
 =========

 The :ref:`C data interface <c-data-interface>`, :ref:`C stream interface <c-stream-interface>`
 and :ref:`C device interface <c-device-data-interface>` allow moving Arrow data between
 different implementations of Arrow. However, these interfaces don't specify how
 Python libraries should expose these structs to other libraries. Before this interface existed,
 many libraries simply provided export to PyArrow data structures, using the
 ``_import_from_c`` and ``_export_to_c`` methods. However, this always required
 PyArrow to be installed. In addition, those APIs could cause memory leaks if
 handled improperly.

 This interface allows any library to export Arrow data structures to other
 libraries that understand the same protocol.

 Goals
 -----

 * Standardize the `PyCapsule`_ objects that represent ``ArrowSchema``, ``ArrowArray``,
   ``ArrowArrayStream``, ``ArrowDeviceArray`` and ``ArrowDeviceArrayStream``.
 * Define standard methods that export Arrow data into such capsule objects,
   so that any Python library wanting to accept Arrow data as input can call the
   corresponding method instead of hardcoding support for specific Arrow
   producers.


 Non-goals
 ---------

 * Standardize what public APIs should be used for import. This is left up to
   individual libraries.

 PyCapsule Standard
 ==================

 When exporting Arrow data through Python, the C Data Interface / C Stream Interface
 structures should be wrapped in capsules. Capsules avoid invalid access by
 attaching a name to the pointer and avoid memory leaks by attaching a destructor.
 Thus, they are much safer than passing pointers as integers.

 `PyCapsule`_ allows for a ``name`` to be associated with the capsule, allowing
 consumers to verify that the capsule contains the expected kind of data. To make sure
 Arrow structures are recognized, the following names must be used:

 .. list-table::
    :widths: 25 25
    :header-rows: 1

    * - C Interface Type
      - PyCapsule Name
    * - ArrowSchema
      - ``arrow_schema``
    * - ArrowArray
      - ``arrow_array``
    * - ArrowArrayStream
      - ``arrow_array_stream``
    * - ArrowDeviceArray
      - ``arrow_device_array``
    * - ArrowDeviceArrayStream
      - ``arrow_device_array_stream``
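
 A consumer can check a capsule's name before touching the pointer inside it.
 The sketch below is a hypothetical pure-Python illustration that reaches the
 CPython capsule API through ``ctypes.pythonapi``. The helper name
 ``is_arrow_capsule`` and the dummy payload are assumptions made for this
 example; a real producer would store a pointer to an actual ``ArrowSchema``
 struct and attach a destructor.

 .. code-block:: python

     import ctypes

     # Bind the CPython capsule functions (CPython only).
     _api = ctypes.pythonapi
     _api.PyCapsule_New.restype = ctypes.py_object
     _api.PyCapsule_New.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]
     _api.PyCapsule_IsValid.restype = ctypes.c_int
     _api.PyCapsule_IsValid.argtypes = [ctypes.py_object, ctypes.c_char_p]

     # PyCapsule_New stores the name pointer without copying it, so the bytes
     # object must stay alive for as long as the capsule does.
     SCHEMA_NAME = b"arrow_schema"

     def is_arrow_capsule(obj, name: bytes) -> bool:
         """Return True if ``obj`` is a PyCapsule carrying the given name."""
         return bool(_api.PyCapsule_IsValid(obj, name))

     # Dummy payload standing in for a real ArrowSchema struct (assumption).
     _payload = ctypes.c_int64(0)
     capsule = _api.PyCapsule_New(ctypes.addressof(_payload), SCHEMA_NAME, None)

     print(is_arrow_capsule(capsule, b"arrow_schema"))  # True
     print(is_arrow_capsule(capsule, b"arrow_array"))   # False

 Checking the name first means a capsule of the wrong kind is rejected instead
 of being reinterpreted as the wrong struct.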

 Lifetime Semantics
 ------------------

 The exported PyCapsules should have a destructor that calls the
 :ref:`release callback <c-data-interface-released>`
 of the Arrow struct, if it is not already null. This prevents a memory leak in
 case the capsule was never passed to another consumer.

 If the capsule has been passed to a consumer, the consumer should have moved
 the data and marked the release callback as null, so there isn’t a risk of
 releasing data the consumer is using.
 :ref:`Read more in the C Data Interface specification <c-data-interface-released>`.

 In the case of a device struct, the above-mentioned release callback is the
 ``release`` member of the embedded ``ArrowArray`` structure.
 :ref:`Read more in the C Device Interface specification <c-device-data-interface-semantics>`.

 Just like in the C Data Interface, the PyCapsule objects defined here can only
 be consumed once.

 For an example of a PyCapsule with a destructor, see `Create a PyCapsule`_.
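
 The move semantics described in this section can be emulated in pure Python
 with ``ctypes``. This is only a sketch for illustration: the names
 ``consume``, ``release_if_needed`` and ``release_schema`` are hypothetical,
 and real producers and consumers would do this in C, Cython or a similar
 language. The struct layout mirrors ``struct ArrowSchema`` from the C Data
 Interface.

 .. code-block:: python

     import ctypes

     class ArrowSchema(ctypes.Structure):
         """ctypes mirror of the C ``struct ArrowSchema``."""

     ReleaseFn = ctypes.CFUNCTYPE(None, ctypes.POINTER(ArrowSchema))
     ArrowSchema._fields_ = [
         ("format", ctypes.c_char_p),
         ("name", ctypes.c_char_p),
         ("metadata", ctypes.c_char_p),
         ("flags", ctypes.c_int64),
         ("n_children", ctypes.c_int64),
         ("children", ctypes.POINTER(ctypes.POINTER(ArrowSchema))),
         ("dictionary", ctypes.POINTER(ArrowSchema)),
         ("release", ReleaseFn),
         ("private_data", ctypes.c_void_p),
     ]

     release_calls = []

     @ReleaseFn
     def release_schema(schema_ptr):
         # A release callback frees producer resources, then marks the
         # struct as released by setting ``release`` to null.
         release_calls.append("released")
         schema_ptr.contents.release = ReleaseFn()

     def consume(src: ArrowSchema) -> ArrowSchema:
         """Move ``src`` into a new struct and mark the source as released."""
         dst = ArrowSchema()
         ctypes.memmove(ctypes.addressof(dst), ctypes.addressof(src),
                        ctypes.sizeof(ArrowSchema))
         src.release = ReleaseFn()  # null function pointer
         return dst

     def release_if_needed(schema: ArrowSchema) -> None:
         """The check a capsule destructor performs before freeing the struct."""
         if schema.release:
             schema.release(ctypes.byref(schema))

     exported = ArrowSchema()
     exported.format = b"l"
     exported.release = release_schema

     imported = consume(exported)   # the consumer takes ownership
     release_if_needed(exported)    # capsule destructor: release is null, no-op
     release_if_needed(imported)    # the consumer releases once it is done
     print(release_calls)           # ['released']

 Because the consumer nulls the source's ``release`` member, the data is
 released exactly once, no matter whether the capsule or the consumer's copy
 is cleaned up first.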


 Export Protocol
 ===============

 The interface consists of three separate protocols:

 * ``ArrowSchemaExportable``, which defines the ``__arrow_c_schema__`` method.
 * ``ArrowArrayExportable``, which defines the ``__arrow_c_array__`` method.
 * ``ArrowStreamExportable``, which defines the ``__arrow_c_stream__`` method.

 Two additional protocols are defined for the Device interface:

 * ``ArrowDeviceArrayExportable``, which defines the ``__arrow_c_device_array__`` method.
 * ``ArrowDeviceStreamExportable``, which defines the ``__arrow_c_device_stream__`` method.

 ArrowSchema Export
 ------------------

 Schemas, fields, and data types can implement the method ``__arrow_c_schema__``.

 .. py:method:: __arrow_c_schema__(self)

     Export the object as an ArrowSchema.

     :return: A PyCapsule containing a C ArrowSchema representation of the
         object. The capsule must have a name of ``"arrow_schema"``.


 ArrowArray Export
 -----------------

 Arrays and record batches (contiguous tables) can implement the method
 ``__arrow_c_array__``.

 .. py:method:: __arrow_c_array__(self, requested_schema=None)

     Export the object as a pair of ArrowSchema and ArrowArray structures.

     :param requested_schema: A PyCapsule containing a C ArrowSchema representation
         of a requested schema. Conversion to this schema is best-effort. See
         `Schema Requests`_.
     :type requested_schema: PyCapsule or None

     :return: A pair of PyCapsules containing a C ArrowSchema and ArrowArray,
         respectively. The schema capsule should have the name ``"arrow_schema"``
         and the array capsule should have the name ``"arrow_array"``.

 Libraries supporting the Device interface can implement a ``__arrow_c_device_array__``
 method on those objects, which works the same as ``__arrow_c_array__`` except
 for returning an ArrowDeviceArray structure instead of an ArrowArray structure:

 .. py:method:: __arrow_c_device_array__(self, requested_schema=None, **kwargs)

     Export the object as a pair of ArrowSchema and ArrowDeviceArray structures.

     :param requested_schema: A PyCapsule containing a C ArrowSchema representation
         of a requested schema. Conversion to this schema is best-effort. See
         `Schema Requests`_.
     :type requested_schema: PyCapsule or None
     :param kwargs: Additional keyword arguments should only be accepted if they have
         a default value of ``None``, to allow for future addition of new keywords.
         See :ref:`arrow-pycapsule-interface-device-support` for more details.

     :return: A pair of PyCapsules containing a C ArrowSchema and ArrowDeviceArray,
         respectively. The schema capsule should have the name ``"arrow_schema"``
         and the array capsule should have the name ``"arrow_device_array"``.

 ArrowStream Export
 ------------------

 Tables / DataFrames and streams can implement the method ``__arrow_c_stream__``.

 .. py:method:: __arrow_c_stream__(self, requested_schema=None)

     Export the object as an ArrowArrayStream.

     :param requested_schema: A PyCapsule containing a C ArrowSchema representation
         of a requested schema. Conversion to this schema is best-effort. See
         `Schema Requests`_.
     :type requested_schema: PyCapsule or None

     :return: A PyCapsule containing a C ArrowArrayStream representation of the
         object. The capsule must have a name of ``"arrow_array_stream"``.

 Libraries supporting the Device interface can implement a ``__arrow_c_device_stream__``
 method on those objects, which works the same as ``__arrow_c_stream__`` except
 for returning an ArrowDeviceArrayStream structure instead of an ArrowArrayStream
 structure:

 .. py:method:: __arrow_c_device_stream__(self, requested_schema=None, **kwargs)

     Export the object as an ArrowDeviceArrayStream.

     :param requested_schema: A PyCapsule containing a C ArrowSchema representation
         of a requested schema. Conversion to this schema is best-effort. See
         `Schema Requests`_.
     :type requested_schema: PyCapsule or None
     :param kwargs: Additional keyword arguments should only be accepted if they have
         a default value of ``None``, to allow for future addition of new keywords.
         See :ref:`arrow-pycapsule-interface-device-support` for more details.

     :return: A PyCapsule containing a C ArrowDeviceArrayStream representation of the
         object. The capsule must have a name of ``"arrow_device_array_stream"``.

 Schema Requests
 ---------------

 In some cases, there might be multiple possible Arrow representations of the
 same data. For example, a library might have a single integer type, but Arrow
 has multiple integer types with different sizes and sign. As another example,
 Arrow has several possible encodings for an array of strings: 32-bit offsets,
 64-bit offsets, string view, and dictionary-encoded. A sequence of strings could
 export to any one of these Arrow representations.

 In order to allow the caller to request a specific representation, the
 :meth:`__arrow_c_array__` and :meth:`__arrow_c_stream__` methods take an optional
 ``requested_schema`` parameter. This parameter is a PyCapsule containing an
 ``ArrowSchema``.

 The callee should attempt to provide the data in the requested schema. However,
 if the callee cannot provide the data in the requested schema, they may return
 with the same schema as if ``None`` were passed to ``requested_schema``.

 If the caller requests a schema that is not compatible with the data,
 say requesting a schema with a different number of fields, the callee should
 raise an exception. The requested schema mechanism is only meant to negotiate
 between different representations of the same data and not to allow arbitrary
 schema transformations.
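
 The negotiation rules above can be sketched for a producer that holds string
 data. This is a simplified, hypothetical example: it works on Arrow format
 strings (``"u"`` for utf8, ``"U"`` for large utf8, ``"vu"`` for utf8 view)
 rather than on real ``ArrowSchema`` capsules, it ignores dictionary encoding,
 and the function name ``negotiate_string_format`` is an assumption made for
 this illustration.

 .. code-block:: python

     from typing import Optional, Sequence

     # Arrow C Data Interface format strings for string-like types.
     STRING_FORMATS = {"u": "utf8", "U": "large_utf8", "vu": "utf8_view"}

     def negotiate_string_format(
         requested_format: Optional[str],
         supported: Sequence[str] = ("u", "U"),
         default: str = "u",
     ) -> str:
         """Pick the format a string producer will actually export."""
         if requested_format is None:
             # No request: export the producer's default representation.
             return default
         if requested_format not in STRING_FORMATS:
             # Not another representation of the same data: raise.
             raise ValueError(
                 f"Cannot export string data as Arrow format {requested_format!r}"
             )
         if requested_format in supported:
             return requested_format
         # Compatible but unsupported: best effort, fall back to the default.
         return default

     print(negotiate_string_format(None))   # u
     print(negotiate_string_format("U"))    # U
     print(negotiate_string_format("vu"))   # u

 Since conversion is best-effort, the caller should always inspect the schema
 that was actually returned rather than assume the request was honored.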

 .. _PyCapsule: https://docs.python.org/3/c-api/capsule.html


 .. _arrow-pycapsule-interface-device-support:

 Device Support
 --------------

 The PyCapsule interface supports non-CPU hardware through the
 :ref:`C device interface <c-device-data-interface>`. This makes it possible
 to exchange data that lives on non-CPU devices (e.g. CUDA GPUs) and to inspect
 which device the exchanged data lives on.

 For exchanging the data structures, this interface has two sets of protocol
 methods: the standard CPU-only versions (:meth:`__arrow_c_array__` and
 :meth:`__arrow_c_stream__`) and the equivalent device-aware versions
 (:meth:`__arrow_c_device_array__`, and :meth:`__arrow_c_device_stream__`).

 CPU-only producers may implement either only the standard CPU-only protocol
 methods, or both the CPU-only and device-aware methods. The absence of the
 device-aware methods implies CPU-only data. CPU-only consumers are encouraged
 to be able to consume both versions of the protocol.
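
 One way a CPU-only consumer might support both versions of the protocol is
 sketched below. The function name ``export_for_cpu_consumer`` and the two
 stand-in producer classes are hypothetical, and the placeholder strings take
 the place of real PyCapsules.

 .. code-block:: python

     def export_for_cpu_consumer(obj, requested_schema=None):
         """Return ``(kind, capsules)`` for an object implementing either protocol."""
         if hasattr(obj, "__arrow_c_array__"):
             # The CPU-only method is guaranteed to point at CPU memory.
             return "cpu", obj.__arrow_c_array__(requested_schema)
         if hasattr(obj, "__arrow_c_device_array__"):
             # The caller must still check that the ArrowDeviceArray's
             # device_type is ARROW_DEVICE_CPU before reading any buffers.
             return "device", obj.__arrow_c_device_array__(requested_schema)
         raise TypeError(f"Cannot import {type(obj)} as Arrow array data.")

     # Stand-in producers returning placeholder values instead of capsules.
     class CpuProducer:
         def __arrow_c_array__(self, requested_schema=None):
             return ("schema-capsule", "array-capsule")

     class DeviceProducer:
         def __arrow_c_device_array__(self, requested_schema=None, **kwargs):
             return ("schema-capsule", "device-array-capsule")

     print(export_for_cpu_consumer(CpuProducer())[0])     # cpu
     print(export_for_cpu_consumer(DeviceProducer())[0])  # device

 Preferring the CPU-only method here is a design choice for this sketch; a
 device-aware consumer would more likely try the device-aware method first.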

 For a device-aware producer whose data structures can only reside in
 non-CPU memory, it is recommended to only implement the device version of the
 protocol (e.g. only add ``__arrow_c_device_array__``, and not add ``__arrow_c_array__``).
 Producers that have data structures that can live on either CPU or non-CPU devices
 can implement both versions of the protocol, but the CPU-only versions
 (:meth:`__arrow_c_array__` and :meth:`__arrow_c_stream__`) should be guaranteed
 to contain valid pointers for CPU memory (thus, when trying to export non-CPU data,
 either raise an error or make a copy to CPU memory).

 Producing the ``ArrowDeviceArray`` and ``ArrowDeviceArrayStream`` structures
 is expected to not involve any cross-device copying of data.

 The device-aware methods (:meth:`__arrow_c_device_array__` and :meth:`__arrow_c_device_stream__`)
 should accept additional keyword arguments (``**kwargs``), provided they have a
 default value of ``None``. This allows for future addition of new optional
 keywords, where the default value for such a new keyword will always be ``None``.
 The implementor is responsible for raising a ``NotImplementedError`` for any
 unrecognised keyword argument that the caller passes with a value other than
 ``None``. For example:

 .. code-block:: python

     def __arrow_c_device_array__(self, requested_schema=None, **kwargs):

         non_default_kwargs = [
             name for name, value in kwargs.items() if value is not None
         ]
         if non_default_kwargs:
             raise NotImplementedError(
                 f"Received unsupported keyword argument(s): {non_default_kwargs}"
             )

         ...

 Protocol Typehints
 ------------------

 The following typehints can be copied into your library to annotate that a
 function accepts an object implementing one of these protocols.

 .. code-block:: python

     from typing import Tuple, Protocol

     class ArrowSchemaExportable(Protocol):
         def __arrow_c_schema__(self) -> object: ...

     class ArrowArrayExportable(Protocol):
         def __arrow_c_array__(
             self,
             requested_schema: object | None = None
         ) -> Tuple[object, object]:
             ...

     class ArrowStreamExportable(Protocol):
         def __arrow_c_stream__(
             self,
             requested_schema: object | None = None
         ) -> object:
             ...

     class ArrowDeviceArrayExportable(Protocol):
         def __arrow_c_device_array__(
             self,
             requested_schema: object | None = None,
             **kwargs,
         ) -> Tuple[object, object]:
             ...

     class ArrowDeviceStreamExportable(Protocol):
         def __arrow_c_device_stream__(
             self,
             requested_schema: object | None = None,
             **kwargs,
         ) -> object:
             ...
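
 As a usage sketch, a library function can be annotated with one of these
 protocols and, optionally, check its argument at runtime. The
 ``runtime_checkable`` decorator, the ``import_array`` function and the
 ``FakeArray`` class are assumptions added for this illustration; the
 placeholder strings stand in for real PyCapsules.

 .. code-block:: python

     from typing import Optional, Protocol, Tuple, runtime_checkable

     @runtime_checkable
     class ArrowArrayExportable(Protocol):
         def __arrow_c_array__(
             self,
             requested_schema: Optional[object] = None
         ) -> Tuple[object, object]:
             ...

     def import_array(data: ArrowArrayExportable) -> Tuple[object, object]:
         # isinstance() on a runtime-checkable protocol only verifies that
         # the method exists, not its signature or return type.
         if not isinstance(data, ArrowArrayExportable):
             raise TypeError(f"Cannot import {type(data)} as Arrow array data.")
         return data.__arrow_c_array__()

     class FakeArray:
         def __arrow_c_array__(self, requested_schema=None):
             return ("schema-capsule", "array-capsule")

     print(import_array(FakeArray()))  # ('schema-capsule', 'array-capsule')

 Static type checkers accept ``FakeArray`` here without it inheriting from the
 protocol class, which is what lets unrelated libraries interoperate.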

 Examples
 ========

 Create a PyCapsule
 ------------------


 To create a PyCapsule, use the `PyCapsule_New <https://docs.python.org/3/c-api/capsule.html#c.PyCapsule_New>`_
 function. The function must be passed a destructor function that will be called
 to release the data the capsule points to. It must first call the release
 callback if it is not null, then free the struct.

 Below is the code to create a PyCapsule for an ``ArrowSchema``. The code for
 ``ArrowArray`` and ``ArrowArrayStream`` is similar.

 .. tab-set::

    .. tab-item:: C

       .. code-block:: c

          #include <Python.h>
          #include <stdlib.h>  // malloc, free

          void ReleaseArrowSchemaPyCapsule(PyObject* capsule) {
              struct ArrowSchema* schema =
                  (struct ArrowSchema*)PyCapsule_GetPointer(capsule, "arrow_schema");
              if (schema->release != NULL) {
                  schema->release(schema);
              }
              free(schema);
          }

          PyObject* ExportArrowSchemaPyCapsule() {
              struct ArrowSchema* schema =
                  (struct ArrowSchema*)malloc(sizeof(struct ArrowSchema));
              // Fill in ArrowSchema fields
              // ...
              return PyCapsule_New(schema, "arrow_schema", ReleaseArrowSchemaPyCapsule);
          }

    .. tab-item:: Cython

       .. code-block:: cython

          cimport cpython
          from libc.stdlib cimport malloc, free

          cdef void release_arrow_schema_py_capsule(object schema_capsule):
              cdef ArrowSchema* schema = <ArrowSchema*>cpython.PyCapsule_GetPointer(
                  schema_capsule, 'arrow_schema'
              )
              if schema.release != NULL:
                  schema.release(schema)

              free(schema)

          cdef object export_arrow_schema_py_capsule():
              cdef ArrowSchema* schema = <ArrowSchema*>malloc(sizeof(ArrowSchema))
              # It's recommended to immediately wrap the struct in a capsule, so
              # if subsequent lines raise an exception memory will not be leaked.
              schema.release = NULL
              capsule = cpython.PyCapsule_New(
                  <void*>schema, 'arrow_schema', release_arrow_schema_py_capsule
              )
              # Fill in ArrowSchema fields:
              # schema.format = ...
              # ...
              return capsule


 Consume a PyCapsule
 -------------------

 To consume a PyCapsule, use the `PyCapsule_GetPointer <https://docs.python.org/3/c-api/capsule.html#c.PyCapsule_GetPointer>`_ function
 to get the pointer to the underlying struct. Import the struct using your
 system's Arrow C Data Interface import function. Only after that should the
 capsule be freed.

 The example below shows how to consume a PyCapsule for an ``ArrowSchema``. The
 code for ``ArrowArray`` and ``ArrowArrayStream`` is similar.

 .. tab-set::

    .. tab-item:: C

       .. code-block:: c

          #include <Python.h>

          // If the capsule is not an ArrowSchema, will return NULL and set an exception.
          struct ArrowSchema* GetArrowSchemaPyCapsule(PyObject* capsule) {
            return PyCapsule_GetPointer(capsule, "arrow_schema");
          }

    .. tab-item:: Cython

       .. code-block:: cython

          cimport cpython

          cdef ArrowSchema* get_arrow_schema_py_capsule(object capsule) except NULL:
              return <ArrowSchema*>cpython.PyCapsule_GetPointer(capsule, 'arrow_schema')

 Backwards Compatibility with PyArrow
 ------------------------------------

 When interacting with PyArrow, the PyCapsule interface should be preferred over
 the ``_export_to_c`` and ``_import_from_c`` methods. However, many libraries will
 want to support a range of PyArrow versions. This can be done via duck typing.

 For example, if your library had an import method such as:

 .. code-block:: python

    # OLD METHOD
    def from_arrow(arr: pa.Array):
        array_import_ptr = make_array_import_ptr()
        schema_import_ptr = make_schema_import_ptr()
        arr._export_to_c(array_import_ptr, schema_import_ptr)
        return import_c_data(array_import_ptr, schema_import_ptr)

 You can rewrite this method to support both PyArrow and other libraries that
 implement the PyCapsule interface:

 .. code-block:: python

    # NEW METHOD
    def from_arrow(arr):
        # Newer versions of PyArrow as well as other libraries with Arrow data
        # implement this method, so prefer it over _export_to_c.
        if hasattr(arr, "__arrow_c_array__"):
            schema_ptr, array_ptr = arr.__arrow_c_array__()
            return import_c_capsule_data(schema_ptr, array_ptr)
        elif isinstance(arr, pa.Array):
            # Deprecated method, used for older versions of PyArrow
            array_import_ptr = make_array_import_ptr()
            schema_import_ptr = make_schema_import_ptr()
            arr._export_to_c(array_import_ptr, schema_import_ptr)
            return import_c_data(array_import_ptr, schema_import_ptr)
        else:
            raise TypeError(f"Cannot import {type(arr)} as Arrow array data.")

 You may also wish to accept objects implementing the protocol in your
 constructors. For example, in PyArrow, the :func:`array` and :func:`record_batch`
 constructors accept any object that implements the :meth:`__arrow_c_array__`
 method. Similarly, PyArrow's :func:`schema` constructor accepts any object
 that implements the :meth:`__arrow_c_schema__` method.

 Now if your library has an export to PyArrow function, such as:

 .. code-block:: python

    # OLD METHOD
    def to_arrow(self) -> pa.Array:
        array_export_ptr = make_array_export_ptr()
        schema_export_ptr = make_schema_export_ptr()
        self.export_c_data(array_export_ptr, schema_export_ptr)
        return pa.Array._import_from_c(array_export_ptr, schema_export_ptr)

 You can rewrite this function to use the PyCapsule interface by passing your
 object to the :py:func:`array` constructor, which accepts any object that
 implements the protocol. An easy way to check if the PyArrow version is new
 enough to support this is to check whether ``pa.Array`` has the
 ``__arrow_c_array__`` method.

 .. code-block:: python

   # NEW METHOD
   def to_arrow(self) -> pa.Array:
       # PyArrow added support for constructing arrays from objects implementing
       # __arrow_c_array__ in the same version it added the method for its own
       # arrays. So we can use hasattr to check if the method is available as
       # a proxy for checking the PyArrow version.
       if hasattr(pa.Array, "__arrow_c_array__"):
           return pa.array(self)
       else:
           array_export_ptr = make_array_export_ptr()
           schema_export_ptr = make_schema_export_ptr()
           self.export_c_data(array_export_ptr, schema_export_ptr)
           return pa.Array._import_from_c(array_export_ptr, schema_export_ptr)


 Comparison with Other Protocols
 ===============================

 Comparison to DataFrame Interchange Protocol
 --------------------------------------------

 `The DataFrame Interchange Protocol <https://data-apis.org/dataframe-protocol/latest/>`_
 is another protocol in Python that allows for the sharing of data between libraries.
 This protocol is complementary to the DataFrame Interchange Protocol. Many of
 the objects that implement this protocol will also implement the DataFrame
 Interchange Protocol.

 This protocol is specific to Arrow-based data structures, while the DataFrame
 Interchange Protocol allows non-Arrow data frames and arrays to be shared as well.
 Because of this, these PyCapsules can support Arrow-specific features such as
 nested columns.

 This protocol is also much more minimal than the DataFrame Interchange Protocol.
 It just handles data export, rather than defining accessors for details like
 number of rows or columns.

 In summary, if you are implementing this protocol, you should also consider
 implementing the DataFrame Interchange Protocol.


 Comparison to ``__arrow_array__`` protocol
 ------------------------------------------

 The :ref:`arrow_array_protocol` protocol is a dunder method that
 defines how PyArrow should import an object as an Arrow array. Unlike this
 protocol, it is specific to PyArrow and isn't used by other libraries. It is
 also limited to arrays and does not support schemas, tabular structures, or streams.
	.. Licensed to the Apache Software Foundation (ASF) under one
	.. or more contributor license agreements. See the NOTICE file
	.. distributed with this work for additional information
	.. regarding copyright ownership. The ASF licenses this file
	.. to you under the Apache License, Version 2.0 (the
	.. "License"); you may not use this file except in compliance
	.. with the License. You may obtain a copy of the License at

	.. http://www.apache.org/licenses/LICENSE-2.0

	.. Unless required by applicable law or agreed to in writing,
	.. software distributed under the License is distributed on an
	.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
	.. KIND, either express or implied. See the License for the
	.. specific language governing permissions and limitations
	.. under the License.


	.. _arrow-pycapsule-interface:

	=============================
	The Arrow PyCapsule Interface
	=============================

	Rationale
	=========

	The :ref:`C data interface <c-data-interface>`, :ref:`C stream interface <c-stream-interface>`
	and :ref:`C device interface <c-device-data-interface>` allow moving Arrow data between
	different implementations of Arrow. However, these interfaces don't specify how
	Python libraries should expose these structs to other libraries. Prior to this,
	many libraries simply provided export to PyArrow data structures, using the
	``_import_from_c`` and ``_export_to_c`` methods. However, this always required
	PyArrow to be installed. In addition, those APIs could cause memory leaks if
	handled improperly.

	This interface allows any library to export Arrow data structures to other
	libraries that understand the same protocol.

	Goals
	-----

	* Standardize the `PyCapsule`_ objects that represent ``ArrowSchema``, ``ArrowArray``,
	``ArrowArrayStream``, ``ArrowDeviceArray`` and ``ArrowDeviceArrayStream``.
	* Define standard methods that export Arrow data into such capsule objects,
	so that any Python library wanting to accept Arrow data as input can call the
	corresponding method instead of hardcoding support for specific Arrow
	producers.


	Non-goals
	---------

	* Standardize what public APIs should be used for import. This is left up to
	individual libraries.

	PyCapsule Standard
	==================

	When exporting Arrow data through Python, the C Data Interface / C Stream Interface
	structures should be wrapped in capsules. Capsules avoid invalid access by
	attaching a name to the pointer and avoid memory leaks by attaching a destructor.
	Thus, they are much safer than passing pointers as integers.

	`PyCapsule`_ allows for a ``name`` to be associated with the capsule, allowing
	consumers to verify that the capsule contains the expected kind of data. To make sure
	Arrow structures are recognized, the following names must be used:

	.. list-table::
	:widths: 25 25
	:header-rows: 1

	* - C Interface Type
	- PyCapsule Name
	* - ArrowSchema
	- ``arrow_schema``
	* - ArrowArray
	- ``arrow_array``
	* - ArrowArrayStream
	- ``arrow_array_stream``
	* - ArrowDeviceArray
	- ``arrow_device_array``
	* - ArrowDeviceArrayStream
	- ``arrow_device_array_stream``

	Lifetime Semantics
	------------------

	The exported PyCapsules should have a destructor that calls the
	:ref:`release callback <c-data-interface-released>`
	of the Arrow struct, if it is not already null. This prevents a memory leak in
	case the capsule was never passed to another consumer.

	If the capsule has been passed to a consumer, the consumer should have moved
	the data and marked the release callback as null, so there isn’t a risk of
	releasing data the consumer is using.
	:ref:`Read more in the C Data Interface specification <c-data-interface-released>`.

	In case of a device struct, the above mentioned release callback is the
	``release`` member of the embedded ``ArrowArray`` structure.
	:ref:`Read more in the C Device Interface specification <c-device-data-interface-semantics>`.

	Just like in the C Data Interface, the PyCapsule objects defined here can only
	be consumed once.

	For an example of a PyCapsule with a destructor, see `Create a PyCapsule`_.


	Export Protocol
	===============

	The interface consists of three separate protocols:

	* ``ArrowSchemaExportable``, which defines the ``__arrow_c_schema__`` method.
	* ``ArrowArrayExportable``, which defines the ``__arrow_c_array__`` method.
	* ``ArrowStreamExportable``, which defines the ``__arrow_c_stream__`` method.

	Two additional protocols are defined for the Device interface:

	* ``ArrowDeviceArrayExportable``, which defines the ``__arrow_c_device_array__`` method.
	* ``ArrowDeviceStreamExportable``, which defines the ``__arrow_c_device_stream__`` method.

	ArrowSchema Export
	------------------

	Schemas, fields, and data types can implement the method ``__arrow_c_schema__``.

	.. py:method:: __arrow_c_schema__(self)

	Export the object as an ArrowSchema.

	:return: A PyCapsule containing a C ArrowSchema representation of the
	object. The capsule must have a name of ``"arrow_schema"``.


	ArrowArray Export
	-----------------

	Arrays and record batches (contiguous tables) can implement the method
	``__arrow_c_array__``.

	.. py:method:: __arrow_c_array__(self, requested_schema=None)

	Export the object as a pair of ArrowSchema and ArrowArray structures.

	:param requested_schema: A PyCapsule containing a C ArrowSchema representation
	of a requested schema. Conversion to this schema is best-effort. See
	`Schema Requests`_.
	:type requested_schema: PyCapsule or None

	:return: A pair of PyCapsules containing a C ArrowSchema and ArrowArray,
	respectively. The schema capsule should have the name ``"arrow_schema"``
	and the array capsule should have the name ``"arrow_array"``.

	Libraries supporting the Device interface can implement a ``__arrow_c_device_array__``
	method on those objects, which works the same as ``__arrow_c_array__`` except
	for returning an ArrowDeviceArray structure instead of an ArrowArray structure:

	.. py:method:: __arrow_c_device_array__(self, requested_schema=None, **kwargs)

	Export the object as a pair of ArrowSchema and ArrowDeviceArray structures.

	:param requested_schema: A PyCapsule containing a C ArrowSchema representation
	of a requested schema. Conversion to this schema is best-effort. See
	`Schema Requests`_.
	:type requested_schema: PyCapsule or None
	:param kwargs: Additional keyword arguments should only be accepted if they have
	a default value of ``None``, to allow for future addition of new keywords.
	See :ref:`arrow-pycapsule-interface-device-support` for more details.

	:return: A pair of PyCapsules containing a C ArrowSchema and ArrowDeviceArray,
	respectively. The schema capsule should have the name ``"arrow_schema"``
	and the array capsule should have the name ``"arrow_device_array"``.

	ArrowStream Export
	------------------

	Tables / DataFrames and streams can implement the method ``__arrow_c_stream__``.

	.. py:method:: __arrow_c_stream__(self, requested_schema=None)

	Export the object as an ArrowArrayStream.

	:param requested_schema: A PyCapsule containing a C ArrowSchema representation
	of a requested schema. Conversion to this schema is best-effort. See
	`Schema Requests`_.
	:type requested_schema: PyCapsule or None

	:return: A PyCapsule containing a C ArrowArrayStream representation of the
	object. The capsule must have a name of ``"arrow_array_stream"``.

	Libraries supporting the Device interface can implement a ``__arrow_c_device_stream__``
	method on those objects, which works the same as ``__arrow_c_stream__`` except
	for returning an ArrowDeviceArrayStream structure instead of an ArrowArrayStream
	structure:

	.. py:method:: __arrow_c_device_stream__(self, requested_schema=None, **kwargs)

	Export the object as an ArrowDeviceArrayStream.

	:param requested_schema: A PyCapsule containing a C ArrowSchema representation
	of a requested schema. Conversion to this schema is best-effort. See
	`Schema Requests`_.
	:type requested_schema: PyCapsule or None
	:param kwargs: Additional keyword arguments should only be accepted if they have
	a default value of ``None``, to allow for future addition of new keywords.
	See :ref:`arrow-pycapsule-interface-device-support` for more details.

	:return: A PyCapsule containing a C ArrowDeviceArrayStream representation of the
	object. The capsule must have a name of ``"arrow_device_array_stream"``.

	Schema Requests
	---------------

	In some cases, there might be multiple possible Arrow representations of the
	same data. For example, a library might have a single integer type, but Arrow
	has multiple integer types with different sizes and sign. As another example,
	Arrow has several possible encodings for an array of strings: 32-bit offsets,
	64-bit offsets, string view, and dictionary-encoded. A sequence of strings could
	export to any one of these Arrow representations.

	In order to allow the caller to request a specific representation, the
	:meth:`__arrow_c_array__` and :meth:`__arrow_c_stream__` methods take an optional
	``requested_schema`` parameter. This parameter is a PyCapsule containing an
	``ArrowSchema``.

	The callee should attempt to provide the data in the requested schema. However,
	if the callee cannot provide the data in the requested schema, they may return
	with the same schema as if ``None`` were passed to ``requested_schema``.

	If the caller requests a schema that is not compatible with the data,
	say requesting a schema with a different number of fields, the callee should
	raise an exception. The requested schema mechanism is only meant to negotiate
	between different representations of the same data and not to allow arbitrary
	schema transformations.
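
	The decision rules above can be sketched in plain Python. This is only an
	illustration under simplifying assumptions: a real implementation receives a
	PyCapsule holding an ``ArrowSchema``, whereas the hypothetical
	``negotiate_formats`` helper below works on tuples of Arrow format strings
	(one per field) that are assumed to have been decoded already. The
	``ALTERNATIVES`` mapping is likewise made up for the example.

	.. code-block:: python

	    from typing import Dict, Optional, Set, Tuple

	    # One Arrow format string per field, e.g. ("l", "u") for (int64, utf8).
	    Formats = Tuple[str, ...]


	    def negotiate_formats(
	        default: Formats,
	        requested: Optional[Formats],
	        alternatives: Dict[str, Set[str]],
	    ) -> Formats:
	        """Pick the formats to export with (hypothetical helper)."""
	        if requested is None:
	            return default
	        # A different number of fields is not another representation of the
	        # same data, so raise instead of falling back.
	        if len(requested) != len(default):
	            raise ValueError(
	                f"requested {len(requested)} fields, but data has {len(default)}"
	            )
	        for native, wanted in zip(default, requested):
	            if wanted != native and wanted not in alternatives.get(native, set()):
	                # Best-effort: a representation that cannot be produced
	                # falls back to the default schema.
	                return default
	        return requested


	    # Assumed capability: utf8 ("u") data can also be exported as large_utf8 ("U").
	    ALTERNATIVES = {"u": {"U"}}

	    honored = negotiate_formats(("l", "u"), ("l", "U"), ALTERNATIVES)
	    fallback = negotiate_formats(("l", "u"), ("i", "u"), ALTERNATIVES)

	Here ``honored`` is the requested tuple, because the string column can be
	widened, while ``fallback`` is the default tuple, because this toy producer
	cannot produce 32-bit integers.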

	.. _PyCapsule: https://docs.python.org/3/c-api/capsule.html


	.. _arrow-pycapsule-interface-device-support:

	Device Support
	--------------

	The PyCapsule interface supports non-CPU hardware through the
	:ref:`C device interface <c-device-data-interface>`. This makes it possible
	to exchange data on non-CPU devices (e.g. CUDA GPUs) and to inspect which
	device the exchanged data lives on.

	For exchanging the data structures, this interface has two sets of protocol
	methods: the standard CPU-only versions (:meth:`__arrow_c_array__` and
	:meth:`__arrow_c_stream__`) and the equivalent device-aware versions
	(:meth:`__arrow_c_device_array__`, and :meth:`__arrow_c_device_stream__`).

	CPU-only producers may implement either only the standard CPU-only protocol
	methods, or both the CPU-only and device-aware methods. The absence of the
	device version methods implies CPU-only data. CPU-only consumers are encouraged
	to be able to consume both versions of the protocol.

	For a device-aware producer whose data structures can only reside in
	non-CPU memory, it is recommended to only implement the device version of the
	protocol (e.g. only add ``__arrow_c_device_array__``, and not add ``__arrow_c_array__``).
	Producers whose data structures can live on either CPU or non-CPU devices
	can implement both versions of the protocol, but the CPU-only versions
	(:meth:`__arrow_c_array__` and :meth:`__arrow_c_stream__`) should be guaranteed
	to contain valid pointers for CPU memory (thus, when asked to export non-CPU data,
	they should either raise an error or make a copy to CPU memory).

	Producing the ``ArrowDeviceArray`` and ``ArrowDeviceArrayStream`` structures
	is expected to not involve any cross-device copying of data.
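
	As a rough sketch of the consumer side, the hypothetical helper below prefers
	the device-aware method when an object offers it and falls back to the
	CPU-only method otherwise. The names ``export_array_capsules``,
	``CpuOnlyProducer`` and ``DeviceAwareProducer`` are made up for illustration,
	and the returned strings are stand-ins for real capsules.

	.. code-block:: python

	    from typing import Any, Tuple


	    def export_array_capsules(obj: Any) -> Tuple[str, Any]:
	        """Return ("device" or "cpu", exported capsules) for a producer."""
	        if hasattr(obj, "__arrow_c_device_array__"):
	            # Device-aware export: the consumer still has to inspect the
	            # device type stored in the ArrowDeviceArray before reading it.
	            return "device", obj.__arrow_c_device_array__()
	        if hasattr(obj, "__arrow_c_array__"):
	            # No device methods: the protocol implies CPU-only data.
	            return "cpu", obj.__arrow_c_array__()
	        raise TypeError(f"{type(obj).__name__} does not export Arrow array data")


	    class CpuOnlyProducer:
	        def __arrow_c_array__(self, requested_schema=None):
	            return ("schema-capsule", "array-capsule")  # stand-ins


	    class DeviceAwareProducer(CpuOnlyProducer):
	        def __arrow_c_device_array__(self, requested_schema=None, **kwargs):
	            return ("schema-capsule", "device-array-capsule")  # stand-ins


	    cpu_kind, _ = export_array_capsules(CpuOnlyProducer())
	    device_kind, _ = export_array_capsules(DeviceAwareProducer())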

	The device-aware methods (:meth:`__arrow_c_device_array__`, and :meth:`__arrow_c_device_stream__`)
	should accept additional keyword arguments (``**kwargs``), if they have a
	default value of ``None``. This allows for future addition of new optional
	keywords, where the default value for such a new keyword will always be ``None``.
	The implementor is responsible for raising a ``NotImplementedError`` for any
	additional keyword being passed by the user which is not recognised. For
	example:

	.. code-block:: python

	    def __arrow_c_device_array__(self, requested_schema=None, **kwargs):

	        non_default_kwargs = [
	            name for name, value in kwargs.items() if value is not None
	        ]
	        if non_default_kwargs:
	            raise NotImplementedError(
	                f"Received unsupported keyword argument(s): {non_default_kwargs}"
	            )

	        ...

	Protocol Typehints
	------------------

	The following typehints can be copied into your library to annotate that a
	function accepts an object implementing one of these protocols.

	.. code-block:: python

	    from typing import Tuple, Protocol

	    class ArrowSchemaExportable(Protocol):
	        def __arrow_c_schema__(self) -> object: ...

	    class ArrowArrayExportable(Protocol):
	        def __arrow_c_array__(
	            self,
	            requested_schema: object | None = None
	        ) -> Tuple[object, object]:
	            ...

	    class ArrowStreamExportable(Protocol):
	        def __arrow_c_stream__(
	            self,
	            requested_schema: object | None = None
	        ) -> object:
	            ...

	    class ArrowDeviceArrayExportable(Protocol):
	        def __arrow_c_device_array__(
	            self,
	            requested_schema: object | None = None,
	            **kwargs,
	        ) -> Tuple[object, object]:
	            ...

	    class ArrowDeviceStreamExportable(Protocol):
	        def __arrow_c_device_stream__(
	            self,
	            requested_schema: object | None = None,
	            **kwargs,
	        ) -> object:
	            ...
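
	One possible way to use such a protocol class, assuming a runtime check is
	wanted in addition to static typing, is to decorate it with
	``typing.runtime_checkable`` so that ``isinstance`` can test for the method.
	The check only looks at whether the method is present, not at its signature.
	``describe`` and ``FakeArray`` are hypothetical names used for illustration.

	.. code-block:: python

	    from typing import Optional, Protocol, Tuple, runtime_checkable


	    @runtime_checkable
	    class ArrowArrayExportable(Protocol):
	        def __arrow_c_array__(
	            self, requested_schema: Optional[object] = None
	        ) -> Tuple[object, object]: ...


	    def describe(data: ArrowArrayExportable) -> str:
	        # Static type checkers verify the annotation; this guards at runtime.
	        if not isinstance(data, ArrowArrayExportable):
	            raise TypeError(f"{type(data).__name__} lacks __arrow_c_array__")
	        return f"{type(data).__name__} exports Arrow array data"


	    class FakeArray:
	        def __arrow_c_array__(self, requested_schema=None):
	            return (object(), object())  # stand-ins for the two capsules


	    message = describe(FakeArray())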

	Examples
	========

	Create a PyCapsule
	------------------


	To create a PyCapsule, use the `PyCapsule_New <https://docs.python.org/3/c-api/capsule.html#c.PyCapsule_New>`_
	function. The function must be passed a destructor function that will be called
	to release the data the capsule points to. It must first call the release
	callback if it is not null, then free the struct.

	Below is the code to create a PyCapsule for an ``ArrowSchema``. The code for
	``ArrowArray`` and ``ArrowArrayStream`` is similar.

	.. tab-set::

	   .. tab-item:: C

	      .. code-block:: c

	         #include <Python.h>
	         // Assumes struct ArrowSchema is declared as in the Arrow C data interface.

	         void ReleaseArrowSchemaPyCapsule(PyObject* capsule) {
	           struct ArrowSchema* schema =
	               (struct ArrowSchema*)PyCapsule_GetPointer(capsule, "arrow_schema");
	           if (schema->release != NULL) {
	             schema->release(schema);
	           }
	           free(schema);
	         }

	         PyObject* ExportArrowSchemaPyCapsule() {
	           struct ArrowSchema* schema =
	               (struct ArrowSchema*)malloc(sizeof(struct ArrowSchema));
	           if (schema == NULL) {
	             return PyErr_NoMemory();
	           }
	           // Fill in ArrowSchema fields
	           // ...
	           return PyCapsule_New(schema, "arrow_schema", ReleaseArrowSchemaPyCapsule);
	         }

	   .. tab-item:: Cython

	      .. code-block:: cython

	         cimport cpython
	         from libc.stdlib cimport malloc, free

	         # Assumes ArrowSchema is declared elsewhere, e.g. in a cdef extern block.

	         cdef void release_arrow_schema_py_capsule(object schema_capsule):
	             cdef ArrowSchema* schema = <ArrowSchema*>cpython.PyCapsule_GetPointer(
	                 schema_capsule, 'arrow_schema'
	             )
	             if schema.release != NULL:
	                 schema.release(schema)

	             free(schema)

	         cdef object export_arrow_schema_py_capsule():
	             cdef ArrowSchema* schema = <ArrowSchema*>malloc(sizeof(ArrowSchema))
	             # It's recommended to immediately wrap the struct in a capsule, so
	             # if subsequent lines raise an exception memory will not be leaked.
	             schema.release = NULL
	             capsule = cpython.PyCapsule_New(
	                 <void*>schema, 'arrow_schema', release_arrow_schema_py_capsule
	             )
	             # Fill in ArrowSchema fields:
	             # schema.format = ...
	             # ...
	             return capsule


	Consume a PyCapsule
	-------------------

	To consume a PyCapsule, use the `PyCapsule_GetPointer <https://docs.python.org/3/c-api/capsule.html#c.PyCapsule_GetPointer>`_ function
	to get the pointer to the underlying struct. Import the struct using your
	system's Arrow C Data Interface import function. Only after that should the
	capsule be freed.

	The example below shows how to consume a PyCapsule for an ``ArrowSchema``. The
	code for ``ArrowArray`` and ``ArrowArrayStream`` is similar.

	.. tab-set::

	   .. tab-item:: C

	      .. code-block:: c

	         #include <Python.h>

	         // If the capsule is not an ArrowSchema, will return NULL and set an exception.
	         struct ArrowSchema* GetArrowSchemaPyCapsule(PyObject* capsule) {
	           return (struct ArrowSchema*)PyCapsule_GetPointer(capsule, "arrow_schema");
	         }

	   .. tab-item:: Cython

	      .. code-block:: cython

	         cimport cpython

	         cdef ArrowSchema* get_arrow_schema_py_capsule(object capsule) except NULL:
	             return <ArrowSchema*>cpython.PyCapsule_GetPointer(capsule, 'arrow_schema')
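
	When the first check happens in Python code rather than in C, the capsule name
	can also be verified through ``ctypes``. The following is a CPython-only
	sketch: ``is_named_capsule`` is a hypothetical helper, and the capsule built
	at the end wraps a dummy buffer rather than a real ``ArrowSchema``; it exists
	only to exercise the check.

	.. code-block:: python

	    import ctypes

	    # CPython C API functions reached through ctypes.
	    _capsule_is_valid = ctypes.pythonapi.PyCapsule_IsValid
	    _capsule_is_valid.restype = ctypes.c_int
	    _capsule_is_valid.argtypes = [ctypes.py_object, ctypes.c_char_p]

	    _capsule_new = ctypes.pythonapi.PyCapsule_New
	    _capsule_new.restype = ctypes.py_object
	    _capsule_new.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]


	    def is_named_capsule(obj: object, name: bytes) -> bool:
	        """True if obj is a PyCapsule with a non-NULL pointer and this name."""
	        return bool(_capsule_is_valid(obj, name))


	    # Demo only. PyCapsule_New stores the name pointer without copying it,
	    # so the bytes object and the buffer must outlive the capsule.
	    CAPSULE_NAME = b"arrow_schema"
	    dummy_buffer = ctypes.create_string_buffer(8)
	    dummy_capsule = _capsule_new(
	        ctypes.addressof(dummy_buffer), CAPSULE_NAME, None
	    )

	A consumer could call ``is_named_capsule(capsule, b"arrow_schema")`` before
	handing the capsule to native code that imports the struct.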

	Backwards Compatibility with PyArrow
	------------------------------------

	When interacting with PyArrow, the PyCapsule interface should be preferred over
	the ``_export_to_c`` and ``_import_from_c`` methods. However, many libraries will
	want to support a range of PyArrow versions. This can be done via duck typing.

	For example, if your library had an import method such as:

	.. code-block:: python

	    import pyarrow as pa

	    # OLD METHOD
	    def from_arrow(arr: pa.Array):
	        array_import_ptr = make_array_import_ptr()
	        schema_import_ptr = make_schema_import_ptr()
	        arr._export_to_c(array_import_ptr, schema_import_ptr)
	        return import_c_data(array_import_ptr, schema_import_ptr)

	You can rewrite this method to support both PyArrow and other libraries that
	implement the PyCapsule interface:

	.. code-block:: python

	    import pyarrow as pa

	    # NEW METHOD
	    def from_arrow(arr):
	        # Newer versions of PyArrow as well as other libraries with Arrow data
	        # implement this method, so prefer it over _export_to_c.
	        if hasattr(arr, "__arrow_c_array__"):
	            schema_ptr, array_ptr = arr.__arrow_c_array__()
	            return import_c_capsule_data(schema_ptr, array_ptr)
	        elif isinstance(arr, pa.Array):
	            # Deprecated method, used for older versions of PyArrow
	            array_import_ptr = make_array_import_ptr()
	            schema_import_ptr = make_schema_import_ptr()
	            arr._export_to_c(array_import_ptr, schema_import_ptr)
	            return import_c_data(array_import_ptr, schema_import_ptr)
	        else:
	            raise TypeError(f"Cannot import {type(arr)} as Arrow array data.")

	You may also wish to accept objects implementing the protocol in your
	constructors. For example, in PyArrow, the :func:`array` and :func:`record_batch`
	constructors accept any object that implements the :meth:`__arrow_c_array__`
	method. Similarly, PyArrow's :func:`schema` constructor accepts any object
	that implements the :meth:`__arrow_c_schema__` method.

	Now if your library has an export to PyArrow function, such as:

	.. code-block:: python

	    # OLD METHOD
	    def to_arrow(self) -> pa.Array:
	        array_export_ptr = make_array_export_ptr()
	        schema_export_ptr = make_schema_export_ptr()
	        self.export_c_data(array_export_ptr, schema_export_ptr)
	        return pa.Array._import_from_c(array_export_ptr, schema_export_ptr)

	You can rewrite this function to use the PyCapsule interface by passing your
	object to the :py:func:`array` constructor, which accepts any object that
	implements the protocol. An easy way to check if the PyArrow version is new
	enough to support this is to check whether ``pa.Array`` has the
	``__arrow_c_array__`` method.

	.. code-block:: python

	    import pyarrow as pa

	    # NEW METHOD
	    def to_arrow(self) -> pa.Array:
	        # PyArrow added support for constructing arrays from objects implementing
	        # __arrow_c_array__ in the same version it added the method for its own
	        # arrays. So we can use hasattr to check if the method is available as
	        # a proxy for checking the PyArrow version.
	        if hasattr(pa.Array, "__arrow_c_array__"):
	            return pa.array(self)
	        else:
	            array_export_ptr = make_array_export_ptr()
	            schema_export_ptr = make_schema_export_ptr()
	            self.export_c_data(array_export_ptr, schema_export_ptr)
	            return pa.Array._import_from_c(array_export_ptr, schema_export_ptr)


	Comparison with Other Protocols
	===============================

	Comparison to DataFrame Interchange Protocol
	--------------------------------------------

	`The DataFrame Interchange Protocol <https://data-apis.org/dataframe-protocol/latest/>`_
	is another protocol in Python that allows for the sharing of data between libraries.
	The Arrow PyCapsule Interface is complementary to it. Many of the objects that
	implement the PyCapsule Interface will also implement the DataFrame
	Interchange Protocol.

	This protocol is specific to Arrow-based data structures, while the DataFrame
	Interchange Protocol allows non-Arrow data frames and arrays to be shared as well.
	Because of this, these PyCapsules can support Arrow-specific features such as
	nested columns.

	This protocol is also much more minimal than the DataFrame Interchange Protocol.
	It just handles data export, rather than defining accessors for details like
	number of rows or columns.

	In summary, if you are implementing this protocol, you should also consider
	implementing the DataFrame Interchange Protocol.


	Comparison to ``__arrow_array__`` protocol
	------------------------------------------

	The :ref:`arrow_array_protocol` protocol is a dunder method that
	defines how PyArrow should import an object as an Arrow array. Unlike this
	protocol, it is specific to PyArrow and isn't used by other libraries. It is
	also limited to arrays and does not support schemas, tabular structures, or streams.