 .. Licensed to the Apache Software Foundation (ASF) under one
 .. or more contributor license agreements.  See the NOTICE file
 .. distributed with this work for additional information
 .. regarding copyright ownership.  The ASF licenses this file
 .. to you under the Apache License, Version 2.0 (the
 .. "License"); you may not use this file except in compliance
 .. with the License.  You may obtain a copy of the License at

 ..   http://www.apache.org/licenses/LICENSE-2.0

 .. Unless required by applicable law or agreed to in writing,
 .. software distributed under the License is distributed on an
 .. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 .. KIND, either express or implied.  See the License for the
 .. specific language governing permissions and limitations
 .. under the License.

 .. currentmodule:: pyarrow.cuda

 CUDA Integration
 ================

 Arrow is not limited to CPU buffers (located in the computer's main memory,
 also named "host memory").  It also has provisions for accessing buffers
 located on a CUDA-capable GPU device (in "device memory").

 .. note::
    This functionality is optional and must have been enabled at build time.
    If this is not done by your package manager, you might have to build Arrow
    yourself.
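
 Since the module is optional, code that must also run on installations without
 it can guard the import.  The following is a minimal sketch, assuming that
 importing ``pyarrow.cuda`` raises ``ImportError`` when CUDA support was not
 built; the helper name ``cuda_support_available`` is hypothetical and not part
 of pyarrow.

```python
def cuda_support_available():
    """Return True if pyarrow's CUDA module can be imported.

    Assumption: a pyarrow build without CUDA support (or a missing
    pyarrow installation) raises ImportError on this import.
    """
    try:
        from pyarrow import cuda  # noqa: F401
    except ImportError:
        return False
    return True


if cuda_support_available():
    print("pyarrow.cuda is available")
else:
    print("pyarrow.cuda is not available; GPU code paths should be skipped")
```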

 CUDA Contexts
 -------------

 A CUDA context represents access to a particular CUDA-capable device.
 For example, the following creates a CUDA context for CUDA device number 0::

    >>> from pyarrow import cuda
    >>> ctx = cuda.Context(0)

 CUDA Buffers
 ------------

 A CUDA buffer can be created by copying data from host memory to the memory
 of a CUDA device, using the :meth:`Context.buffer_from_data` method.
 The source data can be any Python buffer-like object, including Arrow buffers::

    >>> import numpy as np
    >>> arr = np.arange(4, dtype=np.int32)
    >>> arr.nbytes
    16
    >>> cuda_buf = ctx.buffer_from_data(arr)
    >>> type(cuda_buf)
    pyarrow._cuda.CudaBuffer
    >>> cuda_buf.size     # The buffer's size in bytes
    16
    >>> cuda_buf.address  # The buffer's address in device memory
    30088364544
    >>> cuda_buf.context.device_number
    0
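
 "Buffer-like" here refers to objects that implement Python's buffer protocol.
 The host-only sketch below (no GPU or pyarrow required) uses the built-in
 ``memoryview`` to show what such objects expose, namely a contiguous run of
 bytes with a known size.  The helper name ``host_nbytes`` is hypothetical and
 only serves the illustration.

```python
import struct


def host_nbytes(obj):
    """Return the number of bytes exposed by a buffer-protocol object."""
    with memoryview(obj) as view:
        return view.nbytes


# Four little-endian 32-bit integers, analogous to np.arange(4, dtype=np.int32).
raw = struct.pack("<4i", 0, 1, 2, 3)

print(host_nbytes(raw))             # 16
print(host_nbytes(bytearray(raw)))  # 16

# Objects without buffer support are rejected with a TypeError.
try:
    host_nbytes([0, 1, 2, 3])
except TypeError:
    print("a list is not buffer-like")
```

 The 16 bytes reported here correspond to ``arr.nbytes`` and ``cuda_buf.size``
 in the session above.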

 Conversely, you can copy a CUDA buffer back to host memory, getting a regular
 CPU buffer::

    >>> buf = cuda_buf.copy_to_host()
    >>> type(buf)
    pyarrow.lib.Buffer
    >>> np.frombuffer(buf, dtype=np.int32)
    array([0, 1, 2, 3], dtype=int32)
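
 The buffer returned to the host is a run of untyped bytes; the ``dtype``
 argument above is what gives those bytes a meaning.  The sketch below
 illustrates this on the CPU with the standard ``struct`` module.  It assumes
 little-endian data (as on most current platforms), and the helper name
 ``decode_int32`` is hypothetical.

```python
import struct


def decode_int32(raw):
    """Interpret a bytes-like object as little-endian 32-bit integers."""
    count, remainder = divmod(len(raw), 4)
    if remainder:
        raise ValueError("byte length is not a multiple of 4")
    return list(struct.unpack(f"<{count}i", raw))


# The same 16 bytes that the int32 array [0, 1, 2, 3] occupies.
raw = struct.pack("<4i", 0, 1, 2, 3)

print(decode_int32(raw))           # [0, 1, 2, 3]

# Reading the same bytes as two 64-bit integers yields different values,
# which is why the element type has to be tracked alongside the buffer.
print(struct.unpack("<2q", raw))   # (4294967296, 12884901890)
```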

 .. warning::
    Many Arrow functions expect a CPU buffer but will not check the buffer's
    actual type.  You will get a crash if you pass a CUDA buffer to such a
    function::

       >>> import pyarrow as pa
       >>> pa.py_buffer(b"x" * 16).equals(cuda_buf)
       Segmentation fault

 Numba Integration
 -----------------

 There is not much you can do directly with Arrow CUDA buffers from Python,
 but they support interoperation with `Numba <https://numba.pydata.org/>`_,
 a JIT compiler which can turn Python code into optimized CUDA kernels.

 Arrow to Numba
 ~~~~~~~~~~~~~~

 First let's define a Numba CUDA kernel operating on an ``int32`` array.  Here,
 we will simply increment each array element (assuming the array is writable)::

    import numba.cuda

    @numba.cuda.jit
    def increment_by_one(an_array):
        pos = numba.cuda.grid(1)
        if pos < an_array.size:
            an_array[pos] += 1

 Then we need to wrap our CUDA buffer into a Numba "device array" with the right
 array metadata (shape, strides and datatype).  This is necessary so that Numba
 can identify the array's characteristics and compile the kernel with the
 appropriate type declarations.

 In this case the metadata can simply be taken from the original NumPy array.
 Note that the GPU data isn't copied, just pointed to::

    >>> from numba.cuda.cudadrv.devicearray import DeviceNDArray
    >>> device_arr = DeviceNDArray(arr.shape, arr.strides, arr.dtype, gpu_data=cuda_buf.to_numba())

 (Ideally, we could define an Arrow array in CPU memory, copy it to CUDA
 memory without losing type information, and then invoke the Numba kernel on it
 without constructing the DeviceNDArray by hand; this is not yet possible.)
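
 When no NumPy array is at hand, the stride metadata has to be supplied some
 other way.  As a rough sketch, the byte strides of a C-contiguous (row-major)
 array follow from its shape and item size.  The helper name
 ``c_contiguous_strides`` is hypothetical and not part of Arrow, NumPy or
 Numba.

```python
def c_contiguous_strides(shape, itemsize):
    """Return the byte strides of a C-contiguous array.

    The last dimension advances by one item; each earlier dimension
    advances by the total byte size of the dimensions after it.
    """
    strides = []
    step = itemsize
    for extent in reversed(shape):
        strides.append(step)
        step *= extent
    return tuple(reversed(strides))


# One-dimensional int32 array of 4 elements, as in the example above.
print(c_contiguous_strides((4,), 4))    # (4,)

# A 2 x 3 int32 array: moving one row down skips 3 items of 4 bytes.
print(c_contiguous_strides((2, 3), 4))  # (12, 4)
```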

 Finally we can run the Numba CUDA kernel on the Numba device array (here
 with 16 blocks of 16 threads each)::

    >>> increment_by_one[16, 16](device_arr)

 And the results can be checked by copying back the CUDA buffer to CPU memory::

    >>> np.frombuffer(cuda_buf.copy_to_host(), dtype=np.int32)
    array([1, 2, 3, 4], dtype=int32)
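
 The launch above starts 256 threads for only 4 elements, which is why the
 kernel needs its bounds check.  The sketch below emulates the kernel's
 one-dimensional indexing on the CPU in plain Python, treating
 ``numba.cuda.grid(1)`` as ``block index * threads per block + thread index``.
 Both helper names are hypothetical, and the loops only imitate what the GPU
 does in parallel.

```python
def emulate_increment_by_one(values, blocks, threads_per_block):
    """Emulate the kernel sequentially; return (result, active_threads)."""
    result = list(values)
    active = 0
    for block_idx in range(blocks):
        for thread_idx in range(threads_per_block):
            # Equivalent of numba.cuda.grid(1) for a 1D launch.
            pos = block_idx * threads_per_block + thread_idx
            if pos < len(result):  # same bounds check as the kernel
                result[pos] += 1
                active += 1
    return result, active


def blocks_needed(n_elements, threads_per_block):
    """Smallest block count that covers n_elements (ceiling division)."""
    return (n_elements + threads_per_block - 1) // threads_per_block


result, active = emulate_increment_by_one([0, 1, 2, 3], 16, 16)
print(result)                 # [1, 2, 3, 4]
print(active)                 # 4 of the 256 threads do any work
print(blocks_needed(4, 16))   # 1 block would already be enough
```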

 Numba to Arrow
 ~~~~~~~~~~~~~~

 Conversely, a Numba-created device array can be viewed as an Arrow CUDA buffer,
 using the :meth:`CudaBuffer.from_numba` factory method.

 For the sake of example, let's first create a Numba device array::

    >>> arr = np.arange(10, 14, dtype=np.int32)
    >>> arr
    array([10, 11, 12, 13], dtype=int32)
    >>> device_arr = numba.cuda.to_device(arr)

 Then we can create a CUDA buffer pointing to the device array's memory.
 We don't need to pass a CUDA context explicitly this time: the appropriate
 CUDA context is automatically retrieved and adapted from the Numba object.

 ::

    >>> cuda_buf = cuda.CudaBuffer.from_numba(device_arr.gpu_data)
    >>> cuda_buf.size
    16
    >>> cuda_buf.address
    30088364032
    >>> cuda_buf.context.device_number
    0

 Of course, we can copy the CUDA buffer back to host memory::

    >>> np.frombuffer(cuda_buf.copy_to_host(), dtype=np.int32)
    array([10, 11, 12, 13], dtype=int32)

 .. seealso::
    Documentation for Numba's `CUDA support <https://numba.pydata.org/numba-doc/latest/cuda/index.html>`_.