| .. Licensed to the Apache Software Foundation (ASF) under one |
| .. or more contributor license agreements. See the NOTICE file |
| .. distributed with this work for additional information |
| .. regarding copyright ownership. The ASF licenses this file |
| .. to you under the Apache License, Version 2.0 (the |
| .. "License"); you may not use this file except in compliance |
| .. with the License. You may obtain a copy of the License at |
| |
| .. http://www.apache.org/licenses/LICENSE-2.0 |
| |
| .. Unless required by applicable law or agreed to in writing, |
| .. software distributed under the License is distributed on an |
| .. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| .. KIND, either express or implied. See the License for the |
| .. specific language governing permissions and limitations |
| .. under the License. |
| |
| .. default-domain:: cpp |
| |
| .. _cpp-security: |
| |
| ======================= |
| Security Considerations |
| ======================= |
| |
| .. important:: |
|    This document describes the security model for using the Arrow C++ APIs. |
|    To better understand this document, we recommend that you first read |
|    the :ref:`overall security model <format_security>` for the Arrow project. |
| |
| API parameter validity |
| ====================== |
| |
| Many Arrow C++ APIs report errors using the :class:`arrow::Status` and |
| :class:`arrow::Result` types. Such APIs can be assumed to detect common errors |
| in the provided arguments. However, many of them also have implicit preconditions |
| that the caller must uphold; these can usually be deduced from the semantics of |
| the API as described by its documentation. |
| |
| .. seealso:: Arrow C++ :ref:`cpp-conventions` |
| |
| Pointer validity |
| ---------------- |
| |
| Pointers are always assumed to be valid and point to memory of the size required |
| by the API. In particular, it is *forbidden to pass a null pointer* except where |
| the API documentation explicitly says otherwise. |
| |
| Type restrictions |
| ----------------- |
| |
| Some APIs are specified to operate on specific Arrow data types and may not |
| verify that their arguments conform to the expected data types. Passing the |
| wrong kind of data as input may lead to undefined behavior. |
| |
| .. _cpp-valid-data: |
| |
| Data validity |
| ------------- |
| |
| Arrow data, for example passed as :class:`arrow::Array` or :class:`arrow::Table`, |
| is always assumed to be :ref:`valid <format-invalid-data>`. If your program may |
| encounter invalid data, it must explicitly check its validity by calling one of |
| the following validation APIs. |
| |
| Structural validity |
| ''''''''''''''''''' |
| |
| The ``Validate`` methods exposed on various Arrow C++ classes perform relatively |
| inexpensive checks that the data is structurally valid. This includes |
| checking the number of buffers, child arrays, and other similar conditions. |
| |
| * :func:`arrow::Array::Validate` |
| * :func:`arrow::RecordBatch::Validate` |
| * :func:`arrow::ChunkedArray::Validate` |
| * :func:`arrow::Table::Validate` |
| * :func:`arrow::Scalar::Validate` |
| |
| These checks are typically constant-time with respect to the number of rows in |
| the data, but linear in the number of descendant fields. They can be good enough |
| to detect potential bugs in your own code. However, they are not enough to detect |
| all classes of invalid data, and they won't protect against all kinds of malicious |
| payloads. |
| |
| Full validity |
| ''''''''''''' |
| |
| The ``ValidateFull`` methods exposed by the same classes perform the same validity |
| checks as the ``Validate`` methods, but they also check the data extensively for |
| any non-conformance to the Arrow spec. In particular, they check all the offsets |
| of variable-length data types, which is of fundamental importance when ingesting |
| untrusted data from sources such as the IPC format (otherwise the variable-length |
| offsets could point outside of the corresponding data buffer). They also check |
| for invalid values, such as invalid UTF-8 strings or decimal values out of range |
| for the advertised precision. |
| |
| * :func:`arrow::Array::ValidateFull` |
| * :func:`arrow::RecordBatch::ValidateFull` |
| * :func:`arrow::ChunkedArray::ValidateFull` |
| * :func:`arrow::Table::ValidateFull` |
| * :func:`arrow::Scalar::ValidateFull` |
| |
| "Safe" and "unsafe" APIs |
| ------------------------ |
| |
| Some APIs are exposed in both "safe" and "unsafe" variants. The naming convention |
| for such pairs varies: sometimes the former has a ``Safe`` suffix (for example |
| ``SliceSafe`` vs. ``Slice``), sometimes the latter has an ``Unsafe`` prefix or |
| suffix (for example ``Append`` vs. ``UnsafeAppend``). |
| |
| In all cases, the "unsafe" API is intended as a more efficient API that |
| eschews some of the checks that the "safe" API performs. It is then up to the |
| caller to ensure that the preconditions are met; otherwise, undefined behavior |
| may ensue. |
| |
| The API documentation usually spells out the differences between "safe" and "unsafe" |
| variants, but these typically fall into two categories: |
| |
| * structural checks, such as passing the right Arrow data type or number of buffers; |
| * allocation size checks, such as having preallocated enough memory for the given |
|   input arguments (this is typical of the |
|   :ref:`array builders <cpp-api-array-builders>` |
|   and :ref:`buffer builders <cpp-api-buffer-builders>`). |
| |
| Ingesting untrusted data |
| ======================== |
| |
| As an exception to the above (see :ref:`cpp-valid-data`), some APIs support ingesting |
| untrusted, potentially malicious data. These are: |
| |
| * the :ref:`IPC reader <cpp-ipc-reading>` APIs |
| * the :ref:`Parquet reader <cpp-parquet-reading>` APIs |
| * the :ref:`CSV reader <cpp-csv-reading>` APIs |
| |
| IPC and Parquet readers |
| ----------------------- |
| |
| You must not assume that the IPC and Parquet readers always return valid Arrow |
| data. Data is not validated automatically because validation can be expensive, |
| and it is unnecessary when reading from trusted data sources. |
| |
| Instead, when using these APIs with potentially invalid data (such as data coming |
| from an untrusted source), you **must** follow these steps: |
| |
| 1. Check any error returned by the API, as with any other API. |
| 2. If the API returned successfully, validate the returned Arrow data in full |
|    (see "Full validity" above). |
| |
| CSV reader |
| ---------- |
| |
| With the default :class:`conversion options <arrow::csv::ConvertOptions>`, |
| the CSV reader will either return valid Arrow data or error out. Some options, |
| however, allow relaxing the corresponding checks in favor of performance. |