The C++ cookbook combines output from a set of C++ test programs with an reStructuredText (RST) document tree rendered with Sphinx.
Running make cpp
from the cookbook root directory (the one where the README.rst
exists) will compile the test code, run the tests to generate the output, and will compile the cookbook to HTML.
You will see the compiled result inside the build/cpp
directory.
The above process requires conda to be installed and is primarily intended for build systems. See below for more information on setting up a development environment for developing recipes.
Every recipe is a combination of prose written in RST format using the Sphinx documentation system and a snippet of a googletest test.
New recipes can be added to one of the existing .rst
files if they suit that section or you can create new sections by adding additional .rst
files in the source
directory. You just need to remember to add them to the index.rst
file in the toctree
for them to become visible.
Most recipes will reference a snippet of C++ code. For simplicity a custom recipe
directive that can be used like so:
.. recipe:: ../code/creating_arrow_objects.cc CreatingArrays :caption: Creating an array from C++ primitives :dedent: 4
Each recipe
directive has two required arguments. The first is a path to the file containing the source file containing the snippet and the second is the name of the snippet and must correspond to a set of CreateRecipe/EndRecipe calls in the source file.
The directive will generate two code blocks in the cookbook. The first code block will contain the source code itself and will be annotated with any (optional) caption specified on the recipe directive. The second block will contain the test output.
The optional dedent
argument should be used to remove leading white space from your source code.
Each snippet source file contains a set of googletest tests. Feel free to use any googletest features needed to help setup and verify your test. To reference a snippet you need to surround it in BeginRecipe
and EndRecipe
calls. For example:
StartRecipe("CreatingArrays"); arrow::Int32Builder builder; ASSERT_OK(builder.Append(1)); ASSERT_OK(builder.Append(2)); ASSERT_OK(builder.Append(3)); ASSERT_OK_AND_ASSIGN(shared_ptr<arrow::Array> arr, builder.Finish()) rout << arr->ToString() << endl; EndRecipe("CreatingArrays");
The variable rout
is set to a std::ostream
instance that is used to capture test output. Anything output to rout
will show up in the recipe output block when the recipe is rendered into the cookbook.
The Arrow project has its own documentation for the C++ implementation that is hosted at https://arrow.apache.org/docs/cpp/index.html. Fortunately, this documentation is also built with Sphinx and so we can use the extension intersphinx
to reference sections of this documentation. To do so simply write a standard Sphinx reference like so:
Typed subclasses of :cpp:class:`arrow::ArrayBuilder` make it easy to efficiently create Arrow arrays from existing C++ data
A helpful command is python -msphinx.ext.intersphinx https://arrow.apache.org/docs/objects.inv
which will list all of the possible targets to link to.
Running make
at the top level can be rather slow as it will rebuild the entire environment each time. It is primarily intended for use in CI and requires you to have conda installed.
For recipe development you are encouraged to create your own out-of-source cmake build. For example:
mkdir cpp/code/build cd cpp/code/build cmake ../code -DCMAKE_BUILD_TYPE=Debug cmake --build . ctest
Then you can rerun all of the tests with ctest
and you can rebuild and rerun individual tests much more quickly with something like cmake --build . --target creating_arrow_objects && ctest creating_arrow_objects
. Everytime the cmake build is run it will update the recipe output file referenced by the sphinx build so after rerunning a test you can visualize the output by running make html
in the cpp
directory.
If you are using conda then there is file cpp/requirements.yml
which can be used to create an environment for recipe development using the latest stable Arrow version with the command:
conda env create -f cpp/environment.yml
There may be a conda-lock file available for your platform. Use this instead to avoid having to perform the dependency resolution solve.
conda create -n cookbook-cpp --file cpp/conda-osx-arm64.lock
To update dependencies modify cpp/requirements.yml
and then run
cd cpp conda-lock --file environment.yml --kind explicit -p linux-aarch64 -p linux-64 -p osx-arm64
You can also create a conda environment to test your cookbooks against the Arrow Nightly builds using the file cpp/dev.yml
. Using the command:
conda env create -f cpp/dev.yml
This will create a conda environment called cookbook-cpp-dev instead.
The entire document should serve as an example of how to use Arrow C++, not just the referenced snippets. This means that the below style rules and guidelines apply to source code that is not referenced by the cookbook itself.
This cookbook follows the same style rules as Arrow C++ which is the Google style guide with a few exceptions described here
The examples should be as simple as possible. If complex code (e.g. templates) can be used to do something more efficiently then there should be a simple, inefficient version alongside the more complex version.
Do not use auto
in any of the templates unless you must (e.g. lambdas). Cookbook viewers will be using a browser, not an IDE, and it is not always simple to determine the inferred type.
C++ is not, at the moment, a “notebook friendly” language and it does lend itself well to being embedded inside an RST file. As such, we use a custom directive to link the Googletest source files and the RST prose. The directive works with the helper methods BeginRecipe
and EndRecipe
defined in common.h
.
The helper method BeginRecipe
will begin capturing output to rout
. The helper method EndRecipe
will append the captured output and recipe name to string arrays. There is code in main.cc
which runs after the tests run to dump these arrays to a .arrow file (i.e. the arrays will be serialized as a table using the Arrow IPC format).
When the sphinx build runs the directive recipe
(defined in cpp/ext
) will be loaded. During this load the dataset of test outputs will be read. These test outputs will be used whenever a recipe is referenced.
All participation in the Apache Arrow project is governed by the Apache Software Foundation’s code of conduct.