Arrow integration testing

Our strategy for integration testing between Arrow implementations is as follows:

  • Test datasets are specified in a custom human-readable, JSON-based format designed for Arrow

  • Each implementation provides a testing executable capable of converting between the JSON and the binary Arrow file representation

  • The test executable is also capable of validating the contents of a binary file against a corresponding JSON file
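To make the idea concrete, here is a sketch of the kind of JSON test dataset the harness works with: a schema paired with record batches whose validity bitmaps and values are spelled out explicitly. The exact field names and layout below are an approximation for illustration, not the authoritative format specification:

```python
import json

# Illustrative JSON test dataset: one nullable int32 column named "foo"
# with three rows, the second of which is null. Field names here
# ("VALIDITY", "DATA", etc.) approximate the Arrow JSON test format.
dataset = {
    "schema": {
        "fields": [
            {"name": "foo",
             "type": {"name": "int", "bitWidth": 32, "isSigned": True},
             "nullable": True,
             "children": []}
        ]
    },
    "batches": [
        {"count": 3,
         "columns": [
             {"name": "foo", "count": 3,
              "VALIDITY": [1, 0, 1],  # 1 = valid, 0 = null
              "DATA": [1, 0, 3]}      # value slots; null slot is ignored
         ]}
    ],
}

# The file is meant to be human-readable, so it round-trips through json:
text = json.dumps(dataset, indent=2)
```

Each implementation's test executable reads a file like this, writes the equivalent binary Arrow file, and can later check a binary file against the same JSON for equality.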

Environment setup

The integration test data generator and runner are written in Python and currently require Python 3.5 or higher. You can create a standalone Python distribution and environment for running the tests by using Miniconda. On Linux, after downloading the Miniconda installer (saved here as miniconda.sh), this is:

bash miniconda.sh -b -p miniconda
export PATH=`pwd`/miniconda/bin:$PATH

conda create -n arrow-integration python=3.6 nomkl numpy six
conda activate arrow-integration

If you are on macOS, use the macOS Miniconda installer URL instead of the Linux one.
After this, you can follow the instructions in the next section.

Running the existing integration tests

First, build the Java and C++ projects. For Java, you must run

mvn package

Now, the integration tests rely on two environment variables which point to the Java arrow-tools JAR and the build path for the C++ executables:

export ARROW_JAVA_INTEGRATION_JAR=$JAVA_DIR/tools/target/arrow-tools-$VERSION-jar-with-dependencies.jar
export ARROW_CPP_EXE_PATH=$CPP_BUILD_DIR/debug

Here $JAVA_DIR is the java/ directory of your Arrow git clone, and $VERSION is the Arrow version being built. The $CPP_BUILD_DIR may be different depending on how you built with CMake (in-source or out-of-source).
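A runner that depends on these variables typically resolves and validates them up front so a missing setting fails fast with a clear message. A minimal sketch of that resolution step (the C++ variable name ARROW_CPP_EXE_PATH is an assumption here; ARROW_JAVA_INTEGRATION_JAR comes from the export above):

```python
import os

def resolve_paths(env=os.environ):
    """Resolve the env vars pointing at the Java JAR and the C++ build
    directory, raising a descriptive error if either is unset."""
    jar = env.get("ARROW_JAVA_INTEGRATION_JAR")
    cpp_dir = env.get("ARROW_CPP_EXE_PATH")  # assumed variable name
    missing = [name for name, val in
               [("ARROW_JAVA_INTEGRATION_JAR", jar),
                ("ARROW_CPP_EXE_PATH", cpp_dir)] if not val]
    if missing:
        raise RuntimeError("missing environment variables: "
                           + ", ".join(missing))
    return jar, cpp_dir

# Example with a stand-in environment (paths are placeholders):
jar, cpp_dir = resolve_paths({
    "ARROW_JAVA_INTEGRATION_JAR": "/path/to/arrow-tools.jar",
    "ARROW_CPP_EXE_PATH": "/path/to/cpp/build/debug",
})
```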

Once this is done, run the integration tests with (optionally adding --debug for additional output):

python integration_test.py
python integration_test.py --debug  # additional output
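Conceptually, the runner exercises every ordered pair of implementations: one side produces a binary Arrow file from the JSON dataset, the other validates it. A rough sketch of that pairing logic (the implementation list here is illustrative):

```python
import itertools

# Each ordered (producer, consumer) pair is tested, including an
# implementation against itself, so a two-implementation setup yields
# four combinations.
implementations = ["cpp", "java"]
pairs = list(itertools.product(implementations, repeat=2))
for producer, consumer in pairs:
    print("%s produces -> %s validates" % (producer, consumer))
```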