
# Testing tools for odds and ends

## Testing HDFS file interface

```shell
./test_hdfs.sh
```
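For orientation, here is a hedged sketch of the kind of round trip such a test exercises through pyarrow's Hadoop filesystem binding; the host, port, and path are placeholders for a local test cluster, and this is not the script's actual code:

```python
# Hedged sketch of an HDFS round trip; not the contents of test_hdfs.sh.
# Host, port, and path below are placeholders for a local test cluster.
import pyarrow.fs as pafs

hdfs = pafs.HadoopFileSystem(host="localhost", port=8020)

directory = "/tmp/pyarrow-hdfs-smoke"
path = directory + "/data.bin"
payload = b"hello, hdfs"

# Write a small file, then read it back and verify the contents.
hdfs.create_dir(directory)
with hdfs.open_output_stream(path) as out:
    out.write(payload)
with hdfs.open_input_stream(path) as f:
    assert f.read() == payload
```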

## Testing Dask integration

Initial integration testing with Dask has been Dockerized. To invoke the test, run the following command in the Arrow root directory:

```shell
bash dev/dask_integration.sh
```

This script creates a `dask` directory at the same level as `arrow`, clones the Dask project from GitHub into it, and performs a Python `--user` install. The Docker container uses the parent directory of `arrow` as `$HOME`, which is where Python installs Dask into a `.local` directory.

The output of the Docker session will contain the results of the Dask dataframe tests, followed by the single integration test that currently exists for Arrow. That test creates a set of CSV files and then reads them in parallel into a Dask dataframe. The code for this test resides in the `dask_tests` directory here.
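For context, a minimal sketch of that pattern, writing a few CSV files and reading them back in parallel with `dask.dataframe`; the file names and column layout are illustrative, not the test's actual data:

```python
# Hedged sketch of the CSV round trip the integration test performs;
# file names and columns here are illustrative placeholders.
import os
import tempfile

import pandas as pd
import dask.dataframe as dd

tmpdir = tempfile.mkdtemp()
for i in range(3):
    pd.DataFrame({"x": range(10), "y": range(10)}).to_csv(
        os.path.join(tmpdir, f"part-{i}.csv"), index=False
    )

# The glob pattern reads all parts into one Dask dataframe; with small
# files and the default blocksize, each file becomes one partition.
df = dd.read_csv(os.path.join(tmpdir, "part-*.csv"))
assert len(df.compute()) == 30
```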