./test_hdfs.sh
Initial integration testing with Dask has been Dockerized. To invoke the test, run the following command in the arrow root directory:
bash dev/dask_integration.sh
This script creates a dask directory at the same level as arrow, clones the Dask project from GitHub into it, and does a Python --user install. The Docker code uses the parent directory of arrow as $HOME, so Python installs dask into a .local directory there.
The output of the Docker session contains the results of the Dask dataframe tests, followed by the results of the single integration test that currently exists for Arrow. That test creates a set of CSV files and then reads them in parallel into a Dask dataframe. The code for this test resides in the dask_test directory.
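For orientation, the kind of test described above might look roughly like the following sketch. It is not the code in dask_test: the function names, file names, and column layout are hypothetical and chosen only for illustration. It assumes Dask is installed with its dataframe extras and relies on dask.dataframe.read_csv accepting a glob pattern.

```python
import csv
import os


def write_csv_files(directory, n_files=4, rows_per_file=100):
    """Write n_files small CSV files into directory and return their paths.

    The file names and the (id, value) columns are illustrative only.
    """
    os.makedirs(directory, exist_ok=True)
    paths = []
    for i in range(n_files):
        path = os.path.join(directory, "part-{:03d}.csv".format(i))
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["id", "value"])
            for j in range(rows_per_file):
                row_id = i * rows_per_file + j
                writer.writerow([row_id, row_id * 0.5])
        paths.append(path)
    return paths


def read_csv_files_parallel(directory):
    """Read every part-*.csv file in directory into one Dask dataframe."""
    # Imported here so that write_csv_files works without Dask installed.
    import dask.dataframe as dd

    # A glob pattern gives at least one partition per matching file,
    # which lets Dask parse the files in parallel.
    return dd.read_csv(os.path.join(directory, "part-*.csv"))
```

With Dask available, calling write_csv_files on a temporary directory and then read_csv_files_parallel(...).compute() on the same directory should return a pandas dataframe holding the rows from all files, which a test can compare against the expected row count.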