tree: 643b5d50eacc9e2f349d48a11be300ae863116d4 [path history] [tgz]
  1. benchmarks.env
  2. hooks.sh
  3. README.md
dev/conbench_envs/README.md

Benchmark Builds Env and Hooks

This directory contains:

  • benchmarks.env - list of env vars used for building Arrow C++/Python/R/Java/JavaScript and running benchmarks using conbench.
  • hooks.sh - hooks used by @ursabot benchmark builds that are triggered by @ursabot please benchmark PR comments.

How to add or update Arrow build and run env vars used by @ursabot benchmark builds

  1. Create apache/arrow PR
  2. Update or add env var value in benchmarks.env
  3. Add @ursabot please benchmark comment to PR
  4. Once benchmark builds are done, benchmark results can be viewed via compare/runs links in the PR comment where
  • baseline = PR base HEAD commit with unaltered /dev/conbench_envs/benchmarks.env
  • contender = PR branch HEAD commit with overridden /dev/conbench_envs/benchmarks.env

Why do@ursabot benchmark builds need hooks.sh?

@ursabot benchmark builds are maintained in Ursa‘s private repo. Benchmark builds use hooks.sh functions as hooks to create conda env with Arrow dependencies and build Arrow C++/Python/R/Java/JavaScript from source for a specific Arrow repo’s commit.

Defining hooks in Arrow repo allows benchmark builds for a specific commit to be compatible with the files/scripts in that commit which are used for installing Arrow dependencies and building Arrow. This allows Arrow contributors to asses the perfomance implications of different build options, dependency versions, etc by updating hooks.sh.

Can other repos and services use benchmarks.env and hooks.sh?

Yes, other repos and services are welcome to use benchmarks.env and hooks.sh as long as

  • existing hooks are not removed or renamed.
  • function definitions for exiting hooks can only be updated in the Arrow commit where Arrow build scripts or files with dependencies have been renamed, moved or added.
  • benchmark builds are run using @ursabot please benchmark PR comment to confirm that function definition updates do not break benchmark builds.

How can other repos and services use benchmarks.env and hooks.sh to setup benchmark env?

Here are steps how @ursabot benchmark builds use benchmarks.env and hooks.sh to setup benchmarking env on Ubuntu:

1. Install Arrow dependencies

sudo su
apt-get update -y -q && \
    apt-get install -y -q --no-install-recommends \
        autoconf \
        ca-certificates \
        ccache \
        cmake \
        g++ \
        gcc \
        gdb \
        git \
        libbenchmark-dev \
        libboost-filesystem-dev \
        libboost-regex-dev \
        libboost-system-dev \
        libbrotli-dev \
        libbz2-dev \
        libgflags-dev \
        libcurl4-openssl-dev \
        libgoogle-glog-dev \
        liblz4-dev \
        libprotobuf-dev \
        libprotoc-dev \
        libre2-dev \
        libsnappy-dev \
        libssl-dev \
        libthrift-dev \
        libutf8proc-dev \
        libzstd-dev \
        make \
        ninja-build \
        pkg-config \
        protobuf-compiler \
        rapidjson-dev \
        tzdata \
        wget && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists*

apt-get update -y -q && \
    apt-get install -y -q \
        python3 \
        python3-pip \
        python3-dev && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

2. Install Arrow dependencies for Java

sudo su
apt-get install openjdk-8-jdk
apt-get install maven

Verify that you have at least these versions of java, javac and maven:

# java -version
openjdk version "1.8.0_292"
..
# javac -version
javac 1.8.0_292
...
# mvn -version
Apache Maven 3.6.3
...

3. Install Arrow dependencies for Java Script

sudo apt update
sudo apt -y upgrade
sudo apt update
sudo apt -y install curl dirmngr apt-transport-https lsb-release ca-certificates
curl -fsSL https://deb.nodesource.com/setup_14.x | sudo -E bash -
sudo apt-get install -y nodejs
sudo apt -y install yarn
sudo apt -y install gcc g++ make

Verify that you have at least these versions of node and yarn:

# node --version
v14.17.2
...
# yarn --version
1.22.5
...

4. Install Conda

sudo apt install curl
curl -LO https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
sudo bash Miniconda3-latest-Linux-x86_64.sh

5. Set env vars:

export ARROW_REPO=https://github.com/apache/arrow.git
export BENCHMARKABLE=e6e9e6ea52b7a8f2682ffc4160168c936ca1d3e6
export BENCHMARKABLE_TYPE=arrow-commit
export PYTHON_VERSION=3.8
export CONBENCH_EMAIL=...
export CONBENCH_URL="https://conbench.ursa.dev"
export CONBENCH_PASSWORD=...
export MACHINE=...

6. Use create_conda_env_with_arrow_python hook to create conda env and build Arrow C++ and Arrow Python

git clone "${ARROW_REPO}"
pushd arrow
git fetch -v --prune -- origin "${BENCHMARKABLE}"
git checkout -f "${BENCHMARKABLE}"
source dev/conbench_envs/hooks.sh create_conda_env_with_arrow_python
popd

7. Install conbench

git clone https://github.com/ursacomputing/conbench.git
pushd conbench
pip install -r requirements-cli.txt
pip install -U PyYAML
python setup.py install
popd

8. Setup benchmarks repo

git clone https://github.com/ursacomputing/benchmarks.git
pushd benchmarks
python setup.py develop
popd

9. Setup conbench credentials

pushd benchmarks
touch .conbench
echo "url: $CONBENCH_URL" >> .conbench
echo "email: $CONBENCH_EMAIL" >> .conbench
echo "password: $CONBENCH_PASSWORD" >> .conbench
echo "host_name: $MACHINE" >> .conbench
popd

10. Run Python benchmarks

cd benchmarks
conbench file-read ALL --iterations=3 --all=true --drop-caches=true 

11. Use install_archery hook to setup archery and run C++ benchmarks

pushd arrow
source dev/conbench_envs/hooks.sh install_archery
popd
cd benchmarks
conbench cpp-micro --iterations=1

12. Use build_arrow_r hook to build Arrow R and run R benchmarks

pushd arrow
source dev/conbench_envs/hooks.sh build_arrow_r
popd
R -e "remotes::install_github('ursacomputing/arrowbench')"
cd benchmarks
conbench dataframe-to-table ALL --iterations=3 --drop-caches=true --language=R

13. Use build_arrow_java and install_archery hooks to build Arrow Java and run Java benchmarks

pushd arrow
source dev/conbench_envs/hooks.sh build_arrow_java
source dev/conbench_envs/hooks.sh install_archery
popd
cd benchmarks
conbench java-micro --iterations=1

14. Use install_java_script_project_dependencies hook to install Java Script dependencies and run Java Script benchmarks

pushd arrow
source dev/conbench_envs/hooks.sh install_java_script_project_dependencies
popd
cd benchmarks
conbench js-micro