<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->
# Arrow Developer Scripts
This directory contains scripts useful to developers when packaging,
testing, or committing to Arrow.
Merging a pull request requires being a committer on the project. In addition
you need to have linked your GitHub and ASF accounts on
https://gitbox.apache.org/setup/ to be able to push to GitHub as the main
remote.
NOTE: It may take some time (a few hours) after you complete the setup
at GitBox before your GitHub account is added as a committer.
## How to Merge a Pull Request
Please don't merge PRs using the GitHub Web interface. Instead, run
the following command:
```bash
dev/merge_arrow_pr.sh
```
This creates a new Python virtual environment under `dev/.venv[PY_VERSION]`
and installs all the necessary dependencies to run the Arrow merge script.
Once the dependencies are installed, it runs the merge script.
(We don't provide a wrapper script for Windows yet, so under Windows
you'll have to install Python dependencies yourself and then run
`dev/merge_arrow_pr.py` directly.)
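As a rough sketch of such a manual setup (the requirements file name below is an assumption, so check the `dev/` directory for the dependency list that `merge_arrow_pr.sh` actually installs):
```shell
# Sketch of a manual setup (e.g. on Windows). The requirements file name is
# an assumption; check dev/ for the exact dependency list.
python -m venv .venv
.venv\Scripts\activate            # on Linux/macOS: source .venv/bin/activate
pip install -r dev/requirements_merge_arrow_pr.txt
python dev/merge_arrow_pr.py
```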
The merge script requires tokens for access control. There are two options
for configuring your tokens: environment variables or a configuration file.
> Note: Arrow and Parquet require only a GitHub token.
#### Pass tokens via Environment Variables
The merge script uses the GitHub REST API. You must set a
`GH_TOKEN` environment variable to use a
[Personal Access Token](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token).
The Personal Access Token needs to include the `workflow` scope.
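For example, a minimal invocation with the token passed via the environment (the token value below is a placeholder):
```bash
# Export a GitHub Personal Access Token (with the `workflow` scope) so the
# merge script can call the GitHub REST API; the value is a placeholder.
export GH_TOKEN=ghp_XXXXXXXXXXXXXXXXXXXX
dev/merge_arrow_pr.sh
```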
#### Pass tokens via configuration file
```
cp ./merge.conf.sample ~/.config/arrow/merge.conf
```
Update your new `merge.conf` file with your Personal Access Tokens.
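As a rough, non-authoritative sketch of what the file looks like (the section and key names below are assumptions, so defer to the layout in `merge.conf.sample`):
```
# Illustrative sketch only -- the section and key names are assumptions;
# follow the layout in merge.conf.sample.
[github]
api_token=ghp_XXXXXXXXXXXXXXXXXXXX
```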
Example output:
```text
Which pull request would you like to merge? (e.g. 34):
```
Type the pull request number (from
https://github.com/apache/arrow/pulls) and hit enter:
```text
=== Pull Request #X ===
title GH-#Y: [Component] Title
source repo/branch
target master
url https://api.github.com/apache/arrow/pulls/X
=== GITHUB #Y ===
Summary [Component] Title
Assignee Name
Components Python
Status open
URL https://github.com/apache/arrow/issues/Y
Proceed with merging pull request #X? (y/n): y
```
If this looks good, type `y` and hit enter:
```text
Author 1: Name
Pull request #X merged!
Merge hash: #hash
Would you like to update the associated issue? (y/n): y
Enter fix version [11.0.0]:
```
You can just hit enter and the associated GitHub issue
will be resolved with the current fix version.
```text
Successfully resolved #Y!
=== GITHUB #Y ===
Summary [Component] Title
Assignee Name
Components Python
Status closed
URL https://github.com/apache/arrow/issues/Y
```
# Integration testing
Build the following base image used by multiple tests:
```shell
docker build -t arrow_integration_xenial_base -f docker_common/Dockerfile.xenial.base .
```
## HDFS C++ / Python support
```shell
docker compose build conda-cpp
docker compose build conda-python
docker compose build conda-python-hdfs
docker compose run --rm conda-python-hdfs
```
## Apache Spark Integration Tests
Tests can be run to ensure that the current snapshot of Java and Python Arrow
works with Spark. This will run a docker image to build Arrow C++
and Python in a Conda environment, build and install Arrow Java to the local
Maven repository, build Spark with the new Arrow artifact, and run Arrow
related unit tests in Spark for Java and Python. Any errors will exit with a
non-zero value. To run, use the following command:
```shell
docker compose build conda-cpp
docker compose build conda-python
docker compose build conda-python-spark
docker compose run --rm conda-python-spark
```
If you are already building Spark, these commands will map your local Maven
repo to the image and save time by not having to download all dependencies.
Be aware that Docker writes files as root, which can cause problems for Maven
on the host.
```shell
docker compose run --rm -v $HOME/.m2:/root/.m2 conda-python-spark
```
NOTE: If the Java API has breaking changes, a patched version of Spark might
need to be used to successfully build.