CONTRIBUTING.md - arrow-rs - Git at Google

 <!---
   Licensed to the Apache Software Foundation (ASF) under one
   or more contributor license agreements.  See the NOTICE file
   distributed with this work for additional information
   regarding copyright ownership.  The ASF licenses this file
   to you under the Apache License, Version 2.0 (the
   "License"); you may not use this file except in compliance
   with the License.  You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing,
   software distributed under the License is distributed on an
   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   KIND, either express or implied.  See the License for the
   specific language governing permissions and limitations
   under the License.
 -->

 ## Introduction

 We welcome and encourage contributions of all kinds, such as:

 1. Tickets with issue reports of feature requests
 2. Documentation improvements
 3. Code (PR or PR Review)

 In addition to submitting new PRs, we have a healthy tradition of community members helping review each other's PRs. Doing so is a great way to help the community as well as get more familiar with Rust and the relevant codebases.

 ## Developer's guide to Arrow Rust

 ### Setting Up Your Build Environment

 Install the Rust tool chain:

 https://www.rust-lang.org/tools/install

 Also, make sure your Rust tool chain is up-to-date, because we always use the latest stable version of Rust to test this project.

 ```bash
 rustup update stable
 ```

 ### How to compile

 This is a standard cargo project with workspaces. To build it, you need to have `rust` and `cargo`:

 ```bash
 cargo build
 ```

 You can also use rust's official docker image:

 ```bash
 docker run --rm -v $(pwd):/arrow-rs -it rust /bin/bash -c "cd /arrow-rs && rustup component add rustfmt && cargo build"
 ```

 The command above assumes that are in the root directory of the project, not in the same
 directory as this README.md.

 You can also compile specific workspaces:

 ```bash
 cd arrow && cargo build
 ```

 ### Git Submodules

 Before running tests and examples, it is necessary to set up the local development environment.

 The tests rely on test data that is contained in git submodules.

 To pull down this data run the following:

 ```bash
 git submodule update --init
 ```

 This populates data in two git submodules:

 - `../parquet-testing/data` (sourced from https://github.com/apache/parquet-testing.git)
 - `../testing` (sourced from https://github.com/apache/arrow-testing)

 By default, `cargo test` will look for these directories at their
 standard location. The following environment variables can be used to override the location:

 ```bash
 # Optionally specify a different location for test data
 export PARQUET_TEST_DATA=$(cd ../parquet-testing/data; pwd)
 export ARROW_TEST_DATA=$(cd ../testing/data; pwd)
 ```

 From here on, this is a pure Rust project and `cargo` can be used to run tests, benchmarks, docs and examples as usual.

 ## Running the tests

 Run tests using the Rust standard `cargo test` command:

 ```bash
 # run all unit and integration tests
 cargo test

 # run tests for the arrow crate
 cargo test -p arrow
 ```

 For some changes, you may want to run additional tests. You can find up-to-date information on the current CI tests in [.github/workflows](https://github.com/apache/arrow-rs/tree/master/.github/workflows). Here are some examples of additional tests you may want to run:

 ```bash
 # run tests for the parquet crate
 cargo test -p parquet

 # run arrow tests with all features enabled
 cargo test -p arrow --all-features

 # run the doc tests
 cargo test --doc
 ```

 ## Code Formatting

 Our CI uses `rustfmt` to check code formatting. Before submitting a
 PR be sure to run the following and check for lint issues:

 ```bash
 cargo +stable fmt --all -- --check
 ```

 ## Breaking Changes

 Our [release schedule] allows breaking API changes only in major releases.
 This means that if your PR has a breaking API change, it should be marked as
 `api-change` and it will not be merged until development opens for the next
 major release. See [this ticket] for details.

 [release schedule]: README.md#release-versioning-and-schedule
 [this ticket]: https://github.com/apache/arrow-rs/issues/5907

 ## Clippy Lints

 We use `clippy` for checking lints during development, and CI runs `clippy` checks.

 Run the following to check for `clippy` lints:

 ```bash
 # run clippy with default settings
 cargo clippy --workspace --all-targets --all-features -- -D warnings

 ```

 If you use Visual Studio Code with the `rust-analyzer` plugin, you can enable `clippy` to run each time you save a file. See https://users.rust-lang.org/t/how-to-use-clippy-in-vs-code-with-rust-analyzer/41881.

 One of the concerns with `clippy` is that it often produces a lot of false positives, or that some recommendations may hurt readability. We do not have a policy of which lints are ignored, but if you disagree with a `clippy` lint, you may disable the lint and briefly justify it.

 Search for `allow(clippy::` in the codebase to identify lints that are ignored/allowed. We currently prefer ignoring lints on the lowest unit possible.

 - If you are introducing a line that returns a lint warning or error, you may disable the lint on that line.
 - If you have several lints on a function or module, you may disable the lint on the function or module.
 - If a lint is pervasive across multiple modules, you may disable it at the crate level.

 ## Running Benchmarks

 Running benchmarks are a good way to test the performance of a change. As benchmarks usually take a long time to run, we recommend running targeted tests instead of the full suite.

 ```bash
 # run all benchmarks
 cargo bench

 # run arrow benchmarks
 cargo bench -p arrow

 # run benchmark for the parse_time function within the arrow-cast crate
 cargo bench -p arrow-cast --bench parse_time
 ```

 To set the baseline for your benchmarks, use the --save-baseline flag:

 ```bash
 git checkout master

 cargo bench --bench parse_time -- --save-baseline master

 git checkout feature

 cargo bench --bench parse_time -- --baseline master
 ```

 ## Git Pre-Commit Hook

 We can use [git pre-commit hook](https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks) to automate various kinds of git pre-commit checking/formatting.

 Suppose you are in the root directory of the project.

 First check if the file already exists:

 ```bash
 ls -l .git/hooks/pre-commit
 ```

 If the file already exists, to avoid mistakenly **overriding**, you MAY have to check
 the link source or file content. Else if not exist, let's safely soft link [pre-commit.sh](pre-commit.sh) as file `.git/hooks/pre-commit`:

 ```bash
 ln -s  ../../pre-commit.sh .git/hooks/pre-commit
 ```

 If sometimes you want to commit without checking, just run `git commit` with `--no-verify`:

 ```bash
 git commit --no-verify -m "... commit message ..."
 ```
	<!---
	Licensed to the Apache Software Foundation (ASF) under one
	or more contributor license agreements. See the NOTICE file
	distributed with this work for additional information
	regarding copyright ownership. The ASF licenses this file
	to you under the Apache License, Version 2.0 (the
	"License"); you may not use this file except in compliance
	with the License. You may obtain a copy of the License at

	http://www.apache.org/licenses/LICENSE-2.0

	Unless required by applicable law or agreed to in writing,
	software distributed under the License is distributed on an
	"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
	KIND, either express or implied. See the License for the
	specific language governing permissions and limitations
	under the License.
	-->

	## Introduction

	We welcome and encourage contributions of all kinds, such as:

	1. Tickets with issue reports of feature requests
	2. Documentation improvements
	3. Code (PR or PR Review)

	In addition to submitting new PRs, we have a healthy tradition of community members helping review each other's PRs. Doing so is a great way to help the community as well as get more familiar with Rust and the relevant codebases.

	## Developer's guide to Arrow Rust

	### Setting Up Your Build Environment

	Install the Rust tool chain:

	https://www.rust-lang.org/tools/install

	Also, make sure your Rust tool chain is up-to-date, because we always use the latest stable version of Rust to test this project.

	```bash
	rustup update stable
	```

	### How to compile

	This is a standard cargo project with workspaces. To build it, you need to have `rust` and `cargo`:

	```bash
	cargo build
	```

	You can also use rust's official docker image:

	```bash
	docker run --rm -v $(pwd):/arrow-rs -it rust /bin/bash -c "cd /arrow-rs && rustup component add rustfmt && cargo build"
	```

	The command above assumes that are in the root directory of the project, not in the same
	directory as this README.md.

	You can also compile specific workspaces:

	```bash
	cd arrow && cargo build
	```

	### Git Submodules

	Before running tests and examples, it is necessary to set up the local development environment.

	The tests rely on test data that is contained in git submodules.

	To pull down this data run the following:

	```bash
	git submodule update --init
	```

	This populates data in two git submodules:

	- `../parquet-testing/data` (sourced from https://github.com/apache/parquet-testing.git)
	- `../testing` (sourced from https://github.com/apache/arrow-testing)

	By default, `cargo test` will look for these directories at their
	standard location. The following environment variables can be used to override the location:

	```bash
	# Optionally specify a different location for test data
	export PARQUET_TEST_DATA=$(cd ../parquet-testing/data; pwd)
	export ARROW_TEST_DATA=$(cd ../testing/data; pwd)
	```

	From here on, this is a pure Rust project and `cargo` can be used to run tests, benchmarks, docs and examples as usual.

	## Running the tests

	Run tests using the Rust standard `cargo test` command:

	```bash
	# run all unit and integration tests
	cargo test

	# run tests for the arrow crate
	cargo test -p arrow
	```

	For some changes, you may want to run additional tests. You can find up-to-date information on the current CI tests in [.github/workflows](https://github.com/apache/arrow-rs/tree/master/.github/workflows). Here are some examples of additional tests you may want to run:

	```bash
	# run tests for the parquet crate
	cargo test -p parquet

	# run arrow tests with all features enabled
	cargo test -p arrow --all-features

	# run the doc tests
	cargo test --doc
	```

	## Code Formatting

	Our CI uses `rustfmt` to check code formatting. Before submitting a
	PR be sure to run the following and check for lint issues:

	```bash
	cargo +stable fmt --all -- --check
	```

	## Breaking Changes

	Our [release schedule] allows breaking API changes only in major releases.
	This means that if your PR has a breaking API change, it should be marked as
	`api-change` and it will not be merged until development opens for the next
	major release. See [this ticket] for details.

	[release schedule]: README.md#release-versioning-and-schedule
	[this ticket]: https://github.com/apache/arrow-rs/issues/5907

	## Clippy Lints

	We use `clippy` for checking lints during development, and CI runs `clippy` checks.

	Run the following to check for `clippy` lints:

	```bash
	# run clippy with default settings
	cargo clippy --workspace --all-targets --all-features -- -D warnings

	```

	If you use Visual Studio Code with the `rust-analyzer` plugin, you can enable `clippy` to run each time you save a file. See https://users.rust-lang.org/t/how-to-use-clippy-in-vs-code-with-rust-analyzer/41881.

	One of the concerns with `clippy` is that it often produces a lot of false positives, or that some recommendations may hurt readability. We do not have a policy of which lints are ignored, but if you disagree with a `clippy` lint, you may disable the lint and briefly justify it.

	Search for `allow(clippy::` in the codebase to identify lints that are ignored/allowed. We currently prefer ignoring lints on the lowest unit possible.

	- If you are introducing a line that returns a lint warning or error, you may disable the lint on that line.
	- If you have several lints on a function or module, you may disable the lint on the function or module.
	- If a lint is pervasive across multiple modules, you may disable it at the crate level.

	## Running Benchmarks

	Running benchmarks are a good way to test the performance of a change. As benchmarks usually take a long time to run, we recommend running targeted tests instead of the full suite.

	```bash
	# run all benchmarks
	cargo bench

	# run arrow benchmarks
	cargo bench -p arrow

	# run benchmark for the parse_time function within the arrow-cast crate
	cargo bench -p arrow-cast --bench parse_time
	```

	To set the baseline for your benchmarks, use the --save-baseline flag:

	```bash
	git checkout master

	cargo bench --bench parse_time -- --save-baseline master

	git checkout feature

	cargo bench --bench parse_time -- --baseline master
	```

	## Git Pre-Commit Hook

	We can use [git pre-commit hook](https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks) to automate various kinds of git pre-commit checking/formatting.

	Suppose you are in the root directory of the project.

	First check if the file already exists:

	```bash
	ls -l .git/hooks/pre-commit
	```

	If the file already exists, to avoid mistakenly overriding, you MAY have to check
	the link source or file content. Else if not exist, let's safely soft link [pre-commit.sh](pre-commit.sh) as file `.git/hooks/pre-commit`:

	```bash
	ln -s ../../pre-commit.sh .git/hooks/pre-commit
	```

	If sometimes you want to commit without checking, just run `git commit` with `--no-verify`:

	```bash
	git commit --no-verify -m "... commit message ..."
	```