AI_GUIDELINE.md - cloudberry - Git at Google

 <!--
   Licensed to the Apache Software Foundation (ASF) under one
   or more contributor license agreements.  See the NOTICE file
   distributed with this work for additional information
   regarding copyright ownership.  The ASF licenses this file
   to you under the Apache License, Version 2.0 (the
   "License"); you may not use this file except in compliance
   with the License.  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing,
   software distributed under the License is distributed on an
   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   KIND, either express or implied.  See the License for the
   specific language governing permissions and limitations
   under the License.
 -->

 # Guidelines for AI-assisted Contributions

 Apache Cloudberry follows the ASF Generative Tooling Guidance
 for the use of AI-assisted development tools:

 https://www.apache.org/legal/generative-tooling.html

 This document provides additional project-specific guidance and
 best practices for using AI tools in the Cloudberry community.
 It is intended to supplement ASF guidance, not replace it.

 ## Guidelines

 ### 1. You own the code

 AI-generated code carries the same responsibility as code you
 type yourself. Review it before submitting. If a bug ships,
 "the AI wrote it" is not a defense.

 **Example:** As an experiment, you used an LLM to generate a
 new type of executor node. The results were impressive, and you
 wanted to share them with the community. Before opening a PR,
 read every line, verify the logic, and make sure it fits with
 existing code patterns. Someone might use your code in
 production, not just for experiments.

 ### 2. Same quality bar

 AI-assisted contributions must pass the same review, testing,
 and CI standards as any other code. No shortcuts. AI-generated
 code must come with corresponding tests, or be covered by
 existing ones. If the AI wrote the code, you should at least
 write or carefully verify the tests.

 **Example:** You use an LLM to implement a new aggregate
 function. The PR must include regression tests in `src/test`
 that exercise both normal and edge cases.

 ### 3. Watch the license

 Don't let AI introduce code incompatible with the Apache
 License 2.0. You are responsible for ensuring all submitted
 code — AI-generated or not — has proper licensing.

 See [ASF Generative Tooling Guidance](https://www.apache.org/legal/generative-tooling.html)
 for details.

 **Example:** If an AI tool reproduces a snippet from a
 GPL-licensed project, you must not include it. When in doubt,
 rewrite from scratch.

 ### 4. Flag it

 When your PR includes significant AI-generated code, check the
 AI disclosure box in the PR template. You don't have to
 disclose minor AI assistance (autocomplete, reformatting), but
 be transparent about substantial generation.

 You can also record AI assistance directly in the commit
 message using the optional `Assisted-by:` trailer (one line
 per tool), following the same convention used by the Linux
 kernel. See the
 [Linux kernel coding assistants guidance](https://docs.kernel.org/process/coding-assistants.html)
 for background.

 ```
 Assisted-by: ChatGPT
 Assisted-by: GitHub Copilot
 ```

 This trailer is optional and non-binding — it provides
 lightweight provenance information without making AI
 disclosure a strict requirement.

 **Example:** Using an LLM to autocomplete a single function
 signature — no need for a flag. Using an LLM to generate an
 entire new GUC parameter with validation logic — flag it and
 add an `Assisted-by:` trailer. The flag doesn't mean the PR
 skips review or merge criteria, but it gives reviewers more
 context about the generation method and lets them focus on
 architecture and logic rather than specific operators.

 ### 5. No meaningless code refactoring

 Our core is PostgreSQL, and refactoring work has already been
 done here. Rewriting code significantly complicates rebase.
 Also, refactoring changes the code in a way that forces people
 to relearn code they already know. Keep changes as simple as
 possible.

 **Example:** LLMs are eager to refactor. One day you may be
 asked: "This code is not very good. Do you want to improve
 it?" Of course! It could happen several times. Tokens are
 spent, but what is the point of such refactoring?
 (Rhetorical question)

 ### 6. LLM code review

 Some AI review tools (such as GitHub Copilot Review or
 CodeRabbit) may not currently be available for ASF-hosted
 repositories due to operational, budgetary, or permission
 reasons. Contributors can still use personal AI tools locally
 but are responsible for ensuring code quality, compliance with
 licensing terms, and reviewing outcomes.

 **Example:** One could use GitHub Copilot for automated AI
 code review on pull requests. Here are some important points:

 - Copilot suggestions are **non-binding hints**, not
   requirements.
 - If a suggestion is irrelevant or wrong, skip it — you know
   your code best.
 - If a suggestion catches a real issue, fix it like you would
   for any review comment.
 - Copilot does not replace human reviewers. All PRs still need
   approval from a committer.

 ### 7. Talk to maintainers yourself

 Review discussions should reflect the contributor's own
 understanding and technical judgment. AI tools may assist with
 drafting responses, but contributors should engage
 thoughtfully and personally with reviewers. Maintainers invest
 time reviewing your code; respond in kind.

 **Example:** A reviewer asks "why did you choose this approach
 over X?" — write your own answer explaining the tradeoff,
 don't paste an LLM-generated reply.

 ## AGENTS.md

 [AGENTS.md](https://agents.md/) is a README for agents: a
 dedicated, predictable place to provide context and
 instructions to help AI coding agents work on your project.
 We do not ship a repository-level `AGENTS.md` because the
 right content is platform- and user-specific. If you work with
 AI coding agents locally, create your own `AGENTS.md`. You could
 take the template from the `AGENTS.md.template` file.

 ## Good uses of AI

 - Bug fixing and root cause analysis
 - Code review
 - Writing and improving tests
 - Documentation and code comments
 - Build system and CI improvements
 - Security research and vulnerability scanning
 - Learning the codebase faster

 ## Resources

 - [ASF Generative Tooling Guidance](https://www.apache.org/legal/generative-tooling.html)
   — Official Apache guidance on AI tool usage
 - [GitHub Copilot](https://github.com/features/copilot)
   — AI pair programmer and code reviewer
 - [CodeRabbit](https://www.coderabbit.ai/)
   — Yet another AI pair programmer and code reviewer
 - [AGENTS.md](https://agents.md/)
   — README for agents
	<!--
	Licensed to the Apache Software Foundation (ASF) under one
	or more contributor license agreements. See the NOTICE file
	distributed with this work for additional information
	regarding copyright ownership. The ASF licenses this file
	to you under the Apache License, Version 2.0 (the
	"License"); you may not use this file except in compliance
	with the License. You may obtain a copy of the License at

	http://www.apache.org/licenses/LICENSE-2.0

	Unless required by applicable law or agreed to in writing,
	software distributed under the License is distributed on an
	"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
	KIND, either express or implied. See the License for the
	specific language governing permissions and limitations
	under the License.
	-->

	# Guidelines for AI-assisted Contributions

	Apache Cloudberry follows the ASF Generative Tooling Guidance
	for the use of AI-assisted development tools:

	https://www.apache.org/legal/generative-tooling.html

	This document provides additional project-specific guidance and
	best practices for using AI tools in the Cloudberry community.
	It is intended to supplement ASF guidance, not replace it.

	## Guidelines

	### 1. You own the code

	AI-generated code carries the same responsibility as code you
	type yourself. Review it before submitting. If a bug ships,
	"the AI wrote it" is not a defense.

	Example: As an experiment, you used an LLM to generate a
	new type of executor node. The results were impressive, and you
	wanted to share them with the community. Before opening a PR,
	read every line, verify the logic, and make sure it fits with
	existing code patterns. Someone might use your code in
	production, not just for experiments.

	### 2. Same quality bar

	AI-assisted contributions must pass the same review, testing,
	and CI standards as any other code. No shortcuts. AI-generated
	code must come with corresponding tests, or be covered by
	existing ones. If the AI wrote the code, you should at least
	write or carefully verify the tests.

	Example: You use an LLM to implement a new aggregate
	function. The PR must include regression tests in `src/test`
	that exercise both normal and edge cases.

	### 3. Watch the license

	Don't let AI introduce code incompatible with the Apache
	License 2.0. You are responsible for ensuring all submitted
	code — AI-generated or not — has proper licensing.

	See [ASF Generative Tooling Guidance](https://www.apache.org/legal/generative-tooling.html)
	for details.

	Example: If an AI tool reproduces a snippet from a
	GPL-licensed project, you must not include it. When in doubt,
	rewrite from scratch.

	### 4. Flag it

	When your PR includes significant AI-generated code, check the
	AI disclosure box in the PR template. You don't have to
	disclose minor AI assistance (autocomplete, reformatting), but
	be transparent about substantial generation.

	You can also record AI assistance directly in the commit
	message using the optional `Assisted-by:` trailer (one line
	per tool), following the same convention used by the Linux
	kernel. See the
	[Linux kernel coding assistants guidance](https://docs.kernel.org/process/coding-assistants.html)
	for background.

	```
	Assisted-by: ChatGPT
	Assisted-by: GitHub Copilot
	```

	This trailer is optional and non-binding — it provides
	lightweight provenance information without making AI
	disclosure a strict requirement.

	Example: Using an LLM to autocomplete a single function
	signature — no need for a flag. Using an LLM to generate an
	entire new GUC parameter with validation logic — flag it and
	add an `Assisted-by:` trailer. The flag doesn't mean the PR
	skips review or merge criteria, but it gives reviewers more
	context about the generation method and lets them focus on
	architecture and logic rather than specific operators.

	### 5. No meaningless code refactoring

	Our core is PostgreSQL, and refactoring work has already been
	done here. Rewriting code significantly complicates rebase.
	Also, refactoring changes the code in a way that forces people
	to relearn code they already know. Keep changes as simple as
	possible.

	Example: LLMs are eager to refactor. One day you may be
	asked: "This code is not very good. Do you want to improve
	it?" Of course! It could happen several times. Tokens are
	spent, but what is the point of such refactoring?
	(Rhetorical question)

	### 6. LLM code review

	Some AI review tools (such as GitHub Copilot Review or
	CodeRabbit) may not currently be available for ASF-hosted
	repositories due to operational, budgetary, or permission
	reasons. Contributors can still use personal AI tools locally
	but are responsible for ensuring code quality, compliance with
	licensing terms, and reviewing outcomes.

	Example: One could use GitHub Copilot for automated AI
	code review on pull requests. Here are some important points:

	- Copilot suggestions are non-binding hints, not
	requirements.
	- If a suggestion is irrelevant or wrong, skip it — you know
	your code best.
	- If a suggestion catches a real issue, fix it like you would
	for any review comment.
	- Copilot does not replace human reviewers. All PRs still need
	approval from a committer.

	### 7. Talk to maintainers yourself

	Review discussions should reflect the contributor's own
	understanding and technical judgment. AI tools may assist with
	drafting responses, but contributors should engage
	thoughtfully and personally with reviewers. Maintainers invest
	time reviewing your code; respond in kind.

	Example: A reviewer asks "why did you choose this approach
	over X?" — write your own answer explaining the tradeoff,
	don't paste an LLM-generated reply.

	## AGENTS.md

	[AGENTS.md](https://agents.md/) is a README for agents: a
	dedicated, predictable place to provide context and
	instructions to help AI coding agents work on your project.
	We do not ship a repository-level `AGENTS.md` because the
	right content is platform- and user-specific. If you work with
	AI coding agents locally, create your own `AGENTS.md`. You could
	take the template from the `AGENTS.md.template` file.

	## Good uses of AI

	- Bug fixing and root cause analysis
	- Code review
	- Writing and improving tests
	- Documentation and code comments
	- Build system and CI improvements
	- Security research and vulnerability scanning
	- Learning the codebase faster

	## Resources

	- [ASF Generative Tooling Guidance](https://www.apache.org/legal/generative-tooling.html)
	— Official Apache guidance on AI tool usage
	- [GitHub Copilot](https://github.com/features/copilot)
	— AI pair programmer and code reviewer
	- [CodeRabbit](https://www.coderabbit.ai/)
	— Yet another AI pair programmer and code reviewer
	- [AGENTS.md](https://agents.md/)
	— README for agents