blob: 6a8c684c04d3f079a4901e32beb32227dc0da43c [file] [view]
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
# Guidelines for AI-assisted Contributions
Apache Cloudberry follows the ASF Generative Tooling Guidance
for the use of AI-assisted development tools:
https://www.apache.org/legal/generative-tooling.html
This document provides additional project-specific guidance and
best practices for using AI tools in the Cloudberry community.
It is intended to supplement ASF guidance, not replace it.
## Guidelines
### 1. You own the code
AI-generated code carries the same responsibility as code you
type yourself. Review it before submitting. If a bug ships,
"the AI wrote it" is not a defense.
**Example:** As an experiment, you used an LLM to generate a
new type of executor node. The results were impressive, and you
wanted to share them with the community. Before opening a PR,
read every line, verify the logic, and make sure it fits with
existing code patterns. Someone might use your code in
production, not just for experiments.
### 2. Same quality bar
AI-assisted contributions must pass the same review, testing,
and CI standards as any other code. No shortcuts. AI-generated
code must come with corresponding tests, or be covered by
existing ones. If the AI wrote the code, you should at least
write or carefully verify the tests.
**Example:** You use an LLM to implement a new aggregate
function. The PR must include regression tests in `src/test`
that exercise both normal and edge cases.
### 3. Watch the license
Don't let AI introduce code incompatible with the Apache
License 2.0. You are responsible for ensuring all submitted
code — AI-generated or not — has proper licensing.
See [ASF Generative Tooling Guidance](https://www.apache.org/legal/generative-tooling.html)
for details.
**Example:** If an AI tool reproduces a snippet from a
GPL-licensed project, you must not include it. When in doubt,
rewrite from scratch.
### 4. Flag it
When your PR includes significant AI-generated code, check the
AI disclosure box in the PR template. You don't have to
disclose minor AI assistance (autocomplete, reformatting), but
be transparent about substantial generation.
You can also record AI assistance directly in the commit
message using the optional `Assisted-by:` trailer (one line
per tool), following the same convention used by the Linux
kernel. See the
[Linux kernel coding assistants guidance](https://docs.kernel.org/process/coding-assistants.html)
for background.
```
Assisted-by: ChatGPT
Assisted-by: GitHub Copilot
```
This trailer is optional and non-binding — it provides
lightweight provenance information without making AI
disclosure a strict requirement.
**Example:** Using an LLM to autocomplete a single function
signature — no need for a flag. Using an LLM to generate an
entire new GUC parameter with validation logic — flag it and
add an `Assisted-by:` trailer. The flag doesn't mean the PR
skips review or merge criteria, but it gives reviewers more
context about the generation method and lets them focus on
architecture and logic rather than specific operators.
### 5. No meaningless code refactoring
Our core is PostgreSQL, and refactoring work has already been
done here. Rewriting code significantly complicates rebase.
Also, refactoring changes the code in a way that forces people
to relearn code they already know. Keep changes as simple as
possible.
**Example:** LLMs are eager to refactor. One day you may be
asked: "This code is not very good. Do you want to improve
it?" Of course! It could happen several times. Tokens are
spent, but what is the point of such refactoring?
(Rhetorical question)
### 6. LLM code review
Some AI review tools (such as GitHub Copilot Review or
CodeRabbit) may not currently be available for ASF-hosted
repositories due to operational, budgetary, or permission
reasons. Contributors can still use personal AI tools locally
but are responsible for ensuring code quality, compliance with
licensing terms, and reviewing outcomes.
**Example:** One could use GitHub Copilot for automated AI
code review on pull requests. Here are some important points:
- Copilot suggestions are **non-binding hints**, not
requirements.
- If a suggestion is irrelevant or wrong, skip it — you know
your code best.
- If a suggestion catches a real issue, fix it like you would
for any review comment.
- Copilot does not replace human reviewers. All PRs still need
approval from a committer.
### 7. Talk to maintainers yourself
Review discussions should reflect the contributor's own
understanding and technical judgment. AI tools may assist with
drafting responses, but contributors should engage
thoughtfully and personally with reviewers. Maintainers invest
time reviewing your code; respond in kind.
**Example:** A reviewer asks "why did you choose this approach
over X?" — write your own answer explaining the tradeoff,
don't paste an LLM-generated reply.
## AGENTS.md
[AGENTS.md](https://agents.md/) is a README for agents: a
dedicated, predictable place to provide context and
instructions to help AI coding agents work on your project.
We do not ship a repository-level `AGENTS.md` because the
right content is platform- and user-specific. If you work with
AI coding agents locally, create your own `AGENTS.md`. You could
take the template from the `AGENTS.md.template` file.
## Good uses of AI
- Bug fixing and root cause analysis
- Code review
- Writing and improving tests
- Documentation and code comments
- Build system and CI improvements
- Security research and vulnerability scanning
- Learning the codebase faster
## Resources
- [ASF Generative Tooling Guidance](https://www.apache.org/legal/generative-tooling.html)
— Official Apache guidance on AI tool usage
- [GitHub Copilot](https://github.com/features/copilot)
— AI pair programmer and code reviewer
- [CodeRabbit](https://www.coderabbit.ai/)
— Yet another AI pair programmer and code reviewer
- [AGENTS.md](https://agents.md/)
— README for agents