<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
# Apache Groovy — Repository Architecture
A contributor-facing map of how the Groovy compiler and runtime are
organised in this repository. This document is for people working *on*
Groovy. For documentation aimed at people *using* Groovy, see
<https://groovy.apache.org/> and the AsciiDoc sources under
`src/spec/doc/` and `subprojects/<module>/src/spec/doc/`.
This is an overview, not a reference. It exists to give a new
contributor — human or AI — enough orientation to read the code
productively and to avoid a small set of common mis-steps. Code is the
source of truth; this document is a pointer file.
## Repository layout (top level)
| Path | What lives there |
|---|---|
| `src/main/java/org/codehaus/groovy/` | Core compiler and runtime (legacy package — most of the codebase) |
| `src/main/java/org/apache/groovy/` | Newer code added under the `org.apache.groovy.*` package convention |
| `src/main/java/groovy/` | User-facing API (`groovy.lang.*`, `groovy.util.*`, etc.) |
| `src/main/groovy/` | Groovy sources compiled into the core jar |
| `src/main/resources/` | Service files, META-INF, default scripts |
| `src/antlr/` | ANTLR4 grammar (`GroovyLexer.g4`, `GroovyParser.g4`) — see "Generated code" below |
| `src/spec/doc/` | User-facing AsciiDoc reference docs |
| `src/spec/test/` | Executable Groovy snippets `include::`'d by the AsciiDoc sources |
| `src/test/` | JUnit / Spock tests for the core jar |
| `subprojects/` | ~50 modular subprojects (groovy-json, groovy-sql, groovy-xml, groovy-typecheckers, parser-antlr4 wiring, etc.) |
| `subprojects/groovy-binary/` | Aggregator that produces the final distribution and the published spec |
| `subprojects/binary-compatibility/` | Enforces public-API binary compatibility across releases |
| `subprojects/tests-preview/` | Tests that depend on preview JDK features |
| `bootstrap/`, `buildSrc/`, `build-logic/` | Build infrastructure (Gradle convention plugins, bootstrap helpers) |
When in doubt, prefer adding new code under `org.apache.groovy.*`; the
older `org.codehaus.groovy.*` packages remain for legacy reasons and
are kept stable for compatibility.
## Compilation pipeline
The driver is `org.codehaus.groovy.control.CompilationUnit`. A
`SourceUnit` represents a single source file inside it. Compilation
proceeds in numbered phases declared in
[`Phases.java`](src/main/java/org/codehaus/groovy/control/Phases.java)
and exposed as the
[`CompilePhase`](src/main/java/org/codehaus/groovy/control/CompilePhase.java)
enum that AST transformations and customizers attach to:
| # | Phase | What happens | Driver classes |
|---|---|---|---|
| 1 | `INITIALIZATION` | Source files opened, `CompilationUnit` configured, customizers registered for the phases they declare | `CompilationUnit`, `CompilerConfiguration` |
| 2 | `PARSING` | ANTLR4 lexer + parser produce a CST (parse tree) | `Antlr4ParserPlugin`, `GroovyLangLexer`, `GroovyLangParser` |
| 3 | `CONVERSION` | CST → AST (`ModuleNode` / `ClassNode` / `MethodNode` / ...) | `AstBuilder` |
| 4 | `SEMANTIC_ANALYSIS` | Class resolution, import handling, validity checks the grammar can't catch | `ResolveVisitor`, `StaticImportVisitor`, `AnnotationConstantsVisitor` |
| 5 | `CANONICALIZATION` | Fill in the AST: synthesised members, generic types, most local AST transforms run here | `ASTTransformationVisitor`, `GenericsVisitor` |
| 6 | `INSTRUCTION_SELECTION` | Optimisations and instruction-set selection; `@CompileStatic` / `@TypeChecked` run here | `OptimizerVisitor`, `StaticTypeCheckingVisitor` |
| 7 | `CLASS_GENERATION` | AST → bytecode in memory | `AsmClassGenerator`, `Verifier`, classes under `classgen/asm/` |
| 8 | `OUTPUT` | Write generated `.class` files | `CompilationUnit` output stage |
| 9 | `FINALIZATION` | Cleanup, `Janitor` callbacks | `CompilationUnit`, `Janitor` |
Each phase iterates over all `SourceUnit`s before the next phase
begins. AST transformations declare which phase they run in; the
canonical question to ask before adding one is *"what state must the
AST be in for this transform to make sense?"* — pick the earliest phase
where that holds.
The phase enum is the right anchor for any documentation that talks
about "when X happens during compilation". Quoting the phase names
verbatim keeps the reference precise; paraphrasing tends to drift.
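
The phase-major ordering described above (every `SourceUnit` finishes a phase before any of them starts the next) can be sketched in plain Java. This is an illustration only, not the `CompilationUnit` implementation: `SketchPhase` and `PhaseOrderSketch` are hypothetical names, the source file names are made up, and only the phase names and numbers are taken from the table.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the real phase constants; names and numbers
// are copied from the table above, nothing else is taken from Groovy.
enum SketchPhase {
    INITIALIZATION(1), PARSING(2), CONVERSION(3), SEMANTIC_ANALYSIS(4),
    CANONICALIZATION(5), INSTRUCTION_SELECTION(6), CLASS_GENERATION(7),
    OUTPUT(8), FINALIZATION(9);

    final int number;

    SketchPhase(int number) {
        this.number = number;
    }
}

class PhaseOrderSketch {
    // Runs every phase up to and including `last` over all sources,
    // completing one phase for every source before starting the next phase.
    static List<String> run(List<String> sources, SketchPhase last) {
        List<String> trace = new ArrayList<>();
        for (SketchPhase phase : SketchPhase.values()) {
            if (phase.number > last.number) {
                break;
            }
            for (String source : sources) {
                trace.add(phase + ":" + source);
            }
        }
        return trace;
    }

    public static void main(String[] args) {
        // Two made-up source names, taken through CONVERSION only.
        run(List.of("A.groovy", "B.groovy"), SketchPhase.CONVERSION)
            .forEach(System.out::println);
    }
}
```

In the trace, `PARSING:B.groovy` always appears before `CONVERSION:A.groovy`. That is why a transform attached to a given phase can assume every source in the unit has already completed the earlier phases.
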
### Parser (phase 2)
- Grammar lives in `src/antlr/GroovyLexer.g4` and
`src/antlr/GroovyParser.g4`. The parser is regenerated
from these sources on every build, so changes belong in the `.g4`
files.
- The ANTLR Gradle plugin generates `GroovyLexer`, `GroovyParser`,
`GroovyParserVisitor`, and `GroovyParserBaseVisitor` into
`build/generated/sources/antlr4/org/apache/groovy/parser/antlr4/`.
- Hand-written code that wires the parser into `CompilationUnit` lives
in `src/main/java/org/apache/groovy/parser/antlr4/`
(`Antlr4PluginFactory`, `Antlr4ParserPlugin`, `GroovyLangLexer`,
`GroovyLangParser`, `AstBuilder`, plus support classes:
`ModifierManager`, `GroovydocManager`, `SemanticPredicates`,
`PositionInfo`).
- `AstBuilder` is the hand-off from CST to AST. It is large; almost
every parser-visible language change touches it.
### AST (phase 3 onward)
- Root: `org.codehaus.groovy.ast.ASTNode`.
- Sub-packages:
  - `org.codehaus.groovy.ast.expr`: expression nodes
    (`BinaryExpression`, `MethodCallExpression`, ...)
  - `org.codehaus.groovy.ast.stmt`: statement nodes
    (`BlockStatement`, `ForStatement`, ...)
  - `org.codehaus.groovy.ast.tools`: helpers (`GeneralUtils` is the
    common one; prefer its factory methods over hand-built nodes)
- Top-level structural nodes: `ModuleNode` (one per source file) →
`ClassNode` → `MethodNode` / `FieldNode` / `PropertyNode` /
`ConstructorNode`.
- `ClassNode` instances for primitive and common types should be
obtained from `ClassHelper`, not constructed directly. Constructing
fresh `ClassNode`s for `int`, `String`, `Object`, etc. is a frequent
source of equality and resolution bugs.
- Visitors: `GroovyCodeVisitor` (expression + statement),
`GroovyClassVisitor` (class members), with `ClassCodeVisitorSupport`
/ `CodeVisitorSupport` as bases, and
`ClassCodeExpressionTransformer` for transforms that rewrite
expressions in place.
### Static type checker (phase 6)
- Entry point: `org.codehaus.groovy.transform.stc.StaticTypeCheckingVisitor`.
- Driven by `@TypeChecked` and `@CompileStatic`. The latter runs the
same checker, then directs `AsmClassGenerator` to emit direct calls
rather than dynamic dispatch.
- Extensible from user code via type-checking extension scripts; see
`src/spec/doc/_type-checking-extensions.adoc` for the user-facing
documentation of that mechanism.
### Class generation (phase 7)
- `org.codehaus.groovy.classgen.AsmClassGenerator` walks the AST and
emits bytecode via ASM. Supporting visitors run here too:
`Verifier` (synthesises bridge methods, accessors, default
constructors), `EnumVisitor`, `EnumCompletionVisitor`,
`InnerClassVisitor`, `InnerClassCompletionVisitor`,
`VariableScopeVisitor`, `ReturnAdder`.
- ASM-specific helpers: `org.codehaus.groovy.classgen.asm.*`.
- The class loader path for compiled classes goes through
`org.codehaus.groovy.reflection.*` and the meta-class system in
`groovy.lang.MetaClass*`.
## Extension points
Most contributor work touches one of these. Each has a dedicated
mechanism — knowing which one applies tells you where the change
belongs:
- **AST transformations** — annotation-driven AST rewrites. Local
transforms run in `CANONICALIZATION` by default; global transforms
apply to every compilation unit and are registered via
`META-INF/services/org.codehaus.groovy.transform.ASTTransformation`.
Implementations live in `org.codehaus.groovy.transform.*`.
`AbstractASTTransformation` is the usual base class, and
`org.codehaus.groovy.ast.tools.GeneralUtils` is the standard library
for building AST fragments.
- **Type-checking extensions** — DSL scripts that hook into the
static type checker. See
`org.codehaus.groovy.transform.stc.GroovyTypeCheckingExtensionSupport`
and the user docs at `src/spec/doc/_type-checking-extensions.adoc`.
- **Compilation customizers** —
`org.codehaus.groovy.control.customizers.*`. Programmatic
configuration registered on `CompilerConfiguration`; each customizer
runs at the `CompilePhase` it declares. Key classes: `ImportCustomizer`,
`ASTTransformationCustomizer`, `SecureASTCustomizer`,
`CompilationCustomizer` (base class for custom ones).
- **Extension modules** — add instance / static methods to existing
classes via descriptor files. Discovered through
`META-INF/groovy/org.codehaus.groovy.runtime.ExtensionModule`. The
GDK itself is built this way; see
`org.codehaus.groovy.runtime.DefaultGroovyMethods` and friends, and
the user-facing description in `src/spec/doc/core-gdk.adoc`.
- **Parser plugin** — `org.codehaus.groovy.control.ParserPluginFactory`
selects the parser. The ANTLR4 implementation is the only supported
one; the older Antlr2-based parser has been removed.
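
To make the extension-module bullet above concrete, here is a minimal sketch of what an extension class might look like. `ShoutExtension`, its `shout` method, and the module name in the descriptor are all hypothetical. The descriptor keys shown in the trailing comment are the ones the extension-module mechanism is generally documented to read, so check them against `src/spec/doc/core-gdk.adoc` before relying on them.

```java
import java.util.Locale;

// Hypothetical extension class. Each public static method is exposed as an
// instance method on the type of its first parameter, so this one would be
// callable from Groovy as "hello".shout() once the module is discovered.
//
// In a real module the class is declared public; the modifier is left off
// here only so the sketch compiles as a single snippet.
class ShoutExtension {
    public static String shout(String self) {
        return self.toUpperCase(Locale.ROOT) + "!";
    }

    public static void main(String[] args) {
        // Plain Java has to call the method statically.
        System.out.println(shout("hello")); // prints HELLO!
    }
}

// Matching descriptor, assumed to sit at
// META-INF/groovy/org.codehaus.groovy.runtime.ExtensionModule:
//
//   moduleName=shout-module
//   moduleVersion=1.0
//   extensionClasses=ShoutExtension
```

Methods that should appear as static methods on the target type go in a separate class listed under `staticExtensionClasses`, which keeps the two kinds of extension apart in the descriptor.
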
## Generated code
The following are build outputs, regenerated on every build, so
direct edits to them are overwritten. Changes belong in the
source they're generated from.
| Generated artefact | Source |
|---|---|
| `build/generated/sources/antlr4/org/apache/groovy/parser/antlr4/Groovy{Lexer,Parser,ParserVisitor,ParserBaseVisitor}.java` | `src/antlr/GroovyLexer.g4`, `src/antlr/GroovyParser.g4` |
| Anything under `build/`, `*/build/`, `out/`, `subprojects/*/build/` | The build itself; never committed |
| Repackaged dependency classes (ASM, ANTLR runtime, picocli) | Configured in `build.gradle` under `repackagedDependencies` |
If a `.java` file under `build/generated/...` looks like the right
thing to change, you are looking at the wrong file. The grammar fix
goes in `src/antlr/`.
## Public API boundaries
Groovy commits to a stable public API. The shape of a change determines
which review path applies — see [`CONTRIBUTING.md`](CONTRIBUTING.md).
| Package convention | Audience | Stability |
|---|---|---|
| `groovy.*` | End users (the public API surface) | Strongly stable; breaking changes need a major version |
| `org.apache.groovy.*` | Mixed; preferred location for new code | Stable unless explicitly marked otherwise |
| `org.codehaus.groovy.*` | Historical core; some user-visible, much internal | Stable in practice for things users have come to rely on; treat as public unless marked `@Internal` |
| Anything annotated [`@groovy.transform.Internal`](src/main/java/groovy/transform/Internal.java) or in a package named `internal` | Implementation detail | No stability guarantee |
Binary compatibility against a baseline release is checked by the
`subprojects/binary-compatibility/` module as part of the build. See
[`COMPATIBILITY.md`](COMPATIBILITY.md) for the full stability story:
what counts as breaking, the deprecation policy, and how the
`japicmp`-based check is wired up.
## Tests
- Core: `src/test/`. New tests use JUnit 5
(`org.junit.jupiter.api.Test`); older tests are a mix of JUnit 3
(`extends GroovyTestCase`) and JUnit 4. Spock is bundled and
available, but the core repo's own tests are predominantly JUnit.
- Module-specific: `subprojects/<module>/src/test/`. Same conventions
apply unless the module documents otherwise.
- Documentation examples: `src/spec/test/` and
`subprojects/<module>/src/spec/test/`. These are real Groovy files
that the AsciiDoc sources `include::` to keep examples executable.
A change to a documented example normally touches both the `.adoc`
source and the included test file together.
- Preview-feature tests:
`subprojects/tests-preview/src/test/` — use this when a test
depends on a JDK preview feature.
- Regression tests for a fixed JIRA: standalone test classes follow
the `Groovy<NNNN>` naming (e.g. `Groovy11955.groovy`); a regression
added to an existing class gets a `// GROOVY-<NNNN>` comment
immediately above the new method. Either shape leaves the JIRA ID
searchable. See the "Tests" section in
[`CONTRIBUTING.md`](CONTRIBUTING.md) for the full convention.
Run a single test with:
```
# A test in the core project
./gradlew :test --tests <FullyQualifiedClassName>
# A test in a subproject
./gradlew :<subproject>:test --tests <FullyQualifiedClassName>
```
## Where to read next
- [`CONTRIBUTING.md`](CONTRIBUTING.md) — how to build, test, and
submit a change.
- [`COMPATIBILITY.md`](COMPATIBILITY.md) — stability tiers, what
counts as a breaking change, deprecation policy, and the
binary-compatibility check.
- [`GOVERNANCE.md`](GOVERNANCE.md) — how decisions get made, where
discussions happen, review modes, and wait periods (placeholder
draft pending dev@ confirmation).
- [`AGENTS.md`](AGENTS.md) — supplemental guidance for AI coding
assistants; layered on top of this document, not a replacement for
it.
- `README.adoc` — the canonical build instructions.
- `src/spec/doc/core-metaprogramming.adoc` — user-facing description
of AST transformations and metaprogramming.
- `src/spec/doc/_type-checking-extensions.adoc` — user-facing
description of the type-checking extension mechanism.
- The Groovy issue tracker (<https://issues.apache.org/jira/browse/GROOVY>)
and the existing test suite are the best sources of precedent for any
given change. `git log --grep GROOVY-NNNNN` finds the original fix
for an issue.