Apache Groovy — Repository Architecture

A contributor-facing map of how the Groovy compiler and runtime are organised in this repository. This document is for people working on Groovy. For documentation aimed at people using Groovy, see https://groovy.apache.org/ and the AsciiDoc sources under src/spec/doc/ and subprojects/<module>/src/spec/doc/.

This is an overview, not a reference. It exists to give a new contributor — human or AI — enough orientation to read the code productively and to avoid a small set of common mis-steps. Code is the source of truth; this document is a pointer file.

Repository layout (top level)

| Path | What lives there |
|---|---|
| src/main/java/org/codehaus/groovy/ | Core compiler and runtime (legacy package; most of the codebase) |
| src/main/java/org/apache/groovy/ | Newer code added under the org.apache.groovy.* package convention |
| src/main/java/groovy/ | User-facing API (groovy.lang.*, groovy.util.*, etc.) |
| src/main/groovy/ | Groovy sources compiled into the core jar |
| src/main/resources/ | Service files, META-INF, default scripts |
| src/antlr/ | ANTLR4 grammar (GroovyLexer.g4, GroovyParser.g4); see “Generated code” below |
| src/spec/doc/ | User-facing AsciiDoc reference docs |
| src/spec/test/ | Executable Groovy snippets include::'d by the AsciiDoc sources |
| src/test/ | JUnit / Spock tests for the core jar |
| subprojects/ | ~50 modular subprojects (groovy-json, groovy-sql, groovy-xml, groovy-typecheckers, parser-antlr4 wiring, etc.) |
| subprojects/groovy-binary/ | Aggregator that produces the final distribution and the published spec |
| subprojects/binary-compatibility/ | Enforces public-API binary compatibility across releases |
| subprojects/tests-preview/ | Tests that depend on preview JDK features |
| bootstrap/, buildSrc/, build-logic/ | Build infrastructure (Gradle convention plugins, bootstrap helpers) |

When in doubt, prefer adding new code under org.apache.groovy.*; the older org.codehaus.groovy.* packages remain for legacy reasons but are kept stable for compatibility.

Compilation pipeline

The driver is org.codehaus.groovy.control.CompilationUnit. A SourceUnit represents a single source file inside it. Compilation proceeds in numbered phases declared in Phases.java and exposed as the CompilePhase enum that AST transformations and customizers attach to:

| # | Phase | What happens | Driver classes |
|---|---|---|---|
| 1 | INITIALIZATION | Source files opened, CompilationUnit configured, customizers registered (each runs in the phase it declares) | CompilationUnit, CompilerConfiguration |
| 2 | PARSING | ANTLR4 lexer + parser produce a CST (parse tree) | Antlr4ParserPlugin, GroovyLangLexer, GroovyLangParser |
| 3 | CONVERSION | CST → AST (ModuleNode / ClassNode / MethodNode / ...) | AstBuilder |
| 4 | SEMANTIC_ANALYSIS | Class resolution, import handling, validity checks the grammar can't catch | ResolveVisitor, StaticImportVisitor, AnnotationConstantsVisitor |
| 5 | CANONICALIZATION | Fill in the AST: synthesised members, generic types; most local AST transforms run here | ASTTransformationVisitor, GenericsVisitor |
| 6 | INSTRUCTION_SELECTION | Optimisations and instruction-set selection; @CompileStatic / @TypeChecked run here | OptimizerVisitor, StaticTypeCheckingVisitor |
| 7 | CLASS_GENERATION | AST → bytecode in memory | AsmClassGenerator, Verifier, classes under classgen/asm/ |
| 8 | OUTPUT | Write generated .class files | CompilationUnit output stage |
| 9 | FINALIZATION | Cleanup, Janitor callbacks | CompilationUnit, Janitor |

Each phase iterates over all SourceUnits before the next phase begins. AST transformations declare which phase they run in; the canonical question to ask before adding one is “what state must the AST be in for this transform to make sense?” — pick the earliest phase where that holds.

The phase enum is the right anchor for any documentation that talks about “when X happens during compilation”. Quoting the phase names verbatim keeps the reference precise; paraphrasing tends to drift.
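The effect of a phase boundary can be observed directly. The Groovy script below is a minimal sketch, assuming only the core groovy jar on the classpath; the class Person and the helper compileThrough are made up for illustration. It compiles the same source twice, stopping after CONVERSION and after CLASS_GENERATION, and counts the getName methods on the resulting ClassNode. Since Verifier synthesises property accessors during class generation, the getter should only appear in the second run.

```groovy
import org.codehaus.groovy.ast.ClassNode
import org.codehaus.groovy.control.CompilationUnit
import org.codehaus.groovy.control.Phases

// Hypothetical helper: compile a tiny class only up to `phase` and return its ClassNode.
ClassNode compileThrough(int phase) {
    def unit = new CompilationUnit()
    unit.addSource('Person.groovy', 'class Person { String name }')
    unit.compile(phase)                        // runs every phase up to and including `phase`
    def module = unit.getAST().getModules()[0]
    module.getClasses().find { it.name == 'Person' }
}

ClassNode afterConversion = compileThrough(Phases.CONVERSION)
ClassNode afterClassGen   = compileThrough(Phases.CLASS_GENERATION)

// The PropertyNode exists as soon as AstBuilder has run (CONVERSION)...
println "property after CONVERSION:       ${afterConversion.getProperty('name') != null}"
// ...but its getter MethodNode is only added later, by Verifier during CLASS_GENERATION.
println "getName after CONVERSION:       ${afterConversion.getMethods('getName').size()}"
println "getName after CLASS_GENERATION: ${afterClassGen.getMethods('getName').size()}"
```

Stopping compilation at an intermediate phase like this is a cheap way to check what the AST looks like before choosing the phase for a new transform.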

Parser (phase 2)

  • Grammar lives in src/antlr/GroovyLexer.g4 and src/antlr/GroovyParser.g4. The generated parser is regenerated from these sources on every build, so changes belong in the .g4 files.
  • The ANTLR Gradle plugin generates GroovyLexer, GroovyParser, GroovyParserVisitor, and GroovyParserBaseVisitor into build/generated/sources/antlr4/org/apache/groovy/parser/antlr4/.
  • Hand-written code that wires the parser into CompilationUnit lives in src/main/java/org/apache/groovy/parser/antlr4/ (Antlr4PluginFactory, Antlr4ParserPlugin, GroovyLangLexer, GroovyLangParser, AstBuilder, plus support classes: ModifierManager, GroovydocManager, SemanticPredicates, PositionInfo).
  • AstBuilder is the hand-off from CST to AST. It is large; almost every parser-visible language change touches it.

AST (phase 3 onward)

  • Root: org.codehaus.groovy.ast.ASTNode.
  • Sub-packages:
    • org.codehaus.groovy.ast.expr — expression nodes (BinaryExpression, MethodCallExpression, ...)
    • org.codehaus.groovy.ast.stmt — statement nodes (BlockStatement, ForStatement, ...)
    • org.codehaus.groovy.ast.tools — helpers (GeneralUtils is the common one — prefer its factory methods over hand-built nodes)
  • Top-level structural nodes: ModuleNode (one per source file) → ClassNode → MethodNode / FieldNode / PropertyNode / ConstructorNode.
  • ClassNode instances for primitive and common types should be obtained from ClassHelper, not constructed directly. Constructing fresh ClassNodes for int, String, Object, etc. is a frequent source of equality and resolution bugs.
  • Visitors: GroovyCodeVisitor (expression + statement), GroovyClassVisitor (class members), with ClassCodeVisitorSupport / CodeVisitorSupport as bases, and ClassCodeExpressionTransformer for transforms that rewrite expressions in place.
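As a sketch of the helpers and visitor bases above, the script below builds a small BlockStatement with GeneralUtils factories (reusing ClassHelper.STRING_TYPE rather than constructing a fresh String node) and then walks it with a CodeVisitorSupport subclass. The CallCollector class and the statements being built are hypothetical, chosen only to illustrate the pattern.

```groovy
import org.codehaus.groovy.ast.ClassHelper
import org.codehaus.groovy.ast.CodeVisitorSupport
import org.codehaus.groovy.ast.expr.MethodCallExpression
import org.codehaus.groovy.ast.stmt.BlockStatement

import static org.codehaus.groovy.ast.tools.GeneralUtils.*

// Build the AST for:  String greeting = 'hi'; println(greeting); return greeting.size()
// The variable's type is the shared ClassHelper node, not a newly constructed ClassNode.
def greeting = localVarX('greeting', ClassHelper.STRING_TYPE)
BlockStatement body = block(
    declS(greeting, constX('hi')),
    stmt(callThisX('println', greeting)),
    returnS(callX(greeting, 'size'))
)

// A read-only visitor: CodeVisitorSupport walks every statement and expression;
// overriding one callback is enough to record method-call names.
class CallCollector extends CodeVisitorSupport {
    List<String> names = []

    @Override
    void visitMethodCallExpression(MethodCallExpression call) {
        names << call.methodAsString
        super.visitMethodCallExpression(call)   // keep walking into receiver and arguments
    }
}

def collector = new CallCollector()
body.visit(collector)
println collector.names
```

For transforms that must replace expressions rather than just observe them, ClassCodeExpressionTransformer is the corresponding base class.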

Static type checker (phase 6)

  • Entry point: org.codehaus.groovy.transform.stc.StaticTypeCheckingVisitor.
  • Driven by @TypeChecked and @CompileStatic. The latter runs the same checker, then directs AsmClassGenerator to emit direct calls rather than dynamic dispatch.
  • Extensible from user code via type-checking extension scripts; see src/spec/doc/_type-checking-extensions.adoc for the user-facing documentation of that mechanism.
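A minimal illustration of what the checker adds, assuming a plain GroovyShell with no type-checking extensions configured: the same method compiles dynamically but is rejected once it carries @TypeChecked. The method name shout and the missing method nonExistentMethod are made up.

```groovy
import org.codehaus.groovy.control.MultipleCompilationErrorsException

// Both sources call a method that String does not have.
def dynamicSource = "def shout(String s) { s.nonExistentMethod() }"
def checkedSource = "@groovy.transform.TypeChecked def shout(String s) { s.nonExistentMethod() }"

def shell = new GroovyShell()
shell.parse(dynamicSource)          // compiles: dynamic Groovy defers method resolution to runtime

String stcError = null
try {
    shell.parse(checkedSource)      // the static type checker rejects the call at compile time
} catch (MultipleCompilationErrorsException e) {
    stcError = e.message
}
println stcError
```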

Class generation (phase 7)

  • org.codehaus.groovy.classgen.AsmClassGenerator walks the AST and emits bytecode via ASM. Supporting visitors live in the same package: Verifier (synthesises bridge methods, accessors, default constructors), EnumVisitor, EnumCompletionVisitor, InnerClassVisitor, InnerClassCompletionVisitor, VariableScopeVisitor, and ReturnAdder. Not all of them are scheduled in this phase; CompilationUnit registers each one against the phase where it runs.
  • ASM-specific helpers: org.codehaus.groovy.classgen.asm.*.
  • The class loader path for compiled classes goes through org.codehaus.groovy.reflection.* and the meta-class system in groovy.lang.MetaClass*.
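To see class generation in isolation, the script below (a sketch; the Greeter source is hypothetical) compiles through CLASS_GENERATION and inspects the in-memory bytecode that CompilationUnit exposes as GroovyClass objects. Because the OUTPUT phase never runs, no .class files are written.

```groovy
import org.codehaus.groovy.control.CompilationUnit
import org.codehaus.groovy.control.Phases
import org.codehaus.groovy.tools.GroovyClass

// Stop after CLASS_GENERATION: bytecode exists in memory, OUTPUT has not run.
def unit = new CompilationUnit()
unit.addSource('Greeter.groovy', 'class Greeter { String greet() { "hi" } }')
unit.compile(Phases.CLASS_GENERATION)

List<GroovyClass> generated = unit.getClasses()
GroovyClass greeter = generated.find { it.name == 'Greeter' }
byte[] bytes = greeter.bytes

// Every JVM class file begins with the magic number 0xCAFEBABE.
List<Integer> header = bytes[0..3].collect { it & 0xFF }
println "${greeter.name}: ${bytes.length} bytes, header ${header.collect { Integer.toHexString(it) }}"
```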

Extension points

Most contributor work touches one of these. Each has a dedicated mechanism — knowing which one applies tells you where the change belongs:

  • AST transformations — annotation-driven AST rewrites. Local transforms run in CANONICALIZATION by default; global transforms apply to every compilation unit and are registered via META-INF/services/org.codehaus.groovy.transform.ASTTransformation. Implementations live in org.codehaus.groovy.transform.*. AbstractASTTransformation is the usual base class, and org.codehaus.groovy.ast.tools.GeneralUtils is the standard library for building AST fragments.
  • Type-checking extensions — DSL scripts that hook into the static type checker. See org.codehaus.groovy.transform.stc.GroovyTypeCheckingExtensionSupport and the user docs at src/spec/doc/_type-checking-extensions.adoc.
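  • Compilation customizers: org.codehaus.groovy.control.customizers.*. Programmatic configuration registered on CompilerConfiguration before compilation starts; each customizer runs in the CompilePhase it declares (ImportCustomizer, for example, runs at CONVERSION). Common ones: ImportCustomizer, ASTTransformationCustomizer, SecureASTCustomizer, and CompilationCustomizer (base class for custom ones).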
  • Extension modules — add instance / static methods to existing classes via descriptor files. Discovered through META-INF/groovy/org.codehaus.groovy.runtime.ExtensionModule. The GDK itself is built this way; see org.codehaus.groovy.runtime.DefaultGroovyMethods and friends, and the user-facing description in src/spec/doc/core-gdk.adoc.
  • Parser plugin: org.codehaus.groovy.control.ParserPluginFactory selects the parser. The ANTLR4 implementation is the only supported one; the older Antlr2-based parser has been removed.
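Customizers are the easiest of these mechanisms to try in isolation. The following sketch, assuming only the core jar, registers an ImportCustomizer and an ASTTransformationCustomizer on a CompilerConfiguration and hands it to a GroovyShell; the scripts it compiles are made-up examples.

```groovy
import groovy.transform.CompileStatic
import org.codehaus.groovy.control.CompilerConfiguration
import org.codehaus.groovy.control.MultipleCompilationErrorsException
import org.codehaus.groovy.control.customizers.ASTTransformationCustomizer
import org.codehaus.groovy.control.customizers.ImportCustomizer

def config = new CompilerConfiguration()
config.addCompilationCustomizers(
    // Acts like `import java.util.concurrent.*` in every source compiled with this config.
    new ImportCustomizer().addStarImports('java.util.concurrent'),
    // Acts as if every class (including the generated script class) were annotated @CompileStatic.
    new ASTTransformationCustomizer(CompileStatic)
)
def shell = new GroovyShell(config)

// ConcurrentHashMap resolves without an explicit import thanks to ImportCustomizer.
def size = shell.evaluate('''
    def map = new ConcurrentHashMap<String, Integer>()
    map.put('a', 1)
    map.size()
''')

// Under @CompileStatic, calling an unknown method is a compile-time error.
boolean rejected = false
try {
    shell.parse('def n = 1; n.noSuchMethod()')
} catch (MultipleCompilationErrorsException e) {
    rejected = true
}
println "size=$size, rejected=$rejected"
```

The same CompilerConfiguration can also be passed to GroovyClassLoader or CompilationUnit, which makes customizers a convenient way to embed Groovy with preconfigured imports or transforms.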

Generated code

The following are produced by the build and regenerated on every run, so direct edits to them are overwritten. Changes belong in the source they're generated from.

| Generated artefact | Source |
|---|---|
| build/generated/sources/antlr4/org/apache/groovy/parser/antlr4/Groovy{Lexer,Parser,ParserVisitor,ParserBaseVisitor}.java | src/antlr/GroovyLexer.g4, src/antlr/GroovyParser.g4 |
| Anything under build/, */build/, out/, subprojects/*/build/ | The build itself; never committed |
| Repackaged dependency classes (ASM, ANTLR runtime, picocli) | Configured in build.gradle under repackagedDependencies |

If a .java file under build/generated/... looks like the right thing to change, you are looking at the wrong file. The grammar fix goes in src/antlr/.

Public API boundaries

Groovy has a covenanted public API. The shape of a change determines which review path applies — see CONTRIBUTING.md.

| Package convention | Audience | Stability |
|---|---|---|
| groovy.* | End users (the public API surface) | Strongly stable; breaking changes need a major version |
| org.apache.groovy.* | Mixed; preferred location for new code | Stable unless explicitly marked otherwise |
| org.codehaus.groovy.* | Historical core; some user-visible, much internal | Stable in practice for things users have come to rely on; treat as public unless marked @Internal |
| Anything annotated @groovy.transform.Internal or in a package named internal | Implementation detail | No stability guarantee |

Binary compatibility against a baseline release is checked by the subprojects/binary-compatibility/ module as part of the build. See COMPATIBILITY.md for the full stability story: what counts as breaking, the deprecation policy, and how the japicmp-based check is wired up.

Tests

  • Core: src/test/. New tests use JUnit 5 (org.junit.jupiter.api.Test); older tests are a mix of JUnit 3 (extends GroovyTestCase) and JUnit 4. Spock is bundled and available, but the core repo's own tests are predominantly JUnit.
  • Module-specific: subprojects/<module>/src/test/. Same conventions apply unless the module documents otherwise.
  • Documentation examples: src/spec/test/ and subprojects/<module>/src/spec/test/. These are real Groovy files that the AsciiDoc sources include:: to keep examples executable. A change to a documented example normally touches both files together.
  • Preview-feature tests: subprojects/tests-preview/src/test/ — use this when a test depends on a JDK preview feature.
  • Regression tests for a fixed JIRA: standalone test classes follow the Groovy<NNNN> naming (e.g. Groovy11955.groovy); a regression added to an existing class gets a // GROOVY-<NNNN> comment immediately above the new method. Either shape leaves the JIRA ID searchable. See the “Tests” section in CONTRIBUTING.md for the full convention.

Run a single test with:

./gradlew :test --tests <FullyQualifiedClassName>
./gradlew :<subproject>:test --tests <FullyQualifiedClassName>

Where to read next

  • CONTRIBUTING.md — how to build, test, and submit a change.
  • COMPATIBILITY.md — stability tiers, what counts as a breaking change, deprecation policy, and the binary-compatibility check.
  • GOVERNANCE.md — how decisions get made, where discussions happen, review modes, and wait periods (placeholder draft pending dev@ confirmation).
  • AGENTS.md — supplemental guidance for AI coding assistants; layered on top of this document, not a replacement for it.
  • README.adoc — the canonical build instructions.
  • src/spec/doc/core-metaprogramming.adoc — user-facing description of AST transformations and metaprogramming.
  • src/spec/doc/_type-checking-extensions.adoc — user-facing description of the type-checking extension mechanism.
  • The Groovy issue tracker (https://issues.apache.org/jira/browse/GROOVY) and the existing test suite are the best source of precedent for any given change. git log --grep GROOVY-NNNNN finds the original fix for an issue.