<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# AGENTS.md
Apache HBase is a distributed, scalable big data store built on HDFS and cloud
object storage.
## Repo Structure
This is a multi-module Maven project. Modules live in arbitrarily nested
folders; enumerate them by searching for `pom.xml` files (excluding `target/`
directories). The root `pom.xml` defines the full reactor and build order.
Note that some directories from removed or merged modules (e.g.,
`hbase-hadoop2-compat/`, `hbase-protocol/`, `hbase-rsgroup/`) may still exist
as empty shells with only `target/` remnants. If a directory has no `pom.xml`,
it is not part of the active build.
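
The enumeration rule above can be sketched as a short script. This is a minimal illustration, not project tooling: the function name `find_modules` is hypothetical, and it only checks for the presence of a `pom.xml`, not whether the root reactor actually lists the module.

```python
import os
from pathlib import Path


def find_modules(repo_root):
    """Return sorted directories (relative to repo_root) that contain a pom.xml."""
    root = Path(repo_root)
    modules = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune target/ in place so os.walk never descends into build output.
        dirnames[:] = [d for d in dirnames if d != "target"]
        if "pom.xml" in filenames:
            modules.append(Path(dirpath).relative_to(root).as_posix())
    return sorted(modules)
```

Calling `find_modules(".")` from the repo root lists the root itself as `.` followed by every nested module directory. Empty shells that hold only `target/` remnants never appear, because they have no `pom.xml` outside `target/`.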
### Client and Server
The fundamental divide in this codebase is client-side vs. server-side, with
several modules shared between them.
- `hbase-client` -- The client library. Builds RPC requests, handles retries,
manages connections. This is the public API that external consumers depend on.
- `hbase-server` -- RegionServer and Master implementations. Processes RPCs,
manages regions, stores data. The largest module by far.
- Shared modules like `hbase-common`, `hbase-protocol-shaded`, and
`hbase-metrics-api` are dependencies of both sides.
When orienting on unfamiliar code, first determine which side of this divide
you are on.
### Module Roles
**Core data path:**
`hbase-client` -> `hbase-server` (via protobuf RPCs defined in
`hbase-protocol-shaded`)
**Gateways** (alternative client entry points):
`hbase-rest` (HTTP/JSON), `hbase-thrift` (Thrift RPC)
**Coprocessors** are HBase's server-side extension framework. They allow custom
code to run inside RegionServer and Master processes, with the same privileges
as the host process. The base `Coprocessor` interface lives in `hbase-client`;
observer and endpoint interfaces (`RegionObserver`, `MasterObserver`, etc.) live
in `hbase-server`. Endpoint implementations live in `hbase-endpoint`. The
built-in `AccessController` coprocessor enforces ACLs; `VisibilityController`
enforces cell-level visibility labels. Third-party coprocessors are loaded via
configuration or table schema.
**Server subsystems** (separated from `hbase-server` for modularity):
`hbase-balancer`, `hbase-procedure`, `hbase-replication`, `hbase-asyncfs`,
`hbase-zookeeper`, `hbase-http`
**Shared libraries:**
`hbase-common`, `hbase-metrics` + `hbase-metrics-api`, `hbase-logging`,
`hbase-hadoop-compat`
**Extensions:**
`hbase-extensions` (currently `hbase-openssl` for native TLS support)
**Storage codecs:**
`hbase-compression/*` (pluggable algorithms), `hbase-external-blockcache`
**Packaging and shading:**
`hbase-shaded/*`, `hbase-assembly*`, `hbase-resource-bundle`
**Tooling:**
`hbase-shell` (JRuby REPL), `hbase-hbtop`, `hbase-mapreduce`, `hbase-backup`,
`hbase-diagnostics`
**Build infrastructure** (ignore for code tasks):
`hbase-build-configuration`, `hbase-checkstyle`, `hbase-annotations`,
`hbase-archetypes/*`, `hbase-dev-generate-classpath`
**Testing:**
`hbase-testing-util`, `hbase-it`, `hbase-examples`
### Navigating with @InterfaceAudience
Classes are annotated with `@InterfaceAudience` to indicate their intended
consumer:
- `Public` -- Stable client API. External consumers depend on these.
- `LimitedPrivate` -- Internal API shared across modules, scoped to a named
audience (e.g., `COPROC`, `CONFIG`, `REPLICATION`, `AUTHENTICATION`). The
audience name tells you who is expected to call this code.
- `Private` -- Module-internal. Not API.
These annotations are the fastest way to determine whether a class is part of
the external surface or internal plumbing.
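
As a rough aid for triage, the annotation can be read straight from source text. The sketch below is a text-level heuristic under stated assumptions: the helper name `interface_audience` is hypothetical, it returns only the first match, and it does not parse Java, so an annotation mentioned in a comment or Javadoc would also match.

```python
import re

# Matches @InterfaceAudience.Public, @InterfaceAudience.Private, and
# @InterfaceAudience.LimitedPrivate(...) with an optional argument list.
_AUDIENCE = re.compile(
    r"@InterfaceAudience\.(Public|LimitedPrivate|Private)\b(?:\s*\(([^)]*)\))?"
)


def interface_audience(java_source):
    """Return (level, argument_text) for the first annotation found, or None."""
    match = _AUDIENCE.search(java_source)
    if match is None:
        return None
    level, args = match.group(1), match.group(2)
    return level, (args.strip() if args else None)
```

For a `LimitedPrivate` class, the second tuple element carries the raw audience argument text, which is the part that names the expected caller.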
### Key Entry Points
When investigating a behavior, start from where it enters the system:
- **Client RPCs**: `RSRpcServices` (RegionServer) and `MasterRpcServices`
(Master) handle all client-initiated RPCs. Trace from the method matching
the RPC name.
- **REST gateway**: resource classes in `hbase-rest` map HTTP verbs to
operations.
- **Thrift gateway**: handler classes in `hbase-thrift` map Thrift methods.
- **Coprocessor hooks**: observer interfaces (`RegionObserver`,
`MasterObserver`, etc.) define extension points. Implementations are loaded
via configuration or table schema.
- **Procedures**: `hbase-procedure` defines the framework; concrete procedures
(table create, region split, etc.) live in `hbase-server`.
- **Configuration**: properties are defined in `hbase-default.xml` (in
`hbase-common`) and overridden by operators in `hbase-site.xml`.
- **Wire format**: `.proto` files in `hbase-protocol-shaded` define every RPC
request/response and all persisted data structures. (Older branches had a
separate `hbase-protocol` module; it has been removed on master.)
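
To illustrate the configuration layering in the list above, here is a minimal sketch of how site values shadow defaults. It assumes both files use the Hadoop-style `<configuration>`/`<property>`/`<name>`/`<value>` layout, and the function names are hypothetical. It deliberately ignores features of the real loader such as variable substitution and final properties, so treat it as a reading aid rather than a reimplementation.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Parse a Hadoop-style <configuration> document into a name -> value dict."""
    props = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def effective_config(default_xml, site_xml):
    """Layer operator overrides on top of shipped defaults; site values win."""
    merged = read_properties(default_xml)
    merged.update(read_properties(site_xml))
    return merged
```

Passing the contents of `hbase-default.xml` and an operator's `hbase-site.xml` yields one dictionary in which any property present in both files carries the site value.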
### Split Packages
The same Java package often appears in multiple modules (e.g., the
`coprocessor` package exists in `hbase-client`, `hbase-server`,
`hbase-endpoint`, and `hbase-examples`). Each module contributes different
classes to the package. When searching for a class, check which module it
lives in -- the module determines the execution context.
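
One way to see a split package at a glance is to list, per module, the classes each module contributes to it. The sketch below assumes the standard Maven `src/main/java` layout. The function name and the `source_set` parameter are hypothetical, not part of any project tooling.

```python
from pathlib import Path


def modules_containing_package(repo_root, package, source_set="src/main/java"):
    """Map each module directory to the .java files it contributes to `package`."""
    root = Path(repo_root)
    package_path = Path(source_set, *package.split("."))
    found = {}
    for pom in root.rglob("pom.xml"):
        if "target" in pom.relative_to(root).parts:
            continue  # skip build output; only active modules count
        package_dir = pom.parent / package_path
        if package_dir.is_dir():
            classes = sorted(p.name for p in package_dir.glob("*.java"))
            if classes:
                found[pom.parent.relative_to(root).as_posix()] = classes
    return found
```

For example, `modules_containing_package(".", "org.apache.hadoop.hbase.coprocessor")` should return one entry per module that contributes to the `coprocessor` package, so the module that owns a given class can be read off directly.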
### Related Repositories
[hbase-thirdparty](https://github.com/apache/hbase-thirdparty) is a companion
project that patches and shades key dependencies (protobuf, netty, gson, etc.)
so that HBase's internal use of these libraries does not conflict with
versions on the application classpath. The `hbase-shaded-*` artifacts from
that repo appear as dependencies throughout this project's `pom.xml`. Changes
to shaded dependency versions or patches happen in that repo, not here.
### Developer Tooling
`dev-support/` contains CI configuration, release automation, code analysis
scripts, Docker-based test environments, and other maintainer tools. PR-level
CI has migrated to GitHub Actions (`.github/workflows/`), but nightly and
branch-level CI still runs via configurations in `dev-support/`. See
`dev-support/README.md` for a full index.
`conf/` holds default configuration templates (`hbase-site.xml`,
`hbase-env.sh`, `log4j2.properties`). `bin/` holds shell scripts for cluster
lifecycle and operations.
`dev-support/design-docs/` collects design documents and proposals for major
features. These capture the rationale behind complex subsystems and are useful
for understanding why the code is structured the way it is.
### Conventions
- Tests mirror source paths: `src/test/java` parallels `src/main/java`
- Generated code (protobuf, etc.) lives in `target/` and is not checked in
- Configuration properties use the `hbase.` prefix
- The shell is JRuby wrapping the Java client API
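
The path-mirroring convention can be expressed as a small helper for locating where the tests for a source file are expected to live. This is a sketch only: `mirrored_test_dir` is a hypothetical name, and it returns the parallel directory without guessing at test class names.

```python
from pathlib import PurePosixPath


def mirrored_test_dir(source_file):
    """Return the src/test/java directory that parallels a src/main/java file."""
    parts = PurePosixPath(source_file).parts
    for i in range(len(parts) - 2):
        if parts[i:i + 3] == ("src", "main", "java"):
            # Swap main for test and drop the file name, keeping the package path.
            mirrored = parts[:i] + ("src", "test", "java") + parts[i + 3:-1]
            return PurePosixPath(*mirrored).as_posix()
    raise ValueError(f"not under src/main/java: {source_file}")
```

The helper raises `ValueError` for paths outside `src/main/java`, which keeps generated code under `target/` and non-Java resources from being mapped by accident.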
## Documentation
The project website (https://hbase.apache.org) is maintained in this repo under
`hbase-website/`. User-facing and administrator-facing documentation covering
configuration, security, architecture, schema design, operations, APIs, and
more lives in `hbase-website/app/pages/_docs/docs/_mdx/`. The table of
contents and page ordering are defined in the `meta.json` files within that
tree.
The site also serves https://hbase.apache.org/llms-full.txt, which
concatenates all documentation pages into a single text file suitable for
LLM context ingestion.
## Security Model
The project's security model is the entry point for any security analysis
of this codebase. The canonical chain for discovering it is:
`AGENTS.md` -> `SECURITY.md` -> <https://hbase.apache.org/security-model/>
`SECURITY.md` at the repo root is a short pointer to the published model
and the `security@apache.org` reporting flow. The full model (source at
`hbase-website/app/pages/_landing/security-model/content.md`, published
at <https://hbase.apache.org/security-model/>) details trust boundaries,
what constitutes a valid vulnerability, and what does not.
When performing security analysis of this codebase, use the navigation
structure above to determine the role of the code under review, then apply
the security model to interpret findings in context.