blob: 5fd23522a8eb844dcf195e8f5e0ce043d48ef723 [file] [view]
# Local Docker Cluster for Phoenix Adapters
Brings up the full dependency stack (Hadoop / ZooKeeper / HBase / Phoenix)
required to run **phoenix-adapters** on your laptop. Uses upstream images
where they exist; custom only where they don't.
| Component | Version | Image |
| --- | --- | --- |
| Apache ZooKeeper | 3.8.4 | [`library/zookeeper:3.8.4`](https://hub.docker.com/_/zookeeper) (Docker Official) |
| Apache Hadoop (HDFS) | 3.3.6 | [`apache/hadoop:3.3.6`](https://hub.docker.com/r/apache/hadoop) (Apache convenience build) |
| Apache HBase | 2.5.14-hadoop3 | `phoenix-adapters/hbase-phoenix:latest` (custom) |
| Apache Phoenix | 5.3.1 (phoenix-hbase-2.5) | bundled into `phoenix-adapters/hbase-phoenix` |
| Phoenix Adapters REST | this repo | `phoenix-adapters/rest:latest` (custom) |
Versions are kept in lockstep with the top-level [`pom.xml`](../pom.xml).
> **Apple Silicon.** `apache/hadoop:3.3.6` is amd64-only; the compose file
> pins `platform: linux/amd64` so the NameNode/DataNode run under Rosetta
> emulation. Slower than native, but functional.
## Layout
```
docker/
├── Dockerfile.hbase-phoenix # HBase 2.5.14 + Phoenix 5.3.1
├── Dockerfile.phoenix-adapters # Multi-stage build of the REST server
├── docker-compose.yml
├── conf/
│ ├── hbase/{hbase-site.xml,hbase-env.sh}
│ └── phoenix-adapters/hbase-site.xml # Client-side overrides
└── scripts/
├── hbase-entrypoint.sh # hbase-master, hbase-regionserver
├── phoenix-adapters-entrypoint.sh
└── smoke.sh # End-to-end DDB validation suite
```
ZooKeeper and Hadoop config lives entirely in `docker-compose.yml` as env
vars that the upstream images template into XML.
## Quick start
**Prerequisites:** Docker Desktop running; `jq` and `curl` on `PATH`
(`brew install jq` on macOS).
From the **project root**:
```bash
# 1. Bring up the full stack (ZK + HDFS + HBase+Phoenix + REST) and BLOCK
# until every service reports healthy (REST takes ~30-60s on a cold
# start because Phoenix has to bootstrap SYSTEM.* tables).
# First time: ~8-12 min -- most of that is Maven downloading ~1.5 GB
# of dependencies into the BuildKit cache mount; subsequent runs reuse
# the cache and rebuild in seconds.
docker compose -f docker/docker-compose.yml up -d --build --wait
# 2. Validate it works end-to-end (CRUD + UpdateItem + BatchWriteItem + streams).
bash docker/scripts/smoke.sh
# -> "Result: 21 checks PASSED across 18 API calls"
# 3. Use it. The DynamoDB-compatible REST endpoint is at http://localhost:8842 .
# Point any AWS SDK at it (Java/Python/Node.js snippets in
# phoenix-ddb-rest/README.md), or hit it directly with curl:
curl -s -X POST http://localhost:8842/ \
-H 'Content-Type: application/x-amz-json-1.0' \
-H 'X-Amz-Target: DynamoDB_20120810.ListTables' -d '{}'
# 4. Tear down when you're done.
docker compose -f docker/docker-compose.yml down # keep volumes
docker compose -f docker/docker-compose.yml down -v # also wipe HDFS + ZK
```
### URLs
| URL | Service |
| --- | --- |
| http://localhost:8842 | **Phoenix Adapters REST (DynamoDB-compatible)** |
| http://localhost:9870 | HDFS NameNode UI |
| http://localhost:9864 | HDFS DataNode UI |
| http://localhost:16010 | HBase Master UI |
| http://localhost:16030 | HBase RegionServer UI |
Two host ports are remapped because their defaults often collide on dev
machines (macOS AirPlay on 9000, a locally installed Kafka/ZK on 2181):
| Service | Container | Host |
| --- | --- | --- |
| HDFS NameNode RPC | `namenode:9000` | `localhost:19000` |
| ZooKeeper client | `zookeeper:2181` | `localhost:12181` |
Inter-container traffic still uses the standard ports.
### Bring up just the cluster (no REST)
```bash
docker compose -f docker/docker-compose.yml up -d --build --wait \
zookeeper namenode datanode hbase-master hbase-regionserver
```
## Validation suite
`docker/scripts/smoke.sh` exercises every supported DynamoDB API against
the running REST server and asserts the expected behaviour. It prints
each request, response, and assertion as it runs.
```bash
docker compose -f docker/docker-compose.yml up -d --build --wait
bash docker/scripts/smoke.sh
```
Exits `0` on full pass; exits non-zero on the first failed assertion and
prints the offending response.
| Step | API |
| --- | --- |
| 1 | `ListTables` (baseline) |
| 2 | `CreateTable` (with `StreamSpecification` enabled, `NEW_AND_OLD_IMAGES`) |
| 3 | `DescribeTable` |
| 4 | `PutItem` (`id=a`) |
| 5 | `UpdateItem` (`SET score, bonus`, `ReturnValues=ALL_NEW`) |
| 6 | `GetItem` |
| 7 | `PutItem` (`id=b`) |
| 8 | `Scan` |
| 9 | `Query` |
| 10 | `DeleteItem` |
| 11 | `Scan` (after delete) |
| 12 | `BatchWriteItem` (mixed put + delete) |
| 13 | `Scan` paginated (drains all pages) |
| 14 | `ListStreams` |
| 15 | `DescribeStream` (polls until `StreamStatus == ENABLED`) |
| 16 | `GetShardIterator` (`TRIM_HORIZON`) |
| 17 | `GetRecords` (drains all pages) |
| 18 | `DeleteTable` |
## Poking around the cluster
HBase shell:
```bash
docker compose -f docker/docker-compose.yml exec hbase-master hbase shell
```
```text
status
list
create 'demo', 'cf'
put 'demo', 'r1', 'cf:c1', 'hello'
scan 'demo'
```
Phoenix sqlline:
```bash
docker compose -f docker/docker-compose.yml exec hbase-master \
/opt/phoenix/bin/sqlline.py zookeeper:2181
```
```sql
!tables
CREATE TABLE IF NOT EXISTS t1 (id BIGINT PRIMARY KEY, name VARCHAR);
UPSERT INTO t1 VALUES (1, 'phoenix-adapters');
SELECT * FROM t1;
```
## Developer inner loop: code change → live endpoint
```
phoenix-ddb-rest/src/**.java
│ (1) edit on host
docker compose ... up -d --build phoenix-adapters-rest
├── stage 1: mvn package -DskipTests (BuildKit caches ~/.m2)
├── stage 1 output: phoenix-ddb-assembly/target/*-bin.tar.gz
└── stage 2: temurin runtime extracts that tarball
http://localhost:8842/ (new code, live)
```
The cluster (ZK + HDFS + HBase) keeps running across REST rebuilds, and
HBase data persists across full `down`/`up` cycles.
### The loop
1. Edit code in `phoenix-ddb-rest/src/...` or `phoenix-ddb-utils/src/...`.
2. *(Optional)* sanity-check the compile on the host:
```bash
mvn -B -DskipTests -pl phoenix-ddb-rest -am package
```
3. Rebuild and recreate just the REST container:
```bash
docker compose -f docker/docker-compose.yml up -d --build phoenix-adapters-rest
```
No-dep-change rebuilds typically take 30-60 s on a warm cache.
4. Watch logs:
```bash
docker compose -f docker/docker-compose.yml logs -f phoenix-adapters-rest
```
5. Hit the endpoint and verify.
### Quick reference
| Task | Command |
| --- | --- |
| Rebuild REST + restart it | `docker compose -f docker/docker-compose.yml up -d --build phoenix-adapters-rest` |
| Restart REST (no code change) | `docker compose -f docker/docker-compose.yml restart phoenix-adapters-rest` |
| Tail REST logs | `docker compose -f docker/docker-compose.yml logs -f phoenix-adapters-rest` |
| Tail HBase logs | `docker compose -f docker/docker-compose.yml logs -f hbase-master hbase-regionserver` |
| HBase shell | `docker compose -f docker/docker-compose.yml exec hbase-master hbase shell` |
| Phoenix sqlline | `docker compose -f docker/docker-compose.yml exec hbase-master /opt/phoenix/bin/sqlline.py zookeeper:2181` |
| List containers | `docker compose -f docker/docker-compose.yml ps` |
| Stop (keep data) | `docker compose -f docker/docker-compose.yml down` |
| Stop + wipe data | `docker compose -f docker/docker-compose.yml down -v` |
### Edge cases
| Situation | What to do |
| --- | --- |
| Changed `conf/hbase/hbase-site.xml` or `hbase-env.sh` | `docker compose ... up -d --build hbase-master hbase-regionserver`. Existing tables survive. |
| Bumped `hbase.version` / `phoenix.version` in `pom.xml` | Bump matching `ARG`s in `Dockerfile.hbase-phoenix`, then `--build hbase-master hbase-regionserver phoenix-adapters-rest`. Often pair with `down -v`. |
| Added a Maven dep to `phoenix-ddb-rest/pom.xml` | `--build phoenix-adapters-rest`. New dep downloads once; cache warms after. |
| Clean slate | `docker compose ... down -v` then `up -d --build`. |
| Code doesn't seem picked up | You ran `restart` instead of `up --build`. `restart` does not rebuild. |
| Stack left running for days / many smoke iterations | HBase + REST logs grow unbounded inside the containers. `down -v` periodically to reclaim disk. |
### Pre-PR checklist
```bash
# 1. Host-side compile + unit tests (no cluster required).
mvn -B clean install -DskipITs
# 2. End-to-end validation: fresh stack + full DDB round-trip including streams.
docker compose -f docker/docker-compose.yml down -v
docker compose -f docker/docker-compose.yml up -d --build --wait
bash docker/scripts/smoke.sh
# 3. Tear it down.
docker compose -f docker/docker-compose.yml down -v
```
If `smoke.sh` finishes with `Result: 21 checks PASSED across 18 API calls`,
your change is wire-compatible end to end through Phoenix on dockerized
HBase across CRUD, batch, and the change-stream chain.
## Running the REST server outside Docker
1. Bring up only the cluster services.
2. Add cluster hostnames to `/etc/hosts` (HBase advertises hostnames over ZK):
```
127.0.0.1 zookeeper namenode datanode hbase-master hbase-regionserver
```
3. Start the REST server pointing at the dockerized ZooKeeper:
```bash
mvn -DskipTests clean package
tar xzf phoenix-ddb-assembly/target/phoenix-adapters-*-bin.tar.gz -C /tmp
cd /tmp/phoenix-adapters-*
export JAVA_HOME=$(/usr/libexec/java_home -v 1.8) # macOS example
export PHOENIX_ADAPTERS_HOME=$(pwd)
bin/phoenix-adapters rest foreground_start -p 8842 -z localhost:12181
```
## Phoenix tuning baked into the image
[`docker/conf/hbase/hbase-site.xml`](conf/hbase/hbase-site.xml) enables what
Phoenix 5.x needs for secondary indexes, DDL events, and the multi-priority
RPC controller:
| Property | Value |
| --- | --- |
| `hbase.coprocessor.master.classes` | `…PhoenixMasterObserver` |
| `hbase.coprocessor.regionserver.classes` | `…PhoenixRegionServerEndpoint` |
| `hbase.regionserver.wal.codec` | `…IndexedWALEditCodec` |
| `hbase.region.server.rpc.scheduler.factory.class` | `…PhoenixRpcSchedulerFactory` |
| `hbase.rpc.controllerfactory.class` | `…ServerRpcControllerFactory` |
| `phoenix.task.handling.interval.ms` | `1000` |
| `phoenix.task.handling.initial.delay.ms` | `1` |
`phoenix-server-hbase-2.5-5.3.1.jar` is copied into `${HBASE_HOME}/lib/` so
the coprocessors and WAL codec are visible to master and every RegionServer.
## Why upstream images for ZK + Hadoop but not HBase?
| Component | Decision | Reason |
| --- | --- | --- |
| ZooKeeper 3.8.4 | Upstream `zookeeper:3.8.4` | Docker Official, exact version, multi-arch. |
| Hadoop 3.3.6 | Upstream `apache/hadoop:3.3.6` | Apache convenience build at the exact version. amd64-only, runs under emulation on Apple Silicon. |
| HBase 2.5.14-hadoop3 | Custom | No official Apache image; community images don't cover `2.5.14-hadoop3`. |
| Phoenix 5.3.1 | Custom (layered on HBase) | No Phoenix image anywhere; server JAR must be on HBase's classpath. |
## Troubleshooting
* **NameNode unhealthy on first start.** First start formats the NameNode
via `ENSURE_NAMENODE_DIR`. Watch with `docker compose ... logs -f namenode`.
* **HBase Master `RegionTooBusyException` / `NotServingRegion`.** Wait ~30 s
after RegionServer comes up; Phoenix bootstraps `SYSTEM.*` tables on its
first connection and the REST server retries transparently.
* **REST exits with `NoClassDefFoundError: org/apache/hadoop/fs/WithErasureCoding`.**
The phoenix-ddb-assembly tarball ships `hadoop-common:3.3.6` (from
`pom.xml`) alongside `hadoop-hdfs:3.4.x` / `hadoop-yarn:3.4.x`
(transitive from `phoenix-core-client`). The 3.4.x JARs register
FileSystem impls that need `WithErasureCoding`, which only exists in
hadoop-common 3.4+. When HBase returns a remote exception during
bootstrap, the client tries to enumerate FileSystem impls, hits
`NoClassDefFoundError`, and poisons the JVM. The REST image
`Dockerfile.phoenix-adapters` strips the 3.4.x `hadoop-hdfs*`,
`hadoop-yarn-*`, `hadoop-mapreduce-client-*`, and `hadoop-distcp-*`
jars after extracting the tarball — the REST server only talks to
HBase via RPC and never opens HDFS directly, so removing them is safe.
If this error reappears, check that those `rm -f` lines in
`Dockerfile.phoenix-adapters` weren't dropped.
* **`Datanode denied communication with namenode`.** Cluster ID mismatch.
`docker compose down -v` and bring the stack back up.
* **`platform mismatch` warnings on Apple Silicon.** Expected for the
Hadoop containers (amd64 image, emulated). No action needed.
## Customising versions
HBase / Phoenix versions are `ARG`s on `Dockerfile.hbase-phoenix`:
```bash
docker compose -f docker/docker-compose.yml build \
--build-arg HBASE_VERSION=2.5.13 \
--build-arg PHOENIX_VERSION=5.3.0 \
hbase-master
```
Hadoop and ZooKeeper versions are pinned by tag in `docker-compose.yml`.
Keep all four in lockstep with `pom.xml`.