| <!-- START doctoc generated TOC please keep comment here to allow auto update --> |
| <!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE --> |
| **Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)* |
| |
| - [Spec-driven development](#spec-driven-development) |
| - [The idea](#the-idea) |
| - [Three phases, four beats, one loop](#three-phases-four-beats-one-loop) |
| - [Specs are not RFCs](#specs-are-not-rfcs) |
| - [A branch per fix or feature](#a-branch-per-fix-or-feature) |
| - [Why it never pushes](#why-it-never-pushes) |
| - [Security and the dangerously-skip-permissions flag](#security-and-the-dangerously-skip-permissions-flag) |
| - [Keeping specs honest: the update beat](#keeping-specs-honest-the-update-beat) |
| - [Layout](#layout) |
| - [Quick start](#quick-start) |
| - [How this composes with the framework's principles](#how-this-composes-with-the-frameworks-principles) |
| |
| <!-- END doctoc generated TOC please keep comment here to allow auto update --> |
| |
| <!-- SPDX-License-Identifier: Apache-2.0 |
| https://www.apache.org/licenses/LICENSE-2.0 --> |
| |
| # Spec-driven development |
| |
| This document explains the **spec-driven build loop** that lives in |
| [`tools/spec-loop/`](../tools/spec-loop/). It is how this framework can be |
| developed and maintained against a written description of what it does, |
| rather than against whatever happens to be in someone's head on a given |
| afternoon. |
| |
| ## The idea |
| |
| The loop is a small instance of the general |
| [Ralph](https://ghuntley.com/ralph/) technique: run a fresh agent context |
| against a fixed prompt, let it do one well-scoped thing, then run it |
| again with a clean context. The power is in the funnel that feeds it — |
| not "a loop that codes," but a pipeline from *what the product should do* |
| to *one reviewable change at a time*. |
| |
| Two artefacts carry the state between iterations, so nothing depends on a |
| long-lived context window: |
| |
| - **Specs** ([`tools/spec-loop/specs/`](../tools/spec-loop/specs/)) — a |
| faithful, plain-Markdown description of each functional area of the |
| product: what it does, where it lives in the code, the contract it must |
| honour, and its known gaps. The specs are the *desired state*. |
| - **The implementation plan** |
| ([`tools/spec-loop/IMPLEMENTATION_PLAN.md`](../tools/spec-loop/IMPLEMENTATION_PLAN.md)) |
| — the prioritised list of *work items*: the gaps between the specs and |
| the code. Each work item is sized to one branch and one PR. |
| |
| ## Three phases, four beats, one loop |
| |
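| Phase one is a human (or a planning conversation) writing the specs. |
| Phases two and three, planning and building, are the loop run with a |
| different prompt each; the update and consolidate beats are maintenance |
| passes over the specs and the plan: |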
| |
| | Beat | Command | What it does | Commits? | |
| |---|---|---|---| |
| | **plan** | `loop.sh plan` | Compares specs against the code; rewrites the plan as prioritised work items. | No | |
| | **build** | `loop.sh` | Implements the single highest-priority work item on its own branch; validates; commits there. | Yes, on a work-item branch | |
| | **update** | `loop.sh update` | Inverse of plan: finds functionality the code has but the specs don't, and brings the specs back in sync. | Yes, on `spec/sync-specs` | |
| | **consolidate** | `loop.sh consolidate` | Shrinks the plan when it grows too long, without dropping planned work. | Yes, the plan only | |
| |
| Every beat loads the same operational context — |
| [`tools/spec-loop/AGENTS.md`](../tools/spec-loop/AGENTS.md) (repo map, |
| validation commands, branch and hard-limit rules) layered on top of the |
| repository-wide [`AGENTS.md`](../AGENTS.md). Each build iteration reads |
| only the spec and source files relevant to its one work item, so a fresh |
| context never drowns in the whole tree. |
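
As a rough illustration of how one runner can serve all four beats, the sketch below maps the first command-line argument to a beat and then to one of the `PROMPT_*.md` files. The function names `mode_from_args` and `prompt_for_mode` are hypothetical, and treating any other first argument (such as the trailing number used in the quick start) as the build beat is an assumption drawn from the command table, not a description of the real `loop.sh`.

```bash
# Hypothetical sketch; the real loop.sh may parse its arguments differently.

# mode_from_args: print the beat named by the first argument, falling back
# to "build" when the argument is absent or is not a known beat name.
mode_from_args() {
  case "${1:-}" in
    plan|update|consolidate) printf '%s\n' "$1" ;;
    *) printf 'build\n' ;;
  esac
}

# prompt_for_mode: print the prompt file name that belongs to a beat.
prompt_for_mode() {
  printf 'PROMPT_%s.md\n' "$1"
}
```

Under these assumptions, `prompt_for_mode "$(mode_from_args plan 1)"` selects the plan prompt, while a bare `mode_from_args 1` falls through to the build beat.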
| |
| ## Specs are not RFCs |
| |
| The framework already has [`docs/rfcs/`](rfcs/) — normative principle |
| documents that say *why* and define the trust posture. Those are the |
| **constitution**. The specs are the **work orders**: discrete, concrete, |
| grounded in the actual code. The loop **respects** an RFC principle as a |
| constraint (it never pushes, it stays in the sandbox), but it never reads, |
| edits, or restates an RFC, and no spec ever lives under `docs/rfcs/`. The |
| two formats and lifecycles are deliberately separate; see |
| [`tools/spec-loop/specs/README.md`](../tools/spec-loop/specs/README.md). |
| |
| Spec filenames are **topics, not numbers** — `triage-mode.md`, |
| `pairing-mode.md`, `security-issue-lifecycle.md`. There is no numeric |
| prefix, because numbering implies a priority the specs don't carry. |
| Priority lives only in the implementation plan. |
| |
| ## A branch per fix or feature |
| |
| The defining constraint of this loop: **one work item, one branch, one |
| PR**. Before plan/build iterations, the runner snapshots open PRs so the |
| plan beat does not add work already in flight and the build beat skips |
| planned items that an open PR already covers. The build beat returns to |
| the integration base, then carves out a |
| `spec/<slug>` branch for the single work item it is about to implement. It |
| never commits feature work to the base branch, and `loop.sh` stops if it |
| detects that happening. The result of a run is a fan of independent |
| branches, each carrying exactly one change — each independently |
| reviewable and independently revertible. |
| |
| This is the same discipline the framework asks of every state change, |
| applied to the framework's own development. It is also what makes the |
| loop safe to run unattended: the blast radius of any one iteration is a |
| single local branch. |
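
The branch discipline above could be sketched with a few small helpers. Everything here is illustrative: `slugify`, `branch_for`, and `base_moved` are hypothetical names, and how the real runner derives a slug or detects a commit on the base branch is an assumption.

```bash
# Hypothetical helpers; the real loop.sh may derive slugs and guard the
# base branch differently.

# slugify: lower-case a work-item title and collapse every run of
# non-alphanumeric characters into a single hyphen, trimming the ends.
slugify() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//'
}

# branch_for: print the spec/<slug> branch name for a work-item title.
branch_for() {
  printf 'spec/%s\n' "$(slugify "$1")"
}

# base_moved: succeed (exit 0) when the base branch tip recorded before an
# iteration ($1) differs from the tip observed after it ($2), which would
# mean something was committed directly to the base branch.
base_moved() {
  [ "$1" != "$2" ]
}
```

A runner along these lines might record `git rev-parse "$base"` before invoking the agent, read it again afterwards, and stop when `base_moved` succeeds on the two values.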
| |
| ## Why it never pushes |
| |
| `git push` and `gh pr create` are in the `ask` list of |
| [`.claude/settings.json`](../.claude/settings.json) — they require a human |
| confirmation. The loop honours that: it ends every iteration at a **local |
| commit** and prints the exact commands for the human to run: |
| |
| ```bash |
| git push -u origin spec/<slug> |
| gh pr create --web --base main --head spec/<slug> \ |
| --title "<subject>" --body-file <prepared-body> |
| ``` |
| |
| Opening the PR with `--web` is the framework's convention so the reviewer |
| sees the title, body, and generative-AI disclosure in the browser before |
| submitting. The agent drafts; the human presses the button. |
| |
| ## Security and the dangerously-skip-permissions flag |
| |
| The loop runs the agent headless with `--dangerously-skip-permissions`. |
| That deserves a direct explanation, because it looks, at a glance, like |
| it throws away the framework's permission gates. |
| |
| **Why the flag is there.** Headless iterations have no human to answer a |
| per-tool-call prompt. Without the flag, the agent would stall (or, in |
| non-interactive mode, have the call denied) the moment it tried to edit |
| a file or run a validation command. The flag lets the loop do its job |
| (edit, validate, commit) unattended. |
| |
| **What it bypasses, and what it does not.** The framework's sandbox is |
| layered (see [`docs/rfcs/RFC-AI-0004.md`](rfcs/RFC-AI-0004.md) for the |
| normative statement and `docs/setup/secure-agent-internals.md` for the |
| mechanism). `--dangerously-skip-permissions` only reaches the two |
| agent-level layers, 2 and 3: |
| |
| | Layer | Mechanism | Bypassed by the flag? | |
| |---|---|---| |
| | 0. Clean environment | wrapper strips credential-shaped env vars before exec | **No** — it is the launching wrapper, not the agent | |
| | 1. Filesystem + network sandbox | `bubblewrap` + SNI proxy (Linux) / `sandbox-exec` (macOS); default-deny egress | **No** — enforced by the OS, not the agent | |
| | 2. Tool permissions | `.claude/settings.json` `permissions.deny` | **Yes** | |
| | 3. Forced confirmation | `.claude/settings.json` `permissions.ask` on `git push`, `gh …` | **Yes** | |
| |
| So the flag removes the *agent-level* gate (Layers 2–3), but the |
| *OS-level* boundary (Layers 0–1) is untouched — it is enforced beneath |
| the agent and cannot be turned off from inside it. This is close to the |
| posture the flag's own guidance assumes (it is *"recommended only for |
| sandboxes with no internet access"*), with default-deny egress standing |
| in for no network at all. |
| |
| **How the loop stays safe anyway.** Three things, in order of |
| importance: |
| |
| 1. **Run it only inside the sandbox harness.** The OS layers the flag |
| cannot bypass are the real boundary. Never run the loop on a bare |
| machine — launch it through the project's `claude-iso`/sandbox wrapper |
| so the filesystem and network allow-lists are in force. |
| 2. **Run it with no push/write credentials in the environment.** The |
| clean-env wrapper already strips them; keep it that way. `github.com` |
| is on the network allow-list, but a `git push` or `gh pr create` with |
| no token cannot authenticate, so it fails closed. As defence in depth |
| the loop also passes |
| `--disallowedTools "Bash(git push *)" "Bash(gh *)"`. |
| 3. **Structural containment.** Every iteration works on its own |
| `spec/<slug>` branch, the loop guards against commits landing on the |
| base branch, and the prompts forbid push/PR. The human-in-the-loop |
| gate is not removed — it is *relocated* from per-tool-call to the |
| push / PR / merge boundary, where the human reviews a finished branch. |
| |
| **Net effect.** During a run the per-call confirmation gate is traded for |
| autonomy, but credentials are absent, egress is fenced, and the blast |
| radius of any iteration is a single local branch the human has not yet |
| pushed. That is also why the loop serves as the project's *manual-loop |
| evidence* and must never be promoted to auto-merge: the autonomy is |
| bounded to producing local branches, nothing more. An operator who wants |
| the per-call gate back can drop the flag and pre-authorise the loop's |
| tools with `--allowedTools` instead — at the cost of the loop pausing on |
| anything it was not pre-authorised to do. |
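
The "no push/write credentials" rule lends itself to a preflight check before the first iteration. The sketch below is a hedged illustration only: the function names are hypothetical, and the two variable names are merely the token variables the GitHub CLI is known to read, not the wrapper's actual definition of "credential-shaped".

```bash
# Hypothetical preflight; the real clean-env wrapper is not shown in this
# document and may check a different, longer list.
CRED_VARS="GH_TOKEN GITHUB_TOKEN"

# cred_vars_set: print the name of every listed variable that is present
# and non-empty in the exported environment.
cred_vars_set() {
  for name in $CRED_VARS; do
    if [ -n "$(printenv "$name")" ]; then
      printf '%s\n' "$name"
    fi
  done
}

# preflight_ok: succeed only when none of the listed variables is set.
preflight_ok() {
  [ -z "$(cred_vars_set)" ]
}
```

A runner could call `preflight_ok` once at start-up and refuse to continue when it fails, printing the output of `cred_vars_set` so the operator can see which variables to clear.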
| |
| ## Keeping specs honest: the update beat |
| |
| Not every contribution comes through the loop — people land new skills and |
| tools the normal way. When that happens the specs fall behind the code. |
| The **update** beat is the fix: it inventories `.claude/skills/`, |
| `tools/`, and `docs/modes.md`, diffs that against the specs, and back-fills |
| or corrects the specs (a `proposed` area that now has a shipped skill |
| becomes `experimental`; a drifted *Where it lives* is corrected; genuinely |
| new functionality gets a new topic-named spec). It edits **only** the spec |
| directory — it documents reality, it doesn't change it — and lands as one |
| reviewable `spec/sync-specs` PR. Run it after a batch of normal PRs merges, |
| or on a schedule. |
| |
| ## Layout |
| |
| ```text |
| tools/spec-loop/ |
| ├── README.md operator quickstart |
| ├── AGENTS.md loop-scoped operational context |
| ├── loop.sh the runner (plan / build / update / consolidate) |
| ├── PROMPT_plan.md gap analysis → plan |
| ├── PROMPT_build.md implement one work item on its own branch |
| ├── PROMPT_update.md back-fill specs from contributed code |
| ├── PROMPT_consolidate.md shrink the plan |
| ├── IMPLEMENTATION_PLAN.md prioritised work items (the gaps) |
| └── specs/ functional description of the product |
| ├── overview.md |
| ├── triage-mode.md mentoring-mode.md drafting-mode.md pairing-mode.md |
| ├── security-issue-lifecycle.md privacy-llm-gate.md |
| ├── agent-isolation-sandbox.md cve-tooling.md |
| ├── adoption-and-setup.md adapters.md |
| └── meta-and-quality-tooling.md |
| ``` |
| |
| ## Quick start |
| |
| ```bash |
| # 1. See what's out of sync, then read the plan it writes. |
| ./tools/spec-loop/loop.sh plan 1 |
| $EDITOR tools/spec-loop/IMPLEMENTATION_PLAN.md |
| |
| # 2. Build the top work item (one branch, one commit) and stop. |
| ./tools/spec-loop/loop.sh 1 |
| |
| # 3. Review the branch it produced, then push + open the PR yourself. |
| git log --oneline -1 |
| git push -u origin spec/<slug> |
| gh pr create --web --base main --head spec/<slug> --title "…" --body-file … |
| |
| # Later: someone merged skills outside the loop — resync the specs. |
| ./tools/spec-loop/loop.sh update 1 |
| ``` |
| |
| Stop any run with `Ctrl+C` or `touch STOP`. By default the loop forks |
| work items from the branch you start it on (typically `main`); set |
| `SPEC_LOOP_BASE` to build on top of a different branch. Set |
| `SPEC_LOOP_AGENT` when the Claude-compatible agent CLI is installed |
| under a command name other than `claude`. Set `SPEC_LOOP_PR_LIMIT` to |
| change how many open PRs are included in the duplicate-work check. |
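
For orientation, the settings described above could be resolved roughly as follows. This is a sketch under stated assumptions: the helper names are hypothetical, the `claude` default comes from the prose above, and the directory in which the real runner looks for the `STOP` file is a guess (the current directory here).

```bash
# Hypothetical helpers; the real loop.sh may resolve these differently.

# agent_command: print the agent CLI to invoke, defaulting to `claude`
# when SPEC_LOOP_AGENT is unset or empty.
agent_command() {
  printf '%s\n' "${SPEC_LOOP_AGENT:-claude}"
}

# base_branch: print SPEC_LOOP_BASE when set, otherwise the fallback given
# as $1 (a runner would pass the branch it was started on).
base_branch() {
  printf '%s\n' "${SPEC_LOOP_BASE:-$1}"
}

# stop_requested: succeed when a STOP file exists in directory $1
# (default: the current directory).
stop_requested() {
  [ -e "${1:-.}/STOP" ]
}
```

A runner built this way would check `stop_requested` between iterations, so `touch STOP` ends the run at the next iteration boundary rather than mid-change.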
| |
| ## How this composes with the framework's principles |
| |
| A loop that runs an agent unattended sounds, at first, like the opposite |
| of human-in-the-loop. The branch-per-feature constraint is the |
| reconciliation: the loop's autonomy is bounded to *producing local |
| branches*, and the human gate sits exactly where the framework always |
| puts it — at push, at PR, at merge. Nothing the loop does is visible |
| outside the maintainer's machine until a human chooses to push it. The |
| loop is the *manual* development cycle the framework can later point to as |
| evidence; it is not, and must not become, an auto-merge. |