Introduction — why the PDCA harness exists

Index · next: 01 Render & integrate →

Before the step-by-step, this page answers the why: the problem the harness solves, the benefits it delivers, the features that deliver them, and the vocabulary the rest of the walkthrough assumes. If you just want to run something, skip to step 01 — but five minutes here makes the rest land faster.

The problem

An AI model can write a plausible fix for almost any issue in seconds. That is exactly the danger. A plausible diff is not a verified one, "looks done" is not "is correct," and a fix that passes review can still solve the wrong problem or quietly regress something else. Left unmanaged, AI-driven contribution produces a firehose of unreviewable, unverifiable, undifferentiated output — and the maintainer becomes the bottleneck and the only safety net, reading every line under time pressure.

The instinct is to "let the AI do it and review the PR." That fails at volume: review is the slowest, most fatiguing step, and a raw diff gives the reviewer nothing to anchor on — no spec to check against, no proof the bug existed, no proof it's gone, no statement of what's not proven.

The idea

Treat each contribution as one turn of a PDCA quality cycle — the Plan-Do-Check-Act improvement loop from manufacturing quality control — and automate everything in it except the few judgments only a human should make:

Plan — author a precise spec (the brief).
Do — implement against that spec, and only that spec.
Check — verify the built artifact against the spec along three axes: correctness (does it work?), conformance (does it fit the project?), and validation (is it the right thing?).
Act — periodically, improve the process itself so the gaps one cycle exposed don't recur.

A deterministic driver moves the work through this cycle and stops only at three irreducible human touch points — authoring the Plan, signing off the Check, and the cross-cycle Act review. Everything between is automated; nothing ships without the human sign-off.

The full reasoning behind the model lives in the vendored spec at ../template/PCDA/quality-cycle/ (files 00–10). This walkthrough is the applied counterpart.

Benefits

Verifiable correctness, not vibes. Each cycle ships proof: a test that is red before the fix and green after it. That red→green check is the one thing allowed to block a sign-off — so "verified" has a concrete, deterministic meaning instead of being a reviewer's gut feel.

The human stays in the loop — at three points, not every line. Plan authoring, Check sign-off, and Act are reserved for human judgment. Everything else runs unattended. The maintainer adjudicates a short, specific checklist (§6 NEEDS-HUMAN) instead of re-reading a raw diff.

No silent failures. A check that can't run, or fails for reasons outside the fix's control, doesn't get swept under the rug or block a good fix — it surfaces as an explicit NEEDS-HUMAN item for a person to adjudicate. The gating path that decides "can this be accepted?" contains no LLM at all — it reads only deterministic gate results.
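One way to picture a gating path with no model in it is as a pure function over gate results. The sketch below is an assumption-laden illustration: the `Outcome` and `GateResult` types, the gate names, and the `adjudicated` set are all hypothetical and are not the harness's actual data model.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    PASS = "pass"
    FAIL = "fail"
    COULD_NOT_RUN = "could_not_run"  # e.g. missing tool or broken environment


@dataclass(frozen=True)
class GateResult:
    name: str
    gating: bool  # True: a failure blocks sign-off; False: advisory only
    outcome: Outcome


def needs_human(results: list[GateResult]) -> list[str]:
    """Gates that could not run become explicit items for a person."""
    return [r.name for r in results if r.outcome is Outcome.COULD_NOT_RUN]


def can_accept(results: list[GateResult], adjudicated: set[str]) -> bool:
    """Decide from gate results and human adjudications only; no model call."""
    gating_failed = any(r.gating and r.outcome is Outcome.FAIL for r in results)
    open_items = [n for n in needs_human(results) if n not in adjudicated]
    return not gating_failed and not open_items


# Hypothetical results: the regression test passed, the linter could not run.
results = [
    GateResult("regression-test", gating=True, outcome=Outcome.PASS),
    GateResult("lint", gating=False, outcome=Outcome.COULD_NOT_RUN),
]
```

Here `can_accept(results, set())` is False because the lint item is still open, and it becomes True once a person has adjudicated it, for example `can_accept(results, {"lint"})`.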

AI coding best practices, codified — not tribal knowledge. The harness bakes in discipline that's usually left to an individual's habits: implement against a written spec, not vibes (the brief); prove correctness with a test that's red-before/green-after, not a model's say-so; scope the builder's tools narrowly so it can edit and run tests but never merge or open a PR unsupervised (step 04); decorrelate the reviewer from the builder — a different model family, no visibility into the builder's notes — so review isn't the same blind spot checking itself (step 05); and ship every fix as a draft PR by default, so a human is always the one who makes it live (step 05). None of it depends on a maintainer remembering to ask for it on any given contribution.

The process gets better over time. The Act beat turns recurring misses — something a maintainer caught at review that no gate owned — into permanent improvements: a new gate, a new brief field, a tightened rule. Each future cycle starts from a better baseline. (The walkthrough shows a real example: a maintainer's review comment about a translation-manifest rule became a proposed deterministic gate — see step 06.)

Scale without losing rigor. Because state lives in files (not a database), the driver is idempotent and resumable, and one Plan session can brief several issues that then build unattended and queue up for a fast, cheap-first sign-off burn-down. You review many contributions in one focused sitting.
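File-derived state is what makes the driver resumable: re-running it recomputes the same answer from disk. A minimal sketch follows, assuming hypothetical artifact names. Only `patch.diff` is named in this guide; `brief.md` and `SUMMARY.md` are placeholders, and the CHECKED state is omitted for brevity.

```python
import tempfile
from pathlib import Path

# Checked from most to least advanced. File names other than patch.diff
# are hypothetical placeholders, not the harness's real artifact names.
MARKERS = [
    ("AWAITING_SIGNOFF", "SUMMARY.md"),
    ("BUILT", "patch.diff"),
    ("PLANNED", "brief.md"),
]


def derive_state(bundle: Path) -> str:
    """Compute a bundle's state purely from which artifact files exist."""
    for state, filename in MARKERS:
        if (bundle / filename).exists():
            return state
    return "UNPLANNED"


def demo() -> list[str]:
    """Walk a throwaway bundle forward, recording the state after each step."""
    seen = []
    with tempfile.TemporaryDirectory() as tmp:
        bundle = Path(tmp) / "issue_demo"
        bundle.mkdir()
        seen.append(derive_state(bundle))
        for filename in ("brief.md", "patch.diff", "SUMMARY.md"):
            (bundle / filename).touch()
            seen.append(derive_state(bundle))
    return seen
```

Because `derive_state` reads nothing but the filesystem, calling it twice on an unchanged bundle gives the same result, and an interrupted run can pick up wherever the files say it left off.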

Cheap to adopt and cheap to learn. A full cycle can be rehearsed entirely offline — stub models, stub gates, no network — so you can prove the plumbing (and learn the state machine) before spending a single token. See step 02.

Project-agnostic, drop-in. The harness ships the contract (state machine, artifact templates, the gate interface) — not your repo's specifics. You render it into any project and concretize it once; the model logic is reused, the project-specific parts are yours.

Features (and where you meet them)

| Feature | What it is | Walkthrough step |
| --- | --- | --- |
| Copier template | Render the harness into any repo; pull template updates later with `copier update` | 01 |
| `pdca.toml` wiring | One config file binds the driver to your repo's leaves (the per-beat commands) and gates | 01 |
| `INTEGRATION.md` (11 items) | The contract for concretizing the generic model to your tracker, branches, fixtures, ruleset | 01 |
| File-derived state machine | A bundle's state is computed from which files exist: no DB, fully resumable | 02 |
| Offline rehearsal | `pdca flow <id> --rehearse` drives the whole flow with stubs: instant, free | 02 |
| The brief | A parsed, field-structured spec every later beat consumes | 03 |
| Headless builder, narrow tools | The Do leaf can edit and run tests but not open PRs: STOP discipline by capability | 04 |
| Deterministic gates | Your check commands, each tagged gating (blocks) or advisory (informs); baseline-diffed | 05 |
| Decorrelated reviewer | An advisory second-opinion leaf that never sees the builder's notes (ideally a different model family) | 05 |
| The assembled SUMMARY | brief + gates + review folded into one 10-section verdict, with a §6 NEEDS-HUMAN checklist | 05 |
| Sign-off + C6 guard | Four dispositions; `--accept` is refused while any NEEDS-HUMAN item is open | 05 |
| Iterate carry-forward | A rejection archives the attempt and folds the reason into the next build | 05 |
| Draft-PR publish | Contribute the accepted fix as a draft; a human marks it ready (except non-final wave PRs under opt-in `wave_mode = "merge"`) | 05 |
| Cross-cycle Act loop | Periodic review that turns recurring misses into spec/gate/rule deltas | 06 |
| Console-script front door | `pdca flow <id>` orchestrates the whole cycle; `pdca flow <ids…>` batches; bare `pdca` is status; `make` is bootstrap-only | README |

Vocabulary

Five terms recur throughout. Learn them once:

Bundle — the folder for one contribution, results/issue_<id>/. All its artifacts (brief, patch, gates, SUMMARY) live here; its state is derived from which of those files exist.
Beat — one phase of the cycle: Plan, Do, Check, Act.
Leaf — the command a beat runs (e.g. the builder leaf, the reviewer leaf). Configured in pdca.toml; may be a real model invocation or an offline stub.
Gate — a deterministic Check command with a pass/fail result. Gating gates block a sign-off; advisory ones only inform.
Disposition — the human's §9 verdict: accept, iterate-do, iterate-plan, or discontinue.

The states a bundle moves through:

stateDiagram-v2
    [*] --> UNPLANNED
    UNPLANNED --> PLANNED: brief authored (Plan)
    PLANNED --> BUILT: patch.diff written (Do)
    BUILT --> CHECKED: gates + review done (Check)
    CHECKED --> AWAITING_SIGNOFF: SUMMARY assembled, driver STOPS
    UNPLANNED --> RESOLVED: tracker settles it first, no cycle ever ran
    AWAITING_SIGNOFF --> COMPLETE: accept
    AWAITING_SIGNOFF --> PLANNED: iterate-do (rebuild, same brief)
    AWAITING_SIGNOFF --> UNPLANNED: iterate-plan (re-spec)
    AWAITING_SIGNOFF --> DISCONTINUED: discontinue
    COMPLETE --> [*]
    DISCONTINUED --> [*]
    RESOLVED --> [*]

The five halted states (UNPLANNED, AWAITING_SIGNOFF, COMPLETE, DISCONTINUED, RESOLVED) are where the driver hands control back to a human or stops. Everything else it advances through on its own. RESOLVED is the newest of the five: a briefless bundle whose tracker item was closed (duplicate, wontfix, fixed elsewhere) before anyone authored a brief, so it exits the pending set without ever entering a cycle. That distinguishes it from DISCONTINUED, where a human explicitly abandoned a bundle that did run a cycle.
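The transitions listed above can be written down as two small lookup tables. This is a sketch of the state machine as described on this page, not the driver's real implementation; the function names `advance` and `sign_off` are hypothetical.

```python
# States where the driver stops or hands control back to a human.
HALTED = {"UNPLANNED", "AWAITING_SIGNOFF", "COMPLETE", "DISCONTINUED", "RESOLVED"}

# Transitions the driver makes on its own.
AUTO = {
    "PLANNED": "BUILT",              # Do: patch.diff written
    "BUILT": "CHECKED",              # Check: gates + review done
    "CHECKED": "AWAITING_SIGNOFF",   # SUMMARY assembled
}

# Human dispositions, valid only from AWAITING_SIGNOFF.
DISPOSITIONS = {
    "accept": "COMPLETE",
    "iterate-do": "PLANNED",         # rebuild against the same brief
    "iterate-plan": "UNPLANNED",     # re-spec
    "discontinue": "DISCONTINUED",
}


def advance(state: str) -> str:
    """Follow automated transitions until a halted state is reached."""
    while state not in HALTED:
        state = AUTO[state]
    return state


def sign_off(state: str, disposition: str) -> str:
    """Apply a human disposition; refuse it anywhere but AWAITING_SIGNOFF."""
    if state != "AWAITING_SIGNOFF":
        raise ValueError(f"cannot sign off from {state}")
    return DISPOSITIONS[disposition]
```

For example, `advance("PLANNED")` runs straight through to `"AWAITING_SIGNOFF"`, and `advance(sign_off("AWAITING_SIGNOFF", "iterate-do"))` loops back to the same stop after a rebuild. The tracker-driven UNPLANNED to RESOLVED transition is left out because it is triggered from outside the cycle.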

Is this for you?

The harness fits when you are funnelling AI-generated contributions into a real codebase and need each one verified, scoped, and signed off — especially at volume, across multiple branches, or into a project with real conformance rules (tests, linters, contribution conventions). The worked example throughout this guide — Gramps Testbed v2, which contributes fixes upstream to the Gramps genealogy project and its addons — is exactly that shape.

It's heavier than you need for a one-off script or a solo throwaway. The payoff is in repeated contribution where correctness, traceability, and a maintainer's limited attention all matter.

Ready to set it up against a real repo? → step 01.
