Step 05 — Check: gates, reviewer, and the assembled SUMMARY
← 04 Do · Index · next: 06 Sign-off →
Beat: Check. This is the substance of the harness. Check verifies the built
artifact against the spec along three axes — the "5/5/1" — using three
mechanisms that run in order, then assembles everything into one document for the
human: SUMMARY.md.
BUILT ─► gates (deterministic) ─► reviewer (advisory) ─► assemble SUMMARY ─► AWAITING_SIGNOFF
         check-gates.{md,json}    check-review.md                            (driver stops here)
The 5/5/1
Check asks three different kinds of question:
- 5 correctness: does it work?
  C1 spec · C2 reproduction (red pre-fix) · C3 change · C4 verification (green post-fix) · C5 causal adequacy.
- 5 conformance: does it fit the project?
  T1 structure · T2 shape · T3 runtime · T4 contribution · T5 judgment.
- 1 validation: is it the right thing? (scope, success criterion, root cause vs symptom). This one is a human call.
Of these, the deterministic ones run as gates; C5, T5, and validation are
inherently judgment and route to the reviewer and to you.
1. Gates — the deterministic oracles
The gates you wired in step 01 run automatically.
Each emits a row: check, result, oracle, and whether it's gating (conformance
rows also name the rule that fired). Here is the
real results/issue_11589/check-gates.md:
# Check gates — issue_11589
**Overall (gating): pass**
## Correctness (5)
| Check | Result | Oracle | Gating |
|---|---|---|---|
| C4 fix verified: test red pre-fix, green post-fix | pass | run-verify.sh | yes |
| (C1/C2/C3/C5 — none configured / judgment) | none | — | no |
## Conformance (5)
| Check | Result | Oracle | Rule | Gating |
|---|---|---|---|---|
| T1 structure | pass | gate.py T1 | 1 addon(s) conform | no |
| T2 shape | fail | gate.py T2 | __init__.py: no GPL header (doc16:99) | no |
| T3 runtime: core unit suite | fail | run-unit.sh | Trace/breakpoint trap (core dumped) [baseline] | no |
| T3 runtime: addon unit suites | fail | run-addon-unit.sh | pip install logs (3 failures) [baseline] | no |
| T4 contribution | pass | gate.py T4 | N/A: no commit-msg.txt | no |
Overall (gating) is pass even though three rows say fail. This is the whole
point of the gating/advisory split. The only gating check, C4 (the red→green
proof from run-verify.sh), passed, so the contribution passes on correctness.
The failing rows are all advisory: a GPL-header gap in a file the patch never
touched, and two runtime suites failing with [baseline] signatures (a
pre-existing crash in gramps core and an environmental pip issue). They don't
block, but they don't vanish either: they become NEEDS-HUMAN items.
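The gating/advisory split can be sketched in a few lines. This is a minimal illustration, not the harness's actual code: the `GateRow` type and both function names are hypothetical, and it assumes that only a gating row with result `fail` blocks. The rows mirror the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateRow:
    check: str
    result: str   # "pass", "fail", or "none"
    gating: bool


def overall_gating(rows):
    """Fail only if some gating row failed; advisory rows never affect this."""
    blocked = any(r.gating and r.result == "fail" for r in rows)
    return "fail" if blocked else "pass"


def needs_human(rows):
    """Advisory failures are not dropped; they are surfaced for a human."""
    return [r.check for r in rows if not r.gating and r.result == "fail"]


# Rows copied from check-gates.md for issue_11589.
rows = [
    GateRow("C4 fix verified", "pass", True),
    GateRow("T1 structure", "pass", False),
    GateRow("T2 shape", "fail", False),
    GateRow("T3 runtime: core unit suite", "fail", False),
    GateRow("T3 runtime: addon unit suites", "fail", False),
    GateRow("T4 contribution", "pass", False),
]

print(overall_gating(rows))  # pass
print(needs_human(rows))
```

Flipping the C4 row to `fail` is the only change to this input that turns the overall result to `fail`.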
[baseline] vs [delta]. gramps' T3 gate baseline-diffs: a known pre-existing failure is tagged [baseline] (ignore it; it is not your fix's fault); a new failure is tagged [delta] (your fix may have caused it). You'll see a [delta] bite in step 06.
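A baseline diff of this kind can be sketched as a set comparison. How the real T3 gate matches failures is not described here, so this sketch assumes exact matching on a failure signature string; `tag_failures` and the `test_example_new` signature are hypothetical.

```python
def tag_failures(current, baseline):
    """Tag each current failure as [baseline] (known) or [delta] (new)."""
    return {
        sig: "[baseline]" if sig in baseline else "[delta]"
        for sig in sorted(current)
    }


# Signatures recorded before the fix was applied (assumed input).
baseline = {"Trace/breakpoint trap (core dumped)"}

# Signatures seen with the fix applied; the second one is made up.
current = {"Trace/breakpoint trap (core dumped)", "test_example_new"}

for sig, tag in tag_failures(current, baseline).items():
    print(tag, sig)
```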
2. Reviewer — the decorrelated second opinion
Next the reviewer leaf runs against {patch.diff, test, brief.md, check-gates.json} — not build-notes.md (step 04 explained why).
It re-runs the asserted evidence (stash → confirm red, unstash → confirm green),
re-checks that cited path:lines exist on the target branch, and flags scope
creep. Its output, check-review.md, is advisory — it annotates, it never
gates. The blocking path contains no LLM at all.
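The red/green re-run can be sketched as follows. This is an illustration under stated assumptions, not the reviewer's implementation: `evidence_holds` is a hypothetical name, and the three callables stand in for whatever the reviewer actually uses to stash the patch, restore it, and run the test.

```python
def evidence_holds(run_test, stash, unstash):
    """Confirm the test is red without the patch and green with it.

    run_test() returns True when the test passes. stash() removes the
    patch from the working tree and unstash() restores it.
    """
    stash()
    try:
        red_without_patch = not run_test()
    finally:
        unstash()  # always restore the patch, even if the test run raises
    green_with_patch = run_test()
    return red_without_patch and green_with_patch


# Toy working tree: the test passes only while the patch is applied.
tree = {"patched": True}
ok = evidence_holds(
    run_test=lambda: tree["patched"],
    stash=lambda: tree.update(patched=False),
    unstash=lambda: tree.update(patched=True),
)
print(ok)  # True
```

A test that is already green without the patch makes this return `False`, which is the case the reviewer is meant to catch: the test never reproduced the bug.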
3. Assembly — the SUMMARY the human signs
The driver folds brief + gates + review into SUMMARY.md, a 10-section document.
The two sections that drive the human decision are §6 NEEDS-HUMAN (what you
must clear) and §9 Check sign-off (where you record the verdict — step
06). When SUMMARY.md exists with an empty §9, the bundle is
AWAITING_SIGNOFF and the driver stops.
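The AWAITING_SIGNOFF condition can be sketched as a check on the SUMMARY text. This assumes that section headings follow the `## <n>. <title>` form seen in §6 and that "empty" means whitespace only; both function names are hypothetical, and the §8 and §10 titles in the example are placeholders.

```python
import re


def section_body(summary, number):
    """Text under the '## <number>.' heading, up to the next '## ' heading."""
    pattern = rf"^## {number}\.[^\n]*\n?(.*?)(?=^## |\Z)"
    match = re.search(pattern, summary, flags=re.MULTILINE | re.DOTALL)
    return match.group(1) if match else None


def awaiting_signoff(summary):
    """True when SUMMARY.md exists and its section 9 has no content yet."""
    if summary is None:          # SUMMARY.md has not been assembled
        return False
    body = section_body(summary, 9)
    return body is not None and body.strip() == ""


# Placeholder section titles, except "Check sign-off".
text = "## 8. Placeholder\nnotes\n## 9. Check sign-off\n\n## 10. Placeholder\nmore\n"
print(awaiting_signoff(text))  # True
```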
Here is the real §6 from issue 11589 — note these are the advisory gate failures
turned into explicit, adjudicable questions (shown already cleared, - [x]):
## 6. NEEDS-HUMAN — items the human must clear before sign-off
- [x] T2 — Gate failed on `__init__.py: no GPL licence header` — but no
`__init__.py` appears in `patch.diff`. Both files the patch *does* touch are
shape-clean. The violation sits in an untouched bundle file → human must decide
whether the pre-existing gap blocks this contribution.
- [x] T3 — All three runtime suites are `fail` but the failure modes are not
plausibly caused by a 2-file addon change: core unit `Trace/breakpoint trap
(core dumped)` (segfault in gramps core, which the diff never touches),
addon-unit `pip install logs` (env), interface `_ErrorHolder`. Marked
"whole-suite baseline" — human must diff against baseline to confirm these are
pre-existing, not regressions.
- [x] T5 — Always-human element. Overall contribution judgment — code quality,
idiom fit, whether the sibling-detection approach is the right shape.
- [x] V — Validation — Whether the change solves the user's problem against the
*real* FilterRules pack layout (13 rule pairs in one dir), not just the temp-dir
stand-in, is a fitness-to-purpose call only the human can make.
This is the harness working as designed: the deterministic gate proved
correctness (C4 green) and surfaced everything it couldn't adjudicate — a
header gap of ambiguous scope, environmental test noise, and the two irreducibly
human checks (T5 judgment, V validation) — as a short, specific checklist.
The human isn't re-reading the whole diff; they're answering four pointed
questions.
The C6 accept-guard
One rule connects §6 to the next step: you cannot --accept while any §6 item
is unchecked. That's the C6 guard. It's why §6 items above are - [x] — the
human worked them before accepting. (--iterate-* and --discontinue are not
guarded — you can redirect or abandon a bundle with §6 still open.)
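The guard amounts to a scan of §6 for unchecked boxes. The sketch below assumes that §6 items use the `- [ ]` / `- [x]` checkbox form shown earlier; `open_items` and `may_proceed` are hypothetical names, not the driver's API.

```python
import re

UNCHECKED = re.compile(r"\s*- \[ \]")


def open_items(section6):
    """Return the §6 lines that still carry an unchecked box."""
    return [line.strip() for line in section6.splitlines() if UNCHECKED.match(line)]


def may_proceed(action, section6):
    """--accept needs every §6 item checked; other actions are unguarded."""
    if action == "--accept":
        return not open_items(section6)
    return True


section6 = "- [x] T2 header gap\n- [ ] T3 runtime suites\n"
print(may_proceed("--accept", section6))       # False
print(may_proceed("--discontinue", section6))  # True
```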
State: AWAITING_SIGNOFF. Your move — step 06.