The Check beat (01 - The Quality Cycle §Check) has internal structure — the 5 / 5 / 1 (correctness chain, conformance stack, validation act) — and runs through three components (gates, reviewer, sign-off — see 03 - Cycle Automation §Check). This doc documents the code and process that implements those components: which tool covers which 5/5/1 element, where that tool lives, and how single-sourcing keeps the driver and CI invoking the same implementation. Worked example uses the Gramps testbed at the end; the structure is project-agnostic. Living document.
What "validation tooling" means here
"Validation tooling" is shorthand for the implementation of Check's deterministic gates and advisory reviewer. Check has three components:
- Gates (deterministic, full automation) — the validators, rule scanners, suite runners, and hooks that produce
check-gates.json. - Reviewer (advisory, full automation) — the cross-vendor model that grounds the gate evidence, re-runs the asserted red/green, and emits per-item
PASS / FAIL / NEEDS-HUMANintocheck-review.md. - Sign-off (instrumented, human) — the human completes Check by reading the assembled
SUMMARY.mdand recording §9.
This doc is about the gates and reviewer. Sign-off is human work whose tooling is the result-document presentation in 02 - Cycle Artifacts / 03 - Cycle Automation §Check sign-off, not the subject here.
Two axes — what and where
Validation tooling decomposes along two orthogonal axes:
- What it evaluates — which element of Check's 5/5/1 it covers. The 5/5/1 is the what:
- Correctness chain (5 steps): spec → reproduction → change → verification → causal adequacy.
- Conformance stack (5 tiers): structure → shape → runtime → contribution → judgment.
- Validation (1 act): fitness-to-purpose.
- Where it lives — which home runs the tool. The home is the where:
- Upstream project CI (the project that owns the contribution ruleset gates each PR there).
- Local driver / dev-tooling (the same gates run pre-merge on the contributor's machine, single-sourced with upstream CI).
- Fork-local hooks (
commit-msg, pre-push) plus fork PR CI. - Check's reviewer component (advisory model + tool scope).
- Check's sign-off step (human at the result-document review).
What and where are independent. A given conformance tier may be implemented as code (covering "what") and live in upstream CI (covering "where") — but the same tier's implementation can also mirror locally, so "where" can be multiple homes for one "what". The two-axis framing makes the locations explicit instead of letting tooling drift to "wherever it was first written."
The 5/5/1 × tooling-shape matrix
Each element of the 5/5/1 maps to a tooling shape, and each tool lands in one of Check's three components.
| 5/5/1 element | Tooling shape | Check component | Artifact written |
|---|---|---|---|
| Correctness 1 — Spec | the brief (no code) | (Plan output, Check input) | brief.md |
| Correctness 2 — Reproduction | fixture loader + repro runner; pre-fix red proof | Gates | row in check-gates.json |
| Correctness 3 — Change | the patch (no code) | (Do output, Check input) | patch.diff |
| Correctness 4 — Verification | the shipped test + regression suite | Gates | rows in check-gates.json |
| Correctness 5 — Causal adequacy | judgment (symptom vs. root cause) | Reviewer (advisory), human at sign-off | row in check-review.md; §6 NEEDS-HUMAN if unresolvable |
| Conformance T1 — Structure | structural validator (stdlib + filesystem + spec-format exec-shim) | Gates | rows in check-gates.json |
| Conformance T2 — Shape | shape scanner (semgrep, AST) | Gates | rows in check-gates.json |
| Conformance T3 — Runtime | dependency resolution (find_spec, install-and-import) + clean-env suite |
Gates | rows in check-gates.json |
| Conformance T4 — Contribution | commit-msg hook, branch-target check, version-bump check |
Gates (mostly fork-local + fork PR CI) | rows in check-gates.json |
| Conformance T5 — Judgment | scope, one-logical-fix, message-from-user-perspective | Reviewer (advisory), human at sign-off | row in check-review.md; §6 NEEDS-HUMAN if unresolvable |
| Validation (1 act) — fitness-to-purpose | "is this the right thing at all" | Reviewer (advisory), human at sign-off | row in check-review.md; §6 NEEDS-HUMAN; §9 sign-off |
Three observations the matrix makes explicit:
- Correctness 1 and 3 carry no tooling row — they are inputs to Check (the brief from Plan, the patch from Do), not work Check performs. They appear in the matrix to keep the chain complete.
- Tiers 1–4 of conformance plus correctness steps 2 and 4 are the gates path — fully mechanical, fully automated, the only thing that blocks accept.
- Correctness step 5, Conformance Tier 5, and the validation act are the judgment path — the reviewer attempts each advisory, then the human signs off in §9. The reviewer never gates; the human never edits the fix at Check time (01 - The Quality Cycle §Where the stages touch).
Gate result vocabulary. A gate-path row's result is one of pass / fail / unverifiable / none. A gate passes iff its command exits 0 and fails on any other exit. A gate may instead declare itself unverifiable when it genuinely cannot run its mechanical check (as opposed to running and failing) — exit code 77 (the automake SKIP convention) or print a line containing PDCA-UNVERIFIABLE: <reason> (the marker wins over the exit code, so a gate may exit 0 and still defer; the text after the marker is the recorded reason). An unverifiable row does not count toward overall — it neither silently passes nor hard-fails — and the driver routes it into SUMMARY.md §6 NEEDS-HUMAN, where the C6 accept-guard (06 - Quality-Cycle Guidelines §C6) makes the human clear it before accept. This is the mechanism behind "a green mechanical check is not a correctness verification": a gate with nothing to verify says so rather than manufacturing a green. (none is the separate non-gating matrix-alignment cell decided on the judgment path.)
Single-sourcing — one implementation, multiple invocations
The connection to 03 - Cycle Automation §Where it runs is load-bearing: the gates that run locally during the cycle MUST be the same gates that re-run in CI as the merge-gate. This is enforced not by policy but by single-sourcing the implementation:
- One repository / module owns each tool (the structural validator, the shape scanner's rule file, the runtime checker, the contribution hooks, the correctness re-runners).
- The local driver invokes it with one command.
- CI invokes the same command (against the actual PR).
- No regex copy-pasted across YAML files, no parallel implementation in dev-tooling and CI both, no hand-maintained dependency lists in more than one place.
The anti-pattern this prevents: tooling drift between local and CI, where a contribution passes locally and fails in CI (or vice versa) because the two invocations are different code. Both invocations must read the same single source, so "passes locally" and "passes CI" collapse into the same fact.
Homes — where each gate lives
Conformance Tier-by-tier home assignments differ by project, but the generic rationale follows a small set of rules:
| Home | Lives there because | Tools that fit |
|---|---|---|
| Upstream project CI | Gates every contribution PR; zero-config for contributors; canonical | T1 Structure, T3 Runtime — when the upstream project will host them |
| Local driver / dev-tooling (mirror) | Pre-merge feedback; runs the same gates the upstream CI runs; cycle's inner loop | Any gate that needs to run before a PR is opened — typically a mirror of T1/T3 + the project's correctness re-runners |
| Local driver / dev-tooling (staging) | Gates the project's CI doesn't host yet — staged until upstream accepts | T2 Shape (often semgrep, which an upstream may not yet depend on) |
| Fork-local hooks | Fire pre-commit / pre-push, before the artifact reaches a PR | T4 Contribution: commit-msg format, signing |
| Fork PR CI | Runs on PR open; gates branch-target, version-bump, etc. | T4 Contribution: branch-target, version-bump |
| Check's reviewer component | Judgment cells that can't be mechanized | C5 causal adequacy, T5 judgment, validation act (advisory) |
| Check's sign-off step | Final human call on judgment + clearing NEEDS-HUMAN | All of the reviewer's path, finalized |
The home that does NOT exist: there is no "Act home" for gates. Act improves the rules that gates enforce (01 - The Quality Cycle §Act); the gates themselves run in Check. A new rule lands as an addition to one of the homes above, recorded in process/act-log.md.
Two rule families (project-agnostic)
Validation tooling typically splits into two families that share tools but must not be merged in the layout:
- Family A — project-guideline conformance. The project's own written ruleset for contributions to itself. Subject: contribution artifacts (patches, packages, addons, plugins) against the project's written rules. The 5/5/1 above (especially the conformance stack) is Family A.
- Family B — upstream-defect analysis. Bug-hunting analyzers against the source the project depends on. Subject: upstream code; findings become upstream PRs, not contributions to this project. Family B uses the same tool families (semgrep, type-checkers, flow analysis) but its rules and rule-targets differ.
The split matters because Family B's analyzers ARE NOT conformance checks. A semgrep rule that hunts for missing disconnect() in upstream Gtk code is a bug-hunting rule, not a "Tier 2 shape" rule under the project's own conformance stack. Sharing a layout collapses the distinction; keeping them in separate directories preserves it.
Worked example — Gramps testbed dev-tooling/
Illustrative, not normative. This is one project's filled matrix at an earlier snapshot; the paths (
agent-work/dev-tooling/...) are the Gramps testbed's of that time (since reorganized underengine//dev-tooling/), and the rules (doc-16, semgrep shape rules, the gramps maintenance branches) are gramps's. The generic harness — the single-sourced gate runner that producescheck-gates.json/.mdand renders the full 5/5/1 matrix — ships[built]; what this table's "Status today" column tracks is the project's per-tier rule-writing, which stays the project's work.
The Gramps testbed instantiates this generic structure as follows. References:
- Family A =
addons-sourceconformance (the addon-dev guidelines / "doc 16" ruleset). - Family B =
grampscore defect analysis (None-flow, init-order, missing-disconnect).
Tier × home assignment (testbed slice of Family A)
| Tier | Representative rules | Mechanism | Home in this project | Status today |
|---|---|---|---|---|
| 1 — Structure | folder==id; gramps_target_version present; fname resolves; no __init__.py in addon dir; tests/__init__.py exists; po/template.pot present; TOOL has optionclass; GPL header |
stdlib + .gpr.py exec-shim + filesystem checks |
Upstream addons-source CI (gates every addon PR); testbed mirror for pre-merge |
Partly built upstream (PR #820's test_plugin_registration.py, test_addon_dependencies.py, the po/template.pot job) |
| 2 — Shape | _(f"..."); print() diagnostics; gramps.gui / plugins imports from addons; if cls is Person; direct pgettext; Optional[X]; missing DbTxn; no import register in .gpr.py |
semgrep (rule file + fixtures) | Testbed agent-work/dev-tooling/ (staging; propose upstream once zero-FP-tuned) |
Harness exists (used by Family B); addon rules not yet written |
| 3 — Runtime | requires_mod importable via find_spec (Pillow/PIL); requires_gi / requires_exe mapped; tests pass with deps absent (skip cleanly); GI pins match imports |
Install + run in a clean env | Upstream addons-source CI; testbed mirror |
Most mature: PR #820's find_spec gate, run_addon_tests.py degraded-skip, addon_system_deps.py --unmapped |
| 4 — Contribution | commit summary ≤70 / wrap 80; trailer on last line; #NNNN issue refs; full-hash refs; branch target (addon→60 / core→61); no addon version bump in maintenance PR; no merge commits; POTFILES.in sync (core) |
commit-msg hook (local) + PR CI (fork) |
The gramps and addons-source forks — commits land there, not the testbed |
Greenfield |
| 5 — Judgment | user-perspective commit; one-logical-fix scope; symptom-vs-root-cause; test actually exercises the fix; "wrap every user-visible string"; upstream-isn't-ahead (semantic) | human + advisory cross-vendor reviewer (Codex) | Check's reviewer + sign-off components (03 - Cycle Automation) | The reviewer-contract thread (advisory, decorrelated) |
Correctness chain (testbed slice)
| Step | Implementation | Home |
|---|---|---|
| 2 — Reproduction | tests/interface/* (dogtail), example.gramps fixture |
Local driver + CI (interface-tests.yml) |
| 4 — Verification | the shipped test (per cycle) + existing suite (tests/, gramps/*_test.py) |
Local driver + CI (unit-tests.yml, addon-unit-tests.yml) |
| 5 — Causal adequacy | reviewer (Codex) + human sign-off | Check's reviewer + sign-off |
Two families in the layout
agent-work/dev-tooling/ reorganized by family (instead of by tool) keeps the
two distinct:
agent-work/dev-tooling/
core-analysis/ # FAMILY B — subject: gramps CORE; findings → core PRs
pyright/ # None-flow (existing)
semgrep/ # core bug-hunting rules incl. connect-without-disconnect
codeql/ # reserved flow (existing NOTES)
README.md # "subject = core; NOT addon-dev conformance"
addon-conformance/ # FAMILY A — subject: addons-source; addon-dev tiers
lib/ # shared .gpr.py exec-shim + requires_mod extractor (single-sourced)
tier1-structure/ # stdlib + exec-shim + fs (TESTBED MIRROR of upstream)
tier2-shape/ # semgrep addon rules + fixtures (staging pre-upstream)
README.md # "canonical home = addons-source CI; this mirrors it"
pre-commit/ # hooks (existing) — Tier-4 local + analyzer pre-commits
ide/ # vscode/ + claude-commands/ (existing)
README.md # this matrix; what is NOT here (T1/3 upstream, T4 forks, T5 process)
Names are a proposal, not the point. The point: Family B stays whole and clearly upstream-source-subject; Family A is tiered and explicitly labeled a mirror of its upstream home; the shared exec-shim library has one location so single-sourcing has a place to point.
Build order (concrete deliverables, per 03 - Cycle Automation)
- The gates, single-sourced — structural validator (T1 exec-shim
- T2 semgrep), runtime gate (T3
find_spec+ clean-env suite), correctness re-runners (repro, verification, regression), T4 commit-msg + branch-target + version-bump hooks. Each callable identically by the local driver and by CI.
- T2 semgrep), runtime gate (T3
- The driver (in 03) — calls the gates, withholds
build-notes.mdfrom the reviewer, assemblesSUMMARY.md. - The contribution-batch fan-out (in 03) — uses the gates over N issues.
- The reviewer's
AGENTS.md(in 03) — fixes the judgment-path shape for C5, T5, validation.
The gates are the long pole because both the local driver and the CI re-gate depend on them existing as single-sourced code. Until they exist, Check cannot run unattended and the body cannot fan out.
What this doc is not
This doc does not specify the project's conformance rules (those are the per-repo specification — 06 - Quality Cycle Guidelines §Precondition; in the Gramps worked example, the "doc 16" addon-dev guidelines). It documents the tooling axis — what implements Check, where each piece lives, how single-sourcing connects local and CI — so a project adopting the cycle can plan its build order without ambiguity about which gate goes where.
The rules themselves are subject to Act
(01 - The Quality Cycle §Act): when a rule needs adding, retiring,
relaxing, or tightening, that change lands as an Act delta in
process/act-log.md and modifies the per-repo specification. The
tooling that applies the rules — this doc's subject — is downstream
of the rules themselves and changes only when a rule's home moves,
its mechanism changes, or single-sourcing introduces a new shared
component.