Thoughts · Methodology
Trusting AI output enough
to put your name on it.
MFIC — Mechanically-Falsifiable Independent Control is a discipline for exactly that: independent checks at every handoff that derive their verdict from the contract or the data, never from the producer’s say-so, and hold the authority to reject or halt. It is the internal-control model behind Sarbanes-Oxley (COSO) — segregation of duties, preventive and detective controls, audit trails, reasonable assurance — with the LLM (or a hurried human) as the new untrusted party.
For regulated, high-stakes work, that is the difference between “the AI said so” and an answer you can defend under audit. It’s the discipline behind everything Mecha builds — and the reason a verification result here comes with a gate, not a promise.
How to trust code or data a fallible author — an LLM, or a hurried human — could quietly corrupt. TDD's red phase gives only F (a test can fire); MFIC adds the other three. It's the COSO / Sarbanes-Oxley internal-control model, with the LLM as the new untrusted party.
Definition: an independent check at a handoff boundary that derives its verdict from the contract or the data — never from the producer's account of it — and holds the authority to reject, halt, or steer.
Litmus: if the same agent wrote both the check and the thing checked, could it pass with wrong work? Yes → gameable, not MFIC. No → MFIC.
Each word is load-bearing:
- Mechanically — cases machine-swept, not hand-picked. ¬M → omission bugs.
- Falsifiable — each case a biting refutation you can't pre-arrange. ¬F → vacuous green tests, or weak oracles ("doesn't crash" hiding wrong output).
- Independent — truth source causally independent of the producer (≡ segregation of duties: maker ≠ checker). ¬I → collusive tests sharing one blind spot; the axis TDD leaves open.
- Control — a baseline plus authority to fail or steer. ¬C → telemetry that observes but never blocks.
Two regimes:
- Static (verify the code, in CI) — exhaustion, differential, metamorphic, input-mutation, property-based, perf/leak gates. Detective.
- Dynamic (verify state in flight, in prod) — stage-boundary contracts, schema/invariant preservation, reject-to-previous loops. Preventive + corrective.
Reach for the strongest, cheapest first: finite
domain → exhaust it; inverse → round-trip
decode(encode(x))==x; reference → differential;
transform-invariant → metamorphic; can corrupt known-good input →
input-mutation (a coverage number, not a bool); else →
property-based + shrink. Prefer the oracle the author didn't
write — oracle-free beats an authored "expected value" that
could itself be wrong.
Input-mutation, specifically: flip each bit/byte of a known-valid input and assert the verifier now rejects it; the fraction detected measures how much of the format it actually constrains — turning "I validate the whole format" into a number nobody can hallucinate past. Keep it honest: pair with a specificity corpus (all valid inputs must pass, else a reject-everything checker scores 100%), and score only must-detect regions (padding / CRC-excluded bytes are legitimate don't-cares).
Guardrails: ordinary validation usually fails I (one agent defines the schema and produces against it) or C (it only logs). A warn-only monitor is M-F-I but ¬C — telemetry, not a gate; a check that fails the build is full MFIC. Evidence is constitutive — a control with no trail didn't run; version its measurement history by commit + hardware. Statistical members give reasonable assurance, not proof — the recognized standard, not a deficiency.
Prior art: mutation, metamorphic, differential, property-based testing, fault injection (software V&V); segregation of duties, preventive/detective/corrective controls, audit trail, test of controls, reasonable assurance (COSO / Sarbanes-Oxley).
This is the working summary. The full technical treatment — the independence axis in depth, the producer/approver pattern, the escalation from policy to blocking gates to cryptographic controls, and five worked examples — lives in the complete MFIC methodology.