diff --git a/DANNY_STOCKER_INFRAFABRIC_DOSSIER.md b/DANNY_STOCKER_INFRAFABRIC_DOSSIER.md
index 033833e..257935e 100644
--- a/DANNY_STOCKER_INFRAFABRIC_DOSSIER.md
+++ b/DANNY_STOCKER_INFRAFABRIC_DOSSIER.md
@@ -113,6 +113,12 @@ This dossier includes one public, reproducible proof run:
 - 0/15 bullet‑list violations in the final output.
 - 6/15 traces contain a `postprocess_applied` event with before/after SHA256, showing deterministic correction when needed (the correction itself is audited).
 
+**Where deterministic correction happens (audited):**
+- Language discipline filter (question language == response language)
+- Internal tool/sandbox leakage scrub (removes debugging artifacts)
+
+When either filter changes the final user-visible output, the trace records `before_sha256` → `after_sha256` plus counters (e.g., `tool_leak_sentences_removed`, `discourse_markers_removed`).
+
 **How to verify (no insider access):**
 - Bundle: https://infrafabric.io/static/hosted/emo_dave_proof_bundle_20251222T164352Z.tar.gz
 - Instructions: https://infrafabric.io/static/hosted/EMO_DAVE_PROOF_MODEL_COMPARE_20251222T164352Z.md
@@ -137,6 +143,8 @@ This dossier includes one public, reproducible proof run:
 - It proves the *stack* can enforce specific invariants (language + formatting) across these model tiers for these prompts, with auditable corrections when needed.
 - It does not prove the models are equivalent on clinical judgment, crisis handling, or long‑horizon reasoning. Those require separate validation and are intentionally not claimed here.
 
+**Economic implication (bounded claim):** once these invariants are enforceable by the stack, model choice becomes a routing problem (default smaller, escalate when TRIAGE demands). Any claimed cost multipliers depend on provider pricing and are not asserted here.
+
 ---
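
Reviewer note: the audited-correction behaviour added in the first hunk can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the stack's actual implementation. Only the event name `postprocess_applied`, the fields `before_sha256` and `after_sha256`, and the counter name `tool_leak_sentences_removed` come from the dossier text. The functions `scrub_tool_leaks` and `postprocess`, the `[sandbox]` marker, and the event layout are hypothetical. The language discipline filter is not modelled here.

```python
import hashlib
import re
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical filter shape: text in, (possibly changed text, counters) out.
Filter = Callable[[str], Tuple[str, Dict[str, int]]]


def sha256_hex(text: str) -> str:
    """Hex SHA256 of the UTF-8 encoding of text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def scrub_tool_leaks(text: str) -> Tuple[str, Dict[str, int]]:
    """Toy stand-in for a leakage scrub: drops sentences that contain
    the made-up marker '[sandbox]'. The real filter's rules are not
    described in the dossier."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    kept = [s for s in sentences if "[sandbox]" not in s]
    removed = len(sentences) - len(kept)
    if removed == 0:
        # Leave clean text byte-for-byte unchanged so no event is logged.
        return text, {"tool_leak_sentences_removed": 0}
    return " ".join(kept), {"tool_leak_sentences_removed": removed}


def postprocess(draft: str, filters: List[Filter]) -> Tuple[str, Optional[dict]]:
    """Run filters in order; return the final text and an audit event,
    or None when the user-visible output did not change."""
    before = sha256_hex(draft)
    text = draft
    counters: Dict[str, int] = {}
    for apply_filter in filters:
        text, filter_counters = apply_filter(text)
        counters.update(filter_counters)
    after = sha256_hex(text)
    if after == before:
        return text, None
    event = {
        "event": "postprocess_applied",
        "before_sha256": before,
        "after_sha256": after,
        "counters": counters,
    }
    return text, event


final_text, event = postprocess(
    "Here is the answer. [sandbox] exit code 0. Take care.",
    [scrub_tool_leaks],
)
```

In this sketch an event is emitted only when the hash of the final text differs from the hash of the draft, which mirrors the dossier's statement that the trace records a correction when a filter changes the user-visible output. Because the filters are deterministic, re-running them on the same draft reproduces the same `after_sha256`.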