Week Feedback Summary (LLM Panel) — 2025-12-27
Source: internal CSV export (@ShadowRT-LLM-Feedback)
This is a synthesis of cross-model feedback (Grok, Gemini 1.5 Pro/Flash, GPT-5.2) over the Mon–Sun TV-week stress test packs. It is intended to drive patches to the generator + bible without widening scope.
Themes (cross-day)
- P0: Ensure every dossier has usable “body” sections (some HTML→MD sources collapsed into “cover + inferred mermaids only”, losing mirror integrity and Action Pack utility).
- P0: Control Card / header hygiene: extracted headings sometimes become paragraph-length; this breaks scanability and Jira/backlog export.
- P0: Edition isolation: Action Pack logic can “bleed” across domains (e.g., SaaS controls reused for hardware tokens) unless gates/owners/evidence are domain-aware.
- P1: Mirror payload completeness: tables/licensing tiers and high-signal numeric claims should be preserved and turned into enforceable questions/gates, not summarized away.
- P1: Operational concreteness: “telemetry” and “machine-checkable prerequisites” land well, but reviewers want minimum schemas (event type, freshness window, owner) to reduce hand-waving.
- P2: Prioritization: add lightweight severity ranking so “all Dave Factors” don’t read equally critical.
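The P1 request for minimum schemas (event type, freshness window, owner) could be met with something as small as the sketch below. The class name, field names, and example values are assumptions for illustration, not existing generator code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class TelemetryPrerequisite:
    """Hypothetical minimum schema for one machine-checkable prerequisite."""

    event_type: str              # illustrative value below, not from a source
    freshness_window: timedelta  # how recent the latest event must be
    owner: str                   # team or role accountable for the signal

    def is_fresh(self, last_seen: datetime, now: datetime) -> bool:
        """Return True when the latest event falls inside the freshness window."""
        return timedelta(0) <= now - last_seen <= self.freshness_window


now = datetime(2025, 12, 27, 12, 0, tzinfo=timezone.utc)
prereq = TelemetryPrerequisite("auth.login.failure", timedelta(hours=24), "SecOps")

print(prereq.is_fresh(now - timedelta(hours=2), now))   # True
print(prereq.is_fresh(now - timedelta(hours=30), now))  # False
```

Keeping the schema to three fields is deliberate: it is enough to make a prerequisite checkable without widening scope.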
Day-specific P0s (from structured reviewer notes)
- MON (Enterprise / Microsoft Defender page mirror): missing Action Pack and missing Dave blocks; licensing tier/table not mirrored; turn “3 minute” claims into enforceable gates.
- TUE (Cloud / Aqua SaaS): paragraph blobs leaked into Control Card titles; add hard character limits and summarization.
- WED (Endpoint / SentinelOne): headings conflated with descriptions; enforce short headings; critique “AI analyst” as black box evidence.
- THU (COMSEC-ish / YubiKey FIPS brief): control logic looked SaaS-shaped; require hardware lifecycle / chain-of-custody controls.
- FRI (Startup / Torq page mirror): Action Pack dropout; require stronger scrutiny when sources claim autonomy/agentic behavior.
- SAT (Recap): ensure recap output includes a “what to steal” meta action pack (policy templates).
- SUN (Deep dive / NIST SP 800-207 mirror): reduce abstractness by translating prose into “policy-as-code” style gates.
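As a rough illustration of the “policy-as-code” direction (SUN) and of turning numeric claims into enforceable gates (MON), a gate could be modeled as data plus a single check. Everything here is a hypothetical sketch: the Gate class, its fields, the metric name, and the owner and evidence values are assumptions, and the claim text is a placeholder rather than a quote from any source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    """Hypothetical gate derived from a source claim (operationalization only)."""

    claim: str        # claim wording as mirrored from the source
    metric: str       # what must be measured to check the claim
    max_value: float  # threshold implied by the claim
    unit: str
    owner: str        # who is accountable for the measurement
    evidence: str     # artifact that proves the measurement

    def evaluate(self, measured: float) -> str:
        """Return PASS when the measurement is within the threshold, else STOP."""
        return "PASS" if measured <= self.max_value else "STOP"


gate = Gate(
    claim="<'3 minute' claim, wording as mirrored from the source>",
    metric="elapsed_minutes",  # assumed metric name
    max_value=3.0,
    unit="minutes",
    owner="Platform owner",               # placeholder owner
    evidence="timestamped test run log",  # placeholder evidence artifact
)

print(gate.evaluate(2.5))  # PASS
print(gate.evaluate(4.0))  # STOP
```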
Implemented fixes (generator + lint)
Implemented in re-voice/src/revoice/generate.py and re-voice/src/revoice/lint.py:
- Robust section extraction fallback for HTML→MD / weakly structured sources:
  - Markdown heading parsing fallback.
  - Last-resort “cover + body” shape, so sections[1:] is never empty.
- Action Pack title hygiene:
  - New _compact_title() used for Control Card headings and backlog items to avoid paragraph-length titles.
- Hardware-aware gating:
  - New Action Pack gate: “Hardware / identity” with owner/stop condition/evidence artifacts when the source contains FIPS/PIV/FIDO + token/hardware cues.
- Lint exemption for Action Pack boilerplate:
  - Ignore repeated “- Acceptance:” lines so Action Pack backlog doesn’t fail _lint_repeated_lines.
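For readers without the repo open, the title hygiene and lint exemption fixes above might look roughly like the sketch below. This is not the code in generate.py or lint.py: the function names, the 80-character limit, and the repeat threshold of 3 are all assumptions chosen for illustration.

```python
import re
from collections import Counter

MAX_TITLE_CHARS = 80  # assumed limit; the notes do not state the real value
EXEMPT_PREFIXES = ("- Acceptance:",)  # boilerplate allowed to repeat


def compact_title(text: str, limit: int = MAX_TITLE_CHARS) -> str:
    """Collapse whitespace, keep the first sentence, and cap the length."""
    flat = " ".join(text.split())
    first = re.split(r"(?<=[.!?])\s", flat, maxsplit=1)[0]
    if len(first) <= limit:
        return first
    # Cut at the last word boundary that fits, then mark the truncation.
    cut = first[: limit - 3].rsplit(" ", 1)[0]
    return cut + "..."


def repeated_lines(lines: list[str], threshold: int = 3) -> list[str]:
    """Return lines seen at least `threshold` times, skipping exempt boilerplate."""
    counts = Counter(
        line.strip()
        for line in lines
        if line.strip() and not line.strip().startswith(EXEMPT_PREFIXES)
    )
    return [line for line, n in counts.items() if n >= threshold]


print(compact_title("First sentence. Second sentence."))  # First sentence.
print(repeated_lines(["- Acceptance: x"] * 5 + ["dup"] * 3 + ["once"]))  # ['dup']
```

Truncating at a word boundary keeps titles scannable in a backlog export, and exempting by prefix lets every backlog item carry its own acceptance line without tripping the repeat check.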
Remaining backlog (proposed next patches)
- Add recap_mode to generate a meta “What to steal” action pack from Mon–Fri without requiring the source to include it.
- Add government_standard_mode translation table (standard prose → gates/owners/evidence), with explicit tagging as operationalization (not new source claims).
- Add high-signal table retention rule to the extractor for common PDF table layouts (licensing tiers, side-by-side comparisons).
- Add lightweight severity ranking (P0/P1/P2 per section) without changing mirror order.
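One possible shape for the severity ranking item is a separate triage index built alongside the sections, so the mirror itself is never reordered. The Section class and triage_index function are hypothetical names, not part of the current generator.

```python
from dataclasses import dataclass

SEVERITY_ORDER = {"P0": 0, "P1": 1, "P2": 2}


@dataclass(frozen=True)
class Section:
    title: str
    severity: str  # "P0", "P1", or "P2"


def triage_index(sections: list[Section]) -> list[tuple[int, Section]]:
    """Return (mirror_position, section) pairs sorted by severity.

    The input list is left untouched, so mirror order is preserved.
    sorted() is stable, so sections of equal severity keep mirror order.
    """
    return sorted(
        enumerate(sections),
        key=lambda pair: SEVERITY_ORDER[pair[1].severity],
    )


sections = [
    Section("A", "P2"),
    Section("B", "P0"),
    Section("C", "P1"),
    Section("D", "P0"),
]

for position, section in triage_index(sections):
    print(section.severity, position, section.title)
```

The index can be rendered as a short “read these first” list at the top of a dossier while the body stays in source order.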