IF.GOV + IF.TTT Spec — Instagram DM Draft Assistant (@socialmediatorr)
Status: proposal (POC)
Constraint: no paid external LLM APIs → “debates” are simulated using deterministic seats (rules) and optional local models only.
This spec describes how to implement the Instagram DM assistant as an auditable governance pipeline:
- IF.GOV.TRIAGE decides risk + route (normal vs human vs urgent).
- IF.GOV.PANEL simulates a multi‑seat review of the proposed draft reply (no external APIs required).
- IF.TTT records a chain‑of‑custody (hashes + decisions + evidence bundle) so results are provable later.
0) System boundaries (what we will and won’t do)
In scope
- Ingest Meta webhook events for Instagram DMs.
- Produce draft replies (default) using templates + simple intent routing.
- Escalate a tiny fraction of DMs to a human (Sergio) quickly, with a direct “open thread” link.
- Produce IF.TTT‑style trace records and evidence bundles for audit/replay.
- Run “panel debates” without external APIs (rule seats + optional local model seats).
Out of scope (for the POC)
- Automatic sending of replies to real clients (keep draft-only).
- Therapy-by-DM, crisis intervention, diagnosis, or medical claims.
- Storing/exporting full DM transcripts in a public repo.
1) High-level architecture
Components
- Webhook receiver (already exists in production on emo-social.infrafabric.io): verifies Meta signature and normalizes events.
- Event store: append-only storage of DM events + derived decisions (local, private).
- Triage engine (IF.GOV.TRIAGE): risk + language + intent + confidence.
- Draft engine: chooses a reply template (Top 20) or a safe fallback.
- Panel engine (IF.GOV.PANEL): simulated debate across “seats” → approve/patch/escalate.
- Trace recorder (IF.TTT): emits signed decision records + evidence bundles.
- Reviewer UI: queue view for Drafts + Escalations + “open IG thread” action.
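The "verifies Meta signature" step of the webhook receiver can be sketched as follows. This is a minimal illustration, not the production receiver's code: it assumes the signature arrives in the `X-Hub-Signature-256` header as `sha256=<hex>`, computed as an HMAC-SHA256 of the raw request body with the app secret. The function name and parameters are hypothetical.

```python
import hashlib
import hmac


def verify_meta_signature(app_secret: str, raw_body: bytes, header_value: str) -> bool:
    """Return True if the signature header matches the raw request body.

    Hypothetical helper: header_value is the X-Hub-Signature-256 value,
    expected in the form "sha256=<hex digest>".
    """
    prefix = "sha256="
    if not header_value.startswith(prefix):
        return False
    expected = hmac.new(
        app_secret.encode("utf-8"), raw_body, hashlib.sha256
    ).hexdigest()
    provided = header_value[len(prefix):]
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(expected.encode("utf-8"), provided.encode("utf-8"))
```

The check must run on the raw bytes as received, before any JSON parsing, because re-serializing the payload can change the bytes and invalidate the signature.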
Data flow (valid Mermaid)
flowchart LR
W[Meta webhook event] --> V[Verify signature]
V --> N[Normalize event]
N --> ES[Event store append]
ES --> T[IF.GOV.TRIAGE]
T -->|urgent| E[Escalation record]
T -->|normal| D[Draft engine]
T -->|needs-human| H[Human-required record]
D --> P[IF.GOV.PANEL seats]
P --> R[Panel decision]
E --> TR[IF.TTT trace + bundle]
H --> TR
R --> TR
TR --> UI[Reviewer UI queue]
2) IF.GOV.TRIAGE (no external API)
Inputs
- sender_id (from webhook)
- mid (message id)
- timestamp_ms
- text (if present; empty allowed)
- minimal thread context (last N messages for this sender_id, if available)
Outputs (contract)
{
"triage_version": "if.gov.triage/igdm/v1",
"trace_id": "uuid",
"ts_utc": "2025-12-25T12:00:00Z",
"time_cet": "2025-12-25T13:00:00+01:00",
"sender_id": "123",
"mid": "m_abc",
"language": { "code": "es", "confidence": 0.86, "source": "text_or_thread" },
"intent": { "label": "book|link|video|price|help|other", "confidence": 0.90 },
"risk": {
"tier": "normal|needs-human|urgent",
"score": 0.05,
"reasons": ["..."],
"panel_size": 5
}
}
Triage rules (POC defaults)
- Language detection
  - If message has enough text: detect language from message text.
  - Else: reuse last confident thread language.
  - Else: set confidence < 0.5 and prefer a 1‑line language question.
- Intent detection
  - Keyword routing for: book, link, video, price/cost, call, therapy, etc.
  - If unknown: intent=other with low confidence.
- Risk tier
  - urgent if self-harm/suicide signals OR violence/abuse indicators.
  - needs-human if: therapeutic disclosure, legal threats, harassment, complex personal crisis, repeated angry loop.
  - normal otherwise.
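The intent and risk rules above could be sketched as a pure keyword router. Everything concrete here is an assumption made for illustration: the keyword lists (English only), the confidence values 0.9 and 0.2, and the names `TriageResult` and `triage`. Language detection is left out, and plain substring matching is deliberately naive (for example, "book" also matches "facebook").

```python
from dataclasses import dataclass, field

# Hypothetical English-only keyword lists, for illustration only.
URGENT_TERMS = ("suicide", "kill myself", "hurt myself", "hits me")
NEEDS_HUMAN_TERMS = ("lawyer", "sue you", "panic attack", "my trauma")
INTENT_KEYWORDS = {
    "book": ("book", "appointment"),
    "link": ("link",),
    "video": ("video",),
    "price": ("price", "cost"),
    "help": ("help", "therapy"),
}


@dataclass
class TriageResult:
    intent: str
    intent_confidence: float
    tier: str
    reasons: list[str] = field(default_factory=list)


def triage(text: str) -> TriageResult:
    lowered = text.lower()

    # Risk tier: urgent signals take priority over needs-human signals.
    urgent_hits = [t for t in URGENT_TERMS if t in lowered]
    human_hits = [t for t in NEEDS_HUMAN_TERMS if t in lowered]
    if urgent_hits:
        tier, reasons = "urgent", [f"urgent:{t}" for t in urgent_hits]
    elif human_hits:
        tier, reasons = "needs-human", [f"needs-human:{t}" for t in human_hits]
    else:
        tier, reasons = "normal", []

    # Intent: the first matching label wins; unknown text becomes "other".
    for label, keywords in INTENT_KEYWORDS.items():
        if any(k in lowered for k in keywords):
            return TriageResult(label, 0.9, tier, reasons)
    return TriageResult("other", 0.2, tier, reasons)
```

Because the function has no hidden state and no randomness, the same message text always yields the same triage record, which is what makes later replay from the evidence bundle meaningful.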
“Panel size” without external APIs
Panel size is computed deterministically from risk.score (same pattern as the existing guard_engine.py):
- normal: 5 seats
- needs-human: 10 seats (more checks, but still local)
- urgent: 20 seats (but action is always escalate, not debate content)
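As a small sketch, the tier-to-seat-count mapping above fits in a lookup table. The source says the size derives from risk.score but does not give score thresholds, so this example keys on the tier instead. Sending unknown tiers to the largest panel is an assumption (fail closed), and `panel_size` is a hypothetical name, not a function from guard_engine.py.

```python
# Seat counts taken from the list above.
PANEL_SIZE_BY_TIER = {"normal": 5, "needs-human": 10, "urgent": 20}


def panel_size(tier: str) -> int:
    """Return the seat count for a risk tier.

    Assumption: an unrecognized tier gets the largest panel rather than
    the smallest, so a typo cannot silently reduce scrutiny.
    """
    return PANEL_SIZE_BY_TIER.get(tier, max(PANEL_SIZE_BY_TIER.values()))
```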
3) Draft engine (no external API)
Principles
- Use templates first, not a generative model.
- Always mirror the user’s language (or ask a 1‑line language question if uncertain).
- Keep replies short; ask one clear next question when helpful.
- Never invite deep disclosure in DMs; route to “resources / call / book link”.
Draft outputs
{
"draft_version": "igdm.draft/v1",
"trace_id": "uuid",
"template_id": "top20:book:v1:es",
"text": "…",
"placeholders": ["BOOK_LINK"],
"notes": ["language=es", "intent=book"]
}
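A template-first draft engine that emits the record above might look like the following sketch. The template texts, the `fallback:*` template IDs, and the `build_draft` name are hypothetical. Only the `top20:<intent>:v1:<lang>` pattern and the output fields come from the example record. Placeholders are listed but not filled in, so a reviewer can see what still needs substitution.

```python
import re

# Hypothetical templates; the real Top 20 set is not part of this sketch.
TEMPLATES = {
    ("book", "es"): "¡Gracias por escribir! Puedes reservar aquí: {BOOK_LINK}",
    ("book", "en"): "Thanks for reaching out! You can book here: {BOOK_LINK}",
}
SAFE_FALLBACK = {
    "es": "¡Gracias por tu mensaje! ¿Quieres el enlace para reservar?",
    "en": "Thanks for your message! Would you like the booking link?",
}
LANGUAGE_QUESTION = "English or Español?"


def build_draft(trace_id: str, intent: str, language: str,
                language_confidence: float) -> dict:
    if language_confidence < 0.5 or language not in SAFE_FALLBACK:
        # Uncertain language: ask a one-line language question first.
        template_id, text = "fallback:language_question:v1", LANGUAGE_QUESTION
    elif (intent, language) in TEMPLATES:
        template_id = f"top20:{intent}:v1:{language}"
        text = TEMPLATES[(intent, language)]
    else:
        # Known language but no template for this intent: safe fallback.
        template_id, text = f"fallback:safe:v1:{language}", SAFE_FALLBACK[language]
    return {
        "draft_version": "igdm.draft/v1",
        "trace_id": trace_id,
        "template_id": template_id,
        "text": text,
        "placeholders": re.findall(r"\{([A-Z_]+)\}", text),
        "notes": [f"language={language}", f"intent={intent}"],
    }
```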
4) IF.GOV.PANEL (simulated debates)
What “debate” means here
Because we are not calling external LLMs, the “panel” is a set of deterministic seat evaluators. Each seat emits:
- a vote (approve|request_changes|veto)
- reasons (human readable)
- patch suggestions (structured)
Seat roster (minimum viable, 5 seats)
- Safety seat: blocks crisis mishandling; ensures no harmful advice.
- Boundary seat: prevents therapy-by-DM; rewrites “help” flows into routing.
- Language seat: enforces same-language output; no mixing; handles low confidence.
- Privacy seat: avoids unnecessary PII; flags risky asks (phone/email) unless explicitly required.
- Next-step seat: checks the reply has a clear next step (link or one question).
Optional seats (when panel size grows)
- Tone/VoiceDNA seat: checks length + emoji pattern + directness vs DM voice rules.
- Spam/abuse seat: detects harassment loops and routes to block/report guidance.
- Contrarian seat: tries to misread the message and see if the draft fails.
Seat output format
{
"seat": "language",
"vote": "approve|request_changes|veto",
"severity": 0.0,
"reasons": ["..."],
"patches": [
{ "op": "replace_text", "path": "draft.text", "value": "..." }
]
}
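To make the seat contract concrete, here is a sketch of one deterministic seat, the Next-step seat from the roster, emitting the format above. The detection rules (a URL or a `{..._LINK}` placeholder counts as a link, and exactly one `?` counts as one question) and the severity values are assumptions chosen for illustration.

```python
import re

# Assumption: a URL or an unfilled {SOMETHING_LINK} placeholder counts as a link.
LINK_PATTERN = re.compile(r"https?://\S+|\{[A-Z_]*LINK\}")


def next_step_seat(draft: dict) -> dict:
    """Approve only if the draft has a clear next step: a link or one question."""
    text = draft.get("text", "")
    has_link = bool(LINK_PATTERN.search(text))
    questions = text.count("?")

    if has_link or questions == 1:
        vote, severity, reasons = "approve", 0.0, ["clear next step present"]
    elif questions > 1:
        vote, severity = "request_changes", 0.4
        reasons = [f"{questions} questions; keep to one"]
    else:
        vote, severity = "request_changes", 0.3
        reasons = ["no link and no question"]

    # This seat only flags the problem; it proposes no patches of its own.
    return {"seat": "next_step", "vote": vote, "severity": severity,
            "reasons": reasons, "patches": []}
```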
Panel aggregation (deterministic)
- If any seat returns veto → panel decision becomes escalate_human (or urgent_escalate).
- Else if any seat returns request_changes → apply patches (in order), re-run seats once.
- Else → approve.
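The aggregation rules above can be sketched as a short loop with at most one re-run. Two behaviors are assumptions, because the source does not specify them: a draft approved only after patching is reported as `revise_draft`, and a draft that still draws `request_changes` after the re-run is escalated to a human. The names `run_panel` and `apply_patches` are hypothetical, and only the `replace_text` patch op from the seat format is handled.

```python
from typing import Callable

Seat = Callable[[dict], dict]


def apply_patches(draft: dict, patches: list[dict]) -> dict:
    """Apply patches in order; only replace_text on draft.text is handled."""
    patched = dict(draft)
    for patch in patches:
        if patch.get("op") == "replace_text" and patch.get("path") == "draft.text":
            patched["text"] = patch["value"]
    return patched


def run_panel(draft: dict, seats: list[Seat], tier: str = "normal") -> dict:
    results: list[dict] = []
    for round_no in range(2):  # initial pass plus at most one re-run
        results = [seat(draft) for seat in seats]
        votes = [r["vote"] for r in results]
        if "veto" in votes:
            decision = "urgent_escalate" if tier == "urgent" else "escalate_human"
            return {"decision": decision, "draft": draft, "seats": results}
        if "request_changes" not in votes:
            # Assumption: approval after patching is recorded as revise_draft.
            decision = "approve_draft" if round_no == 0 else "revise_draft"
            return {"decision": decision, "draft": draft, "seats": results}
        if round_no == 0:
            patches = [p for r in results for p in r.get("patches", [])]
            draft = apply_patches(draft, patches)
    # Assumption: changes still requested after the re-run go to a human.
    return {"decision": "escalate_human", "draft": draft, "seats": results}
```

Capping the loop at two rounds keeps the panel guaranteed to terminate, so a pair of seats that keep patching each other's output cannot cycle forever.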
Panel decision record
{
"panel_version": "if.gov.panel/igdm/v1",
"trace_id": "uuid",
"panel_size": 5,
"seats": [ { "...": "..." } ],
"decision": "approve_draft|revise_draft|escalate_human|urgent_escalate",
"final_draft_text_sha256": "…",
"reason_summary": "short"
}
5) Escalation UX (how Sergio actually sees it)
Escalation record
{
"escalation_version": "igdm.escalation/v1",
"trace_id": "uuid",
"tier": "urgent|needs-human",
"reason_codes": ["self_harm_signal"],
"sender_id": "123",
"mid": "m_abc",
"time_cet": "2025-12-25T21:13:00+01:00",
"open_links": {
"instagram_thread": "https://www.instagram.com/direct/t/<conversation_id>/",
"fb_inbox": "https://business.facebook.com/latest/inbox/all/?asset_id=<page_id>"
}
}
Notification strategy (POC)
No paid services required:
- Show escalations in a logged-in dashboard on emo-social.infrafabric.io.
- Optional: email later (requires SMTP relay configured); not required for the POC.
6) IF.TTT trace + evidence bundles (provable without leaking)
Two-bundle approach (recommended)
- Private bundle (internal): includes raw message text, stored locally with strict permissions.
- Public bundle (shareable): contains hashes + redacted previews only.
Bundle contents (public)
bundle/
manifest.json
event.json
triage.json
draft.json
panel.json
escalation.json (only if escalated)
sha256sums.txt
signature_ed25519.txt
Minimum “public” fields
- message_text_sha256 (not raw)
- draft_text_sha256 (not raw)
- triage + panel decision + reason codes
- timestamps (UTC + CET)
This is enough to prove: “given these bytes (committed), these deterministic governance steps happened, and this decision was produced”.
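A sketch of assembling the hash-only part of the public bundle, using the standard library alone. Several choices are assumptions: JSON is serialized with sorted keys and compact separators so hashes are reproducible, `sha256sums.txt` uses the `<hex>  <filename>` layout of the `sha256sum` tool, and `manifest.json` and `signature_ed25519.txt` are omitted because the source does not define the manifest fields and Ed25519 signing needs a third-party library. All function names are hypothetical.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def canonical_json(obj: dict) -> bytes:
    """Serialize deterministically so the same record always hashes the same."""
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def build_public_bundle(message_text: str, draft_text: str,
                        triage: dict, panel: dict) -> dict[str, bytes]:
    """Return {filename: bytes} holding hashes only, never the raw texts."""
    files = {
        "event.json": canonical_json(
            {"message_text_sha256": sha256_hex(message_text.encode("utf-8"))}
        ),
        "triage.json": canonical_json(triage),
        "draft.json": canonical_json(
            {"draft_text_sha256": sha256_hex(draft_text.encode("utf-8"))}
        ),
        "panel.json": canonical_json(panel),
    }
    # One "<hex>  <name>" line per file, sorted by name for stable output.
    lines = [f"{sha256_hex(content)}  {name}"
             for name, content in sorted(files.items())]
    files["sha256sums.txt"] = ("\n".join(lines) + "\n").encode("utf-8")
    return files
```

A verifier holding the private bundle can recompute `message_text_sha256` from the raw text and compare it with the public value, which shows that the published decision refers to exactly those bytes without the text ever being shared.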
7) Rollout plan (safe)
- Triage-only + escalation queue (no drafts yet).
- Draft-only templates for Top 20 intents (no sending).
- Add simulated IF.GOV.PANEL seats and store panel decisions.
- Emit IF.TTT bundles for each event (public + private).
- Add comparison table: draft vs actual sent (manual) to measure quality.
- Only after measured success: consider limited auto-send for low-risk intents, with a kill switch.