# IF.GOV + IF.TTT Spec: Instagram DM Draft Assistant (`@socialmediatorr`)

**Status:** proposal (POC)

**Constraint:** no paid external LLM APIs → “debates” are simulated using deterministic seats (rules) and optional local models only.

This spec describes how to implement the Instagram DM assistant as an **auditable governance pipeline**:

- **IF.GOV.TRIAGE** decides risk + route (normal vs human vs urgent).
- **IF.GOV.PANEL** simulates a multi‑seat review of the proposed draft reply (no external APIs required).
- **IF.TTT** records a chain‑of‑custody (hashes + decisions + evidence bundle) so results are provable later.

---

## 0) System boundaries (what we will and won’t do)

### In scope

- Ingest Meta webhook events for Instagram DMs.
- Produce **draft replies** (default) using templates + simple intent routing.
- Escalate a tiny fraction of DMs to a human (Sergio) quickly, with a direct “open thread” link.
- Produce IF.TTT‑style trace records and evidence bundles for audit/replay.
- Run “panel debates” **without external APIs** (rule seats + optional local model seats).

### Out of scope (for the POC)

- Automatic sending of replies to real clients (keep `draft-only`).
- Therapy-by-DM, crisis intervention, diagnosis, or medical claims.
- Storing/exporting full DM transcripts in a public repo.

---

## 1) High-level architecture

### Components

- **Webhook receiver** (already exists in production on `emo-social.infrafabric.io`): verifies Meta signature and normalizes events.
- **Event store**: append-only storage of DM events + derived decisions (local, private).
- **Triage engine** (`IF.GOV.TRIAGE`): risk + language + intent + confidence.
- **Draft engine**: chooses a reply template (Top 20) or a safe fallback.
- **Panel engine** (`IF.GOV.PANEL`): simulated debate across “seats” → approve/patch/escalate.
- **Trace recorder** (`IF.TTT`): emits signed decision records + evidence bundles.
- **Reviewer UI**: queue view for Drafts + Escalations + “open IG thread” action.
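The webhook receiver already exists in production, so the following is only an illustrative sketch of the "verifies Meta signature" step, not the deployed code. It assumes the receiver has the raw request body and the value of Meta's `X-Hub-Signature-256` header (an HMAC-SHA256 of the payload keyed with the app secret, prefixed with `sha256=`). The function name `verify_meta_signature` is hypothetical.

```python
import hashlib
import hmac


def verify_meta_signature(app_secret: str, raw_body: bytes, header_value: str) -> bool:
    """Return True if header_value matches the HMAC-SHA256 of raw_body.

    Assumption: header_value is the X-Hub-Signature-256 header, formatted
    as "sha256=<hex digest>". The function name is hypothetical.
    """
    prefix = "sha256="
    if not header_value.startswith(prefix):
        return False
    expected = hmac.new(app_secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    received = header_value[len(prefix):]
    # Compare as bytes in constant time to avoid leaking digest prefixes.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))
```

The check has to run on the raw bytes as received, before any JSON parsing or normalization, because re-serializing the payload can change the bytes and invalidate the digest.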
### Data flow (valid Mermaid)

```mermaid
flowchart LR
  W[Meta webhook event] --> V[Verify signature]
  V --> N[Normalize event]
  N --> ES[Event store append]
  ES --> T[IF.GOV.TRIAGE]
  T -->|urgent| E[Escalation record]
  T -->|normal| D[Draft engine]
  T -->|needs-human| H[Human-required record]
  D --> P[IF.GOV.PANEL seats]
  P --> R[Panel decision]
  E --> TR[IF.TTT trace + bundle]
  H --> TR
  R --> TR
  TR --> UI[Reviewer UI queue]
```

---

## 2) IF.GOV.TRIAGE (no external API)

### Inputs

- `sender_id` (from webhook)
- `mid` (message id)
- `timestamp_ms`
- `text` (if present; empty allowed)
- minimal thread context (last N messages for this sender_id, if available)

### Outputs (contract)

```json
{
  "triage_version": "if.gov.triage/igdm/v1",
  "trace_id": "uuid",
  "ts_utc": "2025-12-25T12:00:00Z",
  "time_cet": "2025-12-25T13:00:00+01:00",
  "sender_id": "123",
  "mid": "m_abc",
  "language": { "code": "es", "confidence": 0.86, "source": "text_or_thread" },
  "intent": { "label": "book|link|video|price|help|other", "confidence": 0.90 },
  "risk": {
    "tier": "normal|needs-human|urgent",
    "score": 0.05,
    "reasons": ["..."],
    "panel_size": 5
  }
}
```

### Triage rules (POC defaults)

- **Language detection**
  - If message has enough text: detect language from message text.
  - Else: reuse last confident thread language.
  - Else: set `confidence < 0.5` and prefer a 1‑line language question.
- **Intent detection**
  - Keyword routing for: `book`, `link`, `video`, `price/cost`, `call`, `therapy`, etc.
  - If unknown: intent=`other` with low confidence.
- **Risk tier**
  - `urgent` if self-harm/suicide signals OR violence/abuse indicators.
  - `needs-human` if: therapeutic disclosure, legal threats, harassment, complex personal crisis, repeated angry loop.
  - `normal` otherwise.
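The intent and risk rules above could be sketched roughly as follows. This is a minimal illustration, not the production rule set: the keyword and signal lists, the confidence values, and the function names `detect_intent` and `risk_tier` are all placeholder assumptions, and language detection is left out.

```python
import re

# Hypothetical keyword lists for illustration only; the real lists are not
# defined in this spec.
INTENT_KEYWORDS = {
    "book": ("book", "appointment", "reservar", "cita"),
    "link": ("link", "enlace"),
    "video": ("video",),
    "price": ("price", "cost", "precio"),
    "help": ("help", "ayuda"),
}
URGENT_SIGNALS = ("suicide", "kill myself", "hurt myself")
NEEDS_HUMAN_SIGNALS = ("lawyer", "sue you", "my therapist", "panic attack")


def _contains(text: str, phrase: str) -> bool:
    # Whole-word match so that "book" does not fire on "facebook".
    return re.search(r"\b" + re.escape(phrase) + r"\b", text) is not None


def detect_intent(text: str) -> dict:
    """Keyword routing; falls back to intent=other with low confidence."""
    lowered = text.lower()
    for label, keywords in INTENT_KEYWORDS.items():
        if any(_contains(lowered, kw) for kw in keywords):
            return {"label": label, "confidence": 0.9}  # placeholder confidence
    return {"label": "other", "confidence": 0.2}  # placeholder confidence


def risk_tier(text: str) -> str:
    """Urgent signals win over needs-human signals; default is normal."""
    lowered = text.lower()
    if any(_contains(lowered, s) for s in URGENT_SIGNALS):
        return "urgent"
    if any(_contains(lowered, s) for s in NEEDS_HUMAN_SIGNALS):
        return "needs-human"
    return "normal"
```

Checking the urgent signals first keeps the ordering of the tiers explicit: a message that matches both lists is always routed as `urgent`.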
### “Panel size” without external APIs

Panel size is computed deterministically from `risk.score` (same pattern as the existing `guard_engine.py`):

- normal: 5 seats
- needs-human: 10 seats (more checks, but still local)
- urgent: 20 seats (but action is always escalate, not debate content)

---

## 3) Draft engine (no external API)

### Principles

- Use **templates first**, not a generative model.
- Always mirror the user’s language (or ask a 1‑line language question if uncertain).
- Keep replies short; ask one clear next question when helpful.
- Never invite deep disclosure in DMs; route to “resources / call / book link”.

### Draft outputs

```json
{
  "draft_version": "igdm.draft/v1",
  "trace_id": "uuid",
  "template_id": "top20:book:v1:es",
  "text": "…",
  "placeholders": ["BOOK_LINK"],
  "notes": ["language=es", "intent=book"]
}
```

---

## 4) IF.GOV.PANEL (simulated debates)

### What “debate” means here

Because we are not calling external LLMs, the “panel” is a set of **deterministic seat evaluators**. Each seat emits:

- a vote (`approve` | `request_changes` | `veto`)
- reasons (human readable)
- patch suggestions (structured)

### Seat roster (minimum viable, 5 seats)

1) **Safety seat**: blocks crisis mishandling; ensures no harmful advice.
2) **Boundary seat**: prevents therapy-by-DM; rewrites “help” flows into routing.
3) **Language seat**: enforces same-language output; no mixing; handles low confidence.
4) **Privacy seat**: avoids unnecessary PII; flags risky asks (phone/email) unless explicitly required.
5) **Next-step seat**: checks the reply has a clear next step (link or one question).

Optional seats (when panel size grows)

- **Tone/VoiceDNA seat**: checks length + emoji pattern + directness vs DM voice rules.
- **Spam/abuse seat**: detects harassment loops and routes to block/report guidance.
- **Contrarian seat**: tries to misread the message and see if the draft fails.
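To make the idea of a deterministic seat concrete, here is a minimal sketch of two seats and a panel loop that follows the veto / request_changes / approve aggregation this section goes on to define. Everything named here is a hypothetical illustration: the phrase list, the follow-up sentence, `SeatResult`, `run_panel`, the mapping of a patched-then-approved draft to `revise_draft`, and the choice to escalate when a seat still requests changes after the single re-run (the spec does not state that case).

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical rule data for illustration only.
DISCLOSURE_INVITES = ("tell me more about", "what happened to you")
FOLLOW_UP = "Would you like the booking link?"


@dataclass
class SeatResult:
    seat: str
    vote: str  # "approve" | "request_changes" | "veto"
    reasons: list[str] = field(default_factory=list)
    patches: list[dict] = field(default_factory=list)


def next_step_seat(draft_text: str) -> SeatResult:
    """Approve only if the draft contains a link or a question."""
    if "http://" in draft_text or "https://" in draft_text or "?" in draft_text:
        return SeatResult("next_step", "approve")
    patched = draft_text.rstrip() + " " + FOLLOW_UP
    patch = {"op": "replace_text", "path": "draft.text", "value": patched}
    return SeatResult("next_step", "request_changes", ["no link or question"], [patch])


def boundary_seat(draft_text: str) -> SeatResult:
    """Veto drafts that invite deep disclosure in DMs."""
    lowered = draft_text.lower()
    hits = [p for p in DISCLOSURE_INVITES if p in lowered]
    if hits:
        return SeatResult("boundary", "veto", [f"invites disclosure: {hits[0]}"])
    return SeatResult("boundary", "approve")


def run_panel(draft_text: str, seats: list[Callable[[str], SeatResult]]) -> tuple[str, str]:
    """Return (decision, final_draft_text). Seats are re-run at most once."""
    for attempt in range(2):
        results = [seat(draft_text) for seat in seats]
        if any(r.vote == "veto" for r in results):
            return "escalate_human", draft_text
        changes = [r for r in results if r.vote == "request_changes"]
        if not changes:
            # Assumption: a draft that needed patches is reported as revise_draft.
            return ("approve_draft" if attempt == 0 else "revise_draft"), draft_text
        if attempt == 0:
            for result in changes:  # apply patches in seat order
                for patch in result.patches:
                    if patch["op"] == "replace_text" and patch["path"] == "draft.text":
                        draft_text = patch["value"]
    # Assumption: unresolved change requests after the re-run go to a human.
    return "escalate_human", draft_text
```

Because every seat is a pure function of the draft text, replaying the same draft through the same seat list always yields the same decision, which is what makes the panel record auditable.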
### Seat output format

```json
{
  "seat": "language",
  "vote": "approve|request_changes|veto",
  "severity": 0.0,
  "reasons": ["..."],
  "patches": [
    { "op": "replace_text", "path": "draft.text", "value": "..." }
  ]
}
```

### Panel aggregation (deterministic)

- If any seat returns `veto` → panel decision becomes `escalate_human` (or `urgent_escalate`).
- Else if any seat returns `request_changes` → apply patches (in order), re-run seats once.
- Else → approve.

### Panel decision record

```json
{
  "panel_version": "if.gov.panel/igdm/v1",
  "trace_id": "uuid",
  "panel_size": 5,
  "seats": [ { "...": "..." } ],
  "decision": "approve_draft|revise_draft|escalate_human|urgent_escalate",
  "final_draft_text_sha256": "…",
  "reason_summary": "short"
}
```

---

## 5) Escalation UX (how Sergio actually sees it)

### Escalation record

```json
{
  "escalation_version": "igdm.escalation/v1",
  "trace_id": "uuid",
  "tier": "urgent|needs-human",
  "reason_codes": ["self_harm_signal"],
  "sender_id": "123",
  "mid": "m_abc",
  "time_cet": "2025-12-25T21:13:00+01:00",
  "open_links": {
    "instagram_thread": "https://www.instagram.com/direct/t//",
    "fb_inbox": "https://business.facebook.com/latest/inbox/all/?asset_id="
  }
}
```

### Notification strategy (POC)

No paid services required:

- Show escalations in a **logged-in dashboard** on `emo-social.infrafabric.io`.
- Optional: email later (requires SMTP relay configured); not required for the POC.

---

## 6) IF.TTT trace + evidence bundles (provable without leaking)

### Two-bundle approach (recommended)

- **Private bundle** (internal): includes raw message text, stored locally with strict permissions.
- **Public bundle** (shareable): contains hashes + redacted previews only.
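One possible way to derive a public record from a private one is sketched below. It is an assumption-laden illustration: the private record keys, the `message_preview_redacted` and `risk_tier` field names, the masking rule in `redacted_preview`, and the function names are all hypothetical; only `message_text_sha256` and `draft_text_sha256` are field names used by this spec.

```python
import hashlib
import re


def sha256_hex(text: str) -> str:
    """Hex SHA-256 of the UTF-8 bytes of text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def redacted_preview(text: str, keep: int = 16) -> str:
    # Hypothetical redaction rule: keep only the shape (word lengths) of the
    # first `keep` characters by masking every non-whitespace character.
    return re.sub(r"\S", "x", text[:keep])


def to_public_record(private: dict) -> dict:
    """Copy decisions and timestamps; replace raw text with hashes."""
    return {
        "trace_id": private["trace_id"],
        "message_text_sha256": sha256_hex(private["message_text"]),
        "message_preview_redacted": redacted_preview(private["message_text"]),
        "draft_text_sha256": sha256_hex(private["draft_text"]),
        "risk_tier": private["risk_tier"],
        "decision": private["decision"],
        "reason_codes": list(private["reason_codes"]),
        "ts_utc": private["ts_utc"],
        "time_cet": private["time_cet"],
    }
```

Anyone holding the private bundle can recompute the two hashes and confirm they match the public record, while the public record alone reveals no message content.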
### Bundle contents (public)

```
bundle/
  manifest.json
  event.json
  triage.json
  draft.json
  panel.json
  escalation.json   (only if escalated)
  sha256sums.txt
  signature_ed25519.txt
```

### Minimum “public” fields

- `message_text_sha256` (not raw)
- `draft_text_sha256` (not raw)
- triage + panel decision + reason codes
- timestamps (UTC + CET)

This is enough to prove: “given these bytes (committed), these deterministic governance steps happened, and this decision was produced”.

---

## 7) Rollout plan (safe)

1) **Triage-only** + escalation queue (no drafts yet).
2) **Draft-only** templates for Top 20 intents (no sending).
3) Add simulated **IF.GOV.PANEL** seats and store panel decisions.
4) Emit IF.TTT bundles for each event (public + private).
5) Add comparison table: `draft` vs `actual sent` (manual) to measure quality.
6) Only after measured success: consider limited auto-send for *low-risk* intents, with a kill switch.