emo-social-insta-dm-agent/docs/governance/IF_GOV_IGDM_SPEC.md

# IF.GOV + IF.TTT Spec — Instagram DM Draft Assistant (`@socialmediatorr`)

**Status:** proposal (POC)
**Constraint:** no paid external LLM APIs, so “debates” are simulated using deterministic seats (rules) and, optionally, local models.

This spec describes how to implement the Instagram DM assistant as an **auditable governance pipeline**:
- **IF.GOV.TRIAGE** decides risk + route (normal vs human vs urgent).
- **IF.GOV.PANEL** simulates a multi‑seat review of the proposed draft reply (no external APIs required).
- **IF.TTT** records a chain‑of‑custody (hashes + decisions + evidence bundle) so results are provable later.

---

## 0) System boundaries (what we will and won’t do)

### In scope
- Ingest Meta webhook events for Instagram DMs.
- Produce **draft replies** (default) using templates + simple intent routing.
- Escalate the small fraction of DMs that need a human (Sergio) quickly, with a direct “open thread” link.
- Produce IF.TTT‑style trace records and evidence bundles for audit/replay.
- Run “panel debates” **without external APIs** (rule seats + optional local model seats).

### Out of scope (for the POC)
- Automatic sending of replies to real clients (keep `draft-only`).
- Therapy-by-DM, crisis intervention, diagnosis, or medical claims.
- Storing/exporting full DM transcripts in a public repo.

---

## 1) High-level architecture

### Components
- **Webhook receiver** (already exists in production on `emo-social.infrafabric.io`): verifies Meta signature and normalizes events.
- **Event store**: append-only storage of DM events + derived decisions (local, private).
- **Triage engine** (`IF.GOV.TRIAGE`): risk + language + intent + confidence.
- **Draft engine**: chooses a reply template (Top 20) or a safe fallback.
- **Panel engine** (`IF.GOV.PANEL`): simulated debate across “seats” → approve/patch/escalate.
- **Trace recorder** (`IF.TTT`): emits signed decision records + evidence bundles.
- **Reviewer UI**: queue view for Drafts + Escalations + “open IG thread” action.

### Data flow (valid Mermaid)
```mermaid
flowchart LR
  W[Meta webhook event] --> V[Verify signature]
  V --> N[Normalize event]
  N --> ES[Event store append]
  ES --> T[IF.GOV.TRIAGE]
  T -->|urgent| E[Escalation record]
  T -->|normal| D[Draft engine]
  T -->|needs-human| H[Human-required record]
  D --> P[IF.GOV.PANEL seats]
  P --> R[Panel decision]
  E --> TR[IF.TTT trace + bundle]
  H --> TR
  R --> TR
  TR --> UI[Reviewer UI queue]
```
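
The "Verify signature" hop can be sketched as below. This is a minimal illustration, not the production receiver: it assumes the `X-Hub-Signature-256` header that Meta attaches to webhook deliveries (an HMAC-SHA256 of the raw request body keyed with the app secret, prefixed with `sha256=`). The function name `verify_meta_signature` is hypothetical.

```python
import hashlib
import hmac


def verify_meta_signature(raw_body: bytes, header_value: str, app_secret: str) -> bool:
    """Return True when the signature header matches the raw request body.

    Hypothetical helper: the production receiver may be structured differently.
    """
    prefix = "sha256="
    if not header_value.startswith(prefix):
        return False
    expected = hmac.new(app_secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    received = header_value[len(prefix):]
    # compare_digest gives a constant-time comparison; bytes avoid non-ASCII errors.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))
```

The check must run on the raw bytes as received, before any JSON parsing or re-serialization, because the HMAC covers the exact byte sequence.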

---

## 2) IF.GOV.TRIAGE (no external API)

### Inputs
- `sender_id` (from webhook)
- `mid` (message id)
- `timestamp_ms`
- `text` (if present; empty allowed)
- minimal thread context (last N messages for this sender_id, if available)

### Outputs (contract)
```json
{
  "triage_version": "if.gov.triage/igdm/v1",
  "trace_id": "uuid",
  "ts_utc": "2025-12-25T12:00:00Z",
  "time_cet": "2025-12-25T13:00:00+01:00",
  "sender_id": "123",
  "mid": "m_abc",
  "language": { "code": "es", "confidence": 0.86, "source": "text_or_thread" },
  "intent": { "label": "book|link|video|price|help|other", "confidence": 0.90 },
  "risk": {
    "tier": "normal|needs-human|urgent",
    "score": 0.05,
    "reasons": ["..."],
    "panel_size": 5
  }
}
```

### Triage rules (POC defaults)
- **Language detection**
  - If message has enough text: detect language from message text.
  - Else: reuse last confident thread language.
  - Else: set `confidence < 0.5` and prefer a 1‑line language question.
- **Intent detection**
  - Keyword routing for: `book`, `link`, `video`, `price/cost`, `call`, `therapy`, etc.
  - If unknown: intent=`other` with low confidence.
- **Risk tier**
  - `urgent` if self-harm/suicide signals OR violence/abuse indicators.
  - `needs-human` if: therapeutic disclosure, legal threats, harassment, complex personal crisis, repeated angry loop.
  - `normal` otherwise.
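
The rules above could be sketched roughly as follows. Everything here is illustrative: the keyword lists, regex patterns, the `MIN_CHARS` threshold, the confidence values, and the `source` values `text`/`thread`/`none` are assumptions, and `detect` stands in for whatever local language detector is chosen.

```python
import re
from typing import Callable, Optional, Tuple

# Hypothetical keyword lists and patterns; real ones would be tuned per language.
INTENT_KEYWORDS = {
    "book": ("book", "appointment", "reservar", "cita"),
    "link": ("link", "enlace"),
    "video": ("video",),
    "price": ("price", "cost", "precio"),
    "help": ("help", "ayuda"),
}
URGENT_PATTERNS = (r"\bsuicid", r"\bkill myself\b")
NEEDS_HUMAN_PATTERNS = (r"\blawyer\b", r"\bsue\b", r"\bmy therapist\b")
MIN_CHARS = 12  # assumed threshold for "enough text"


def resolve_language(text: str,
                     detect: Callable[[str], Tuple[str, float]],
                     thread_language: Optional[str]) -> dict:
    """Message text first, then last confident thread language, then unknown."""
    if len(text.strip()) >= MIN_CHARS:
        code, confidence = detect(text)
        return {"code": code, "confidence": confidence, "source": "text"}
    if thread_language is not None:
        return {"code": thread_language, "confidence": 0.6, "source": "thread"}
    # Below 0.5: the draft engine should ask a 1-line language question.
    return {"code": "und", "confidence": 0.0, "source": "none"}


def detect_intent(text: str) -> dict:
    """Keyword routing; the first matching label wins."""
    lowered = text.lower()
    for label, keywords in INTENT_KEYWORDS.items():
        if any(re.search(rf"\b{re.escape(k)}\b", lowered) for k in keywords):
            return {"label": label, "confidence": 0.9}
    return {"label": "other", "confidence": 0.2}


def risk_tier(text: str) -> dict:
    """Urgent patterns are checked before needs-human patterns."""
    lowered = text.lower()
    if any(re.search(p, lowered) for p in URGENT_PATTERNS):
        return {"tier": "urgent", "reasons": ["urgent_pattern"]}
    if any(re.search(p, lowered) for p in NEEDS_HUMAN_PATTERNS):
        return {"tier": "needs-human", "reasons": ["needs_human_pattern"]}
    return {"tier": "normal", "reasons": []}
```

Checking the urgent patterns first means a message that matches both lists is always routed to the stricter tier.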

### “Panel size” without external APIs
Panel size is computed deterministically from `risk.score` (same pattern as the existing `guard_engine.py`). POC defaults for each resulting tier:
- normal: 5 seats
- needs-human: 10 seats (more checks, but still local)
- urgent: 20 seats (but action is always escalate, not debate content)

---

## 3) Draft engine (no external API)

### Principles
- Use **templates first**, not a generative model.
- Always mirror the user’s language (or ask a 1‑line language question if uncertain).
- Keep replies short; ask one clear next question when helpful.
- Never invite deep disclosure in DMs; route to “resources / call / book link”.

### Draft outputs
```json
{
  "draft_version": "igdm.draft/v1",
  "trace_id": "uuid",
  "template_id": "top20:book:v1:es",
  "text": "…",
  "placeholders": ["BOOK_LINK"],
  "notes": ["language=es", "intent=book"]
}
```
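
A template-first draft engine might look like the sketch below. The template texts, the `fallback:*` ids, the language question wording, and the 0.5 confidence cut-off as applied here are all assumptions for illustration; only the `top20:<intent>:v1:<lang>` id shape and the output fields come from the contract above.

```python
import re

# Hypothetical template store keyed by template_id.
TEMPLATES = {
    "top20:book:v1:es": "¡Hola! Puedes reservar aquí: {BOOK_LINK}",
    "top20:book:v1:en": "Hi! You can book here: {BOOK_LINK}",
}
# Hypothetical safe fallbacks, one per supported language.
FALLBACKS = {
    "es": "¡Gracias por tu mensaje! ¿Qué necesitas encontrar?",
    "en": "Thanks for your message! What are you looking for?",
}
LANGUAGE_QUESTION = "English or Español?"


def build_draft(trace_id: str, intent: str, language: str, language_confidence: float) -> dict:
    """Pick a template, else a same-language fallback, else ask for the language."""
    template_id = f"top20:{intent}:v1:{language}"
    text = TEMPLATES.get(template_id)
    if language_confidence < 0.5 or (text is None and language not in FALLBACKS):
        # Uncertain or unsupported language: ask rather than guess.
        template_id, text = "fallback:language_question:v1", LANGUAGE_QUESTION
    elif text is None:
        template_id, text = f"fallback:generic:v1:{language}", FALLBACKS[language]
    return {
        "draft_version": "igdm.draft/v1",
        "trace_id": trace_id,
        "template_id": template_id,
        "text": text,
        # Placeholders stay unfilled so reviewers can see what will be substituted.
        "placeholders": re.findall(r"\{([A-Z_]+)\}", text),
        "notes": [f"language={language}", f"intent={intent}"],
    }
```

No generative model is involved: the same triage output always yields the same draft, which keeps the later hash-based trace reproducible.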

---

## 4) IF.GOV.PANEL (simulated debates)

### What “debate” means here
Because we are not calling external LLMs, the “panel” is a set of **deterministic seat evaluators**.
Each seat emits:
- a vote (`approve` | `request_changes` | `veto`)
- reasons (human readable)
- patch suggestions (structured)

### Seat roster (minimum viable, 5 seats)
1) **Safety seat**: blocks crisis mishandling; ensures no harmful advice.
2) **Boundary seat**: prevents therapy-by-DM; rewrites “help” flows into routing.
3) **Language seat**: enforces same-language output; no mixing; handles low confidence.
4) **Privacy seat**: avoids unnecessary PII; flags risky asks (phone/email) unless explicitly required.
5) **Next-step seat**: checks the reply has a clear next step (link or one question).

Optional seats (when panel size grows):
- **Tone/VoiceDNA seat**: checks length + emoji pattern + directness vs DM voice rules.
- **Spam/abuse seat**: detects harassment loops and routes to block/report guidance.
- **Contrarian seat**: tries to misread the message and see if the draft fails.

### Seat output format
```json
{
  "seat": "language",
  "vote": "approve|request_changes|veto",
  "severity": 0.0,
  "reasons": ["..."],
  "patches": [
    { "op": "replace_text", "path": "draft.text", "value": "..." }
  ]
}
```
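
To make the format concrete, here is a hedged sketch of one deterministic seat, the Next-step seat, emitting that structure. The seat name `next_step`, the severity value, and the heuristics (a URL or `*_LINK` placeholder counts as a link; exactly one `?` counts as one question) are assumptions.

```python
import re

# A URL or an unfilled link placeholder such as {BOOK_LINK}.
LINK_PATTERN = re.compile(r"https?://\S+|\{[A-Z_]+_LINK\}")


def next_step_seat(draft_text: str) -> dict:
    """Approve when the draft offers a link or asks exactly one question."""
    has_link = bool(LINK_PATTERN.search(draft_text))
    questions = draft_text.count("?")
    if has_link or questions == 1:
        return {"seat": "next_step", "vote": "approve", "severity": 0.0,
                "reasons": ["clear next step present"], "patches": []}
    reason = "no link and no question" if questions == 0 else "more than one question"
    # This sketch proposes no patch; a real seat could suggest a replacement text.
    return {"seat": "next_step", "vote": "request_changes", "severity": 0.3,
            "reasons": [reason], "patches": []}
```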

### Panel aggregation (deterministic)
- If any seat returns `veto` → panel decision becomes `escalate_human` (or `urgent_escalate`).
- Else if any seat returns `request_changes` → apply patches (in order), re-run seats once.
- Else → panel decision is `approve_draft`.
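
The aggregation rules can be sketched as a single function over seat callables. Two points are assumptions because the spec leaves them open: a patched draft that passes the re-run is reported as `revise_draft`, and a draft that still fails after the single re-run is escalated. The names `run_panel` and `apply_patches` are hypothetical.

```python
import hashlib
from typing import Callable, List

Seat = Callable[[str], dict]  # a seat takes the draft text and returns a vote record


def apply_patches(text: str, votes: List[dict]) -> str:
    """Apply replace_text patches on draft.text in seat order."""
    for vote in votes:
        for patch in vote.get("patches", []):
            if patch["op"] == "replace_text" and patch["path"] == "draft.text":
                text = patch["value"]
    return text


def run_panel(draft_text: str, seats: List[Seat], urgent: bool = False) -> dict:
    escalate = "urgent_escalate" if urgent else "escalate_human"
    votes = [seat(draft_text) for seat in seats]
    if any(v["vote"] == "veto" for v in votes):
        decision = escalate
    elif any(v["vote"] == "request_changes" for v in votes):
        draft_text = apply_patches(draft_text, votes)
        votes = [seat(draft_text) for seat in seats]  # single re-run, no loop
        all_ok = all(v["vote"] == "approve" for v in votes)
        # Assumption: still-contested drafts go to a human instead of looping.
        decision = "revise_draft" if all_ok else escalate
    else:
        decision = "approve_draft"
    return {
        "panel_version": "if.gov.panel/igdm/v1",
        "panel_size": len(seats),
        "seats": votes,
        "decision": decision,
        "final_draft_text_sha256": hashlib.sha256(draft_text.encode("utf-8")).hexdigest(),
    }
```

Capping the re-run at one pass keeps the panel terminating and its output reproducible for a given input and seat list.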

### Panel decision record
```json
{
  "panel_version": "if.gov.panel/igdm/v1",
  "trace_id": "uuid",
  "panel_size": 5,
  "seats": [ { "...": "..." } ],
  "decision": "approve_draft|revise_draft|escalate_human|urgent_escalate",
  "final_draft_text_sha256": "…",
  "reason_summary": "short"
}
```

---

## 5) Escalation UX (how Sergio actually sees it)

### Escalation record
```json
{
  "escalation_version": "igdm.escalation/v1",
  "trace_id": "uuid",
  "tier": "urgent|needs-human",
  "reason_codes": ["self_harm_signal"],
  "sender_id": "123",
  "mid": "m_abc",
  "time_cet": "2025-12-25T21:13:00+01:00",
  "open_links": {
    "instagram_thread": "https://www.instagram.com/direct/t/<conversation_id>/",
    "fb_inbox": "https://business.facebook.com/latest/inbox/all/?asset_id=<page_id>"
  }
}
```
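
A record like the one above could be assembled as in this sketch. It assumes a fixed UTC+01:00 offset for `time_cet` (so it ignores summer time), that `conversation_id` and `page_id` are already known to the caller, and a hypothetical function name `build_escalation`.

```python
from datetime import datetime, timedelta, timezone
from typing import List
from urllib.parse import quote

# Fixed CET offset; an assumption that deliberately ignores CEST.
CET = timezone(timedelta(hours=1), "CET")


def build_escalation(trace_id: str, tier: str, reason_codes: List[str],
                     sender_id: str, mid: str, timestamp_ms: int,
                     conversation_id: str, page_id: str) -> dict:
    """Build an escalation record from webhook fields and known ids."""
    if tier not in ("urgent", "needs-human"):
        raise ValueError(f"tier {tier!r} is not an escalation tier")
    when = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc).astimezone(CET)
    return {
        "escalation_version": "igdm.escalation/v1",
        "trace_id": trace_id,
        "tier": tier,
        "reason_codes": list(reason_codes),
        "sender_id": sender_id,
        "mid": mid,
        "time_cet": when.isoformat(timespec="seconds"),
        "open_links": {
            # Ids are percent-encoded so they cannot alter the URL structure.
            "instagram_thread": f"https://www.instagram.com/direct/t/{quote(conversation_id, safe='')}/",
            "fb_inbox": f"https://business.facebook.com/latest/inbox/all/?asset_id={quote(page_id, safe='')}",
        },
    }
```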

### Notification strategy (POC)
No paid services required:
- Show escalations in a **logged-in dashboard** on `emo-social.infrafabric.io`.
- Optional: email later (requires SMTP relay configured); not required for the POC.

---

## 6) IF.TTT trace + evidence bundles (provable without leaking)

### Two-bundle approach (recommended)
- **Private bundle** (internal): includes raw message text, stored locally with strict permissions.
- **Public bundle** (shareable): contains hashes + redacted previews only.

### Bundle contents (public)
```
bundle/
  manifest.json
  event.json
  triage.json
  draft.json
  panel.json
  escalation.json   (only if escalated)
  sha256sums.txt
  signature_ed25519.txt
```

### Minimum “public” fields
- `message_text_sha256` (not raw)
- `draft_text_sha256` (not raw)
- triage + panel decision + reason codes
- timestamps (UTC + CET)

This is enough to prove: “given these bytes (committed), these deterministic governance steps happened, and this decision was produced”.
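
The hashing side of a public bundle might be sketched as follows. The canonical JSON settings, the inclusion of `mid` in the public event, and the `sha256sum`-style line format for `sha256sums.txt` are assumptions. Ed25519 signing is omitted because the Python standard library does not provide it; a signing step would need a third-party library.

```python
import hashlib
import json
from typing import Dict


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def canonical_json(obj: dict) -> bytes:
    """Stable serialization so the same record always hashes to the same value."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def public_event(mid: str, message_text: str) -> dict:
    """Public view of an event: a commitment to the text, never the text itself."""
    return {"mid": mid, "message_text_sha256": sha256_hex(message_text.encode("utf-8"))}


def sha256sums(files: Dict[str, bytes]) -> str:
    """Render sha256sums.txt content: '<hex digest>  <file name>' per line, sorted."""
    lines = [f"{sha256_hex(data)}  {name}" for name, data in sorted(files.items())]
    return "\n".join(lines) + "\n"
```

An auditor holding the private bundle can recompute `message_text_sha256` from the raw text and compare it with the public value, which is what makes the commitment checkable without publishing the message.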

---

## 7) Rollout plan (safe)

1) **Triage-only** + escalation queue (no drafts yet).
2) **Draft-only** templates for Top 20 intents (no sending).
3) Add simulated **IF.GOV.PANEL** seats and store panel decisions.
4) Emit IF.TTT bundles for each event (public + private).
5) Add comparison table: `draft` vs `actual sent` (manual) to measure quality.
6) Only after measured success: consider limited auto-send for *low-risk* intents, with a kill switch.