# IF.GOV + IF.TTT Spec — Instagram DM Draft Assistant (`@socialmediatorr`)
**Status:** proposal (POC)

**Constraint:** no paid external LLM APIs → “debates” are simulated using deterministic seats (rules) and optional local models only.

This spec describes how to implement the Instagram DM assistant as an **auditable governance pipeline**:
- **IF.GOV.TRIAGE** decides risk + route (normal vs human vs urgent).
- **IF.GOV.PANEL** simulates a multi-seat review of the proposed draft reply (no external APIs required).
- **IF.TTT** records a chain-of-custody (hashes + decisions + evidence bundle) so results are provable later.
---
## 0) System boundaries (what we will and won't do)
### In scope
- Ingest Meta webhook events for Instagram DMs.
- Produce **draft replies** (default) using templates + simple intent routing.
- Escalate a tiny fraction of DMs to a human (Sergio) quickly, with a direct “open thread” link.
- Produce IF.TTT-style trace records and evidence bundles for audit/replay.
- Run “panel debates” **without external APIs** (rule seats + optional local model seats).
### Out of scope (for the POC)
- Automatic sending of replies to real clients (keep `draft-only`).
- Therapy-by-DM, crisis intervention, diagnosis, or medical claims.
- Storing/exporting full DM transcripts in a public repo.
---
## 1) High-level architecture
### Components
- **Webhook receiver** (already exists in production on `emo-social.infrafabric.io`): verifies Meta signature and normalizes events.
- **Event store**: append-only storage of DM events + derived decisions (local, private).
- **Triage engine** (`IF.GOV.TRIAGE`): risk + language + intent + confidence.
- **Draft engine**: chooses a reply template (Top 20) or a safe fallback.
- **Panel engine** (`IF.GOV.PANEL`): simulated debate across “seats” → approve/patch/escalate.
- **Trace recorder** (`IF.TTT`): emits signed decision records + evidence bundles.
- **Reviewer UI**: queue view for Drafts + Escalations + “open IG thread” action.
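
The signature check performed by the webhook receiver can be sketched as follows. This is a minimal illustration, not the production receiver: it assumes the `X-Hub-Signature-256` header format (`sha256=<hex digest>`, an HMAC-SHA256 of the raw request body keyed with the app secret) that Meta documents for webhooks. The function name `verify_meta_signature` is hypothetical.

```python
import hashlib
import hmac


def verify_meta_signature(app_secret: str, raw_body: bytes, header_value: str) -> bool:
    """Return True if header_value carries a valid HMAC-SHA256 of raw_body.

    Assumption: header_value is the raw `X-Hub-Signature-256` header,
    formatted as "sha256=<hex digest>".
    """
    prefix = "sha256="
    if not header_value.startswith(prefix):
        return False
    expected = hmac.new(app_secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    received = header_value[len(prefix):]
    # compare_digest runs in constant time, so a mismatch does not leak
    # how many leading characters were correct.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))
```

The check has to run on the raw request bytes, before any JSON parsing, because re-serializing the payload can change the bytes and invalidate the digest.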
### Data flow (valid Mermaid)
```mermaid
flowchart LR
W[Meta webhook event] --> V[Verify signature]
V --> N[Normalize event]
N --> ES[Event store append]
ES --> T[IF.GOV.TRIAGE]
T -->|urgent| E[Escalation record]
T -->|normal| D[Draft engine]
T -->|needs-human| H[Human-required record]
D --> P[IF.GOV.PANEL seats]
P --> R[Panel decision]
E --> TR[IF.TTT trace + bundle]
H --> TR
R --> TR
TR --> UI[Reviewer UI queue]
```
---
## 2) IF.GOV.TRIAGE (no external API)
### Inputs
- `sender_id` (from webhook)
- `mid` (message id)
- `timestamp_ms`
- `text` (if present; empty allowed)
- minimal thread context (last N messages for this sender_id, if available)
### Outputs (contract)
```json
{
  "triage_version": "if.gov.triage/igdm/v1",
  "trace_id": "uuid",
  "ts_utc": "2025-12-25T12:00:00Z",
  "time_cet": "2025-12-25T13:00:00+01:00",
  "sender_id": "123",
  "mid": "m_abc",
  "language": { "code": "es", "confidence": 0.86, "source": "text_or_thread" },
  "intent": { "label": "book|link|video|price|help|other", "confidence": 0.90 },
  "risk": {
    "tier": "normal|needs-human|urgent",
    "score": 0.05,
    "reasons": ["..."],
    "panel_size": 5
  }
}
```
### Triage rules (POC defaults)
- **Language detection**
  - If message has enough text: detect language from message text.
  - Else: reuse last confident thread language.
  - Else: set `confidence < 0.5` and prefer a 1-line language question.
- **Intent detection**
  - Keyword routing for: `book`, `link`, `video`, `price/cost`, `call`, `therapy`, etc.
  - If unknown: intent=`other` with low confidence.
- **Risk tier**
  - `urgent` if self-harm/suicide signals OR violence/abuse indicators.
  - `needs-human` if: therapeutic disclosure, legal threats, harassment, complex personal crisis, repeated angry loop.
  - `normal` otherwise.
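
The intent and risk rules above can be sketched as plain keyword and regex matching. Everything concrete in this block is an assumption made for illustration: the keyword lists, the regex patterns, the reason codes other than `self_harm_signal`, and the confidence values are placeholders, not the project's real rule set.

```python
import re

# Hypothetical keyword lists; the real Top 20 routing table is not in this spec.
INTENT_KEYWORDS = {
    "book": ("book",),
    "link": ("link",),
    "video": ("video",),
    "price": ("price", "cost"),
    "help": ("help", "therapy"),
}
# Hypothetical patterns, checked in priority order: urgent first.
URGENT_PATTERNS = {
    "self_harm_signal": (r"\bsuicid", r"\bkill myself\b"),
    "violence_signal": (r"\bhits me\b", r"\babus"),
}
NEEDS_HUMAN_PATTERNS = {
    "legal_threat": (r"\blawyer\b", r"\bsue you\b"),
    "therapeutic_disclosure": (r"\bpanic attack", r"\bmy trauma\b"),
}


def _matched_codes(patterns: dict, text: str) -> list:
    """Return the reason codes whose patterns match the text."""
    return [
        code
        for code, regexes in patterns.items()
        if any(re.search(rx, text) for rx in regexes)
    ]


def classify_intent(text: str) -> dict:
    """Keyword routing; unknown text falls back to `other` with low confidence."""
    words = set(re.findall(r"\w+", text.lower()))
    for label, keywords in INTENT_KEYWORDS.items():
        if words & set(keywords):
            return {"label": label, "confidence": 0.9}
    return {"label": "other", "confidence": 0.3}


def classify_risk(text: str) -> dict:
    """Return the highest applicable tier: urgent, then needs-human, then normal."""
    lowered = text.lower()
    reasons = _matched_codes(URGENT_PATTERNS, lowered)
    if reasons:
        return {"tier": "urgent", "reasons": reasons}
    reasons = _matched_codes(NEEDS_HUMAN_PATTERNS, lowered)
    if reasons:
        return {"tier": "needs-human", "reasons": reasons}
    return {"tier": "normal", "reasons": []}
```

Checking the urgent patterns first means a message that matches both tiers is always routed as `urgent`.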
### “Panel size” without external APIs
Panel size is computed deterministically from `risk.score` (same pattern as the existing `guard_engine.py`):
- normal: 5 seats
- needs-human: 10 seats (more checks, but still local)
- urgent: 20 seats (but action is always escalate, not debate content)
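
A minimal sketch of the mapping listed above. The spec does not give the `risk.score` thresholds that separate the tiers, so this illustration keys off the tier label instead. `panel_size_for` is a hypothetical name and is not taken from `guard_engine.py`.

```python
# Seat counts per risk tier, as listed in this section.
PANEL_SIZE_BY_TIER = {"normal": 5, "needs-human": 10, "urgent": 20}


def panel_size_for(tier: str) -> int:
    """Return the panel size for a tier, rejecting unknown tier labels."""
    try:
        return PANEL_SIZE_BY_TIER[tier]
    except KeyError:
        raise ValueError(f"unknown risk tier: {tier!r}") from None
```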
---
## 3) Draft engine (no external API)
### Principles
- Use **templates first**, not a generative model.
- Always mirror the user's language (or ask a 1-line language question if uncertain).
- Keep replies short; ask one clear next question when helpful.
- Never invite deep disclosure in DMs; route to “resources / call / book link”.
### Draft outputs
```json
{
  "draft_version": "igdm.draft/v1",
  "trace_id": "uuid",
  "template_id": "top20:book:v1:es",
  "text": "…",
  "placeholders": ["BOOK_LINK"],
  "notes": ["language=es", "intent=book"]
}
```
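Template selection can be sketched as a dictionary lookup keyed by the `template_id` pattern shown above. The template texts, the fallback id, and the function name `build_draft` are hypothetical. The spec does not say what the safe fallback should be when a template is missing for a confidently detected language, so this sketch raises an error in that case instead of guessing.

```python
from string import Formatter

# Hypothetical template texts; placeholders use str.format-style braces.
TEMPLATES = {
    "top20:book:v1:en": "Here is the booking link: {BOOK_LINK}",
    "top20:book:v1:es": "Aquí tienes el enlace para reservar: {BOOK_LINK}",
}
# Hypothetical fallback used when the language is uncertain.
LANGUAGE_QUESTION_ID = "fallback:language_question:v1"
LANGUAGE_QUESTION_TEXT = "English or Español?"


def build_draft(trace_id: str, intent: str, language: str, language_confidence: float) -> dict:
    """Pick a template for (intent, language), or ask a 1-line language question."""
    if language_confidence < 0.5:
        template_id, text = LANGUAGE_QUESTION_ID, LANGUAGE_QUESTION_TEXT
        notes = ["language=uncertain", f"intent={intent}"]
    else:
        template_id = f"top20:{intent}:v1:{language}"
        if template_id not in TEMPLATES:
            raise LookupError(f"no template for {template_id}")
        text = TEMPLATES[template_id]
        notes = [f"language={language}", f"intent={intent}"]
    # Formatter().parse yields (literal, field_name, spec, conversion) tuples;
    # field_name is None for trailing literal text.
    placeholders = [name for _, name, _, _ in Formatter().parse(text) if name]
    return {
        "draft_version": "igdm.draft/v1",
        "trace_id": trace_id,
        "template_id": template_id,
        "text": text,
        "placeholders": placeholders,
        "notes": notes,
    }
```

Leaving `{BOOK_LINK}` unfilled in `text` keeps the draft reviewable: the placeholder list tells the reviewer UI what still has to be substituted before anything is sent.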
---
## 4) IF.GOV.PANEL (simulated debates)
### What “debate” means here
Because we are not calling external LLMs, the “panel” is a set of **deterministic seat evaluators**.
Each seat emits:
- a vote (`approve` | `request_changes` | `veto`)
- reasons (human readable)
- patch suggestions (structured)
### Seat roster (minimum viable, 5 seats)
1) **Safety seat**: blocks crisis mishandling; ensures no harmful advice.
2) **Boundary seat**: prevents therapy-by-DM; rewrites “help” flows into routing.
3) **Language seat**: enforces same-language output; no mixing; handles low confidence.
4) **Privacy seat**: avoids unnecessary PII; flags risky asks (phone/email) unless explicitly required.
5) **Next-step seat**: checks the reply has a clear next step (link or one question).

Optional seats (when panel size grows):
- **Tone/VoiceDNA seat**: checks length + emoji pattern + directness vs DM voice rules.
- **Spam/abuse seat**: detects harassment loops and routes to block/report guidance.
- **Contrarian seat**: tries to misread the message and see if the draft fails.
### Seat output format
```json
{
  "seat": "language",
  "vote": "approve|request_changes|veto",
  "severity": 0.0,
  "reasons": ["..."],
  "patches": [
    { "op": "replace_text", "path": "draft.text", "value": "..." }
  ]
}
```
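As an illustration of a deterministic seat, the Next-step seat could look like the sketch below. The link regex, the `*_LINK` placeholder convention, the seat name string, and the severity value are assumptions; the seat only reports a problem and proposes no patch, since the spec does not define how a missing next step should be repaired.

```python
import re

# Matches a literal URL or an unfilled template placeholder such as {BOOK_LINK}.
LINK_PATTERN = re.compile(r"https?://\S+|\{[A-Z_]*LINK\}")


def next_step_seat(draft_text: str) -> dict:
    """Approve when the draft has a link or exactly one question."""
    has_link = bool(LINK_PATTERN.search(draft_text))
    question_count = draft_text.count("?")
    if has_link or question_count == 1:
        return {"seat": "next_step", "vote": "approve", "severity": 0.0,
                "reasons": [], "patches": []}
    reason = "no link and no question" if question_count == 0 else "more than one question"
    return {"seat": "next_step", "vote": "request_changes", "severity": 0.4,
            "reasons": [reason], "patches": []}
```

Counting only `?` also works for Spanish drafts, because the opening `¿` is a different character and is not counted twice.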
### Panel aggregation (deterministic)
- If any seat returns `veto` → panel decision becomes `escalate_human` (or `urgent_escalate`).
- Else if any seat returns `request_changes` → apply patches (in order), re-run seats once.
- Else → approve.
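
The aggregation rules above can be sketched as follows, treating each seat as a function from draft text to a seat output dict. Two behaviours are assumptions, because the spec does not define them: a draft that passes after patching is reported as `revise_draft`, and a draft that still draws `request_changes` after the single re-run is sent to `escalate_human`. The name `run_panel` and its `tier` parameter are hypothetical.

```python
import hashlib
from typing import Callable, Dict, List

Seat = Callable[[str], Dict]  # draft text -> seat output dict


def _votes(outputs: List[Dict]) -> set:
    return {o["vote"] for o in outputs}


def run_panel(draft_text: str, seats: List[Seat], tier: str = "normal") -> Dict:
    """Aggregate seat votes: veto escalates, request_changes patches and re-runs once."""
    veto_decision = "urgent_escalate" if tier == "urgent" else "escalate_human"
    text = draft_text
    outputs = [seat(text) for seat in seats]
    if "veto" in _votes(outputs):
        decision = veto_decision
    elif "request_changes" in _votes(outputs):
        # Apply patches in seat order; only whole-text replacement is sketched here.
        for output in outputs:
            for patch in output.get("patches", []):
                if patch["op"] == "replace_text" and patch["path"] == "draft.text":
                    text = patch["value"]
        outputs = [seat(text) for seat in seats]  # single re-run
        if "veto" in _votes(outputs):
            decision = veto_decision
        elif "request_changes" in _votes(outputs):
            decision = "escalate_human"  # assumption: do not loop, hand over
        else:
            decision = "revise_draft"  # assumption: approved after patching
    else:
        decision = "approve_draft"
    return {
        "panel_version": "if.gov.panel/igdm/v1",
        "panel_size": len(seats),
        "seats": outputs,
        "decision": decision,
        "final_draft_text_sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
    }
```

Because every seat is a pure function of the draft text, replaying the same draft through the same seat list reproduces the same decision and hash.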
### Panel decision record
```json
{
  "panel_version": "if.gov.panel/igdm/v1",
  "trace_id": "uuid",
  "panel_size": 5,
  "seats": [ { "...": "..." } ],
  "decision": "approve_draft|revise_draft|escalate_human|urgent_escalate",
  "final_draft_text_sha256": "…",
  "reason_summary": "short"
}
```
---
## 5) Escalation UX (how Sergio actually sees it)
### Escalation record
```json
{
  "escalation_version": "igdm.escalation/v1",
  "trace_id": "uuid",
  "tier": "urgent|needs-human",
  "reason_codes": ["self_harm_signal"],
  "sender_id": "123",
  "mid": "m_abc",
  "time_cet": "2025-12-25T21:13:00+01:00",
  "open_links": {
    "instagram_thread": "https://www.instagram.com/direct/t/<conversation_id>/",
    "fb_inbox": "https://business.facebook.com/latest/inbox/all/?asset_id=<page_id>"
  }
}
```
### Notification strategy (POC)
No paid services required:
- Show escalations in a **logged-in dashboard** on `emo-social.infrafabric.io`.
- Optional: email later (requires SMTP relay configured); not required for the POC.
---
## 6) IF.TTT trace + evidence bundles (provable without leaking)
### Two-bundle approach (recommended)
- **Private bundle** (internal): includes raw message text, stored locally with strict permissions.
- **Public bundle** (shareable): contains hashes + redacted previews only.
### Bundle contents (public)
```
bundle/
  manifest.json
  event.json
  triage.json
  draft.json
  panel.json
  escalation.json (only if escalated)
  sha256sums.txt
  signature_ed25519.txt
```
### Minimum “public” fields
- `message_text_sha256` (not raw)
- `draft_text_sha256` (not raw)
- triage + panel decision + reason codes
- timestamps (UTC + CET)

This is enough to prove: “given these bytes (committed), these deterministic governance steps happened, and this decision was produced”.
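
The hashing side of the public bundle can be sketched with the standard library alone. The helper names and the canonical JSON encoding (sorted keys, compact separators) are assumptions. The Ed25519 signature in `signature_ed25519.txt` is left out because Python's standard library has no Ed25519 implementation and the spec does not name a signing library.

```python
import hashlib
import json
from typing import Dict


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def public_fields(message_text: str, draft_text: str) -> Dict[str, str]:
    """Commit to the raw texts by hash so the public bundle never contains them."""
    return {
        "message_text_sha256": sha256_hex(message_text.encode("utf-8")),
        "draft_text_sha256": sha256_hex(draft_text.encode("utf-8")),
    }


def canonical_json(obj) -> bytes:
    """Serialize deterministically so the same record always hashes the same."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def sha256sums(files: Dict[str, bytes]) -> str:
    """Render `<hex digest>  <name>` lines, one per file, sorted by name."""
    return "".join(f"{sha256_hex(data)}  {name}\n" for name, data in sorted(files.items()))
```

A verifier holding the private bundle can recompute `message_text_sha256` from the raw text and compare it with the public value, which ties the published decision to the exact message bytes without revealing them.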
---
## 7) Rollout plan (safe)
1) **Triage-only** + escalation queue (no drafts yet).
2) **Draft-only** templates for Top 20 intents (no sending).
3) Add simulated **IF.GOV.PANEL** seats and store panel decisions.
4) Emit IF.TTT bundles for each event (public + private).
5) Add comparison table: `draft` vs `actual sent` (manual) to measure quality.
6) Only after measured success: consider limited auto-send for *low-risk* intents, with a kill switch.