emo-social-insta-dm-agent/docs/governance/IF_GOV_IGDM_SPEC.md

IF.GOV + IF.TTT Spec — Instagram DM Draft Assistant (@socialmediatorr)

Status: proposal (POC)
Constraint: no paid external LLM APIs → “debates” are simulated using deterministic seats (rules) and optional local models only.

This spec describes how to implement the Instagram DM assistant as an auditable governance pipeline:

  • IF.GOV.TRIAGE decides risk + route (normal vs human vs urgent).
  • IF.GOV.PANEL simulates a multi-seat review of the proposed draft reply (no external APIs required).
  • IF.TTT records a chain-of-custody (hashes + decisions + evidence bundle) so results are provable later.

0) System boundaries (what we will and won't do)

In scope

  • Ingest Meta webhook events for Instagram DMs.
  • Produce draft replies (default) using templates + simple intent routing.
  • Escalate a tiny fraction of DMs to a human (Sergio) quickly, with a direct “open thread” link.
  • Produce IF.TTT-style trace records and evidence bundles for audit/replay.
  • Run “panel debates” without external APIs (rule seats + optional local model seats).

Out of scope (for the POC)

  • Automatic sending of replies to real clients (keep draft-only).
  • Therapy-by-DM, crisis intervention, diagnosis, or medical claims.
  • Storing/exporting full DM transcripts in a public repo.

1) High-level architecture

Components

  • Webhook receiver (already exists in production on emo-social.infrafabric.io): verifies Meta signature and normalizes events.
  • Event store: append-only storage of DM events + derived decisions (local, private).
  • Triage engine (IF.GOV.TRIAGE): risk + language + intent + confidence.
  • Draft engine: chooses a reply template (Top 20) or a safe fallback.
  • Panel engine (IF.GOV.PANEL): simulated debate across “seats” → approve/patch/escalate.
  • Trace recorder (IF.TTT): emits signed decision records + evidence bundles.
  • Reviewer UI: queue view for Drafts + Escalations + “open IG thread” action.
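
A minimal sketch of the signature check the webhook receiver performs, assuming Meta's X-Hub-Signature-256 header (an HMAC-SHA256 of the raw request body, keyed with the app secret, sent as "sha256=<hex>"). The function name and parameters are illustrative and not taken from the production receiver.

```python
import hashlib
import hmac


def verify_meta_signature(app_secret: str, raw_body: bytes, header_value: str) -> bool:
    """Return True if the signature header matches the raw request body.

    `header_value` is assumed to be the X-Hub-Signature-256 header as received.
    """
    prefix = "sha256="
    if not header_value or not header_value.startswith(prefix):
        return False
    received = header_value[len(prefix):]
    expected = hmac.new(app_secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    # Constant-time comparison, so timing does not reveal how much of the digest matched.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))
```

The check must run on the raw bytes before any JSON parsing; re-serializing the payload can change the bytes and invalidate the signature.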

Data flow (valid Mermaid)

flowchart LR
  W[Meta webhook event] --> V[Verify signature]
  V --> N[Normalize event]
  N --> ES[Event store append]
  ES --> T[IF.GOV.TRIAGE]
  T -->|urgent| E[Escalation record]
  T -->|normal| D[Draft engine]
  T -->|needs-human| H[Human-required record]
  D --> P[IF.GOV.PANEL seats]
  P --> R[Panel decision]
  E --> TR[IF.TTT trace + bundle]
  H --> TR
  R --> TR
  TR --> UI[Reviewer UI queue]

2) IF.GOV.TRIAGE (no external API)

Inputs

  • sender_id (from webhook)
  • mid (message id)
  • timestamp_ms
  • text (if present; empty allowed)
  • minimal thread context (last N messages for this sender_id, if available)

Outputs (contract)

{
  "triage_version": "if.gov.triage/igdm/v1",
  "trace_id": "uuid",
  "ts_utc": "2025-12-25T12:00:00Z",
  "time_cet": "2025-12-25T13:00:00+01:00",
  "sender_id": "123",
  "mid": "m_abc",
  "language": { "code": "es", "confidence": 0.86, "source": "text_or_thread" },
  "intent": { "label": "book|link|video|price|help|other", "confidence": 0.90 },
  "risk": {
    "tier": "normal|needs-human|urgent",
    "score": 0.05,
    "reasons": ["..."],
    "panel_size": 5
  }
}

Triage rules (POC defaults)

  • Language detection
    • If message has enough text: detect language from message text.
    • Else: reuse last confident thread language.
    • Else: set confidence < 0.5 and prefer a 1-line language question.
  • Intent detection
    • Keyword routing for: book, link, video, price/cost, call, therapy, etc.
    • If unknown: intent=other with low confidence.
  • Risk tier
    • urgent if self-harm/suicide signals OR violence/abuse indicators.
    • needs-human if: therapeutic disclosure, legal threats, harassment, complex personal crisis, repeated angry loop.
    • normal otherwise.
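
The intent and risk rules above could be sketched as plain keyword matching. Everything in this block is an assumption for illustration: the pattern lists, reason codes, and confidence values are placeholders, and a real deployment would need reviewed, multilingual lists.

```python
import re

# Hypothetical patterns, checked in order of severity: urgent first, then needs-human.
URGENT_PATTERNS = [
    ("self_harm_signal", r"\bsuicid|\bkill myself\b|\bhurt myself\b"),
    ("violence_signal", r"\bhe hits me\b|\bshe hits me\b"),
]
NEEDS_HUMAN_PATTERNS = [
    ("legal_threat", r"\blawyer\b|\bsue you\b"),
    ("therapeutic_disclosure", r"\bmy therapist\b|\bpanic attack"),
]
# Hypothetical intent keywords (English + Spanish examples only).
INTENT_KEYWORDS = {
    "book": {"book", "appointment", "reservar", "cita"},
    "link": {"link", "enlace"},
    "video": {"video"},
    "price": {"price", "cost", "precio"},
}


def classify_risk(text: str) -> dict:
    """Return the risk tier and the reason codes that triggered it."""
    lowered = text.lower()
    for tier, patterns in (("urgent", URGENT_PATTERNS), ("needs-human", NEEDS_HUMAN_PATTERNS)):
        reasons = [code for code, pattern in patterns if re.search(pattern, lowered)]
        if reasons:
            return {"tier": tier, "reasons": reasons}
    return {"tier": "normal", "reasons": []}


def classify_intent(text: str) -> dict:
    """Return the first intent whose keywords appear as whole words in the text."""
    words = set(re.findall(r"\w+", text.lower()))
    for label, keywords in INTENT_KEYWORDS.items():
        if words & keywords:
            return {"label": label, "confidence": 0.9}  # placeholder confidence
    return {"label": "other", "confidence": 0.2}  # unknown intent, low confidence
```

Because the urgent patterns are checked before anything else, a message that matches both an urgent and a needs-human pattern is always reported as urgent.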

“Panel size” without external APIs

Panel size is computed deterministically from the triage risk result (risk.score and the tier derived from it, following the same pattern as the existing guard_engine.py). POC defaults per tier:

  • normal: 5 seats
  • needs-human: 10 seats (more checks, but still local)
  • urgent: 20 seats (but action is always escalate, not debate content)
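
As a sketch, the mapping above is a fixed lookup. This version keys on the tier because the sizes are listed per tier; how guard_engine.py derives a size from risk.score is not shown in this spec, so the function below is an assumption and not a copy of that code.

```python
# Seat counts per risk tier, as listed in the POC defaults.
PANEL_SIZE_BY_TIER = {"normal": 5, "needs-human": 10, "urgent": 20}


def panel_size_for(tier: str) -> int:
    """Return the number of seats for a risk tier; reject unknown tiers loudly."""
    try:
        return PANEL_SIZE_BY_TIER[tier]
    except KeyError:
        raise ValueError(f"unknown risk tier: {tier!r}") from None
```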

3) Draft engine (no external API)

Principles

  • Use templates first, not a generative model.
  • Always mirror the user's language (or ask a 1-line language question if uncertain).
  • Keep replies short; ask one clear next question when helpful.
  • Never invite deep disclosure in DMs; route to “resources / call / book link”.

Draft outputs

{
  "draft_version": "igdm.draft/v1",
  "trace_id": "uuid",
  "template_id": "top20:book:v1:es",
  "text": "…",
  "placeholders": ["BOOK_LINK"],
  "notes": ["language=es", "intent=book"]
}
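
A minimal sketch of template-first draft selection that emits the record above. The template texts, the fallback template ids, and the function name are hypothetical; only the "top20:<intent>:v1:<lang>" id shape and the output fields come from this spec.

```python
import re

# Hypothetical template table; real entries would come from the Top 20 set.
TEMPLATES = {
    ("book", "es"): "¡Gracias por escribir! Puedes reservar aquí: {BOOK_LINK}",
    ("book", "en"): "Thanks for your message! You can book here: {BOOK_LINK}",
}
# Hypothetical safe fallbacks, one per supported language.
SAFE_FALLBACK = {
    "es": "¡Gracias por tu mensaje! ¿En qué te puedo ayudar?",
    "en": "Thanks for your message! How can I help?",
}
LANGUAGE_QUESTION = "Español / English?"  # 1-line language question


def choose_draft(trace_id: str, intent: str, language_code: str,
                 language_confidence: float) -> dict:
    """Pick a template for (intent, language), or fall back safely."""
    if language_confidence < 0.5:
        template_id, text = "fallback:language_question:v1", LANGUAGE_QUESTION
    elif (intent, language_code) in TEMPLATES:
        template_id = f"top20:{intent}:v1:{language_code}"
        text = TEMPLATES[(intent, language_code)]
    elif language_code in SAFE_FALLBACK:
        template_id = f"fallback:safe:v1:{language_code}"
        text = SAFE_FALLBACK[language_code]
    else:
        template_id, text = "fallback:language_question:v1", LANGUAGE_QUESTION
    return {
        "draft_version": "igdm.draft/v1",
        "trace_id": trace_id,
        "template_id": template_id,
        "text": text,
        # Placeholders stay unresolved in the draft; they are listed for the reviewer.
        "placeholders": re.findall(r"\{([A-Z_]+)\}", text),
        "notes": [f"language={language_code}", f"intent={intent}"],
    }
```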

4) IF.GOV.PANEL (simulated debates)

What “debate” means here

Because we are not calling external LLMs, the “panel” is a set of deterministic seat evaluators. Each seat emits:

  • a vote (approve | request_changes | veto)
  • reasons (human readable)
  • patch suggestions (structured)

Seat roster (minimum viable, 5 seats)

  1. Safety seat: blocks crisis mishandling; ensures no harmful advice.
  2. Boundary seat: prevents therapy-by-DM; rewrites “help” flows into routing.
  3. Language seat: enforces same-language output; no mixing; handles low confidence.
  4. Privacy seat: avoids unnecessary PII; flags risky asks (phone/email) unless explicitly required.
  5. Next-step seat: checks the reply has a clear next step (link or one question).

Optional seats (when panel size grows)

  • Tone/VoiceDNA seat: checks length + emoji pattern + directness vs DM voice rules.
  • Spam/abuse seat: detects harassment loops and routes to block/report guidance.
  • Contrarian seat: tries to misread the message and see if the draft fails.

Seat output format

{
  "seat": "language",
  "vote": "approve|request_changes|veto",
  "severity": 0.0,
  "reasons": ["..."],
  "patches": [
    { "op": "replace_text", "path": "draft.text", "value": "..." }
  ]
}
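
To make the seat contract concrete, here is a sketch of one deterministic seat (the Next-step seat). The link pattern, the "exactly one question" rule, and the severity values are assumptions chosen for illustration.

```python
import re


def next_step_seat(draft_text: str) -> dict:
    """Approve if the draft contains a link (or link placeholder) or exactly one question."""
    has_link = bool(re.search(r"https?://|\{[A-Z_]*LINK\}", draft_text))
    question_count = draft_text.count("?")
    if has_link or question_count == 1:
        vote, severity, reasons = "approve", 0.0, []
    elif question_count > 1:
        vote, severity, reasons = "request_changes", 0.3, ["more than one question in reply"]
    else:
        vote, severity, reasons = "request_changes", 0.3, ["no link and no question in reply"]
    return {
        "seat": "next_step",
        "vote": vote,
        "severity": severity,
        "reasons": reasons,
        "patches": [],  # this seat only flags; it does not propose a rewrite
    }
```

Every seat takes the same input and returns the same shape, so seats can be added or removed without touching the aggregation step.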

Panel aggregation (deterministic)

  • If any seat returns veto → panel decision becomes escalate_human (or urgent_escalate).
  • Else if any seat returns request_changes → apply patches (in order), re-run seats once.
  • Else → approve.
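
The aggregation rules above can be sketched as a two-pass loop. Two points are assumptions, since the rules do not state them: a veto maps to urgent_escalate only when the triage tier is urgent, and a draft that still has request_changes after the single re-run is reported as revise_draft.

```python
import hashlib
from typing import Callable

Seat = Callable[[str], dict]  # a seat takes the draft text and returns a seat output


def apply_patches(text: str, patches: list) -> str:
    """Apply replace_text patches in order; other ops are ignored in this sketch."""
    for patch in patches:
        if patch.get("op") == "replace_text" and patch.get("path") == "draft.text":
            text = patch["value"]
    return text


def run_panel(trace_id: str, draft_text: str, seats: list, tier: str = "normal") -> dict:
    """Run all seats, patch once if requested, and return a panel decision record."""
    text = draft_text
    decision = "revise_draft"  # assumed outcome if changes are still requested after the re-run
    results = []
    for pass_number in (1, 2):
        results = [seat(text) for seat in seats]
        votes = {result["vote"] for result in results}
        if "veto" in votes:
            decision = "urgent_escalate" if tier == "urgent" else "escalate_human"
            break
        if "request_changes" not in votes:
            decision = "approve_draft"
            break
        if pass_number == 1:
            # Apply patches in seat order, then re-run the seats exactly once.
            for result in results:
                text = apply_patches(text, result["patches"])
    return {
        "panel_version": "if.gov.panel/igdm/v1",
        "trace_id": trace_id,
        "panel_size": len(seats),
        "seats": results,
        "decision": decision,
        "final_draft_text_sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
    }
```

Capping the loop at one re-run keeps the panel deterministic and bounded: the same draft and the same seats always produce the same record.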

Panel decision record

{
  "panel_version": "if.gov.panel/igdm/v1",
  "trace_id": "uuid",
  "panel_size": 5,
  "seats": [ { "...": "..." } ],
  "decision": "approve_draft|revise_draft|escalate_human|urgent_escalate",
  "final_draft_text_sha256": "…",
  "reason_summary": "short"
}

5) Escalation UX (how Sergio actually sees it)

Escalation record

{
  "escalation_version": "igdm.escalation/v1",
  "trace_id": "uuid",
  "tier": "urgent|needs-human",
  "reason_codes": ["self_harm_signal"],
  "sender_id": "123",
  "mid": "m_abc",
  "time_cet": "2025-12-25T21:13:00+01:00",
  "open_links": {
    "instagram_thread": "https://www.instagram.com/direct/t/<conversation_id>/",
    "fb_inbox": "https://business.facebook.com/latest/inbox/all/?asset_id=<page_id>"
  }
}
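
A sketch of a builder for the record above. The function name and the validation rules (only the two escalation tiers are accepted, and at least one reason code is required) are assumptions; the field names and link shapes follow the example record.

```python
ESCALATION_TIERS = {"urgent", "needs-human"}


def build_escalation(trace_id: str, tier: str, reason_codes: list, sender_id: str,
                     mid: str, time_cet: str, conversation_id: str, page_id: str) -> dict:
    """Build an escalation record with direct links for the reviewer."""
    if tier not in ESCALATION_TIERS:
        raise ValueError(f"tier {tier!r} is not an escalation tier")
    if not reason_codes:
        raise ValueError("at least one reason code is required")
    return {
        "escalation_version": "igdm.escalation/v1",
        "trace_id": trace_id,
        "tier": tier,
        "reason_codes": list(reason_codes),
        "sender_id": sender_id,
        "mid": mid,
        "time_cet": time_cet,  # ISO 8601 string with offset, produced upstream
        "open_links": {
            "instagram_thread": f"https://www.instagram.com/direct/t/{conversation_id}/",
            "fb_inbox": f"https://business.facebook.com/latest/inbox/all/?asset_id={page_id}",
        },
    }
```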

Notification strategy (POC)

No paid services required:

  • Show escalations in a logged-in dashboard on emo-social.infrafabric.io.
  • Optional: email later (requires SMTP relay configured); not required for the POC.

6) IF.TTT trace + evidence bundles (provable without leaking)

  • Private bundle (internal): includes raw message text, stored locally with strict permissions.
  • Public bundle (shareable): contains hashes + redacted previews only.

Bundle contents (public)

bundle/
  manifest.json
  event.json
  triage.json
  draft.json
  panel.json
  escalation.json   (only if escalated)
  sha256sums.txt
  signature_ed25519.txt

Minimum “public” fields

  • message_text_sha256 (not raw)
  • draft_text_sha256 (not raw)
  • triage + panel decision + reason codes
  • timestamps (UTC + CET)

This is enough to prove: “given these bytes (committed), these deterministic governance steps happened, and this decision was produced”.
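
A minimal sketch of how the public bundle's hashes could be produced with the standard library only. The helper names and the canonical JSON settings are assumptions. The Ed25519 signature step is left out because Python's standard library has no Ed25519 signer; it would need a third-party package.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def canonical_json(obj: dict) -> bytes:
    """Serialize with sorted keys and fixed separators so equal objects hash equally."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def public_event(event: dict) -> dict:
    """Replace the raw message text with its SHA-256 commitment."""
    redacted = {key: value for key, value in event.items() if key != "text"}
    redacted["message_text_sha256"] = sha256_hex(event.get("text", "").encode("utf-8"))
    return redacted


def sha256sums(files: dict) -> str:
    """Render sha256sums.txt content ("<hex>  <name>" per line) for name -> bytes."""
    lines = [f"{sha256_hex(data)}  {name}" for name, data in sorted(files.items())]
    return "\n".join(lines) + "\n"
```

Usage: build each JSON file with canonical_json, pass the name-to-bytes mapping to sha256sums, and sign the resulting sha256sums.txt in a separate step.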


7) Rollout plan (safe)

  1. Triage-only + escalation queue (no drafts yet).
  2. Draft-only templates for Top 20 intents (no sending).
  3. Add simulated IF.GOV.PANEL seats and store panel decisions.
  4. Emit IF.TTT bundles for each event (public + private).
  5. Add comparison table: draft vs actual sent (manual) to measure quality.
  6. Only after measured success: consider limited auto-send for low-risk intents, with a kill switch.
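
For step 5, one possible shape for a comparison-table row is sketched below. Using a character-level similarity ratio as the quality signal is an assumption, not something this spec prescribes, and the rows contain raw text, so the table belongs with the private data.

```python
from difflib import SequenceMatcher


def comparison_row(trace_id: str, draft_text: str, sent_text: str) -> dict:
    """Compare the generated draft with the reply that was actually sent by hand."""
    similarity = SequenceMatcher(None, draft_text, sent_text).ratio()
    return {
        "trace_id": trace_id,
        "draft_text": draft_text,
        "sent_text": sent_text,
        "similarity": round(similarity, 3),  # 1.0 means the texts are identical
        "sent_unchanged": draft_text == sent_text,
    }
```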