Emo-Social Insta DM Agent

Project for the Emo-Social Instagram DM agent for @socialmediatorr (distinct from the existing Sergio RAG DB agent).

Includes:

  • Meta Graph API exporter (full DM history)
  • Instagram “Download your information” importer
  • DM analysis pipeline (bot-vs-human, conversions, objections, rescue logic, product eras)
  • IGDM Shadow Mode webhook server (draft-only)
  • Legacy helpers for device login + token derivation (history export)

Where this runs

Build/run the Docker image in pct 250 (ai-dev) and keep production (pct 220) clean.

Credentials

Source-of-truth credentials file (host):

  • /root/tmp/emo-social-meta-app-creds.txt

Inside pct 250, place the same file at the same path:

  • /root/tmp/emo-social-meta-app-creds.txt

Required for export:

  • META_PAGE_ID
  • META_PAGE_ACCESS_TOKEN
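A quick pre-flight check for the two required keys might look like the sketch below. It assumes the creds file holds plain `KEY=VALUE` lines (this README does not specify the format), and `parse_creds` / `missing_keys` are hypothetical helpers, not part of the repo. It reports key names only and never prints token values.

```python
REQUIRED_KEYS = ("META_PAGE_ID", "META_PAGE_ACCESS_TOKEN")


def parse_creds(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines (assumed format); skip blanks and # comments."""
    creds: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Tolerate optional surrounding quotes around the value.
        creds[key.strip()] = value.strip().strip('"').strip("'")
    return creds


def missing_keys(creds: dict[str, str]) -> list[str]:
    """Return the required key names that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not creds.get(key)]
```

For example, `missing_keys(parse_creds(open("/root/tmp/emo-social-meta-app-creds.txt").read()))` returns an empty list when both keys are present.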

Build (pct 250)

  • docker build -t emo-social-insta-dm-agent:dev /root/ai-workspace/emo-social-insta-dm-agent

Note: in this LXC, docker run / podman run can fail due to AppArmor confinement. Use the direct Python commands below unless you change the LXC config to be AppArmor-unconfined.

Obtain tokens (pct 250) — history export only

Note: these steps are for the history exporter. For real-time IG DMs into https://emo-social.infrafabric.io/igdm, use Instagram Business Login (see “Webhooks” below).

meta_device_login requires META_CLIENT_TOKEN (from Meta app dashboard → Settings → Advanced → Client token). Alternatively, set META_CLIENT_ACCESS_TOKEN as APP_ID|CLIENT_TOKEN to override META_APP_ID for device-login only.

If meta_device_login start returns OAuth error (#3), enable "Login from Devices" for the same app ID that the script prints:

  • https://developers.facebook.com/apps/<APP_ID>/fb-login/settings/ → set Login from Devices = Yes

Start device login (prints a URL + code to enter; no tokens are printed):

  • cd /root/ai-workspace/emo-social-insta-dm-agent && python3 -m sergio_instagram_messaging.meta_device_login start

After authorizing in the browser, poll and write META_PAGE_ID + META_PAGE_ACCESS_TOKEN into /root/tmp/emo-social-meta-app-creds.txt:

  • cd /root/ai-workspace/emo-social-insta-dm-agent && python3 -m sergio_instagram_messaging.meta_device_login poll --write-page-token --target-ig-user-id 17841466913731557

Fallback: Graph API Explorer → user token → page token

If device-login is blocked, get a Facebook User access token via Graph API Explorer and derive the Page token:

  • Open: https://developers.facebook.com/tools/explorer/
  • Select the correct app, then generate a user token with:
    • pages_show_list,pages_read_engagement,instagram_manage_messages
  • Save the user token to (keep mode 600):
    • /root/tmp/meta-user-access-token.txt
  • Then write META_PAGE_ID + META_PAGE_ACCESS_TOKEN into /root/tmp/emo-social-meta-app-creds.txt (no token values printed):
    • cd /root/ai-workspace/emo-social-insta-dm-agent && python3 -m sergio_instagram_messaging.meta_page_token_from_user_token --target-ig-user-id 17841466913731557

Export history (pct 250)

Quick token sanity check (does not print token values):

  • cd /root/ai-workspace/emo-social-insta-dm-agent && python3 -m sergio_instagram_messaging.meta_token_doctor

Export to a mounted directory so results persist on disk:

  • docker run --rm -v /root/tmp:/root/tmp emo-social-insta-dm-agent:dev python -m sergio_instagram_messaging.export_meta_ig_history --out /root/tmp/emo-social-insta-dm-history
    • (Direct) cd /root/ai-workspace/emo-social-insta-dm-agent && python3 -m sergio_instagram_messaging.export_meta_ig_history --out /root/tmp/emo-social-insta-dm-history

Workaround: fetch a single thread by user_id

If the /conversations listing times out, you can still fetch a single thread if you know the sender's user_id (from the webhook sender.id field):

  • cd /root/ai-workspace/emo-social-insta-dm-agent && python3 -m sergio_instagram_messaging.fetch_thread_messages --user-id <SENDER_ID> --max-pages 2 --out /root/tmp/thread.jsonl

Small test run:

  • docker run --rm -v /root/tmp:/root/tmp emo-social-insta-dm-agent:dev python -m sergio_instagram_messaging.export_meta_ig_history --out /root/tmp/emo-social-insta-dm-history --max-conversations 3 --max-pages 2
    • (Direct) cd /root/ai-workspace/emo-social-insta-dm-agent && python3 -m sergio_instagram_messaging.export_meta_ig_history --out /root/tmp/emo-social-insta-dm-history --max-conversations 3 --max-pages 2

Full history fallback (Instagram export)

If Meta app review blocks Graph API DM access, export Sergio's IG data via Instagram “Download your information” and import it:

  • cd /root/ai-workspace/emo-social-insta-dm-agent && python3 -m sergio_instagram_messaging.import_instagram_export --input /path/to/instagram-export.zip --out /root/tmp/emo-social-insta-dm-export-history

Analyze DM history (Behavioral Cloning / “Biographical Sales”)

This produces the “Sergio persona” artifacts needed for the DM agent:

  • Separates frequent [BOT] templates vs rare [MANUAL] replies (plus [HYBRID]).
  • Builds bot fatigue + script editorial timeline charts.
  • Extracts training pairs (user → Sergio manual reply) from converted threads.
  • Generates rescue playbook (human saves after silence/negative sentiment).
  • Generates objection handlers (price/time/trust/stop → best replies).
  • Builds a quarterly eras CSV (offers/pricing + vocabulary drift).
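The bot-vs-human split in the first bullet can be illustrated with a frequency heuristic: an outgoing message whose normalized text recurs many times is probably a template. This is only a sketch of the idea, not the analyzer's actual logic. The `normalize` and `label_outgoing` helpers and the `bot_min_count` threshold are hypothetical, and [HYBRID] detection is left out.

```python
import re
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, mask URLs, and collapse whitespace so near-identical sends match."""
    text = text.lower()
    text = re.sub(r"https?://\S+", "<url>", text)
    return re.sub(r"\s+", " ", text).strip()


def label_outgoing(messages: list[str], bot_min_count: int = 3) -> list[tuple[str, str]]:
    """Tag each outgoing message [BOT] if its normalized form repeats often."""
    counts = Counter(normalize(m) for m in messages)
    return [
        (m, "[BOT]" if counts[normalize(m)] >= bot_min_count else "[MANUAL]")
        for m in messages
    ]
```

A real pipeline would likely tune the threshold against the template counts produced in pass 1 rather than hard-coding it.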

Outputs are written with mode 600 and may contain sensitive DM content. Keep them out of git.
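If you write additional artifacts yourself, one way to keep the same mode 600 guarantee is sketched below. `write_private` is a hypothetical helper, not a function from this repo, and it assumes a POSIX system (it uses `os.fchmod`).

```python
import os


def write_private(path: str, data: str) -> None:
    """Write text so that only the owner can read or write the file (mode 600)."""
    # The mode passed to os.open only applies when the file is created.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        # Re-apply 600 in case the file already existed with looser permissions.
        os.fchmod(handle.fileno(), 0o600)
        handle.write(data)
```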

This repo includes sanitized example reports (no verbatim client DMs) under:

  • reports/socialmediatorr/

Raw analysis artifacts (e.g., training pairs, rescued threads, template caches) should remain in a private working directory such as /root/tmp/ and should not be committed.

Optional: index first (lets you filter recency without scanning every thread):

  • python3 -m sergio_instagram_messaging.index_instagram_export --input /path/to/your_instagram_activity --out /root/tmp/socialmediatorr-ig-index.jsonl

Then analyze:

  • python3 -m sergio_instagram_messaging.analyze_instagram_export --input /path/to/your_instagram_activity --out /root/tmp/socialmediatorr-agent-analysis --owner-name "Sergio de Vocht" --index /root/tmp/socialmediatorr-ig-index.jsonl --since-days 180

Analyze an imported history dir (messages/*.jsonl)

If you already ran import_instagram_export (or have a partial import output), point the analyzer at that directory:

  • python3 -m sergio_instagram_messaging.analyze_instagram_export --input /root/tmp/socialmediatorr-ig-export-history --out /root/tmp/socialmediatorr-agent-analysis --owner-name "Sergio de Vocht"

Two-stage workflow (verify templates before full run)

Pass 1 generates top_outgoing_templates.json + template_counts.jsonl:

  • python3 -m sergio_instagram_messaging.analyze_instagram_export --input /path/to/your_instagram_activity --out /root/tmp/socialmediatorr-agent-analysis --stage pass1

Pass 2 reuses the cache and writes the full deliverables:

  • python3 -m sergio_instagram_messaging.analyze_instagram_export --input /path/to/your_instagram_activity --out /root/tmp/socialmediatorr-agent-analysis --stage pass2 --templates-cache /root/tmp/socialmediatorr-agent-analysis/top_outgoing_templates.json

Human-readable report (English)

After analysis, generate a single Markdown report:

  • python3 -m sergio_instagram_messaging.generate_dm_report --analysis-dir /root/tmp/socialmediatorr-agent-analysis

Plain-English deep report (Mermaid diagrams)

Generate the deeper “no raw quotes” report directly from an Instagram export folder:

  • python3 -m sergio_instagram_messaging.generate_dm_report_detailed --export-input /path/to/export-root --out /root/tmp/dm_history_report_en_detailed.md

VoiceDNA (manual reply style)

voice_dna/voiceDNA_socialmediatorr_insta_dm.json is a safe-to-store style fingerprint generated from the last 6 months of manual (non-template) DM replies (no raw DM quotes are included).

It also encodes a hard rule for the bot:

  • Always reply in the user's input language (English / Spanish / French / Catalan), with a short clarification if the user's message is too short to detect.
    • If the message is too short to detect the language: reuse the last language seen in that thread; if still unknown, default to Spanish (no language menus).
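The fallback order in this rule can be sketched as a small function. The language codes, the `choose_reply_language` name, and the treatment of unsupported detected languages (fall through to the thread language, then Spanish) are assumptions for illustration. Language detection itself is out of scope here.

```python
from typing import Optional

SUPPORTED = {"en", "es", "fr", "ca"}  # English, Spanish, French, Catalan
DEFAULT_LANGUAGE = "es"  # Spanish when nothing else is known


def choose_reply_language(detected: Optional[str], last_in_thread: Optional[str]) -> str:
    """Pick the reply language: detected, else last seen in the thread, else Spanish."""
    if detected in SUPPORTED:
        return detected
    if last_in_thread in SUPPORTED:
        return last_in_thread
    return DEFAULT_LANGUAGE
```

Here `detected` is `None` when the message is too short to classify, and `last_in_thread` is `None` for a first message.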

Regenerate from a local Instagram export folder:

  • python3 -m sergio_instagram_messaging.generate_voice_dna --export-input /path/to/export-root --out voice_dna/voiceDNA_socialmediatorr_insta_dm.json --owner-name "Sergio de Vocht" --window-months 6

Ready Replies (Top 20)

Multi-language ready-made answers for the top 20 question topics (aligned to the VoiceDNA language-mirroring policy):

  • reply_library/top20_ready_answers.json (programmatic)
  • reply_library/top20_ready_answers.md (copy/paste)

Webhooks (new messages → auto-reply)

To receive real Instagram DMs (inbound + outbound echo) you need:

  1. Webhooks product configured in the Meta app (callback URL + verify token + instagram fields).
  2. An Instagram Business Login token with instagram_business_manage_messages.
  3. The IG account subscribed to the app (POST /me/subscribed_apps on graph.instagram.com).
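For step 1, Meta verifies the callback URL with a GET request carrying `hub.mode`, `hub.verify_token`, and `hub.challenge`, and signs each POST body with an `X-Hub-Signature-256` header (HMAC-SHA256 keyed with the app secret). A framework-free sketch of both checks follows. The function names are hypothetical and this is not necessarily how the repo's server implements them.

```python
import hashlib
import hmac
from typing import Mapping, Optional


def verify_subscription(params: Mapping[str, str], verify_token: str) -> Optional[str]:
    """Return the hub.challenge to echo back if the GET handshake is valid, else None."""
    supplied = params.get("hub.verify_token", "")
    token_ok = hmac.compare_digest(supplied.encode(), verify_token.encode())
    if params.get("hub.mode") == "subscribe" and token_ok:
        return params.get("hub.challenge", "")
    return None


def signature_is_valid(raw_body: bytes, header_value: str, app_secret: str) -> bool:
    """Check an X-Hub-Signature-256 header value against the raw request body."""
    digest = hmac.new(app_secret.encode(), raw_body, hashlib.sha256).hexdigest()
    expected = "sha256=" + digest
    # Constant-time comparison; compare bytes to avoid errors on non-ASCII input.
    return hmac.compare_digest(expected.encode(), (header_value or "").encode())
```

The signature must be computed over the raw request bytes, before any JSON parsing or re-serialization.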

The production webhook endpoint exists at:

  • https://emo-social.infrafabric.io/meta/webhook

Shadow mode (draft-only) — operational

This repo includes a draft-only webhook server that:

  • receives Meta webhook events (including is_echo outgoing messages)
  • writes a draft reply (Top 20 templates + language mirroring)
  • stores the draft and later links it to the actual outgoing reply for side-by-side comparison
  • never sends a message (unless you explicitly add a sending step later)
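To link a draft to the reply that was actually sent, the server has to tell incoming messages from is_echo ones and key both by the same user. The sketch below assumes the usual Meta messaging payload shape (`entry[].messaging[]` with `sender`, `recipient`, and `message`). `iter_message_events` is a hypothetical helper, and the real server's storage schema is not shown in this README.

```python
from typing import Any, Iterator


def iter_message_events(payload: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield one flat record per message event, tagged incoming or outgoing."""
    for entry in payload.get("entry", []):
        for event in entry.get("messaging", []):
            message = event.get("message")
            if not message:
                continue  # skip non-message events such as read receipts
            is_echo = bool(message.get("is_echo"))
            sender_id = event.get("sender", {}).get("id")
            recipient_id = event.get("recipient", {}).get("id")
            yield {
                "direction": "outgoing" if is_echo else "incoming",
                # For an echo the account is the sender, so the user is the recipient.
                "user_id": recipient_id if is_echo else sender_id,
                "mid": message.get("mid"),
                "text": message.get("text", ""),
            }
```

With records keyed by `user_id`, a draft written for an incoming message can be paired with the next outgoing record for the same user.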

Language policy (pragmatic):

  • Default to Spanish for unclear first messages; switch automatically once the user writes clearly in French/English/Catalan.

Run locally (requires META_VERIFY_TOKEN and META_APP_SECRET in the environment; /meta/ig/connect also needs META_APP_ID and IGDM_IG_REDIRECT_URI):

  • python3 -m sergio_instagram_messaging.igdm_shadow_server --host 127.0.0.1 --port 5051 --db /root/tmp/igdm/igdm.sqlite --reply-library reply_library/top20_ready_answers.json

Production (pct 220):

  • systemd: igdm-shadow.service (listens on 127.0.0.1:5051)
  • nginx routes:
    • /meta/webhook → igdm-shadow (no auth, required by Meta)
    • /meta/ig/connect → igdm-shadow (OAuth gated; starts Instagram Business Login)
    • /meta/ig/callback → igdm-shadow (public; used by Instagram OAuth redirect)
    • /igdm and /api/igdm/* → igdm-shadow (OAuth gated via oauth2-proxy)

Connect Instagram Business Login (required for message delivery)

If no real DMs are showing up in https://emo-social.infrafabric.io/igdm, the most common missing step is that Instagram Business Login was never completed (so Meta never starts sending webhook events).

  1. In Meta app dashboard → Instagram Business Login → Settings, add redirect URL:
    • https://emo-social.infrafabric.io/meta/ig/callback
  2. Visit the dashboard:
    • https://emo-social.infrafabric.io/igdm
  3. Click Connect / Reconnect Instagram (Business Login) and complete the Instagram consent screen.
  4. Send a DM to @socialmediatorr (e.g. “book”, “livre”) and confirm it appears in the table.

The server stores the IG long-lived access token (mode 600) at:

  • pct 220: /opt/if-emotion/data/igdm/ig_token.json

Older Meta flows use Facebook device login + Page-scoped subscriptions. They frequently fail with “invalid scopes” and are not required if Instagram Business Login is working.

If you still need them for debugging, see the scripts:

  • python3 -m sergio_instagram_messaging.meta_device_login
  • python3 -m sergio_instagram_messaging.meta_page_token_from_user_token
  • python3 -m sergio_instagram_messaging.meta_subscribe_page