# Emo-Social Insta DM Agent
Project for the **Emo-Social Instagram DM agent** for `@socialmediatorr` (distinct from the existing Sergio RAG DB agent).
Includes:
- Meta Graph API exporter (full DM history)
- Instagram “Download your information” importer
- DM analysis pipeline (bot-vs-human, conversions, objections, rescue logic, product eras)
- IGDM Shadow Mode webhook server (draft-only)
- Legacy helpers for device login + token derivation (history export)
## Where this runs
Build/run the Docker image in `pct 250` (ai-dev) and keep production (`pct 220`) clean.
## Credentials
Source-of-truth credentials file (on the host):
- `/root/tmp/emo-social-meta-app-creds.txt`
Inside `pct 250`, place the same file at the same path:
- `/root/tmp/emo-social-meta-app-creds.txt`
Required for export:
- `META_PAGE_ID`
- `META_PAGE_ACCESS_TOKEN`
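A quick pre-flight check can catch a missing key before an export. The sketch below assumes the creds file holds simple `KEY=VALUE` lines (optionally prefixed with `export`). The actual format and the project's own loader may differ, and the helper names are hypothetical. It reports only key names, never values.

```python
from pathlib import Path

# Keys the history exporter needs (from the list above).
REQUIRED_FOR_EXPORT = ("META_PAGE_ID", "META_PAGE_ACCESS_TOKEN")


def parse_creds(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines (assumed format); skips blanks and # comments."""
    creds: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("export "):
            line = line[len("export "):].strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        creds[key.strip()] = value.strip().strip("\"'")
    return creds


def load_creds(path: str = "/root/tmp/emo-social-meta-app-creds.txt") -> dict[str, str]:
    return parse_creds(Path(path).read_text(encoding="utf-8"))


def missing_keys(creds: dict[str, str], required=REQUIRED_FOR_EXPORT) -> list[str]:
    """Names of required keys that are absent or empty (values are never returned)."""
    return [key for key in required if not creds.get(key)]
```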
## Build (pct 250)
- `docker build -t emo-social-insta-dm-agent:dev /root/ai-workspace/emo-social-insta-dm-agent`
Note: in this LXC, `docker run` / `podman run` can fail due to AppArmor confinement. Use the **direct Python** commands below unless you change the LXC config to be AppArmor-unconfined.
## Obtain tokens (pct 250) — history export only
Note: these steps are for the **history exporter**. For **real-time IG DMs** into `https://emo-social.infrafabric.io/igdm`, use **Instagram Business Login** (see “Webhooks” below).
`meta_device_login` requires `META_CLIENT_TOKEN` (from the Meta app dashboard → Settings → Advanced → Client token).
Alternatively, set `META_CLIENT_ACCESS_TOKEN` to `APP_ID|CLIENT_TOKEN`; for device login only, this overrides `META_APP_ID`.
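The token precedence described above can be sketched as follows. This illustrates the documented rule only; it is not the script's actual code, and both function names are hypothetical.

```python
import os
from typing import Mapping


def client_access_token(env: Mapping[str, str] = os.environ) -> str:
    """Return the `APP_ID|CLIENT_TOKEN` string used for device login."""
    explicit = env.get("META_CLIENT_ACCESS_TOKEN", "").strip()
    if explicit:
        app_id, sep, client_token = explicit.partition("|")
        if not (sep and app_id and client_token):
            raise ValueError("META_CLIENT_ACCESS_TOKEN must look like APP_ID|CLIENT_TOKEN")
        return explicit  # takes precedence over META_APP_ID for device login
    app_id = env.get("META_APP_ID", "").strip()
    client_token = env.get("META_CLIENT_TOKEN", "").strip()
    if not (app_id and client_token):
        raise ValueError("set META_APP_ID and META_CLIENT_TOKEN, or META_CLIENT_ACCESS_TOKEN")
    return f"{app_id}|{client_token}"


def device_login_app_id(env: Mapping[str, str] = os.environ) -> str:
    """The app ID that device login will actually use."""
    return client_access_token(env).partition("|")[0]
```

Under this rule, when `META_CLIENT_ACCESS_TOKEN` is set, its `APP_ID` part (not `META_APP_ID`) is presumably the app whose device-login settings matter.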
If `meta_device_login start` returns OAuth error `(#3) enable "Login from Devices"`, enable it in the *same app id* the script prints:
- `https://developers.facebook.com/apps/<APP_ID>/fb-login/settings/` → set **Login from Devices** = Yes
Start device login (prints a URL + code to enter; no tokens are printed):
- `cd /root/ai-workspace/emo-social-insta-dm-agent && python3 -m sergio_instagram_messaging.meta_device_login start`
After authorizing in the browser, poll and write `META_PAGE_ID` + `META_PAGE_ACCESS_TOKEN` into `/root/tmp/emo-social-meta-app-creds.txt`:
- `cd /root/ai-workspace/emo-social-insta-dm-agent && python3 -m sergio_instagram_messaging.meta_device_login poll --write-page-token --target-ig-user-id 17841466913731557`
### Fallback: Graph API Explorer → user token → page token
If device-login is blocked, get a **Facebook User access token** via Graph API Explorer and derive the Page token:
- Open: `https://developers.facebook.com/tools/explorer/`
- Select the correct app, then generate a user token with these permissions:
  - `pages_show_list,pages_read_engagement,instagram_manage_messages`
- Save the user token to (keep mode `600`):
  - `/root/tmp/meta-user-access-token.txt`
- Then write `META_PAGE_ID` + `META_PAGE_ACCESS_TOKEN` into `/root/tmp/emo-social-meta-app-creds.txt` (no token values are printed):
  - `cd /root/ai-workspace/emo-social-insta-dm-agent && python3 -m sergio_instagram_messaging.meta_page_token_from_user_token --target-ig-user-id 17841466913731557`
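With a user token, the Graph API edge `GET /me/accounts?fields=id,name,access_token,instagram_business_account` lists the Pages the user manages along with their Page tokens. The derivation step presumably amounts to picking the Page whose linked Instagram account matches `--target-ig-user-id`. The sketch below covers only that selection step on an already-parsed response. It is an assumption about how the script works, and `pick_page_for_ig_user` is a hypothetical name.

```python
from typing import Any


def pick_page_for_ig_user(accounts_response: dict[str, Any], target_ig_user_id: str) -> tuple[str, str]:
    """Return (page_id, page_access_token) for the Page linked to the IG account."""
    for page in accounts_response.get("data", []):
        linked = page.get("instagram_business_account") or {}
        if str(linked.get("id")) == str(target_ig_user_id):
            token = page.get("access_token")
            if not token:
                raise LookupError(f"Page {page.get('id')} returned no access_token")
            return str(page["id"]), token
    raise LookupError("no Page in /me/accounts is linked to that Instagram account")
```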
## Export history (pct 250)
Quick token sanity check (does not print token values):
- `cd /root/ai-workspace/emo-social-insta-dm-agent && python3 -m sergio_instagram_messaging.meta_token_doctor`
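The doctor's internals are not documented here. As an illustration of reporting on a token without printing it, a check could log the token's length plus a short hash fingerprint, which is enough to tell two tokens apart. `describe_secret` is a hypothetical helper, not part of the project.

```python
import hashlib


def describe_secret(value: str) -> str:
    """Summarize a secret for logs without printing it."""
    if not value:
        return "missing"
    # A short hash prefix distinguishes tokens without revealing either one.
    fingerprint = hashlib.sha256(value.encode("utf-8")).hexdigest()[:8]
    return f"present (len={len(value)}, sha256:{fingerprint})"
```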
Export to a mounted directory so results persist on disk:
- `docker run --rm -v /root/tmp:/root/tmp emo-social-insta-dm-agent:dev python -m sergio_instagram_messaging.export_meta_ig_history --out /root/tmp/emo-social-insta-dm-history`
- (Direct) `cd /root/ai-workspace/emo-social-insta-dm-agent && python3 -m sergio_instagram_messaging.export_meta_ig_history --out /root/tmp/emo-social-insta-dm-history`
### Workaround: fetch a single thread by `user_id`
If the `/conversations` listing times out, you can still fetch a single thread if you know the sender's `user_id` (from the webhook's `sender.id`):
- `cd /root/ai-workspace/emo-social-insta-dm-agent && python3 -m sergio_instagram_messaging.fetch_thread_messages --user-id <SENDER_ID> --max-pages 2 --out /root/tmp/thread.jsonl`
Small test run of the full history export (caps conversations and pages):
- `docker run --rm -v /root/tmp:/root/tmp emo-social-insta-dm-agent:dev python -m sergio_instagram_messaging.export_meta_ig_history --out /root/tmp/emo-social-insta-dm-history --max-conversations 3 --max-pages 2`
- (Direct) `cd /root/ai-workspace/emo-social-insta-dm-agent && python3 -m sergio_instagram_messaging.export_meta_ig_history --out /root/tmp/emo-social-insta-dm-history --max-conversations 3 --max-pages 2`
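The `--max-pages` and `--max-conversations` flags suggest the exporter walks Graph API cursor pagination (following `paging.next`) with caps. The sketch below shows that loop with the HTTP call injected so it can be tested offline. `iter_pages` and `fetch_json` are hypothetical names, and the real exporter's retries, timeouts, and field selection are not shown.

```python
import json
import urllib.request
from typing import Any, Callable, Iterator, Optional

FetchJSON = Callable[[str], dict[str, Any]]


def fetch_json(url: str, timeout: float = 30.0) -> dict[str, Any]:
    """Plain GET returning parsed JSON (no retries)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def iter_pages(first_url: str, fetch: FetchJSON = fetch_json,
               max_pages: Optional[int] = None) -> Iterator[dict[str, Any]]:
    """Yield items from a Graph API edge, following `paging.next` cursors."""
    url: Optional[str] = first_url
    pages = 0
    while url and (max_pages is None or pages < max_pages):
        body = fetch(url)
        pages += 1
        yield from body.get("data", [])
        url = body.get("paging", {}).get("next")  # absent on the last page
```

Wrapping the generator in `itertools.islice(..., 3)` would additionally cap the number of items, loosely mirroring `--max-conversations 3`.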
## Full history fallback (Instagram export)
If Meta app review blocks Graph API DM access, export Sergio's IG data via Instagram “Download your information” and import it:
- `cd /root/ai-workspace/emo-social-insta-dm-agent && python3 -m sergio_instagram_messaging.import_instagram_export --input /path/to/instagram-export.zip --out /root/tmp/emo-social-insta-dm-export-history`
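Instagram's JSON exports have been widely reported to store non-ASCII text as UTF-8 bytes escaped one per `\u00XX`, so accents and emoji come out garbled (for example `â\x80\x99` instead of `’`). This README does not say whether `import_instagram_export` already handles this. A common repair, sketched here as an assumption, is to re-encode each string as Latin-1 and decode it as UTF-8, leaving strings that don't round-trip unchanged. Rare genuine Latin-1 text could be mis-repaired, so spot-check the output.

```python
import json
from typing import Any


def fix_mojibake(value: Any) -> Any:
    """Recursively repair UTF-8-as-Latin-1 strings in parsed export JSON."""
    if isinstance(value, str):
        try:
            return value.encode("latin-1").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            return value  # already proper Unicode, or not this kind of damage
    if isinstance(value, list):
        return [fix_mojibake(item) for item in value]
    if isinstance(value, dict):
        return {key: fix_mojibake(item) for key, item in value.items()}
    return value


def load_export_json(path: str) -> Any:
    with open(path, encoding="utf-8") as fh:
        return fix_mojibake(json.load(fh))
```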
## Analyze DM history (Behavioral Cloning / “Biographical Sales”)
This produces the “Sergio persona” artifacts needed for the DM agent:
- Separates frequent `[BOT]` templates vs rare `[MANUAL]` replies (plus `[HYBRID]`).
- Builds **bot fatigue** + **script editorial timeline** charts.
- Extracts **training pairs** (user → Sergio manual reply) from converted threads.
- Generates **rescue playbook** (human saves after silence/negative sentiment).
- Generates **objection handlers** (price/time/trust/stop → best replies).
- Builds a quarterly **eras** CSV (offers/pricing + vocabulary drift).
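The analyzer's exact heuristics are not documented in this README. One simple way to approximate the `[BOT]` / `[HYBRID]` / `[MANUAL]` split is by frequency: normalize each outgoing message, treat texts repeated at least `bot_min_count` times as templates, and flag replies that start with a template but add more text as hybrid. The threshold and normalization rules below are assumptions, not the project's settings.

```python
import re
from collections import Counter


def normalize(text: str) -> str:
    """Collapse details that vary between sends of the same template."""
    text = text.lower()
    text = re.sub(r"https?://\S+", "<url>", text)  # links often carry tracking params
    text = re.sub(r"\d+", "<n>", text)  # prices, dates, counts
    return re.sub(r"\s+", " ", text).strip()


def label_outgoing(messages: list[str], bot_min_count: int = 5) -> list[str]:
    """Label each outgoing message as [BOT], [HYBRID] or [MANUAL]."""
    normalized = [normalize(m) for m in messages]
    counts = Counter(normalized)
    templates = {t for t, c in counts.items() if t and c >= bot_min_count}
    labels = []
    for text in normalized:
        if text in templates:
            labels.append("[BOT]")
        elif any(text.startswith(t) for t in templates):
            labels.append("[HYBRID]")  # template text plus a manual addition
        else:
            labels.append("[MANUAL]")
    return labels
```

A prefix match is crude (a very short template would match many replies), so a real pipeline would likely add a minimum template length or a similarity score.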
Outputs are written with mode `600` and may contain sensitive DM content. Keep them out of git.
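For any extra scripts that write similar artifacts, a minimal way to get mode `600` on Linux is to create the file with that mode and then `fchmod` it, because the `os.open` mode only applies when the file is newly created. The helper names are hypothetical.

```python
import os
import stat


def write_private(path: str, text: str) -> None:
    """Write text so only the owner can read or write it (mode 600)."""
    # The mode argument only takes effect when the file is created...
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        # ...so also tighten permissions on a pre-existing file.
        os.fchmod(fd, 0o600)
    except BaseException:
        os.close(fd)
        raise
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(text)


def is_private(path: str) -> bool:
    """True if group and others have no permission bits on the file."""
    return (stat.S_IMODE(os.stat(path).st_mode) & 0o077) == 0
```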
This repo includes **sanitized** example reports (no verbatim client DMs) under:
- `reports/socialmediatorr/`
Raw analysis artifacts (e.g., training pairs, rescued threads, template caches) should remain in a private working directory such as `/root/tmp/` and should not be committed.
### Analyze a raw Instagram export folder (recommended)
Optional: index first (lets you filter by recency without scanning every thread):
- `python3 -m sergio_instagram_messaging.index_instagram_export --input /path/to/your_instagram_activity --out /root/tmp/socialmediatorr-ig-index.jsonl`
Then analyze:
- `python3 -m sergio_instagram_messaging.analyze_instagram_export --input /path/to/your_instagram_activity --out /root/tmp/socialmediatorr-agent-analysis --owner-name "Sergio de Vocht" --index /root/tmp/socialmediatorr-ig-index.jsonl --since-days 180`
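The index helps because a `--since-days` cutoff can be applied to one small JSONL line per thread instead of to every message file. The sketch below assumes each index record has `thread_id` and `last_timestamp_ms` fields. Those names are hypothetical, so check the real index output before reusing this.

```python
import json
import time
from typing import Iterable, Optional

MS_PER_DAY = 86_400_000


def recent_thread_ids(index_lines: Iterable[str], since_days: int,
                      now_ms: Optional[int] = None) -> list[str]:
    """Thread IDs whose last message falls within the last `since_days` days."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    cutoff = now_ms - since_days * MS_PER_DAY
    keep: list[str] = []
    for line in index_lines:
        if not line.strip():
            continue
        record = json.loads(line)  # assumed fields: thread_id, last_timestamp_ms
        if int(record.get("last_timestamp_ms", 0)) >= cutoff:
            keep.append(record["thread_id"])
    return keep
```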
### Analyze an imported history dir (messages/*.jsonl)
If you already ran `import_instagram_export` (or have a partial import output), point the analyzer at that directory:
- `python3 -m sergio_instagram_messaging.analyze_instagram_export --input /root/tmp/socialmediatorr-ig-export-history --out /root/tmp/socialmediatorr-agent-analysis --owner-name "Sergio de Vocht"`
### Two-stage workflow (verify templates before full run)
Pass 1 generates `top_outgoing_templates.json` + `template_counts.jsonl`:
- `python3 -m sergio_instagram_messaging.analyze_instagram_export --input /path/to/your_instagram_activity --out /root/tmp/socialmediatorr-agent-analysis --stage pass1`
Pass 2 reuses the cache and writes the full deliverables:
- `python3 -m sergio_instagram_messaging.analyze_instagram_export --input /path/to/your_instagram_activity --out /root/tmp/socialmediatorr-agent-analysis --stage pass2 --templates-cache /root/tmp/socialmediatorr-agent-analysis/top_outgoing_templates.json`
### Human-readable report (English)
After analysis, generate a single Markdown report:
- `python3 -m sergio_instagram_messaging.generate_dm_report --analysis-dir /root/tmp/socialmediatorr-agent-analysis`
### Plain-English deep report (Mermaid diagrams)
Generate the deeper “no raw quotes” report directly from an Instagram export folder:
- `python3 -m sergio_instagram_messaging.generate_dm_report_detailed --export-input /path/to/export-root --out /root/tmp/dm_history_report_en_detailed.md`
## VoiceDNA (manual reply style)
`voice_dna/voiceDNA_socialmediatorr_insta_dm.json` is a **safe-to-store** style fingerprint generated from the last 6 months of **manual (non-template) DM replies** (no raw DM quotes are included).
It also encodes a hard rule for the bot:
- Always reply in the **user's input language** (English / Spanish / French / Catalan), with a short clarification if the user's message is too short to detect its language.
- If a message is too short to detect its language, reuse the last language seen in that thread; if that is still unknown, **default to Spanish** (no language menus).
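The fallback order above can be sketched as a small function. Language detection itself is out of scope: `detected` stands in for whatever detector the bot uses (returning `None` when the text is too short), and the two-letter codes are an assumption.

```python
from typing import Optional

SUPPORTED = {"en", "es", "fr", "ca"}  # English, Spanish, French, Catalan
DEFAULT_LANG = "es"


def reply_language(detected: Optional[str], thread_history: list[str]) -> str:
    """Pick the reply language: detected > last seen in thread > Spanish."""
    if detected in SUPPORTED:
        return detected
    # Walk the thread backwards to find the most recent known language.
    for lang in reversed(thread_history):
        if lang in SUPPORTED:
            return lang
    return DEFAULT_LANG
```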
Regenerate from a local Instagram export folder:
- `python3 -m sergio_instagram_messaging.generate_voice_dna --export-input /path/to/export-root --out voice_dna/voiceDNA_socialmediatorr_insta_dm.json --owner-name "Sergio de Vocht" --window-months 6`
## Ready Replies (Top 20)
Multi-language ready-made answers for the top 20 question topics (aligned with the VoiceDNA language-mirroring policy):
- `reply_library/top20_ready_answers.json` (programmatic)
- `reply_library/top20_ready_answers.md` (copy/paste)
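The JSON schema of `top20_ready_answers.json` is not described here. Assuming a simple `{topic: {language: text}}` mapping (hypothetical), a lookup that follows the same Spanish fallback as VoiceDNA might look like this:

```python
import json
from typing import Optional

DEFAULT_LANG = "es"


def load_library(path: str = "reply_library/top20_ready_answers.json") -> dict[str, dict[str, str]]:
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)  # assumed shape: {topic: {lang: text}}


def pick_ready_answer(library: dict[str, dict[str, str]], topic: str, lang: str) -> Optional[str]:
    """Ready answer in the user's language, else Spanish, else None."""
    replies = library.get(topic)
    if not replies:
        return None  # no ready answer for this topic; leave it to a human
    return replies.get(lang) or replies.get(DEFAULT_LANG)
```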
## Webhooks (new messages → auto-reply)
To receive real Instagram DMs (inbound + outbound echo) you need:
1) Webhooks product configured in the Meta app (callback URL + verify token + instagram fields).
2) An **Instagram Business Login** token with `instagram_business_manage_messages`.
3) The IG account subscribed to the app (`POST /me/subscribed_apps` on `graph.instagram.com`).
The production webhook endpoint exists at:
- `https://emo-social.infrafabric.io/meta/webhook`
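For item 1, Meta verifies the callback URL with a `GET` carrying `hub.mode`, `hub.verify_token`, and `hub.challenge`. It also signs each event `POST` with an `X-Hub-Signature-256: sha256=<hex>` header, an HMAC-SHA256 of the raw request body keyed with the app secret. The sketch below implements both checks, assuming values from `META_VERIFY_TOKEN` and `META_APP_SECRET` (the env vars the shadow server requires). The function names are hypothetical and not taken from `igdm_shadow_server`.

```python
import hashlib
import hmac
from typing import Mapping, Optional


def verify_subscription(query: Mapping[str, str], verify_token: str) -> Optional[str]:
    """GET handshake: return hub.challenge to echo with HTTP 200, or None for 403."""
    supplied = query.get("hub.verify_token", "")
    if query.get("hub.mode") == "subscribe" and hmac.compare_digest(
        supplied.encode("utf-8"), verify_token.encode("utf-8")
    ):
        return query.get("hub.challenge", "")
    return None


def valid_signature(raw_body: bytes, signature_header: str, app_secret: str) -> bool:
    """Check X-Hub-Signature-256 against the exact bytes received (before JSON parsing)."""
    prefix = "sha256="
    if not signature_header.startswith(prefix):
        return False
    expected = hmac.new(app_secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    received = signature_header[len(prefix):]
    return hmac.compare_digest(received.encode("utf-8"), expected.encode("utf-8"))
```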
## Shadow mode (draft-only) — operational
This repo includes a **draft-only** webhook server that:
- receives Meta webhook events (including `is_echo` outgoing messages)
- writes a draft reply (Top 20 templates + language mirroring)
- stores the draft and later links it to the **actual** outgoing reply for side-by-side comparison
- never sends a message (unless you explicitly add a sending step later)
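This README does not document the server's storage schema or matching rule. The sketch below assumes Instagram messaging webhook payloads of the form `entry[].messaging[]` with `sender.id`, `recipient.id`, and `message.is_echo`. It stores drafts in a hypothetical SQLite table `drafts` and links each echo to the newest unmatched draft for the same user. Nothing here sends a message.

```python
import sqlite3
from typing import Any, Iterator

SCHEMA = """
CREATE TABLE IF NOT EXISTS drafts (
    id INTEGER PRIMARY KEY,
    user_id TEXT NOT NULL,
    inbound_mid TEXT,
    draft_text TEXT NOT NULL,
    actual_text TEXT
)
"""


def iter_messages(payload: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Flatten a webhook payload into inbound and echo message events."""
    for entry in payload.get("entry", []):
        for event in entry.get("messaging", []):
            message = event.get("message")
            if not message:
                continue  # skip reads, reactions, etc.
            is_echo = bool(message.get("is_echo"))
            # On an echo the business account is the sender, so the
            # conversation partner is the recipient.
            user_id = event["recipient"]["id"] if is_echo else event["sender"]["id"]
            yield {"user_id": user_id, "is_echo": is_echo,
                   "mid": message.get("mid"), "text": message.get("text", "")}


def store_draft(db: sqlite3.Connection, user_id: str, inbound_mid: str, draft_text: str) -> None:
    db.execute(
        "INSERT INTO drafts (user_id, inbound_mid, draft_text) VALUES (?, ?, ?)",
        (user_id, inbound_mid, draft_text),
    )


def link_echo(db: sqlite3.Connection, user_id: str, actual_text: str) -> bool:
    """Attach the real outgoing reply to the newest unmatched draft for this user."""
    row = db.execute(
        "SELECT id FROM drafts WHERE user_id = ? AND actual_text IS NULL "
        "ORDER BY id DESC LIMIT 1",
        (user_id,),
    ).fetchone()
    if row is None:
        return False
    db.execute("UPDATE drafts SET actual_text = ? WHERE id = ?", (actual_text, row[0]))
    return True
```

With a file-backed database, call `db.commit()` after each webhook is processed.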
Language policy (pragmatic):
- Default to Spanish for unclear first messages; switch automatically once the user writes clearly in French/English/Catalan.
Run locally (requires `META_VERIFY_TOKEN` + `META_APP_SECRET` in env; and for `/meta/ig/connect` also `META_APP_ID` + `IGDM_IG_REDIRECT_URI`):
- `python3 -m sergio_instagram_messaging.igdm_shadow_server --host 127.0.0.1 --port 5051 --db /root/tmp/igdm/igdm.sqlite --reply-library reply_library/top20_ready_answers.json`
Production (`pct 220`):
- systemd: `igdm-shadow.service` (listens on `127.0.0.1:5051`)
- nginx routes:
  - `/meta/webhook` → `igdm-shadow` (no auth, required by Meta)
  - `/meta/ig/connect` → `igdm-shadow` (OAuth gated; starts Instagram Business Login)
  - `/meta/ig/callback` → `igdm-shadow` (public; used by the Instagram OAuth redirect)
  - `/igdm` and `/api/igdm/*` → `igdm-shadow` (OAuth gated via `oauth2-proxy`)
### Connect Instagram Business Login (required for message delivery)
If **no real DMs** are showing up in `https://emo-social.infrafabric.io/igdm`, the most common missing step is that Instagram Business Login was never completed (so Meta never starts sending webhook events).
1) In the Meta app dashboard → Instagram Business Login → Settings, add this redirect URL:
   - `https://emo-social.infrafabric.io/meta/ig/callback`
2) Visit the dashboard:
   - `https://emo-social.infrafabric.io/igdm`
3) Click **Connect / Reconnect Instagram (Business Login)** and complete the Instagram consent screen.
4) Send a DM to `@socialmediatorr` (e.g. “book”, “livre”) and confirm it appears in the table.
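Step 3 sends the browser to Instagram's OAuth consent screen. According to Meta's Instagram API with Instagram Login documentation (verify against the current version), the authorize URL carries `client_id`, `redirect_uri`, `response_type=code`, and a comma-separated `scope`. The sketch below builds that URL, assuming `app_id` and `redirect_uri` come from `META_APP_ID` and `IGDM_IG_REDIRECT_URI` (the env vars this README lists for `/meta/ig/connect`). Depending on the app setup, the Instagram app ID may differ from the Facebook app ID. `build_connect_url` is hypothetical. The `redirect_uri` must exactly match the URL registered in step 1.

```python
import secrets
from typing import Optional
from urllib.parse import urlencode

AUTHORIZE_URL = "https://www.instagram.com/oauth/authorize"
SCOPES = ("instagram_business_basic", "instagram_business_manage_messages")


def build_connect_url(app_id: str, redirect_uri: str,
                      state: Optional[str] = None) -> tuple[str, str]:
    """Return (authorize_url, state); keep `state` to check it on the callback."""
    state = state or secrets.token_urlsafe(16)  # CSRF protection
    query = urlencode({
        "client_id": app_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",
        "scope": ",".join(SCOPES),
        "state": state,
    })
    return f"{AUTHORIZE_URL}?{query}", state
```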
The server stores the IG long-lived access token (mode `600`) at:
- `pct 220`: `/opt/if-emotion/data/igdm/ig_token.json`
### Legacy (not recommended): device login + Page subscription
Older Meta flows use Facebook device login + Page-scoped subscriptions. They frequently fail with “invalid scopes” and are not required if Instagram Business Login is working.
If you still need them for debugging, see the scripts:
- `python3 -m sergio_instagram_messaging.meta_device_login`
- `python3 -m sergio_instagram_messaging.meta_page_token_from_user_token`
- `python3 -m sergio_instagram_messaging.meta_subscribe_page`