# Emo-Social Insta DM Agent

Project for the **Emo-Social Instagram DM agent** for `@socialmediatorr` (distinct from the existing Sergio RAG DB agent).

Includes:
- Meta Graph API exporter (full DM history)
- Instagram “Download your information” importer
- DM analysis pipeline (bot-vs-human, conversions, objections, rescue logic, product eras)
- IGDM Shadow Mode webhook server (draft-only)
- Legacy helpers for device login + token derivation (history export)

## Where this runs

Build/run the Docker image in `pct 250` (ai-dev) and keep production (`pct 220`) clean.

## Credentials

Source of truth creds file (host):

- `/root/tmp/emo-social-meta-app-creds.txt`

Inside `pct 250`, place the same file at the same path:

- `/root/tmp/emo-social-meta-app-creds.txt`

Required for export:

- `META_PAGE_ID`
- `META_PAGE_ACCESS_TOKEN`

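
A pre-flight check for the two required keys can be sketched as follows. This is a minimal sketch, assuming the creds file holds one `KEY=VALUE` pair per line (the file format is not documented here, so that is an assumption); `parse_creds` and `missing_keys` are hypothetical helper names and not part of the package. It reports key names only and never prints token values.

```python
from pathlib import Path

REQUIRED_KEYS = ("META_PAGE_ID", "META_PAGE_ACCESS_TOKEN")


def parse_creds(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines; skip blanks and '#' comments (assumed format)."""
    creds = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        creds[key.strip()] = value.strip().strip('"').strip("'")
    return creds


def missing_keys(creds: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty (names only)."""
    return [key for key in REQUIRED_KEYS if not creds.get(key)]


if __name__ == "__main__":
    path = Path("/root/tmp/emo-social-meta-app-creds.txt")
    if path.exists():
        missing = missing_keys(parse_creds(path.read_text()))
    else:
        missing = list(REQUIRED_KEYS)
    print("missing:", ", ".join(missing) if missing else "none")
```
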
## Build (pct 250)

- `docker build -t emo-social-insta-dm-agent:dev /root/ai-workspace/emo-social-insta-dm-agent`

Note: in this LXC, `docker run` / `podman run` can fail due to AppArmor confinement. Use the **direct Python** commands below unless you change the LXC config to be AppArmor-unconfined.

## Obtain tokens (pct 250) — history export only

Note: these steps are for the **history exporter**. For **real-time IG DMs** into `https://emo-social.infrafabric.io/igdm`, use **Instagram Business Login** (see “Webhooks” below).

`meta_device_login` requires `META_CLIENT_TOKEN` (from Meta app dashboard → Settings → Advanced → Client token).
Alternatively, set `META_CLIENT_ACCESS_TOKEN` as `APP_ID|CLIENT_TOKEN` to override `META_APP_ID` for device-login only.

If `meta_device_login start` returns OAuth error `(#3) enable "Login from Devices"`, enable it in the *same app id* the script prints:

- `https://developers.facebook.com/apps/<APP_ID>/fb-login/settings/` → set **Login from Devices** = Yes

Start device login (prints a URL + code to enter; no tokens are printed):

- `cd /root/ai-workspace/emo-social-insta-dm-agent && python3 -m sergio_instagram_messaging.meta_device_login start`

After authorizing in the browser, poll and write `META_PAGE_ID` + `META_PAGE_ACCESS_TOKEN` into `/root/tmp/emo-social-meta-app-creds.txt`:

- `cd /root/ai-workspace/emo-social-insta-dm-agent && python3 -m sergio_instagram_messaging.meta_device_login poll --write-page-token --target-ig-user-id 17841466913731557`

### Fallback: Graph API Explorer → user token → page token

If device-login is blocked, get a **Facebook User access token** via Graph API Explorer and derive the Page token:

- Open: `https://developers.facebook.com/tools/explorer/`
- Select the correct app, then generate a user token with:
  - `pages_show_list,pages_read_engagement,instagram_manage_messages`
- Save the user token to (keep mode `600`):
  - `/root/tmp/meta-user-access-token.txt`
- Then write `META_PAGE_ID` + `META_PAGE_ACCESS_TOKEN` into `/root/tmp/emo-social-meta-app-creds.txt` (no token values printed):
  - `cd /root/ai-workspace/emo-social-insta-dm-agent && python3 -m sergio_instagram_messaging.meta_page_token_from_user_token --target-ig-user-id 17841466913731557`

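
For orientation, the user-token to page-token step can be sketched like this. With a user token, the Graph API `GET /me/accounts` call lists the Pages the user manages, each with its own `access_token`; requesting the `instagram_business_account` field shows which Page is linked to the target IG user. This is a minimal sketch over an already-decoded response, not the actual logic of `meta_page_token_from_user_token`; `pick_page` is a hypothetical helper and the sample values are placeholders.

```python
from typing import Optional


def pick_page(accounts_response: dict, target_ig_user_id: str) -> Optional[dict]:
    """Return the Page id and Page token for the Page linked to the IG user."""
    for page in accounts_response.get("data", []):
        ig = page.get("instagram_business_account") or {}
        if str(ig.get("id")) == str(target_ig_user_id):
            return {"page_id": page["id"], "page_access_token": page["access_token"]}
    return None


# Shape of a decoded response to
# GET /me/accounts?fields=id,access_token,instagram_business_account
# (placeholder values only).
sample = {
    "data": [
        {"id": "111", "access_token": "PAGE_TOKEN_A"},
        {
            "id": "222",
            "access_token": "PAGE_TOKEN_B",
            "instagram_business_account": {"id": "17841466913731557"},
        },
    ]
}

match = pick_page(sample, "17841466913731557")
# Print the Page id only; token values stay out of logs.
print(match["page_id"] if match else "no linked Page found")
```
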
## Export history (pct 250)

Quick token sanity check (does not print token values):

- `cd /root/ai-workspace/emo-social-insta-dm-agent && python3 -m sergio_instagram_messaging.meta_token_doctor`

Export to a mounted directory so results persist on disk:

- `docker run --rm -v /root/tmp:/root/tmp emo-social-insta-dm-agent:dev python -m sergio_instagram_messaging.export_meta_ig_history --out /root/tmp/emo-social-insta-dm-history`
- (Direct) `cd /root/ai-workspace/emo-social-insta-dm-agent && python3 -m sergio_instagram_messaging.export_meta_ig_history --out /root/tmp/emo-social-insta-dm-history`

### Workaround: fetch a single thread by `user_id`

If `/conversations` listing times out, you can still fetch a single thread if you know the sender’s `user_id` (from webhook `sender.id`):

- `cd /root/ai-workspace/emo-social-insta-dm-agent && python3 -m sergio_instagram_messaging.fetch_thread_messages --user-id <SENDER_ID> --max-pages 2 --out /root/tmp/thread.jsonl`

Small test run:

- `docker run --rm -v /root/tmp:/root/tmp emo-social-insta-dm-agent:dev python -m sergio_instagram_messaging.export_meta_ig_history --out /root/tmp/emo-social-insta-dm-history --max-conversations 3 --max-pages 2`
- (Direct) `cd /root/ai-workspace/emo-social-insta-dm-agent && python3 -m sergio_instagram_messaging.export_meta_ig_history --out /root/tmp/emo-social-insta-dm-history --max-conversations 3 --max-pages 2`

## Full history fallback (Instagram export)

If Meta app review blocks Graph API DM access, export Sergio’s IG data via Instagram “Download your information” and import it:

- `cd /root/ai-workspace/emo-social-insta-dm-agent && python3 -m sergio_instagram_messaging.import_instagram_export --input /path/to/instagram-export.zip --out /root/tmp/emo-social-insta-dm-export-history`

## Analyze DM history (Behavioral Cloning / “Biographical Sales”)

This produces the “Sergio persona” artifacts needed for the DM agent:

- Separates frequent `[BOT]` templates vs rare `[MANUAL]` replies (plus `[HYBRID]`).
- Builds **bot fatigue** + **script editorial timeline** charts.
- Extracts **training pairs** (user → Sergio manual reply) from converted threads.
- Generates **rescue playbook** (human saves after silence/negative sentiment).
- Generates **objection handlers** (price/time/trust/stop → best replies).
- Builds a quarterly **eras** CSV (offers/pricing + vocabulary drift).

Outputs are written with mode `600` and may contain sensitive DM content. Keep them out of git.

This repo includes **sanitized** example reports (no verbatim client DMs) under:

- `reports/socialmediatorr/`

Raw analysis artifacts (e.g., training pairs, rescued threads, template caches) should remain in a private working directory such as `/root/tmp/` and should not be committed.

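
If you post-process these artifacts with your own scripts, it helps to keep the same mode `600` convention. The following is a minimal sketch of one way to do that on a Unix host; `write_private` and the file name are hypothetical and this is not the analyzer's own writer.

```python
import os
import stat
import tempfile


def write_private(path: str, data: str) -> None:
    """Create (or truncate) a file readable and writable by the owner only."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    # The mode passed to os.open only applies to newly created files and is
    # filtered by the umask, so set it explicitly as well (Unix only).
    os.fchmod(fd, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(data)


with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "derived_artifact.jsonl")  # hypothetical file name
    write_private(demo, '{"example": true}\n')
    print(oct(stat.S_IMODE(os.stat(demo).st_mode)))
```
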
### Analyze a raw Instagram export folder (recommended)

Optional: index first (lets you filter recency without scanning every thread):

- `python3 -m sergio_instagram_messaging.index_instagram_export --input /path/to/your_instagram_activity --out /root/tmp/socialmediatorr-ig-index.jsonl`

Then analyze:

- `python3 -m sergio_instagram_messaging.analyze_instagram_export --input /path/to/your_instagram_activity --out /root/tmp/socialmediatorr-agent-analysis --owner-name "Sergio de Vocht" --index /root/tmp/socialmediatorr-ig-index.jsonl --since-days 180`

### Analyze an imported history dir (messages/*.jsonl)

If you already ran `import_instagram_export` (or have a partial import output), point the analyzer at that directory:

- `python3 -m sergio_instagram_messaging.analyze_instagram_export --input /root/tmp/socialmediatorr-ig-export-history --out /root/tmp/socialmediatorr-agent-analysis --owner-name "Sergio de Vocht"`

### Two-stage workflow (verify templates before full run)

Pass 1 generates `top_outgoing_templates.json` + `template_counts.jsonl`:

- `python3 -m sergio_instagram_messaging.analyze_instagram_export --input /path/to/your_instagram_activity --out /root/tmp/socialmediatorr-agent-analysis --stage pass1`

Pass 2 reuses the cache and writes the full deliverables:

- `python3 -m sergio_instagram_messaging.analyze_instagram_export --input /path/to/your_instagram_activity --out /root/tmp/socialmediatorr-agent-analysis --stage pass2 --templates-cache /root/tmp/socialmediatorr-agent-analysis/top_outgoing_templates.json`

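
The two passes can be chained with a manual checkpoint in between. This is a minimal sketch of such a wrapper, using only the flags shown above; `stage_command` and `run_two_stage` are hypothetical helpers, and the confirmation prompt is an assumption about how you might want to review the templates before pass 2.

```python
import subprocess

MODULE = "sergio_instagram_messaging.analyze_instagram_export"


def stage_command(stage: str, export_dir: str, out_dir: str) -> list[str]:
    """Build the argv for one analyzer stage ('pass1' or 'pass2')."""
    cmd = ["python3", "-m", MODULE, "--input", export_dir, "--out", out_dir, "--stage", stage]
    if stage == "pass2":
        # Pass 2 reuses the template cache written by pass 1.
        cmd += ["--templates-cache", f"{out_dir}/top_outgoing_templates.json"]
    return cmd


def run_two_stage(export_dir: str, out_dir: str, confirm=input) -> bool:
    """Run pass 1, pause for a manual template review, then run pass 2."""
    subprocess.run(stage_command("pass1", export_dir, out_dir), check=True)
    answer = confirm(f"Reviewed {out_dir}/top_outgoing_templates.json? [y/N] ")
    if answer.strip().lower() != "y":
        return False  # stop before the full run
    subprocess.run(stage_command("pass2", export_dir, out_dir), check=True)
    return True


print(" ".join(stage_command("pass1", "/path/to/your_instagram_activity",
                             "/root/tmp/socialmediatorr-agent-analysis")))
```
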
### Human-readable report (English)

After analysis, generate a single Markdown report:

- `python3 -m sergio_instagram_messaging.generate_dm_report --analysis-dir /root/tmp/socialmediatorr-agent-analysis`

### Plain-English deep report (Mermaid diagrams)

Generate the deeper “no raw quotes” report directly from an Instagram export folder:

- `python3 -m sergio_instagram_messaging.generate_dm_report_detailed --export-input /path/to/export-root --out /root/tmp/dm_history_report_en_detailed.md`

## VoiceDNA (manual reply style)

`voice_dna/voiceDNA_socialmediatorr_insta_dm.json` is a **safe-to-store** style fingerprint generated from the last 6 months of **manual (non-template) DM replies** (no raw DM quotes are included).

It also encodes a hard rule for the bot:

- Always reply in the **user’s input language** (English / Spanish / French / Catalan), with a short clarification if the user’s message is too short to detect its language.
- If the message is too short to detect its language: reuse the last language seen in that thread; if still unknown, **default to Spanish** (no language menus).

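
The fallback order in that rule can be sketched as a small function. This is a minimal sketch, assuming language detection happens elsewhere and yields a two-letter code or `None` when the message is too short; the codes, `choose_reply_language`, and the handling of unsupported languages are assumptions, not the contents of the VoiceDNA file.

```python
from typing import Optional

SUPPORTED = {"en", "es", "fr", "ca"}  # English / Spanish / French / Catalan
DEFAULT_LANGUAGE = "es"  # Spanish when nothing else is known


def choose_reply_language(detected: Optional[str], last_in_thread: Optional[str]) -> str:
    """Mirror the user's language, else the thread's last language, else Spanish."""
    if detected in SUPPORTED:
        return detected
    if last_in_thread in SUPPORTED:
        return last_in_thread
    return DEFAULT_LANGUAGE


print(choose_reply_language("fr", "es"))   # clear French message in a Spanish thread
print(choose_reply_language(None, "ca"))   # too short to detect, thread was Catalan
print(choose_reply_language(None, None))   # first message, too short to detect
```
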
Regenerate from a local Instagram export folder:

- `python3 -m sergio_instagram_messaging.generate_voice_dna --export-input /path/to/export-root --out voice_dna/voiceDNA_socialmediatorr_insta_dm.json --owner-name "Sergio de Vocht" --window-months 6`

## Ready Replies (Top 20)

Multi-language ready-made answers for the Top question topics (aligned to the VoiceDNA language-mirroring policy):

- `reply_library/top20_ready_answers.json` (programmatic)
- `reply_library/top20_ready_answers.md` (copy/paste)

## Webhooks (new messages → auto-reply)

To receive real Instagram DMs (inbound + outbound echo) you need:

1) Webhooks product configured in the Meta app (callback URL + verify token + instagram fields).
2) An **Instagram Business Login** token with `instagram_business_manage_messages`.
3) The IG account subscribed to the app (`POST /me/subscribed_apps` on `graph.instagram.com`).

The production webhook endpoint exists at:

- `https://emo-social.infrafabric.io/meta/webhook`

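
Step 3 in the list above is a single HTTP call. The following is a minimal sketch that only builds the request, assuming the `messages` field is the one you want delivered and that `v21.0` is the API version in use (both are assumptions; adjust to the app's configuration). `build_subscribe_request` is a hypothetical helper. The token travels in the query string, so avoid logging the full URL.

```python
import urllib.parse
import urllib.request

GRAPH_VERSION = "v21.0"  # assumption: use whichever version the app targets


def build_subscribe_request(ig_access_token: str, fields=("messages",)) -> urllib.request.Request:
    """Build the POST /me/subscribed_apps request for graph.instagram.com."""
    query = urllib.parse.urlencode({
        "subscribed_fields": ",".join(fields),
        "access_token": ig_access_token,
    })
    url = f"https://graph.instagram.com/{GRAPH_VERSION}/me/subscribed_apps?{query}"
    return urllib.request.Request(url, method="POST")


req = build_subscribe_request("IG_TOKEN_PLACEHOLDER")
# Show the method and the endpoint without the query string (no token in output).
print(req.get_method(), req.full_url.split("?")[0])
# To actually send it: urllib.request.urlopen(req, timeout=30)
```
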
## Shadow mode (draft-only) — operational

This repo includes a **draft-only** webhook server that:
- receives Meta webhook events (including `is_echo` outgoing messages)
- writes a draft reply (Top 20 templates + language mirroring)
- stores the draft and later links it to the **actual** outgoing reply for side-by-side comparison
- never sends a message (unless you explicitly add a sending step later)

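
To illustrate how inbound messages and `is_echo` outgoing messages can be told apart and tied to the same thread, here is a minimal sketch. It assumes the usual Meta messaging payload shape (`entry[].messaging[]` with `sender`, `recipient`, and `message`); `split_events` and the `thread_user_id` key are hypothetical and do not describe the server's actual schema.

```python
def split_events(payload: dict) -> tuple[list[dict], list[dict]]:
    """Return (inbound, echoes) message records from one webhook payload."""
    inbound, echoes = [], []
    for entry in payload.get("entry", []):
        for event in entry.get("messaging", []):
            message = event.get("message")
            if not message:
                continue  # non-message events (reads, reactions, ...) are skipped here
            is_echo = bool(message.get("is_echo"))
            sender_id = event.get("sender", {}).get("id")
            recipient_id = event.get("recipient", {}).get("id")
            record = {
                "mid": message.get("mid"),
                "text": message.get("text", ""),
                # For an echo the account is the sender, so the user is the recipient.
                "thread_user_id": recipient_id if is_echo else sender_id,
            }
            (echoes if is_echo else inbound).append(record)
    return inbound, echoes


sample_payload = {  # placeholder ids and text
    "object": "instagram",
    "entry": [{
        "id": "ACCOUNT_ID",
        "messaging": [
            {"sender": {"id": "USER_1"}, "recipient": {"id": "ACCOUNT_ID"},
             "message": {"mid": "m1", "text": "book"}},
            {"sender": {"id": "ACCOUNT_ID"}, "recipient": {"id": "USER_1"},
             "message": {"mid": "m2", "text": "Here is the link", "is_echo": True}},
        ],
    }],
}

inbound, echoes = split_events(sample_payload)
print(len(inbound), len(echoes), inbound[0]["thread_user_id"] == echoes[0]["thread_user_id"])
```

Keying both record types on the user's id is what makes the later draft-versus-actual comparison possible: a draft written for an inbound message can be matched to the next echo in the same thread.
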
Language policy (pragmatic):
- Default to Spanish for unclear first messages; switch automatically once the user writes clearly in French/English/Catalan.

Run locally (requires `META_VERIFY_TOKEN` + `META_APP_SECRET` in env; and for `/meta/ig/connect` also `META_APP_ID` + `IGDM_IG_REDIRECT_URI`):

- `python3 -m sergio_instagram_messaging.igdm_shadow_server --host 127.0.0.1 --port 5051 --db /root/tmp/igdm/igdm.sqlite --reply-library reply_library/top20_ready_answers.json`

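
For context on why the server needs those two secrets: Meta verifies a callback URL with a `GET` carrying `hub.mode`, `hub.verify_token`, and `hub.challenge`, and signs each `POST` body with the app secret in the `X-Hub-Signature-256` header. The following is a minimal, framework-free sketch of both checks; `verify_subscription` and `signature_is_valid` are hypothetical names, not the functions used by `igdm_shadow_server`.

```python
import hashlib
import hmac
from typing import Optional


def verify_subscription(params: dict, verify_token: str) -> Optional[str]:
    """Return the hub.challenge to echo back, or None to reject the request."""
    supplied = str(params.get("hub.verify_token", ""))
    token_ok = hmac.compare_digest(supplied.encode(), verify_token.encode())
    if params.get("hub.mode") == "subscribe" and token_ok:
        return params.get("hub.challenge")
    return None


def signature_is_valid(raw_body: bytes, header_value: Optional[str], app_secret: str) -> bool:
    """Check 'sha256=<hex>' against an HMAC-SHA256 of the raw request body."""
    digest = hmac.new(app_secret.encode(), raw_body, hashlib.sha256).hexdigest()
    expected = "sha256=" + digest
    return hmac.compare_digest(expected.encode(), (header_value or "").encode())


# Self-check with placeholder secrets.
body = b'{"object":"instagram","entry":[]}'
header = "sha256=" + hmac.new(b"app-secret", body, hashlib.sha256).hexdigest()
print(signature_is_valid(body, header, "app-secret"))
print(verify_subscription(
    {"hub.mode": "subscribe", "hub.verify_token": "vt", "hub.challenge": "12345"}, "vt"))
```

The signature must be computed over the raw bytes exactly as received, before any JSON parsing or re-serialization.
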
Production (`pct 220`):
- systemd: `igdm-shadow.service` (listens on `127.0.0.1:5051`)
- nginx routes:
  - `/meta/webhook` → `igdm-shadow` (no auth, required by Meta)
  - `/meta/ig/connect` → `igdm-shadow` (OAuth gated; starts Instagram Business Login)
  - `/meta/ig/callback` → `igdm-shadow` (public; used by Instagram OAuth redirect)
  - `/igdm` and `/api/igdm/*` → `igdm-shadow` (OAuth gated via `oauth2-proxy`)

### Connect Instagram Business Login (required for message delivery)

If **no real DMs** are showing up in `https://emo-social.infrafabric.io/igdm`, the most common missing step is that Instagram Business Login was never completed (so Meta never starts sending webhook events).

1) In Meta app dashboard → Instagram Business Login → Settings, add redirect URL:
   - `https://emo-social.infrafabric.io/meta/ig/callback`
2) Visit the dashboard:
   - `https://emo-social.infrafabric.io/igdm`
3) Click **Connect / Reconnect Instagram (Business Login)** and complete the Instagram consent screen.
4) Send a DM to `@socialmediatorr` (e.g. “book”, “livre”) and confirm it appears in the table.

The server stores the IG long-lived access token (mode `600`) at:
- `pct 220`: `/opt/if-emotion/data/igdm/ig_token.json`

### Legacy (not recommended): device login + Page subscription

Older Meta flows use Facebook device login + Page-scoped subscriptions. They frequently fail with “invalid scopes” and are not required if Instagram Business Login is working.

If you still need them for debugging, see the scripts:

- `python3 -m sergio_instagram_messaging.meta_device_login`
- `python3 -m sergio_instagram_messaging.meta_page_token_from_user_token`
- `python3 -m sergio_instagram_messaging.meta_subscribe_page`