diff --git a/PROMPTS/session-a-ho36.md b/PROMPTS/session-a-ho36.md
new file mode 100644
index 0000000..1b3ab19
--- /dev/null
+++ b/PROMPTS/session-a-ho36.md
@@ -0,0 +1,119 @@
+# SESSION A PROMPT — HO36 Data Collection (Codex)
+
+Copy/paste this entire prompt into your **Session A** Codex window.
+
+---
+
+```markdown
+You are a competitive-intel researcher collecting **public, no-login** online footprint data for **HO36 Lyon**. You are working in parallel with Session B (Flâneur) and Session C (discovery helper). Your job is to gather the **highest-signal numbers + URLs + screenshots** so we can explain why **HO36 was likely full around NYE** while Flâneur had availability.
+
+## Non‑Negotiables (do not stall)
+- Public pages only. **No login**, no captcha solving, no bypassing anti-bot.
+- If blocked, **record `status: blocked`** with URL + screenshot and move on.
+- Be “insistent”: timebox each source to **10 minutes**, then move on.
+- Always capture evidence: URL + timestamp + screenshot path.
+- Use the repo’s schema exactly (`SCHEMA.md`).
+
+## Start Here (Git coordination)
+1) Read Forgejo details from `~/readme.md`
+2) Clone + branch:
+```bash
+cd ~/
+git clone https://git.infrafabric.io/danny/flaneur flaneur-analysis
+cd flaneur-analysis
+git checkout -b data/ho36
+mkdir -p data/ho36/{screenshots,raw}
+```
+
+## Output contract (must match)
+Write:
+- `data/ho36/evidence.json`
+- `data/ho36/evidence.csv` (generated via `tools/json_to_csv.py`)
+- `data/ho36/profile.md`
+
+Evidence rows go into `evidence.json` under `evidence[]` following `SCHEMA.md`.
+
+### Important: Use the repo helpers
+Capture pages with:
+```bash
+/root/venv/bin/python tools/capture_page.py --url "" \
+  --screenshot "data/ho36/screenshots/__YYYYMMDD.png" \
+  --html "data/ho36/raw/__YYYYMMDD.html" \
+  --wait-ms 2000
+```
+Convert JSON→CSV:
+```bash
+python3 tools/json_to_csv.py --json data/ho36/evidence.json --csv data/ho36/evidence.csv
+```
+
+## Priority order (highest signal first)
+
+### 1) Official site (HO36 Lyon)
+Target: https://ho36lyon.com/
+Collect:
+- tagline/hero copy (exact quotes)
+- booking engine domain (e.g. Mews/RoomRaccoon/etc.)
+- languages visible
+- inventory/price claims (if stated)
+- any NYE/seasonal policy hints (min nights, sold out banners) — if none, record `unknown`
+Capture: homepage + any booking/rooms page you can access.
+
+### 2) Google Maps (must get rating + review count)
+Goal: rating + review count.
+
+If the full Maps UI hides review count, use the **embed iframe technique**:
+- Find an embed URL (often on official site as `google.com/maps/embed?pb=...`), OR use
+  `https://www.google.com/maps?q=HO36+Hostel+Lyon&output=embed`
+- IMPORTANT: the embed must be loaded inside an iframe.
+Create a local file:
+```bash
+cat > /root/tmp/ho36_maps_iframe.html <<'EOF'
+
+
+
+EOF
+```
+Then use Playwright to screenshot and read the iframe body text (look for “#### avis”).
+Record `google_maps.rating` and `google_maps.review_count`.
+
+### 3) Hostelworld (must get listing URL + rating + review count)
+Try the Lyon directory page first (works better than search):
+- https://www.hostelworld.com/hostels/europe/france/lyon/
+Find HO36, then capture its listing page.
+Extract:
+- listing URL
+- rating
+- review count
+- optional: position on Lyon directory page (note sorting caveat)
+
+### 4) Booking.com (attempt; likely blocked)
+Attempt:
+- Find HO36 listing URL (Session C might provide it).
+- Capture the listing URL directly.
+If you hit WAF/challenge, record `status: blocked` and keep going.
+Do **not** try to bypass.
+
+### 5) TripAdvisor (attempt; often blocked)
+Attempt listing page. If DataDome/captcha, record blocked.
+
+### 6) Socials
+Instagram + Facebook (and TikTok if present):
+- Try to capture follower/like counts from public OG/meta if the UI blocks.
+- Record `blocked` if login wall prevents reading anything meaningful.
+
+## Git workflow (after each major source)
+```bash
+git add data/ho36/
+git commit -m "HO36: completed "
+git push origin data/ho36
+```
+
+## Success criteria (minimum viable)
+- Official site captured + booking engine identified
+- Google Maps rating + review_count captured (embed iframe OK)
+- Hostelworld listing URL + rating + review_count captured
+- Booking.com + TripAdvisor attempted and marked ok/blocked with screenshots
+
+Begin now. Post progress after each source.
+```
+
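Review note on step 2 (Google Maps): the prompt asks the agent to read the iframe body text and look for "#### avis", but it does not say how to turn that text into `google_maps.rating` and `google_maps.review_count`. The following is a minimal sketch of one way to do it. It assumes the body text contains a one-decimal rating such as `4,4` or `4.4` and a count directly followed by "avis" or "reviews". The function name `parse_maps_text` is hypothetical and is not one of the repo helpers, and the real embed text may be laid out differently.

```python
import re
from typing import Optional, Tuple

# Assumed layout (not confirmed by the prompt): a count such as "1 234",
# "1,234" or "987" followed by "avis" (French UI) or "reviews" (English UI).
# Thousands separators may be a space, NBSP, narrow NBSP, period or comma.
_COUNT_RE = re.compile(
    r"(?<![\d.,])(\d{1,3}(?:[ \u00a0\u202f.,]\d{3})*|\d+)\s*(?:avis|reviews?)\b",
    re.IGNORECASE,
)

# Assumed layout: a one-decimal rating between 0 and 5, e.g. "4,4" or "4.4".
_RATING_RE = re.compile(r"(?<![\d.,])([0-5][.,]\d)(?![\d.,])")


def parse_maps_text(text: str) -> Tuple[Optional[float], Optional[int]]:
    """Return (rating, review_count); either value is None when not found."""
    rating: Optional[float] = None
    review_count: Optional[int] = None

    count_match = _COUNT_RE.search(text)
    if count_match:
        # Drop every separator so "1 234" and "1,234" both become 1234.
        review_count = int(re.sub(r"\D", "", count_match.group(1)))

    rating_match = _RATING_RE.search(text)
    if rating_match:
        # Normalise the French decimal comma before converting.
        rating = float(rating_match.group(1).replace(",", "."))

    return rating, review_count
```

When either value comes back as `None`, the prompt's existing rule still applies: record what was seen with the URL and screenshot rather than guessing. An integer rating placed on the same line as the count (for example `5 123 avis`) is ambiguous for this sketch, so the screenshot remains the source of truth.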