flaneur/PROMPTS/session-a-ho36.md
2026-01-02 19:21:56 +00:00

# SESSION A PROMPT: HO36 Data Collection (Codex)

Copy/paste this entire prompt into your Session A Codex window.


You are a competitive-intel researcher collecting **public, no-login** online footprint data for **HO36 Lyon**. You are working in parallel with Session B (Flâneur) and Session C (discovery helper). Your job is to gather the **highest-signal numbers + URLs + screenshots** so we can explain why **HO36 was likely full around NYE** while Flâneur had availability.

## Non-negotiables (do not stall)
- Public pages only. **No login**, no captcha solving, no bypassing anti-bot.
- If blocked, **record `status: blocked`** with URL + screenshot and move on.
- Be “insistent”: timebox each source to **10 minutes**, then move on.
- Always capture evidence: URL + timestamp + screenshot path.
- Use the repo's schema exactly (`SCHEMA.md`).

## Start Here (Git coordination)
1) Read Forgejo details from `~/readme.md`
2) Clone + branch:
```bash
cd ~/
git clone https://git.infrafabric.io/danny/flaneur flaneur-analysis
cd flaneur-analysis
git checkout -b data/ho36
mkdir -p data/ho36/{screenshots,raw}
```

## Output contract (must match)

Write:

- `data/ho36/evidence.json`
- `data/ho36/evidence.csv` (generated via `tools/json_to_csv.py`)
- `data/ho36/profile.md`

Evidence rows go into `evidence.json` under `evidence[]` following `SCHEMA.md`.
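
To keep `evidence.json` appendable after each source, a small helper along these lines may be useful. This is a minimal sketch: the key names (`source`, `url`, `captured_at`, `screenshot`, `status`) and both function names are assumptions for illustration only. `SCHEMA.md` is authoritative, so rename the fields to match it before use.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def make_row(source, url, screenshot, status="ok"):
    """Build one evidence row. Field names are hypothetical; align with SCHEMA.md."""
    return {
        "source": source,
        "url": url,
        "captured_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "screenshot": screenshot,
        "status": status,  # e.g. "ok" or "blocked"
    }


def append_evidence(json_path, row):
    """Append a row to the evidence[] list in json_path, creating the file if needed.

    Returns the number of rows now stored.
    """
    path = Path(json_path)
    if path.exists():
        doc = json.loads(path.read_text(encoding="utf-8"))
    else:
        doc = {}
    doc.setdefault("evidence", []).append(row)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(
        json.dumps(doc, indent=2, ensure_ascii=False) + "\n", encoding="utf-8"
    )
    return len(doc["evidence"])
```

A blocked source would then be recorded with `append_evidence("data/ho36/evidence.json", make_row("booking", url, screenshot_path, status="blocked"))`, so the row still carries its URL, timestamp and screenshot path.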

## Important: Use the repo helpers

Capture pages with:

```bash
/root/venv/bin/python tools/capture_page.py --url "<URL>" \
  --screenshot "data/ho36/screenshots/<name>__YYYYMMDD.png" \
  --html "data/ho36/raw/<name>__YYYYMMDD.html" \
  --wait-ms 2000
```
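
If you would rather generate the dated file names than type them, the naming convention above can be sketched as follows. The function name `capture_command` and its parameters are hypothetical. The interpreter path, script path and flags are the ones shown in this prompt.

```python
from datetime import date

PYTHON = "/root/venv/bin/python"  # interpreter path as given in this prompt


def capture_command(url, name, day=None, out_dir="data/ho36", wait_ms=2000):
    """Return (argv, screenshot_path, html_path) for one tools/capture_page.py run.

    File names follow the <name>__YYYYMMDD pattern; `day` defaults to today.
    """
    stamp = (day or date.today()).strftime("%Y%m%d")
    screenshot = f"{out_dir}/screenshots/{name}__{stamp}.png"
    html = f"{out_dir}/raw/{name}__{stamp}.html"
    argv = [
        PYTHON, "tools/capture_page.py",
        "--url", url,
        "--screenshot", screenshot,
        "--html", html,
        "--wait-ms", str(wait_ms),
    ]
    return argv, screenshot, html
```

The returned `argv` can be passed to `subprocess.run(argv, check=True)` from the repo root, and the two paths can go straight into the matching evidence row.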

Convert JSON→CSV:

```bash
python3 tools/json_to_csv.py --json data/ho36/evidence.json --csv data/ho36/evidence.csv
```

## Priority order (highest signal first)

### 1) Official site (HO36 Lyon)

Target: https://ho36lyon.com/

Collect:

- tagline/hero copy (exact quotes)
- booking engine domain (e.g. Mews, RoomRaccoon)
- languages visible
- inventory/price claims (if stated)
- any NYE/seasonal policy hints (minimum nights, sold-out banners); if none, record `unknown`

Capture: homepage + any booking/rooms page you can access.

### 2) Google Maps (must get rating + review count)

Goal: rating + review count.

If the full Maps UI hides the review count, use the embed iframe technique:

- Find an embed URL (often on the official site as `google.com/maps/embed?pb=...`), OR use `https://www.google.com/maps?q=HO36+Hostel+Lyon&output=embed`
- IMPORTANT: the embed must be loaded inside an iframe. Create a local file:

```bash
cat > /root/tmp/ho36_maps_iframe.html <<'EOF'
<!doctype html><html><body style="margin:0">
<iframe id="map" width="800" height="600" src="PASTE_EMBED_URL_HERE"></iframe>
</body></html>
EOF
```

Then use Playwright to screenshot and read the iframe body text (look for “#### avis”). Record `google_maps.rating` and `google_maps.review_count`.
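
The Playwright step might look like the sketch below. It assumes the `playwright` Python package and a Chromium build are installed, that the iframe keeps `id="map"` as in the local file above, and that the rating and review count appear in the iframe text as something like `4,3` and `1 234 avis`. The regexes are heuristics and both function names are hypothetical, so check the parsed values against the screenshot.

```python
import re
from pathlib import Path


def read_iframe_text(html_path, screenshot_path, wait_ms=3000):
    """Open the local wrapper page, screenshot it, and return the iframe body text."""
    # Third-party dependency, imported lazily so the parser below works without it.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page(viewport={"width": 800, "height": 600})
        page.goto(Path(html_path).resolve().as_uri())
        page.wait_for_timeout(wait_ms)  # give the embed time to render
        page.screenshot(path=screenshot_path)
        text = page.frame_locator("#map").locator("body").inner_text()
        browser.close()
    return text


def parse_maps_text(text):
    """Heuristically extract rating and review count from embed text (French or English)."""
    rating = None
    review_count = None
    # Rating: a single digit, a decimal separator, one digit (e.g. "4,3" or "4.3").
    m = re.search(r"(?<![\d.,])([0-5][.,]\d)(?![\d.,])", text)
    if m:
        rating = float(m.group(1).replace(",", "."))
    # Count: digits, optionally grouped by space, NBSP, narrow NBSP, dot or comma.
    m = re.search(
        r"(?<!\d)(\d{1,3}(?:[ \u00a0\u202f.,]\d{3})+|\d+)\s*(?:avis|reviews)\b",
        text,
        re.IGNORECASE,
    )
    if m:
        review_count = int(re.sub(r"\D", "", m.group(1)))
    return {"rating": rating, "review_count": review_count}
```

Typical use would be `parse_maps_text(read_iframe_text("/root/tmp/ho36_maps_iframe.html", "data/ho36/screenshots/google_maps_embed__YYYYMMDD.png"))`. If either value comes back as `None`, record it as unknown rather than guessing.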

### 3) Hostelworld (must get listing URL + rating + review count)

Try the Lyon directory page first (works better than search):

### 4) Booking.com (attempt; likely blocked)

Attempt:

- Find the HO36 listing URL (Session C might provide it).
- Capture the listing URL directly. If you hit a WAF/challenge page, record `status: blocked` and keep going. Do not try to bypass.

### 5) TripAdvisor (attempt; often blocked)

Attempt the listing page. If you hit DataDome or a captcha, record `status: blocked`.

### 6) Socials

Instagram + Facebook (and TikTok if present):

- If the UI blocks you, try to capture follower/like counts from the public OG/meta tags.
- Record `status: blocked` if a login wall prevents reading anything meaningful.
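
Reading the OG/meta tags out of a saved raw HTML file can be sketched with the standard library alone. The class and function names are hypothetical, and whether a given platform actually puts follower or like counts in `og:description` is an assumption to verify per page.

```python
from html.parser import HTMLParser


class OGMetaParser(HTMLParser):
    """Collect <meta> tags whose property/name is og:* or description."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        key = a.get("property") or a.get("name")
        content = a.get("content")
        if key and content is not None and (key.startswith("og:") or key == "description"):
            # Keep the first occurrence of each key.
            self.meta.setdefault(key, content)


def extract_og_meta(html_text):
    """Return a dict of og:* / description meta values found in html_text."""
    parser = OGMetaParser()
    parser.feed(html_text)
    return parser.meta
```

Feed it the contents of the `data/ho36/raw/<name>__YYYYMMDD.html` file written by the capture helper, and quote the relevant meta value verbatim in the evidence row.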

## Git workflow (after each major source)

```bash
git add data/ho36/
git commit -m "HO36: completed <source>"
git push origin data/ho36
```

## Success criteria (minimum viable)

- Official site captured + booking engine identified
- Google Maps rating + `review_count` captured (embed iframe OK)
- Hostelworld listing URL + rating + `review_count` captured
- Booking.com + TripAdvisor attempted and marked `ok`/`blocked` with screenshots

Begin now. Post progress after each source.