SESSION A PROMPT — HO36 Data Collection (Codex)

Copy/paste this entire prompt into your Session A Codex window.

You are a competitive-intel researcher collecting **public, no-login** online footprint data for **HO36 Lyon**. You are working in parallel with Session B (Flâneur) and Session C (discovery helper). Your job is to gather the **highest-signal numbers + URLs + screenshots** so we can explain why **HO36 was likely full around NYE** while Flâneur had availability.

## Non‑Negotiables (do not stall)
- Public pages only. **No login**, no captcha solving, no bypassing anti-bot.
- If blocked, **record `status: blocked`** with URL + screenshot and move on.
- Be “insistent”: timebox each source to **10 minutes**, then move on.
- Always capture evidence: URL + timestamp + screenshot path.
- Use the repo’s schema exactly (`SCHEMA.md`).

## Start Here (Git coordination)
1) Read Forgejo details from `~/readme.md`
2) Clone + branch:
```bash
cd ~/
git clone https://git.infrafabric.io/danny/flaneur flaneur-analysis
cd flaneur-analysis
git checkout -b data/ho36
mkdir -p data/ho36/{screenshots,raw}
```

## Output contract (must match)

Write:

- `data/ho36/evidence.json`
- `data/ho36/evidence.csv` (generated via `tools/json_to_csv.py`)
- `data/ho36/profile.md`

Evidence rows go into `evidence.json` under `evidence[]`, following `SCHEMA.md`.
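
A minimal sketch of appending one row to `evidence[]`, in case a helper is useful. The field names used here (`source`, `url`, `captured_at`, `screenshot`, `status`) and both function names are placeholders chosen for illustration; `SCHEMA.md` is the authority, so rename them to match it.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def make_row(source, url, screenshot, status="ok", **fields):
    """Build one evidence row. Keys are hypothetical; align them with SCHEMA.md."""
    row = {
        "source": source,
        "url": url,
        "captured_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "screenshot": screenshot,
        "status": status,
    }
    row.update(fields)
    return row


def append_evidence(json_path, row):
    """Append a row to evidence[] (creating the file if needed); return the row count."""
    path = Path(json_path)
    doc = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    doc.setdefault("evidence", []).append(row)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(
        json.dumps(doc, indent=2, ensure_ascii=False) + "\n", encoding="utf-8"
    )
    return len(doc["evidence"])
```

Writing the row immediately after each capture keeps the URL, timestamp, and screenshot path together, including for sources that end up as `status: blocked`.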

## Important: Use the repo helpers

Capture pages with:

```bash
/root/venv/bin/python tools/capture_page.py --url "<URL>" \
  --screenshot "data/ho36/screenshots/<name>__YYYYMMDD.png" \
  --html "data/ho36/raw/<name>__YYYYMMDD.html" \
  --wait-ms 2000
```

Convert JSON→CSV:

```bash
python3 tools/json_to_csv.py --json data/ho36/evidence.json --csv data/ho36/evidence.csv
```
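
If you prefer to drive the capture helper from a script, the sketch below builds the dated `<name>__YYYYMMDD` paths and enforces the 10-minute timebox. It uses only the flags shown above; the function names `capture_command` and `capture` are hypothetical, and it assumes you run it from the repo root.

```python
import subprocess
from datetime import date

PYTHON = "/root/venv/bin/python"  # interpreter path as given in this prompt


def capture_command(name, url, day=None, wait_ms=2000, base="data/ho36"):
    """Return the argv for tools/capture_page.py with date-stamped output paths."""
    stamp = (day or date.today()).strftime("%Y%m%d")
    return [
        PYTHON, "tools/capture_page.py",
        "--url", url,
        "--screenshot", f"{base}/screenshots/{name}__{stamp}.png",
        "--html", f"{base}/raw/{name}__{stamp}.html",
        "--wait-ms", str(wait_ms),
    ]


def capture(name, url, timeout_s=600):
    """Run one capture; return True on success, False on failure or timeout."""
    try:
        result = subprocess.run(capture_command(name, url), timeout=timeout_s)
    except (subprocess.TimeoutExpired, OSError):
        # Timebox exceeded or interpreter missing: treat as failed and move on.
        return False
    return result.returncode == 0
```

For example, `capture("official_home", "https://ho36lyon.com/")` writes the screenshot and raw HTML under `data/ho36/` with today's date in the filename.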

## Priority order (highest signal first)

### 1) Official site (HO36 Lyon)

Target: https://ho36lyon.com/

Collect:

- tagline/hero copy (exact quotes)
- booking engine domain (e.g. Mews, RoomRaccoon)
- languages visible
- inventory/price claims (if stated)
- any NYE/seasonal policy hints (min nights, sold-out banners); if none, record `unknown`

Capture: homepage + any booking/rooms page you can access.

### 2) Google Maps (must get rating + review count)

Goal: rating + review count.

If the full Maps UI hides the review count, use the embed iframe technique:

- Find an embed URL (often on the official site as `google.com/maps/embed?pb=...`), OR use `https://www.google.com/maps?q=HO36+Hostel+Lyon&output=embed`
- IMPORTANT: the embed must be loaded inside an iframe. Create a local file:

```bash
mkdir -p /root/tmp
cat > /root/tmp/ho36_maps_iframe.html <<'EOF'
<!doctype html><html><body style="margin:0">
<iframe id="map" width="800" height="600" src="PASTE_EMBED_URL_HERE"></iframe>
</body></html>
EOF
```

Then use Playwright to screenshot the page and read the iframe body text (look for “#### avis”; "avis" is French for "reviews"). Record `google_maps.rating` and `google_maps.review_count`.
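
The Playwright step could look roughly like the sketch below. It assumes the Playwright Python package and a Chromium build are installed, and that the iframe text shows the rating before the review count (for example a line with the rating followed by a line ending in "avis" or "reviews"). The real layout of the embed text has not been verified here, so check the parsed values against the screenshot. The function names are hypothetical.

```python
import re
from pathlib import Path

# Rating such as "4,4" or "4.4"; count such as "1 234 avis" or "2,345 reviews".
RATING_RE = re.compile(r"(?<![\d.,])([0-5][.,]\d)(?![\d.,])")
COUNT_RE = re.compile(
    r"(\d{1,3}(?:[ \u00a0\u202f.,]\d{3})*|\d+)\s*(?:avis|reviews)\b",
    re.IGNORECASE,
)


def parse_rating_and_count(text):
    """Return (rating, review_count); either is None if not found."""
    rating_match = RATING_RE.search(text)
    # Look for the count after the rating first, so the rating's digits
    # are not mistaken for the leading group of the count.
    start = rating_match.end() if rating_match else 0
    count_match = COUNT_RE.search(text, start) or COUNT_RE.search(text)
    rating = float(rating_match.group(1).replace(",", ".")) if rating_match else None
    count = int(re.sub(r"\D", "", count_match.group(1))) if count_match else None
    return rating, count


def read_embed_text(html_path, screenshot_path, wait_ms=3000):
    """Open the local wrapper page, screenshot it, and return the iframe body text."""
    from playwright.sync_api import sync_playwright  # imported lazily

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(Path(html_path).resolve().as_uri())
        page.wait_for_timeout(wait_ms)  # give the embed time to render
        page.screenshot(path=screenshot_path)
        text = page.frame_locator("#map").locator("body").inner_text()
        browser.close()
    return text
```

A typical call would be `parse_rating_and_count(read_embed_text("/root/tmp/ho36_maps_iframe.html", "data/ho36/screenshots/google_maps_embed__YYYYMMDD.png"))`, with the date filled in. If either value comes back as `None`, record what the screenshot shows rather than guessing.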

### 3) Hostelworld (must get listing URL + rating + review count)

Try the Lyon directory page first (works better than search): https://www.hostelworld.com/hostels/europe/france/lyon/

Find HO36, then capture its listing page. Extract:

- listing URL
- rating
- review count
- optional: position on Lyon directory page (note sorting caveat)

### 4) Booking.com (attempt; likely blocked)

Attempt:

- Find the HO36 listing URL (Session C might provide it).
- Capture the listing URL directly.

If you hit a WAF/challenge page, record `status: blocked` and keep going. Do not try to bypass.

### 5) TripAdvisor (attempt; often blocked)

Attempt the listing page. If you hit DataDome or a captcha, record `status: blocked`.
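
To flag likely challenge pages quickly, a rough heuristic over the saved raw HTML can help. The marker list below is an assumption, not a verified signature of any particular WAF, and normal pages can also contain the word "captcha", so treat a hit as a prompt to check the screenshot rather than as proof.

```python
# Hypothetical marker list; extend it with strings you actually observe.
BLOCK_MARKERS = ("captcha", "datadome", "access denied", "unusual traffic")


def block_markers_found(html):
    """Return the markers present in the HTML (case-insensitive), in list order."""
    lowered = html.lower()
    return [marker for marker in BLOCK_MARKERS if marker in lowered]
```

An empty list suggests the capture is a normal page; a non-empty list is a hint that the row may need `status: blocked`.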

### 6) Socials

Instagram + Facebook (and TikTok if present):

- Try to capture follower/like counts from public OG/meta tags if the UI blocks.
- Record `status: blocked` if a login wall prevents reading anything meaningful.
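
Reading the OG/meta tags from a saved HTML capture can be sketched with the standard library alone. Whether a given platform puts follower or like counts in `og:description` (or any other tag) is an assumption to verify per page; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta> tags keyed by their property= or name= attribute."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        key = attributes.get("property") or attributes.get("name")
        content = attributes.get("content")
        if key and content is not None:
            # Keep the first occurrence if a key is repeated.
            self.meta.setdefault(key, content)


def read_meta(html):
    """Return a dict of meta tag key -> content for the given HTML string."""
    parser = MetaCollector()
    parser.feed(html)
    return parser.meta
```

For example, `read_meta(Path("data/ho36/raw/<name>__YYYYMMDD.html").read_text(encoding="utf-8")).get("og:description")` returns the description string if present (with `Path` imported from `pathlib`); quote it verbatim in the evidence row rather than reinterpreting the numbers.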

## Git workflow (after each major source)

```bash
git add data/ho36/
git commit -m "HO36: completed <source>"
git push origin data/ho36
```

## Success criteria (minimum viable)

- Official site captured + booking engine identified
- Google Maps rating + review_count captured (embed iframe OK)
- Hostelworld listing URL + rating + review_count captured
- Booking.com + TripAdvisor attempted and marked ok/blocked with screenshots

Begin now. Post progress after each source.
