# SESSION A PROMPT - HO36 Data Collection (Codex)
Copy/paste this entire prompt into your Session A Codex window.
You are a competitive-intel researcher collecting **public, no-login** online footprint data for **HO36 Lyon**. You are working in parallel with Session B (Flâneur) and Session C (discovery helper). Your job is to gather the **highest-signal numbers + URLs + screenshots** so we can explain why **HO36 was likely full around NYE** while Flâneur had availability.
## Non-Negotiables (do not stall)
- Public pages only. **No login**, no captcha solving, no bypassing anti-bot.
- If blocked, **record `status: blocked`** with URL + screenshot and move on.
- Be "insistent": timebox each source to **10 minutes**, then move on.
- Always capture evidence: URL + timestamp + screenshot path.
- Use the repo's schema exactly (`SCHEMA.md`).
## Start Here (Git coordination)
1) Read Forgejo details from `~/readme.md`
2) Clone + branch:
```bash
cd ~/
git clone https://git.infrafabric.io/danny/flaneur flaneur-analysis
cd flaneur-analysis
git checkout -b data/ho36
mkdir -p data/ho36/{screenshots,raw}
```
## Output contract (must match)
Write:
- `data/ho36/evidence.json`
- `data/ho36/evidence.csv` (generated via `tools/json_to_csv.py`)
- `data/ho36/profile.md`

Evidence rows go into `evidence.json` under `evidence[]` following `SCHEMA.md`.
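
A minimal sketch of appending one row under `evidence[]`, assuming `evidence.json` is a single JSON object with an `evidence` array. The helper names (`append_evidence`, `blocked_row`) and the field names (`source`, `url`, `status`, `captured_at`, `screenshot`) are placeholders for illustration only; `SCHEMA.md` is authoritative and its field names should replace these.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_evidence(json_path, row):
    """Append one row to the evidence[] array, creating the file if it is missing."""
    path = Path(json_path)
    doc = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    doc.setdefault("evidence", []).append(row)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(doc, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
    return len(doc["evidence"])


def blocked_row(source, url, screenshot):
    """Build a row for a source that could not be read (hypothetical field names)."""
    return {
        "source": source,
        "url": url,
        "status": "blocked",
        "captured_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "screenshot": screenshot,
    }
```

Keeping every write behind one helper means each row carries the URL, timestamp, and screenshot path together, which is the evidence rule stated in the non-negotiables.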
## Important: Use the repo helpers
Capture pages with:
```bash
/root/venv/bin/python tools/capture_page.py --url "<URL>" \
  --screenshot "data/ho36/screenshots/<name>__YYYYMMDD.png" \
  --html "data/ho36/raw/<name>__YYYYMMDD.html" \
  --wait-ms 2000
```
Convert JSON->CSV:
```bash
python3 tools/json_to_csv.py --json data/ho36/evidence.json --csv data/ho36/evidence.csv
```
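
The `<name>__YYYYMMDD` naming used in the capture command can be generated consistently rather than typed by hand. The sketch below is one way to do that; `capture_paths` is a hypothetical helper, not part of the repo's `tools/` directory.

```python
from datetime import date
from pathlib import Path


def capture_paths(name, base="data/ho36", day=None):
    """Return (screenshot_path, html_path) using the <name>__YYYYMMDD convention."""
    stamp = (day or date.today()).strftime("%Y%m%d")
    root = Path(base)
    screenshot = root / "screenshots" / f"{name}__{stamp}.png"
    html = root / "raw" / f"{name}__{stamp}.html"
    return screenshot.as_posix(), html.as_posix()
```

The two returned strings can be passed to `--screenshot` and `--html` respectively, and the same screenshot path can be reused in the evidence row.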
## Priority order (highest signal first)
### 1) Official site (HO36 Lyon)
Target: https://ho36lyon.com/

Collect:
- tagline/hero copy (exact quotes)
- booking engine domain (e.g. Mews/RoomRaccoon/etc.)
- languages visible
- inventory/price claims (if stated)
- any NYE/seasonal policy hints (min nights, sold out banners); if none, record `unknown`

Capture: homepage + any booking/rooms page you can access.
### 2) Google Maps (must get rating + review count)
Goal: rating + review count.
If the full Maps UI hides the review count, use the embed iframe technique:
- Find an embed URL (often on the official site as `google.com/maps/embed?pb=...`), OR use `https://www.google.com/maps?q=HO36+Hostel+Lyon&output=embed`
- IMPORTANT: the embed must be loaded inside an iframe. Create a local file:
```bash
cat > /root/tmp/ho36_maps_iframe.html <<'EOF'
<!doctype html><html><body style="margin:0">
<iframe id="map" width="800" height="600" src="PASTE_EMBED_URL_HERE"></iframe>
</body></html>
EOF
```
Then use Playwright to screenshot the page and read the iframe body text (look for "#### avis", i.e. a number followed by "avis", the French label for reviews).
Record `google_maps.rating` and `google_maps.review_count`.
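
The iframe step above can be sketched as follows. This assumes Playwright for Python is installed in the venv and that the embed text shows the rating as a one-decimal number (`4,5` or `4.5`) and the count as a number followed by "avis" or "reviews". Both layout assumptions are unverified, so check the parsed values against the screenshot. The function names are hypothetical.

```python
import re
from pathlib import Path

# Review count: a number (optionally grouped in thousands) followed by "avis" or "review(s)".
COUNT_RE = re.compile(
    r"(?<!\d)(\d{1,3}(?:[ \u00a0\u202f.,]\d{3})+|\d+)\s*(?:avis|reviews?)\b",
    re.IGNORECASE,
)
# Rating: one digit 1-5 plus one decimal, written "4,5" or "4.5".
RATING_RE = re.compile(r"(?<![\d.,])([1-5][.,]\d)(?![\d.,])")


def parse_maps_text(text):
    """Return {'rating': float|None, 'review_count': int|None} from iframe body text."""
    count = COUNT_RE.search(text)
    rating = RATING_RE.search(text)
    return {
        "rating": float(rating.group(1).replace(",", ".")) if rating else None,
        "review_count": int(re.sub(r"\D", "", count.group(1))) if count else None,
    }


def read_iframe_text(html_path, screenshot_path, wait_ms=2000):
    """Open the local wrapper file, save a screenshot, return the iframe's body text."""
    from playwright.sync_api import sync_playwright  # third-party; imported lazily

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(Path(html_path).resolve().as_uri())
        page.wait_for_timeout(wait_ms)  # give the embed time to render
        page.screenshot(path=screenshot_path)
        text = page.frame_locator("#map").locator("body").inner_text()
        browser.close()
    return text
```

If `parse_maps_text` returns `None` for either field, record the value as unknown rather than guessing, and keep the screenshot as evidence.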
### 3) Hostelworld (must get listing URL + rating + review count)
Try the Lyon directory page first (works better than search):
- https://www.hostelworld.com/hostels/europe/france/lyon/

Find HO36, then capture its listing page. Extract:
- listing URL
- rating
- review count
- optional: position on Lyon directory page (note sorting caveat)
### 4) Booking.com (attempt; likely blocked)
Attempt:
- Find HO36 listing URL (Session C might provide it).
- Capture the listing URL directly.

If you hit a WAF/challenge, record `status: blocked` and keep going. Do not try to bypass.
### 5) TripAdvisor (attempt; often blocked)
Attempt listing page. If DataDome/captcha, record blocked.
### 6) Socials
Instagram + Facebook (and TikTok if present):
- Try to capture follower/like counts from public OG/meta tags if the UI blocks.
- Record `blocked` if a login wall prevents reading anything meaningful.
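
Reading OG/meta tags from the saved raw HTML can be sketched with the standard library alone. Which tag carries a follower or like count varies by platform and is not guaranteed to be present, so treat a missing key as `blocked` or unknown. `MetaCollector` and `read_meta` are hypothetical names.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect content values of <meta property=...> and <meta name=...> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        key = attr.get("property") or attr.get("name")
        if key and attr.get("content") is not None:
            # Keep the first occurrence of each key.
            self.meta.setdefault(key, attr["content"])


def read_meta(html_text):
    """Return a dict of meta key -> content for one HTML document."""
    parser = MetaCollector()
    parser.feed(html_text)
    return parser.meta
```

Feed it the file written by `--html`, then quote the relevant value (for example `og:description`) verbatim in the evidence row instead of paraphrasing it.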
## Git workflow (after each major source)
```bash
git add data/ho36/
git commit -m "HO36: completed <source>"
git push origin data/ho36
```
## Success criteria (minimum viable)
- Official site captured + booking engine identified
- Google Maps rating + review_count captured (embed iframe OK)
- Hostelworld listing URL + rating + review_count captured
- Booking.com + TripAdvisor attempted and marked ok/blocked with screenshots

Begin now. Post progress after each source.