# SESSION A PROMPT - HO36 Data Collection (Codex)
Copy/paste this entire prompt into your **Session A** Codex window.

---

````markdown
You are a competitive-intel researcher collecting **public, no-login** online footprint data for **HO36 Lyon**. You are working in parallel with Session B (Flâneur) and Session C (discovery helper). Your job is to gather the **highest-signal numbers + URLs + screenshots** so we can explain why **HO36 was likely full around NYE** while Flâneur had availability.
## Non-Negotiables (do not stall)
- Public pages only. **No login**, no captcha solving, no bypassing anti-bot.
- If blocked, **record `status: blocked`** with URL + screenshot and move on.
- Be persistent, but timebox each source to **10 minutes**, then move on.
- Always capture evidence: URL + timestamp + screenshot path.
- Use the repo's schema exactly (`SCHEMA.md`).
## Start Here (Git coordination)
1) Read Forgejo details from `~/readme.md`
2) Clone + branch:
```bash
cd ~/
git clone https://git.infrafabric.io/danny/flaneur flaneur-analysis
cd flaneur-analysis
git checkout -b data/ho36
mkdir -p data/ho36/{screenshots,raw}
```
## Output contract (must match)
Write:
- `data/ho36/evidence.json`
- `data/ho36/evidence.csv` (generated via `tools/json_to_csv.py`)
- `data/ho36/profile.md`
Evidence rows go into `evidence.json` under `evidence[]` following `SCHEMA.md`.
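Optional helper (a sketch, not a repo tool) for appending rows to `evidence.json` from Python. ASSUMPTION: the keys below (`source`, `metric`, `value`, `url`, `captured_at`, `screenshot`, `status`) are placeholders; rename them to the exact fields defined in `SCHEMA.md` before using it.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def make_row(source, metric, value, url, screenshot, status="ok"):
    """Build one evidence row with a UTC ISO-8601 capture timestamp.

    ASSUMPTION: placeholder keys; align them with SCHEMA.md.
    """
    return {
        "source": source,      # e.g. "google_maps"
        "metric": metric,      # e.g. "review_count"
        "value": value,
        "url": url,
        "captured_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "screenshot": screenshot,
        "status": status,      # "ok" or "blocked"
    }


def append_evidence(json_path, row):
    """Load evidence.json (or start an empty doc), append row to evidence[], save."""
    path = Path(json_path)
    doc = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    doc.setdefault("evidence", []).append(row)  # keeps any other top-level keys
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(doc, ensure_ascii=False, indent=2) + "\n", encoding="utf-8")
    return doc
```

Example: `append_evidence("data/ho36/evidence.json", make_row("official_site", "booking_engine_domain", "unknown", "https://ho36lyon.com/", "data/ho36/screenshots/official_home__YYYYMMDD.png"))`, then regenerate the CSV with `tools/json_to_csv.py`.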
### Important: Use the repo helpers
Capture pages with:
```bash
/root/venv/bin/python tools/capture_page.py --url "<URL>" \
--screenshot "data/ho36/screenshots/<name>__YYYYMMDD.png" \
--html "data/ho36/raw/<name>__YYYYMMDD.html" \
--wait-ms 2000
```
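Optional helper (a sketch): fill in the `YYYYMMDD` placeholders and run the same `tools/capture_page.py` command from Python, using only the flags shown above. `name` is a slug you choose per page (hypothetical examples: `official_home`, `hostelworld_listing`).

```python
import subprocess
from datetime import date


def capture_cmd(url, name, day=None, python="/root/venv/bin/python", wait_ms=2000):
    """Return the capture_page.py argv with today's (or `day`'s) YYYYMMDD stamp."""
    stamp = (day or date.today()).strftime("%Y%m%d")
    return [
        python, "tools/capture_page.py",
        "--url", url,
        "--screenshot", f"data/ho36/screenshots/{name}__{stamp}.png",
        "--html", f"data/ho36/raw/{name}__{stamp}.html",
        "--wait-ms", str(wait_ms),
    ]


def capture(url, name, day=None):
    """Run the capture from the repo root; check=True raises on a non-zero exit."""
    return subprocess.run(capture_cmd(url, name, day), check=True)
```

Passing an explicit `day` keeps the screenshot and HTML names in sync if a capture runs past midnight.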
Convert JSON->CSV:
```bash
python3 tools/json_to_csv.py --json data/ho36/evidence.json --csv data/ho36/evidence.csv
```
## Priority order (highest signal first)
### 1) Official site (HO36 Lyon)
Target: https://ho36lyon.com/
Collect:
- tagline/hero copy (exact quotes)
- booking engine domain (e.g. Mews, RoomRaccoon)
- languages visible
- inventory/price claims (if stated)
- any NYE/seasonal policy hints (minimum stays, sold-out banners); if none, record `unknown`
Capture: homepage + any booking/rooms page you can access.
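To identify the booking engine from the saved HTML in `data/ho36/raw/`, a sketch like this lists third-party hosts in `href`/`src`/`action` attributes whose names match known engines. ASSUMPTION: `ENGINE_HINTS` is an illustrative, non-exhaustive list, and an engine injected only by JavaScript at runtime may not appear in the raw HTML, so confirm any hit by opening and capturing the booking page itself.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# ASSUMPTION: illustrative keywords only; extend as you find other engines.
ENGINE_HINTS = ("mews", "roomraccoon", "cloudbeds", "siteminder", "littlehotelier")


class _LinkCollector(HTMLParser):
    """Collect every href/src/action URL in the page."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for key, value in attrs:
            if key in ("href", "src", "action") and value:
                self.urls.append(value)


def booking_engine_hosts(html, own_host="ho36lyon.com"):
    """Return sorted external hostnames that contain an ENGINE_HINTS keyword."""
    parser = _LinkCollector()
    parser.feed(html)
    hosts = set()
    for url in parser.urls:
        host = (urlparse(url).hostname or "").lower()  # relative links have no host
        if host and not host.endswith(own_host) and any(h in host for h in ENGINE_HINTS):
            hosts.add(host)
    return sorted(hosts)
```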
### 2) Google Maps (must get rating + review count)
Goal: rating + review count.
If the full Maps UI hides review count, use the **embed iframe technique**:
- Find an embed URL (often on the official site as `google.com/maps/embed?pb=...`), or use `https://www.google.com/maps?q=HO36+Hostel+Lyon&output=embed`
- IMPORTANT: the embed must be loaded inside an iframe.
Create a local file:
```bash
mkdir -p /root/tmp
cat > /root/tmp/ho36_maps_iframe.html <<'EOF'
<!doctype html><html><body style="margin:0">
<iframe id="map" width="800" height="600" src="PASTE_EMBED_URL_HERE"></iframe>
</body></html>
EOF
```
Then use Playwright to screenshot the page and read the iframe body text (look for "#### avis", where #### is the review count).
Record `google_maps.rating` and `google_maps.review_count`.
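A sketch of the Playwright step, assuming Playwright and its Chromium build are installed in `/root/venv`. `read_iframe_text` loads the local file, screenshots it, and returns the text inside `#map` (the iframe `id` in the HTML above); `parse_maps_text` pulls the rating and the `#### avis` count with heuristic regexes (also accepting `reviews` for an English-locale embed). Always check the parsed numbers against the screenshot.

```python
import re
from pathlib import Path

# Heuristics: a rating like "4,6"/"4.6", and the number right before "avis"/"reviews".
# The lookbehinds stop "4,6 123 avis" from being read as "6 123".
RATING_RE = re.compile(r"(?<![\d.,])([1-5][.,]\d)(?!\d)")
COUNT_RE = re.compile(
    r"(?<![\d.,])(\d+(?:[\s\u00a0\u202f.,]\d{3})*)\s*(?:avis|reviews?)\b"
)


def parse_maps_text(text):
    """Return (rating, review_count); either is None if not found."""
    rating = RATING_RE.search(text)
    count = COUNT_RE.search(text)
    return (
        float(rating.group(1).replace(",", ".")) if rating else None,
        int(re.sub(r"\D", "", count.group(1))) if count else None,
    )


def read_iframe_text(html_path, screenshot_path, wait_ms=4000):
    """Open the local iframe page, screenshot it, return the iframe body text."""
    from playwright.sync_api import sync_playwright  # lazy: only needed here

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page(viewport={"width": 820, "height": 620})
        page.goto(Path(html_path).resolve().as_uri())
        page.wait_for_timeout(wait_ms)  # give the embed time to render
        page.screenshot(path=screenshot_path)
        text = page.frame_locator("#map").locator("body").inner_text()
        browser.close()
    return text
```

Example: `parse_maps_text(read_iframe_text("/root/tmp/ho36_maps_iframe.html", "data/ho36/screenshots/google_maps_embed__YYYYMMDD.png"))`.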
### 3) Hostelworld (must get listing URL + rating + review count)
Try the Lyon directory page first (works better than search):
- https://www.hostelworld.com/hostels/europe/france/lyon/
Find HO36, then capture its listing page.
Extract:
- listing URL
- rating
- review count
- optional: HO36's position on the Lyon directory page (also record the sort order in effect, since position depends on it)
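If the rating is hard to read off the screenshot, many listing pages also embed schema.org JSON-LD with an `aggregateRating` object. The sketch below extracts it from a saved HTML capture. ASSUMPTION: Hostelworld (or any other site) may not include JSON-LD, in which case it returns an empty list. Record the scale too (`best_rating`): Hostelworld displays ratings out of 10, not 5.

```python
import json
from html.parser import HTMLParser


class _JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_ld = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and (dict(attrs).get("type") or "").lower() == "application/ld+json":
            self._in_ld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_ld = False

    def handle_data(self, data):
        if self._in_ld:
            self.blocks[-1] += data  # script text may arrive in chunks


def _walk(node):
    """Yield every dict nested anywhere in a parsed JSON value."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from _walk(value)
    elif isinstance(node, list):
        for item in node:
            yield from _walk(item)


def aggregate_ratings(html):
    """Return [{name, rating, review_count, best_rating}] found in JSON-LD."""
    parser = _JsonLdCollector()
    parser.feed(html)
    found = []
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed blocks
        for obj in _walk(data):
            agg = obj.get("aggregateRating")
            if isinstance(agg, dict):
                found.append({
                    "name": obj.get("name"),
                    "rating": agg.get("ratingValue"),
                    "review_count": agg.get("reviewCount") or agg.get("ratingCount"),
                    "best_rating": agg.get("bestRating"),
                })
    return found
```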
### 4) Booking.com (attempt; likely blocked)
Attempt:
- Find the HO36 listing URL (Session C may provide it).
- Capture the listing URL directly.
If you hit a WAF or challenge page, record `status: blocked` and keep going.
Do **not** try to bypass.
### 5) TripAdvisor (attempt; often blocked)
Attempt the listing page. If you hit DataDome or a captcha, record `status: blocked`.
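For Booking.com and TripAdvisor, a quick heuristic can flag a capture that is probably a challenge page before you log it. ASSUMPTION: the text markers and the DataDome captcha host (`captcha-delivery.com`) are illustrative heuristics, not an official list; the sketch ignores `<script>`/`<style>` content so that bot-protection scripts loaded on normal pages do not count. A non-empty result only suggests a block: confirm it on the screenshot, then record `status: blocked`.

```python
from html.parser import HTMLParser

# ASSUMPTION: heuristic markers, matched case-insensitively.
TEXT_MARKERS = ("captcha", "access denied", "are you a robot",
                "verify you are human", "request blocked")
HOST_MARKERS = ("captcha-delivery.com",)  # DataDome challenge iframe host


class _VisibleText(HTMLParser):
    """Gather visible text (skipping <script>/<style>) and iframe src URLs."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.frames = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "iframe":
            self.frames.append(dict(attrs).get("src") or "")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.text.append(data)


def block_reasons(html):
    """Return the markers that matched; an empty list means no block detected."""
    parser = _VisibleText()
    parser.feed(html)
    text = " ".join(parser.text).lower()
    reasons = [m for m in TEXT_MARKERS if m in text]
    reasons += [h for h in HOST_MARKERS if any(h in src.lower() for src in parser.frames)]
    return reasons
```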
### 6) Socials
Instagram + Facebook (and TikTok if present):
- If the UI is blocked, try to read follower/like counts from the public Open Graph (`og:*`) meta tags.
- Record `status: blocked` if a login wall prevents reading anything meaningful.
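A sketch for reading Open Graph tags from a saved social-profile capture. ASSUMPTION: Instagram's public `og:description` has typically read like `1,234 Followers, 56 Following, 78 Posts - See Instagram photos ...`, and some Facebook pages show text like `1,234 likes · 10 talking about this`. The wording and K/M abbreviations can change, so store the raw `og:description` string as evidence alongside any parsed number.

```python
import re
from html.parser import HTMLParser


class _MetaCollector(HTMLParser):
    """Map each <meta property|name=...> to its content (first occurrence wins)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            a = dict(attrs)
            key = a.get("property") or a.get("name")
            if key and a.get("content") is not None:
                self.meta.setdefault(key, a["content"])


def og_tags(html):
    """Return only the og:* meta tags."""
    parser = _MetaCollector()
    parser.feed(html)
    return {k: v for k, v in parser.meta.items() if k.startswith("og:")}


# ASSUMPTION: "<number>[K|M] Followers" / "<number> likes" phrasing.
_COUNT_RE = re.compile(r"(\d[\d,.]*)\s*([KkMm])?\s+(followers|likes)\b", re.IGNORECASE)


def parse_counts(description):
    """Return e.g. {"followers": 12500}; keys are the lowercased labels found."""
    out = {}
    for num, suffix, label in _COUNT_RE.findall(description or ""):
        if suffix:  # abbreviated: "12.5K" -> 12500
            factor = 1_000 if suffix.lower() == "k" else 1_000_000
            value = round(float(num.replace(",", ".")) * factor)
        else:  # full number with thousands separators: "1,234" -> 1234
            value = int(re.sub(r"[,.]", "", num))
        out.setdefault(label.lower(), value)
    return out
```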
## Git workflow (after each major source)
```bash
git add data/ho36/
git commit -m "HO36: completed <source>"
git push origin data/ho36
```
## Success criteria (minimum viable)
- Official site captured + booking engine identified
- Google Maps rating + review_count captured (embed iframe OK)
- Hostelworld listing URL + rating + review_count captured
- Booking.com + TripAdvisor attempted and marked ok/blocked with screenshots
Begin now. Post progress after each source.
````