# Session Resilience (avoid “lost work” when chats reset)

Codex/chat sessions can lose **conversation context** when a connection drops. The filesystem does not: the durable source of truth is the repo. This project adds a few boring mechanisms to make resuming work deterministic.

## What to trust

- **Repo state**: `README.md`, `STATUS.md`, and `docs/` are canonical.
- **CI**: `./scripts/ci.sh` is the fastest sanity check.
- **Artifacts**: `out/` contains the latest reports from CI runs.

## Quick resume checklist (30 seconds)

From the repo root, run:

- `./scripts/resume.sh` (recommended)
- `./scripts/audit.sh` (raw)
- `./scripts/state.sh` (writes `docs/SESSION_STATE.md` for pasteable state)
- `./scripts/ci.sh`

If the audit (or resume) output and CI both look sane, you’re back.

## Multi-session coordination (avoid Codex waiting)

When running multiple Codex sessions in parallel, coordinate through the repo:

- Task queue: `docs/13-task-board.md`
- Session context: `docs/SESSION_STATE.md`
- Durable restore points: `docs/CHECKPOINTS.md` + `out/checkpoints/`

Workflow:

1. Pick a task from `docs/13-task-board.md` and **claim it** (owner + working_set).
2. Stay inside your `working_set` to avoid conflicts.
3. Run `./scripts/ci.sh`.
4. Create a restore point: `./scripts/checkpoint.sh "task : "`
5. Mark the task done and paste the checkpoint reference.

If a change must block others (shared files), create a temporary lock file:

- `docs/LOCK_.md`

## Coordination contract (short)

Use this when handing off across sessions:

1. **Claim the task** in `docs/13-task-board.md`.
2. **Declare the working set** in `docs/SESSION_STATE.md`.
3. **Lock shared areas** via `docs/LOCK_.md` if needed.
4. **Checkpoint after changes** (`./scripts/checkpoint.sh "note"`).
5. **Close the task** and paste the checkpoint reference.
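A lock file only helps if two sessions cannot both create it. The bash sketch below shows one way to claim a lock atomically, assuming all sessions share the same local filesystem. `acquire_lock` and `release_lock` are hypothetical helpers (not existing scripts in `./scripts/`), and the `owner:`/`since:` lines are an assumed format, not a project convention.

```bash
# Minimal sketch (hypothetical helpers, not part of ./scripts/):
# claim and release a docs lock file atomically.
acquire_lock() {
  local lock_path="$1"  # full path to the lock file, supplied by the caller
  local owner="$2"      # free-form session label (assumed format)

  # In a subshell, enable noclobber so ">" refuses to overwrite an existing
  # file; bash creates the file with O_EXCL, so only one session can win.
  if ( set -o noclobber
       printf 'owner: %s\nsince: %s\n' "$owner" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" > "$lock_path"
     ) 2>/dev/null; then
    return 0
  fi

  # Creation failed: most likely another session holds the lock. Show who.
  echo "lock not acquired: $lock_path" >&2
  [ -f "$lock_path" ] && cat "$lock_path" >&2
  return 1
}

release_lock() {
  # Removes the lock unconditionally; this sketch does not verify ownership.
  rm -f "$1"
}
```

Using `noclobber` instead of a check-then-create (`[ -e "$lock" ] || ...`) avoids the race where two sessions both see no lock and both create one.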
## “It looks rolled back” (most common disconnect trap)

If you resume a session and files seem to have “disappeared”, it’s almost always one of:

- You’re looking at a **clean git checkout** somewhere else (remote machine / new container) and the prior work was never committed.
- You’re looking at a **stale copy outside the repo** (see the warning below).

What to do:

1. Run `./scripts/audit.sh` and inspect the `git status --porcelain` output.
2. Check `out/checkpoints/` for the latest `iftypeset_checkpoint_*.tar.gz`.
3. If you need to move the work to a new machine/session, copy the latest checkpoint tarball and extract it:
   - `tar -xzf out/checkpoints/iftypeset_checkpoint_.tar.gz -C `

Important: checkpoints snapshot the repo tree (including untracked files). They are the durable “chat-proof” handoff mechanism even when you don’t want to commit yet.

## Create a checkpoint (2 minutes)

When you finish a meaningful chunk of work (new rule batches, QA changes, renderer changes), run:

- `./scripts/checkpoint.sh "what changed"`

The script:

- runs CI and stores the CI JSON in `out/checkpoints/`
- creates a compressed snapshot tarball in `out/checkpoints/`
- appends a new entry to `docs/CHECKPOINTS.md` with the snapshot hash
- writes `docs/SESSION_STATE.md` so the snapshot includes a pasteable “resume context”

This gives you a **portable restore point** even if the chat transcript is gone.

## Best practice (recommended)

- Push to a remote early (Forgejo/GitHub). A remote is the best anti-loss mechanism.
- Treat `STATUS.md` as the “1-page truth” for what exists and what’s next.
- Don’t rely on chat logs for state; copy any critical decisions into `docs/`.

## Stale copies outside the repo (the other “rolled back” trap)

If you have multiple status documents on disk, prefer the one inside the repo:

- ✅ canonical: `ai-workspace/iftypeset/docs/09-project-status.md`
- ⚠️ non-canonical copies (e.g., `/root/docs/09-project-status.md`) can drift and misreport counts.
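The restore steps from the “It looks rolled back” section can be scripted. This is a minimal bash sketch, assuming “latest” means most recently modified (the tarball names may also encode a timestamp, but this sketch does not rely on that) and that checkpoint filenames contain no whitespace. `restore_latest_checkpoint` is a hypothetical helper, not one of the project’s scripts, and the destination directory is yours to choose.

```bash
# Minimal sketch (hypothetical helper, not part of ./scripts/):
# extract the newest checkpoint tarball into a destination directory.
restore_latest_checkpoint() {
  local ckpt_dir="$1"  # e.g. out/checkpoints
  local dest="$2"      # extraction target; choose your own
  local latest

  # "Newest" = most recent modification time (ls -t sorts newest first).
  # Assumes checkpoint filenames contain no whitespace or newlines.
  latest=$(ls -t "$ckpt_dir"/iftypeset_checkpoint_*.tar.gz 2>/dev/null | head -n 1)
  if [ -z "$latest" ]; then
    echo "no iftypeset_checkpoint_*.tar.gz found in $ckpt_dir" >&2
    return 1
  fi

  # tar -C requires the target directory to exist.
  mkdir -p "$dest"
  tar -xzf "$latest" -C "$dest" || return 1
  # Print the tarball used so it can be recorded in the resume notes.
  printf '%s\n' "$latest"
}
```

For example, `restore_latest_checkpoint out/checkpoints /tmp/iftypeset-restore` (destination path illustrative) extracts the newest snapshot and prints which tarball it used.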