Session Resilience (avoid “lost work” when chats reset)
Codex/chat sessions can lose conversation context when a connection drops. The filesystem does not: the durable source of truth is the repo.
This project adds a few boring mechanisms to make resuming work deterministic.
What to trust
- Repo state: README.md, STATUS.md, docs/ are canonical.
- CI: ./scripts/ci.sh is the fastest sanity check.
- Artifacts: out/ contains the latest reports from CI runs.
Quick resume checklist (30 seconds)
From the repo root:
- ./scripts/resume.sh (recommended)
- ./scripts/audit.sh (raw)
- ./scripts/state.sh (writes docs/SESSION_STATE.md for pasteable state)
- ./scripts/ci.sh
If everything looks sane, you’re back.
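The checklist can be chained so that the first failing step stops the run. This is a minimal sketch, assuming each script exits non-zero on failure (the text above does not say so); run_steps is a hypothetical helper, not part of the repo.

```shell
#!/bin/sh
# Hypothetical helper: run each command in order and stop at the first failure.
# Assumes every step signals failure through a non-zero exit status.
run_steps() {
    for step in "$@"; do
        echo "==> $step"
        "$step" || { echo "FAILED: $step" >&2; return 1; }
    done
    echo "all steps passed"
}

# Usage from the repo root, with the scripts named in the checklist:
# run_steps ./scripts/resume.sh ./scripts/audit.sh ./scripts/state.sh ./scripts/ci.sh
```

Stopping early keeps the first error at the bottom of the terminal instead of burying it under later output.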
Multi-session coordination (avoid Codex waiting)
When running multiple Codex sessions in parallel, coordinate through the repo:
- Task queue: docs/13-task-board.md
- Session context: docs/SESSION_STATE.md
- Durable restore points: docs/CHECKPOINTS.md + out/checkpoints/
Workflow:
- Pick a task from docs/13-task-board.md and claim it (owner + working_set).
- Stay inside your working_set to avoid conflicts.
- Run ./scripts/ci.sh.
- Create a restore point: ./scripts/checkpoint.sh "task <id>: <note>"
- Mark the task done and paste the checkpoint reference.
If a change must block others (shared files), create a temporary lock file:
docs/LOCK_<area>.md
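One way to create the lock file so that two sessions cannot both believe they own it is the shell's noclobber option. This is a sketch under assumptions: acquire_lock and release_lock are hypothetical helpers, the lock file's contents (owner and UTC timestamp) are invented for illustration, and the sessions are assumed to share one working tree.

```shell
#!/bin/sh
# Hypothetical helper: create docs/LOCK_<area>.md only if it does not exist yet.
# The file contents (owner, UTC timestamp) are an assumption, not a project format.
acquire_lock() {
    area=$1
    owner=$2
    lock="docs/LOCK_${area}.md"
    # 'set -C' (noclobber) makes '>' fail when the file already exists.
    # It runs in a subshell so the option does not leak into the caller.
    if ( set -C
         printf 'owner: %s\nsince: %s\n' "$owner" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" > "$lock"
       ) 2>/dev/null; then
        echo "locked $area"
    else
        echo "already locked: $lock" >&2
        return 1
    fi
}

# Hypothetical helper: remove the lock once the blocking change is done.
release_lock() {
    rm -f "docs/LOCK_$1.md"
}

# Usage from the repo root:
# acquire_lock renderer session-a && ./scripts/ci.sh; release_lock renderer
```

If sessions run on separate checkouts, the lock only becomes visible to others once it is committed and pushed.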
Coordination contract (short)
Use this when handing off across sessions:
- Claim the task in docs/13-task-board.md.
- Declare the working set in docs/SESSION_STATE.md.
- Lock shared areas via docs/LOCK_<area>.md if needed.
- Checkpoint after changes (./scripts/checkpoint.sh "note").
- Close the task and paste the checkpoint reference.
“It looks rolled back” (most common disconnect trap)
If you resume a session and it looks like files “disappeared”, it’s almost always one of:
- You’re looking at a clean git checkout somewhere else (remote machine / new container) and the prior work was never committed.
- You’re looking at a stale copy outside the repo (see warning below).
What to do:
- Run ./scripts/audit.sh and inspect the git status --porcelain output.
- Check out/checkpoints/ for the latest iftypeset_checkpoint_*.tar.gz.
- If you need to move the work to a new machine/session, copy the latest checkpoint tarball and extract it:
tar -xzf out/checkpoints/iftypeset_checkpoint_<timestamp>.tar.gz -C <new_dir>
Important: checkpoints snapshot the repo tree (including untracked files). They are the durable “chat-proof” handoff mechanism even when you don’t want to commit yet.
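Finding and extracting the newest tarball can be scripted. The sketch below assumes the <timestamp> part of the file name sorts lexicographically (for example 20240131T120000); the format is not specified here, so check it against real file names first. latest_checkpoint and restore_latest are hypothetical helpers.

```shell
#!/bin/sh
# Hypothetical helper: print the newest checkpoint tarball in a directory.
# Assumes the timestamp in the file name sorts lexicographically.
latest_checkpoint() {
    dir=${1:-out/checkpoints}
    latest=
    for f in "$dir"/iftypeset_checkpoint_*.tar.gz; do
        [ -e "$f" ] || continue   # the glob matched nothing
        latest=$f                 # globs expand in sorted order; keep the last
    done
    [ -n "$latest" ] || { echo "no checkpoint found in $dir" >&2; return 1; }
    printf '%s\n' "$latest"
}

# Hypothetical helper: extract the newest checkpoint into a target directory.
restore_latest() {
    src=$(latest_checkpoint "$1") || return 1
    mkdir -p "$2" && tar -xzf "$src" -C "$2"
}

# Usage:
# restore_latest out/checkpoints /path/to/new_dir
```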
Create a checkpoint (2 minutes)
When you finish a meaningful chunk of work (new rule batches, QA changes, renderer changes), run:
./scripts/checkpoint.sh "what changed"
This:
- runs CI and stores the CI JSON in out/checkpoints/
- creates a compressed snapshot tarball in out/checkpoints/
- appends a new entry to docs/CHECKPOINTS.md with the snapshot hash
- writes docs/SESSION_STATE.md so the snapshot includes a pasteable “resume context”
This gives you a portable restore point even if the chat transcript is gone.
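To make the snapshot and hash steps concrete, here is an illustrative sketch. It is not the project's checkpoint.sh: it skips the CI run and the SESSION_STATE.md step, and the timestamp format, the log-entry layout, and the use of sha256sum are all assumptions. snapshot is a hypothetical function name.

```shell
#!/bin/sh
# Illustrative only: NOT the project's checkpoint.sh.
# Assumptions: tar and sha256sum are available; the timestamp format and
# the layout of the log entry are invented for this example.
snapshot() {
    note=$1
    stamp=$(date -u +%Y%m%dT%H%M%SZ)
    tarball="out/checkpoints/iftypeset_checkpoint_${stamp}.tar.gz"
    mkdir -p out/checkpoints docs
    # Archive the working tree, untracked files included, but leave out
    # earlier snapshots so tarballs do not nest inside each other.
    tar -czf "$tarball" --exclude=./out/checkpoints . || return 1
    hash=$(sha256sum "$tarball" | cut -d' ' -f1)
    # Record the snapshot so it can be found without the chat transcript.
    printf '%s\n' "- $stamp $tarball sha256:$hash $note" >> docs/CHECKPOINTS.md
    printf '%s\n' "$tarball"
}

# Usage from the repo root:
# snapshot "what changed"
```

Recording the hash next to the file name lets a later session verify that a copied tarball is intact before extracting it.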
Best practice (recommended)
- Push to a remote early (Forgejo/GitHub). A remote is the best anti-loss mechanism.
- Treat STATUS.md as the “1-page truth” for what exists and what’s next.
- Don’t rely on chat logs for state; copy any critical decisions into docs/.
Common “it looks rolled back” trap
If you have multiple status documents on disk, prefer the one inside the repo:
- ✅ canonical: ai-workspace/iftypeset/docs/09-project-status.md
- ⚠️ non-canonical copies (e.g., /root/docs/09-project-status.md) can drift and misreport counts.