# Session Resilience (avoid “lost work” when chats reset)

Codex/chat sessions can lose **conversation context** when a connection drops. The filesystem does not: the durable source of truth is the repo.

This project adds a few boring mechanisms to make resuming work deterministic.

## What to trust

- **Repo state**: `README.md`, `STATUS.md`, `docs/` are canonical.
- **CI**: `./scripts/ci.sh` is the fastest sanity check.
- **Artifacts**: `out/` contains the latest reports from CI runs.

## Quick resume checklist (30 seconds)

From the repo root:

- `./scripts/resume.sh` (recommended)
- `./scripts/audit.sh` (raw)
- `./scripts/state.sh` (writes `docs/SESSION_STATE.md` for pasteable state)
- `./scripts/ci.sh`

If everything looks sane, you’re back.

## Multi-session coordination (avoid Codex waiting)

When running multiple Codex sessions in parallel, coordinate through the repo:

- Task queue: `docs/13-task-board.md`
- Session context: `docs/SESSION_STATE.md`
- Durable restore points: `docs/CHECKPOINTS.md` + `out/checkpoints/`

Workflow:

1. Pick a task from `docs/13-task-board.md` and **claim it** (owner + working_set).
2. Stay inside your `working_set` to avoid conflicts.
3. Run `./scripts/ci.sh`.
4. Create a restore point: `./scripts/checkpoint.sh "task <id>: <note>"`
5. Mark the task done and paste the checkpoint reference.

If a change must block others (shared files), create a temporary lock file:

- `docs/LOCK_<area>.md`
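
Creating the lock file can be made race-free with the shell's `noclobber` option. The following is a minimal sketch, not part of the repo's scripts: the helper names `acquire_lock` and `release_lock` and the `owner:`/`created:` fields written into the file are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: helper names and lock-file contents are hypothetical.

# acquire_lock <area> <owner> [docs_dir]
# Succeeds only if docs/LOCK_<area>.md did not already exist.
acquire_lock() {
  local area="$1" owner="$2" docs_dir="${3:-docs}"
  local lock="${docs_dir}/LOCK_${area}.md"
  # With noclobber, '>' refuses to overwrite an existing file,
  # so two sessions in the same tree cannot both take the lock.
  if ( set -o noclobber
       printf 'owner: %s\ncreated: %s\n' \
         "$owner" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" > "$lock" ) 2>/dev/null; then
    echo "locked ${lock}"
  else
    echo "could not create ${lock} (already locked?)" >&2
    return 1
  fi
}

# release_lock <area> [docs_dir]
release_lock() {
  local area="$1" docs_dir="${2:-docs}"
  rm -f "${docs_dir}/LOCK_${area}.md"
}
```

For example, `acquire_lock renderer session-a` followed later by `release_lock renderer`. This only coordinates sessions that share one working tree; sessions on separate clones would still need the lock file committed and pushed before others can see it.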

## Coordination contract (short)

Use this when handing off across sessions:

1. **Claim the task** in `docs/13-task-board.md`.
2. **Declare the working set** in `docs/SESSION_STATE.md`.
3. **Lock shared areas** via `docs/LOCK_<area>.md` if needed.
4. **Checkpoint after changes** (`./scripts/checkpoint.sh "note"`).
5. **Close the task** and paste the checkpoint reference.

## “It looks rolled back” (most common disconnect trap)

If you resume a session and it looks like files “disappeared”, it’s almost always one of:

- You’re looking at a **clean git checkout** somewhere else (remote machine / new container) and the prior work was never committed.
- You’re looking at a **stale copy outside the repo** (see warning below).

What to do:

1. Run `./scripts/audit.sh` and inspect the `git status --porcelain` output.
2. Check `out/checkpoints/` for the latest `iftypeset_checkpoint_*.tar.gz`.
3. If you need to move the work to a new machine/session, copy the latest checkpoint tarball and extract it:

   - `tar -xzf out/checkpoints/iftypeset_checkpoint_<timestamp>.tar.gz -C <new_dir>`

Important: checkpoints snapshot the repo tree (including untracked files). They are the durable “chat-proof” handoff mechanism even when you don’t want to commit yet.
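
Steps 2 and 3 above can be sketched as two small helpers. This is an illustration, not one of the repo's scripts: `latest_checkpoint` and `restore_checkpoint` are hypothetical names, and "latest" is assumed to mean the newest file modification time, which may not survive every way of copying tarballs between machines.

```bash
#!/usr/bin/env bash
# Sketch only: function names are hypothetical.

# latest_checkpoint [dir]
# Prints the newest iftypeset_checkpoint_*.tar.gz by modification time.
latest_checkpoint() {
  local dir="${1:-out/checkpoints}"
  local newest="" f
  for f in "$dir"/iftypeset_checkpoint_*.tar.gz; do
    [ -e "$f" ] || continue            # glob matched nothing
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
      newest="$f"
    fi
  done
  [ -n "$newest" ] && printf '%s\n' "$newest"
}

# restore_checkpoint <tarball> <new_dir>
# Checks the archive is readable, then extracts it into <new_dir>.
restore_checkpoint() {
  local tarball="$1" dest="$2"
  tar -tzf "$tarball" >/dev/null || return 1
  mkdir -p "$dest"
  tar -xzf "$tarball" -C "$dest"
}
```

A typical call would be `restore_checkpoint "$(latest_checkpoint)" <new_dir>`, run from the repo root.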

## Create a checkpoint (2 minutes)

When you finish a meaningful chunk of work (new rule batches, QA changes, renderer changes), run:

- `./scripts/checkpoint.sh "what changed"`

This:

- runs CI and stores the CI JSON in `out/checkpoints/`
- creates a compressed snapshot tarball in `out/checkpoints/`
- appends a new entry to `docs/CHECKPOINTS.md` with the snapshot hash
- writes `docs/SESSION_STATE.md` so the snapshot includes a pasteable “resume context”

This gives you a **portable restore point** even if the chat transcript is gone.

## Best practice (recommended)

- Push to a remote early (Forgejo/GitHub). A remote is the best anti-loss mechanism.
- Treat `STATUS.md` as the “1-page truth” for what exists and what’s next.
- Don’t rely on chat logs for state; copy any critical decisions into `docs/`.
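
To see how much work a remote would not have yet, a quick check along these lines can help. It is a sketch using standard git commands; the function name `unsaved_work` and its output wording are assumptions, not something the repo's scripts provide.

```bash
#!/usr/bin/env bash
# Sketch only: unsaved_work is a hypothetical helper.

# unsaved_work [repo_dir]
# Reports uncommitted paths and unpushed commits.
# Returns 0 only when both counts are zero.
unsaved_work() {
  local repo="${1:-.}" dirty ahead
  # --porcelain prints one line per modified, staged, or untracked path.
  dirty="$(git -C "$repo" status --porcelain | wc -l | tr -d ' ')"
  echo "uncommitted paths: ${dirty}"
  if git -C "$repo" rev-parse --abbrev-ref '@{u}' >/dev/null 2>&1; then
    # Commits on HEAD that the upstream branch does not have.
    ahead="$(git -C "$repo" rev-list --count '@{u}..HEAD')"
    echo "unpushed commits: ${ahead}"
  else
    echo "unpushed commits: unknown (no upstream configured)"
    ahead=1   # treat a missing upstream as "not safe yet"
  fi
  [ "$dirty" -eq 0 ] && [ "$ahead" -eq 0 ]
}
```

A non-zero exit is a hint to commit and push, or to create a checkpoint, before ending the session.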

## Stale copies outside the repo (another “it looks rolled back” trap)

If you have multiple status documents on disk, prefer the one inside the repo:

- ✅ canonical: `ai-workspace/iftypeset/docs/09-project-status.md`
- ⚠️ non-canonical copies (e.g., `/root/docs/09-project-status.md`) can drift and misreport counts.
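
A byte-for-byte comparison is enough to tell whether a stray copy has drifted. The helper below is a sketch; `check_drift` is a hypothetical name, and the paths passed to it are whatever copies exist on your machine.

```bash
#!/usr/bin/env bash
# Sketch only: check_drift is a hypothetical helper.

# check_drift <canonical> <copy>
# Returns 0 if the copy is absent or identical, 1 if it differs,
# and 2 if the canonical file itself is missing.
check_drift() {
  local canonical="$1" copy="$2"
  if [ ! -f "$canonical" ]; then
    echo "missing canonical file: ${canonical}" >&2
    return 2
  fi
  if [ ! -f "$copy" ]; then
    echo "no stray copy at ${copy}"
    return 0
  fi
  if cmp -s "$canonical" "$copy"; then
    echo "in sync: ${copy}"
  else
    echo "DRIFT: ${copy} differs from ${canonical}"
    return 1
  fi
}
```

For example, `check_drift ai-workspace/iftypeset/docs/09-project-status.md /root/docs/09-project-status.md`. When drift is reported, trust the copy inside the repo.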