# Rule Coverage Roadmap (CMOS / Bringhurst)

Goal: bring coverage close to **100% of rule-bearing sections** in the Chicago Manual of Style (18th ed.) and Bringhurst's *The Elements of Typographic Style*, without reproducing book text.

## Definition of "100%"

Coverage is **section-based**, not "full-book OCR." To keep multi-session work honest and queueable, we track two milestones:

- **Milestone A (mapped):** a section is no longer `uncovered` once at least one paraphrased rule exists with a valid pointer. In practice this corresponds to `status=partial` in the coverage map.
- **Milestone B (complete):** a section is only considered complete once it is marked `covered` or `out_of_scope`.
  - `covered` means a completion pass was done and the section’s rule-bearing guidance is believed to be fully represented as paraphrased rules (no verbatim text).
  - `out_of_scope` means the section is primarily narrative/historical/non-prescriptive and is explicitly excluded.

This avoids over-claiming: `partial` means “some rules exist”, not “we’re done.”

## Inputs (local only)

- `/root/docs/_uploads/The Chicago Manual of Style (18th ed OCR).pdf`
- `/root/docs/_uploads/Robert Bringhurst – The Elements of Typographic Style (OCR).pdf`

OCR is used only to locate sections and their page pointers. No verbatim text enters the repo.

## Artifacts (new)

- `spec/coverage/cmos18_sections.json`
- `spec/coverage/bring_sections.json`
- `spec/coverage/README.md`
- `spec/coverage/coverage_summary.json` (optional, derived)

Each entry should track: `section_id`, `title`, `pointer`, `status` (`uncovered|partial|covered|out_of_scope`), and `rule_ids[]`.
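
The entry shape above can be checked mechanically. The following is a minimal sketch, not the project's actual validator: `validate_entry`, the example section id, the pointer string, and the rule id are all hypothetical, and the check that a `partial` entry carries at least one rule is an assumption drawn from the Milestone A definition.

```python
from typing import Any

ALLOWED_STATUSES = {"uncovered", "partial", "covered", "out_of_scope"}
REQUIRED_FIELDS = ("section_id", "title", "pointer", "status", "rule_ids")


def validate_entry(entry: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the entry looks fine."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in entry]

    status = entry.get("status")
    if status is not None and status not in ALLOWED_STATUSES:
        problems.append(f"unknown status: {status!r}")

    rule_ids = entry.get("rule_ids")
    if rule_ids is not None and not isinstance(rule_ids, list):
        problems.append("rule_ids must be a list")

    # Assumption: Milestone A needs at least one rule, so `partial`
    # with an empty rule list is treated as inconsistent.
    if status == "partial" and isinstance(rule_ids, list) and not rule_ids:
        problems.append("status is partial but rule_ids is empty")

    return problems


# Hypothetical entry; every value here is a placeholder, not real map data.
example = {
    "section_id": "13.1",
    "title": "Placeholder title",
    "pointer": "placeholder-pointer",
    "status": "partial",
    "rule_ids": ["EXAMPLE.RULE.001"],
}

print(validate_entry(example))
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a coverage file in a single pass.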

## Phases

### Phase 0: Coverage Map Scaffold
- Create coverage files + schema conventions.
- Seed with top-level sections and a few subsections to validate the workflow.

### Phase 1: High-Impact Rules (CMOS)
- Categories: numbers, punctuation, citations, headings, links.
- Batch size: 150–250 rules per category.

### Phase 2: Structure + Layout (BRING + CMOS)
- Categories: typography, layout, widows/orphans, tables, figures.
- Translate to `typeset`/`postrender` where possible.

### Phase 3: Remaining Categories
- Categories: code, front/back matter, accessibility, i18n.

### Phase 4: Enforcement Uplift
- Convert top-impact manual rules to `lint`/`postrender` where feasible.
- Tighten QA gates without breaking deterministic behavior.

## Rules of engagement

- Paraphrase only; no verbatim text.
- If exact wording is required, use: `Exact wording required—refer to pointer`.
- Every rule must include a `source_refs` pointer.
- Prefer **numbers-only** audits when validating coverage maps against OCR.
- Run:
  - `PYTHONPATH=src python3 -m iftypeset.cli validate-spec --spec spec --build-indexes`
  - `./scripts/ci.sh`
- Checkpoint after meaningful batches: `./scripts/checkpoint.sh "extraction <category> batch <NNN>"`

## Spot checks (numbers only)

To reduce “missing section” drift, use the OCR audit helper to compare **section numbers only** against the coverage map within a scan range. The same run also sanity-checks each pointer's scan page against its printed page.

Example (CMOS18 chapter 13 slice):

```bash
python3 tools/coverage_ocr_audit.py \
  --pdf "/root/docs/_uploads/The Chicago Manual of Style (18th ed OCR).pdf" \
  --coverage spec/coverage/cmos18_sections.json \
  --chapter 13 \
  --scan-start 840 \
  --scan-end 870 \
  --out-json out/coverage-ocr-audit-ch13.json \
  --out-md out/coverage-ocr-audit-ch13.md
```

Interpretation:
- `OCR-only` section ids likely need new coverage entries.
- `Coverage-only` section ids often indicate pointer drift or OCR misreads (verify the scan page).
- `Printed page mismatches` are strong signals that a pointer is wrong or that the scan page is not the referenced content.
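
The first two categories amount to set differences between the section ids found by OCR and those in the coverage map. The sketch below illustrates that idea only; it is not the implementation of `tools/coverage_ocr_audit.py`. `compare_section_ids` is a hypothetical name, and it assumes ids are dotted numbers such as `13.10`.

```python
def section_sort_key(section_id: str) -> tuple[int, ...]:
    """Sort '13.10' after '13.2' by comparing numeric parts.

    Assumes ids contain only dot-separated integers.
    """
    return tuple(int(part) for part in section_id.split("."))


def compare_section_ids(ocr_ids, coverage_ids) -> dict[str, list[str]]:
    """Split ids into those seen only in OCR and those only in the coverage map."""
    ocr, coverage = set(ocr_ids), set(coverage_ids)
    return {
        "ocr_only": sorted(ocr - coverage, key=section_sort_key),
        "coverage_only": sorted(coverage - ocr, key=section_sort_key),
    }


# Hypothetical ids for illustration only.
result = compare_section_ids(["13.1", "13.2", "13.10"], ["13.2", "13.3"])
print(result)
```

Only section numbers are compared, so no book text is read into or written out of the comparison.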

## Progress metrics

- `% sections mapped` per book (non-`uncovered`).
- `% sections complete` per book (`covered` + `out_of_scope`).
- Implemented coverage of `must` rules (CI floor = 95%).
- Overall implemented coverage (CI floor = 80%).
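
These metrics reduce to simple counts over entry statuses and rule totals. The following is a rough sketch under assumptions: `section_metrics` and `meets_floor` are hypothetical helpers, not part of the existing CLI, and treating an empty rule set as passing the floor is a choice made here for illustration.

```python
from collections import Counter


def section_metrics(statuses) -> dict[str, float]:
    """Compute % mapped (non-uncovered) and % complete (covered + out_of_scope)."""
    counts = Counter(statuses)
    total = sum(counts.values())
    if total == 0:
        return {"mapped_pct": 0.0, "complete_pct": 0.0}
    mapped = total - counts["uncovered"]
    complete = counts["covered"] + counts["out_of_scope"]
    return {
        "mapped_pct": 100.0 * mapped / total,
        "complete_pct": 100.0 * complete / total,
    }


def meets_floor(implemented: int, total: int, floor_pct: float) -> bool:
    """Check an implemented-coverage percentage against a CI floor.

    Assumption: with no rules at all, the floor is considered met.
    """
    if total == 0:
        return True
    return 100.0 * implemented / total >= floor_pct


# Hypothetical inputs for illustration only.
print(section_metrics(["uncovered", "partial", "covered", "out_of_scope"]))
print(meets_floor(implemented=19, total=20, floor_pct=95.0))
```

Note that `partial` counts toward mapped but not toward complete, which matches the Milestone A / Milestone B split.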