# Rule Ingestion SOP (Chicago / Bringhurst pointers, no quotes) This is the operator workflow for adding new rules to `iftypeset` without drifting into “we copied the book”. ## Non‑negotiables (repeat them until boring) - **Never store book text** (OCR output, excerpts, paragraphs) in the repo. - Rules are **paraphrases only** + **pointer refs**. - Chicago OCR is allowed **ephemerally** to locate pointers; temp files must be deleted. - If exact wording matters, the rule must say: `Exact wording required—refer to pointer`. ## What you’re producing You add a small batch file under `spec/rules//v1__.ndjson` where each line is a JSON rule record validated by `spec/schema/rule.schema.json`. ## Step-by-step workflow ### 1) Pick the category + severity honestly - Category: one of the `category_taxonomy` buckets in `spec/manifest.yaml`. - Severity: - `must`: blocks release unless profile overrides lower it - `should`: best practice; can be warn in degraded mode - `warn`: advisory If it can’t be automated, tag it `manual_checklist=true` and set `enforcement: manual`. ### 2) Locate the pointer (without storing text) #### Bringhurst (text-layer usable) - Use `tools/bringhurst_locate.py` (preferred) or `ripgrep` directly. - Capture only: - section identifier - book page number (if present) - scan page index (optional) #### Chicago (image scan) - Use `tools/chicago_ocr.py` **grep-only**. - Do not copy OCR output into files. - Use OCR to find: - the relevant section (§) - the printed page number - the scan page index Pointer format examples (note: these are *pointers*, not quotes): - `CMOS18 §6.1 p377 (scan p10)` - `BRING §2.3.2 p39 (scan p412)` Rule: `(scan pN)` is a **single 1-based PDF page index**, not a range. ### 3) Write the rule record (paraphrase only) Create a new NDJSON line. Keep `rule_text` ≤ 800 chars. Prefer short, enforceable statements. Minimal template: ```json { "id": "CMOS.PUNCTUATION.DASHES.EM_DASH", "title": "Use em dashes consistently", "source_refs": ["CMOS18 §X.Y pNNN (scan pMMM)"], "category": "punctuation", "severity": "should", "applies_to": "all", "rule_text": "Paraphrase of the rule (no quotes). If wording matters, say: Exact wording required—refer to pointer.", "rationale": "Why it matters (one line).", "enforcement": "lint", "autofix": "suggest", "autofix_notes": "What we can safely fix (short).", "tags": ["spacing", "manual_checklist=false"], "keywords": ["em dash", "dash", "punctuation"], "dependencies": [], "exceptions": [], "status": "draft" } ``` Guidelines: - **Do not** embed long examples. If you need examples, create them under `spec/examples/` and reference `examples_ref`. - Prefer splitting cross-layer concepts into two rules: - `lint` rule for source cleanliness - `postrender` rule for layout outcome - Use `dependencies` if rule ordering matters (e.g., “normalize quotes” before “ellipsis spacing”). ### 4) Tag manual rules so the checklist can be generated If a rule requires human judgment (e.g., “choose between two valid citation styles”), set: - `enforcement: manual` - `tags: ["manual_checklist=true"]` - `autofix: none` ### 5) Validate + rebuild indexes (every batch) Run: - `PYTHONPATH=src python3 -m iftypeset.cli validate-spec --spec spec --build-indexes` - `PYTHONPATH=src python3 -m iftypeset.cli report --spec spec --out out --build-indexes` Do not merge a batch if schema validation fails. ### 6) Add fixtures / examples (so rules stay enforced) For each batch, add at least: - 1–3 `spec/examples/*` entries that trigger the rule (small, targeted). - 1 fixture doc under `fixtures/` if the rule affects real documents. Rules without fixtures drift into “it exists but nothing enforces it.” ### 7) Promote from draft → active Only set `status: active` when: - the enforcement implementation exists (lint/typeset/postrender/manual) - at least one fixture/example covers it ## Common traps (avoid) - **Copying text into rule_text** (even “short” quotes). Don’t. - **Ranges in scan pages**: use a single `(scan pN)` hint. - **MUST rules that are unenforceable**: tag as manual checklist or downgrade. - **Overfitting to one document**: rules should generalize beyond a single sample. - **“Autofix rewrite” that changes meaning**: keep fixes deterministic and reversible. ## Review checklist (before shipping a batch) - [ ] No book text stored in repo (grep your changes). - [ ] All rules have valid `source_refs` pointers. - [ ] `rule_text` is paraphrase-only and short. - [ ] Manual rules are tagged correctly. - [ ] `validate-spec` + `report` pass. - [ ] At least one fixture/example added for the batch.