
# iftypeset status (pubstyle)

**Updated:** 2026-01-04
**Project root:** `/root/ai-workspace/iftypeset/`

## What exists (working today)

- **Spec + schema:** `spec/schema/rule.schema.json`, `spec/manifest.yaml`
- **Profiles:** `spec/profiles/*.yaml` (`web_pdf`, `print_pdf`, `dense_tech`, `memo`, `slide_deck`, and the non-commercial `webtypography_nc`)
- **Post-render QA gates:** `spec/quality_gates.yaml`
- **PDF-aware QA (heuristic):** Poppler text extraction via `pdftotext` + `pdftohtml -xml` with page-aware incidents (`widow_pdf`, `orphan_pdf`, `stranded_heading_pdf`, `overfull_line_pdf`, `overfull_bbox_pdf`)
- **HTML QA (v0):** catches bare URL/DOI/email wrapping, overfull tokens, code/table overflow (profile-aware thresholds)
- **Deterministic lint coverage:** DOI references, ordinal suffix errors, and note-marker placement (alongside existing punctuation/link safety checks)
- **Rule registry (seeded):** `spec/rules/**/*.ndjson`
- **House pointers (no quotes):** `spec/house/HOUSE_EDITORIAL_POINTERS.md`
- **Indexes (derived):** `spec/indexes/*.json` (rebuildable)
- **CLI:** `validate-spec`, `report`, `lint`, `render-html`, `render-pdf`, `qa`, `emit-css`, `doctor`, `bundle`, `run`
  - `render-pdf --engine <name>` to force a specific renderer.
  - `run --require-pdf` to fail the pipeline if PDF rendering fails.
- **Renderer default:** `playwright` is the default engine for PDF typesetting.
- **CLI introspection:** `profiles list`, `gates show`, `rules list/show`
- **Config defaults:** `iftypeset.yaml` (CLI flags override)
- **Run summary artifact:** `out/run-summary.json` (CI-style pipeline output)
- **Multi-doc run mode:** `iftypeset run --input <dir|glob>` writes per-doc outputs under `out/docs/` and a top-level run index at `out/report/index.html`
- **Ephemeral extraction helpers:** `tools/` (Chicago OCR is used only for grep-style lookups; temp files are deleted)
- **Forgejo integration note:** `forgejo/README.md`
- **Fixtures + tests:** `fixtures/` and `tests/`
- **CI script:** `scripts/ci.sh` (validate-spec, report, unit tests)
- **Docker runtime:** `Dockerfile` + `docs/15-docker.md` (Playwright + Poppler + fonts)
- **“Don’t lose work” tooling:** `scripts/audit.sh`, `scripts/checkpoint.sh`, `scripts/state.sh`, `scripts/resume.sh` + `docs/07-session-resilience.md`
- **Trust contract artifacts:** `out/trust-contract.md` + `out/trust-contract.json` (generated by `report`)
- **Report index:** `out/report/index.html` (artifact hub from `report`)
- **Doctor report:** `out/doctor.md` + `out/doctor.json` (environment + determinism)
- **Bundle artifact:** `out/iftypeset-bundle.tar.gz` + `out/bundle-manifest.json` (portable review pack)

## Rule corpus snapshot

From `out/coverage-report.json`:

- **Total rules:** 524
- **By category:** editorial 45, abbreviations 27, accessibility 22, backmatter 18, citations 61, code 28, figures 22, frontmatter 20, headings 32, i18n 27, layout 46, links 21, numbers 62, punctuation 55, tables 23, typography 15
- **By enforcement:** manual 379, typeset 62, lint 70, postrender 13
- **By severity:** must 37, should 470, warn 17
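
The three breakdowns above should each add up to the total. A minimal sketch of that consistency check follows, with the figures copied from this snapshot; the helper name `breakdown_mismatches` is hypothetical and not part of the iftypeset CLI.

```python
# Figures copied from the snapshot above; the helper itself is illustrative only.
TOTAL_RULES = 524

BY_CATEGORY = {
    "editorial": 45, "abbreviations": 27, "accessibility": 22, "backmatter": 18,
    "citations": 61, "code": 28, "figures": 22, "frontmatter": 20,
    "headings": 32, "i18n": 27, "layout": 46, "links": 21,
    "numbers": 62, "punctuation": 55, "tables": 23, "typography": 15,
}
BY_ENFORCEMENT = {"manual": 379, "typeset": 62, "lint": 70, "postrender": 13}
BY_SEVERITY = {"must": 37, "should": 470, "warn": 17}


def breakdown_mismatches(total, breakdowns):
    """Return {name: actual_sum} for every breakdown that does not sum to total."""
    mismatches = {}
    for name, counts in breakdowns.items():
        actual = sum(counts.values())
        if actual != total:
            mismatches[name] = actual
    return mismatches


snapshot = {
    "category": BY_CATEGORY,
    "enforcement": BY_ENFORCEMENT,
    "severity": BY_SEVERITY,
}
print(breakdown_mismatches(TOTAL_RULES, snapshot) or "all breakdowns are consistent")
```

An empty result means the snapshot is internally consistent; a non-empty one names the breakdown that drifted and the sum it actually produced.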

## Current rule batches

- `spec/rules/abbreviations/v1_abbreviations_003.ndjson` (27)
- `spec/rules/accessibility/v1_accessibility_001.ndjson` (4)
- `spec/rules/accessibility/v1_accessibility_003.ndjson` (18)
- `spec/rules/backmatter/v1_backmatter_003.ndjson` (18)
- `spec/rules/citations/v1_citations_001.ndjson` (16)
- `spec/rules/citations/v1_citations_002.ndjson` (45)
- `spec/rules/code/v1_code_001.ndjson` (4)
- `spec/rules/code/v1_code_003.ndjson` (24)
- `spec/rules/editorial/v1_editorial_001.ndjson` (45)
- `spec/rules/figures/v1_figures_003.ndjson` (22)
- `spec/rules/frontmatter/v1_frontmatter_003.ndjson` (20)
- `spec/rules/headings/v1_headings_001.ndjson` (12)
- `spec/rules/headings/v1_headings_002.ndjson` (20)
- `spec/rules/i18n/v1_i18n_003.ndjson` (27)
- `spec/rules/layout/v1_layout_001.ndjson` (12)
- `spec/rules/layout/v1_layout_002.ndjson` (30)
- `spec/rules/layout/v1_layout_003.ndjson` (4)
- `spec/rules/links/v1_links_001.ndjson` (5)
- `spec/rules/links/v1_links_003.ndjson` (16)
- `spec/rules/numbers/v1_numbers_001.ndjson` (12)
- `spec/rules/numbers/v1_numbers_002.ndjson` (50)
- `spec/rules/punctuation/v1_punctuation_001.ndjson` (15)
- `spec/rules/punctuation/v1_punctuation_002.ndjson` (40)
- `spec/rules/tables/v1_tables_001.ndjson` (8)
- `spec/rules/tables/v1_tables_002.ndjson` (15)
- `spec/rules/typography/v1_typography_001.ndjson` (8)
- `spec/rules/typography/v1_typography_002.ndjson` (7)
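
The per-batch counts above can be cross-checked against the files on disk. The sketch below assumes one rule per non-blank NDJSON line and treats the parent directory name as the category, which matches the layout listed here; `count_rules_by_category` is a hypothetical helper, not an iftypeset command.

```python
from collections import Counter
from pathlib import Path


def count_rules_by_category(rules_root):
    """Count non-blank NDJSON lines per category directory under rules_root.

    Assumes the layout <rules_root>/<category>/<batch>.ndjson with one rule
    record per non-blank line.
    """
    counts = Counter()
    for batch in sorted(Path(rules_root).glob("*/*.ndjson")):
        with batch.open(encoding="utf-8") as handle:
            counts[batch.parent.name] += sum(1 for line in handle if line.strip())
    return dict(counts)
```

Calling `count_rules_by_category("spec/rules")` from the project root should reproduce the "By category" figures in the snapshot if the assumptions hold.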

## How to validate and inspect

- Validate spec + rebuild indexes:
  - `PYTHONPATH=src python3 -m iftypeset.cli validate-spec --spec spec --build-indexes`
- Lint:
  - `PYTHONPATH=src python3 -m iftypeset.cli lint --input fixtures/sample.md --out out --profile web_pdf`
- Run end-to-end pipeline:
  - `PYTHONPATH=src python3 -m iftypeset.cli run --input fixtures/sample.md --out out --profile web_pdf --degraded-ok`
  - `PYTHONPATH=src python3 -m iftypeset.cli run --input fixtures/sample.md --out out --profile web_pdf --require-pdf`
- Render HTML + CSS:
  - `PYTHONPATH=src python3 -m iftypeset.cli render-html --input fixtures/sample.md --out out --profile web_pdf`
- Render PDF (if renderer installed):
  - `PYTHONPATH=src python3 -m iftypeset.cli render-pdf --input fixtures/sample.md --out out --profile web_pdf`
- Run QA gates (HTML fallback if no PDF):
  - `PYTHONPATH=src python3 -m iftypeset.cli qa --out out --profile web_pdf`
- Coverage report:
  - `PYTHONPATH=src python3 -m iftypeset.cli report --spec spec --out out --build-indexes`
- Emit CSS for a profile:
  - `PYTHONPATH=src python3 -m iftypeset.cli emit-css --spec spec --profile web_pdf --out out-css`
- Run unit tests:
  - `python3 -m unittest discover -s tests -p 'test_*.py'`
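
Several of the commands above can be chained from Python when a shell is inconvenient. This is a rough sketch, not a replacement for `scripts/ci.sh`: it uses only subcommands and flags already shown in this section, runs them with the current interpreter (`sys.executable`) instead of `python3`, and assumes the CLI signals failure with a non-zero exit code, which this document does not confirm. The names `cli_argv`, `cli_env`, and `run_checks` are hypothetical.

```python
import os
import subprocess
import sys


def cli_argv(subcommand, *args):
    """Build the argv for `python -m iftypeset.cli <subcommand> ...`."""
    return [sys.executable, "-m", "iftypeset.cli", subcommand, *args]


def cli_env(base=None):
    """Copy the environment and put `src` first on PYTHONPATH."""
    env = dict(os.environ if base is None else base)
    existing = env.get("PYTHONPATH")
    env["PYTHONPATH"] = "src" if not existing else "src" + os.pathsep + existing
    return env


CHECKS = [
    cli_argv("validate-spec", "--spec", "spec", "--build-indexes"),
    cli_argv("lint", "--input", "fixtures/sample.md", "--out", "out", "--profile", "web_pdf"),
    cli_argv("report", "--spec", "spec", "--out", "out", "--build-indexes"),
    [sys.executable, "-m", "unittest", "discover", "-s", "tests", "-p", "test_*.py"],
]


def run_checks(checks=CHECKS):
    """Run each command in order; return the first non-zero exit code, else 0."""
    for argv in checks:
        result = subprocess.run(argv, env=cli_env())
        if result.returncode != 0:
            return result.returncode
    return 0
```

Run `run_checks()` from the project root so the relative paths (`src`, `spec`, `fixtures`, `tests`) resolve.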

## Key constraints (don’t drift)

- **No bulk OCR/transcription** of books into repo. Rules must be paraphrased and pointer-backed.
- `source_refs` must be **pointers**, not quotes; include `(scan pN)` only as a single page hint.
- Chicago extraction may use OCR **ephemerally** only to locate pointers; do not persist OCR output.
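
The pointer constraint can be partly guarded by a heuristic check. The sketch below assumes each `source_refs` entry is a plain string, which this document does not specify, and the example references are made up for illustration. It flags quotation marks (a hint that text was quoted and not pointed to) and any `(scan ...)` hint that is repeated or is not a single page. `source_ref_problems` is a hypothetical helper; a real title containing quotation marks would be a false positive.

```python
import re

# Straight and curly double quotes plus guillemets: a rough signal of quoted text.
QUOTE_CHARS = re.compile('["\u201c\u201d\u00ab\u00bb]')
# Any parenthesised scan hint, and the single-page form the constraint allows.
SCAN_ANY = re.compile(r"\(scan [^)]*\)")
SCAN_SINGLE = re.compile(r"\(scan p\d+\)")


def source_ref_problems(ref):
    """Return a list of heuristic problems found in one source_refs entry."""
    problems = []
    if QUOTE_CHARS.search(ref):
        problems.append("contains quotation marks; may be a quote, not a pointer")
    hints = SCAN_ANY.findall(ref)
    if len(hints) > 1:
        problems.append("more than one scan hint")
    if any(not SCAN_SINGLE.fullmatch(hint) for hint in hints):
        problems.append("scan hint is not a single page")
    return problems


# Made-up references, for illustration only.
for ref in ["BOOK §1.2 (scan p56)", "BOOK §1.2 (scan p56-58)", 'BOOK says "use commas"']:
    print(ref, "->", source_ref_problems(ref) or "ok")
```

A check like this only narrows review; it cannot tell a paraphrase from a close copy, so the paraphrase requirement still needs a human pass.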

## Next work (highest leverage)

- Extend deterministic lint coverage for citations (author-date patterns, bibliography normalization) and locale/i18n rules where feasible.
- Deepen typeset/CSS profile mapping for tables, figures, and code to reduce manual cleanup before PDF export (and to reduce QA incidents).
- Expand editorial automation beyond manual checklists (safe heuristics like headline ALL-CAPS checks and abbreviation definition hints).