codex e92f1c3b93
iftypeset: document CI pipeline + Playwright + font contract
2026-01-08 18:10:41 +00:00

iftypeset tools (ephemeral extraction helpers)

These helpers exist to support pointer-based rule creation from purchased reference PDFs without storing or committing copyrighted text.

Rules of engagement (non-negotiable):

  • Do not check in OCR output or full extracted book text.
  • Use these tools to locate where guidance lives (section/page) and to inform paraphrased rule records.
  • source_refs in spec/rules/**.ndjson must be pointers (e.g., CMOS18 §6 p377), not quotes.
  • Keep any OCR artifacts ephemeral (prefer /tmp, delete images after OCR).
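A check along these lines could enforce the pointer rule before anything is committed. This is a minimal sketch under stated assumptions: the pointer grammar (a work/edition tag, an optional §section, an optional pPAGE) is only inferred from the CMOS18 §6 p377 example, source_refs is assumed to be a JSON array of strings, and is_pointer and bad_source_refs are hypothetical names, not functions shipped in this folder.

```python
import json
import re

# Hypothetical pointer grammar inferred from the example "CMOS18 §6 p377":
# a work/edition tag, then an optional "§section" and an optional "pPAGE".
POINTER_RE = re.compile(
    r"(?P<work>[A-Z][A-Za-z0-9]*)"
    r"(?: §(?P<section>\d+(?:\.\d+)*))?"
    r"(?: p(?P<page>\d+))?"
)


def is_pointer(ref: str) -> bool:
    """True if ref is a bare locator carrying at least a section or a page."""
    m = POINTER_RE.fullmatch(ref)
    return m is not None and bool(m["section"] or m["page"])


def bad_source_refs(ndjson_text: str) -> list[tuple[int, str]]:
    """Return (line number, ref) for each source_refs entry that is not a pointer."""
    problems = []
    for lineno, line in enumerate(ndjson_text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines between records
        record = json.loads(line)
        # Assumption: source_refs is a list of strings on each rule record.
        for ref in record.get("source_refs", []):
            if not is_pointer(ref):
                problems.append((lineno, ref))
    return problems
```

Anything that looks like prose (spaces followed by ordinary words, punctuation) fails the pattern and is reported with its NDJSON line number for manual review.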

Tools in this folder may print short snippets to stdout for operator convenience. That is okay for local use; do not redirect that output into committed files.
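The ephemeral-artifact and short-snippet rules can also be honored mechanically. The sketch below is illustrative only: with_scratch_dir and short_snippet are hypothetical helpers, and the 80-character cap is an assumed value, not a documented limit. tempfile.TemporaryDirectory creates the directory under the system temp location (typically /tmp on Linux, or wherever TMPDIR points) and deletes it, including any page images, when the block exits, even on error.

```python
import os
import tempfile


def with_scratch_dir(work):
    """Run work(path) in a temporary directory that is removed afterwards."""
    # TemporaryDirectory deletes the directory and its contents on exit,
    # so rendered page images and raw OCR text never outlive the call.
    with tempfile.TemporaryDirectory(prefix="iftypeset-ocr-") as path:
        return work(path)


def short_snippet(text: str, limit: int = 80) -> str:
    """Collapse whitespace and cap length so stdout shows a locator hint, not a passage."""
    flat = " ".join(text.split())
    return flat if len(flat) <= limit else flat[: limit - 1] + "…"
```

Passing the OCR step in as a callable keeps every intermediate file inside the scratch directory; only the return value (for example a section number and page) escapes.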

coverage_summary.py

Produces a deterministic summary of spec/coverage/*.json (sections and unique rule IDs) and writes it as JSON and Markdown to out/coverage-summary.json and out/coverage-summary.md.
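One way to get a deterministic summary is to sort everything that could vary between runs: file order, rule IDs, and JSON keys. This is a sketch, not the actual coverage_summary.py: it assumes each coverage file is a JSON object mapping a section identifier to a list of rule IDs, and summarize, to_markdown and write_summary are hypothetical names.

```python
import json
from pathlib import Path


def summarize(coverage_dir: Path) -> dict:
    """Aggregate coverage maps into per-file counts plus the set of unique rule IDs."""
    files = {}
    all_ids = set()
    # sorted() makes the result independent of filesystem listing order.
    for path in sorted(coverage_dir.glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        # Assumed schema: {"<section>": ["RULE-ID", ...], ...}
        ids = {rule_id for rule_ids in data.values() for rule_id in rule_ids}
        all_ids |= ids
        files[path.name] = {"sections": len(data), "rule_id_count": len(ids)}
    return {"files": files, "unique_rule_ids": sorted(all_ids)}


def to_markdown(summary: dict) -> str:
    """Render the summary as a small Markdown table."""
    lines = ["| file | sections | unique rule IDs |", "| --- | --- | --- |"]
    for name, stats in summary["files"].items():
        lines.append(f"| {name} | {stats['sections']} | {stats['rule_id_count']} |")
    lines.append("")
    lines.append(f"Total unique rule IDs: {len(summary['unique_rule_ids'])}")
    return "\n".join(lines) + "\n"


def write_summary(coverage_dir: Path, out_dir: Path) -> None:
    """Write coverage-summary.json and coverage-summary.md into out_dir."""
    summary = summarize(coverage_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # sort_keys plus a fixed indent keeps the JSON byte-for-byte stable across runs.
    (out_dir / "coverage-summary.json").write_text(
        json.dumps(summary, indent=2, sort_keys=True) + "\n", encoding="utf-8"
    )
    (out_dir / "coverage-summary.md").write_text(to_markdown(summary), encoding="utf-8")
```

Because nothing depends on dictionary iteration order from the filesystem or on timestamps, re-running on unchanged inputs should produce identical files, which keeps diffs of the summaries meaningful.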

coverage_ocr_audit.py

Audit helper that compares only the section numbers detected by OCR, within a given scan-page range, against a coverage map, and runs pointer sanity checks (scan page and printed page).
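The comparison can be pictured as two set differences over a page range, plus a page-arithmetic check per pointer. This sketch makes several assumptions that are not from coverage_ocr_audit.py itself: OCR text is held in memory as {scan_page: text}, the coverage map has the hypothetical shape {section: scan_page}, numbered sections appear as "6.19" at the start of a line, and printed page = scan page minus a fixed front-matter offset.

```python
import re

# Assumed shape of a numbered paragraph in OCR text, e.g. "6.19" at line start.
SECTION_RE = re.compile(r"^\s*(\d{1,2}\.\d{1,3})\b", re.MULTILINE)


def section_key(section):
    """Sort "6.9" before "6.10" by comparing the numeric parts."""
    return tuple(int(part) for part in section.split("."))


def audit_range(ocr_pages, coverage, first, last):
    """Compare sections seen on scan pages first..last with the coverage map.

    ocr_pages: {scan_page: ocr_text}, kept in memory only, never written out.
    coverage:  {section: scan_page}, a hypothetical coverage-map shape.
    """
    detected = set()
    for scan_page in range(first, last + 1):
        detected.update(SECTION_RE.findall(ocr_pages.get(scan_page, "")))
    # Only map entries pointing into the range are expected to be seen.
    expected = {s for s, page in coverage.items() if first <= page <= last}
    return {
        "unmapped": sorted(detected - set(coverage), key=section_key),
        "not_detected": sorted(expected - detected, key=section_key),
    }


def pointer_ok(ocr_pages, section, scan_page, printed_page, page_offset):
    """Check that the page arithmetic agrees and the section appears on that scan page."""
    if scan_page - page_offset != printed_page:
        return False
    return section in SECTION_RE.findall(ocr_pages.get(scan_page, ""))
```

Restricting "expected" to map entries inside the range avoids flagging sections that live elsewhere in the book; only section numbers and page numbers leave the function, never OCR text.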