iftypeset tools (ephemeral extraction helpers)
These helpers exist to support pointer-based rule creation from purchased reference PDFs without storing or committing copyrighted text.
Rules of engagement (non-negotiable):
- Do not check in OCR output or full extracted book text.
- Use these tools to locate where guidance lives (section/page) and to inform paraphrased rule records.
- `source_refs` in `spec/rules/**.ndjson` must be pointers (e.g., `CMOS18 §6 p377`), not quotes.
- Keep any OCR artifacts ephemeral (prefer `/tmp`, delete images after OCR).
Tools in this folder may print short snippets to stdout for operator convenience. That is okay for local use; do not redirect that output into committed files.
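As an illustration of the pointer rule, here is a minimal sketch of a check that flags `source_refs` entries that do not look like pointers. It is not one of the tools in this folder. The record layout (a `source_refs` list of strings), the regular expression, and the function name `non_pointer_refs` are all assumptions based only on the `CMOS18 §6 p377` example above.

```python
import json
import re

# Assumed pointer shape: SOURCE §SECTION pPAGE, e.g. "CMOS18 §6 p377".
# The real pointer grammar may be looser or stricter than this.
POINTER_RE = re.compile(r"^[A-Z][A-Z0-9]* §\d+(?:\.\d+)* p\d+$")


def non_pointer_refs(ndjson_text):
    """Return (line_number, ref) pairs whose ref does not match POINTER_RE.

    Assumes each non-empty line is a JSON object with an optional
    "source_refs" list of strings (hypothetical schema).
    """
    bad = []
    for lineno, line in enumerate(ndjson_text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines between records
        record = json.loads(line)
        for ref in record.get("source_refs", []):
            if not isinstance(ref, str) or not POINTER_RE.match(ref):
                bad.append((lineno, ref))
    return bad
```

A check like this only catches entries that are obviously not pointers. It cannot tell whether other fields contain quoted text.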
coverage_summary.py
Deterministic summary of `spec/coverage/*.json` (sections + unique rule IDs).
Writes JSON + Markdown summaries to `out/coverage-summary.json` and `out/coverage-summary.md`.
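The sketch below shows one way such a deterministic summary could be built. It is not the actual `coverage_summary.py`. The coverage file layout (a top-level `sections` list whose items carry a `rule_ids` list) and the names `summarize` and `to_json` are hypothetical.

```python
import json
from pathlib import Path


def summarize(coverage_dir):
    """Count sections and collect unique rule IDs across coverage files.

    Assumed (hypothetical) file shape:
    {"sections": [{"section": "6.1", "rule_ids": ["R1", "R2"]}, ...]}
    """
    section_count = 0
    rule_ids = set()
    # Sort paths so the result does not depend on filesystem order.
    for path in sorted(Path(coverage_dir).glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        for section in data.get("sections", []):
            section_count += 1
            rule_ids.update(section.get("rule_ids", []))
    return {"sections": section_count, "unique_rule_ids": sorted(rule_ids)}


def to_json(summary):
    """Serialize with sorted keys so repeated runs produce identical bytes."""
    return json.dumps(summary, indent=2, sort_keys=True) + "\n"
```

Sorting the input paths, the rule IDs, and the output keys is what makes the summary reproducible from run to run.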
coverage_ocr_audit.py
Audit helper that compares OCR-detected section numbers (section numbers only, no extracted text) against a coverage map within a scan-page range, and performs pointer sanity checks (scan page + printed page).
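The comparison step can be sketched as a set difference. This is a sketch under assumptions, not the tool's actual logic: the input shapes (a mapping from scan page to detected section numbers, and a flat list of covered sections) and the name `audit_sections` are hypothetical, and the pointer sanity checks are left out because their rules are not described here.

```python
def audit_sections(ocr_sections, covered_sections, page_range):
    """Compare OCR-detected section numbers with a coverage map.

    ocr_sections: dict mapping scan page (int) to a set of section
        number strings detected on that page (assumed shape).
    covered_sections: iterable of section number strings from the
        coverage map (assumed shape).
    page_range: inclusive (first_scan_page, last_scan_page) tuple.
    """
    first, last = page_range
    detected = set()
    for page, sections in ocr_sections.items():
        if first <= page <= last:  # ignore pages outside the audited range
            detected.update(sections)
    covered = set(covered_sections)
    return {
        # Seen by OCR but absent from the coverage map.
        "missing_from_coverage": sorted(detected - covered),
        # Listed in the coverage map but not detected in this range.
        "not_seen_in_ocr": sorted(covered - detected),
    }
```

Only section numbers pass through this function, so nothing it returns contains book text.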