iftypeset/STATUS.md
codex e92f1c3b93
iftypeset: document CI pipeline + Playwright + font contract
2026-01-08 18:10:41 +00:00


iftypeset status (pubstyle)

Updated: 2026-01-04
Project root: /root/ai-workspace/iftypeset/

What exists (working today)

  • Spec + schema: spec/schema/rule.schema.json, spec/manifest.yaml
  • Profiles: spec/profiles/*.yaml (web_pdf, print_pdf, dense_tech, memo, slide_deck, and webtypography_nc, which is non-commercial)
  • Post-render QA gates: spec/quality_gates.yaml
  • PDF-aware QA (heuristic): Poppler text extraction via pdftotext and pdftohtml -xml, producing page-aware incidents (widow_pdf, orphan_pdf, stranded_heading_pdf, overfull_line_pdf, overfull_bbox_pdf)
  • HTML QA (v0): catches bare URL/DOI/email wrapping, overfull tokens, code/table overflow (profile-aware thresholds)
  • Deterministic lint coverage: DOI references, ordinal suffix errors, and note-marker placement (alongside existing punctuation/link safety checks)
  • Rule registry (seeded): spec/rules/**.ndjson
  • House pointers (no quotes): spec/house/HOUSE_EDITORIAL_POINTERS.md
  • Indexes (derived): spec/indexes/*.json (rebuildable)
  • CLI: validate-spec, report, lint, render-html, render-pdf, qa, emit-css, doctor, bundle, run
    • render-pdf --engine <name> to force a specific renderer.
    • run --require-pdf to fail the pipeline if PDF rendering fails.
  • Renderer default: playwright is the default PDF engine (override it with render-pdf --engine <name>).
  • CLI introspection: profiles list, gates show, rules list/show
  • Config defaults: iftypeset.yaml (CLI flags override)
  • Run summary artifact: out/run-summary.json (CI-style pipeline output)
  • Multi-doc run mode: iftypeset run --input <dir|glob> writes per-doc outputs under out/docs/ and a top-level run index at out/report/index.html
  • Ephemeral extraction helpers: tools/ (Chicago OCR is grep-only, temp files deleted)
  • Forgejo integration note: forgejo/README.md
  • Fixtures + tests: fixtures/ and tests/
  • CI script: scripts/ci.sh (validate-spec, report, unit tests)
  • Docker runtime: Dockerfile + docs/15-docker.md (Playwright + Poppler + fonts)
  • “Don’t lose work” tooling: scripts/audit.sh, scripts/checkpoint.sh, scripts/state.sh, scripts/resume.sh + docs/07-session-resilience.md
  • Trust contract artifacts: out/trust-contract.md + out/trust-contract.json (generated by report)
  • Report index: out/report/index.html (artifact hub from report)
  • Doctor report: out/doctor.md + out/doctor.json (environment + determinism)
  • Bundle artifact: out/iftypeset-bundle.tar.gz + out/bundle-manifest.json (portable review pack)
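The overfull_bbox_pdf incident can be pictured as a geometry test on pdftohtml -xml output, which reports each page's width and each text run's left offset and width. Below is a minimal sketch that assumes both values use the same units and that a fixed right margin is acceptable. The function name, the Incident record, and margin_px are hypothetical, and this is not iftypeset's actual detector.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Incident:
    rule: str
    page: int
    text: str
    overflow_px: float


def overfull_bbox_incidents(xml_text: str, margin_px: float = 0.0) -> list[Incident]:
    """Flag text runs whose right edge passes the page width minus a margin.

    Hypothetical helper. Assumes Poppler's `pdftohtml -xml` layout: <page number width>
    elements containing <text left width> children in the same units.
    """
    root = ET.fromstring(xml_text)
    incidents: list[Incident] = []
    for page in root.iter("page"):
        page_no = int(page.get("number"))
        limit = float(page.get("width")) - margin_px
        for box in page.iter("text"):
            right = float(box.get("left")) + float(box.get("width"))
            if right > limit:
                incidents.append(
                    Incident(
                        rule="overfull_bbox_pdf",
                        page=page_no,
                        # itertext() also collects text nested in <b>/<i> children
                        text="".join(box.itertext()).strip(),
                        overflow_px=right - limit,
                    )
                )
    return incidents


# Trimmed sample in the pdftohtml -xml shape (values made up for illustration).
sample = """<pdf2xml>
<page number="1" position="absolute" top="0" left="0" height="1188" width="918">
<text top="100" left="108" width="702" height="17" font="0">Fits inside the text block</text>
<text top="120" left="108" width="840" height="17" font="0">https://example.org/a/very/long/unbroken/url</text>
</page>
</pdf2xml>"""

found = overfull_bbox_incidents(sample, margin_px=72)
```

With a 72-unit margin, only the unbroken URL run is flagged (its right edge at 948 exceeds the 846 limit by 102).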

Rule corpus snapshot

From out/coverage-report.json:

  • Total rules: 524
  • By category: editorial 45, abbreviations 27, accessibility 22, backmatter 18, citations 61, code 28, figures 22, frontmatter 20, headings 32, i18n 27, layout 46, links 21, numbers 62, punctuation 55, tables 23, typography 15
  • By enforcement: manual 379, typeset 62, lint 70, postrender 13
  • By severity: must 37, should 470, warn 17
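Each figure in the snapshot is a per-field count over the NDJSON registry. Below is a minimal sketch of that tally. It assumes that every record has top-level category, enforcement, and severity keys. Those field names are assumptions: the authoritative names live in spec/schema/rule.schema.json, which this sketch does not read. It illustrates the idea and is not the report command's code.

```python
import json
from collections import Counter
from pathlib import Path


def tally_rules(lines):
    """Count NDJSON rule records by category, enforcement, and severity."""
    by_category, by_enforcement, by_severity = Counter(), Counter(), Counter()
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        rule = json.loads(line)
        total += 1
        # Field names below are assumed, not taken from rule.schema.json.
        by_category[rule["category"]] += 1
        by_enforcement[rule["enforcement"]] += 1
        by_severity[rule["severity"]] += 1
    return {
        "total": total,
        "by_category": dict(by_category),
        "by_enforcement": dict(by_enforcement),
        "by_severity": dict(by_severity),
    }


def tally_rule_dir(root="spec/rules"):
    """Tally every *.ndjson file under root (hypothetical convenience wrapper)."""
    lines = []
    for path in sorted(Path(root).rglob("*.ndjson")):
        lines.extend(path.read_text(encoding="utf-8").splitlines())
    return tally_rules(lines)


# Made-up records for illustration.
records = [
    '{"id": "numbers.001", "category": "numbers", "enforcement": "lint", "severity": "should"}',
    "",
    '{"id": "links.004", "category": "links", "enforcement": "manual", "severity": "must"}',
    '{"id": "numbers.002", "category": "numbers", "enforcement": "typeset", "severity": "should"}',
]
summary = tally_rules(records)
```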

Current rule batches

  • spec/rules/abbreviations/v1_abbreviations_003.ndjson (27)
  • spec/rules/accessibility/v1_accessibility_001.ndjson (4)
  • spec/rules/accessibility/v1_accessibility_003.ndjson (18)
  • spec/rules/backmatter/v1_backmatter_003.ndjson (18)
  • spec/rules/citations/v1_citations_001.ndjson (16)
  • spec/rules/citations/v1_citations_002.ndjson (45)
  • spec/rules/code/v1_code_001.ndjson (4)
  • spec/rules/code/v1_code_003.ndjson (24)
  • spec/rules/editorial/v1_editorial_001.ndjson (45)
  • spec/rules/figures/v1_figures_003.ndjson (22)
  • spec/rules/frontmatter/v1_frontmatter_003.ndjson (20)
  • spec/rules/headings/v1_headings_001.ndjson (12)
  • spec/rules/headings/v1_headings_002.ndjson (20)
  • spec/rules/i18n/v1_i18n_003.ndjson (27)
  • spec/rules/layout/v1_layout_001.ndjson (12)
  • spec/rules/layout/v1_layout_002.ndjson (30)
  • spec/rules/layout/v1_layout_003.ndjson (4)
  • spec/rules/links/v1_links_001.ndjson (5)
  • spec/rules/links/v1_links_003.ndjson (16)
  • spec/rules/numbers/v1_numbers_001.ndjson (12)
  • spec/rules/numbers/v1_numbers_002.ndjson (50)
  • spec/rules/punctuation/v1_punctuation_001.ndjson (15)
  • spec/rules/punctuation/v1_punctuation_002.ndjson (40)
  • spec/rules/tables/v1_tables_001.ndjson (8)
  • spec/rules/tables/v1_tables_002.ndjson (15)
  • spec/rules/typography/v1_typography_001.ndjson (8)
  • spec/rules/typography/v1_typography_002.ndjson (7)
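The batch sizes above should add up to the category totals in the snapshot. Below is a quick cross-check. The numbers are copied from this file rather than read from the repository.

```python
# Batch sizes per category, copied from the batch list in this file.
BATCHES = {
    "abbreviations": [27],
    "accessibility": [4, 18],
    "backmatter": [18],
    "citations": [16, 45],
    "code": [4, 24],
    "editorial": [45],
    "figures": [22],
    "frontmatter": [20],
    "headings": [12, 20],
    "i18n": [27],
    "layout": [12, 30, 4],
    "links": [5, 16],
    "numbers": [12, 50],
    "punctuation": [15, 40],
    "tables": [8, 15],
    "typography": [8, 7],
}

# Category totals from the rule corpus snapshot (out/coverage-report.json).
SNAPSHOT = {
    "editorial": 45, "abbreviations": 27, "accessibility": 22, "backmatter": 18,
    "citations": 61, "code": 28, "figures": 22, "frontmatter": 20,
    "headings": 32, "i18n": 27, "layout": 46, "links": 21,
    "numbers": 62, "punctuation": 55, "tables": 23, "typography": 15,
}


def mismatches(batches, snapshot):
    """Return {category: (sum_of_batches, snapshot_total)} wherever the two disagree."""
    summed = {cat: sum(sizes) for cat, sizes in batches.items()}
    cats = sorted(set(summed) | set(snapshot))
    return {
        c: (summed.get(c, 0), snapshot.get(c, 0))
        for c in cats
        if summed.get(c, 0) != snapshot.get(c, 0)
    }


print(mismatches(BATCHES, SNAPSHOT))  # prints {}
```

Both lists currently agree: every category matches, and both sum to 524.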

How to validate and inspect

  • Validate spec + rebuild indexes:
    • PYTHONPATH=src python3 -m iftypeset.cli validate-spec --spec spec --build-indexes
  • Lint:
    • PYTHONPATH=src python3 -m iftypeset.cli lint --input fixtures/sample.md --out out --profile web_pdf
  • Run end-to-end pipeline:
    • PYTHONPATH=src python3 -m iftypeset.cli run --input fixtures/sample.md --out out --profile web_pdf --degraded-ok
    • PYTHONPATH=src python3 -m iftypeset.cli run --input fixtures/sample.md --out out --profile web_pdf --require-pdf
  • Render HTML + CSS:
    • PYTHONPATH=src python3 -m iftypeset.cli render-html --input fixtures/sample.md --out out --profile web_pdf
  • Render PDF (requires an installed renderer):
    • PYTHONPATH=src python3 -m iftypeset.cli render-pdf --input fixtures/sample.md --out out --profile web_pdf
  • Run QA gates (HTML fallback if no PDF):
    • PYTHONPATH=src python3 -m iftypeset.cli qa --out out --profile web_pdf
  • Coverage report:
    • PYTHONPATH=src python3 -m iftypeset.cli report --spec spec --out out --build-indexes
  • Emit CSS for a profile:
    • PYTHONPATH=src python3 -m iftypeset.cli emit-css --spec spec --profile web_pdf --out out-css
  • Run unit tests:
    • python3 -m unittest discover -s tests -p 'test_*.py'
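The CI script covers validate-spec, report, and the unit tests. A Python-only driver may be more convenient than a shell script in some environments. Below is a minimal sketch of those three steps, with arguments copied from the commands above. run_steps and its dry_run flag are hypothetical. The sketch assumes the project root is the working directory, and it is not a copy of scripts/ci.sh.

```python
import os
import subprocess
import sys

# Argument lists copied from the commands above; the interpreter is prepended at run time.
STEPS = [
    ["-m", "iftypeset.cli", "validate-spec", "--spec", "spec", "--build-indexes"],
    ["-m", "iftypeset.cli", "report", "--spec", "spec", "--out", "out", "--build-indexes"],
    ["-m", "unittest", "discover", "-s", "tests", "-p", "test_*.py"],
]


def run_steps(steps=STEPS, dry_run=False):
    """Run each step with PYTHONPATH=src; return the argv lists that were (or would be) run."""
    planned = [[sys.executable, *step] for step in steps]
    if dry_run:
        return planned
    # Same effect as the PYTHONPATH=src prefix; also applied to the unit-test step (assumed harmless).
    env = dict(os.environ, PYTHONPATH="src")
    for argv in planned:
        # check=True raises subprocess.CalledProcessError at the first failing step.
        subprocess.run(argv, env=env, check=True)
    return planned
```

run_steps(dry_run=True) returns the planned argv lists without executing anything. Using sys.executable keeps every step on the same interpreter.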

Key constraints (don’t drift)

  • No bulk OCR/transcription of books into repo. Rules must be paraphrased and pointer-backed.
  • source_refs must be pointers, not quotes; include (scan pN) only as a single page hint.
  • Chicago extraction may use OCR ephemerally only to locate pointers; do not persist OCR output.
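The pointer-not-quote rule lends itself to a cheap pre-commit check. Below is a minimal sketch. It assumes source_refs is a list of strings and reads the page-hint rule as "at most one (scan pN) per reference". The pointer strings in the example are made up. The quotation-mark test is only a heuristic, so it will miss copied text that carries no quotation marks.

```python
import re

# Heuristic: a straight or curly double quote suggests copied text rather than a pointer.
QUOTE_CHARS = ('"', "\u201c", "\u201d")
SCAN_HINT = re.compile(r"\(scan p\d+\)")


def source_ref_problems(refs):
    """Return (index, message) pairs for source_refs that break the pointer rules."""
    problems = []
    for i, ref in enumerate(refs):
        if any(ch in ref for ch in QUOTE_CHARS):
            problems.append((i, "looks like a quotation; use a pointer instead"))
        hints = SCAN_HINT.findall(ref)
        if len(hints) > 1:
            problems.append((i, f"{len(hints)} scan hints; keep at most one"))
    return problems


# Hypothetical pointers for illustration only.
refs = [
    "CMOS 6.9 (scan p412)",
    'CMOS 6.9: "Use a serial comma"',
    "CMOS 6.9 (scan p412) (scan p413)",
]
problems = source_ref_problems(refs)
```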

Next work (highest leverage)

  • Extend deterministic lint coverage for citations (author-date patterns, bibliography normalization) and locale/i18n rules where feasible.
  • Deepen typeset/CSS profile mapping for tables, figures, and code to reduce manual cleanup before PDF export (and to reduce QA incidents).
  • Expand editorial automation beyond manual checklists (safe heuristics like headline ALL-CAPS checks and abbreviation definition hints).
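The ALL-CAPS headline check could begin as a line-based heuristic over Markdown ATX headings. Below is a minimal sketch. The function name and the min_letters threshold are assumptions, not iftypeset code; the threshold lets short acronyms such as "API" through. The sketch skips fenced code blocks but otherwise ignores CommonMark edge cases, such as leading indentation.

```python
import re

FENCE = "`" * 3
# ATX heading: 1-6 '#' marks, whitespace, text, optional closing '#' sequence.
HEADING = re.compile(r"^(#{1,6})\s+(.*?)\s*#*\s*$")


def all_caps_headings(markdown, min_letters=4):
    """Return (line_no, heading_text) for ATX headings written entirely in capitals."""
    hits = []
    in_fence = False
    for line_no, line in enumerate(markdown.splitlines(), start=1):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # lines inside code fences are not headings
            continue
        if in_fence:
            continue
        m = HEADING.match(line)
        if not m:
            continue
        text = m.group(2)
        letters = [ch for ch in text if ch.isalpha()]
        # Require a minimum letter count so short acronyms pass.
        if len(letters) >= min_letters and all(ch.isupper() for ch in letters):
            hits.append((line_no, text))
    return hits


doc = "\n".join([
    "# Overview",
    "## INSTALLATION GUIDE",
    "### API",
    FENCE,
    "# NOT A HEADING",
    FENCE,
    "## Next Steps ##",
])
print(all_caps_headings(doc))  # prints [(2, 'INSTALLATION GUIDE')]
```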