iftypeset/docs/09-project-status.md
codex e92f1c3b93
iftypeset: document CI pipeline + Playwright + font contract
2026-01-08 18:10:41 +00:00

iftypeset — Project Status (2026-01-03)

What This Project Is

iftypeset is a thin, deterministic typeset runtime for turning Markdown into:

  • Stable, shareable HTML (render-html)
  • A PDF (render-pdf)
  • Machine-readable quality reports (lint, qa, report)

It is paired with a machine-readable rule registry derived from:

  • Chicago Manual of Style (CMOS 18)
  • Bringhurst (Elements of Typographic Style)

Important: rule records are paraphrases only, each with a pointer reference (e.g., CMOS18 §6.2 p377 (scan p10)). This repo must not contain book text.

What Works Today

Verified on master via ./scripts/ci.sh (spec validate + report + unit tests):

  • End-to-end CLI: validate-spec, report, lint, render-html, render-pdf, qa, emit-css
  • Deterministic lint engine with safe rewrite mode (lint --fix --fix-mode rewrite)
  • Deterministic HTML rendering; PDF rendering works when an engine is available (Playwright is the default)
  • QA analyzer (HTML + PDF heuristics) with incident details:
    • long/bare URL/DOI/email wrap incidents
    • overfull token detection
    • table/code overflow incidents (profile-aware thresholds)
    • PDF-aware widow/orphan heuristics via Poppler text extraction (pdftotext -layout)
    • PDF-aware runt final page detection (short last page heuristics)
  • CI wiring for Forgejo: .forgejo/workflows/ci.yml
  • Session resilience tooling:
    • ./scripts/audit.sh prints a compact truth snapshot (git + coverage + checkpoints)
    • ./scripts/checkpoint.sh "note" creates a portable restore tarball recorded in docs/CHECKPOINTS.md
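
As an illustration of the kind of heuristic the runt final page check describes, here is a minimal sketch. It assumes the input is the text produced by pdftotext -layout, where pages are separated by form-feed characters. The function names and the min_lines threshold are hypothetical and are not taken from the iftypeset source; the real analyzer may use different signals and profile-aware thresholds.

```python
def split_pages(layout_text: str) -> list[str]:
    """Split pdftotext output into pages on the form-feed separator."""
    pages = layout_text.split("\f")
    # pdftotext ends each page with a form feed, so drop a trailing empty chunk.
    if pages and not pages[-1].strip():
        pages.pop()
    return pages


def count_text_lines(page: str) -> int:
    """Count the lines on a page that contain non-whitespace text."""
    return sum(1 for line in page.splitlines() if line.strip())


def is_runt_final_page(layout_text: str, min_lines: int = 5) -> bool:
    """Flag a multi-page document whose last page has fewer than min_lines text lines.

    min_lines is a hypothetical threshold chosen for illustration only.
    """
    pages = split_pages(layout_text)
    if len(pages) < 2:
        # A single short page is a short document, not a runt.
        return False
    return count_text_lines(pages[-1]) < min_lines
```

A check like this only sees extracted text, so it cannot tell a two-line final page from a final page holding a large figure with a two-line caption. That limitation is one reason such checks stay heuristic.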

Rule Registry Snapshot (Real Counts)

From out/coverage-report.json (generated by PYTHONPATH=src python3 -m iftypeset.cli report --spec spec --out out):

  • Total rules: 524
  • Enforcement split: manual 379, typeset 62, lint 70, postrender 13
  • Severity split: must 37, should 470, warn 17
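
The two splits above each partition the full registry, which can be confirmed with a small consistency check. This is a sketch using only the numbers quoted in this document; the helper names are hypothetical, and it does not read out/coverage-report.json because that file's schema is not described here.

```python
TOTAL_RULES = 524

# Counts copied from the snapshot above.
enforcement = {"manual": 379, "typeset": 62, "lint": 70, "postrender": 13}
severity = {"must": 37, "should": 470, "warn": 17}


def split_matches_total(split: dict[str, int], total: int) -> bool:
    """Return True when the per-bucket counts add up to the registry total."""
    return sum(split.values()) == total


def automated_share(split: dict[str, int], total: int) -> float:
    """Fraction of rules whose enforcement is anything other than manual."""
    return (total - split["manual"]) / total


print(split_matches_total(enforcement, TOTAL_RULES))
print(split_matches_total(severity, TOTAL_RULES))
print(f"{automated_share(enforcement, TOTAL_RULES):.1%}")
```

Both splits sum to 524, and 145 of the 524 rules (about 27.7%) have a non-manual enforcement mode.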

Category counts:

  • editorial 45
  • citations 61
  • numbers 62
  • punctuation 55
  • layout 46
  • headings 32
  • tables 23
  • links 21
  • i18n 27
  • abbreviations 27
  • code 28
  • accessibility 22
  • frontmatter 20
  • backmatter 18
  • figures 22
  • typography 15

Coverage Map Summary (Sections)

Generated by python3 tools/coverage_summary.py --coverage-dir spec/coverage --out-json out/coverage-summary.json --out-md out/coverage-summary.md:

  • BRING: 64 sections, 284 rules (all partial)
  • CMOS18: 176 sections, 550 rules (all partial)
  • Total unique rule_ids across coverage maps: 834

Interpretation:

  • The registry is intentionally larger than the enforcement surface.
  • Many rules remain manual_checklist=true by design until we have deterministic enforcement for them.

What This Is Not Yet

  • A full “publication-grade PDF QA” system. PDF-aware checks exist, but they are heuristic (based on text extraction) and limited in scope.
  • A complete automated implementation of Chicago/Bringhurst. The registry is pointer-backed; enforcement is incremental and explicit.

How To Run (Quick)

cd /root/ai-workspace/iftypeset

Validate + rebuild indexes:

PYTHONPATH=src python3 -m iftypeset.cli validate-spec --spec spec --build-indexes

Generate coverage report:

PYTHONPATH=src python3 -m iftypeset.cli report --spec spec --out out --build-indexes

Lint (and optionally autofix):

PYTHONPATH=src python3 -m iftypeset.cli lint --input fixtures/sample.md --out out --profile web_pdf
PYTHONPATH=src python3 -m iftypeset.cli lint --input fixtures/sample.md --out out --profile web_pdf --fix --fix-mode rewrite

Render HTML + PDF:

PYTHONPATH=src python3 -m iftypeset.cli render-html --input fixtures/sample.md --out out --profile web_pdf
PYTHONPATH=src python3 -m iftypeset.cli render-pdf  --input fixtures/sample.md --out out --profile web_pdf

Run QA:

PYTHONPATH=src python3 -m iftypeset.cli qa --out out --profile web_pdf

Sanity check:

./scripts/ci.sh
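
The commands above can also be chained from a small Python driver. The sketch below is an illustration and not part of the repository: build_pipeline and run_pipeline are hypothetical helper names, sys.executable stands in for python3, and the driver assumes each subcommand exits with a non-zero status on failure. The subcommands and flags are the ones listed in this section.

```python
import os
import subprocess
import sys


def build_pipeline(input_md="fixtures/sample.md", out_dir="out",
                   profile="web_pdf", spec_dir="spec"):
    """Return the CLI invocations from this section, in run order."""
    base = [sys.executable, "-m", "iftypeset.cli"]
    doc_args = ["--input", input_md, "--out", out_dir, "--profile", profile]
    return [
        base + ["validate-spec", "--spec", spec_dir, "--build-indexes"],
        base + ["report", "--spec", spec_dir, "--out", out_dir, "--build-indexes"],
        base + ["lint"] + doc_args,
        base + ["render-html"] + doc_args,
        base + ["render-pdf"] + doc_args,
        base + ["qa", "--out", out_dir, "--profile", profile],
    ]


def run_pipeline(commands, repo_root="."):
    """Run each command from repo_root with PYTHONPATH=src; stop on the first failure."""
    env = dict(os.environ, PYTHONPATH="src")
    for cmd in commands:
        # check=True raises CalledProcessError if a step exits non-zero (assumed behavior).
        subprocess.run(cmd, cwd=repo_root, env=env, check=True)
```

Calling run_pipeline(build_pipeline()) from the repository root would run the six steps in order. Keeping command construction separate from execution makes it possible to inspect or log the exact invocations without running them.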

Don't Lose Work (Session Resets)

Chat logs are not durable. The repo is.

  • Snapshot: ./scripts/audit.sh
  • Restore point: ./scripts/checkpoint.sh "short note"
  • Checkpoint index: docs/CHECKPOINTS.md

What Remains (Prioritized)

  1. Improve post-render QA beyond current heuristics

    • PDF-aware stranded headings / keep-with-next violations
    • more reliable overflow/clipping detection when a renderer is pinned
  2. Increase implemented rule coverage where it matters

    • citations normalization / author-date variants where feasible
    • i18n/locale-driven checks (without pretending perfect automation)
    • link/DOI wrapping policies (reduce broken PDFs)
  3. Forgejo integration

    • use iftypeset as the Forgejo typeset/export worker
    • emit QA artifacts as export attachments
  4. Continue adding rule batches

    • prioritize what breaks real documents: figures, references, complex tables, long code blocks