iftypeset/docs/06-project-overview.md
codex e92f1c3b93
iftypeset: document CI pipeline + Playwright + font contract
2026-01-08 18:10:41 +00:00

iftypeset (pubstyle) — project overview

This document is a narrative snapshot meant for handoffs and external reviewers.

Where we come from

Most Markdown→PDF pipelines optimize for “it renders” and stop there. In practice, teams who ship PDFs for real audiences (customers, regulators, courts, boards) care about the failure modes:

  • links that wrap into unreadable fragments
  • tables that overflow or clip
  • headings stranded at the bottom of a page
  • inconsistent numbering/citations
  • “looks fine on my machine” drift when renderers/fonts change

iftypeset starts from a simple premise: quality must be measurable and enforceable.

Where we are (today)

We have a working foundation that is intentionally boring:

  • A machine-readable rule registry (spec/rules/**.ndjson) that stores paraphrased rules and pointer refs back to primary sources (Chicago / Bringhurst), without reproducing book text.
  • A profile system (spec/profiles/*.yaml) that maps typographic intent into deterministic render tokens (page size, margins, font stacks, measure targets, hyphenation policy).
  • Post-render QA gates (spec/quality_gates.yaml) that define hard numeric thresholds for layout failures.
  • A working CLI surface (iftypeset.cli) that can validate the spec, emit coverage reports, lint Markdown, render HTML/CSS, render PDF (via available engines), and run QA.
  • The report command now generates a human-readable HTML index at out/report/index.html that links all artifacts.
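
To make the registry idea concrete, here is a minimal sketch of loading NDJSON rule records and rejecting any record that lacks a pointer back to its source. The field names (`id`, `severity`, `text`, `source_ref`) and the sample records are hypothetical and are not taken from the actual schema under spec/rules/.

```python
import json

# Hypothetical required fields; the real schema in spec/rules/ may differ.
REQUIRED_FIELDS = ("id", "severity", "text", "source_ref")


def parse_rules(ndjson_text):
    """Parse NDJSON rule records, one JSON object per non-blank line."""
    rules = []
    for lineno, line in enumerate(ndjson_text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines between records
        record = json.loads(line)
        missing = [f for f in REQUIRED_FIELDS if f not in record]
        if missing:
            raise ValueError(f"line {lineno}: missing fields {missing}")
        rules.append(record)
    return rules


# Illustrative records only: paraphrased text plus a pointer, no book text.
SAMPLE = "\n".join([
    json.dumps({"id": "EX-001", "severity": "must",
                "text": "Paraphrased rule text goes here.",
                "source_ref": "hypothetical-pointer-1"}),
    "",
    json.dumps({"id": "EX-002", "severity": "should",
                "text": "Another paraphrased rule.",
                "source_ref": "hypothetical-pointer-2"}),
])

rules = parse_rules(SAMPLE)
print(len(rules), [r["id"] for r in rules])
```

Failing on a missing `source_ref` at load time is one way to keep the "paraphrase + pointer" discipline from eroding as the registry grows.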

Current progress is tracked in STATUS.md and out/coverage-summary.md.

What changed (2026-01-04)

  • Expanded the rule registry with new category batches (abbreviations, i18n, frontmatter, backmatter, figures) and the first editorial batch (house-style, pointer-backed).
  • Implemented deterministic lint enforcement for key MUST rules plus DOI references, ordinal suffix errors, and note-marker placement, with safe --fix modes.
  • Strengthened HTML-only QA and added PDF-aware QA (Poppler pdftotext + pdftohtml -xml) for page-aware heuristics; incidents include widow_pdf, orphan_pdf, stranded_heading_pdf, overfull_line_pdf, overfull_bbox_pdf, and runt_final_page_pdf.
  • Improved degraded-mode parsing heuristics to reduce false positives and added fixtures/tests to keep scripts/ci.sh green.
  • Added a human-friendly report index (out/report/index.html) for quick artifact navigation.
  • Added a pinned Docker runtime (Playwright + Poppler + fonts) for reproducible CI/local runs (docs/15-docker.md).
  • Deepened profile tokens and CSS handling for tables, figures, and code blocks (table layout + wrapping, figure sizing, code block padding/wrap) to reduce overflow risk.
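
As an illustration of what a deterministic lint with a safe fix mode can look like, the sketch below checks English ordinal suffixes. It is not the project's implementation; the function names are hypothetical, and the regex only handles lowercase suffixes attached directly to digits.

```python
import re

# Digits followed immediately by a lowercase ordinal suffix, e.g. "21st".
ORDINAL_RE = re.compile(r"\b(\d+)(st|nd|rd|th)\b")


def expected_suffix(n):
    """Return the correct English ordinal suffix for a non-negative integer."""
    if 11 <= n % 100 <= 13:
        return "th"  # 11th, 12th, 13th, 111th, ...
    return {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")


def find_ordinal_errors(text):
    """Return (found, corrected) pairs for every wrong ordinal suffix."""
    errors = []
    for m in ORDINAL_RE.finditer(text):
        number, suffix = m.group(1), m.group(2)
        want = expected_suffix(int(number))
        if suffix != want:
            errors.append((m.group(0), number + want))
    return errors


def fix_ordinals(text):
    """Rewrite every matched ordinal with its correct suffix."""
    return ORDINAL_RE.sub(
        lambda m: m.group(1) + expected_suffix(int(m.group(1))), text)


sample = "the 2th and 11st items, then the 3rd and 21st"
print(find_ordinal_errors(sample))
print(fix_ordinals(sample))
```

The fix is safe in the narrow sense that it only touches text the detector already matched, so running it twice yields the same result.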

Where we are going (v0.1 → v1)

v0.1: “Publishing CI” for a single Markdown input

The v0.1 goal is not to be the best renderer. It's to be the most reliable pipeline:

  • deterministic HTML/CSS output for a chosen profile
  • PDF generation via adapters (Playwright default; others later)
  • QA reports that catch common layout failures and fail the build when thresholds are exceeded
  • an honest manual checklist for rules that cannot be automated
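
A gate that fails the build can be as simple as comparing incident counts against limits and returning a non-zero exit code. The sketch below assumes a flat mapping from incident name to maximum allowed count; the limits are invented for illustration, and the real structure of spec/quality_gates.yaml may differ.

```python
def evaluate_gates(incident_counts, max_allowed):
    """Return (incident, count, limit) for every gate whose limit is exceeded."""
    failures = []
    for incident, limit in sorted(max_allowed.items()):
        count = incident_counts.get(incident, 0)  # unseen incidents count as 0
        if count > limit:
            failures.append((incident, count, limit))
    return failures


def exit_code(failures):
    """Non-zero when any gate failed, so CI marks the build as failed."""
    return 1 if failures else 0


# Hypothetical limits; incident names follow the PDF-aware QA list above.
GATES = {
    "widow_pdf": 0,
    "orphan_pdf": 0,
    "stranded_heading_pdf": 0,
    "overfull_line_pdf": 2,
}
measured = {"widow_pdf": 1, "overfull_line_pdf": 2}

failures = evaluate_gates(measured, GATES)
for incident, count, limit in failures:
    print(f"FAIL {incident}: {count} > {limit}")
print("exit code:", exit_code(failures))
```

With these sample numbers only `widow_pdf` fails: `overfull_line_pdf` sits exactly at its limit, and a limit is exceeded only when the count is strictly greater.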

Definition of done lives in docs/01-demo-acceptance.md.

v0.2+: broaden rule coverage + deepen QA gates

Once the pipeline is stable, we expand breadth and depth:

  • add more rule categories (figures, frontmatter/backmatter, abbreviations, i18n, accessibility)
  • increase post-render QA coverage (widows/orphans, keep constraints, overfull lines)
  • add more fixtures to harden degraded-mode handling

v1: “adapter-compatible” quality gates

Longer-term, iftypeset should work with the major rendering engines:

  • keep the “meaning” in profiles + QA, not in a single renderer
  • support swapping PDF engines without losing the ability to measure quality consistently

Renderer strategy is documented in docs/04-renderer-strategy.md.
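
One way to picture the adapter idea is a small interface that every engine implements, with QA applied to the output regardless of which engine produced it. This is a sketch under assumptions: `PdfEngine`, `StubEngine`, `run_pipeline`, and the `empty_output` incident are hypothetical names, and the stub returns placeholder bytes rather than a valid PDF.

```python
from typing import Callable, List, Protocol


class PdfEngine(Protocol):
    """Hypothetical adapter interface: any engine that turns HTML into PDF bytes."""

    name: str

    def render(self, html: str) -> bytes: ...


class StubEngine:
    """Stand-in for a real engine; it returns placeholder bytes, not a valid PDF."""

    def __init__(self, name: str) -> None:
        self.name = name

    def render(self, html: str) -> bytes:
        return b"%PDF-stub " + html.encode("utf-8")


def run_pipeline(engine: PdfEngine, html: str,
                 qa: Callable[[bytes], List[str]]) -> dict:
    """Render with the given engine, then apply the same QA callable to the result."""
    pdf = engine.render(html)
    return {"engine": engine.name, "incidents": qa(pdf)}


def size_gate(pdf: bytes) -> List[str]:
    # Toy check standing in for real layout QA: flag an empty render.
    return ["empty_output"] if len(pdf) == 0 else []


for engine in (StubEngine("engine-a"), StubEngine("engine-b")):
    print(run_pipeline(engine, "<p>hello</p>", size_gate))
```

Because the QA callable only sees bytes, swapping the engine does not change how quality is measured, which is the property the v1 goal describes.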

Traps to avoid (so we don't drift)

  • Copying book text into the repo: we can use OCR to locate pointers, but we must not persist verbatim passages.
  • Pretending manual rules don't exist: if it can't be enforced, it must land in manual-checklist.md with a pointer.
  • Overfitting to one renderer: adapters are the point; pinning is allowed, lock-in is not.
  • Unmeasurable QA gates: if we can't measure it reliably, it's a “should” or “manual”, not a “must”.

Why this is valuable

The differentiator is not “Markdown to PDF”. It's:

A) auditable rules (paraphrase + pointer discipline) and
B) enforceable layout QA (fail the build when it's sloppy).

That's what makes it compatible with governance workflows (hash/sign artifacts, attach QA reports, reproduce later) and usable in constrained CI environments (like Forgejo PDF export workers).
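
The hashing half of that governance workflow can be sketched with the standard library alone. The example below builds a stable manifest of SHA-256 digests for a set of artifacts; the artifact contents are placeholders, `out/example.pdf` is a hypothetical path, and signing is left out because it depends on the key infrastructure in use.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(artifacts):
    """Map artifact names to SHA-256 digests, in sorted order for stable output."""
    return {name: sha256_hex(artifacts[name]) for name in sorted(artifacts)}


# Placeholder contents; a real run would read the files from disk.
artifacts = {
    "out/report/index.html": b"<html></html>",
    "out/example.pdf": b"%PDF-placeholder",
}

manifest = build_manifest(artifacts)
print(json.dumps(manifest, indent=2, sort_keys=True))
```

Storing this manifest next to the QA report gives a later reviewer a way to confirm that the artifacts they hold are the ones that passed the gates.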