iftypeset (pubstyle) — project overview

This document is a narrative snapshot meant for handoffs and external reviewers.

Where we come from

Most Markdown→PDF pipelines optimize for “it renders” and stop there. In practice, teams who ship PDFs for real audiences (customers, regulators, courts, boards) care about the failure modes:

links that wrap into unreadable fragments
tables that overflow or clip
headings stranded at the bottom of a page
inconsistent numbering/citations
“looks fine on my machine” drift when renderers/fonts change

iftypeset starts from a simple premise: quality must be measurable and enforceable.

Where we are (today)

We have a working foundation that is intentionally boring:

A machine-readable rule registry (spec/rules/**.ndjson) that stores paraphrased rules and pointer refs back to primary sources (Chicago / Bringhurst), without reproducing book text.
A profile system (spec/profiles/*.yaml) that maps typographic intent into deterministic render tokens (page size, margins, font stacks, measure targets, hyphenation policy).
Post-render QA gates (spec/quality_gates.yaml) that define hard numeric thresholds for layout failures.
A working CLI surface (iftypeset.cli) that can validate the spec, emit coverage reports, lint Markdown, render HTML/CSS, render PDF (via available engines), and run QA.
report now generates a human-readable HTML index at out/report/index.html that links every artifact.

Current progress is tracked in STATUS.md and out/coverage-summary.md.
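
To make the "paraphrase + pointer" shape of the registry concrete, here is a minimal sketch of loading NDJSON rule records and rejecting any record without a pointer. The field names (id, severity, paraphrase, pointer), the severity vocabulary, and the placeholder pointer strings are assumptions for illustration; the actual schema lives under spec/ and may differ.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    rule_id: str
    severity: str    # assumed vocabulary: "must", "should", "manual"
    paraphrase: str  # our own wording, never verbatim book text
    pointer: str     # reference back to the primary source


def load_rules(ndjson_text: str) -> list[Rule]:
    """Parse one JSON object per line; blank lines are skipped."""
    rules = []
    for line_no, line in enumerate(ndjson_text.splitlines(), start=1):
        if not line.strip():
            continue
        record = json.loads(line)
        # Pointer discipline: a rule without a source pointer is rejected.
        if not record.get("pointer"):
            raise ValueError(f"line {line_no}: rule has no pointer")
        rules.append(
            Rule(
                rule_id=record["id"],
                severity=record["severity"],
                paraphrase=record["paraphrase"],
                pointer=record["pointer"],
            )
        )
    return rules


# Hypothetical records; the pointers are placeholders, not real citations.
SAMPLE = "\n".join([
    json.dumps({
        "id": "example.links.001",
        "severity": "must",
        "paraphrase": "Avoid breaking a URL across lines in a way that obscures it.",
        "pointer": "PLACEHOLDER-SOURCE section 0.0",
    }),
    "",
    json.dumps({
        "id": "example.headings.001",
        "severity": "should",
        "paraphrase": "Keep a heading with the text that follows it.",
        "pointer": "PLACEHOLDER-SOURCE section 0.1",
    }),
])

if __name__ == "__main__":
    for rule in load_rules(SAMPLE):
        print(rule.rule_id, rule.severity, rule.pointer)
```

Rejecting pointerless records at load time is one way to keep the registry auditable: every paraphrase stays traceable to its source without the source text ever entering the repo.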

What changed (2026-01-04)

Expanded the rule registry with new category batches (abbreviations, i18n, frontmatter, backmatter, figures) and the first editorial batch (house-style, pointer-backed).
Implemented deterministic lint enforcement for key MUST rules plus DOI references, ordinal suffix errors, and note-marker placement, with safe --fix modes.
Strengthened HTML-only QA and added PDF-aware QA (Poppler pdftotext + pdftohtml -xml) for page-aware heuristics; incidents include widow_pdf, orphan_pdf, stranded_heading_pdf, overfull_line_pdf, overfull_bbox_pdf, and runt_final_page_pdf.
Improved degraded-mode parsing heuristics to reduce false positives and added fixtures/tests to keep scripts/ci.sh green.
Added a human-friendly report index (out/report/index.html) for quick artifact navigation.
Added a pinned Docker runtime (Playwright + Poppler + fonts) for reproducible CI/local runs (docs/15-docker.md).
Deepened profile tokens and CSS handling for tables, figures, and code blocks (table layout + wrapping, figure sizing, code block padding/wrap) to reduce overflow risk.
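
As an illustration of what a deterministic lint with a safe fix mode can look like, the sketch below checks English ordinal suffixes. It is not the project's implementation; the function names and the regular expression are assumptions chosen for the example.

```python
import re

# A run of digits followed directly by an English ordinal suffix.
ORDINAL = re.compile(r"\b(\d+)(st|nd|rd|th)\b")


def expected_suffix(n: int) -> str:
    """Return the correct suffix for n (11, 12, 13 always take 'th')."""
    if 11 <= n % 100 <= 13:
        return "th"
    return {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")


def lint_ordinals(text: str) -> list[tuple[str, str]]:
    """List (found, expected) pairs for every wrong suffix, in order."""
    errors = []
    for match in ORDINAL.finditer(text):
        digits, suffix = match.group(1), match.group(2)
        correct = expected_suffix(int(digits))
        if suffix != correct:
            errors.append((match.group(0), digits + correct))
    return errors


def fix_ordinals(text: str) -> str:
    """Rewrite only the suffix; digits and surrounding text are untouched."""
    return ORDINAL.sub(
        lambda m: m.group(1) + expected_suffix(int(m.group(1))), text
    )


if __name__ == "__main__":
    sample = "the 2th page, 11th, 23rd, 112nd"
    print(lint_ordinals(sample))
    print(fix_ordinals(sample))
```

The fix is "safe" in the narrow sense that the same input always produces the same output and only the suffix characters change.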

Where we are going (v0.1 → v1)

v0.1: “Publishing CI” for a single Markdown input

The v0.1 goal is not to be the best renderer. It’s to be the most reliable pipeline:

deterministic HTML/CSS output for a chosen profile
PDF generation via adapters (Playwright default; others later)
QA reports that catch common layout failures and fail the build when thresholds are exceeded
an honest manual checklist for rules that cannot be automated

Definition of done lives in docs/01-demo-acceptance.md.
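
The "fail the build when thresholds are exceeded" behaviour can be sketched as a comparison of incident counts against per-kind limits. The incident names below come from the PDF-aware QA list above; the limit values, function names, and dictionary layout are assumptions and do not reflect the contents of spec/quality_gates.yaml.

```python
def evaluate_gates(
    incident_counts: dict[str, int],
    max_allowed: dict[str, int],
) -> list[str]:
    """Return one message per gate whose count exceeds its limit."""
    failures = []
    for kind, limit in sorted(max_allowed.items()):
        # An incident kind missing from the report counts as zero.
        count = incident_counts.get(kind, 0)
        if count > limit:
            failures.append(f"{kind}: {count} > {limit}")
    return failures


def exit_code(failures: list[str]) -> int:
    """Non-zero exit status makes a CI job fail."""
    return 1 if failures else 0


if __name__ == "__main__":
    # Hypothetical limits and counts, for illustration only.
    limits = {"widow_pdf": 0, "stranded_heading_pdf": 0, "overfull_line_pdf": 2}
    counts = {"widow_pdf": 2, "overfull_line_pdf": 1}
    failures = evaluate_gates(counts, limits)
    for message in failures:
        print("QA gate failed:", message)
    raise SystemExit(exit_code(failures))
```

Iterating over the limits in sorted order keeps the failure report stable between runs, which matters when reports are attached to build artifacts and compared later.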

v0.2+: broaden rule coverage + deepen QA gates

Once the pipeline is stable, we expand breadth and depth:

add more rule categories (figures, frontmatter/backmatter, abbreviations, i18n, accessibility)
increase post-render QA coverage (widows/orphans, keep constraints, overfull lines)
add more fixtures to harden degraded-mode handling

v1: “adapter-compatible” quality gates

Longer term, iftypeset should work with the major PDF rendering engines:

keep the “meaning” in profiles + QA, not in a single renderer
support swapping PDF engines without losing the ability to measure quality consistently

Renderer strategy is documented in docs/04-renderer-strategy.md.
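
One way to picture "meaning in profiles + QA, not in a single renderer" is a narrow adapter interface: the engine only turns HTML into PDF bytes, and measurement happens outside it. The names below (PdfAdapter, StubAdapter, run_pipeline) are hypothetical and are not taken from the codebase; the stub stands in for a real engine so the sketch runs without any external dependency.

```python
from typing import Callable, Protocol


class PdfAdapter(Protocol):
    """Anything that can turn HTML into PDF bytes."""

    name: str

    def render(self, html: str) -> bytes: ...


class StubAdapter:
    """Stand-in engine for the sketch; a real adapter would call a renderer."""

    def __init__(self, name: str) -> None:
        self.name = name

    def render(self, html: str) -> bytes:
        return b"%PDF-stub " + html.encode("utf-8")


def run_pipeline(
    html: str,
    adapter: PdfAdapter,
    measure: Callable[[bytes], dict[str, int]],
) -> dict[str, object]:
    """Render with any adapter, then apply the same measurement to the output."""
    pdf = adapter.render(html)
    return {"engine": adapter.name, "incidents": measure(pdf)}


if __name__ == "__main__":
    # The measurement is deliberately trivial here; it only shows that the
    # QA function does not need to know which engine produced the bytes.
    def byte_size(pdf: bytes) -> dict[str, int]:
        return {"size": len(pdf)}

    for engine in (StubAdapter("stub-a"), StubAdapter("stub-b")):
        print(run_pipeline("<p>hi</p>", engine, byte_size))
```

Because the measurement function receives only the rendered bytes, swapping the engine changes the "engine" field of the report but not how quality is measured.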

Traps to avoid (so we don’t drift)

Copying book text into the repo: we can use OCR to locate pointers, but we must not persist verbatim passages.
Pretending manual rules don’t exist: if it can’t be enforced, it must land in manual-checklist.md with a pointer.
Overfitting to one renderer: adapters are the point; pinning is allowed, lock-in is not.
Unmeasurable QA gates: if we can’t measure it reliably, it’s a “should” or “manual”, not a “must”.

Why this is valuable

The differentiator is not “Markdown to PDF”. It’s:

A) auditable rules (paraphrase + pointer discipline) and
B) enforceable layout QA (fail the build when it’s sloppy).

That’s what makes it compatible with governance workflows (hash/sign artifacts, attach QA reports, reproduce later) and usable in constrained CI environments (like Forgejo PDF export workers).
