iftypeset/docs/04-renderer-strategy.md
codex e92f1c3b93
iftypeset: document CI pipeline + Playwright + font contract
2026-01-08 18:10:41 +00:00

Multi-renderer Strategy (HTML→PDF adapters)

We should not bet the product on a single PDF engine. iftypeset should be renderer-agnostic: the “meaning” is in the rule registry + profiles + QA gates; the PDF renderer is an interchangeable adapter.

Principles

  • Determinism first: the adapter must emit render-log.json with engine name + version + key options.
  • No-network capable: engines must run with networking disabled in CI where possible (a --network=none container or the engine's own offline mode).
  • Graceful degradation: if no PDF engine exists, HTML artifacts + HTML-based QA must still run.
  • Capability disclosure: if a gate can't be measured with an engine, report that explicitly (don't silently pass).

Adapter interface (contract)

All PDF engines implement the same interface:

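from __future__ import annotations  # lets `str | None` parse on older Python versions

from typing import Protocol

class PdfEngine(Protocol):
    name: str  # engine identifier, recorded in render-log.json

    def is_available(self) -> bool: ...
    def version(self) -> str: ...
    def render(self, *, html_path: str, css_path: str, assets_dir: str | None, out_pdf: str, options: dict) -> dict:
        """Return a structured log: timings, warnings, engine options, feature flags."""
        ...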

The CLI should support:

  • --engine auto|playwright|weasyprint|prince|antenna|vivliostyle|wkhtmltopdf
  • --engine-opts <json>
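
As a rough illustration of how `--engine auto` and `--engine-opts` could be resolved against the adapter contract, here is a minimal sketch. `StubEngine`, `select_engine`, and `parse_engine_opts` are hypothetical names, and "first available engine in preference order wins" is an assumed policy, not something this document specifies.

```python
from __future__ import annotations

import json
from typing import Protocol, Sequence


class PdfEngine(Protocol):
    """Trimmed copy of the adapter contract, only what selection needs."""

    name: str

    def is_available(self) -> bool: ...


class StubEngine:
    """Hypothetical stand-in adapter used only for this illustration."""

    def __init__(self, name: str, available: bool) -> None:
        self.name = name
        self._available = available

    def is_available(self) -> bool:
        return self._available


def select_engine(requested: str, engines: Sequence[PdfEngine]) -> PdfEngine | None:
    """Resolve --engine to an adapter; None means "HTML-only run"."""
    if requested == "auto":
        # Assumed policy: the first available engine in preference order wins.
        return next((e for e in engines if e.is_available()), None)
    by_name = {e.name: e for e in engines}
    if requested not in by_name:
        raise ValueError(f"unknown engine: {requested!r}")
    engine = by_name[requested]
    if not engine.is_available():
        # An explicit request must not silently fall back to another engine.
        raise RuntimeError(f"engine {requested!r} was requested but is not available")
    return engine


def parse_engine_opts(raw: str | None) -> dict:
    """Parse the --engine-opts JSON string; it must decode to an object."""
    if raw is None:
        return {}
    opts = json.loads(raw)
    if not isinstance(opts, dict):
        raise ValueError("--engine-opts must be a JSON object")
    return opts
```

Returning `None` in auto mode, rather than raising, is one way to honor the graceful-degradation principle: the caller can still produce HTML artifacts and run the HTML-based QA.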

“Majors” to target (pragmatic)

Tier 1 (easy to run, common)

  1. Playwright (browser-backed PDF)
  • Drives headless Chromium via Playwright (preferred).
  • Pros: ubiquitous, good HTML/CSS coverage, easy containerization.
  • Cons: paged-media features vary; footnotes/running headers are limited unless carefully built.
  2. WeasyPrint
  • Pros: pure Python workflow, good paged-media support, easy CI story.
  • Cons: CSS compatibility differs; some complex layouts may need workarounds.

Tier 2 (best print fidelity; commercial)

  1. PrinceXML
  • Pros: excellent paged media, footnotes, running headers, print-quality output.
  • Cons: license cost; needs binary distribution policy.
  2. Antenna House Formatter
  • Pros: top-tier print fidelity; standards publishing; robust PDF/A options.
  • Cons: license + operational complexity.

Tier 3 (useful but limited)

  1. Vivliostyle / Paged.js
  • Pros: strong paged-media model in the web ecosystem.
  • Cons: heavier runtime; often “HTML+JS render” rather than simple CLI.
  2. wkhtmltopdf
  • Pros: simple deploy story in legacy environments.
  • Cons: outdated rendering model; limited CSS; not ideal for “high quality”.

Capability matrix (what we care about)

We should encode an engine capability report (per run) for:

  • paged media (margins, page size, running headers)
  • hyphenation support + dictionaries
  • font embedding/subsetting
  • link handling (wrap/break strategy)
  • footnotes (if we later support them)
  • PDF/A options (later)

This capability map feeds QA:

  • If an engine can't support a gate (e.g., true widow/orphan detection on PDF), QA should:
    • run the best available approximation, and
    • mark the gate as skipped with a reason, not as passed.
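
The skip-with-reason rule could look roughly like the sketch below. `GateStatus`, `GateResult`, `evaluate_gate`, and the capability keys are all hypothetical names chosen for illustration; the only behavior taken from the text is that an unsupported gate is reported as skipped (with a reason and any approximation), never as passed.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class GateStatus(str, Enum):
    PASSED = "passed"
    FAILED = "failed"
    SKIPPED = "skipped"


@dataclass(frozen=True)
class GateResult:
    gate: str
    status: GateStatus
    reason: str | None = None
    approximate_value: float | None = None


def evaluate_gate(
    gate: str,
    required_capability: str,
    capabilities: dict[str, bool],
    measured: float | None,
    threshold: float,
    approximation: float | None = None,
) -> GateResult:
    """Evaluate one gate against the engine's reported capabilities."""
    if not capabilities.get(required_capability, False):
        # Unsupported: keep the approximation for the report, but never pass.
        return GateResult(
            gate,
            GateStatus.SKIPPED,
            reason=f"engine does not report capability {required_capability!r}",
            approximate_value=approximation,
        )
    if measured is None:
        raise ValueError(f"gate {gate!r} is supported but no measurement was provided")
    # Assumed convention: lower is better, and the threshold is inclusive.
    status = GateStatus.PASSED if measured <= threshold else GateStatus.FAILED
    return GateResult(gate, status)
```

Keeping `SKIPPED` as a distinct status means a report can count skipped gates separately, so a run on a limited engine cannot be mistaken for a clean pass.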

Determinism knobs (must record)

For every PDF render, write out/render-log.json including:

  • engine name + version
  • invocation args
  • environment hints (OS, locale)
  • “self-contained” mode on/off
  • fonts policy + resolution (requested primary fonts, what fontconfig matched, and what fonts were embedded in the PDF)
  • any warnings from the engine
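
A minimal sketch of writing that log follows. The field names and the shape of the `fonts` mapping are assumptions made for illustration; only the list of things to record comes from the text above.

```python
from __future__ import annotations

import json
import locale
import platform
from pathlib import Path


def _current_locale() -> str | None:
    """Best-effort locale name; some environments report an unparseable locale."""
    try:
        return locale.getlocale()[0]
    except ValueError:
        return None


def build_render_log(
    *,
    engine: str,
    version: str,
    args: list[str],
    self_contained: bool,
    fonts: dict,
    warnings: list[str],
) -> dict:
    """Collect the determinism knobs for one render (field names are hypothetical)."""
    return {
        "engine": {"name": engine, "version": version},
        "invocation_args": list(args),
        "environment": {"os": platform.platform(), "locale": _current_locale()},
        "self_contained": self_contained,
        # Assumed shape: {"requested": [...], "fontconfig_matched": [...], "embedded": [...]}
        "fonts": fonts,
        "warnings": list(warnings),
    }


def write_render_log(log: dict, out_dir: str) -> Path:
    """Write render-log.json with sorted keys so diffs between runs stay readable."""
    path = Path(out_dir) / "render-log.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(log, indent=2, sort_keys=True) + "\n", encoding="utf-8")
    return path
```

Sorting keys and using a fixed indent keeps the file byte-stable for identical inputs, which makes it easier to compare two runs.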

If the engine is a browser:

  • fix viewport
  • disable external requests
  • pin print settings (margins, background graphics, scaling)
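
For the Playwright adapter, these three knobs might be pinned as sketched below. The keyword names (`viewport`, `offline`, `java_script_enabled`, `print_background`, `scale`, `prefer_css_page_size`, `margin`, `format`) are Playwright for Python options; the concrete values, the function names, and the decision to disable JavaScript are assumptions, not settings this document prescribes.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any


def browser_print_settings(options: dict[str, Any] | None = None) -> dict[str, dict[str, Any]]:
    """Return pinned context and PDF settings (values here are illustrative defaults)."""
    opts = options or {}
    context = {
        "viewport": {"width": 1280, "height": 720},  # fixed viewport
        "offline": True,                              # no external requests
        "java_script_enabled": False,                 # assumed: static HTML only
    }
    pdf = {
        "format": opts.get("format", "A4"),
        "margin": opts.get(
            "margin",
            {"top": "20mm", "right": "20mm", "bottom": "20mm", "left": "20mm"},
        ),
        "print_background": True,
        "scale": 1.0,
        "prefer_css_page_size": True,  # let CSS @page size win when it is declared
    }
    return {"context": context, "pdf": pdf}


def render_with_playwright(html_path: str, out_pdf: str, settings: dict[str, dict[str, Any]]) -> None:
    """Render a local HTML file to PDF with the pinned settings."""
    # Imported lazily because Playwright is an optional dependency.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            context = browser.new_context(**settings["context"])
            page = context.new_page()
            page.goto(Path(html_path).resolve().as_uri())
            page.pdf(path=out_pdf, **settings["pdf"])
        finally:
            browser.close()
```

The dictionary returned by `browser_print_settings` can be copied into the render log as is, so the recorded options are the ones that were actually passed to the browser.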

Security model

  • Assume untrusted Markdown input (CI context). Mitigations:
    • never execute embedded JS during HTML render (or use a hardened renderer container)
    • disable network
    • restrict filesystem access (mount only out/ and input)
  • If using headless browsers, treat them as an attack surface; run in locked-down containers.
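
One possible shape for the locked-down container is sketched below, assuming Docker is the container runtime (this document does not name one). The image name, the mount points, and `sandboxed_render_argv` itself are hypothetical; the flags are standard `docker run` options.

```python
from __future__ import annotations

from pathlib import Path


def sandboxed_render_argv(
    image: str,
    input_dir: str,
    out_dir: str,
    command: list[str],
) -> list[str]:
    """Build a `docker run` argv with no network and only input/ and out/ mounted."""
    in_abs = Path(input_dir).resolve()
    out_abs = Path(out_dir).resolve()
    return [
        "docker", "run", "--rm",
        "--network=none",                     # disable network
        "--read-only",                        # root filesystem is read-only
        "--tmpfs=/tmp",                       # scratch space, often needed by browsers
        "--cap-drop=ALL",
        "--security-opt=no-new-privileges",
        f"--volume={in_abs}:/work/in:ro",     # input is mounted read-only
        f"--volume={out_abs}:/work/out",      # the only writable host path
        image,
        *command,
    ]
```

The resulting list can be passed to `subprocess.run(argv, check=True)`. Building it as a list rather than a shell string avoids quoting problems with untrusted file names.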

Implementation plan

  1. Implement adapters for:
    • Playwright (auto-detect)
    • WeasyPrint (if installed)
  2. Keep Prince/AH as optional adapters (stub + docs) until needed.
  3. Use QA gates as the real value:
    • link wrap, code/table overflow, stranded headings (HTML and PDF when possible)

This keeps delivery fast while preserving “compatible with the majors”.

Future: “Engine parity” testing

Once adapters exist, add an integration job that renders the same fixtures through 2 engines (when available) and compares:

  • gate metrics (should be within thresholds)
  • file size ranges
  • major layout regressions (e.g., table clipping incidents)

We don't need pixel-perfect equivalence; we need “quality gates still pass”.
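
A parity check along these lines could be as small as the sketch below. The metric names and tolerances are invented for illustration, and `compare_engine_runs` is a hypothetical helper; it only compares numeric metrics that both runs report.

```python
from __future__ import annotations


def compare_engine_runs(
    run_a: dict[str, float],
    run_b: dict[str, float],
    tolerances: dict[str, float],
) -> list[str]:
    """Return human-readable parity problems; an empty list means parity holds."""
    problems: list[str] = []
    for metric, tolerance in sorted(tolerances.items()):
        if metric not in run_a or metric not in run_b:
            # A missing metric is reported, not treated as equal.
            problems.append(f"{metric}: missing from one run")
            continue
        delta = abs(run_a[metric] - run_b[metric])
        if delta > tolerance:
            problems.append(f"{metric}: differs by {delta:g} (tolerance {tolerance:g})")
    return problems
```

For example, with a tolerance of 0 on a hypothetical `table_clipping_incidents` metric and 100 on `pdf_size_kib`, two runs that differ by 45 KiB and report no clipping produce an empty list, while a run with two clipping incidents is flagged.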