# Multi-renderer Strategy (HTML→PDF adapters)

We should not bet the product on a single PDF engine. `iftypeset` should be **renderer-agnostic**: the “meaning” is in the rule registry + profiles + QA gates; the PDF renderer is an interchangeable adapter.

## Principles

- **Determinism first**: the adapter must emit `render-log.json` with engine name + version + key options.
- **No-network capable**: engines must run with `--network=none`/offline mode in CI where possible.
- **Graceful degradation**: if no PDF engine exists, HTML artifacts + HTML-based QA must still run.
- **Capability disclosure**: if a gate can’t be measured with an engine, report it explicitly (don’t silently pass).

## Adapter interface (contract)

All PDF engines implement the same interface:

```python
from __future__ import annotations  # keeps `str | None` valid on older Pythons

from typing import Protocol


class PdfEngine(Protocol):
    name: str  # engine identifier, matching the `--engine` value
    def is_available(self) -> bool: ...
    def version(self) -> str: ...
    def render(self, *, html_path: str, css_path: str, assets_dir: str | None, out_pdf: str, options: dict) -> dict:
        """Returns a structured log: timings, warnings, engine opts, feature flags."""
```

The CLI should support:

- `--engine auto|playwright|weasyprint|prince|antenna|vivliostyle|wkhtmltopdf`
- `--engine-opts `

## “Majors” to target (pragmatic)

### Tier 1 (easy to run, common)

1) **Playwright (browser-backed PDF)**
   - Headless Chromium driven via Playwright (preferred)
   - Pros: ubiquitous, good HTML/CSS coverage, easy containerization.
   - Cons: paged-media features vary; footnotes/running headers are limited unless carefully built.
2) **WeasyPrint**
   - Pros: pure Python workflow, good paged-media support, easy CI story.
   - Cons: CSS compatibility differs; some complex layouts may need workarounds.

### Tier 2 (best print fidelity; commercial)

3) **PrinceXML**
   - Pros: excellent paged media, footnotes, running headers, print-quality output.
   - Cons: license cost; needs binary distribution policy.
4) **Antenna House Formatter**
   - Pros: top-tier print fidelity; standards publishing; robust PDF/A options.
   - Cons: license + operational complexity.

### Tier 3 (useful but limited)

5) **Vivliostyle / Paged.js**
   - Pros: strong paged-media model in the web ecosystem.
   - Cons: heavier runtime; often “HTML+JS render” rather than simple CLI.
6) **wkhtmltopdf**
   - Pros: simple deploy story in legacy environments.
   - Cons: outdated rendering model; limited CSS; not ideal for “high quality”.

## Capability matrix (what we care about)

We should encode an engine capability report (per run) for:

- paged media (margins, page size, running headers)
- hyphenation support + dictionaries
- font embedding/subsetting
- link handling (wrap/break strategy)
- footnotes (if we later support them)
- PDF/A options (later)

This capability map feeds QA:

- if engine can’t support a gate (e.g., true widow/orphan detection on PDF), QA should:
  - run the best available approximation, and
  - mark the gate as `skipped` with a reason, not `passed`.

## Determinism knobs (must record)

For every PDF render, write `out/render-log.json` including:

- engine name + version
- invocation args
- environment hints (OS, locale)
- “self-contained” mode on/off
- fonts policy + resolution (requested primary fonts, what fontconfig matched, and what fonts were embedded in the PDF)
- any warnings from the engine

If the engine is a browser:

- fix viewport
- disable external requests
- pin print settings (margins, background graphics, scaling)

## Security model

- Assume untrusted Markdown input (CI context). Mitigations:
  - never execute embedded JS during HTML render (or use a hardened renderer container)
  - disable network
  - restrict filesystem access (mount only `out/` and input)
- If using headless browsers, treat them as an attack surface; run in locked-down containers.

## Recommended v0.1 path (fastest)

1) Implement adapters for:
   - Playwright (auto-detect)
   - WeasyPrint (if installed)
2) Keep Prince/AH as optional adapters (stub + docs) until needed.
3) Use QA gates as the real value:
   - link wrap, code/table overflow, stranded headings (HTML and PDF when possible)

This keeps delivery fast while preserving “compatible with the majors”.

## Future: “Engine parity” testing

Once adapters exist, add an integration job that renders the same fixtures through 2 engines (when available) and compares:

- gate metrics (should be within thresholds)
- file size ranges
- major layout regressions (e.g., table clipping incidents)

We don’t need pixel-perfect equivalence; we need “quality gates still pass”.
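
The gate-metric part of that parity check could look roughly like the sketch below. Everything in it is an assumption made for illustration: the `GateResult` shape, the `compare_engines` helper, the gate names, and the tolerance values are hypothetical and not part of `iftypeset`. It treats a gate that either engine reported as `skipped` as not comparable rather than as a pass, in line with the capability-disclosure principle, and it does not cover the file-size or layout-regression comparisons.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    """One QA gate outcome for one engine (hypothetical shape)."""
    status: str                  # "passed", "failed", or "skipped"
    metric: float | None = None  # e.g. number of overflow incidents
    reason: str = ""             # why the gate was skipped, if it was


def compare_engines(
    a: dict[str, GateResult],
    b: dict[str, GateResult],
    tolerance: dict[str, float],
    default_tolerance: float = 0.0,
) -> list[str]:
    """Return parity problems between two engine reports; empty means parity holds."""
    problems: list[str] = []
    for gate in sorted(a.keys() | b.keys()):
        ra, rb = a.get(gate), b.get(gate)
        if ra is None or rb is None:
            problems.append(f"{gate}: missing from one engine's report")
            continue
        if "skipped" in (ra.status, rb.status):
            # Not measurable on at least one engine: disclosed, not compared.
            continue
        if "failed" in (ra.status, rb.status):
            problems.append(f"{gate}: {ra.status} vs {rb.status}")
            continue
        if ra.metric is None or rb.metric is None:
            continue  # both passed but no numeric metric to compare
        allowed = tolerance.get(gate, default_tolerance)
        if abs(ra.metric - rb.metric) > allowed:
            problems.append(
                f"{gate}: metrics differ by {abs(ra.metric - rb.metric)} (allowed {allowed})"
            )
    return problems


# Usage with made-up reports for two engines.
report_a = {
    "link_wrap": GateResult("passed", 0.0),
    "table_overflow": GateResult("passed", 1.0),
    "widows": GateResult("skipped", reason="not measurable with this engine"),
}
report_b = {
    "link_wrap": GateResult("passed", 0.0),
    "table_overflow": GateResult("passed", 4.0),
    "widows": GateResult("passed", 0.0),
}
for line in compare_engines(report_a, report_b, {"table_overflow": 2.0}):
    print(line)
```

Per-gate tolerances keep the comparison focused on "quality gates still pass" instead of exact equality, so two engines can disagree slightly on a metric without failing the job.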