# Multi-renderer Strategy (HTML→PDF adapters)
We should not bet the product on a single PDF engine. `iftypeset` should be **renderer-agnostic**: the “meaning” is in the rule registry + profiles + QA gates; the PDF renderer is an interchangeable adapter.
## Principles
- **Determinism first**: the adapter must emit `render-log.json` with engine name + version + key options.
- **No-network capable**: engines must run with `--network=none`/offline mode in CI where possible.
- **Graceful degradation**: if no PDF engine exists, HTML artifacts + HTML-based QA must still run.
- **Capability disclosure**: if a gate can’t be measured with an engine, report it explicitly (don’t silently pass).
## Adapter interface (contract)
All PDF engines implement the same interface:
```python
from typing import Protocol

class PdfEngine(Protocol):
    name: str

    def is_available(self) -> bool: ...
    def version(self) -> str: ...
    def render(self, *, html_path: str, css_path: str, assets_dir: str | None, out_pdf: str, options: dict) -> dict:
        """Returns a structured log: timings, warnings, engine opts, feature flags."""
```
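As an illustration of what one adapter might look like, here is a minimal sketch for WeasyPrint. The class name, the returned log keys, and the lazy-import approach are assumptions made for this example; only `HTML(...).write_pdf(...)`, `CSS(filename=...)`, and `weasyprint.__version__` come from WeasyPrint itself.

```python
from __future__ import annotations

import importlib.util
import time


class WeasyPrintEngine:
    """Hypothetical adapter that matches the engine contract structurally."""

    name = "weasyprint"

    def is_available(self) -> bool:
        # find_spec returns None when the package is not installed.
        return importlib.util.find_spec("weasyprint") is not None

    def version(self) -> str:
        import weasyprint  # imported lazily so this module loads without it

        return weasyprint.__version__

    def render(self, *, html_path: str, css_path: str, assets_dir: str | None,
               out_pdf: str, options: dict) -> dict:
        from weasyprint import CSS, HTML

        started = time.perf_counter()
        # base_url lets relative asset references resolve against assets_dir.
        document = HTML(filename=html_path, base_url=assets_dir)
        document.write_pdf(out_pdf, stylesheets=[CSS(filename=css_path)])
        return {
            "engine": self.name,
            "version": self.version(),
            "options": dict(options),
            "seconds": round(time.perf_counter() - started, 3),
            "warnings": [],  # assumption: warning capture is left out of this sketch
        }
```

Keeping the import inside the methods means `is_available()` can be called safely on machines where the engine is missing, which is what graceful degradation needs.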
The CLI should support:
- `--engine auto|playwright|weasyprint|prince|antenna|vivliostyle|wkhtmltopdf`
- `--engine-opts `
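
One way `--engine auto` could resolve is sketched below. The preference order in `AUTO_ORDER` and the helper names are assumptions; the source does not define a priority. `--engine-opts` is left out because its argument format is not specified here.

```python
from __future__ import annotations

import argparse

# Assumed preference order for "auto"; adjust to project policy.
AUTO_ORDER = ["playwright", "weasyprint", "prince", "antenna", "vivliostyle", "wkhtmltopdf"]


def resolve_engine(requested: str, available: set[str]) -> str | None:
    """Return the engine to use, or None to fall back to HTML-only output."""
    if requested == "auto":
        # First available engine wins; None means no PDF engine was found.
        return next((name for name in AUTO_ORDER if name in available), None)
    if requested not in available:
        # An explicit request should fail loudly rather than silently downgrade.
        raise SystemExit(f"engine {requested!r} was requested but is not available")
    return requested


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="iftypeset")
    parser.add_argument("--engine", choices=["auto", *AUTO_ORDER], default="auto")
    return parser
```

Returning `None` for `auto` with no engines installed keeps the HTML artifacts and HTML-based QA path alive, while an explicitly named engine that is missing is treated as an error.
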
## “Majors” to target (pragmatic)
### Tier 1 (easy to run, common)
1) **Playwright (browser-backed PDF)**
- Renders with headless Chromium driven through Playwright (preferred)
- Pros: ubiquitous, good HTML/CSS coverage, easy containerization.
- Cons: paged-media features vary; footnotes/running headers are limited unless carefully built.
2) **WeasyPrint**
- Pros: pure Python workflow, good paged-media support, easy CI story.
- Cons: CSS support differs from browser engines; some complex layouts may need workarounds.
### Tier 2 (best print fidelity; commercial)
3) **PrinceXML**
- Pros: excellent paged media, footnotes, running headers, print-quality output.
- Cons: license cost; needs binary distribution policy.
4) **Antenna House Formatter**
- Pros: top-tier print fidelity; standards publishing; robust PDF/A options.
- Cons: license + operational complexity.
### Tier 3 (useful but limited)
5) **Vivliostyle / Paged.js**
- Pros: strong paged-media model in the web ecosystem.
- Cons: heavier runtime; often “HTML+JS render” rather than simple CLI.
6) **wkhtmltopdf**
- Pros: simple deploy story in legacy environments.
- Cons: outdated rendering model; limited CSS; not ideal for “high quality”.
## Capability matrix (what we care about)
We should encode an engine capability report (per run) for:
- paged media (margins, page size, running headers)
- hyphenation support + dictionaries
- font embedding/subsetting
- link handling (wrap/break strategy)
- footnotes (if we later support them)
- PDF/A options (later)
This capability map feeds QA:
- If an engine can’t support a gate (e.g., true widow/orphan detection on PDF), QA should:
  - run the best available approximation, and
  - mark the gate as `skipped` with a reason, not `passed`.
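
The skip-with-reason rule can be sketched as follows. The `GateResult` shape, the capability names, and the `evaluate_gate` helper are hypothetical and exist only to show the control flow.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class GateResult:
    gate: str
    status: str  # "passed", "failed", or "skipped"
    reason: str = ""
    approximation_ok: bool | None = None  # outcome of the fallback check, if one ran


def evaluate_gate(
    gate: str,
    required: list[str],
    capabilities: dict[str, bool],
    measure: Callable[[], bool],
    approximate: Callable[[], bool] | None = None,
) -> GateResult:
    """Run a gate only if the engine reports every capability it needs."""
    missing = sorted(c for c in required if not capabilities.get(c, False))
    if not missing:
        return GateResult(gate, "passed" if measure() else "failed")
    # The approximation is recorded for information but never upgrades the status.
    approx = approximate() if approximate is not None else None
    return GateResult(gate, "skipped", f"engine lacks: {', '.join(missing)}", approx)
```

Because the approximation result lives in its own field, a report can show it without ever letting an unmeasurable gate count as `passed`.
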
## Determinism knobs (must record)
For every PDF render, write `out/render-log.json` including:
- engine name + version
- invocation args
- environment hints (OS, locale)
- “self-contained” mode on/off
- fonts policy + resolution (requested primary fonts, what fontconfig matched, and what fonts were embedded in the PDF)
- any warnings from the engine
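
A minimal sketch of writing that log is shown below. The key names and the two helper functions are assumptions; only the list of recorded items comes from this section.

```python
import json
import locale
import platform
from pathlib import Path


def build_render_log(*, engine, version, args, self_contained, fonts, warnings):
    """Collect the determinism knobs for one render into a JSON-ready dict."""
    return {
        "engine": {"name": engine, "version": version},
        "args": list(args),
        "environment": {
            "os": platform.platform(),
            # setlocale with no second argument only queries the current setting.
            "locale": locale.setlocale(locale.LC_CTYPE),
        },
        "self_contained": self_contained,
        "fonts": fonts,  # e.g. requested, matched, and embedded font names
        "warnings": list(warnings),
    }


def write_render_log(out_dir, log):
    """Write the log as out_dir/render-log.json and return its path."""
    path = Path(out_dir) / "render-log.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    # sort_keys keeps the file byte-stable for identical inputs.
    path.write_text(json.dumps(log, indent=2, sort_keys=True) + "\n", encoding="utf-8")
    return path
```

Splitting the builder from the writer lets tests assert on the dict without touching the filesystem.
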
If the engine is a browser:
- fix viewport
- disable external requests
- pin print settings (margins, background graphics, scaling)
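
For a Playwright-driven browser, the three items above might be pinned roughly like this. The concrete values (viewport, page format, margins) and the helper names are assumptions chosen for illustration, and request blocking is done here with a route handler, which is one option among several.

```python
import re
from pathlib import Path

# Pinned settings; the values are illustrative, not project defaults.
VIEWPORT = {"width": 1280, "height": 720}
PDF_OPTIONS = {
    "format": "A4",
    "margin": {"top": "20mm", "right": "18mm", "bottom": "20mm", "left": "18mm"},
    "print_background": True,
    "scale": 1.0,
}


def is_local(url: str) -> bool:
    """True for URLs that never leave the machine."""
    return url.startswith(("file://", "data:", "about:"))


def render_with_chromium(html_path: str, out_pdf: str) -> None:
    from playwright.sync_api import sync_playwright  # lazy: optional dependency

    def guard(route):
        # Abort anything that would touch the network.
        if is_local(route.request.url):
            route.continue_()
        else:
            route.abort()

    with sync_playwright() as p:
        browser = p.chromium.launch()
        context = browser.new_context(viewport=VIEWPORT)
        context.route(re.compile(r".*"), guard)
        page = context.new_page()
        page.goto(Path(html_path).resolve().as_uri())
        page.pdf(path=out_pdf, **PDF_OPTIONS)
        browser.close()
```

The same `VIEWPORT` and `PDF_OPTIONS` dicts can be copied into the render log so that the pinned values are recorded alongside the output.
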
## Security model
- Assume untrusted Markdown input (CI context). Mitigations:
  - never execute embedded JS during HTML render (or use a hardened renderer container)
  - disable network
  - restrict filesystem access (mount only `out/` and input)
- If using headless browsers, treat them as an attack surface; run in locked-down containers.
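
A locked-down container invocation could be assembled along these lines. The helper name and the `/work/in` and `/work/out` mount points are hypothetical; the Docker flags themselves are standard, though a real browser image may need additional writable paths.

```python
from __future__ import annotations

from pathlib import Path


def sandboxed_command(image: str, input_dir: str, out_dir: str, argv: list[str]) -> list[str]:
    """Build a `docker run` command with no network and minimal filesystem access."""
    return [
        "docker", "run", "--rm",
        "--network=none",                        # disable network
        "--read-only",                           # root filesystem is immutable
        "--cap-drop=ALL",                        # drop Linux capabilities
        "--security-opt", "no-new-privileges",
        "--tmpfs", "/tmp",                       # scratch space for the renderer
        "-v", f"{Path(input_dir).resolve()}:/work/in:ro",  # input is read-only
        "-v", f"{Path(out_dir).resolve()}:/work/out",      # only out/ is writable
        image, *argv,
    ]
```

Returning a list rather than a shell string avoids quoting problems when the result is passed to `subprocess.run`.
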
## Recommended v0.1 path (fastest)
1) Implement adapters for:
   - Playwright (auto-detect)
   - WeasyPrint (if installed)
2) Keep Prince/AH as optional adapters (stub + docs) until needed.
3) Use QA gates as the real value:
   - link wrap, code/table overflow, stranded headings (HTML and PDF when possible)
This keeps delivery fast while staying compatible with the majors.
## Future: “Engine parity” testing
Once adapters exist, add an integration job that renders the same fixtures through 2 engines (when available) and compares:
- gate metrics (should be within thresholds)
- file size ranges
- major layout regressions (e.g., table clipping incidents)
We don’t need pixel-perfect equivalence; we need “quality gates still pass”.
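
The gate-metric comparison could be as small as the sketch below. The metric names, the threshold values, and the `compare_engines` helper are hypothetical; file-size and layout checks would follow the same pattern.

```python
def compare_engines(metrics_a: dict, metrics_b: dict, thresholds: dict) -> list:
    """Return human-readable parity violations between two engines' gate metrics."""
    problems = []
    for gate, limit in thresholds.items():
        if gate not in metrics_a or gate not in metrics_b:
            # A gate only one engine measured is reported, not ignored.
            problems.append(f"{gate}: not measured by both engines")
            continue
        delta = abs(metrics_a[gate] - metrics_b[gate])
        if delta > limit:
            problems.append(f"{gate}: differs by {delta} (allowed {limit})")
    return problems
```

For example, with a tolerance of 2 on a hypothetical `link_overflow` count, values of 1 and 4 from two engines would be flagged, while identical `table_clipping` counts would not.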