dannystocker/iftypeset

codex e92f1c3b93

iftypeset: document CI pipeline + Playwright + font contract

2026-01-08 18:10:41 +00:00

Multi-renderer Strategy (HTML→PDF adapters)

We should not bet the product on a single PDF engine. iftypeset should be renderer-agnostic: the “meaning” is in the rule registry + profiles + QA gates; the PDF renderer is an interchangeable adapter.

Principles

Determinism first: the adapter must emit render-log.json with engine name + version + key options.
No-network capable: engines must be able to run without network access in CI where possible (e.g., a container started with --network=none, or the engine's own offline mode).
Graceful degradation: if no PDF engine exists, HTML artifacts + HTML-based QA must still run.
Capability disclosure: if a gate can’t be measured with an engine, report it explicitly (don’t silently pass).

Adapter interface (contract)

All PDF engines implement the same interface:

from typing import Protocol


class PdfEngine(Protocol):
    """Contract that every PDF engine adapter implements."""

    name: str  # engine identifier, matching the --engine value

    def is_available(self) -> bool: ...
    def version(self) -> str: ...
    def render(self, *, html_path: str, css_path: str, assets_dir: str | None, out_pdf: str, options: dict) -> dict:
        """Returns a structured log: timings, warnings, engine opts, feature flags."""
        ...
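As an illustration of the contract, here is a minimal sketch of what a WeasyPrint adapter might look like. The class name `WeasyPrintEngine` and the keys of the returned log are assumptions, not part of iftypeset. The WeasyPrint calls (`HTML(filename=..., base_url=...)`, `CSS(filename=...)`, `write_pdf(..., stylesheets=...)`) are from its public Python API. The package is imported lazily so that `is_available()` works even when it is not installed.

```python
from __future__ import annotations

import importlib.util
import time
from typing import Protocol


class PdfEngine(Protocol):
    name: str

    def is_available(self) -> bool: ...
    def version(self) -> str: ...
    def render(self, *, html_path: str, css_path: str, assets_dir: str | None,
               out_pdf: str, options: dict) -> dict: ...


class WeasyPrintEngine:
    """Hypothetical adapter; names and log keys are illustrative only."""

    name = "weasyprint"

    def is_available(self) -> bool:
        # Probe for the package without importing it.
        return importlib.util.find_spec("weasyprint") is not None

    def version(self) -> str:
        import weasyprint  # lazy: only needed when the engine is actually used
        return weasyprint.__version__

    def render(self, *, html_path: str, css_path: str, assets_dir: str | None,
               out_pdf: str, options: dict) -> dict:
        import weasyprint

        started = time.perf_counter()
        # base_url tells WeasyPrint where to resolve relative asset references.
        document = weasyprint.HTML(filename=html_path, base_url=assets_dir)
        document.write_pdf(out_pdf, stylesheets=[weasyprint.CSS(filename=css_path)])
        return {
            "engine": self.name,
            "version": self.version(),
            "seconds": round(time.perf_counter() - started, 3),
            "options": options,  # recorded as given; this sketch does not interpret them
            "warnings": [],
        }


engine: PdfEngine = WeasyPrintEngine()  # structural typing: no inheritance needed
```

Because `PdfEngine` is a `Protocol`, adapters satisfy it by shape alone, which keeps commercial engines out of the import graph until someone selects them.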

The CLI should support:

--engine auto|playwright|weasyprint|prince|antenna|vivliostyle|wkhtmltopdf
--engine-opts <json>
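A minimal sketch of how these two flags could be parsed with `argparse`, and how `auto` might resolve to a concrete engine. The preference order used for `auto` (the order of the `--engine` choices) and the helper names are assumptions. Returning `None` stands for "no PDF engine, produce HTML artifacts only".

```python
from __future__ import annotations

import argparse
import json

# Assumed preference order for "auto"; the real order is a design decision.
ENGINES = ["playwright", "weasyprint", "prince", "antenna", "vivliostyle", "wkhtmltopdf"]


def parse_engine_opts(raw: str) -> dict:
    """Parse the --engine-opts value, accepting only a JSON object."""
    opts = json.loads(raw)  # malformed JSON raises ValueError, which argparse reports
    if not isinstance(opts, dict):
        raise argparse.ArgumentTypeError("--engine-opts must be a JSON object")
    return opts


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="iftypeset")
    parser.add_argument("--engine", choices=["auto", *ENGINES], default="auto")
    parser.add_argument("--engine-opts", type=parse_engine_opts, default={}, metavar="<json>")
    return parser


def resolve_engine(requested: str, available: set[str]) -> str | None:
    """Return the engine to use, or None when only HTML output is possible."""
    if requested != "auto":
        return requested if requested in available else None
    for name in ENGINES:
        if name in available:
            return name
    return None
```

In practice `available` would be computed by calling `is_available()` on each registered adapter.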

“Majors” to target (pragmatic)

Tier 1 (easy to run, common)

Playwright (browser-backed PDF)

via Playwright (preferred)

Pros: ubiquitous, good HTML/CSS coverage, easy containerization.
Cons: paged-media features vary; footnotes/running headers are limited unless carefully built.

WeasyPrint

Pros: pure Python workflow, good paged-media support, easy CI story.
Cons: CSS compatibility differs; some complex layouts may need workarounds.

Tier 2 (best print fidelity; commercial)

PrinceXML

Pros: excellent paged media, footnotes, running headers, print-quality output.
Cons: license cost; needs binary distribution policy.

Antenna House Formatter

Pros: top-tier print fidelity; standards publishing; robust PDF/A options.
Cons: license + operational complexity.

Tier 3 (useful but limited)

Vivliostyle / Paged.js

Pros: strong paged-media model in the web ecosystem.
Cons: heavier runtime; often “HTML+JS render” rather than simple CLI.

wkhtmltopdf

Pros: simple deploy story in legacy environments.
Cons: outdated rendering model; limited CSS; not ideal for “high quality”.

Capability matrix (what we care about)

We should encode an engine capability report (per run) for:

paged media (margins, page size, running headers)
hyphenation support + dictionaries
font embedding/subsetting
link handling (wrap/break strategy)
footnotes (if we later support them)
PDF/A options (later)

This capability map feeds QA:

If an engine can’t support a gate (e.g., true widow/orphan detection on PDF), QA should:
- run the best available approximation, and
- mark the gate as skipped with a reason, not passed.
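One way to make "skipped, not passed" hard to get wrong is to give gate results an explicit three-valued status. The sketch below is an assumption about how that could look. `GateResult`, `evaluate_gate`, and the capability names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Callable


class GateStatus(Enum):
    PASSED = "passed"
    FAILED = "failed"
    SKIPPED = "skipped"


@dataclass(frozen=True)
class GateResult:
    gate: str
    status: GateStatus
    reason: str = ""
    approximation: bool | None = None  # outcome of the fallback check, if one ran


def evaluate_gate(
    gate: str,
    required: set[str],
    capabilities: set[str],
    measure: Callable[[], bool],
    approximate: Callable[[], bool] | None = None,
) -> GateResult:
    """Run a gate only if the engine reports every capability it needs."""
    missing = sorted(required - capabilities)
    if missing:
        # Still run the best available approximation, but never report PASSED.
        approx = approximate() if approximate is not None else None
        return GateResult(gate, GateStatus.SKIPPED,
                          reason=f"engine lacks: {', '.join(missing)}",
                          approximation=approx)
    return GateResult(gate, GateStatus.PASSED if measure() else GateStatus.FAILED)
```

Keeping the approximation result in a separate field means a report can show it without it ever being mistaken for the gate verdict.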

Determinism knobs (must record)

For every PDF render, write out/render-log.json including:

engine name + version
invocation args
environment hints (OS, locale)
“self-contained” mode on/off
fonts policy + resolution (requested primary fonts, what fontconfig matched, and what fonts were embedded in the PDF)
any warnings from the engine
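The fields above could be written roughly as follows. This is a sketch: the JSON key names and the shape of the `fonts` dictionary are assumptions, since the document only fixes the file name and the kinds of information to record.

```python
from __future__ import annotations

import json
import locale
import platform
from pathlib import Path


def write_render_log(out_dir: str, *, engine: str, version: str, args: list[str],
                     self_contained: bool, fonts: dict, warnings: list[str]) -> Path:
    """Write render-log.json into out_dir and return its path."""
    log = {
        "engine": {"name": engine, "version": version},
        "invocation_args": args,
        "environment": {
            "os": platform.platform(),
            "locale": locale.getlocale()[0],  # may be None (JSON null) if unset
        },
        "self_contained": self_contained,
        "fonts": fonts,  # e.g. requested / matched / embedded lists
        "warnings": warnings,
    }
    path = Path(out_dir) / "render-log.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    # sort_keys keeps the file stable across runs, which makes diffs meaningful.
    path.write_text(json.dumps(log, indent=2, sort_keys=True) + "\n", encoding="utf-8")
    return path
```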

If the engine is a browser:

fix viewport
disable external requests
pin print settings (margins, background graphics, scaling)
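For the Playwright adapter, these three points might translate into something like the sketch below. The concrete values (viewport size, A4, 20mm margins) are placeholders, not project decisions. The Playwright calls used (`new_page(viewport=...)`, `page.route`, `route.abort`, `page.pdf`) are from its Python sync API, and PDF output requires Chromium.

```python
from __future__ import annotations

from pathlib import Path
from urllib.parse import urlsplit

VIEWPORT = {"width": 1280, "height": 720}  # assumed fixed viewport


def pinned_pdf_options(out_pdf: str) -> dict:
    """Print settings that are set explicitly instead of relying on defaults."""
    return {
        "path": out_pdf,
        "format": "A4",
        "print_background": True,
        "scale": 1.0,
        "margin": {"top": "20mm", "right": "20mm", "bottom": "20mm", "left": "20mm"},
    }


def is_external(url: str) -> bool:
    """True for URLs that would leave the machine (http/https/websocket)."""
    return urlsplit(url).scheme in {"http", "https", "ws", "wss"}


def render_with_chromium(html_path: str, out_pdf: str) -> None:
    from playwright.sync_api import sync_playwright  # lazy import

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page(viewport=VIEWPORT)
        # Abort every request whose URL matches the predicate; file:// is untouched.
        page.route(is_external, lambda route: route.abort())
        page.goto(Path(html_path).resolve().as_uri())
        page.pdf(**pinned_pdf_options(out_pdf))
        browser.close()
```

Request blocking inside the browser complements, but does not replace, disabling the network at the container level.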

Security model

Assume untrusted Markdown input (CI context). Mitigations:
- never execute embedded JS during HTML render (or use a hardened renderer container)
- disable network
- restrict filesystem access (mount only out/ and input)
If using headless browsers, treat them as an attack surface; run in locked-down containers.
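As a sketch of the "locked-down container" idea, the helper below builds a `docker run` command line with the network disabled, a read-only root filesystem, and only the input and output directories mounted. Using Docker, the image name, and the `/work/in` and `/work/out` mount points are all assumptions. The flags themselves are standard `docker run` options.

```python
from __future__ import annotations

from pathlib import Path


def sandboxed_render_argv(image: str, input_dir: str, out_dir: str,
                          command: list[str]) -> list[str]:
    """Build (but do not run) a docker command for an isolated render."""
    return [
        "docker", "run", "--rm",
        "--network=none",            # no network access at all
        "--read-only",               # root filesystem is immutable
        "--cap-drop=ALL",            # drop Linux capabilities
        "--tmpfs", "/tmp",           # scratch space many engines need
        "--volume", f"{Path(input_dir).resolve()}:/work/in:ro",
        "--volume", f"{Path(out_dir).resolve()}:/work/out",
        image,
        *command,
    ]
```

The result can be passed to `subprocess.run(argv, check=True)`. Some headless browsers may need additional writable paths beyond `/tmp`, so treat the flag set as a starting point.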

Recommended v0.1 path (fastest)

Implement adapters for:
- Playwright (auto-detect)
- WeasyPrint (if installed)
Keep Prince/Antenna House as optional adapters (stub + docs) until needed.
Use QA gates as the real value:
- link wrap, code/table overflow, stranded headings (HTML and PDF when possible)

This keeps delivery fast while preserving “compatible with the majors”.

Future: “Engine parity” testing

Once adapters exist, add an integration job that renders the same fixtures through two engines (when available) and compares:

gate metrics (should be within thresholds)
file size ranges
major layout regressions (e.g., table clipping incidents)

We don’t need pixel-perfect equivalence; we need “quality gates still pass”.
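The gate-metric comparison could be as simple as the sketch below. It assumes each engine run yields a flat mapping of gate name to a numeric metric (for example, incident counts) and that tolerances are configured per gate. Both are assumptions about data shapes that do not exist yet.

```python
from __future__ import annotations


def compare_gate_metrics(a: dict[str, float], b: dict[str, float],
                         tolerances: dict[str, float]) -> list[str]:
    """Return human-readable parity problems; an empty list means parity holds."""
    problems = []
    for gate in sorted(set(a) | set(b)):
        if gate not in a or gate not in b:
            problems.append(f"{gate}: reported by only one engine")
            continue
        delta = abs(a[gate] - b[gate])
        allowed = tolerances.get(gate, 0.0)  # unlisted gates must match exactly
        if delta > allowed:
            problems.append(f"{gate}: differs by {delta:g} (allowed {allowed:g})")
    return problems
```

A gate that only one engine reports is surfaced as a problem rather than ignored, in line with the capability-disclosure principle.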
