Forgejo PDF integration (iftypeset → forgejo-pdf worker)

This note documents how to wire iftypeset into the existing Forgejo PDF worker so exported PDFs stop feeling “flat”. Typography comes from declared profiles, and each export is checked against measurable quality gates, the way a real typesetting pipeline works.

Current state (Forgejo worker)

The current renderer lives at:

/root/ai-workspace/forgejo-pdf/worker/pdf/src/render_pdf.js

It currently:

Converts Markdown → HTML (MarkdownIt + sanitize-html).
Renders Mermaid diagrams in-page.
Uses Paged.js for pagination.
Emits a PDF via Puppeteer/Chromium.
Applies one of two static stylesheets:
- basic.css
- professional.css

What iftypeset adds

iftypeset is a deterministic “rules + profiles + QA gates” layer.

In Forgejo terms:

Profiles (spec/profiles/*.yaml) → deterministic CSS tokens (iftypeset emit-css).
Quality gates (spec/quality_gates.yaml) → post-render checks (widows/orphans, overflow, stranded headings, etc.) with hard numeric thresholds.
Rule registry (Phase 2) → lint + manual checklists (paraphrased pointers to Chicago/Bringhurst sections, not quotations).
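To make “hard numeric thresholds” concrete, here is a minimal sketch of a gate check. The gate names and the flat "metric → maximum count" shape are assumptions for illustration, not the actual spec/quality_gates.yaml schema. It treats an unmeasured metric as a failure, because a hard gate should not pass without evidence.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical gates: metric name -> maximum allowed incident count.
# The real spec/quality_gates.yaml may use different names and structure.
GATES: dict[str, int] = {
    "widows": 0,
    "orphans": 0,
    "stranded_headings": 0,
    "overflow_blocks": 0,
}


@dataclass(frozen=True)
class GateResult:
    gate: str
    measured: int | None  # None means the metric was never measured
    limit: int

    @property
    def passed(self) -> bool:
        # An unmeasured metric fails: a hard gate cannot pass without evidence.
        return self.measured is not None and self.measured <= self.limit


def evaluate_gates(metrics: dict[str, int], gates: dict[str, int]) -> list[GateResult]:
    """Compare measured incident counts to hard limits, in sorted gate order."""
    return [GateResult(name, metrics.get(name), limit) for name, limit in sorted(gates.items())]


if __name__ == "__main__":
    results = evaluate_gates({"widows": 1, "stranded_headings": 0, "overflow_blocks": 2}, GATES)
    print([r.gate for r in results if not r.passed])  # ['orphans', 'overflow_blocks', 'widows']
```

Sorting the gates keeps the result order stable across runs, which matters if the summary is diffed or published.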

Minimal integration (CSS only, low risk)

Generate CSS from a profile:

cd /root/ai-workspace/forgejo-pdf
./scripts/update_iftypeset_css.sh
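The script’s internals are not described here. The sketch below only illustrates what “deterministic CSS tokens” means: the same profile always produces byte-identical CSS, regardless of key order in the YAML. The custom-property naming (--fonts-body-family and so on) is a hypothetical convention, not necessarily the output format of iftypeset emit-css.

```python
from __future__ import annotations


def flatten(tree: dict, prefix: str = "") -> dict[str, str]:
    """Flatten nested profile keys into dash-joined token names."""
    tokens: dict[str, str] = {}
    for key, value in tree.items():
        name = f"{prefix}-{key}" if prefix else str(key)
        if isinstance(value, dict):
            tokens.update(flatten(value, name))
        else:
            tokens[name] = str(value)
    return tokens


def emit_css(profile: dict) -> str:
    """Render profile tokens as CSS custom properties.

    Sorting the token names means key order in the source file never changes
    the output: the same profile always yields the same bytes.
    """
    lines = [f"  --{name}: {value};" for name, value in sorted(flatten(profile).items())]
    return ":root {\n" + "\n".join(lines) + "\n}\n"


if __name__ == "__main__":
    profile = {
        "page": {"size": "A4", "margin": "20mm"},
        "fonts": {"body": {"family": "'IBM Plex Sans', sans-serif", "size": "10.5pt"}},
    }
    print(emit_css(profile), end="")
```

Running it prints a :root block with four custom properties in alphabetical order, from --fonts-body-family to --page-size.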

Then select the new pdf.typography value in the worker config contract (example values):

basic
professional
iftypeset-web_pdf (new)

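With iftypeset-web_pdf selected, the worker loads professional.css first and then iftypeset-web_pdf.css as an override, so the iftypeset rules take precedence wherever the two stylesheets conflict on equally specific selectors.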

This is the safest first step: no new dependencies in the worker container, no new runtime calls, just a different stylesheet.
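The selection contract above comes down to an ordered list of stylesheets per option. The worker itself is JavaScript; this Python sketch uses a hypothetical mapping and function name to show the ordering rule. For declarations with the same origin, importance, and specificity, the one that appears later wins, so the override sheet must be injected last.

```python
from __future__ import annotations

# Hypothetical mapping from pdf.typography values to stylesheets, in load order.
# The later stylesheet wins ties in the cascade, so the iftypeset override
# comes after professional.css.
STYLESHEETS: dict[str, tuple[str, ...]] = {
    "basic": ("basic.css",),
    "professional": ("professional.css",),
    "iftypeset-web_pdf": ("professional.css", "iftypeset-web_pdf.css"),
}


def stylesheets_for(typography: str) -> list[str]:
    """Return the stylesheets to inject, in order, for a pdf.typography value."""
    try:
        return list(STYLESHEETS[typography])
    except KeyError:
        raise ValueError(f"unknown pdf.typography value: {typography!r}") from None
```

Rejecting unknown values explicitly avoids silently falling back to a default look when a config typo slips through.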

Next integration (QA gates, medium risk)

The goal is to produce the following at export time:

layout-report.json (measured layout incidents)
qa-report.json (gate pass/fail summary)

Recommended approach:

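Pre-PDF (in-page, after the Paged.js preview pass has finished paginating):
- collect the page count
- collect per-page heading positions (to detect “stranded headings”, i.e. headings left at the bottom of a page while the text they introduce starts on the next page)
- record overflow signals (code blocks and tables that exceed the page content box)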
Post-PDF (optional, later):
- parse the PDF with a dedicated analyzer to detect widows/orphans more accurately

Start with the in-page signals, because the Forgejo worker already owns the DOM and the pagination lifecycle.
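Once the in-page measurements have been collected (for example with a Puppeteer page.evaluate call after pagination, which is not shown here), turning them into incidents is plain data processing. This sketch assumes a hypothetical Page/Block shape and hypothetical report keys. It uses a simplified stranded-heading rule: a heading that is the last block on any page except the final one.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Block:
    kind: str        # "heading", "paragraph", "pre", "table", ...
    width: float     # rendered width in CSS px
    ident: str = ""  # element id, if any, for the report


@dataclass
class Page:
    content_width: float                      # page content box width in CSS px
    blocks: list[Block] = field(default_factory=list)


OVERFLOW_KINDS = ("pre", "table")


def layout_report(pages: list[Page], tolerance: float = 0.5) -> dict:
    """Turn per-page DOM measurements into a layout-report-style dict."""
    incidents = []
    for number, page in enumerate(pages, start=1):
        last = page.blocks[-1] if page.blocks else None
        # Stranded heading: the last block on a non-final page is a heading,
        # so the text it introduces starts on the next page.
        if last is not None and last.kind == "heading" and number < len(pages):
            incidents.append({"type": "stranded_heading", "page": number, "id": last.ident})
        for block in page.blocks:
            excess = block.width - page.content_width
            # Only code blocks and tables are checked for horizontal overflow.
            if block.kind in OVERFLOW_KINDS and excess > tolerance:
                incidents.append(
                    {"type": "overflow", "page": number, "id": block.ident, "excess_px": round(excess, 1)}
                )
    return {"page_count": len(pages), "incidents": incidents}
```

A stricter rule could also flag a heading followed by only one line of text at the page foot. That needs line-box measurements, which is where the later PDF analyzer comes in.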

CI wiring (recommended)

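In a Forgejo CI job, run the pipeline once the Markdown source is available. Run the commands from the iftypeset repository root, since PYTHONPATH=src is a relative path: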

PYTHONPATH=src python3 -m iftypeset.cli lint --input <doc.md> --out out --profile web_pdf
PYTHONPATH=src python3 -m iftypeset.cli render-html --input <doc.md> --out out --profile web_pdf
PYTHONPATH=src python3 -m iftypeset.cli render-pdf --input <doc.md> --out out --profile web_pdf || true
PYTHONPATH=src python3 -m iftypeset.cli qa --out out --profile web_pdf
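If the job runner prefers a single script over four shell lines, the same sequence can be driven from Python. This is a hypothetical wrapper, not part of iftypeset. It uses sys.executable in place of python3 and tolerates a render-pdf failure, mirroring || true. It also runs every step even after a hard failure, so every report still gets written, and then returns the first hard failure’s exit code.

```python
from __future__ import annotations

import os
import subprocess
import sys

# (iftypeset subcommand, takes --input, allowed to fail)
STEPS = [
    ("lint", True, False),
    ("render-html", True, False),
    ("render-pdf", True, True),  # mirrors `|| true` in the shell version
    ("qa", False, False),
]


def build_steps(doc: str, out: str = "out", profile: str = "web_pdf") -> list[tuple[list[str], bool]]:
    """Return (argv, allow_failure) pairs for the four CLI steps."""
    steps = []
    for sub, needs_input, allow_failure in STEPS:
        argv = [sys.executable, "-m", "iftypeset.cli", sub]
        if needs_input:
            argv += ["--input", doc]
        argv += ["--out", out, "--profile", profile]
        steps.append((argv, allow_failure))
    return steps


def run_pipeline(doc: str, out: str = "out", profile: str = "web_pdf") -> int:
    """Run every step; return the first hard failure's exit code, or 0."""
    env = dict(os.environ, PYTHONPATH="src")  # same effect as the PYTHONPATH=src prefix
    first_failure = 0
    for argv, allow_failure in build_steps(doc, out, profile):
        code = subprocess.run(argv, env=env, check=False).returncode
        if code != 0 and not allow_failure and first_failure == 0:
            first_failure = code
    return first_failure
```

Usage from the repository root: sys.exit(run_pipeline("doc.md")).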

Artifacts to publish (static hosting):

out/render.html
out/render.css
out/render.pdf (if available)
out/layout-report.json
out/qa-report.json
out/lint-report.json

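Failures should be surfaced through non-zero exit codes and through the gate failures list in qa-report.json. render-pdf is the only step allowed to fail (hence || true), which is why render.pdf is published only when it was produced.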
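A later CI step, such as the publish step, may need to re-check the published report instead of relying on the qa command’s exit code. A minimal sketch follows. The field name "failures" is an assumption about the qa-report.json schema. A missing field raises KeyError rather than silently passing.

```python
from __future__ import annotations

import json
from pathlib import Path


def gate_failures(report: dict, key: str = "failures") -> list:
    """Return the gate failures list from a parsed qa-report.json.

    `key` is an assumed field name; a missing key raises KeyError instead of
    being treated as "no failures".
    """
    failures = report[key]
    if not isinstance(failures, list):
        raise ValueError(f"expected a list under {key!r}, got {type(failures).__name__}")
    return failures


def qa_exit_code(report_path: Path, key: str = "failures") -> int:
    """Print each gate failure and return 1 if there were any, else 0."""
    report = json.loads(report_path.read_text(encoding="utf-8"))
    failures = gate_failures(report, key)
    for failure in failures:
        print(f"QA gate failed: {failure}")
    return 1 if failures else 0
```

Usage: sys.exit(qa_exit_code(Path("out/qa-report.json"))).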

Fonts (important)

The worker’s professional.css embeds IBM Plex via @font-face.

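If you use the iftypeset CSS profiles as-is, their font stacks may name fonts that the worker container does not ship, and Chromium will then silently fall back to another installed font. To avoid that, either: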

add the fonts used by the profile to the worker assets (preferred for consistency), or
update the profile fonts.*.family stacks to prefer the fonts already bundled in the worker (IBM Plex Sans WOFF2, IBM Plex Mono WOFF2).
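The second option amounts to putting a bundled family at the front of each fonts.*.family stack. In this sketch, reading and writing the profile YAML is left out, and it assumes the bundled names match the font-family names declared in professional.css’s @font-face rules. Generic keywords such as sans-serif stay unquoted, because quoting turns them into ordinary family names that match nothing.

```python
from __future__ import annotations

# CSS generic family keywords must stay unquoted.
GENERIC_FAMILIES = {
    "serif", "sans-serif", "monospace", "cursive", "fantasy", "system-ui",
    "ui-serif", "ui-sans-serif", "ui-monospace", "ui-rounded", "math", "emoji", "fangsong",
}


def parse_stack(stack: str) -> list[str]:
    """Split a font-family value into bare names (assumes no commas inside names)."""
    return [name.strip().strip("'\"") for name in stack.split(",") if name.strip()]


def format_stack(families: list[str]) -> str:
    """Quote family names, leaving generic keywords bare."""
    return ", ".join(f if f.lower() in GENERIC_FAMILIES else f'"{f}"' for f in families)


def prefer_bundled(stack: str, bundled: str) -> str:
    """Move (or add) a bundled family to the front of a font stack."""
    rest = [f for f in parse_stack(stack) if f.lower() != bundled.lower()]
    return format_stack([bundled, *rest])
```

For example, prefer_bundled("Inter, 'Helvetica Neue', sans-serif", "IBM Plex Sans") yields "IBM Plex Sans" first, followed by the original families with sans-serif still last.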

Long-term direction

Once Phase 2 rule batches exist (spec/rules/**.ndjson), Forgejo can become a full “publication pipeline”:

iftypeset lint → deterministic lint report + optional autofix (no quotes from books, pointers only)
iftypeset emit-css → render tokens
Forgejo render → HTML/PDF
iftypeset qa → gate failures block the PDF build in CI

This keeps the worker simple and keeps the strictness in the spec rather than in ad-hoc worker code.
