Forgejo PDF integration (iftypeset → forgejo-pdf worker)
This note documents how to wire iftypeset into the existing Forgejo PDF worker so exported PDFs stop feeling “flat” and start behaving like a real typesetting pipeline.
Current state (Forgejo worker)
The current renderer lives at:
/root/ai-workspace/forgejo-pdf/worker/pdf/src/render_pdf.js
It currently:
- Converts Markdown → HTML (MarkdownIt + sanitize-html).
- Renders Mermaid diagrams in-page.
- Uses Paged.js for pagination.
- Emits a PDF via Puppeteer/Chromium.
- Applies one of two static stylesheets:
  - basic.css
  - professional.css
What iftypeset adds
iftypeset is a deterministic “rules + profiles + QA gates” layer.
In Forgejo terms:
- Profiles (spec/profiles/*.yaml) → deterministic CSS tokens (iftypeset emit-css).
- Quality gates (spec/quality_gates.yaml) → post-render checks (widows/orphans, overflow, stranded headings, etc.) with hard numeric thresholds.
- Rule registry (Phase 2) → lint + manual checklists (Chicago/Bringhurst pointers, paraphrased).
Minimal integration (CSS only, low risk)
- Generate CSS from a profile:
cd /root/ai-workspace/forgejo-pdf
./scripts/update_iftypeset_css.sh
- Select the new pdf.typography option in the worker config contract (example):
  - basic
  - professional
  - iftypeset-web_pdf (new)
- The worker will load professional.css first and then iftypeset-web_pdf.css as an override.
This is the safest first step: no new dependencies in the worker container, no new runtime calls, just a different stylesheet.
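The load order described above can be sketched as a lookup from the pdf.typography value to an ordered stylesheet list. This is an illustration only: the worker itself is JavaScript, and the `STYLESHEETS` table and `stylesheets_for` helper are hypothetical names, not part of the existing worker or of iftypeset.

```python
# Hypothetical mapping from the pdf.typography option to the stylesheets
# to load, in order. Later entries override earlier ones in the cascade.
STYLESHEETS = {
    "basic": ["basic.css"],
    "professional": ["professional.css"],
    # New option: keep professional.css as the base, then layer the
    # generated iftypeset CSS on top of it.
    "iftypeset-web_pdf": ["professional.css", "iftypeset-web_pdf.css"],
}


def stylesheets_for(typography: str) -> list[str]:
    """Return the ordered stylesheet list for a pdf.typography value."""
    try:
        return list(STYLESHEETS[typography])
    except KeyError:
        raise ValueError(f"unknown pdf.typography option: {typography!r}") from None
```

Rejecting unknown values, rather than silently falling back to a default, keeps a mistyped option from producing a PDF with the wrong typography.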
Next integration (QA gates, medium risk)
The goal is to produce, at export time:
- layout-report.json (measured layout incidents)
- qa-report.json (gate pass/fail summary)
Recommended approach:
- Pre-PDF (in-page, after Paged.js preview):
  - collect page count
  - collect per-page heading positions (to detect “stranded headings”)
  - record overflow signals (code blocks / tables that exceed page content boxes)
- Post-PDF (optional, later):
  - parse the PDF with a dedicated analyzer to detect widows/orphans more accurately
Start with the in-page signals, because the Forgejo worker already owns the DOM and pagination lifecycle.
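As a rough sketch of how the in-page measurements might become layout incidents, the following flags headings that sit too close to the bottom of a page's content box. It assumes the worker has already collected heading positions in the page and handed them over as plain data. The `Heading` fields, the incident keys, and the 48 px default threshold are all hypothetical; real thresholds would come from spec/quality_gates.yaml.

```python
from dataclasses import dataclass


@dataclass
class Heading:
    page: int      # 1-based page number
    text: str      # heading text, for the report
    bottom: float  # px from the top of the page content box to the heading's bottom edge


def stranded_headings(headings, content_height, min_space_below):
    """Flag headings with less than min_space_below px of content box beneath them."""
    incidents = []
    for h in headings:
        space_below = content_height - h.bottom
        if space_below < min_space_below:
            incidents.append({
                "kind": "stranded_heading",  # hypothetical incident label
                "page": h.page,
                "text": h.text,
                "space_below": space_below,
            })
    return incidents


def layout_report(page_count, headings, content_height, min_space_below=48.0):
    """Assemble a minimal report dict (assumed shape, not the real schema)."""
    return {
        "page_count": page_count,
        "incidents": stranded_headings(headings, content_height, min_space_below),
    }
```

This checks only geometric closeness to the page bottom. A stricter check would also confirm that no body content follows the heading on the same page.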
CI wiring (recommended)
In a Forgejo job, run the pipeline after Markdown is available:
PYTHONPATH=src python3 -m iftypeset.cli lint --input <doc.md> --out out --profile web_pdf
PYTHONPATH=src python3 -m iftypeset.cli render-html --input <doc.md> --out out --profile web_pdf
PYTHONPATH=src python3 -m iftypeset.cli render-pdf --input <doc.md> --out out --profile web_pdf || true
PYTHONPATH=src python3 -m iftypeset.cli qa --out out --profile web_pdf
Artifacts to publish (static hosting):
- out/render.html
- out/render.css
- out/render.pdf (if available)
- out/layout-report.json
- out/qa-report.json
- out/lint-report.json
Failures should be surfaced via exit codes and qa-report.json (gate failures list).
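If a job needs its own pass/fail decision on top of the CLI exit code, a small checker over qa-report.json is one option. The sketch below assumes the report carries a top-level "failures" list. That key is a guess based on the "gate failures list" description and should be adjusted to the real schema. The helper names are hypothetical.

```python
import json
from pathlib import Path


def gate_failures(report: dict) -> list:
    """Return the gate failures from a parsed report.

    Assumption: failures live under a top-level "failures" key.
    """
    return list(report.get("failures", []))


def exit_code(report: dict) -> int:
    """Map a parsed report to a process exit code: 0 = pass, 1 = gate failures."""
    return 1 if gate_failures(report) else 0


def exit_code_for(path: str) -> int:
    """Read a qa-report.json file and return the exit code for it."""
    report = json.loads(Path(path).read_text(encoding="utf-8"))
    return exit_code(report)
```

A CI step could end with `sys.exit(exit_code_for("out/qa-report.json"))` so that gate failures fail the job even when an earlier command was allowed to continue with `|| true`.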
Fonts (important)
Forgejo’s professional.css embeds IBM Plex via @font-face.
If you switch to iftypeset CSS profiles as-is, you should either:
- add the fonts used by the profile to the worker assets (preferred for consistency), or
- update the profile fonts.*.family stacks to prefer the fonts already bundled in the worker (IBM Plex Sans WOFF2, IBM Plex Mono WOFF2).
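The second option amounts to putting the bundled family at the front of each stack. A minimal sketch, assuming a stack is a plain list of family names (the real profile layout under fonts.*.family may differ, and both helper names are hypothetical):

```python
# CSS generic family keywords, which must stay unquoted in font-family.
GENERIC = {"serif", "sans-serif", "monospace", "cursive", "fantasy", "system-ui"}


def prefer_bundled(stack, bundled):
    """Put bundled families first, then the original stack, without duplicates."""
    seen = set()
    result = []
    for family in [*bundled, *stack]:
        name = family.strip()
        key = name.lower()  # compare case-insensitively
        if key not in seen:
            seen.add(key)
            result.append(name)
    return result


def css_font_family(stack):
    """Render a stack as a CSS font-family value, quoting non-generic names."""
    return ", ".join(f if f in GENERIC else f'"{f}"' for f in stack)
```

For example, `prefer_bundled(["Georgia", "serif"], ["IBM Plex Sans"])` yields a stack that tries IBM Plex Sans before the profile's own choices. "Georgia" is only a placeholder for whatever family a profile names.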
Long-term direction
Once Phase 2 rule batches exist (spec/rules/**.ndjson), Forgejo can become a full “publication pipeline”:
- iftypeset lint → deterministic lint report + optional autofix (no quotes from books, pointers only)
- iftypeset emit-css → render tokens
- Forgejo render → HTML/PDF
- iftypeset qa → gate failures block the PDF build in CI
This keeps the worker simple and lets the strictness live in the spec, not ad-hoc code.