iftypeset/forgejo/README.md
codex e92f1c3b93
iftypeset: document CI pipeline + Playwright + font contract
2026-01-08 18:10:41 +00:00

Forgejo PDF integration (iftypeset → forgejo-pdf worker)

This note documents how to wire iftypeset into the existing Forgejo PDF worker so that exported PDFs stop feeling “flat”: typography comes from deterministic profiles and layout is checked by QA gates, rather than relying on a static stylesheet alone.

Current state (Forgejo worker)

The current renderer lives at:

  • /root/ai-workspace/forgejo-pdf/worker/pdf/src/render_pdf.js

It currently:

  • Converts Markdown → HTML (MarkdownIt + sanitize-html).
  • Renders Mermaid diagrams in-page.
  • Uses Paged.js for pagination.
  • Emits a PDF via Puppeteer/Chromium.
  • Applies one of two static stylesheets:
    • basic.css
    • professional.css

What iftypeset adds

iftypeset is a deterministic “rules + profiles + QA gates” layer.

In Forgejo terms:

  • Profiles (spec/profiles/*.yaml) → deterministic CSS tokens (iftypeset emit-css).
  • Quality gates (spec/quality_gates.yaml) → post-render checks (widows/orphans, overflow, stranded headings, etc.) with hard numeric thresholds.
  • Rule registry (Phase 2) → lint + manual checklists (Chicago/Bringhurst pointers, paraphrased).
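
As an illustration of the "profile → deterministic CSS tokens" idea, here is a minimal Python sketch. The profile keys, the `--ift` prefix, and the function names are hypothetical; the real schema under spec/profiles/*.yaml and the actual output of iftypeset emit-css may look different. The point is only that sorting keys makes the same profile always produce byte-identical CSS.

```python
def flatten(profile, prefix=""):
    """Flatten a nested dict into (path, value) pairs, visiting keys in sorted order."""
    items = []
    for key in sorted(profile):
        value = profile[key]
        path = f"{prefix}-{key}" if prefix else key
        if isinstance(value, dict):
            items.extend(flatten(value, path))
        else:
            items.append((path, value))
    return items


def emit_css_tokens(profile, var_prefix="--ift"):
    """Render a profile as CSS custom properties on :root (hypothetical token naming)."""
    lines = [":root {"]
    for path, value in flatten(profile):
        name = path.replace("_", "-")
        lines.append(f"  {var_prefix}-{name}: {value};")
    lines.append("}")
    return "\n".join(lines) + "\n"


# Hypothetical profile content, standing in for a parsed spec/profiles/*.yaml file.
example_profile = {
    "page": {"size": "A4", "margin_top": "20mm"},
    "fonts": {"body": {"family": '"IBM Plex Sans", sans-serif'}},
}

if __name__ == "__main__":
    print(emit_css_tokens(example_profile), end="")
```

Because the output depends only on the profile content and not on key order, the generated stylesheet can be committed and diffed like any other build artifact.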

Minimal integration (CSS only, low risk)

  1. Generate CSS from a profile:

     cd /root/ai-workspace/forgejo-pdf
     ./scripts/update_iftypeset_css.sh

  2. Select the new pdf.typography option in the worker config contract (example):
     • basic
     • professional
     • iftypeset-web_pdf (new)
  3. The worker will load professional.css first and then iftypeset-web_pdf.css as an override.

This is the safest first step: no new dependencies in the worker container, no new runtime calls, just a different stylesheet.
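
The stylesheet selection described above can be sketched as follows. This is Python for illustration only (the worker itself is JavaScript), and the function and constant names are hypothetical; the only behavior taken from this note is that an iftypeset-* option loads professional.css first and the generated stylesheet second.

```python
# Static stylesheets the worker already ships.
BASE_STYLES = {"basic": "basic.css", "professional": "professional.css"}
IFTYPESET_PREFIX = "iftypeset-"


def stylesheets_for(typography):
    """Return the stylesheets to load, in cascade order, for a pdf.typography value."""
    if typography in BASE_STYLES:
        return [BASE_STYLES[typography]]
    if typography.startswith(IFTYPESET_PREFIX) and len(typography) > len(IFTYPESET_PREFIX):
        # The generated stylesheet is loaded last so it overrides professional.css.
        return ["professional.css", f"{typography}.css"]
    raise ValueError(f"unknown pdf.typography value: {typography!r}")


if __name__ == "__main__":
    print(stylesheets_for("iftypeset-web_pdf"))
```

Rejecting unknown values instead of silently falling back keeps a typo in the config from producing a PDF with the wrong typography.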

Next integration (QA gates, medium risk)

The goal is to produce:

  • layout-report.json (measured layout incidents)
  • qa-report.json (gate pass/fail summary)

at export time.

Recommended approach:

  1. Pre-PDF (in-page, after Paged.js preview):

    • collect page count
    • collect per-page heading positions (to detect “stranded headings”)
    • record overflow signals (code blocks / tables that exceed page content boxes)
  2. Post-PDF (optional, later):

    • parse the PDF with a dedicated analyzer to detect widows/orphans more accurately

Start with the in-page signals first because the Forgejo worker already owns the DOM and pagination lifecycle.
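
To show how the in-page measurements could turn into report entries, here is a minimal Python sketch of a stranded-heading check. It assumes the worker has already measured each heading's bottom edge relative to its page content box; the `HeadingBox` type, the report shape, and the default thresholds are all hypothetical, and the real thresholds belong in spec/quality_gates.yaml.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeadingBox:
    text: str
    page: int              # 1-based page number
    bottom: float          # heading bottom edge, px from top of the page content box
    content_height: float  # height of the page content box, px


def stranded_headings(headings, line_height=20.0, min_lines_after=2):
    """Return headings with less than min_lines_after lines of room below them."""
    required = line_height * min_lines_after
    return [h for h in headings if h.content_height - h.bottom < required]


def layout_report(page_count, headings, **thresholds):
    """Build a dict in a hypothetical layout-report.json shape."""
    incidents = [
        {"kind": "stranded_heading", "page": h.page, "text": h.text}
        for h in stranded_headings(headings, **thresholds)
    ]
    return {"page_count": page_count, "incidents": incidents}


if __name__ == "__main__":
    measured = [
        HeadingBox("Intro", 1, 200.0, 1000.0),
        HeadingBox("Appendix", 3, 990.0, 1000.0),
    ]
    print(layout_report(4, measured))
```

The check only looks at remaining space on the page, so it is a heuristic: a heading followed by a short final paragraph near the bottom of a page would also be flagged.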

In a Forgejo job, run the pipeline after Markdown is available:

PYTHONPATH=src python3 -m iftypeset.cli lint --input <doc.md> --out out --profile web_pdf
PYTHONPATH=src python3 -m iftypeset.cli render-html --input <doc.md> --out out --profile web_pdf
PYTHONPATH=src python3 -m iftypeset.cli render-pdf --input <doc.md> --out out --profile web_pdf || true
PYTHONPATH=src python3 -m iftypeset.cli qa --out out --profile web_pdf

Artifacts to publish (static hosting):

  • out/render.html
  • out/render.css
  • out/render.pdf (if available)
  • out/layout-report.json
  • out/qa-report.json
  • out/lint-report.json

Failures should be surfaced through exit codes and through the list of gate failures in qa-report.json.
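
A job wrapper for the four commands above might look like the following Python sketch. The subcommands and flags are copied from the commands in this note; the function names, the decision to stop at the first failing required step, and the injectable `run` hook are assumptions made for illustration.

```python
import os
import subprocess
import sys

# (subcommand, takes --input, failure is fatal); mirrors the four commands above.
STEPS = [
    ("lint", True, True),
    ("render-html", True, True),
    ("render-pdf", True, False),  # tolerated, like `|| true` in the shell version
    ("qa", False, True),
]


def build_commands(doc, out="out", profile="web_pdf"):
    """Return a list of (argv, fatal) pairs for the pipeline steps."""
    commands = []
    for name, takes_input, fatal in STEPS:
        argv = [sys.executable, "-m", "iftypeset.cli", name]
        if takes_input:
            argv += ["--input", doc]
        argv += ["--out", out, "--profile", profile]
        commands.append((argv, fatal))
    return commands


def run_pipeline(doc, out="out", profile="web_pdf", run=None):
    """Run each step; return the exit code of the first failing required step, else 0."""
    if run is None:
        env = dict(os.environ, PYTHONPATH="src")

        def run(argv):
            return subprocess.run(argv, env=env).returncode

    for argv, fatal in build_commands(doc, out, profile):
        code = run(argv)
        if code != 0 and fatal:
            return code
    return 0


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run_pipeline(sys.argv[1]))
```

Returning the failing step's exit code lets the CI job fail without parsing any report; qa-report.json can then be published as an artifact for whoever needs the details.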

Fonts (important)

Forgejo's professional.css embeds IBM Plex via @font-face.

If you switch to iftypeset CSS profiles as-is, you should either:

  • add the fonts used by the profile to the worker assets (preferred for consistency), or
  • update the profile fonts.*.family stacks to prefer the fonts already bundled in the worker (IBM Plex Sans WOFF2, IBM Plex Mono WOFF2).
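
For the second option, rewriting a family stack so the bundled fonts come first could be sketched like this in Python. The helper names are hypothetical, and the comma split is deliberately naive (it would mishandle a family name that itself contains a comma).

```python
def _bare(name):
    """Strip surrounding whitespace and quotes from a single family name."""
    return name.strip().strip("\"'")


def _quoted(name):
    """Quote family names containing whitespace; leave single-word names as-is."""
    bare = _bare(name)
    return f'"{bare}"' if any(ch.isspace() for ch in bare) else bare


def prefer_bundled(stack, bundled):
    """Return a font-family stack with the bundled families first, without duplicates."""
    families = [part for part in stack.split(",") if part.strip()]
    seen = set()
    result = []
    for family in list(bundled) + families:
        key = _bare(family).lower()
        if key not in seen:
            seen.add(key)
            result.append(_quoted(family))
    return ", ".join(result)


if __name__ == "__main__":
    print(prefer_bundled('Georgia, "IBM Plex Sans", serif', ["IBM Plex Sans"]))
```

Generic fallbacks such as serif or monospace stay at the end of the stack because the original order is preserved for everything that is not bundled.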

Long-term direction

Once Phase 2 rule batches exist (spec/rules/**.ndjson), Forgejo can become a full “publication pipeline”:

  • iftypeset lint → deterministic lint report + optional autofix (no quotes from books, pointers only)
  • iftypeset emit-css → render tokens
  • Forgejo render → HTML/PDF
  • iftypeset qa → gate failures block the PDF build in CI

This keeps the worker simple and lets the strictness live in the spec, not ad-hoc code.