dannystocker/iftypeset


iftypeset: document CI pipeline + Playwright + font contract

2026-01-08 18:10:41 +00:00


`iftypeset` — Project Status (2026-01-03)

What This Project Is

iftypeset is a thin, deterministic typeset runtime for turning Markdown into:

Stable, shareable HTML (render-html)
A PDF (render-pdf)
Machine-readable quality reports (lint, qa, report)

It is paired with a machine-readable rule registry derived from:

Chicago Manual of Style (CMOS 18)
Bringhurst (Elements of Typographic Style)

Important: rule records contain paraphrases only, each with a pointer ref (e.g., CMOS18 §6.2 p377 (scan p10)). This repo must not contain book text.
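A pointer ref of the shape shown above could be split into fields along the following lines. This is a minimal sketch: the regular expression is inferred from the single example in this document, and the names PointerRef and parse_pointer are hypothetical, not part of iftypeset.

```python
import re
from typing import NamedTuple, Optional


class PointerRef(NamedTuple):
    """Hypothetical container for the parts of a pointer ref."""
    source: str               # e.g. "CMOS18"
    section: str              # e.g. "6.2"
    page: int                 # printed page number
    scan_page: Optional[int]  # page in the scan, when given


# Pattern inferred from one example ("CMOS18 §6.2 p377 (scan p10)");
# the real registry may allow other shapes.
POINTER_RE = re.compile(
    r"^(?P<source>[A-Z0-9]+)\s+§(?P<section>[\d.]+)\s+p(?P<page>\d+)"
    r"(?:\s+\(scan p(?P<scan>\d+)\))?$"
)


def parse_pointer(ref: str) -> Optional[PointerRef]:
    """Return the parsed pointer, or None if the text does not match."""
    match = POINTER_RE.match(ref.strip())
    if match is None:
        return None
    scan = match.group("scan")
    return PointerRef(
        source=match.group("source"),
        section=match.group("section"),
        page=int(match.group("page")),
        scan_page=int(scan) if scan else None,
    )
```

A parser like this only locates a passage; it carries no book text, which is consistent with the paraphrase-only constraint.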

What Works Today

Verified on master via ./scripts/ci.sh (spec validate + report + unit tests):

End-to-end CLI: validate-spec, report, lint, render-html, render-pdf, qa, emit-css
Deterministic lint engine with safe rewrite mode (lint --fix --fix-mode rewrite)
Deterministic HTML rendering; PDF rendering works when an engine is available (Playwright is the default)
QA analyzer (HTML + PDF heuristics) with incident details:
- long/bare URL/DOI/email wrap incidents
- overfull token detection
- table/code overflow incidents (profile-aware thresholds)
- PDF-aware widow/orphan heuristics via Poppler text extraction (pdftotext -layout)
- PDF-aware runt final page detection (short last page heuristics)
CI wiring for Forgejo: .forgejo/workflows/ci.yml
Session resilience tooling:
- ./scripts/audit.sh prints a compact truth snapshot (git + coverage + checkpoints)
- ./scripts/checkpoint.sh "note" creates a portable restore tarball recorded in docs/CHECKPOINTS.md
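The runt-final-page check listed under the QA analyzer above could look roughly like the sketch below. It assumes only what the list states (text extracted with pdftotext -layout) plus the fact that pdftotext ends each page with a form feed. The function names and the min_lines threshold are hypothetical; the actual heuristic and its profile-aware thresholds may differ.

```python
import subprocess
from typing import List


def extract_layout_text(pdf_path: str) -> str:
    """Run Poppler's pdftotext in layout mode and return the text ("-" = stdout)."""
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def split_pages(layout_text: str) -> List[str]:
    """Split extracted text into pages on the form feed that ends each page."""
    pages = layout_text.split("\f")
    # A trailing form feed leaves one empty element at the end; drop it.
    if pages and pages[-1].strip() == "":
        pages.pop()
    return pages


def count_text_lines(page: str) -> int:
    """Count lines on a page that contain something other than whitespace."""
    return sum(1 for line in page.splitlines() if line.strip())


def is_runt_final_page(layout_text: str, min_lines: int = 5) -> bool:
    """Flag a multi-page document whose last page has fewer than min_lines lines.

    min_lines is an illustrative default, not a value taken from iftypeset.
    """
    pages = split_pages(layout_text)
    if len(pages) < 2:
        return False  # a single short page is not a runt
    return count_text_lines(pages[-1]) < min_lines
```

Typical use would be is_runt_final_page(extract_layout_text("out/render.pdf")), where the PDF path is a placeholder. Because the check works on extracted text, it sees line counts rather than real page geometry, which is why checks of this kind stay heuristic.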

Rule Registry Snapshot (Real Counts)

From out/coverage-report.json (generated by PYTHONPATH=src python3 -m iftypeset.cli report --spec spec --out out):

Total rules: 524
Enforcement split: manual 379, typeset 62, lint 70, postrender 13
Severity split: must 37, should 470, warn 17
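As a quick consistency check, each split above should add up to the total. The sketch below copies the numbers from this section rather than reading out/coverage-report.json, because the JSON schema is not shown here. The helper name split_matches_total is hypothetical.

```python
from typing import Dict

# Values copied from the snapshot above.
TOTAL_RULES = 524
ENFORCEMENT: Dict[str, int] = {"manual": 379, "typeset": 62, "lint": 70, "postrender": 13}
SEVERITY: Dict[str, int] = {"must": 37, "should": 470, "warn": 17}


def split_matches_total(split: Dict[str, int], total: int) -> bool:
    """Return True when the parts of a split sum exactly to the total."""
    return sum(split.values()) == total


if __name__ == "__main__":
    for name, split in (("enforcement", ENFORCEMENT), ("severity", SEVERITY)):
        status = "ok" if split_matches_total(split, TOTAL_RULES) else "MISMATCH"
        print(f"{name}: {sum(split.values())} of {TOTAL_RULES} ({status})")
```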

Category counts:

editorial 45
citations 61
numbers 62
punctuation 55
layout 46
headings 32
tables 23
links 21
i18n 27
abbreviations 27
code 28
accessibility 22
frontmatter 20
backmatter 18
figures 22
typography 15

Coverage Map Summary (Sections)

Generated by python3 tools/coverage_summary.py --coverage-dir spec/coverage --out-json out/coverage-summary.json --out-md out/coverage-summary.md:

BRING: 64 sections, 284 rules (all partial)
CMOS18: 176 sections, 550 rules (all partial)
Total unique rule_ids across coverage maps: 834

Interpretation:

The registry is intentionally larger than the enforcement surface.
Many rules remain manual_checklist=true by design until we have deterministic enforcement for them.

What This Is Not Yet

A full “publication-grade PDF QA” system. PDF-aware checks exist, but they are heuristic (based on text extraction) and limited in scope.
A complete automated implementation of Chicago/Bringhurst. The registry is pointer-backed; enforcement is incremental and explicit.

How To Run (Quick)

cd /root/ai-workspace/iftypeset

Validate + rebuild indexes:

PYTHONPATH=src python3 -m iftypeset.cli validate-spec --spec spec --build-indexes

Generate coverage report:

PYTHONPATH=src python3 -m iftypeset.cli report --spec spec --out out --build-indexes

Lint (and optionally autofix):

PYTHONPATH=src python3 -m iftypeset.cli lint --input fixtures/sample.md --out out --profile web_pdf
PYTHONPATH=src python3 -m iftypeset.cli lint --input fixtures/sample.md --out out --profile web_pdf --fix --fix-mode rewrite

Render HTML + PDF:

PYTHONPATH=src python3 -m iftypeset.cli render-html --input fixtures/sample.md --out out --profile web_pdf
PYTHONPATH=src python3 -m iftypeset.cli render-pdf  --input fixtures/sample.md --out out --profile web_pdf

Run QA:

PYTHONPATH=src python3 -m iftypeset.cli qa --out out --profile web_pdf

Sanity check:

./scripts/ci.sh
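The commands above can also be chained from one script. The sketch below only reuses the subcommands and flags shown in this section. The helper names (cli, build_steps, run_steps) are hypothetical, and stopping at the first failing step is an assumption about the desired behavior, not something ./scripts/ci.sh is known to do.

```python
import os
import subprocess
from typing import List


def cli(*args: str) -> List[str]:
    """Build the argv for one iftypeset CLI call, as invoked in this section."""
    return ["python3", "-m", "iftypeset.cli", *args]


def build_steps(input_md: str = "fixtures/sample.md",
                out_dir: str = "out",
                profile: str = "web_pdf") -> List[List[str]]:
    """Return the quick-run commands in the order listed above."""
    common = ["--out", out_dir, "--profile", profile]
    return [
        cli("validate-spec", "--spec", "spec", "--build-indexes"),
        cli("report", "--spec", "spec", "--out", out_dir, "--build-indexes"),
        cli("lint", "--input", input_md, *common),
        cli("render-html", "--input", input_md, *common),
        cli("render-pdf", "--input", input_md, *common),
        cli("qa", *common),
    ]


def run_steps(steps: List[List[str]], repo_root: str = ".") -> None:
    """Run each step from the repo root with PYTHONPATH=src; stop on first failure."""
    env = dict(os.environ, PYTHONPATH="src")
    for argv in steps:
        print("+", " ".join(argv))
        subprocess.run(argv, cwd=repo_root, env=env, check=True)


if __name__ == "__main__":
    run_steps(build_steps())
```

Keeping build_steps separate from run_steps lets the command list be inspected or logged without executing anything.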

Don’t Lose Work (Session Resets)

Chat logs are not durable. The repo is.

Snapshot: ./scripts/audit.sh
Restore point: ./scripts/checkpoint.sh "short note"
Checkpoint index: docs/CHECKPOINTS.md

What Remains (Prioritized)

Improve post-render QA beyond current heuristics
- PDF-aware stranded headings / keep-with-next violations
- more reliable overflow/clipping detection when a renderer is pinned
Increase implemented rule coverage where it matters
- citations normalization / author-date variants where feasible
- i18n/locale-driven checks (without pretending perfect automation)
- link/DOI wrapping policies (reduce broken PDFs)
Forgejo integration
- use iftypeset as the Forgejo typeset/export worker
- emit QA artifacts as export attachments
Continue adding rule batches
- prioritize what breaks real documents: figures, references, complex tables, long code blocks
