
External Evaluation Prompt — iftypeset (pubstyle)

Goal: confirm there is no fundamental flaw (technical, legal, product) and identify obvious issues early.
Audience: humans or LLM reviewers.
Repo root: ai-workspace/iftypeset/

0) Context (read this first)

iftypeset is a thin, deterministic publishing runtime for Markdown → HTML → PDF that adds:

  • A machine-readable rule registry (rules are paraphrases only) with pointer refs back to primary sources (Chicago / Bringhurst) instead of reproducing book text (an illustrative record appears below).
  • Typeset profiles (screen-first vs print-first vs dense tech, etc.) that map typographic intent into render tokens/CSS.
  • Post-render QA gates that can fail builds when layout degrades (widows/orphans/keeps/overflow/link-wrap/numbering issues).

Key constraints:

  • Do not OCR/transcribe entire books into the repo (copyright). Rules must remain paraphrases with pointers only.
  • Source pointers must be sufficient for someone who has the book to find the guidance, without quoting it.
  • The runtime must be able to run in constrained environments (e.g. Forgejo PDF export workers) and produce deterministic artifacts.
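
For orientation, a rule record in spec/rules/**.ndjson should read roughly like the following (one JSON object per line in the actual files). Field names and the pointer are illustrative only; the authoritative shape is spec/schema/rule.schema.json. The point is that the rule text is a paraphrase and the source is a pointer, never a quotation:

{
  "id": "numbers.spell-out-small",
  "category": "numbers",
  "paraphrase": "Spell out small whole numbers in running prose; use numerals for larger values and for measurements.",
  "source_ref": "CMOS §9.2",
  "enforcement": "lint",
  "tags": ["numbers", "prose"]
}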

1) What to review (map of the repo)

Start here:

  • README.md
  • STATUS.md
  • app/ARCHITECTURE.md
  • app/CLI_SPEC.md
  • docs/01-demo-acceptance.md
  • docs/02-competitor-matrix.md
  • docs/03-rule-ingestion-sop.md
  • docs/04-renderer-strategy.md

Spec + rules:

  • spec/schema/rule.schema.json
  • spec/manifest.yaml
  • spec/profiles/*.yaml
  • spec/quality_gates.yaml
  • spec/rules/**.ndjson
  • spec/indexes/*.json (derived; rebuildable)

Forgejo integration note:

  • forgejo/README.md

2) Quick verification (local)

From ai-workspace/iftypeset/, run:

./scripts/ci.sh

Confirm it:

  • validates the spec
  • generates a coverage report
  • runs unit tests

If it fails, include the command output in your review.
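
One way to capture that output for attachment (assuming a POSIX shell; the exact command is up to you):

./scripts/ci.sh 2>&1 | tee ci-output.txt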

3) Required reviewer metadata (so we can trust the review)

If you are a human reviewer

  • reviewer_background: 1–2 lines (e.g., “publishing/typography”, “security/GRC”, “docs tooling”).
  • tools_used: list (e.g., Prince, Antenna House, Pandoc, Quarto, LaTeX, Typst, WeasyPrint, Paged.js, DocRaptor).
  • date_utc: ISO 8601.

If you are an LLM reviewer

  • llm_name: provider + model string
  • probable_model: if ambiguous
  • cutoff_date: YYYY-MM or unknown
  • response_date_utc: ISO 8601
  • web_access_used: yes|no
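
For example, an LLM reviewer’s metadata block might look like this (all values are placeholders):

{
  "llm_name": "provider/model-string",
  "probable_model": "unknown",
  "cutoff_date": "2025-06",
  "response_date_utc": "2026-01-05T12:00:00Z",
  "web_access_used": "no"
}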

4) Evaluation rubric (scorecard)

Score each category 0–5 and write 1–3 sentences of justification.

4.1 Product + positioning

  1. Problem clarity (0–5)
    Does this solve a real pain for teams shipping PDFs, beyond “another renderer”?

  2. Differentiation (0–5)
    Is the “rule registry + QA gates + deterministic artifacts” wedge clear and credible vs Pandoc/Quarto/Typst/LaTeX, Prince/Antenna House/WeasyPrint/Vivliostyle/Paged.js, DocRaptor, etc.?

  3. Viability (0–5)
    Is this buildable to a useful v0.1 in weeks (not months) with a small team?

4.1a Content + style (docs/readability)

  1. Docs clarity (0–5)
    Can a new contributor follow README.md and get a useful output quickly?

  2. Spec readability (0–5)
    Are spec/manifest.yaml, spec/profiles/*.yaml, and spec/quality_gates.yaml self-explanatory enough for a reviewer?

  3. Market-facing clarity (0–5)
    If this were shown to a buyer, does it read like a product with a clear contract, or a research project?

4.2 Technical architecture

  1. Spec design (0–5)
    Are rule.schema.json, manifest.yaml, and the profile/gate model coherent and extensible?

  2. Enforcement model (0–5)
    Is the split between lint / typeset / postrender / manual realistic? Are “manual checklist” rules handled honestly?

  3. Determinism strategy (0–5)
    Does the repo clearly define what “deterministic” means (inputs, renderer versions, fonts, outputs)?
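
When scoring item 3, note that “deterministic” is only checkable if the build pins its environment and records it in the artifacts. A sketch of the kind of provenance block a reviewer might look for (field names are hypothetical, not taken from the spec):

{
  "renderer": "<engine name>",
  "renderer_version": "<pinned version>",
  "fonts": [{"family": "<family>", "file": "<path in repo>", "sha256": "<hash>"}],
  "profile": "<profile id>",
  "input_sha256": "<hash>",
  "output_pdf_sha256": "<hash>"
}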

4.3 Rules + content quality

  1. Rule record quality (0–5)
    Do rule records look like paraphrases with pointers (not copied text)? Are IDs/tags/keywords useful?

  2. Coverage strategy (0–5)
    Are we prioritizing the right categories first (numbers/punctuation/citations/layout), and is coverage reporting useful?

4.4 UX / operational usability

  1. CLI ergonomics (0–5)
    Is the CLI spec clear for CI usage (exit codes, JSON artifacts, strictness flags)?

  2. Integration story (0–5)
    Is Forgejo integration plausible and incremental (CSS first, then QA gates)?

4.5 Market viability (compare to existing options)

Rate each 0–5 based on your experience (no need to be exhaustive; avoid vendor hype).

  1. Replace vs complement (0–5)
    Is iftypeset best positioned as a replacement for existing toolchains, or as a QA layer you plug into them?

  2. Who pays first (0–5)
    Does the repo make it clear who would adopt/pay first (docs teams, GRC, legal, research, vendors)?

  3. Defensible wedge (0–5)
    Is “publishing CI with hard QA gates + auditable rule registry” a defensible wedge, or easy for existing tools to add?

5) “Fundamental flaw” checklist (answer explicitly)

Mark each: PASS / RISK / FAIL, with a one-line explanation.

  1. Copyright / licensing risk
    Any sign the repo is storing book text rather than paraphrases + pointers?

  2. Determinism risk
    Are we likely to produce different PDFs across machines/runs due to fonts/renderer drift?

  3. QA gate feasibility
    Are the proposed post-render QA gates realistically implementable, or is this a research project?

  4. Scope creep risk
    Does the plan keep a narrow v0.1 “definition of done”, or is it trying to boil the ocean?

  5. Market reality
    Is there a clear “why buy/use this” vs adopting an existing doc toolchain and living with some ugliness?

5a) Section-by-section ratings (required)

Rate each 0–5 and include 1–2 lines of justification. The goal is to catch “obvious issues” early.

  • README.md: clarity + truthfulness (does it match current behavior?)
  • STATUS.md: accuracy + usefulness (is it a reliable snapshot?)
  • app/ARCHITECTURE.md: coherence + feasibility
  • app/CLI_SPEC.md: completeness + CI friendliness
  • docs/01-demo-acceptance.md: crisp v0.1 target or scope creep?
  • docs/02-competitor-matrix.md: honest + actionable (no wishful marketing)
  • docs/03-rule-ingestion-sop.md: safe + repeatable (avoids copyright drift)
  • docs/04-renderer-strategy.md: realistic adapter plan
  • spec/manifest.yaml: enforceable contracts + degraded mode clarity
  • spec/schema/rule.schema.json: schema quality (strict enough, not brittle)
  • spec/profiles/*.yaml: profiles feel sane, not arbitrary
  • spec/quality_gates.yaml: gates are measurable + meaningful
  • spec/rules/**.ndjson: rule quality (paraphrase + pointer discipline)

6) Deliverables quality (what “good” looks like)

Assess whether the repo is on track to produce, for a single Markdown input:

  • render.html + render.css (deterministic)
  • render.pdf (deterministic given pinned engine/fonts)
  • lint-report.json
  • layout-report.json
  • qa-report.json (pass/fail thresholds)
  • coverage-report.json (rule implementation progress)
  • manual-checklist.md (for rules that cannot be automated)

If you think any of these deliverables are unnecessary or missing, say so.
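
As a concrete reference point, a qa-report.json that delivers “pass/fail thresholds” might look roughly like this (structure is illustrative, not the actual schema):

{
  "profile": "print-first",
  "gates": [
    { "id": "widows", "limit": 0, "found": 2, "status": "fail" },
    { "id": "link-wrap", "limit": 0, "found": 0, "status": "pass" }
  ],
  "status": "fail"
}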

7) Patch suggestions (actionable)

Provide 5–15 suggestions in this format:

  • target: file path(s)
  • problem: 1 sentence
  • change: concrete text/code change (copy/pasteable)
  • why: 1 sentence
  • priority: P0 / P1 / P2
  • confidence: high / medium / low

Preferred patch format

If possible, include unified diffs:

--- a/path/file.md
+++ b/path/file.md
@@
 ...

8) Output template (copy/paste)

Use this structure in your response:

  1. Summary (5–10 bullets)
  2. Scorecard (0–5 each)
  3. Fundamental flaw checklist (PASS/RISK/FAIL)
  4. Top risks (P0/P1)
  5. Patch suggestions (with diffs if possible)
  6. Go / No-Go recommendation for v0.1

9) Important constraint for reviewers

Do not paste verbatim passages from Chicago/Bringhurst into your review output. Use pointers only (e.g., BRING §2.1.8 p32) and describe the issue in your own words.

10) Quick market question (optional, but useful)

If you had to ship “good-looking PDFs with hard QA gates” tomorrow, what would you use today, and why would you still choose iftypeset (or not)?