External Evaluation Prompt — iftypeset (pubstyle)
Goal: confirm there is no fundamental flaw (technical, legal, product) and identify obvious issues early.
Audience: human or LLM reviewers.
Repo root: ai-workspace/iftypeset/
0) Context (read this first)
iftypeset is a thin, deterministic publishing runtime for Markdown → HTML → PDF that adds:
- A machine‑readable rule registry (rules are paraphrases only) with pointer refs back to primary sources (Chicago / Bringhurst) instead of reproducing book text.
- Typeset profiles (screen-first vs print-first vs dense tech, etc.) that map typographic intent into render tokens/CSS.
- Post‑render QA gates that can fail builds when layout degrades (widows/orphans/keeps/overflow/link-wrap/numbering issues).
Non‑negotiables (legal + product)
- Do not OCR/transcribe entire books into the repo (copyright). Rules must remain paraphrases with pointers only.
- Source pointers must be sufficient for someone who has the book to find the guidance, without quoting it.
- The runtime must be able to run in constrained environments (e.g. Forgejo PDF export workers) and produce deterministic artifacts.
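To make the paraphrase-plus-pointer discipline concrete, here is a hypothetical rule record in the NDJSON style the spec describes. Every field name and the section pointer are illustrative placeholders, not the actual contract (see spec/schema/rule.schema.json for the real schema):

```json
{"id": "num-001", "category": "numbers", "paraphrase": "Spell out small whole numbers in running prose; switch to numerals above the style's threshold.", "source_refs": ["CHI §9.x"], "enforcement": "lint", "tags": ["numbers", "prose"]}
```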
1) What to review (map of the repo)
Start here:
- README.md
- STATUS.md
- app/ARCHITECTURE.md
- app/CLI_SPEC.md
- docs/01-demo-acceptance.md
- docs/02-competitor-matrix.md
- docs/03-rule-ingestion-sop.md
- docs/04-renderer-strategy.md
Spec + rules:
- spec/schema/rule.schema.json
- spec/manifest.yaml
- spec/profiles/*.yaml
- spec/quality_gates.yaml
- spec/rules/**.ndjson
- spec/indexes/*.json (derived; rebuildable)
Forgejo integration note:
forgejo/README.md
2) Quick verification (local)
From ai-workspace/iftypeset/, run:
./scripts/ci.sh
Confirm it:
- validates the spec
- generates a coverage report
- runs unit tests
If it fails, include the command output in your review.
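One low-effort way to attach that output is to capture it while running the script (assuming bash; the log path is arbitrary):

```sh
# Run the CI wrapper and keep a copy of everything it prints (stdout + stderr)
set -o pipefail
./scripts/ci.sh 2>&1 | tee /tmp/iftypeset-ci.log
```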
3) Required reviewer metadata (so we can trust the review)
If you are a human reviewer
- reviewer_background: 1–2 lines (e.g., “publishing/typography”, “security/GRC”, “docs tooling”)
- tools_used: list (e.g., Prince, Antenna House, Pandoc, Quarto, LaTeX, Typst, WeasyPrint, Paged.js, DocRaptor)
- date_utc: ISO 8601
If you are an LLM reviewer
- llm_name: provider + model string
- probable_model: if ambiguous
- cutoff_date: YYYY‑MM or unknown
- response_date_utc: ISO 8601
- web_access_used: yes | no
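Illustrative only, with placeholder values:

```yaml
# Hypothetical metadata block; every value below is a placeholder
llm_name: "ExampleVendor / example-model"
probable_model: "n/a"            # fill in only if the exact model string is ambiguous
cutoff_date: unknown             # or YYYY-MM
response_date_utc: "2025-01-01T00:00:00Z"
web_access_used: no
```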
4) Evaluation rubric (scorecard)
Score each category 0–5 and write 1–3 sentences of justification.
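For example, a scorecard entry can be reported as "Problem clarity: 3/5" followed by 1–3 sentences of justification (the score shown is a placeholder, not a suggested rating).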
4.1 Product + positioning
- Problem clarity (0–5): Does this solve a real pain for teams shipping PDFs, beyond “another renderer”?
- Differentiation (0–5): Is the “rule registry + QA gates + deterministic artifacts” wedge clear and credible vs Pandoc/Quarto/Typst/LaTeX, Prince/Antenna House/WeasyPrint/Vivliostyle/Paged.js, DocRaptor, etc.?
- Viability (0–5): Is this buildable to a useful v0.1 in weeks (not months) with a small team?
4.1a Content + style (docs/readability)
- Docs clarity (0–5): Can a new contributor follow README.md and get a useful output quickly?
- Spec readability (0–5): Are spec/manifest.yaml, spec/profiles/*.yaml, and spec/quality_gates.yaml self-explanatory enough for a reviewer?
- Market-facing clarity (0–5): If this were shown to a buyer, does it read like a product with a clear contract, or like a research project?
4.2 Technical architecture
- Spec design (0–5): Are rule.schema.json, manifest.yaml, and the profile/gate model coherent and extensible?
- Enforcement model (0–5): Is the split between lint / typeset / postrender / manual realistic? Are “manual checklist” rules handled honestly?
- Determinism strategy (0–5): Does the repo clearly define what “deterministic” means (inputs, renderer versions, fonts, outputs)?
4.3 Rules + content quality
- Rule record quality (0–5): Do rule records look like paraphrases with pointers (not copied text)? Are IDs/tags/keywords useful?
- Coverage strategy (0–5): Are we prioritizing the right categories first (numbers/punctuation/citations/layout), and is coverage reporting useful?
4.4 UX / operational usability
- CLI ergonomics (0–5): Is the CLI spec clear for CI usage (exit codes, JSON artifacts, strictness flags)?
- Integration story (0–5): Is Forgejo integration plausible and incremental (CSS first, then QA gates)?
4.5 Market viability (compare to existing options)
Rate each 0–5 based on your experience (no need to be exhaustive; avoid vendor hype).
- Replace vs complement (0–5): Is iftypeset best positioned as a replacement for existing toolchains, or as a QA layer you plug into them?
- Who pays first (0–5): Does the repo make it clear who would adopt/pay first (docs teams, GRC, legal, research, vendors)?
- Defensible wedge (0–5): Is “publishing CI with hard QA gates + auditable rule registry” a defensible wedge, or easy for existing tools to add?
5) “Fundamental flaw” checklist (answer explicitly)
Mark each: PASS / RISK / FAIL, with a one‑line explanation.
- Copyright / licensing risk: Any sign the repo is storing book text rather than paraphrases + pointers?
- Determinism risk: Are we likely to produce different PDFs across machines/runs due to font or renderer drift?
- QA gate feasibility: Are the proposed post-render QA gates realistically implementable, or is this a research project?
- Scope creep risk: Does the plan keep a narrow v0.1 “definition of done”, or is it trying to boil the ocean?
- Market reality: Is there a clear “why buy/use this” versus adopting an existing doc toolchain and living with some ugliness?
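A formatting example (wording invented for illustration, not a finding about this repo): "Determinism risk: RISK - renderer version is pinned, but fonts are resolved from the host system."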
5a) Section-by-section ratings (required)
Rate each 0–5 and include 1–2 lines of justification. The goal is to catch “obvious issues” early.
- README.md: clarity + truthfulness (does it match current behavior?)
- STATUS.md: accuracy + usefulness (is it a reliable snapshot?)
- app/ARCHITECTURE.md: coherence + feasibility
- app/CLI_SPEC.md: completeness + CI friendliness
- docs/01-demo-acceptance.md: crisp v0.1 target or scope creep?
- docs/02-competitor-matrix.md: honest + actionable (no wishful marketing)
- docs/03-rule-ingestion-sop.md: safe + repeatable (avoids copyright drift)
- docs/04-renderer-strategy.md: realistic adapter plan
- spec/manifest.yaml: enforceable contracts + degraded mode clarity
- spec/schema/rule.schema.json: schema quality (strict enough, not brittle)
- spec/profiles/*.yaml: profiles feel sane, not arbitrary
- spec/quality_gates.yaml: gates are measurable + meaningful
- spec/rules/**.ndjson: rule quality (paraphrase + pointer discipline)
6) Deliverables quality (what “good” looks like)
Assess whether the repo is on track to produce, for a single Markdown input:
- render.html + render.css (deterministic)
- render.pdf (deterministic given pinned engine/fonts)
- lint-report.json
- layout-report.json
- qa-report.json (pass/fail thresholds)
- coverage-report.json (rule implementation progress)
- manual-checklist.md (for rules that cannot be automated)
If you think any of these deliverables are unnecessary or missing, say so.
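For orientation, one plausible shape for qa-report.json; the key names and thresholds below are hypothetical, and the authoritative definitions live in spec/quality_gates.yaml and app/CLI_SPEC.md:

```json
{
  "profile": "print-first",
  "gates": [
    { "id": "widows",   "limit": 0, "found": 2, "status": "fail" },
    { "id": "overflow", "limit": 0, "found": 0, "status": "pass" }
  ],
  "status": "fail"
}
```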
7) Patch suggestions (actionable)
Provide 5–15 suggestions in this format:
- target: file path(s)
- problem: 1 sentence
- change: concrete text/code change (copy/pasteable)
- why: 1 sentence
- priority: P0 / P1 / P2
- confidence: high / medium / low
Preferred patch format
If possible, include unified diffs:
--- a/path/file.md
+++ b/path/file.md
@@
...
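A filled-in suggestion might look like the following; the target, problem, and change are invented purely to show the expected level of detail, not actual findings:

```
target: README.md
problem: The quickstart does not say which renderer version and fonts must be pinned for deterministic output.
change: Add a "Determinism prerequisites" subsection listing the pinned renderer version and bundled font set, with the exact commands to verify them.
why: Reviewers cannot reproduce artifacts without knowing the pinned inputs.
priority: P1
confidence: medium
```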
8) Output template (copy/paste)
Use this structure in your response:
- Summary (5–10 bullets)
- Scorecard (0–5 each)
- Fundamental flaw checklist (PASS/RISK/FAIL)
- Top risks (P0/P1)
- Patch suggestions (with diffs if possible)
- Go / No‑Go recommendation for v0.1
9) Important constraint for reviewers
Do not paste verbatim passages from Chicago/Bringhurst into your review output. Use pointers only (e.g., BRING §2.1.8 p32) and describe the issue in your own words.
10) Quick market question (optional, but useful)
If you had to ship “good-looking PDFs with hard QA gates” tomorrow, what would you use today, and why would you still choose iftypeset (or not)?