# External Evaluation Prompt — `iftypeset` (pubstyle)

**Goal:** confirm there is no fundamental flaw (technical, legal, product) and identify obvious issues early.
**Audience:** humans or LLM reviewers.
**Repo root:** `ai-workspace/iftypeset/`

## 0) Context (read this first)

`iftypeset` is a thin, deterministic publishing runtime for **Markdown → HTML → PDF** that adds:

- A **machine‑readable rule registry** (rules are paraphrases only) with **pointer refs** back to primary sources (Chicago / Bringhurst) instead of reproducing book text.
- **Typeset profiles** (screen-first vs print-first vs dense tech, etc.) that map typographic intent into render tokens/CSS.
- **Post‑render QA gates** that can fail builds when layout degrades (widows/orphans/keeps/overflow/link-wrap/numbering issues).

### Non‑negotiables (legal + product)

- Do **not** OCR/transcribe entire books into the repo (copyright). Rules must remain paraphrases with pointers only.
- Source pointers must be sufficient for someone who has the book to find the guidance, without quoting it.
- The runtime must run in constrained environments (e.g., Forgejo PDF export workers) and produce deterministic artifacts.

## 1) What to review (map of the repo)

Start here:

- `README.md`
- `STATUS.md`
- `app/ARCHITECTURE.md`
- `app/CLI_SPEC.md`
- `docs/01-demo-acceptance.md`
- `docs/02-competitor-matrix.md`
- `docs/03-rule-ingestion-sop.md`
- `docs/04-renderer-strategy.md`

Spec + rules:

- `spec/schema/rule.schema.json`
- `spec/manifest.yaml`
- `spec/profiles/*.yaml`
- `spec/quality_gates.yaml`
- `spec/rules/**.ndjson`
- `spec/indexes/*.json` (derived; rebuildable)

Forgejo integration note:

- `forgejo/README.md`

## 2) Quick verification (local)

From `ai-workspace/iftypeset/`, run:

```bash
./scripts/ci.sh
```

Confirm it:

- validates the spec
- generates a coverage report
- runs unit tests

If it fails, include the command output in your review.

## 3) Required reviewer metadata (so we can trust the review)

### If you are a human reviewer

- `reviewer_background`: 1–2 lines (e.g., “publishing/typography”, “security/GRC”, “docs tooling”).
- `tools_used`: list (e.g., Prince, Antenna House, Pandoc, Quarto, LaTeX, Typst, WeasyPrint, Paged.js, DocRaptor).
- `date_utc`: ISO 8601.

### If you are an LLM reviewer

- `llm_name`: provider + model string
- `probable_model`: if ambiguous
- `cutoff_date`: YYYY‑MM or `unknown`
- `response_date_utc`: ISO 8601
- `web_access_used`: `yes|no`

## 4) Evaluation rubric (scorecard)

Score each category 0–5 and write 1–3 sentences of justification.

### 4.1 Product + positioning

1) **Problem clarity (0–5)**
   Does this solve a real pain for teams shipping PDFs, beyond “another renderer”?
2) **Differentiation (0–5)**
   Is the “rule registry + QA gates + deterministic artifacts” wedge clear and credible vs Pandoc/Quarto/Typst/LaTeX, Prince/Antenna House/WeasyPrint/Vivliostyle/Paged.js, DocRaptor, etc.?
3) **Viability (0–5)**
   Is this buildable to a useful v0.1 in weeks (not months) with a small team?

### 4.1a Content + style (docs/readability)

11) **Docs clarity (0–5)**
    Can a new contributor follow `README.md` and get useful output quickly?
12) **Spec readability (0–5)**
    Are `spec/manifest.yaml`, `spec/profiles/*.yaml`, and `spec/quality_gates.yaml` self-explanatory enough for a reviewer?
13) **Market-facing clarity (0–5)**
    If this were shown to a buyer, does it read like a product with a clear contract, or like a research project?

### 4.2 Technical architecture

4) **Spec design (0–5)**
   Are `rule.schema.json`, `manifest.yaml`, and the profile/gate model coherent and extensible?
5) **Enforcement model (0–5)**
   Is the split between `lint` / `typeset` / `postrender` / `manual` realistic? Are “manual checklist” rules handled honestly?
6) **Determinism strategy (0–5)**
   Does the repo clearly define what “deterministic” means (inputs, renderer versions, fonts, outputs)?

### 4.3 Rules + content quality

7) **Rule record quality (0–5)**
   Do rule records look like paraphrases with pointers (not copied text)? Are IDs/tags/keywords useful? (A calibration sketch follows this section.)
8) **Coverage strategy (0–5)**
   Are we prioritizing the right categories first (numbers/punctuation/citations/layout), and is coverage reporting useful?
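For calibration on items 7–8, a healthy rule record might look roughly like the following. This is a hypothetical sketch: every field name and the sample rule are illustrative, and the authoritative shape is whatever `spec/schema/rule.schema.json` actually defines. On disk each record would be a single NDJSON line; it is pretty-printed here for readability.

```json
{
  "id": "NUM-001",
  "category": "numbers",
  "summary": "Spell out whole numbers up to one hundred in running prose; prefer numerals in technical or statistical contexts.",
  "source_refs": ["CMOS17 §9.2"],
  "enforcement": "lint",
  "tags": ["numbers", "prose-style"],
  "keywords": ["spell out", "numerals", "hundred"]
}
```

The two properties to check in real records: the `summary` is a paraphrase (no quoted book text), and the pointer in `source_refs` is specific enough that someone holding the book can find the guidance.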
### 4.4 UX / operational usability

9) **CLI ergonomics (0–5)**
   Is the CLI spec clear for CI usage (exit codes, JSON artifacts, strictness flags)?
10) **Integration story (0–5)**
    Is Forgejo integration plausible and incremental (CSS first, then QA gates)?

### 4.5 Market viability (compare to existing options)

Rate each 0–5 based on *your experience* (no need to be exhaustive; avoid vendor hype).

14) **Replace vs complement (0–5)**
    Is `iftypeset` best positioned as a replacement for existing toolchains, or as a QA layer you plug into them?
15) **Who pays first (0–5)**
    Does the repo make it clear who would adopt/pay first (docs teams, GRC, legal, research, vendors)?
16) **Defensible wedge (0–5)**
    Is “publishing CI with hard QA gates + auditable rule registry” a defensible wedge, or easy for existing tools to add?

## 5) “Fundamental flaw” checklist (answer explicitly)

Mark each: `PASS` / `RISK` / `FAIL`, with a one‑line explanation.

1) **Copyright / licensing risk**
   Any sign the repo is storing book text rather than paraphrases + pointers?
2) **Determinism risk**
   Are we likely to produce different PDFs across machines/runs due to fonts/renderer drift?
3) **QA gate feasibility**
   Are the proposed post-render QA gates realistically implementable, or is this a research project?
4) **Scope creep risk**
   Does the plan keep a narrow v0.1 “definition of done”, or is it trying to boil the ocean?
5) **Market reality**
   Is there a clear “why buy/use this” vs adopting an existing doc toolchain and living with some ugliness?

## 5a) Section-by-section ratings (required)

Rate each **0–5** and include 1–2 lines of justification. The goal is to catch “obvious issues” early.

- `README.md`: clarity + truthfulness (does it match current behavior?)
- `STATUS.md`: accuracy + usefulness (is it a reliable snapshot?)
- `app/ARCHITECTURE.md`: coherence + feasibility
- `app/CLI_SPEC.md`: completeness + CI friendliness
- `docs/01-demo-acceptance.md`: crisp v0.1 target or scope creep?
- `docs/02-competitor-matrix.md`: honest + actionable (no wishful marketing)
- `docs/03-rule-ingestion-sop.md`: safe + repeatable (avoids copyright drift)
- `docs/04-renderer-strategy.md`: realistic adapter plan
- `spec/manifest.yaml`: enforceable contracts + degraded-mode clarity
- `spec/schema/rule.schema.json`: schema quality (strict enough, not brittle)
- `spec/profiles/*.yaml`: profiles feel sane, not arbitrary
- `spec/quality_gates.yaml`: gates are measurable + meaningful
- `spec/rules/**.ndjson`: rule quality (paraphrase + pointer discipline)

## 6) Deliverables quality (what “good” looks like)

Assess whether the repo is on track to produce, for a single Markdown input:

- `render.html` + `render.css` (deterministic)
- `render.pdf` (deterministic *given pinned engine/fonts*)
- `lint-report.json`
- `layout-report.json`
- `qa-report.json` (pass/fail thresholds; an illustrative sketch follows this list)
- `coverage-report.json` (rule implementation progress)
- `manual-checklist.md` (for rules that cannot be automated)

If you think any of these deliverables are unnecessary or missing, say so.
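To make the pass/fail contract concrete, a `qa-report.json` might look roughly like this. It is a hypothetical sketch, not the actual schema: the gate names mirror the issues listed in section 0, but every key and value here is an assumption until checked against `spec/quality_gates.yaml` and `app/CLI_SPEC.md`.

```json
{
  "input": "docs/example.md",
  "profile": "print-first",
  "engine": { "name": "weasyprint", "version": "pinned-in-manifest" },
  "gates": [
    { "id": "widows",   "threshold": 0, "observed": 0, "status": "pass" },
    { "id": "orphans",  "threshold": 0, "observed": 1, "status": "fail" },
    { "id": "overflow", "threshold": 0, "observed": 0, "status": "pass" }
  ],
  "status": "fail"
}
```

The property worth verifying is a single machine-readable top-level `status` that CI can map to an exit code, plus per-gate observations so a failing build is diagnosable without opening the PDF.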
## 7) Patch suggestions (actionable)

Provide 5–15 suggestions in this format:

- `target`: file path(s)
- `problem`: 1 sentence
- `change`: concrete text/code change (copy/pasteable)
- `why`: 1 sentence
- `priority`: P0 / P1 / P2
- `confidence`: high / medium / low

### Preferred patch format

If possible, include unified diffs (a worked example appears in the appendix at the end of this document):

```diff
--- a/path/file.md
+++ b/path/file.md
@@ ...
```

## 8) Output template (copy/paste)

Use this structure in your response:

1) **Summary (5–10 bullets)**
2) **Scorecard (0–5 each)**
3) **Fundamental flaw checklist (PASS/RISK/FAIL)**
4) **Top risks (P0/P1)**
5) **Patch suggestions (with diffs if possible)**
6) **Go / No‑Go recommendation for v0.1**

## 9) Important constraint for reviewers

Do not paste verbatim passages from Chicago/Bringhurst into your review output. Use pointers only (e.g., `BRING §2.1.8 p32`) and describe the issue in your own words.

## 10) Quick market question (optional, but useful)

If you had to ship “good-looking PDFs with hard QA gates” tomorrow, what would you use today, and why would you still choose `iftypeset` (or not)?
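## 11) Appendix: worked patch-suggestion example (illustrative)

A fully hypothetical example of the section 7 format, so reviewers can see the expected granularity. The target file, the problem, and the diff content are invented for illustration only; do not treat them as statements about the actual repo.

- `target`: `README.md`
- `problem`: The quick-start section does not say which renderer version the determinism claim is pinned to.
- `change`: see diff below
- `why`: Determinism claims are only checkable if the pinned engine/font versions are stated where users first read about them.
- `priority`: P1
- `confidence`: medium

```diff
--- a/README.md
+++ b/README.md
@@
 ## Quick start
+
+> Determinism note (hypothetical wording): byte-identical PDFs are only
+> guaranteed with the renderer and font versions pinned in `spec/manifest.yaml`.
```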