# External Evaluation Prompt — `iftypeset` (pubstyle)

**Goal:** confirm there is no fundamental flaw (technical, legal, product) and identify obvious issues early.
**Audience:** humans or LLM reviewers.
**Repo root:** `ai-workspace/iftypeset/`

## 0) Context (read this first)

`iftypeset` is a thin, deterministic publishing runtime for **Markdown → HTML → PDF** that adds:

- A **machine‑readable rule registry** (rules are paraphrases only) with **pointer refs** back to primary sources (Chicago / Bringhurst) instead of reproducing book text.
- **Typeset profiles** (screen-first vs print-first vs dense tech, etc.) that map typographic intent into render tokens/CSS.
- **Post‑render QA gates** that can fail builds when layout degrades (widows/orphans/keeps/overflow/link-wrap/numbering issues).

### Non‑negotiables (legal + product)

- Do **not** OCR/transcribe entire books into the repo (copyright). Rules must remain paraphrases with pointers only.
- Source pointers must be sufficient for someone who has the book to find the guidance, without quoting it.
- The runtime must run in constrained environments (e.g., Forgejo PDF export workers) and produce deterministic artifacts.

## 1) What to review (map of the repo)

Start here:

- `README.md`
- `STATUS.md`
- `app/ARCHITECTURE.md`
- `app/CLI_SPEC.md`
- `docs/01-demo-acceptance.md`
- `docs/02-competitor-matrix.md`
- `docs/03-rule-ingestion-sop.md`
- `docs/04-renderer-strategy.md`

Spec + rules:

- `spec/schema/rule.schema.json`
- `spec/manifest.yaml`
- `spec/profiles/*.yaml`
- `spec/quality_gates.yaml`
- `spec/rules/**.ndjson`
- `spec/indexes/*.json` (derived; rebuildable)

Forgejo integration note:

- `forgejo/README.md`

## 2) Quick verification (local)

From `ai-workspace/iftypeset/`, run:

```bash
./scripts/ci.sh
```

Confirm it:

- validates the spec
- generates a coverage report
- runs unit tests

If it fails, include the command output in your review.

## 3) Required reviewer metadata (so we can trust the review)

### If you are a human reviewer

- `reviewer_background`: 1–2 lines (e.g., “publishing/typography”, “security/GRC”, “docs tooling”).
- `tools_used`: list (e.g., Prince, Antenna House, Pandoc, Quarto, LaTeX, Typst, WeasyPrint, Paged.js, DocRaptor).
- `date_utc`: ISO 8601.

### If you are an LLM reviewer

- `llm_name`: provider + model string
- `probable_model`: if ambiguous
- `cutoff_date`: YYYY‑MM or `unknown`
- `response_date_utc`: ISO 8601
- `web_access_used`: `yes|no`

## 4) Evaluation rubric (scorecard)

Score each category 0–5 and write 1–3 sentences of justification.

### 4.1 Product + positioning

1) **Problem clarity (0–5)**
   Does this solve a real pain for teams shipping PDFs, beyond “another renderer”?
2) **Differentiation (0–5)**
   Is the “rule registry + QA gates + deterministic artifacts” wedge clear and credible vs Pandoc/Quarto/Typst/LaTeX, Prince/Antenna House/WeasyPrint/Vivliostyle/Paged.js, DocRaptor, etc.?
3) **Viability (0–5)**
   Is this buildable to a useful v0.1 in weeks (not months) with a small team?

### 4.1a Content + style (docs/readability)

11) **Docs clarity (0–5)**
    Can a new contributor follow `README.md` and get useful output quickly?
12) **Spec readability (0–5)**
    Are `spec/manifest.yaml`, `spec/profiles/*.yaml`, and `spec/quality_gates.yaml` self-explanatory enough for a reviewer?
13) **Market-facing clarity (0–5)**
    If this were shown to a buyer, does it read like a product with a clear contract, or like a research project?

### 4.2 Technical architecture

4) **Spec design (0–5)**
   Are `rule.schema.json`, `manifest.yaml`, and the profile/gate model coherent and extensible?
5) **Enforcement model (0–5)**
   Is the split between `lint` / `typeset` / `postrender` / `manual` realistic? Are “manual checklist” rules handled honestly?
6) **Determinism strategy (0–5)**
   Does the repo clearly define what “deterministic” means (inputs, renderer versions, fonts, outputs)?

### 4.3 Rules + content quality

7) **Rule record quality (0–5)**
   Do rule records look like paraphrases with pointers (not copied text)? Are IDs/tags/keywords useful? (A calibration sketch follows this section.)
8) **Coverage strategy (0–5)**
   Are we prioritizing the right categories first (numbers/punctuation/citations/layout), and is coverage reporting useful?
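For calibration on items 7–8, a healthy rule record might look roughly like the following. This is a hypothetical sketch: every field name and the sample rule are illustrative, and the authoritative shape is whatever `spec/schema/rule.schema.json` actually defines. On disk each record would be a single NDJSON line; it is pretty-printed here for readability.

```json
{
  "id": "NUM-001",
  "category": "numbers",
  "summary": "Spell out whole numbers up to one hundred in running prose; prefer numerals in technical or statistical contexts.",
  "source_refs": ["CMOS17 §9.2"],
  "enforcement": "lint",
  "tags": ["numbers", "prose-style"],
  "keywords": ["spell out", "numerals", "hundred"]
}
```

The two properties to check in real records: the `summary` is a paraphrase (no quoted book text), and the pointer in `source_refs` is specific enough that someone holding the book can find the guidance.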
### 4.4 UX / operational usability

9) **CLI ergonomics (0–5)**
   Is the CLI spec clear for CI usage (exit codes, JSON artifacts, strictness flags)?
10) **Integration story (0–5)**
    Is Forgejo integration plausible and incremental (CSS first, then QA gates)?

### 4.5 Market viability (compare to existing options)

Rate each 0–5 based on *your experience* (no need to be exhaustive; avoid vendor hype).

14) **Replace vs complement (0–5)**
    Is `iftypeset` best positioned as a replacement for existing toolchains, or as a QA layer you plug into them?
15) **Who pays first (0–5)**
    Does the repo make it clear who would adopt/pay first (docs teams, GRC, legal, research, vendors)?
16) **Defensible wedge (0–5)**
    Is “publishing CI with hard QA gates + auditable rule registry” a defensible wedge, or easy for existing tools to add?

## 5) “Fundamental flaw” checklist (answer explicitly)

Mark each: `PASS` / `RISK` / `FAIL`, with a one‑line explanation.

1) **Copyright / licensing risk**
   Any sign the repo is storing book text rather than paraphrases + pointers?
2) **Determinism risk**
   Are we likely to produce different PDFs across machines/runs due to fonts/renderer drift?
3) **QA gate feasibility**
   Are the proposed post-render QA gates realistically implementable, or is this a research project?
4) **Scope creep risk**
   Does the plan keep a narrow v0.1 “definition of done”, or is it trying to boil the ocean?
5) **Market reality**
   Is there a clear “why buy/use this” vs adopting an existing doc toolchain and living with some ugliness?

## 5a) Section-by-section ratings (required)

Rate each **0–5** and include 1–2 lines of justification. The goal is to catch “obvious issues” early.

- `README.md`: clarity + truthfulness (does it match current behavior?)
- `STATUS.md`: accuracy + usefulness (is it a reliable snapshot?)
- `app/ARCHITECTURE.md`: coherence + feasibility
- `app/CLI_SPEC.md`: completeness + CI friendliness
- `docs/01-demo-acceptance.md`: crisp v0.1 target or scope creep?
- `docs/02-competitor-matrix.md`: honest + actionable (no wishful marketing)
- `docs/03-rule-ingestion-sop.md`: safe + repeatable (avoids copyright drift)
- `docs/04-renderer-strategy.md`: realistic adapter plan
- `spec/manifest.yaml`: enforceable contracts + degraded-mode clarity
- `spec/schema/rule.schema.json`: schema quality (strict enough, not brittle)
- `spec/profiles/*.yaml`: profiles feel sane, not arbitrary
- `spec/quality_gates.yaml`: gates are measurable + meaningful
- `spec/rules/**.ndjson`: rule quality (paraphrase + pointer discipline)

## 6) Deliverables quality (what “good” looks like)

Assess whether the repo is on track to produce, for a single Markdown input:

- `render.html` + `render.css` (deterministic)
- `render.pdf` (deterministic *given pinned engine/fonts*)
- `lint-report.json`
- `layout-report.json`
- `qa-report.json` (pass/fail thresholds; an illustrative sketch follows this list)
- `coverage-report.json` (rule implementation progress)
- `manual-checklist.md` (for rules that cannot be automated)

If you think any of these deliverables are unnecessary or missing, say so.
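To make the pass/fail contract concrete, a `qa-report.json` might look roughly like this. It is a hypothetical sketch, not the actual schema: the gate names mirror the issues listed in section 0, but every key and value here is an assumption until checked against `spec/quality_gates.yaml` and `app/CLI_SPEC.md`.

```json
{
  "input": "docs/example.md",
  "profile": "print-first",
  "engine": { "name": "weasyprint", "version": "pinned-in-manifest" },
  "gates": [
    { "id": "widows",   "threshold": 0, "observed": 0, "status": "pass" },
    { "id": "orphans",  "threshold": 0, "observed": 1, "status": "fail" },
    { "id": "overflow", "threshold": 0, "observed": 0, "status": "pass" }
  ],
  "status": "fail"
}
```

The property worth verifying is a single machine-readable top-level `status` that CI can map to an exit code, plus per-gate observations so a failing build is diagnosable without opening the PDF.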
## 7) Patch suggestions (actionable)

Provide 5–15 suggestions in this format:

- `target`: file path(s)
- `problem`: 1 sentence
- `change`: concrete text/code change (copy/pasteable)
- `why`: 1 sentence
- `priority`: P0 / P1 / P2
- `confidence`: high / medium / low

### Preferred patch format

If possible, include unified diffs (a worked example appears in the appendix at the end of this document):

```diff
--- a/path/file.md
+++ b/path/file.md
@@ ...
```

## 8) Output template (copy/paste)

Use this structure in your response:

1) **Summary (5–10 bullets)**
2) **Scorecard (0–5 each)**
3) **Fundamental flaw checklist (PASS/RISK/FAIL)**
4) **Top risks (P0/P1)**
5) **Patch suggestions (with diffs if possible)**
6) **Go / No‑Go recommendation for v0.1**

## 9) Important constraint for reviewers

Do not paste verbatim passages from Chicago/Bringhurst into your review output. Use pointers only (e.g., `BRING §2.1.8 p32`) and describe the issue in your own words.

## 10) Quick market question (optional, but useful)

If you had to ship “good-looking PDFs with hard QA gates” tomorrow, what would you use today, and why would you still choose `iftypeset` (or not)?
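## 11) Appendix: worked patch-suggestion example (illustrative)

A fully hypothetical example of the section 7 format, so reviewers can see the expected granularity. The target file, the problem, and the diff content are invented for illustration only; do not treat them as statements about the actual repo.

- `target`: `README.md`
- `problem`: The quick-start section does not say which renderer version the determinism claim is pinned to.
- `change`: see diff below
- `why`: Determinism claims are only checkable if the pinned engine/font versions are stated where users first read about them.
- `priority`: P1
- `confidence`: medium

```diff
--- a/README.md
+++ b/README.md
@@
 ## Quick start
+
+> Determinism note (hypothetical wording): byte-identical PDFs are only
+> guaranteed with the renderer and font versions pinned in `spec/manifest.yaml`.
```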