Add Mermaid preflight + Dave Factor callouts

This commit is contained in:
danny 2025-12-25 10:22:27 +00:00
parent 3da30594eb
commit 4dbda0209e
9 changed files with 623 additions and 2 deletions


@@ -18,6 +18,15 @@ PYTHONPATH=src python3 -m revoice generate \
--output examples/ai-code-guardrails/AI-Code-Guardrails.shadow.dave.md
```
Preflight the generated Markdown for PDF export (auto-fix Mermaid + lint):
```bash
PYTHONPATH=src python3 -m revoice preflight \
--style if.dave.v1.2 \
--input examples/ai-code-guardrails/AI-Code-Guardrails.shadow.dave.md \
--source examples/ai-code-guardrails/AI-Code-Guardrails.pdf
```
Or install the CLI locally:
```bash


@@ -74,6 +74,16 @@ Run a deterministic linter per bible:
If lint fails: auto-repair pass (LLM) or return “needs revision” with lint report.
### 5b) Mermaid preflight (PDF export reliability)
If the output includes Mermaid diagrams, run a preflight pass before PDF export:
- auto-heal Mermaid blocks (quote labels, normalize headers, balance `subgraph/end`)
- validate Mermaid rendering in the same runtime used by the PDF exporter
In `re-voice`, this is exposed as:
`revoice preflight --style <style> --input <output.md> --source <source-doc>`
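The `subgraph/end` balancing step above amounts to a depth counter over lines; a minimal Python sketch of the idea (illustrative only, not the shipped implementation):

```python
import re

def balance_subgraphs(code: str) -> str:
    """Append missing `end` lines so every `subgraph` is closed."""
    depth = 0
    lines = code.splitlines()
    for line in lines:
        if re.search(r"\bsubgraph\b", line, re.I):
            depth += 1
        elif re.match(r"^\s*end\s*$", line, re.I):
            depth = max(0, depth - 1)
    return "\n".join(lines + ["end"] * depth)
```

Already-balanced diagrams pass through unchanged; an unterminated `subgraph` gains exactly one trailing `end`.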
### 6) Export + publishing
Outputs:
@@ -89,4 +99,3 @@ Publishing strategy:
- Run extraction/OCR in a sandboxed worker (CPU/mem/time limits).
- Never store API keys in repos; use env/secret manager.
- Keep an audit trail: source hash → extracted text hash → output hash → model/prompt hashes.
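An audit-trail record can be as small as one SHA-256 per artifact in the chain; a minimal Python sketch (the record shape here is hypothetical):

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def audit_record(source: bytes, extracted: str, output: str, model_prompt: str) -> dict:
    """One row of the audit trail: a hash for every pipeline artifact."""
    return {
        "source_sha256": sha256_hex(source),
        "extracted_sha256": sha256_hex(extracted.encode("utf-8")),
        "output_sha256": sha256_hex(output.encode("utf-8")),
        "model_prompt_sha256": sha256_hex(model_prompt.encode("utf-8")),
    }
```

Storing only hashes keeps the trail verifiable without retaining source documents or prompts themselves.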


@@ -25,12 +25,21 @@ PYTHONPATH=src python3 -m revoice generate \
--input examples/ai-code-guardrails/AI-Code-Guardrails.pdf \
--output examples/ai-code-guardrails/AI-Code-Guardrails.shadow.dave.md
PYTHONPATH=src python3 -m revoice preflight \
--style if.dave.v1.2 \
--input examples/ai-code-guardrails/AI-Code-Guardrails.shadow.dave.md \
--source examples/ai-code-guardrails/AI-Code-Guardrails.pdf
PYTHONPATH=src python3 -m revoice lint \
--style if.dave.v1.2 \
--input examples/ai-code-guardrails/AI-Code-Guardrails.shadow.dave.md \
--source examples/ai-code-guardrails/AI-Code-Guardrails.pdf
```
Mermaid tooling:
- Self-heal script: `tools/mermaid/mermaid-self-heal.js`
- Forgejo-worker validator: `tools/mermaid/mermaid-validate-worker.js` (requires the PDF worker runtime)
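Both tools locate diagrams the same way, by scanning for `mermaid` code fences; roughly, in Python (a sketch assuming well-formed triple-backtick fences):

```python
import re

# Non-greedy match between an opening ```mermaid fence and the next closing fence.
MERMAID_FENCE = re.compile(r"```mermaid\s*\n(.*?)```", re.S)

def mermaid_blocks(markdown: str) -> list[str]:
    """Return the body of every mermaid fenced block in a Markdown string."""
    return MERMAID_FENCE.findall(markdown)
```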
## Applying the stack to the full InfraFabric dossier
Source (huge; ~1MB / ~22k lines):
@@ -46,4 +55,3 @@ Recommended approach (don't paste the whole file into chats):
Implementation note:
- To support the dossier properly, `revoice` should add a Markdown-aware section parser (split by headings, preserve code fences) and optionally an LLM-backed rewriter for “full rewrite mode.”
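A heading-aware splitter that ignores `#` lines inside code fences could start as small as this (illustrative sketch; not part of `revoice` yet):

```python
def split_sections(markdown: str) -> list[tuple[str, str]]:
    """Split Markdown into (heading, body) pairs, ignoring '#' inside code fences."""
    sections: list[tuple[str, str]] = []
    heading, body, in_fence = "", [], False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # toggle on every fence marker
        if not in_fence and line.startswith("#"):
            if heading or body:
                sections.append((heading, "\n".join(body)))
            heading, body = line, []
        else:
            body.append(line)
    if heading or body:
        sections.append((heading, "\n".join(body)))
    return sections
```

A full implementation would also track fence info-strings and nested heading levels, but the fence toggle is the part that keeps code samples out of the section logic.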


@@ -53,6 +53,9 @@ We fully support focusing guardrails at the pull request stage, because it creat
It also provides a structurally safe venue for accountability theater: findings can be surfaced, tracked, and re-litigated in perpetuity while timelines remain subject to stakeholder alignment.
If anything goes sideways, we can always point to the PR thread and note that it was reviewed with deep seriousness at 4:55 PM on a Friday.
> **The Dave Factor:** Exceptions become the default pathway, because the policy is strict and the deadline is real.
> **Countermeasure:** Define merge-blocking thresholds, time-box every exception, and make expiry automatic.
### InfraFabric Red Team Diagram (Inferred)
```mermaid
@@ -76,6 +79,9 @@ Shifting left is directionally aligned with best practices, provided we define l
In practice, IDE scanning creates fast feedback loops, and agentic workflows can be covered via a local MCP server, which is excellent because it allows us to say "continuous" without committing to blocking.
We recommend a pilot cohort, a slide deck, and an FAQ, so the shift remains culturally reversible.
> **The Dave Factor:** "Shift left" becomes "optional left," which means the same issues arrive later with better excuses.
> **Countermeasure:** Gate on local scan signals where possible (or require attestations that are actually checked).
### InfraFabric Red Team Diagram (Inferred)
```mermaid
@@ -98,6 +104,9 @@ Requiring proof of local testing is a lightweight enablement workflow that conve
Screenshots are particularly helpful because they are high-effort to verify and low-fidelity to audit, which preserves the timeless corporate principle that visibility should be proportional to comfort.
Once the screenshot is uploaded, it can be stored in a folder with a robust heritage naming convention and a retention policy of "until the heat death of the universe."
> **The Dave Factor:** Screenshots are compliance theater: easy to collect, hard to verify, and immortal in shared drives.
> **Countermeasure:** Prefer verifiable telemetry (scan events) over images, and pause access when signals go dark.
### InfraFabric Red Team Diagram (Inferred)
```mermaid
@@ -128,6 +137,9 @@ Periodic audits are a strong mechanism for discovering that the rollout has alre
A centralized dashboard with adoption signals allows us to produce a KPI trend line that looks decisive while still leaving room for interpretation, follow-ups, and iterative enablement.
If the dashboard ever shows a red triangle, we can immediately form the Committee for the Preservation of the Committee and begin the healing process.
> **The Dave Factor:** Dashboards become a KPI trend, and KPIs become a calendar invite.
> **Countermeasure:** Tie the dashboard to explicit SLOs and a remediation loop with owners and deadlines.
### InfraFabric Red Team Diagram (Inferred)
```mermaid
@@ -149,6 +161,9 @@ Security awareness training is the perfect control because it is both necessary
A short quiz provides a durable compliance narrative: we can demonstrate investment in education, capture attestations, and schedule refreshers whenever the organization needs to signal seriousness.
The goal is not mastery; the goal is a completion certificate that can be forwarded to leadership with the subject line "Progress Update."
> **The Dave Factor:** Completion certificates are treated as controls, even when behavior doesn't change.
> **Countermeasure:** Add a practical gate (local scan + PR checks) so training is support, not the defense.
### InfraFabric Red Team Diagram (Inferred)
```mermaid
@@ -179,6 +194,9 @@ Tying access to secure configurations creates scalable guardrails, assuming we k
Endpoint management and dev container baselines let us gate assistants behind prerequisites, ideally in a way that can be described as enablement rather than blocking for cultural compatibility.
This is the "not my job" routing protocol, except the router is policy and the destination is an alignment session.
> **The Dave Factor:** Access controls drift into "enablement," and enablement drifts into "we made a wiki."
> **Countermeasure:** Make prerequisites machine-checkable and make exceptions expire by default.
### InfraFabric Red Team Diagram (Inferred)
```mermaid
@@ -217,6 +235,9 @@ The path forward is to treat guardrails as an operational capability, not a one-
With the right sequencing, we can build trust, reduce friction, and maintain the strategic option value of circling back when timelines become emotionally complex.
Secure innovation is not just possible; it is operational, provided we align on what "operational" means in Q3.
> **The Dave Factor:** Pilots persist indefinitely because "graduation criteria" were never aligned.
> **Countermeasure:** Publish rollout milestones and a stop condition that cannot be reframed as iteration.
### InfraFabric Red Team Diagram (Inferred)
```mermaid


@@ -1,13 +1,30 @@
from __future__ import annotations
import argparse
import subprocess
import sys
from pathlib import Path
from .extract import extract_text
from .generate import generate_shadow_dossier
from .lint import lint_markdown, lint_markdown_with_source
def _repo_root() -> Path:
return Path(__file__).resolve().parents[2]
def _run(cmd: list[str]) -> None:
subprocess.run(cmd, check=True)
def _mermaid_self_heal(paths: list[str]) -> None:
script = _repo_root() / "tools" / "mermaid" / "mermaid-self-heal.js"
if not script.exists():
raise RuntimeError(f"Missing Mermaid self-heal script: {script}")
_run(["node", str(script), *paths])
def _build_parser() -> argparse.ArgumentParser:
parser = argparse.ArgumentParser(prog="revoice")
sub = parser.add_subparsers(dest="cmd", required=True)
@@ -26,6 +43,15 @@ def _build_parser() -> argparse.ArgumentParser:
lint_p.add_argument("--input", required=True, help="Path to markdown file")
lint_p.add_argument("--source", required=False, help="Optional source document to allow source emojis")
mermaid_p = sub.add_parser("mermaid-fix", help="Auto-fix Mermaid blocks in Markdown (in-place)")
mermaid_p.add_argument("--input", nargs="+", required=True, help="Markdown file(s) or directories")
preflight_p = sub.add_parser("preflight", help="Mermaid-fix + lint a dossier (in-place)")
preflight_p.add_argument("--style", required=True, help="Style id (e.g. if.dave.v1.2)")
preflight_p.add_argument("--input", required=True, help="Path to markdown file (edited in-place)")
preflight_p.add_argument("--source", required=False, help="Optional source document to allow source emojis")
preflight_p.add_argument("--skip-mermaid-fix", action="store_true", help="Skip Mermaid auto-fix step")
return parser
@@ -65,6 +91,29 @@ def main(argv: list[str] | None = None) -> int:
return 2
return 0
if args.cmd == "mermaid-fix":
_mermaid_self_heal(args.input)
return 0
if args.cmd == "preflight":
if not args.skip_mermaid_fix:
_mermaid_self_heal([args.input])
with open(args.input, "r", encoding="utf-8") as f:
md = f.read()
if args.source:
source_text = extract_text(args.source)
issues = lint_markdown_with_source(style_id=args.style, markdown=md, source_text=source_text)
else:
issues = lint_markdown(style_id=args.style, markdown=md)
if issues:
for issue in issues:
print(f"- {issue}", file=sys.stderr)
return 2
return 0
raise RuntimeError(f"Unhandled cmd: {args.cmd}")


@@ -338,6 +338,62 @@ def _render_inferred_diagram(title: str) -> str | None:
)
def _render_dave_factor_callout(section: _SourceSection) -> str | None:
title_upper = section.title.upper()
excerpt = f"{section.title}\n{section.why_it_matters or ''}\n{section.body}".strip()
if "PULL REQUEST" in title_upper:
return "\n".join(
[
"> **The Dave Factor:** Exceptions become the default pathway, because the policy is strict and the deadline is real.",
"> **Countermeasure:** Define merge-blocking thresholds, time-box every exception, and make expiry automatic.",
]
)
if "SHIFTING LEFT" in title_upper:
return "\n".join(
[
'> **The Dave Factor:** "Shift left" becomes "optional left," which means the same issues arrive later with better excuses.',
"> **Countermeasure:** Gate on local scan signals where possible (or require attestations that are actually checked).",
]
)
if "REQUEST EVIDENCE" in title_upper or _has(excerpt, "access request", "screenshot"):
return "\n".join(
[
"> **The Dave Factor:** Screenshots are compliance theater: easy to collect, hard to verify, and immortal in shared drives.",
"> **Countermeasure:** Prefer verifiable telemetry (scan events) over images, and pause access when signals go dark.",
]
)
if "AUDIT" in title_upper or _has(excerpt, "usage reports", "periodic audits"):
return "\n".join(
[
"> **The Dave Factor:** Dashboards become a KPI trend, and KPIs become a calendar invite.",
"> **Countermeasure:** Tie the dashboard to explicit SLOs and a remediation loop with owners and deadlines.",
]
)
if "TRAINING" in title_upper or _has(excerpt, "snyk learn", "owasp", "quiz"):
return "\n".join(
[
"> **The Dave Factor:** Completion certificates are treated as controls, even when behavior doesn't change.",
"> **Countermeasure:** Add a practical gate (local scan + PR checks) so training is support, not the defense.",
]
)
if "ACCESS CONTROL" in title_upper or _has(excerpt, "endpoint management", "prerequisites", "extensions"):
return "\n".join(
[
'> **The Dave Factor:** Access controls drift into "enablement," and enablement drifts into "we made a wiki."',
"> **Countermeasure:** Make prerequisites machine-checkable and make exceptions expire by default.",
]
)
if _has(title_upper, "PATH FORWARD") or _has(excerpt, "secure innovation", "talk to our team"):
return "\n".join(
[
'> **The Dave Factor:** Pilots persist indefinitely because "graduation criteria" were never aligned.',
"> **Countermeasure:** Publish rollout milestones and a stop condition that cannot be reframed as iteration.",
]
)
return None
def _render_intro(section: _SourceSection) -> str:
lines = [ln.strip() for ln in section.body.splitlines() if ln.strip()]
tagline = "\n".join(lines[:7]).strip() if lines else ""
@@ -434,6 +490,10 @@ def _render_section(section: _SourceSection) -> str:
out.extend(paragraphs)
callout = _render_dave_factor_callout(section)
if callout:
out.extend(["", callout])
inferred = _render_inferred_diagram(section.title)
if inferred:
out.extend(["", inferred])


@@ -120,6 +120,19 @@ Preferred comedic motifs (use sparingly, but use them):
- “Let's take this offline” as a routing protocol
- “Job security engine” and “Return on Inaction (ROI)”
- “Committee for the Preservation of the Committee”
- “Visibility is liability” (opacity as a feature)
- “The Shaggy Defense” (“It wasn't me”) as governance strategy
- “Hot potato routing” (push blame across teams)
## 5b) Red Team callout template (keep it short)
Inside each mirrored source section, include at most one small callout:
> **The Dave Factor:** If this section is softened into comfort language, what becomes untestable? What minimal artifact (owner + deadline + acceptance test, or trace/bundle/verifier step) prevents that dilution?
Optional second line (only if it adds value):
> **Countermeasure:** Name the control, the gate (PR/CI/access), and the explicit “stop condition” that Dave cannot reframe as “iteration.”
---


@@ -0,0 +1,315 @@
#!/usr/bin/env node
/**
* Mermaid Self-Healing Pipeline (user-provided "95%+ reliability" edition)
*
* Usage:
* node tools/mermaid/mermaid-self-heal.js <file-or-dir> [...]
*
* Notes:
* - Edits Markdown files in-place, rewriting ```mermaid fences.
* - If `mmdc` (mermaid-cli) is available in PATH, it is used for validation.
* - If `mmdc` is missing, the script still applies repairs but skips validation.
*/
const fs = require("fs");
const path = require("path");
const os = require("os");
const { execSync } = require("child_process");
const SHAPES = [
"\\[\\[([^\\]]+)\\]\\]", // subroutine [[x]]
"\\[\\(([^\\)]+)\\)\\]", // cylinder / database [(x)]
"\\(\\[([^\\]]+)\\]\\)", // stadium ([x])
"\\(\\(([^\\)]+)\\)\\)", // circle ((x))
"\\{\\{([^\\}]+)\\}\\}", // hexagon {{x}}
"\\{([^\\}]+)\\}", // diamond / decision {x}
"\\[\\/([^\\]]+)\\/\\]", // parallelogram [/x/]
"\\[\\\\([^\\]]+)\\\\\\]", // alt parallelogram [\x\]
"\\[([^\\]]+)\\]", // rectangle [x] (default; listed last so specific shapes match first)
];
const SHAPE_REGEX = new RegExp(SHAPES.map((s) => `(${s})`).join("|"));
function sanitizeAndNormalize(raw) {
let code =
String(raw || "")
.replace(/[\u00A0\u200B\u200E\uFEFF\u2060]/g, "") // invisible
.replace(/\r\n?/g, "\n")
.replace(/\t/g, " ")
.trim() + "\n";
// Force header to very first line
const lines = code.split("\n");
const firstContent = lines.findIndex((l) => l.trim());
if (firstContent > 0) {
const header = lines.splice(firstContent, 1)[0];
lines.unshift(header.trim());
code = lines.join("\n");
}
return code;
}
function forceValidId(id) {
if (/^[A-Za-z_][A-Za-z0-9_]*$/.test(id)) return id;
let clean = String(id || "")
.replace(/[^A-Za-z0-9_]/g, "_")
.replace(/^_+/, "")
.replace(/_+$/, "");
if (!clean) clean = "node";
if (/^\d/.test(clean)) clean = "_" + clean;
return clean;
}
function quoteLabel(label) {
const s = String(label || "");
if (!s.includes("\n") && /^[\w\s.,\-–—]+$/.test(s) && !/[":|]/.test(s)) return s;
return `"${s.replace(/"/g, "#34;").replace(/\n/g, "\\n")}"`;
}
function repairNodesAndLabels(code) {
// First pass fix IDs
code = code.replace(/^(\s*)([^\s\[\](){}]+)(\s*[[\](){}])/gm, (_m, indent, id, shape) => {
return `${indent}${forceValidId(id)}${shape}`;
});
// Second pass quote shape labels (correctly) for common node syntaxes.
const esc = (s) => String(s || "").replace(/"/g, "#34;").replace(/\n/g, "\\n");
const alreadyQuoted = (s) => {
const t = String(s || "").trim();
return t.length >= 2 && t.startsWith('"') && t.endsWith('"');
};
// [label] — plain rectangles only; skip [[subroutine]], [(cylinder)], [/parallelogram/], [\alt\]
code = code.replace(/(\b[^\s\[\](){}]+)\[(?![\[\(\/\\])([^\]\n]*)\]/g, (_m, id, label) => {
if (alreadyQuoted(label)) return `${id}[${label}]`;
return `${id}["${esc(label)}"]`;
});
return code;
}
function detectType(code) {
const first = String(code || "").split("\n", 1)[0].toLowerCase();
if (first.includes("sequencediagram")) return "sequence";
if (first.includes("classdiagram")) return "class";
if (first.includes("statediagram")) return "state";
if (first.includes("gantt")) return "gantt";
if (first.includes("erdiagram")) return "er";
if (first.includes("pie")) return "pie";
if (first.includes("gitgraph")) return "gitgraph";
if (first.includes("mindmap")) return "mindmap";
if (first.includes("timeline")) return "timeline";
if (first.includes("quadrantchart")) return "quadrantchart";
if (first.includes("xychart")) return "xychart";
return "flowchart";
}
function sequenceSpecificFixes(code) {
const participants = new Set();
const participantLines = [];
const lines = String(code || "").split("\n");
const cleaned = [];
for (let line of lines) {
const pl = line.match(/^\s*participant\s+(.+)/i);
if (pl) {
const [rawId, alias] = pl[1].split(/\s+as\s+/i);
const id = forceValidId(rawId.trim());
participants.add(id);
participantLines.push(alias ? `participant ${id} as ${alias.trim()}` : `participant ${id}`);
} else {
cleaned.push(line);
}
}
// Re-inject participants at top
let result = [...participantLines, ...cleaned].join("\n");
// Balance alt/loop/par/opt/critical/rect
const blocks = ["alt", "loop", "par", "opt", "critical", "rect"]; // `else` continues an alt/opt, so it must not add depth; `rect` may carry any color
let stack = [];
for (let line of result.split("\n")) {
const trimmed = line.trim();
if (blocks.some((b) => trimmed === b || trimmed.startsWith(b + " "))) stack.push(trimmed.split(" ")[0]);
if (trimmed === "end") {
if (stack.length) stack.pop();
}
}
while (stack.length) {
result += "\nend";
stack.pop();
}
return result;
}
function balanceSubgraphs(code) {
let depth = 0;
const lines = String(code || "").split("\n");
const result = [];
for (let line of lines) {
if (/\bsubgraph\b/i.test(line)) depth++;
if (/\bend\b/i.test(line)) depth = Math.max(0, depth - 1);
result.push(line);
}
while (depth-- > 0) result.push("end");
return result.join("\n");
}
function ensureHeaderAtTop(code) {
const lines = String(code || "").replace(/\r\n?/g, "\n").split("\n");
const headerRe =
/^(flowchart|graph|sequenceDiagram|classDiagram|stateDiagram(?:-v2)?|gantt|ganttChart|erDiagram|pie|gitgraph|mindmap|timeline|quadrantChart|xychart-beta|xychart)\b/i;
const isInit = (l) => String(l || "").trim().startsWith("%%{");
const initLine = lines.length > 0 && isInit(lines[0]) ? String(lines[0] || "").trim() : null;
let headerIdx = -1;
for (let i = initLine ? 1 : 0; i < lines.length; i++) {
const t = String(lines[i] || "").trim();
if (headerRe.test(t)) {
headerIdx = i;
break;
}
}
let headerLine = headerIdx >= 0 ? String(lines[headerIdx] || "").trim() : "flowchart TD";
headerLine = headerLine.replace(/^graph\b/i, "flowchart");
if (/^flowchart\b/i.test(headerLine) && !/\b(LR|RL|TD|TB|BT)\b/i.test(headerLine)) {
headerLine = "flowchart TD";
}
const out = [];
if (initLine) out.push(initLine);
out.push(headerLine);
for (let i = 0; i < lines.length; i++) {
if (initLine && i === 0) continue;
if (headerIdx === i) continue;
const l = String(lines[i] || "");
if (!l.trim()) continue;
out.push(l);
}
return out.join("\n").trim() + "\n";
}
function selfHealMermaid(block) {
let code = ensureHeaderAtTop(sanitizeAndNormalize(block));
const t = detectType(code);
if (t === "flowchart") {
code = repairNodesAndLabels(code);
code = balanceSubgraphs(code);
}
// Final normalisation
code = code.replace(/-\s+->/g, "-->").replace(/={2,}>/g, "==>"); // normalize broken arrows without corrupting valid ==> links
return code;
}
function hasCmd(cmd) {
try {
execSync(`command -v ${cmd}`, { stdio: "ignore" });
return true;
} catch {
return false;
}
}
function validateWithMmdc(inputMmdText) {
if (!hasCmd("mmdc")) return { ok: null, stderr: "mmdc_not_found" };
const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), "mmdc-heal-"));
const inFile = path.join(tmpDir, "temp.mmd");
fs.writeFileSync(inFile, inputMmdText, "utf8");
try {
execSync(`mmdc -i ${JSON.stringify(inFile)} -o /dev/null --quiet`, { stdio: "pipe" });
return { ok: true, stderr: "" };
} catch (e) {
const stderr =
e && typeof e === "object" && e.stderr && Buffer.isBuffer(e.stderr)
? e.stderr.toString("utf8")
: e && typeof e === "object" && typeof e.message === "string"
? e.message
: "mmdc_failed";
return { ok: false, stderr };
} finally {
try {
fs.rmSync(tmpDir, { recursive: true, force: true });
} catch {}
}
}
function healMarkdownFile(filePath) {
let content = fs.readFileSync(filePath, "utf8");
content = content.replace(/```mermaid\s*([\s\S]*?)```/g, (_match, rawBlock) => {
let attempt = selfHealMermaid(rawBlock);
let healed = false;
for (let i = 0; i < 5; i++) {
const v = validateWithMmdc(attempt);
if (v.ok === null) {
healed = true; // no validator available; still apply healing output
break;
}
if (v.ok === true) {
healed = true;
break;
}
const err = v.stderr || "";
const lineMatch = err.match(/line (\d+)/i);
const line = lineMatch ? parseInt(lineMatch[1], 10) - 2 : null; // mmdc counts header as line 1 or 2
if (err.includes("Parse error") && line !== null) {
let lines = attempt.split("\n");
let bad = lines[line] || "";
// Last-ditch quote everything on that line
bad = bad.replace(/\[([^\]"][^\]]*)\]/g, '["$1"]').replace(/\(([^)"]+)\)/g, '("$1")');
lines[line] = bad;
attempt = lines.join("\n");
}
}
const final = healed ? attempt : `%% SELF-HEAL FAILED AFTER 5 ATTEMPTS\n${attempt}`;
return "```mermaid\n" + final + "\n```";
});
fs.writeFileSync(filePath, content);
}
function walkMarkdownFiles(startPath) {
const st = fs.statSync(startPath);
if (st.isFile()) {
if (startPath.toLowerCase().endsWith(".md") || startPath.toLowerCase().endsWith(".markdown")) return [startPath];
return [];
}
if (!st.isDirectory()) return [];
const out = [];
const entries = fs.readdirSync(startPath, { withFileTypes: true });
for (const e of entries) {
const p = path.join(startPath, e.name);
if (e.isDirectory()) out.push(...walkMarkdownFiles(p));
else if (e.isFile() && (p.toLowerCase().endsWith(".md") || p.toLowerCase().endsWith(".markdown"))) out.push(p);
}
return out;
}
function main(argv) {
const targets = argv.slice(2);
if (!targets.length) {
console.error("Usage: node tools/mermaid/mermaid-self-heal.js <file-or-dir> [...]");
process.exit(2);
}
for (const t of targets) {
const abs = path.resolve(t);
const files = walkMarkdownFiles(abs);
for (const f of files) healMarkdownFile(f);
}
}
if (require.main === module) main(process.argv);


@@ -0,0 +1,137 @@
#!/usr/bin/env node
/**
* Validate Mermaid blocks in a Markdown file by actually calling `mermaid.render()` in headless Chromium.
*
* Designed to run inside the Forgejo PDF worker image.
*
* Usage (inside worker):
* NODE_PATH=/opt/forgejo-pdf/node_modules node /script/mermaid-validate-worker.js /work/file.md
*/
const fs = require("node:fs");
const path = require("node:path");
const os = require("node:os");
const crypto = require("node:crypto");
const puppeteer = require("puppeteer");
function sha256Hex(text) {
return crypto.createHash("sha256").update(String(text)).digest("hex");
}
function parseMermaidBlocks(markdown) {
const blocks = [];
const re = /```mermaid\s*([\s\S]*?)```/g; // regex literal: single backslashes, not the double-escaped string form
let m;
while ((m = re.exec(markdown)) !== null) {
blocks.push({
start: m.index,
end: m.index + m[0].length,
rawBlock: m[1],
});
}
return blocks;
}
async function withBrowser(fn) {
const userDataDir = fs.mkdtempSync(path.join(os.tmpdir(), "chrome-profile-"));
const browser = await puppeteer.launch({
headless: "new",
args: ["--no-sandbox", "--disable-dev-shm-usage", "--allow-file-access-from-files", `--user-data-dir=${userDataDir}`],
});
try {
return await fn(browser);
} finally {
try {
await browser.close();
} catch {}
try {
fs.rmSync(userDataDir, { recursive: true, force: true });
} catch {}
}
}
async function createMermaidPage(browser) {
const page = await browser.newPage();
await page.setRequestInterception(true);
page.on("request", (req) => {
const u = req.url();
if (u.startsWith("file:") || u.startsWith("about:") || u.startsWith("data:")) return req.continue();
return req.abort();
});
await page.setContent("<!doctype html><html><head></head><body></body></html>", { waitUntil: "load" });
await page.addScriptTag({ path: "/opt/forgejo-pdf/assets/js/mermaid.min.js" });
await page.evaluate(() => {
if (!globalThis.mermaid) throw new Error("mermaid_missing");
globalThis.mermaid.initialize({
startOnLoad: false,
securityLevel: "strict",
htmlLabels: false,
flowchart: { htmlLabels: false, useMaxWidth: false },
sequence: { htmlLabels: false },
state: { htmlLabels: false },
class: { htmlLabels: false },
fontFamily: "IBM Plex Sans",
theme: "base",
});
});
return page;
}
async function tryRender(page, id, code) {
return await page.evaluate(
async ({ id, code }) => {
try {
const r = await globalThis.mermaid.render(id, code);
return { ok: true, svgLen: r && r.svg ? r.svg.length : 0 };
} catch (e) {
const msg = e && typeof e === "object" && (e.str || e.message) ? String(e.str || e.message) : String(e);
return { ok: false, error: msg };
}
},
{ id, code }
);
}
function firstNonEmptyLine(block) {
const lines = String(block || "").replace(/\r\n?/g, "\n").split("\n");
for (const l of lines) {
const t = l.trim();
if (t) return t;
}
return "";
}
async function main() {
const filePath = process.argv[2];
if (!filePath) {
console.error("Usage: node mermaid-validate-worker.js /path/to/file.md");
process.exit(2);
}
const markdown = fs.readFileSync(filePath, "utf8");
const blocks = parseMermaidBlocks(markdown);
const failures = [];
await withBrowser(async (browser) => {
const page = await createMermaidPage(browser);
for (let i = 0; i < blocks.length; i++) {
const b = blocks[i];
const id = "m-" + sha256Hex(`${path.basename(filePath)}|${i}|${b.rawBlock}`).slice(0, 12);
const r = await tryRender(page, id, b.rawBlock);
if (!r.ok) {
failures.push({ index: i, header: firstNonEmptyLine(b.rawBlock), error: r.error });
if (failures.length >= 25) break;
}
}
await page.close();
});
const out = { file: filePath, total: blocks.length, failures };
console.log(JSON.stringify(out));
process.exit(failures.length ? 1 : 0);
}
main().catch((e) => {
console.error(JSON.stringify({ error: String(e && e.message ? e.message : e) }));
process.exit(1);
});