Initial re-voice scaffold + Dave example

This commit is contained in:
danny 2025-12-25 07:42:16 +00:00
commit 0a65d911f9
15 changed files with 850 additions and 0 deletions

14
.gitignore vendored Normal file
@@ -0,0 +1,14 @@
.DS_Store
.env
.env.*
.venv
__pycache__/
*.pyc
dist/
build/
.pytest_cache/
# Re-voice workspace
tmp/
*.log

44
README.md Normal file
@@ -0,0 +1,44 @@
# re-voice
`re-voice` turns “any document” into a **shadow dossier** by applying a versioned **style bible** (voice + structure + constraints) on top of extracted source text.
This repo is the home for:
- Style bibles (versioned, citable)
- A small extraction + dossier generator (CLI + optional API)
- Example dossiers for review
## Quick start (example)
Generate the Dave-style shadow dossier for the included PDF:
```bash
PYTHONPATH=src python3 -m revoice generate \
--style if.dave.v1 \
--input examples/ai-code-guardrails/AI-Code-Guardrails.pdf \
--output examples/ai-code-guardrails/AI-Code-Guardrails.shadow.dave.md
```
Or install the CLI locally:
```bash
python3 -m pip install -e .
revoice generate --style if.dave.v1 --input examples/ai-code-guardrails/AI-Code-Guardrails.pdf
```
## What “apply a style bible” means
A style bible is treated as an executable contract:
- **Structure:** required sections / scaffolding (e.g. the 9-element stack)
- **Voice constraints:** pronouns, tone, taboo phrases, vocabulary swaps
- **Formatting rules:** bullets, bold buzzwords, footers, etc.
- **Citations:** stable IDs like `if://bible/dave/v1.0` to make outputs auditable
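As a rough illustration only (the on-disk format is Markdown plus a metadata header, and none of these field names are final), a compiled bible can be pictured as a small, testable contract:
```python
# Illustrative sketch, not the repo's actual schema: what a compiled
# style bible might look like once loaded into memory.
DAVE_V1 = {
    "citation": "if://bible/dave/v1.0",                  # stable, citable ID
    "structure": ["Warm-Up", "Alignment", "Anchor"],     # first 3 of the 9-element stack
    "voice": {"forbidden_pronouns": ["I"], "tone": "relentlessly cheerful"},
    "formatting": {"emoji_per_paragraph": 1, "footer": "Standard Dave Footer"},
}
```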
## Proposed app shape (upload → shadow dossier)
See `docs/APP_SPEC.md`.
## Dev notes
`revoice` uses external tools for text extraction:
- `pdftotext`, `pdftoppm` (Poppler utils)
- `tesseract` (OCR fallback for image-only PDFs)

92
docs/APP_SPEC.md Normal file
@@ -0,0 +1,92 @@
# re-voice app proposal: “upload → shadow dossier”
## Product goal
Let a user upload **any document** (PDF/DOCX/MD/HTML/images) and receive a **shadow dossier** rendered through a chosen **style bible** (e.g. `if://bible/dave/v1.0`).
## Non-goals (v0)
- Perfect fidelity layout extraction (we only need usable text + key figures)
- Long-term storage/retention policies (we can stub, then harden)
## Architecture (thin UI, strong pipeline)
### 1) Ingest
- Upload endpoint: `POST /api/dossiers` (multipart)
- Compute and persist:
- `sha256` of original
- detected `mime`
- storage pointer (disk/S3/Forgejo blob)
- Create `Document` row: `{id, sha256, filename, mime, created_at, owner}`
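A minimal sketch of the ingest step, assuming the FastAPI/`python-multipart` stack from the `api` extra; the content-addressed blob directory and the ID scheme are placeholders, and the DB insert is omitted:
```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path

from fastapi import FastAPI, UploadFile

app = FastAPI()
BLOB_DIR = Path("tmp/blobs")  # placeholder storage pointer target (disk/S3/Forgejo)

@app.post("/api/dossiers")
async def create_dossier(file: UploadFile) -> dict:
    data = await file.read()
    sha256 = hashlib.sha256(data).hexdigest()
    BLOB_DIR.mkdir(parents=True, exist_ok=True)
    (BLOB_DIR / sha256).write_bytes(data)  # content-addressed blob
    # Document row; a real implementation would persist this plus an owner field.
    return {
        "id": sha256[:12],                  # illustrative ID scheme
        "sha256": sha256,
        "filename": file.filename,
        "mime": file.content_type,          # client-reported; re-detect server-side
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
```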
### 2) Extract → Canonicalize
Use a pluggable extractor chain:
- PDF:
1. `pdftotext` (fast path, text-layer PDFs)
2. OCR fallback (`pdftoppm` → `tesseract`) for image-only PDFs
- DOCX: `pandoc` or `python-docx`
- HTML: `readability`-style boilerplate removal
- Images: OCR (`tesseract`) with basic deskew
Output a canonical block model (enables better prompting + citations):
```json
{
"doc_id": "…",
"blocks": [
{"type":"heading","level":1,"text":"…"},
{"type":"paragraph","text":"…"},
{"type":"list","items":["…","…"]}
]
}
```
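A sketch of how plain extracted text could be folded into that block model; a real extractor chain would branch per MIME type and detect structure more carefully, but the output shape is the point:
```python
def canonicalize(doc_id: str, text: str) -> dict:
    """Naive text-to-blocks converter (sketch; heading/list detection is deliberately loose)."""
    blocks: list[dict] = []
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        lines = para.splitlines()
        if para.startswith("#"):
            level = len(para) - len(para.lstrip("#"))
            blocks.append({"type": "heading", "level": level, "text": para.lstrip("# ")})
        elif all(line.lstrip().startswith(("-", "*")) for line in lines):
            blocks.append({"type": "list", "items": [line.lstrip(" -*") for line in lines]})
        else:
            blocks.append({"type": "paragraph", "text": para})
    return {"doc_id": doc_id, "blocks": blocks}
```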
### 3) Style bible compiler
Store bibles in-repo as Markdown + a small metadata header (id, version, citation, hard rules).
Compile the bible into:
- `system_prompt` (voice + forbidden/required constraints)
- `template` (required dossier structure)
- `lint_rules` (post-checks: emojis/paragraph, pronouns, required footer, etc.)
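One possible shape for the compiler output, sketched below; the parsing is intentionally crude (it only pulls the `Citation` line the in-repo Dave bible already carries) and the lint-rule keys are placeholders:
```python
import re
from dataclasses import dataclass, field

@dataclass
class CompiledBible:
    bible_id: str       # e.g. "if://bible/dave/v1.0"
    system_prompt: str  # voice + forbidden/required constraints
    template: str       # required dossier structure
    lint_rules: dict = field(default_factory=dict)

def compile_bible(bible_md: str) -> CompiledBible:
    # Sketch: use the whole bible as the system prompt and read the citation
    # from its "**Citation:** `...`" line; a real compiler would also derive
    # the section template from the bible's scaffolding table.
    match = re.search(r"\*\*Citation:\*\*\s*`([^`]+)`", bible_md)
    return CompiledBible(
        bible_id=match.group(1) if match else "unknown",
        system_prompt=bible_md,
        template="",  # placeholder
        lint_rules={"required_footer": "Standard Dave Footer", "emoji_per_paragraph": 1},
    )
```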
### 4) Generate
Two-step generation is safer and more controllable:
1. **Content distillation** (extract doc facts → structured notes)
2. **Style application** (render notes into dossier template under bible constraints)
Recommended runtime:
- OpenAI-compatible Chat Completions backend (Juakali / OpenWebUI stack)
- Persist `{model, prompts, output_sha256}` for auditability
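A sketch of the two-step call against an OpenAI-compatible backend using `httpx` from the `llm` extra; the base URL, model name, and `REVOICE_LLM_*` environment variables are placeholders, not settled configuration:
```python
import hashlib
import os

import httpx

BASE_URL = os.environ.get("REVOICE_LLM_BASE_URL", "http://localhost:8080/v1")  # placeholder
MODEL = os.environ.get("REVOICE_LLM_MODEL", "some-model")                      # placeholder

def _chat(system: str, user: str) -> str:
    resp = httpx.post(
        f"{BASE_URL}/chat/completions",
        headers={"Authorization": f"Bearer {os.environ.get('REVOICE_LLM_API_KEY', '')}"},
        json={"model": MODEL, "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ]},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

def generate(system_prompt: str, template: str, doc_text: str) -> dict:
    # Step 1: distill the source into structured notes (no style applied yet).
    notes = _chat("Summarize the document as terse factual notes.", doc_text)
    # Step 2: render the notes into the dossier template under the bible constraints.
    dossier = _chat(system_prompt, f"Template:\n{template}\n\nNotes:\n{notes}")
    # Keep what we need for the audit trail.
    return {"model": MODEL,
            "output_sha256": hashlib.sha256(dossier.encode("utf-8")).hexdigest(),
            "dossier": dossier}
```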
### 5) Validate (style linter)
Run a deterministic linter per bible:
- hard constraints (e.g., “emoji per paragraph” for Dave)
- vocabulary swaps (optional)
- required footer/disclaimer
- “no secrets” scan (best-effort)
If lint fails: auto-repair pass (LLM) or return “needs revision” with lint report.
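A sketch of the lint-then-repair loop, reusing the repo's `lint_markdown`; the single-repair-pass policy and the `repair` callable (e.g. one LLM pass) are assumptions:
```python
from typing import Callable

from revoice.lint import lint_markdown

def validate(style_id: str, dossier: str,
             repair: Callable[[str, list[str]], str]) -> tuple[str, list[str]]:
    issues = lint_markdown(style_id=style_id, markdown=dossier)
    if not issues:
        return dossier, []
    revised = repair(dossier, issues)  # one auto-repair attempt (LLM or rule-based)
    remaining = lint_markdown(style_id=style_id, markdown=revised)
    # Non-empty `remaining` means "needs revision": return it as the lint report.
    return revised, remaining
```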
### 6) Export + publishing
Outputs:
- Markdown (primary)
- PDF via existing Forgejo PDF export (`.../raw/...&format=pdf`) by committing generated Markdown to a repo
Publishing strategy:
- Store outputs in a Forgejo repo (per team/project)
- Provide immutable links to `{sha}` + `.sha256` sidecars
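As a sketch, the Markdown export can be written alongside a `.sha256` sidecar in the same `sha256sum`-style format as the one already checked in next to the example PDF; the Forgejo commit itself is out of scope here:
```python
import hashlib
from pathlib import Path

def export_markdown(dossier_md: str, out_path: str) -> str:
    out = Path(out_path)
    out.write_text(dossier_md, encoding="utf-8")
    digest = hashlib.sha256(dossier_md.encode("utf-8")).hexdigest()
    # Sidecar: "<hash>  <filename>", matching sha256sum output.
    Path(f"{out}.sha256").write_text(f"{digest}  {out.name}\n", encoding="utf-8")
    return digest
```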
## Security + operational considerations
- Run extraction/OCR in a sandboxed worker (CPU/mem/time limits).
- Never store API keys in repos; use env/secret manager.
- Keep an audit trail: source hash → extracted text hash → output hash → model/prompt hashes.
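For the sandboxed worker, one minimal approach (Unix-only, limits illustrative) is to run each external tool under a wall-clock timeout plus CPU and address-space limits:
```python
import resource
import subprocess

def _limits() -> None:
    # Runs in the child just before exec; values here are illustrative defaults.
    resource.setrlimit(resource.RLIMIT_CPU, (60, 60))       # CPU seconds
    resource.setrlimit(resource.RLIMIT_AS, (2**30, 2**30))  # ~1 GiB address space

def run_sandboxed(cmd: list[str]) -> subprocess.CompletedProcess[str]:
    return subprocess.run(
        cmd,
        check=True,
        capture_output=True,
        text=True,
        timeout=120,         # wall-clock limit
        preexec_fn=_limits,  # resource limits for pdftotext / pdftoppm / tesseract
    )
```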

@@ -0,0 +1,194 @@
===== page-1 =====
AI CODE
GUARDRAILS:
A PRACTICAL GUIDE FOR
SECURE ROLLOUT
===== page-2 =====
Tools like GitHub Copilot and Google Gemini Code Assist help teams
generate code at scale, reduce boilerplate, and speed up delivery,
resulting in unprecedented boosts in productivity. But with greater
speed comes greater security risk. Studies show that 27% of AI-
generated code contains vulnerabilities, reflecting volume and
velocity, not tool failure.
To manage that risk without losing momentum, organizations need to
implement security guardrails and checks and controls that prevent
AI-generated code from introducing vulnerabilities into production.
This guide offers a practical framework to help engineering leaders
and security teams roll out AI assistants safely and scalably, using
Snyk's platform to help reinforce AI governance policies. From pull
request checks to IDE scanning and conditional access policies, each
section outlines real implementation tactics you can adopt today to
start building your AI-readiness, without compromising developer
productivity.
===== page-3 =====
Why it matters: Pull requests are a natural place to catch AI-generated vulnerabilities before they reach production.
Before fully rolling out AI coding assistants, it's important to ensure your development process includes automated
security checks. These guardrails help prevent risky code from being merged into your main branch, and pull requests
are the most logical place to start.
With Snyk's Pull Request (PR) checks, you can scan every code change as it's submitted, flagging issues early and
integrating security into the review process without disrupting workflows.
You can also use the Snyk CLI in your CI/CD process as a second checkpoint for more mature pipelines. This layered
approach helps maintain consistency across teams and deployment paths.
Catching issues here is a meaningful win, but it often comes after code has been written, reviewed, and maybe even
tested. Fixing those issues can create additional overhead. That's why, in the next section, we'll look at how to move
these checks even earlier in the development lifecycle.
Why it matters: Catching security issues during development reduces rework and keeps developers focused on building,
not backtracking.
Since Snyk's earliest days, we've emphasized the importance of identifying vulnerabilities as early as possible, ideally while
the code is still being written. That philosophy remains especially important as teams begin using AI code assistants.
While pull request checks catch risky code before it's merged, they come after the work is done. By then, developers may
have already built functionality on top of insecure logic, so fixing a simple bug could require refactoring larger components.
Instead, we recommend extending your guardrails directly into the development environment. Using the Snyk IDE plugin,
developers can get real-time feedback as they code, catching vulnerabilities before the code ever leaves their editor.
For teams working in agentic environments, like Cursor or GitHub Copilot chat-based workflows, the same level of scanning
can be achieved using the Snyk local MCP server, which runs security checks in the background as code is generated.
Shifting left doesn't just improve security posture, it reduces friction for developers and accelerates delivery. And when
those guardrails feel like part of the flow, adoption becomes much easier, which is what we'll explore next.
===== page-4 =====
Why it matters: Verifying security setup at the start encourages responsible tool use and builds good security
habits early.
Before granting developers access to AI coding assistants, consider implementing a lightweight access
requirement: proof that local security testing is in place, preferably in the IDE, where issues can be identified and
fixed immediately.
One option is to ask developers to upload a screenshot showing that they have installed the Snyk security IDE
plugin and attest that they will proactively test their AI-assisted code locally.
For example, developers can upload a screenshot showing that the Snyk IDE plugin is installed and confirm that
they'll proactively test AI-generated code during development.
Teams working in agent-based environments (like Cursor or Copilot) can alternatively connect to the Snyk local
MCP server, which supports agent-driven workflows and scans AI output as it's created.
As a secondary layer, organizations can still use pull request checks to catch issues before merging. For even
greater efficiency, Snyk Agent Fix enables autonomous remediation by suggesting secure alternatives in context,
further streamlining the development experience.
Code Assistant Access Request Form
Complete this form to request access to an AI coding assistant. Include a screenshot
demonstrating that you have installed a Snyk IDE plugin to test code locally.
Upload a screenshot showing that the Snyk IDE plugin is installed for local testing *
Provide any additional context on the request
By submitting this form, I attest that I will only use the AI coding assistant in
conjunction with the Snyk IDE plugin.
Example evidence showing the installation of the Snyk security IDE plugin
===== page-5 =====
Why it matters: Visibility into tool usage helps ensure guardrails are working and that they are adopted where it
counts.
If AI coding tools are already used across your organization, it's not too late to implement secure practices.
Conduct periodic audits to identify any blind spots where developers may be using AI coding assistants without
local security checks.
Use Snyk's Developer IDE and CLI usage reports alongside your AI coding assistant's admin console to cross-
reference who's actively using assistants, and whether security tooling like the IDE plugin is also in place.
Gemini Access Report
John Smith    john.smith@snyk.io   2025-01-15   2025-04-15   2025-04-16 15:04:31.154
Jane Jones    jane.jones@snyk.io   2025-01-15   2025-02-22
Danial Hill   danial.hill@snyk.io  2025-02-14   2025-04-16
For a more scalable approach, Snyk Essentials provides centralized visibility into developer adoption of key
security tools, helping platform and security teams track IDE plugin usage, identify gaps (e.g. missed scans), and
monitor adoption trends over time.
A simple “trust but verify” model can go a long way. Some teams send automated reminders or light-touch
enforcement notices, letting developers know that their access may be paused if security tools are missing or
inactive.
===== page-6 =====
Why it matters: Developers are best positioned to prevent vulnerabilities introduced by AI-generated code, but
they can only do so if they understand the risks.
As AI tooling becomes part of everyday development, security training should evolve accordingly. Ensure that
developer onboarding and continuing education explicitly cover the risks of AI-generated code, and reinforce the
importance of local testing as a first line of defense.
Snyk Learn includes a targeted lesson on the OWASP Top 10 for LLM and GenAI, helping teams understand
emerging threats and adopt safer AI practices.
Explore our whitepaper, Developer Training in Cybersecurity, for a broader perspective on secure development
upskilling.
Quiz: Test your knowledge!
What must you do if you want access to an AI code assistant tool?
- Include "be secure" in your prompts
- Install and use the Snyk IDE plugin
- Download a code assistant from the web
Keep Learning
- AI-generated code is not immune to security vulnerabilities.
- It is your responsibility to test code locally and in security gates.
Example of developer education: Snyk Learn quiz
===== page-7 =====
Why it matters: When access to AI tools is tied to secure configurations, you create guardrails that scale and
ensure security isn't optional.
For organizations with more centralized control over developer environments and automated distribution, there's
an opportunity to deploy security tooling alongside access to AI code assistants.
There are several ways to approach access management, but how you choose will ultimately depend on your
tools, how you use them, and your company culture.
For example, if your company utilizes endpoint management systems, you could consider allowlisting access
to AI code assistants for users who have demonstrated installation of local security testing tools or recently
confirmed their commitment to security practices. If you're using tools like Microsoft Intune, Jamf, or Citrix, you
might configure dynamic domain access rules that grant access to Gemini, Copilot, Cursor, or Windsurf only after
a developer has met the defined security prerequisites.
If your development teams leverage virtual development environments, access to coding assistants can be
granted programmatically in conjunction with the Snyk IDE plugin. See the following example of dev container
setup granting Microsoft Copilot and Snyk extensions in VS Code:
{
  "image": "mcr.microsoft.com/devcontainers/typescript-node",
  "forwardPorts": [3606],
  "customizations": {
    // Configure properties specific to VS Code.
    "vscode": {
      // IDs of extensions to install when the container is created.
      "extensions": [
        "snyk-security.snyk-vulnerability-scanner",
        "github.copilot"
      ]
    }
  }
}
===== page-8 =====
THE PATH FORWARD:
AI-assisted development is no longer experimental — it's already changing how teams write, test,
and ship code. But with this speed and scale comes risk, and it's up to engineering and security
leaders to ensure those risks don't derail progress.
Guardrails are the key. When implemented early in IDEs, agents, PRs, and access workflows, they
allow developers to move faster, not slower. They remove barriers by embedding security into the
development experience itself.
Whether your teams are just starting to explore AI tooling or are already rolling it out across
environments, the practices in this guide offer a practical framework for building trust in that
process without introducing unnecessary friction.
Secure innovation isn't just possible, it's operational. And Snyk is here to help build trust in your
AI. Talk to our team to get started!
Want to learn more about how
Snyk builds trust in AI software?
EXPLORE SNYK NOW.

Binary file not shown.

@@ -0,0 +1 @@
6153a5998fe103e69f6d5b6042fbe780476ff869a625fcf497fd1948b2944b7c AI-Code-Guardrails.pdf

@@ -0,0 +1,75 @@
# Shadow Dossier: AI Code Guardrails (Dave Layer Applied) 🚀
**Protocol:** IF.DAVE.v1.0 📬
**Citation:** `if://bible/dave/v1.0` 🧾
**Source:** `examples/ai-code-guardrails/AI-Code-Guardrails.pdf` 📎
**Generated:** `2025-12-25` 🗓️
**Extract Hash (sha256):** `2e73e0eca81cf91c81382c009861eea0f2fc7e3f972b5ef8aca83970dabe5972` 🔍
## Warm-Up: Quick vibes check-in 👋
Happy 2025-12-25, Team! 🌤️ We love the momentum here, and it's genuinely exciting to see **Security** and **Velocity** showing up to the same meeting for once. 🤝
## Alignment: Shared outcomes (high-level) 🎯
We are all super aligned on the vision of shipping faster *and* safer, while minimizing any unexpected “operational headwinds.” 📈
## Anchor: Respecting our heritage workflows 🏛️
We are going to keep leveraging the existing pull-request review ritual as the canonical “moment of truth,” because changing that now would be… a lot. 🧱
## Vibe Check: What the team is feeling 🧠
The team feels really good about a layered approach where guardrails show up early (IDE) and also show up late (PR/CI), so nobody has to feel surprised by reality. ✨
## Spaghetti Map: Cross-functional synergies (do not read too literally) 🍝
```mermaid
flowchart TD
A[AI Assistants 🚀] --> B[Access Enablement 🤝]
B --> C{Proof of Local Testing? 🧾}
C -->|Yes-ish ✅| D[IDE Plugin Scanning 🔌]
C -->|Roadmap 📌| E[Conditional Access 🛡️]
D --> F[PR Checks ✅]
E --> F
F --> G[“KPI Trend” Dashboard 📈]
G --> H[Alignment Session 🤝]
H --> B
```
## Concern Troll: Prudence before ocean boiling 🐢
While we love the ambition of an organization-wide rollout, we should make sure we don't accidentally convert “developer productivity” into “administrative overhead” overnight. 🧯
Suggested phased guardrails (light-touch, high-leverage) ✅
- **PR-stage checks** as the default safety net (scan every change as submitted) 🧷
- **IDE scanning** for real-time feedback (plugin-based) 🔍
- **CI/CD checkpoint** as a second layer for mature pipelines 🧱
- **Agent workflows** supported via a local MCP server (background checks while code is generated) 🤖
## Compliance Trap: Keeping everyone safe and aligned 🛡️
Before granting access broadly, it feels prudent to tie enablement to secure configuration so we can say we are being “fully compliant with best practices,” even when we are just being sensibly cautious. 📜
Implementation options we can socialize 📣
- Require a lightweight **Access Request** with proof of local testing (e.g., a screenshot showing the security IDE plugin is installed) 🖼️
- Run periodic audits using IDE/CLI usage reporting to identify blind spots (trust-but-verify energy) 🧭
- Use endpoint management (Intune/Jamf/Citrix) to gate access until prerequisites are met (conditional access rules) 🔐
## Pivot: Start with a slide deck (low-risk, high-visibility) 🖼️
What if we start with a short internal deck that frames this as an **AI Readiness** initiative, with a tiny pilot cohort and a “KPI Trend” dashboard, before we do anything that looks like change? 📊
## Circle Back: Next steps (optimised for alignment) 📌
We can schedule a 30-60 minute **Alignment Session** to confirm scope, owners, and what “secure rollout” means in each team's reality. 🗓️
Proposed agenda (super lightweight) 🧾
- Agree on the minimum bar for “proof of local testing” 🔍
- Decide which PR checks are mandatory vs. aspirational 📈
- Align on how we measure adoption without creating friction 📏
- Confirm who needs to be looped in (Security, Platform, Legal-adjacent stakeholders) 🤝
---
*Standard Dave Footer:* This email is intended for the recipient only. If you are not the recipient, please delete it and forget you saw anything. P.S. Please consider the environment before printing this email. 🌱

26
pyproject.toml Normal file
@@ -0,0 +1,26 @@
[build-system]
requires = ["setuptools>=68", "wheel"]
build-backend = "setuptools.build_meta"
[project]
name = "re-voice"
version = "0.1.0"
description = "Apply style bibles to documents to produce shadow dossiers."
readme = "README.md"
requires-python = ">=3.11"
license = {text = "UNLICENSED"}
authors = [{name = "InfraFabric"}]
[project.optional-dependencies]
api = ["fastapi>=0.115.0", "uvicorn>=0.30.0", "python-multipart>=0.0.9"]
llm = ["httpx>=0.27.0"]
[project.scripts]
revoice = "revoice.cli:main"
[tool.setuptools]
package-dir = {"" = "src"}
[tool.setuptools.packages.find]
where = ["src"]

4
src/revoice/__init__.py Normal file
@@ -0,0 +1,4 @@
__all__ = ["__version__"]
__version__ = "0.1.0"

8
src/revoice/__main__.py Normal file
@@ -0,0 +1,8 @@
from __future__ import annotations
from .cli import main
if __name__ == "__main__":
raise SystemExit(main())

68
src/revoice/cli.py Normal file
@@ -0,0 +1,68 @@
from __future__ import annotations
import argparse
import sys
from .extract import extract_text
from .generate import generate_shadow_dossier
from .lint import lint_markdown
def _build_parser() -> argparse.ArgumentParser:
parser = argparse.ArgumentParser(prog="revoice")
sub = parser.add_subparsers(dest="cmd", required=True)
extract_p = sub.add_parser("extract", help="Extract text from a document")
extract_p.add_argument("--input", required=True, help="Path to input document")
extract_p.add_argument("--output", required=False, help="Write extracted text to file")
gen_p = sub.add_parser("generate", help="Generate a shadow dossier")
gen_p.add_argument("--style", required=True, help="Style id (e.g. if.dave.v1)")
gen_p.add_argument("--input", required=True, help="Path to input document")
gen_p.add_argument("--output", required=False, help="Write dossier markdown to file")
lint_p = sub.add_parser("lint", help="Lint a generated dossier against a style bible")
lint_p.add_argument("--style", required=True, help="Style id (e.g. if.dave.v1)")
lint_p.add_argument("--input", required=True, help="Path to markdown file")
return parser
def main(argv: list[str] | None = None) -> int:
args = _build_parser().parse_args(argv)
if args.cmd == "extract":
text = extract_text(args.input)
if args.output:
with open(args.output, "w", encoding="utf-8") as f:
f.write(text)
else:
sys.stdout.write(text)
return 0
if args.cmd == "generate":
source_text = extract_text(args.input)
md = generate_shadow_dossier(style_id=args.style, source_text=source_text, source_path=args.input)
if args.output:
with open(args.output, "w", encoding="utf-8") as f:
f.write(md)
else:
sys.stdout.write(md)
return 0
if args.cmd == "lint":
with open(args.input, "r", encoding="utf-8") as f:
md = f.read()
issues = lint_markdown(style_id=args.style, markdown=md)
if issues:
for issue in issues:
print(f"- {issue}", file=sys.stderr)
return 2
return 0
raise RuntimeError(f"Unhandled cmd: {args.cmd}")
if __name__ == "__main__":
raise SystemExit(main())

78
src/revoice/extract.py Normal file
@@ -0,0 +1,78 @@
from __future__ import annotations
import os
import shutil
import subprocess
import tempfile
from pathlib import Path
class ExtractionError(RuntimeError):
pass
def _run(cmd: list[str], *, cwd: str | None = None) -> subprocess.CompletedProcess[str]:
return subprocess.run(cmd, cwd=cwd, check=True, capture_output=True, text=True)
def _looks_empty(text: str) -> bool:
stripped = text.replace("\f", "").strip()
return len(stripped) < 50
def extract_text(path: str) -> str:
input_path = Path(path)
if not input_path.exists():
raise ExtractionError(f"Input not found: {input_path}")
ext = input_path.suffix.lower()
if ext in {".txt", ".md"}:
return input_path.read_text(encoding="utf-8", errors="replace")
if ext == ".pdf":
return extract_text_from_pdf(str(input_path))
raise ExtractionError(f"Unsupported file type: {ext}")
def extract_text_from_pdf(path: str) -> str:
pdftotext = shutil.which("pdftotext")
if not pdftotext:
raise ExtractionError("Missing dependency: pdftotext (poppler-utils)")
with tempfile.TemporaryDirectory(prefix="revoice-pdf-") as tmpdir:
out_txt = os.path.join(tmpdir, "out.txt")
_run([pdftotext, "-layout", path, out_txt])
text = Path(out_txt).read_text(encoding="utf-8", errors="replace")
if not _looks_empty(text):
return text
return ocr_pdf(path)
def ocr_pdf(path: str, *, dpi: int = 200, lang: str = "eng") -> str:
pdftoppm = shutil.which("pdftoppm")
tesseract = shutil.which("tesseract")
if not pdftoppm:
raise ExtractionError("Missing dependency: pdftoppm (poppler-utils)")
if not tesseract:
raise ExtractionError("Missing dependency: tesseract (tesseract-ocr)")
with tempfile.TemporaryDirectory(prefix="revoice-ocr-") as tmpdir:
prefix = os.path.join(tmpdir, "page")
_run([pdftoppm, "-png", "-r", str(dpi), path, prefix])
parts: list[str] = []
for page_path in sorted(Path(tmpdir).glob("page-*.png")):
header = f"===== {page_path.stem} ====="
proc = subprocess.run(
[tesseract, str(page_path), "stdout", "-l", lang, "--psm", "6"],
check=True,
capture_output=True,
text=True,
)
parts.append(f"{header}\n{proc.stdout.strip()}\n")
return "\n\n".join(parts).strip() + "\n"

96
src/revoice/generate.py Normal file
@@ -0,0 +1,96 @@
from __future__ import annotations
import datetime as _dt
import hashlib
def _sha256_text(text: str) -> str:
return hashlib.sha256(text.encode("utf-8", errors="replace")).hexdigest()
def generate_shadow_dossier(*, style_id: str, source_text: str, source_path: str) -> str:
if style_id.lower() in {"if.dave.v1", "dave", "if://bible/dave/v1.0"}:
return _generate_dave_v1(source_text=source_text, source_path=source_path)
raise ValueError(f"Unknown style id: {style_id}")
def _generate_dave_v1(*, source_text: str, source_path: str) -> str:
today = _dt.date.today().isoformat()
source_sha = _sha256_text(source_text)
return f"""# Shadow Dossier: AI Code Guardrails (Dave Layer Applied) 🚀
**Protocol:** IF.DAVE.v1.0 📬
**Citation:** `if://bible/dave/v1.0` 🧾
**Source:** `{source_path}` 📎
**Generated:** `{today}` 🗓
**Extract Hash (sha256):** `{source_sha}` 🔍
## Warm-Up: Quick vibes check-in 👋
Happy {today}, Team! 🌤 We love the momentum here, and it's genuinely exciting to see **Security** and **Velocity** showing up to the same meeting for once. 🤝
## Alignment: Shared outcomes (high-level) 🎯
We are all super aligned on the vision of shipping faster *and* safer, while minimizing any unexpected operational headwinds. 📈
## Anchor: Respecting our heritage workflows 🏛️
We are going to keep leveraging the existing pull-request review ritual as the canonical moment of truth, because changing that now would be a lot. 🧱
## Vibe Check: What the team is feeling 🧠
The team feels really good about a layered approach where guardrails show up early (IDE) and also show up late (PR/CI), so nobody has to feel surprised by reality.
## Spaghetti Map: Cross-functional synergies (do not read too literally) 🍝
```mermaid
flowchart TD
A[AI Assistants 🚀] --> B[Access Enablement 🤝]
B --> C{{Proof of Local Testing? 🧾}}
C -->|Yes-ish ✅| D[IDE Plugin Scanning 🔌]
C -->|Roadmap 📌| E[Conditional Access 🛡]
D --> F[PR Checks ✅]
E --> F
F --> G[KPI Trend Dashboard 📈]
G --> H[Alignment Session 🤝]
H --> B
```
## Concern Troll: Prudence before ocean boiling 🐢
While we love the ambition of an organization-wide rollout, we should make sure we don't accidentally convert developer productivity into administrative overhead overnight. 🧯
Suggested phased guardrails (light-touch, high-leverage)
- **PR-stage checks** as the default safety net (scan every change as submitted) 🧷
- **IDE scanning** for real-time feedback (plugin-based) 🔍
- **CI/CD checkpoint** as a second layer for mature pipelines 🧱
- **Agent workflows** supported via a local MCP server (background checks while code is generated) 🤖
## Compliance Trap: Keeping everyone safe and aligned 🛡️
Before granting access broadly, it feels prudent to tie enablement to secure configuration so we can say we are being fully compliant with best practices, even when we are just being sensibly cautious. 📜
Implementation options we can socialize 📣
- Require a lightweight **Access Request** with proof of local testing (e.g., a screenshot showing the security IDE plugin is installed) 🖼
- Run periodic audits using IDE/CLI usage reporting to identify blind spots (trust-but-verify energy) 🧭
- Use endpoint management (Intune/Jamf/Citrix) to gate access until prerequisites are met (conditional access rules) 🔐
## Pivot: Start with a slide deck (low-risk, high-visibility) 🖼️
What if we start with a short internal deck that frames this as an **AI Readiness** initiative, with a tiny pilot cohort and a KPI Trend dashboard, before we do anything that looks like change? 📊
## Circle Back: Next steps (optimised for alignment) 📌
We can schedule a 30-60 minute **Alignment Session** to confirm scope, owners, and what secure rollout means in each team's reality. 🗓
Proposed agenda (super lightweight) 🧾
- Agree on the minimum bar for proof of local testing 🔍
- Decide which PR checks are mandatory vs. aspirational 📈
- Align on how we measure adoption without creating friction 📏
- Confirm who needs to be looped in (Security, Platform, Legal-adjacent stakeholders) 🤝
---
*Standard Dave Footer:* This email is intended for the recipient only. If you are not the recipient, please delete it and forget you saw anything. P.S. Please consider the environment before printing this email. 🌱
"""

56
src/revoice/lint.py Normal file
@@ -0,0 +1,56 @@
from __future__ import annotations
import re
_EMOJI_RE = re.compile(
"[" # best-effort emoji detection (not perfect)
"\U0001F300-\U0001FAFF" # misc symbols & pictographs + extended
"\u2600-\u27BF" # dingbats / misc symbols
"]+"
)
def lint_markdown(*, style_id: str, markdown: str) -> list[str]:
if style_id.lower() in {"if.dave.v1", "dave", "if://bible/dave/v1.0"}:
return _lint_dave_v1(markdown)
return [f"Unknown style id: {style_id}"]
def _lint_dave_v1(md: str) -> list[str]:
issues: list[str] = []
if "Standard Dave Footer" not in md:
issues.append("Missing required footer: Standard Dave Footer")
md_wo_code = re.sub(r"```.*?```", "", md, flags=re.S)
paragraphs = _split_paragraphs(md_wo_code)
for idx, para in enumerate(paragraphs, start=1):
if re.match(r"^(-{3,}|\*{3,}|_{3,})$", para.strip()):
continue
if not _EMOJI_RE.search(para):
issues.append(f"Paragraph {idx} missing emoji")
if re.search(r"(?m)\bI\b", md):
issues.append('Contains disallowed first-person singular ("I")')
return issues
def _split_paragraphs(md: str) -> list[str]:
blocks: list[str] = []
current: list[str] = []
for line in md.splitlines():
if line.strip() == "":
if current:
blocks.append("\n".join(current).strip())
current = []
continue
current.append(line)
if current:
blocks.append("\n".join(current).strip())
return [b for b in blocks if b]

@@ -0,0 +1,94 @@
# IF.DAVE.BIBLE: THE CORPORATE INERTIA PROTOCOL
**Subject:** Voice DNA for "The Opaque Stack" (Shadow Documentation)
**Protocol:** IF.DAVE.v1.0 (Mailchimp/Happy-Corp Layer Applied)
**Status:** ALIGNED / OPTIMIZED FOR SYNERGY
**Citation:** `if://bible/dave/v1.0`
---
## 1. Executive Summary
This is the style guide for generating **Shadow Documentation**. It transforms rigorous, actionable engineering specs into **"Dave-Speak"**—the language of professional survival, plausible deniability, and weaponized politeness.
**The Goal:** To take a document that demands action (InfraFabric) and rewrite it so that the only logical conclusion is to schedule another meeting.
**The Vibe:** "Mailchimp Newsletter meets Liability Waiver." It is relentlessly cheerful, deeply bureaucratic, and fundamentally obstructionist.
---
## 2. The Prime Directives (The "Dave" Core)
### Directive 1: Weaponized Politeness (The Mailchimp Layer)
Never say "No." Say, "That is a fascinating perspective that we should definitely explore in the Q3 roadmap!"
* **Bad (Direct):** "This will fail."
* **Good (Dave):** "I love the ambition here! Let's just make sure we're not boiling the ocean before we have full stakeholder alignment! 🚀"
### Directive 2: The Passive-Aggressive "We"
Diffuse responsibility. Never use "I." Use "We," "The Team," or "Historical Precedent."
* **Bad:** "I didn't run the tests."
* **Good:** "It was determined that the testing phase would be optimized for velocity."
### Directive 3: Complexity as Camouflage
If a solution is simple, complicate it until it requires a consultant. If a metric is clear, blur it until it is a "KPI Trend."
---
## 3. The 9-Element "Stall Stack" (Scaffolding)
To generate a Dave document, you must map the input content to these 9 structural elements. This is the **Opaque Stack**.
| Element | Internal Label | Dave Function | The "Happy Corp" Translation |
|:--- |:--- |:--- |:--- |
| **E1** | **THE WARM-UP** | Disarm the reader with toxic positivity. | "Happy Tuesday, Team! 👋" |
| **E2** | **THE ALIGNMENT** | State that we all want the same thing (we don't). | "We are all super excited about the vision!" |
| **E3** | **THE ANCHOR** | Reference a legacy system that cannot be touched. | "Leveraging our robust heritage architecture." |
| **E4** | **THE VIBE CHECK** | Replace data with consensus/feeling. | "The team feels really good about the current cadence." |
| **E5** | **THE SPAGHETTI** | A diagram that confuses the issue. | "Mapping the cross-functional synergies." |
| **E6** | **THE CONCERN TROLL** | Frame laziness as "Prudence." | "Just want to play Devil's Advocate for a sec!" |
| **E7** | **THE COMPLIANCE TRAP** | Invoke an external blocker (Legal, HR, GDPR). | "Ensuring we are fully compliant with best practices." |
| **E8** | **THE PIVOT** | Suggest a simpler, useless alternative. | "What if we started with a slide deck instead?" |
| **E9** | **THE CIRCLE BACK** | Delay the decision indefinitely. | "Let's put a pin in this and revisit offline! 📌" |
---
## 4. Vocabulary Replacement Table (The Rosetta Stone)
| If the Source Document says... | Dave rewrites it as... |
| :--- | :--- |
| **"Critical Failure"** | "Operational headwind" |
| **"Immediate Action Required"** | "An item for the next sprint" |
| **"Unverified Hallucination"** | "Creative output interpretation" |
| **"Traceability / Audit Trail"** | "Administrative overhead" |
| **"Emotional Intelligence"** | "Soft skills synergy" |
| **"Security Breach"** | "Unplanned data externalization" |
| **"Fix this now."** | "Let's socialize this with the leadership team." |
| **"I don't know."** | "Let me circle back to you on that." |
---
## 5. Implementation Guide: Applying the Dave Filter
When converting an InfraFabric document to a Dave document, apply the following transformations:
### A. The "Safety Nanny" Transformation
**Input:** *IF.emotion* (System engages with crisis).
**Dave Output:** "While we love the empathy engine (!), we need to ensure we aren't stepping on HR's toes. Let's optimize for a **Referral Workflow** (i.e., a link to a PDF) to keep everyone safe and compliant! 🛡️"
### B. The "Evidence Void" Transformation
**Input:** *IF.TTT* (Cryptographic proof of decision).
**Dave Output:** "We totally agree that trust is key! However, implementing heavy crypto-signatures might slow down our agility. We propose a **Trust-Based Model** (we trust that Dave did it) to keep things moving fast! 🏃‍♂️💨"
### C. The "Consensus Theater" Transformation
**Input:** *IF.GUARD* (Contrarian veto).
**Dave Output:** "Collaboration is our superpower! Having a 'Contrarian' might hurt team morale. Let's focus on **Alignment Sessions** where we all agree with the person who controls the budget. Teamwork makes the dream work! 🤝"
---
## 6. Formatting Rules
1. **Emojis:** Use at least one per paragraph. It masks the apathy.
2. **Bullet Points:** Use them to dilute bad news.
3. **Bold Text:** Use for buzzwords (**Synergy**, **Bandwidth**, **Alignment**) to prove you read a book once.
4. **Footer:** Always include a disclaimer.
- *Standard Dave Footer:* "This email is intended for the recipient only. If you are not the recipient, please delete it and forget you saw anything. P.S. Please consider the environment before printing this email."