re-voice/docs/APP_SPEC.md

3.2 KiB

re-voice app proposal: “upload → shadow dossier”

Product goal

Let a user upload any document (PDF/DOCX/MD/HTML/images) and receive a shadow dossier rendered through a chosen style bible (e.g. if://bible/dave/v1.0).

Non-goals (v0)

  • Perfect fidelity layout extraction (we only need usable text + key figures)
  • Long-term storage/retention policies (we can stub, then harden)

Architecture (thin UI, strong pipeline)

1) Ingest

  • Upload endpoint: POST /api/dossiers (multipart)
  • Compute and persist:
    • sha256 of original
    • detected mime
    • storage pointer (disk/S3/Forgejo blob)
  • Create Document row: {id, sha256, filename, mime, created_at, owner}

2) Extract → Canonicalize

Use a pluggable extractor chain:

  • PDF:
    1. pdftotext (fast path, text-layer PDFs)
    2. OCR fallback (pdftoppmtesseract) for image-only PDFs
  • DOCX: pandoc or python-docx
  • HTML: readability-style boilerplate removal
  • Images: OCR (tesseract) with basic deskew

Output a canonical block model (enables better prompting + citations):

{
  "doc_id": "…",
  "blocks": [
    {"type":"heading","level":1,"text":"…"},
    {"type":"paragraph","text":"…"},
    {"type":"list","items":["…","…"]}
  ]
}

3) Style bible compiler

Store bibles in-repo as Markdown + a small metadata header (id, version, citation, hard rules).

Compile the bible into:

  • system_prompt (voice + forbidden/required constraints)
  • template (required dossier structure)
  • lint_rules (post-checks: emojis/paragraph, pronouns, required footer, etc.)

4) Generate

Two-step generation is safer and more controllable:

  1. Content distillation (extract doc facts → structured notes)
  2. Style application (render notes into dossier template under bible constraints)

Recommended runtime:

  • OpenAI-compatible Chat Completions backend (Juakali / OpenWebUI stack)
  • Persist {model, prompts, output_sha256} for auditability

5) Validate (style linter)

Run a deterministic linter per bible:

  • hard constraints (e.g., “emoji per paragraph” for Dave)
  • vocabulary swaps (optional)
  • required footer/disclaimer
  • “no secrets” scan (best-effort)

If lint fails: auto-repair pass (LLM) or return “needs revision” with lint report.

5b) Mermaid preflight (PDF export reliability)

If the output includes Mermaid diagrams, run a preflight pass before PDF export:

  • auto-heal Mermaid blocks (quote labels, normalize headers, balance subgraph/end)
  • validate Mermaid rendering in the same runtime used by the PDF exporter

In re-voice, this is exposed as:

revoice preflight --style <style> --input <output.md> --source <source-doc>

6) Export + publishing

Outputs:

  • Markdown (primary)
  • PDF via existing Forgejo PDF export (.../raw/...&format=pdf) by committing generated Markdown to a repo

Publishing strategy:

  • Store outputs in a Forgejo repo (per team/project)
  • Provide immutable links to {sha} + .sha256 sidecars

Security + operational considerations

  • Run extraction/OCR in a sandboxed worker (CPU/mem/time limits).
  • Never store API keys in repos; use env/secret manager.
  • Keep an audit trail: source hash → extracted text hash → output hash → model/prompt hashes.