From 6f9e37fb3677ccad0050e8068701a056222a1ba5 Mon Sep 17 00:00:00 2001
From: danny
Date: Thu, 25 Dec 2025 14:57:15 +0000
Subject: [PATCH] Add Vanta/IDC business value shadow dossier

---
 examples/vanta-idc-business-value/.gitignore | 2 +
 .../Business-Value-of-Vanta-IDC.pdf.sha256 | 1 +
 ...Business-Value-of-Vanta-IDC.shadow.dave.md | 269 +++++++++++++
 examples/vanta-idc-business-value/SOURCE.md | 8 +
 src/revoice/generate.py | 365 ++++++++++++++++++
 style_bibles/IF.DAVE.BIBLE.md | 2 +
 6 files changed, 647 insertions(+)
 create mode 100644 examples/vanta-idc-business-value/.gitignore
 create mode 100644 examples/vanta-idc-business-value/Business-Value-of-Vanta-IDC.pdf.sha256
 create mode 100644 examples/vanta-idc-business-value/Business-Value-of-Vanta-IDC.shadow.dave.md
 create mode 100644 examples/vanta-idc-business-value/SOURCE.md

diff --git a/examples/vanta-idc-business-value/.gitignore b/examples/vanta-idc-business-value/.gitignore
new file mode 100644
index 0000000..d62ddbb
--- /dev/null
+++ b/examples/vanta-idc-business-value/.gitignore
@@ -0,0 +1,2 @@
+*.pdf
+*.txt
diff --git a/examples/vanta-idc-business-value/Business-Value-of-Vanta-IDC.pdf.sha256 b/examples/vanta-idc-business-value/Business-Value-of-Vanta-IDC.pdf.sha256
new file mode 100644
index 0000000..9fcfb7c
--- /dev/null
+++ b/examples/vanta-idc-business-value/Business-Value-of-Vanta-IDC.pdf.sha256
@@ -0,0 +1 @@
+59a801947b89ac5bd60abcd52a4ecd4fcc121facee0d1985548a24bfc2d02913
diff --git a/examples/vanta-idc-business-value/Business-Value-of-Vanta-IDC.shadow.dave.md b/examples/vanta-idc-business-value/Business-Value-of-Vanta-IDC.shadow.dave.md
new file mode 100644
index 0000000..ad12647
--- /dev/null
+++ b/examples/vanta-idc-business-value/Business-Value-of-Vanta-IDC.shadow.dave.md
@@ -0,0 +1,269 @@
+---
+BRAND: InfraFabric.io
+UNIT: RED TEAM (STRATEGIC OPS)
+DOCUMENT: SHADOW DOSSIER
+CLASSIFICATION: EYES ONLY // DAVE
+---
+
+# [ RED TEAM DECLASSIFIED ]
+## PROJECT: BUSINESS-VALUE-OF-VANTA-IDC-MIRROR
+###
SOURCE: BUSINESS-VALUE-OF-VANTA-IDC-PDF
+**INFRAFABRIC REPORT ID:** `IF-RT-DAVE-2025-1225`
+
+> NOTICE: This document is a product of InfraFabric Red Team.
+> It provides socio-technical friction analysis for how a rollout survives contact with incentives.
+
+**[ ACCESS GRANTED: INFRAFABRIC RED TEAM ]**
+**[ STATUS: OPERATIONAL REALISM ]**
+
+## The Business Value of Vanta
+### Megan Szurley Philip D. Harris, CISSP, CCSK Business Value Manager, Research Director,
+
+> Shadow dossier (mirror-first).
+>
+> Protocol: IF.DAVE.v1.2
+> Citation: `if://bible/dave/v1.2`
+> Source: `examples/vanta-idc-business-value/Business-Value-of-Vanta-IDC.pdf`
+> Generated: `2025-12-25`
+> Source Hash (sha256): `59a801947b89ac5bd60abcd52a4ecd4fcc121facee0d1985548a24bfc2d02913`
+> Extract Hash (sha256): `92c28299603e1d573bd5e7a6da865fdca3876f2506523fc9b6ff209e4c99fd0e`
+
+## Table of Contents
+
+The table of contents is a threat model for attention: it shows exactly where the organization will skim, pause, and schedule a meeting.
+We recommend treating it as a routing table: high-severity issues route to workshops; low-severity issues route to "later."
+
+## BUSINESS VALUE HIGHLIGHTS
+
+We are aligned with a highlights section because it provides immediate executive readability and a pre-approved conclusion.
+In practice, these figures become a routing protocol: anything measurable routes to a dashboard; anything hard routes to a committee.
+
+### Stated Highlights (extracted metrics)
+
+- $107,000: average annual benefit per 10 internal users
+- 526%: three-year ROI
+- 3-month: payback on investment
+- $535,000: average annual benefit per organization
+- 129%: more productive compliance teams
+- 142%: more framework and attestation–related audits prepared per year
+- 82%: less staff time needed per framework and attestation–related audit
+- 66%: more efficient writing and reviewing of policies by security teams
+- 57%: quicker access reviews
+- 81%: quicker completion of security reviews and questionnaires
+- 54%: more productive third-party risk management teams
+
+> **The Dave Factor:** The ROI model becomes the control, and the control becomes the explanation for why reality must align to the spreadsheet.
+> **Countermeasure:** Define baseline metrics, instrument time-to-evidence, and set stop conditions for exceptions and manual work.
+
+### InfraFabric Red Team Diagram (Inferred)
+
+```mermaid
+flowchart TD
+  A["Sponsor narrative"] --> B["Business value model"]
+  B --> C["Executive buy-in"]
+  C --> D["Rollout project"]
+  D --> E["Evidence artifacts produced"]
+  E --> F["Renewal discussion"]
+  F --> G["KPI trend deck"]
+  G --> C
+
+```
+
+## Executive Summary
+
+Executive summaries are the part of the document that most survives contact with calendars.
+The operational risk is that the summary becomes the plan, and the plan becomes a series of alignment sessions that produce excellent artifacts and limited change.
+
+## Situation Overview
+
+The situation is always complex, which is helpful because complex situations justify complex tooling and extended stakeholder engagement.
+The risk is not that the threat landscape is overstated; it’s that the resulting program becomes a comfort narrative rather than an enforceable workflow.
+
+## Vanta Overview
+
+A platform overview is where capabilities are described in a way that is both broadly true and pleasantly non-committal about integration effort.
+The Dave move is to treat "connectors" as a strategy; the counter-move is to treat connectors as a backlog with owners and deadlines.
+
+## The Business Value of Vanta
+
+We are aligned on the intent of this section and recommend a phased approach that optimizes for stakeholder comfort while we validate success criteria.
+
+> **The Dave Factor:** The ROI model becomes the control, and the control becomes the explanation for why reality must align to the spreadsheet.
+> **Countermeasure:** Define baseline metrics, instrument time-to-evidence, and set stop conditions for exceptions and manual work.
+
+### InfraFabric Red Team Diagram (Inferred)
+
+```mermaid
+flowchart TD
+  A["Sponsor narrative"] --> B["Business value model"]
+  B --> C["Executive buy-in"]
+  C --> D["Rollout project"]
+  D --> E["Evidence artifacts produced"]
+  E --> F["Renewal discussion"]
+  F --> G["KPI trend deck"]
+  G --> C
+
+```
+
+## Study Firmographics
+
+We are aligned on the intent of this section and recommend a phased approach that optimizes for stakeholder comfort while we validate success criteria.
+
+## Choice and Use of Vanta
+
+We are aligned on the intent of this section and recommend a phased approach that optimizes for stakeholder comfort while we validate success criteria.
+
+## Business Value and Quantified Benefits
+
+Quantified benefits are useful because they translate operational work into finance-friendly nouns.
+They also create a second, unofficial control plane: the ROI narrative becomes the reason to keep going even when the implementation is late and messy.
+
+> **The Dave Factor:** The ROI model becomes the control, and the control becomes the explanation for why reality must align to the spreadsheet.
+> **Countermeasure:** Define baseline metrics, instrument time-to-evidence, and set stop conditions for exceptions and manual work.
+
+### InfraFabric Red Team Diagram (Inferred)
+
+```mermaid
+flowchart TD
+  A["Sponsor narrative"] --> B["Business value model"]
+  B --> C["Executive buy-in"]
+  C --> D["Rollout project"]
+  D --> E["Evidence artifacts produced"]
+  E --> F["Renewal discussion"]
+  F --> G["KPI trend deck"]
+  G --> C
+
+```
+
+## Compliance and Audit Benefits from Vanta
+
+Periodic audits are a strong mechanism for discovering that the rollout has already happened, just not in a way that can be conveniently measured.
+A centralized dashboard with adoption signals allows us to produce a KPI trend line that looks decisive while still leaving room for interpretation, follow-ups, and iterative enablement.
+If the dashboard ever shows a red triangle, we can immediately form the Committee for the Preservation of the Committee and begin the healing process.
+
+> **The Dave Factor:** Evidence collection becomes the product, and the product becomes a shared drive with strong opinions.
+> **Countermeasure:** Make evidence machine-generated, time-bounded, and verifiable (with owners and expiry).
+
+### InfraFabric Red Team Diagram (Inferred)
+
+```mermaid
+flowchart TD
+  A["Control requirement"] --> B["Evidence requested"]
+  B --> C["Artifact gathered"]
+  C --> D["Review meeting"]
+  D --> E{Approved?}
+  E -->|Yes| F["Audit satisfied"]
+  E -->|No| G["Remediation plan"]
+  G --> D
+
+```
+
+## Security Team and Security Review Benefits from Vanta
+
+Security team efficiency is a legitimate goal, especially when review queues become the organizational truth serum.
+The risk is that throughput improvements are claimed without defining what “review complete” means or what evidence proves it.
+
+## Third-Party Risk Management Benefits from Vanta
+
+We are aligned on the intent of this section and recommend a phased approach that optimizes for stakeholder comfort while we validate success criteria.
+
+> **The Dave Factor:** Third-party risk becomes a questionnaire supply chain, where the slowest vendor defines your security posture.
+> **Countermeasure:** Standardize evidence requests and automate reminders, while enforcing a clear accept/block decision path.
+
+### InfraFabric Red Team Diagram (Inferred)
+
+```mermaid
+flowchart TD
+  A["Vendor onboarding"] --> B["Questionnaire"]
+  B --> C["Evidence chase"]
+  C --> D["Risk rating"]
+  D --> E{Exception?}
+  E -->|Yes| F["Accepted with notes"]
+  E -->|No| G["Blocked pending controls"]
+  F --> H["Renewal cycle"]
+  G --> H
+
+```
+
+## IT Management Benefits from Vanta
+
+IT management benefits usually arrive through integration: fewer manual checks, fewer tickets, and fewer surprise spreadsheets.
+The Dave failure mode is that integrations drift into "phase two"; the mitigation is to make the integration itself the deliverable.
+
+## Operational Efficiencies from Vanta
+
+Operational efficiency is the safest kind of outcome because it is simultaneously measurable and disputable.
+The red-team posture is to demand explicit baselines and to treat exceptions as spend events with expiry dates.
+
+## ROI Summary
+
+We are aligned on the intent of this section and recommend a phased approach that optimizes for stakeholder comfort while we validate success criteria.
+
+> **The Dave Factor:** The ROI model becomes the control, and the control becomes the explanation for why reality must align to the spreadsheet.
+> **Countermeasure:** Define baseline metrics, instrument time-to-evidence, and set stop conditions for exceptions and manual work.
+
+### InfraFabric Red Team Diagram (Inferred)
+
+```mermaid
+flowchart TD
+  A["Sponsor narrative"] --> B["Business value model"]
+  B --> C["Executive buy-in"]
+  C --> D["Rollout project"]
+  D --> E["Evidence artifacts produced"]
+  E --> F["Renewal discussion"]
+  F --> G["KPI trend deck"]
+  G --> C
+
+```
+
+## Challenges/Opportunities
+
+We are aligned on the intent of this section and recommend a phased approach that optimizes for stakeholder comfort while we validate success criteria.
+
+## Challenges
+
+We are aligned on the intent of this section and recommend a phased approach that optimizes for stakeholder comfort while we validate success criteria.
+
+## Opportunities
+
+We are aligned on the intent of this section and recommend a phased approach that optimizes for stakeholder comfort while we validate success criteria.
+
+## Conclusion
+
+Conclusions are where the narrative becomes executable: either as a procurement decision or as a roadmap item.
+If we want this to be operational, we should convert the conclusion into owners, gates, and stop conditions rather than adjectives.
+
+## Appendix 1: Methodology
+
+Architecture diagrams are where optimism goes to be audited.
+If we align on boundaries (model, tools, data, users), we can stop pretending that "the model" is a single component with a single risk posture.
+
+### InfraFabric Red Team Diagram (Inferred)
+
+```mermaid
+flowchart TD
+  A["User"] --> B["App"]
+  B --> C["LLM"]
+  C --> D["Tools"]
+  C --> E["RAG store"]
+  D --> F["External systems"]
+  E --> C
+
+```
+
+## Appendix 2: Supplemental Data
+
+Appendices are where the methodology lives, which is convenient because methodology can be both rigorous and unread.
+If the business case matters, the appendix should be treated as a test: what assumptions must be true for the numbers to hold?
+
+## About the IDC Analysts
+
+We are aligned on the intent of this section and recommend a phased approach that optimizes for stakeholder comfort while we validate success criteria.
+
+## Message from the Sponsor
+
+We are aligned on the intent of this section and recommend a phased approach that optimizes for stakeholder comfort while we validate success criteria.
+
+---
+
+*Standard Dave Footer:* This document is intended for the recipient only. If you are not the recipient, please delete it and forget you saw anything. P.S. Please consider the environment before printing this email.
diff --git a/examples/vanta-idc-business-value/SOURCE.md b/examples/vanta-idc-business-value/SOURCE.md
new file mode 100644
index 0000000..874b5dd
--- /dev/null
+++ b/examples/vanta-idc-business-value/SOURCE.md
@@ -0,0 +1,8 @@
+# Source provenance
+
+- Source URL: https://cdn.prod.website-files.com/64009032676f24f376f002fc/67893288cc7873e3f9534baf_Business%20Value%20of%20Vanta%20white%20paper_final%20from%20IDC%20(1).pdf
+- Retrieved: 2025-12-25
+- SHA-256: 59a801947b89ac5bd60abcd52a4ecd4fcc121facee0d1985548a24bfc2d02913
+
+Notes:
+- The original PDF is intentionally not committed to the repo.
diff --git a/src/revoice/generate.py b/src/revoice/generate.py
index 58ef6f3..9e36615 100644
--- a/src/revoice/generate.py
+++ b/src/revoice/generate.py
@@ -43,6 +43,9 @@ _PAGE_SPLIT_RE = re.compile(r"(?m)^===== page-(\d+) =====$")
 _URL_RE = re.compile(r"https?://\S+")
 _OWASP_TOC_LEADER_RE = re.compile(r"\.\s*\.\s*\.")
 _SENTENCE_SPLIT_RE = re.compile(r"(?<=[.!?])\s+")
+_TOC_ENTRY_RE = re.compile(r"^\s*(?P<title>.+?)\s+(?:\.\s*){3,}\s+(?P<page>\d+)\s*$")
+_METRIC_VALUE_RE = re.compile(r"^(?:\$[\d,]+|\d+%|\d+-month)$")
+_METRIC_TOKEN_RE = re.compile(r"\$[\d,]+|\b\d+%|\b\d+-month\b")
 
 _OWASP_LLM_SUBHEADINGS = [
     "Description",
@@ -73,6 +76,14 @@ def _looks_like_owasp_llm_top10(text: str) -> bool:
     return "LLM01" in text and "LLM10" in text
 
 
+def _looks_like_idc_business_value(text: str) -> bool:
+    if "Business Value White Paper" not in text:
+        return False
+    if "Table of Contents" not in text:
+        return False
+    return "IDC #" in text or "IDC" in text
+
+
 def _paragraphs_from_lines(text: str) -> list[str]:
     paragraphs: list[str] = []
     buf: list[str] = []
@@ -141,6 +152,66 @@ def _split_owasp_llm_subsections(body: str) -> list[tuple[str, str]]:
     return parts
 
 
+def _extract_idc_highlight_metrics(body: str) -> list[tuple[str, str]]:
+    lines = [ln.rstrip("\n") for ln in body.splitlines()]
+    # Find a line with three metric tokens to derive column boundaries.
+    col_starts: list[int] = []
+    for ln in lines:
+        tokens = list(_METRIC_TOKEN_RE.finditer(ln))
+        if len(tokens) >= 3:
+            col_starts = [tokens[0].start(), tokens[1].start(), tokens[2].start()]
+            break
+
+    if not col_starts:
+        return []
+
+    starts = col_starts + [10_000]
+
+    metrics_by_col: list[list[tuple[str, list[str]]]] = [[], [], []]
+    current: list[tuple[str, list[str]] | None] = [None, None, None]
+
+    def flush(col: int) -> None:
+        if current[col] is None:
+            return
+        value, parts = current[col]
+        desc = " ".join([p.strip() for p in parts if p.strip()]).strip()
+        metrics_by_col[col].append((value, [desc] if desc else []))
+        current[col] = None
+
+    for ln in lines:
+        # Stop parsing if the next major heading begins.
+        if ln.strip() == "Executive Summary":
+            break
+
+        padded = ln + " " * 5
+        cols = []
+        for i in range(3):
+            segment = padded[starts[i] : starts[i + 1]].strip()
+            cols.append(segment)
+
+        for i, seg in enumerate(cols):
+            if not seg:
+                continue
+            if _METRIC_VALUE_RE.match(seg):
+                flush(i)
+                current[i] = (seg, [])
+                continue
+            if current[i] is None:
+                continue
+            current[i][1].append(seg)
+
+    for i in range(3):
+        flush(i)
+
+    flattened: list[tuple[str, str]] = []
+    for col_metrics in metrics_by_col:
+        for value, desc_parts in col_metrics:
+            desc = desc_parts[0] if desc_parts else ""
+            if desc:
+                flattened.append((value, desc))
+    return flattened
+
+
 def _normalize_ocr(text: str) -> str:
     text = re.sub(r"\bAl\b", "AI", text)
     text = text.replace("GenAl", "GenAI")
@@ -228,6 +299,8 @@ def _parse_sections_from_page(page_text: str) -> list[_SourceSection]:
 def _extract_sections(source_text: str) -> list[_SourceSection]:
     if _looks_like_owasp_llm_top10(source_text):
         return _extract_sections_owasp_llm_top10(source_text)
+    if _looks_like_idc_business_value(source_text):
+        return _extract_sections_idc_business_value(source_text)
 
     pages = _parse_pages(source_text)
     sections: list[_SourceSection] = []
@@ -335,6 +408,161 @@ def
_extract_sections_owasp_llm_top10(source_text: str) -> list[_SourceSection]:
     return sections
 
 
+def _idc_clean_lines(lines: list[str]) -> list[str]:
+    cleaned: list[str] = []
+    for ln in lines:
+        s = ln.strip()
+        if not s:
+            cleaned.append("")
+            continue
+        if "Business Value White Paper" in s:
+            continue
+        if s.startswith("January ") and "IDC #" in s:
+            continue
+        if "Return to Highlights" in s:
+            continue
+        if s.isdigit() and len(s) <= 3:
+            continue
+        cleaned.append(ln.rstrip())
+    return cleaned
+
+
+def _normalize_heading(value: str) -> str:
+    cleaned = re.sub(r"[^A-Za-z0-9]+", " ", value).strip().upper()
+    return re.sub(r"\s{2,}", " ", cleaned)
+
+
+def _extract_idc_toc_entries(pages: list[list[str]]) -> list[tuple[str, int]]:
+    toc_page = None
+    toc_line_idx = None
+    for p_idx, page_lines in enumerate(pages):
+        for l_idx, ln in enumerate(page_lines):
+            if ln.strip() == "Table of Contents":
+                toc_page = p_idx
+                toc_line_idx = l_idx
+                break
+        if toc_page is not None:
+            break
+
+    if toc_page is None or toc_line_idx is None:
+        return []
+
+    entries: list[tuple[str, int]] = []
+    for ln in pages[toc_page][toc_line_idx + 1 :]:
+        match = _TOC_ENTRY_RE.match(ln.strip())
+        if not match:
+            continue
+        title = match.group("title").strip().rstrip(".").strip()
+        try:
+            page_no = int(match.group("page"))
+        except ValueError:
+            continue
+        if not title:
+            continue
+        entries.append((title, page_no))
+
+    return entries
+
+
+def _find_heading_anchor(*, pages: list[list[str]], title: str, page_no: int) -> tuple[int, int, str]:
+    norm_title = _normalize_heading(title)
+    start_page = max(0, min(len(pages) - 1, page_no - 1))
+
+    for p_idx in range(start_page, min(len(pages), start_page + 3)):
+        for l_idx, ln in enumerate(pages[p_idx]):
+            s = ln.strip()
+            if not s:
+                continue
+            if _TOC_ENTRY_RE.match(s):
+                continue
+            if _normalize_heading(s) == norm_title:
+                return p_idx, l_idx, s
+    return start_page, 0, title
+
+
+def _slice_pages(
+    *, pages: list[list[str]], start: tuple[int, int],
end: tuple[int, int] | None
+) -> list[str]:
+    start_page, start_line = start
+    if end is None:
+        end_page, end_line = len(pages), 0
+    else:
+        end_page, end_line = end
+
+    out: list[str] = []
+    for p_idx in range(start_page, min(end_page + 1, len(pages))):
+        lines = pages[p_idx]
+        lo = start_line if p_idx == start_page else 0
+        hi = end_line if (end is not None and p_idx == end_page) else len(lines)
+        out.extend(lines[lo:hi])
+        out.append("")
+    return out
+
+
+def _extract_sections_idc_business_value(source_text: str) -> list[_SourceSection]:
+    raw_pages = source_text.split("\f")
+    if not raw_pages:
+        return []
+
+    cover_lines = [ln.rstrip() for ln in raw_pages[0].splitlines()]
+    cover_title = next(
+        (
+            ln.strip()
+            for ln in cover_lines
+            if ln.strip()
+            and "Business Value White Paper" not in ln
+            and "THIS PDF USES" not in ln
+            and "|" not in ln
+        ),
+        None,
+    )
+    if not cover_title:
+        cover_title, _ = _parse_title_block(cover_lines)
+
+    try:
+        title_idx = next(i for i, ln in enumerate(cover_lines) if ln.strip() == cover_title)
+    except StopIteration:
+        title_idx = 0
+    cover_body = "\n".join([ln for ln in cover_lines[title_idx + 1 :] if ln.strip()]).strip()
+
+    pages = [_idc_clean_lines([ln for ln in pg.splitlines()]) for pg in raw_pages]
+
+    sections: list[_SourceSection] = [_SourceSection(title=cover_title, body=cover_body, why_it_matters=None)]
+
+    toc_entries = _extract_idc_toc_entries(pages)
+    if toc_entries:
+        toc_lines = ["- " + t for t, _p in toc_entries]
+        sections.append(_SourceSection(title="Table of Contents", body="\n".join(toc_lines), why_it_matters=None))
+
+    anchors: list[tuple[str, tuple[int, int], str]] = []
+    for title, page_no in toc_entries:
+        p_idx, l_idx, found_title = _find_heading_anchor(pages=pages, title=title, page_no=page_no)
+        anchors.append((found_title, (p_idx, l_idx), title))
+
+    # De-duplicate anchors that point to the same place (can happen with repeating headers).
+    uniq: list[tuple[str, tuple[int, int], str]] = []
+    seen: set[tuple[int, int, str]] = set()
+    for found_title, anchor, toc_title in anchors:
+        key = (anchor[0], anchor[1], _normalize_heading(toc_title))
+        if key in seen:
+            continue
+        seen.add(key)
+        uniq.append((found_title, anchor, toc_title))
+
+    for idx, (found_title, anchor, toc_title) in enumerate(uniq):
+        next_anchor = uniq[idx + 1][1] if idx + 1 < len(uniq) else None
+        lines = _slice_pages(pages=pages, start=anchor, end=next_anchor)
+        # Drop the heading line itself if it repeats at the start.
+        while lines and not lines[0].strip():
+            lines.pop(0)
+        if lines and _normalize_heading(lines[0].strip()) == _normalize_heading(found_title):
+            lines.pop(0)
+        body = "\n".join(lines).strip()
+        sections.append(_SourceSection(title=found_title or toc_title, body=body, why_it_matters=None))
+
+    return sections
+
+
 def _has(text: str, *needles: str) -> bool:
     lowered = text.lower()
     return any(n.lower() in lowered for n in needles)
@@ -434,6 +662,40 @@ def _slugify(value: str) -> str:
 def _inferred_mermaid(title: str) -> str | None:
     title_upper = title.upper()
 
+    if "BUSINESS VALUE" in title_upper or "ROI" in title_upper:
+        return """flowchart TD
+  A["Sponsor narrative"] --> B["Business value model"]
+  B --> C["Executive buy-in"]
+  C --> D["Rollout project"]
+  D --> E["Evidence artifacts produced"]
+  E --> F["Renewal discussion"]
+  F --> G["KPI trend deck"]
+  G --> C
+"""
+
+    if "COMPLIANCE" in title_upper or "AUDIT" in title_upper:
+        return """flowchart TD
+  A["Control requirement"] --> B["Evidence requested"]
+  B --> C["Artifact gathered"]
+  C --> D["Review meeting"]
+  D --> E{Approved?}
+  E -->|Yes| F["Audit satisfied"]
+  E -->|No| G["Remediation plan"]
+  G --> D
+"""
+
+    if "THIRD-PARTY" in title_upper or "VENDOR" in title_upper:
+        return """flowchart TD
+  A["Vendor onboarding"] --> B["Questionnaire"]
+  B --> C["Evidence chase"]
+  C --> D["Risk rating"]
+  D --> E{Exception?}
+  E -->|Yes| F["Accepted with notes"]
+  E
-->|No| G["Blocked pending controls"]
+  F --> H["Renewal cycle"]
+  G --> H
+"""
+
     if title_upper.startswith("LLM01") or "PROMPT INJECTION" in title_upper:
         return """flowchart TD
   A["Attacker prompt"] --> B["LLM prompt parser"]
@@ -645,6 +907,28 @@ def _render_dave_factor_callout(section: _SourceSection) -> str | None:
     title_upper = section.title.upper()
     excerpt = f"{section.title}\n{section.why_it_matters or ''}\n{section.body}".strip()
 
+    if "BUSINESS VALUE" in title_upper or "ROI" in title_upper:
+        return "\n".join(
+            [
+                "> **The Dave Factor:** The ROI model becomes the control, and the control becomes the explanation for why reality must align to the spreadsheet.",
+                "> **Countermeasure:** Define baseline metrics, instrument time-to-evidence, and set stop conditions for exceptions and manual work.",
+            ]
+        )
+    if "COMPLIANCE" in title_upper or "AUDIT" in title_upper:
+        return "\n".join(
+            [
+                "> **The Dave Factor:** Evidence collection becomes the product, and the product becomes a shared drive with strong opinions.",
+                "> **Countermeasure:** Make evidence machine-generated, time-bounded, and verifiable (with owners and expiry).",
+            ]
+        )
+    if "THIRD-PARTY" in title_upper:
+        return "\n".join(
+            [
+                "> **The Dave Factor:** Third-party risk becomes a questionnaire supply chain, where the slowest vendor defines your security posture.",
+                "> **Countermeasure:** Standardize evidence requests and automate reminders, while enforcing a clear accept/block decision path.",
+            ]
+        )
+
     if title_upper.startswith("LLM01") or "PROMPT INJECTION" in title_upper:
         return "\n".join(
             [
@@ -796,6 +1080,7 @@ def _render_section(section: _SourceSection) -> str:
     title_upper = section.title.upper()
     is_llm_entry = bool(re.match(r"^LLM\d{2}:", section.title))
     llm_subsections = _split_owasp_llm_subsections(section.body) if is_llm_entry else []
+    idc_highlights = _extract_idc_highlight_metrics(section.body) if "HIGHLIGHTS" in title_upper else []
 
     if is_llm_entry:
         risk = section.title.split(":",
1)[1].strip()
@@ -827,6 +1112,13 @@ def _render_section(section: _SourceSection) -> str:
                 "We recommend treating it as a routing table: high-severity issues route to workshops; low-severity issues route to \"later.\"",
             ]
         )
+    elif "BUSINESS VALUE" in title_upper and "HIGHLIGHTS" in title_upper:
+        paragraphs.extend(
+            [
+                "We are aligned with a highlights section because it provides immediate executive readability and a pre-approved conclusion.",
+                "In practice, these figures become a routing protocol: anything measurable routes to a dashboard; anything hard routes to a committee.",
+            ]
+        )
     elif title_upper == "LETTER FROM THE PROJECT LEADS":
         paragraphs.extend(
             [
@@ -862,6 +1154,69 @@ def _render_section(section: _SourceSection) -> str:
                 "From a red-team lens, sponsorship also introduces the soft constraint that critique must remain directionally aligned with goodwill.",
             ]
         )
+    elif title_upper == "EXECUTIVE SUMMARY":
+        paragraphs.extend(
+            [
+                "Executive summaries are the part of the document that most survives contact with calendars.",
+                "The operational risk is that the summary becomes the plan, and the plan becomes a series of alignment sessions that produce excellent artifacts and limited change.",
+            ]
+        )
+    elif "SITUATION OVERVIEW" in title_upper:
+        paragraphs.extend(
+            [
+                "The situation is always complex, which is helpful because complex situations justify complex tooling and extended stakeholder engagement.",
+                "The risk is not that the threat landscape is overstated; it’s that the resulting program becomes a comfort narrative rather than an enforceable workflow.",
+            ]
+        )
+    elif "VANTA OVERVIEW" in title_upper:
+        paragraphs.extend(
+            [
+                "A platform overview is where capabilities are described in a way that is both broadly true and pleasantly non-committal about integration effort.",
+                "The Dave move is to treat \"connectors\" as a strategy; the counter-move is to treat connectors as a backlog with owners and deadlines.",
+            ]
+        )
+    elif "QUANTIFIED BENEFITS"
in title_upper or ("BUSINESS VALUE" in title_upper and "BENEFIT" in title_upper):
+        paragraphs.extend(
+            [
+                "Quantified benefits are useful because they translate operational work into finance-friendly nouns.",
+                "They also create a second, unofficial control plane: the ROI narrative becomes the reason to keep going even when the implementation is late and messy.",
+            ]
+        )
+    elif "SECURITY TEAM" in title_upper or "SECURITY REVIEW" in title_upper:
+        paragraphs.extend(
+            [
+                "Security team efficiency is a legitimate goal, especially when review queues become the organizational truth serum.",
+                "The risk is that throughput improvements are claimed without defining what “review complete” means or what evidence proves it.",
+            ]
+        )
+    elif "IT MANAGEMENT" in title_upper:
+        paragraphs.extend(
+            [
+                "IT management benefits usually arrive through integration: fewer manual checks, fewer tickets, and fewer surprise spreadsheets.",
+                "The Dave failure mode is that integrations drift into \"phase two\"; the mitigation is to make the integration itself the deliverable.",
+            ]
+        )
+    elif "OPERATIONAL EFFICIENCIES" in title_upper:
+        paragraphs.extend(
+            [
+                "Operational efficiency is the safest kind of outcome because it is simultaneously measurable and disputable.",
+                "The red-team posture is to demand explicit baselines and to treat exceptions as spend events with expiry dates.",
+            ]
+        )
+    elif title_upper == "CONCLUSION":
+        paragraphs.extend(
+            [
+                "Conclusions are where the narrative becomes executable: either as a procurement decision or as a roadmap item.",
+                "If we want this to be operational, we should convert the conclusion into owners, gates, and stop conditions rather than adjectives.",
+            ]
+        )
+    elif title_upper.startswith("APPENDIX"):
+        paragraphs.extend(
+            [
+                "Appendices are where the methodology lives, which is convenient because methodology can be both rigorous and unread.",
+                "If the business case matters, the appendix should be treated as a test: what
assumptions must be true for the numbers to hold?",
+            ]
+        )
     elif "PULL REQUEST" in title_upper:
         paragraphs.extend(
             [
@@ -931,6 +1286,16 @@ def _render_section(section: _SourceSection) -> str:
 
     out.extend(paragraphs)
 
+    if idc_highlights:
+        out.extend(
+            [
+                "",
+                "### Stated Highlights (extracted metrics)",
+                "",
+                *[f"- {value}: {desc}" for value, desc in idc_highlights[:12]],
+            ]
+        )
+
     callout = _render_dave_factor_callout(section)
     if callout:
         out.extend(["", callout])
diff --git a/style_bibles/IF.DAVE.BIBLE.md b/style_bibles/IF.DAVE.BIBLE.md
index 8d349ed..b544fc3 100644
--- a/style_bibles/IF.DAVE.BIBLE.md
+++ b/style_bibles/IF.DAVE.BIBLE.md
@@ -61,6 +61,8 @@ Aim for **fast verification** by a skeptical reader (engineers, auditors, legal)
 - Example: `> Quote… (Source: p. 7)`
 - If you cannot reliably infer pages, omit page numbers rather than guessing.
 - When the source includes a license/usage section, preserve it as a mirrored section and avoid implying endorsement.
+- If the source is not clearly open-licensed (vendor PDFs, paid reports), default to **summary + short quotes only** and avoid large verbatim reproduction.
+- Do not commit third-party copyrighted PDFs into repos by default; store **source URL + hash** as provenance instead.
 
 ---
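
Reviewer note: the patch stores only a source URL and a SHA-256 for the PDF, so anyone reproducing the dossier has to re-download the file and confirm it matches. A minimal sketch of that check follows. The helper names `sha256_of_file` and `matches_recorded_hash` are hypothetical and not part of this patch, and the sketch assumes the `.sha256` sidecar holds the hex digest as its first whitespace-separated token, as the committed file appears to.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded_hash(pdf_path: Path, sha256_path: Path) -> bool:
    """Compare a local file against the digest recorded in a sidecar file.

    Assumption: the digest is the first token in the sidecar, which covers
    both a bare hash and the "<hash>  <filename>" layout.
    """
    recorded = sha256_path.read_text(encoding="utf-8").split()[0].lower()
    return sha256_of_file(pdf_path) == recorded
```

A caller would pass the locally downloaded PDF and the committed `Business-Value-of-Vanta-IDC.pdf.sha256` path, and treat a `False` result as a signal that the upstream file has changed since retrieval.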