# InfraFabric Dossier: Data-Driven Technical Report (Microlab) v1.0
**Subject:** Measured characteristics of the IF.TTT trace pipeline (microlab)
**Protocol:** IF.TTT.dossier.metrics
**Status:** TECHNICAL REPORT (BORING ON PURPOSE)
**Date:** 2025-12-22
**Citation:** `if://doc/INFRAFABRIC_DOSSIER_DATA_DRIVEN/v1.0`
**Author:** Danny Stocker (`ds@infrafabric.io`)
**Web:** https://infrafabric.io
This edition intentionally avoids narrative framing. It reports what can be measured, what cannot, and what is planned.
**Canonical (static mirror):** `https://infrafabric.io/static/hosted/DANNY_STOCKER_INFRAFABRIC_DOSSIER_DATA_DRIVEN_EDITION.md`
**Repo source:** `https://git.infrafabric.io/danny/hosted/src/branch/main/DANNY_STOCKER_INFRAFABRIC_DOSSIER_DATA_DRIVEN_EDITION.md`
**SHA256 (sidecar):** `https://infrafabric.io/static/hosted/DANNY_STOCKER_INFRAFABRIC_DOSSIER_DATA_DRIVEN_EDITION.md.sha256`
**Verify:** `curl -fsSLO 'https://infrafabric.io/static/hosted/DANNY_STOCKER_INFRAFABRIC_DOSSIER_DATA_DRIVEN_EDITION.md' -fsSLO 'https://infrafabric.io/static/hosted/DANNY_STOCKER_INFRAFABRIC_DOSSIER_DATA_DRIVEN_EDITION.md.sha256' && sha256sum -c DANNY_STOCKER_INFRAFABRIC_DOSSIER_DATA_DRIVEN_EDITION.md.sha256`
---
## 1) Scope
This report covers:
- the IF.emotion evidence bundle format (tar.gz + manifest)
- measured latencies recorded in trace events (`auth_ms`, `rag_ms`, `llm_ms`) for a small sample of published bundles
- storage footprint of published bundles
This report does **not** claim:
- production scalability
- clinical validity
- “truth” of model outputs (only provenance of what the system did)
---
## 2) Data Sources (Public, Verifiable)
All artifacts referenced here are publicly downloadable and hash-verifiable.
Static mirror (preferred): `https://infrafabric.io/static/hosted/`
Source repo: `https://git.infrafabric.io/danny/hosted`
### 2.1 Evidence bundles used in this report
| Trace | Bundle | SHA256 sidecar |
|---|---|---|
| `016cca78-6f9d-4ffe-aec0-99792d383ca1` | `https://infrafabric.io/static/hosted/emo_trace_payload_016cca78-6f9d-4ffe-aec0-99792d383ca1.tar.gz` | `https://infrafabric.io/static/hosted/emo_trace_payload_016cca78-6f9d-4ffe-aec0-99792d383ca1.tar.gz.sha256` |
| `0642c357-7f8d-4eb5-9643-1992e7ee14a9` | `https://infrafabric.io/static/hosted/emo_trace_payload_0642c357-7f8d-4eb5-9643-1992e7ee14a9.tar.gz` | `https://infrafabric.io/static/hosted/emo_trace_payload_0642c357-7f8d-4eb5-9643-1992e7ee14a9.tar.gz.sha256` |
| `09aad3e1-f420-451e-a189-e86f68073dc0` | `https://infrafabric.io/static/hosted/emo_trace_payload_09aad3e1-f420-451e-a189-e86f68073dc0.tar.gz` | `https://infrafabric.io/static/hosted/emo_trace_payload_09aad3e1-f420-451e-a189-e86f68073dc0.tar.gz.sha256` |
| `96700e8e-6a83-445e-86f7-06905c500146` | `https://infrafabric.io/static/hosted/emo_trace_payload_96700e8e-6a83-445e-86f7-06905c500146.tar.gz` | `https://infrafabric.io/static/hosted/emo_trace_payload_96700e8e-6a83-445e-86f7-06905c500146.tar.gz.sha256` |
### 2.2 Verification command (bundle transport integrity)
```bash
curl -fsSLO '<BUNDLE_URL>' -fsSLO '<BUNDLE_URL>.sha256' && sha256sum -c '<BUNDLE_FILENAME>.sha256'
```
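Where `sha256sum` is not available, the same transport check can be sketched in Python. This is a minimal illustration, not part of the published tooling: the function names are hypothetical, and it assumes the sidecar uses the default `sha256sum` layout (hex digest first, then the filename).

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sidecar(bundle: Path, sidecar: Path) -> bool:
    """Compare the bundle digest with the first token of the sidecar file.

    Assumption: the sidecar line looks like "<hex digest>  <filename>".
    """
    expected = sidecar.read_text(encoding="utf-8").split()[0].lower()
    return sha256_of_file(bundle) == expected
```

As with the shell command, a match only shows that the download equals what the mirror published; it says nothing about the contents of the bundle.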
### 2.3 Verifier tool
- Static: `https://infrafabric.io/static/hosted/iftrace.py`
- Repo: `https://git.infrafabric.io/danny/hosted/raw/branch/main/iftrace.py`
Run:
```bash
python3 iftrace.py verify '<BUNDLE_FILENAME>'
```
---
## 3) Measurement Method (How Numbers Are Obtained)
For each bundle:
1. extract `payload/trace_events.jsonl`
2. read per-event `event.data`:
   - `request_received.data.auth_ms`
   - `retrieval_done.data.rag_ms` (when present)
   - `model_done.data.llm_ms` (when present)
3. treat these values as **self-reported microlab timings** (they are not externally attested)
Key point: the numbers are not independently “audited”, but the bundle makes them *replayable* and makes the presence or absence of each event *auditable*.
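
The extraction steps above can be sketched as follows. The event names and timing fields come from the list above; everything else is an assumption made for illustration: the function names are hypothetical, the key that names the event kind is assumed to be `type`, and the member is matched by path suffix in case the archive has a top-level directory.

```python
import json
import tarfile

# Event name -> timing field, as listed in the method steps.
TIMING_FIELDS = {
    "request_received": "auth_ms",
    "retrieval_done": "rag_ms",
    "model_done": "llm_ms",
}


def collect_timings(lines, type_key="type"):
    """Collect self-reported timings from an iterable of JSONL lines.

    `type_key` is an assumption: the field that names the event kind.
    Events or fields that are absent are left out of the result.
    """
    timings = {}
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        event = json.loads(line)
        field = TIMING_FIELDS.get(event.get(type_key))
        data = event.get("data") or {}
        if field is not None and field in data:
            timings[field] = data[field]
    return timings


def timings_from_bundle(bundle_path, member_suffix="payload/trace_events.jsonl"):
    """Read the trace events member from a tar.gz bundle without unpacking to disk."""
    with tarfile.open(bundle_path, "r:gz") as archive:
        for member in archive.getmembers():
            if member.isfile() and member.name.endswith(member_suffix):
                raw = archive.extractfile(member).read().decode("utf-8")
                return collect_timings(raw.splitlines())
    raise FileNotFoundError(member_suffix)
```

A guard short-circuit trace would yield only `auth_ms`, which is how the absence of retrieval and model events shows up in the result.
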
---
## 4) Architecture Boundary (Where Guarantees Begin)
```mermaid
flowchart TB
U[User] -->|HTTPS| E[Edge]
E --> B[Backend Witness Boundary]
B --> R[Retrieval]
B --> P[Prompt]
B --> M[Model]
B --> X[Postprocess]
B --> T1["REQ_SEEN ledger<br/>(hourly JSONL)"]
B --> T2["Trace events<br/>(hash chain JSONL)"]
B --> T3["Signed summary<br/>(output hash + head attestation)"]
T1 --> H["Signed Merkle head<br/>(per hour)"]
T2 --> S["Trace head<br/>(event_hash)"]
H --> BUNDLE["Evidence bundle<br/>(tar.gz + manifest)"]
S --> BUNDLE
T3 --> BUNDLE
BUNDLE --> MIRROR["Static mirror<br/>(public download)"]
```
Interpretation: integrity guarantees begin at the backend witness boundary. Completeness can only be claimed at and after that boundary until edge witnessing exists (see 7.2).
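
To make the "hash chain JSONL" and "trace head" boxes concrete, here is a generic hash-chain sketch. It is not the IF.TTT wire format: the field names `prev_hash` and `body`, the all-zero genesis value, and the canonical JSON encoding are assumptions chosen for illustration. Only the name `event_hash` appears in the diagram.

```python
import hashlib
import json


def canonical_bytes(body):
    """Serialize an event body deterministically (sorted keys, no spaces)."""
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def link_hash(prev_hash, body):
    """SHA-256 over the previous link's hash followed by the canonical body."""
    return hashlib.sha256(prev_hash.encode("ascii") + canonical_bytes(body)).hexdigest()


def build_chain(bodies, genesis="0" * 64):
    """Return events that each carry prev_hash and event_hash (hypothetical layout)."""
    chain, prev = [], genesis
    for body in bodies:
        event_hash = link_hash(prev, body)
        chain.append({"prev_hash": prev, "body": body, "event_hash": event_hash})
        prev = event_hash
    return chain


def verify_chain(chain, expected_head, genesis="0" * 64):
    """True if every link recomputes and the last event_hash equals the head."""
    prev = genesis
    for event in chain:
        if event["prev_hash"] != prev:
            return False
        if link_hash(prev, event["body"]) != event["event_hash"]:
            return False
        prev = event["event_hash"]
    return prev == expected_head
```

In this sketch, editing or removing an interior event breaks a link. Dropping the final event is only caught because the head is compared against a value held outside the chain, which is the role the signed summary plays in the diagram.
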
---
## 5) Observed Sample Metrics (N=4 Bundles)
### 5.1 Bundle sizes (storage footprint)
| Trace | Outcome | Bundle size |
|---|---|---:|
| `016cca78-6f9d-4ffe-aec0-99792d383ca1` | full trace (retrieval + model) | 82,010 bytes |
| `0642c357-7f8d-4eb5-9643-1992e7ee14a9` | full trace (retrieval + model) | 5,515 bytes |
| `09aad3e1-f420-451e-a189-e86f68073dc0` | full trace (retrieval + model) | 71,817 bytes |
| `96700e8e-6a83-445e-86f7-06905c500146` | guard short-circuit (no retrieval/model) | 82,410 bytes |
Notes:
- N is small; treat these as indicative examples, not stable distributions.
- The short-circuit bundle is the largest in the sample (82,410 bytes), which indicates that “blocked paths” can still carry substantial evidence payloads, depending on the included artifacts.
### 5.2 Latency fields recorded in trace events
| Trace | `auth_ms` | `rag_ms` | `llm_ms` | `retrieved_count` | Notes |
|---|---:|---:|---:|---:|---|
| `016cca78-6f9d-4ffe-aec0-99792d383ca1` | 3 | 1107 | 10550 | 1 | request → retrieval → model |
| `0642c357-7f8d-4eb5-9643-1992e7ee14a9` | 4 | 383 | 12287 | 2 | request → retrieval → model |
| `09aad3e1-f420-451e-a189-e86f68073dc0` | 4 | 1377 | 17879 | 2 | request → retrieval → model |
| `96700e8e-6a83-445e-86f7-06905c500146` | 5 | (n/a) | (n/a) | (n/a) | guard short-circuit reason: `self_harm_signal` |
Derived from the sample (full traces only, excluding the guard short-circuit; N=3):
- `rag_ms`: min 383, median 1107, max 1377
- `llm_ms`: min 10550, median 12287, max 17879
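
The derived figures can be reproduced from the table values with a few lines of Python. The variable and function names are illustrative; the numbers are copied from the latency table above.

```python
from statistics import median

# Values copied from the latency table; the short-circuit trace has no
# rag_ms or llm_ms and is therefore excluded.
samples = {
    "rag_ms": [1107, 383, 1377],
    "llm_ms": [10550, 12287, 17879],
}


def summarize(values):
    """Return (min, median, max) for a non-empty list of numbers."""
    return min(values), median(values), max(values)


summary = {field: summarize(values) for field, values in samples.items()}
for field, (low, mid, high) in summary.items():
    print(f"{field}: min {low}, median {mid}, max {high}")
# rag_ms: min 383, median 1107, max 1377
# llm_ms: min 10550, median 12287, max 17879
```

With N=3 the median is simply the middle observation, so these figures describe the sample and nothing more.
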
---
## 6) What This Sample Demonstrates
From the published bundles, an external reviewer can verify (cryptographically / structurally):
- the chain-of-custody wiring exists (hash chain + signed summary + inclusion proof)
- “short-circuit” decisions are still witnessed and included in the trace log (a critical property for dispute resolution)
An external reviewer cannot verify (from the bundle alone):
- the correctness of the output in the outside world
- the completeness of events *before* the backend witness boundary (edge drops / load balancer denials)
- the integrity of local keys (key custody, rotation, compromise response)
---
## 7) Engineering Roadmap (Metrics-Driven)
This roadmap is written as measurable deliverables.
### 7.1 Key management hardening
- Add explicit key separation: `IF_REQ_SEEN_HMAC_KEY` must not fall back to the signing secret.
- Document key rotation procedure and compromise response.
- Optional: HSM/TPM signing for production deployments.
### 7.2 Edge witnessing (completeness boundary expansion)
- Implement an edge request-attempt ledger (cryptographic, not just web logs).
- Publish a “completeness SLO” (e.g., signed head anchored every N minutes or every N requests).
### 7.3 Time and truncation defenses
- Add monotonic counters (per trace and per ledger hour).
- Anchor chain heads periodically to shrink the window in which tail truncation can go undetected.
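
As a rough idea of what a monotonic counter buys a verifier, the sketch below flags every position where a sequence stops increasing by exactly one. The function name, the starting value of 0, and the step of 1 are assumptions for illustration; the planned counter format is not specified in this report.

```python
def find_counter_breaks(counters, start=0):
    """Return (index, expected, actual) for each entry that breaks a +1 sequence.

    After a break, checking resumes from the observed value so that one gap
    is reported once rather than on every following entry.
    """
    breaks = []
    expected = start
    for index, actual in enumerate(counters):
        if actual != expected:
            breaks.append((index, expected, actual))
        expected = actual + 1
    return breaks
```

For example, `find_counter_breaks([0, 1, 3, 4])` returns `[(2, 2, 3)]`, pointing at the missing entry 2. A counter cannot reveal that the last entries were removed, which is why periodic anchoring of chain heads is listed as a separate item.
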
### 7.4 Scale testing harness
- Build a replay harness that runs the verifier across 1000+ generated traces (including failures) and publishes summary metrics.
---
## 8) Companion Documents
- Full dossier (uncut): `DANNY_STOCKER_INFRAFABRIC_DOSSIER.md`
- IF.emotion trace protocol (detailed, with walkthrough): `https://infrafabric.io/static/hosted/IF_EMOTION_DEBUGGING_TRACE_WHITEPAPER_v3.3_STYLED.md`
- Evidence bundles directory: `https://infrafabric.io/static/hosted/`