
# InfraFabric Dossier — Data‑Driven Technical Report (Microlab) v1.0

**Subject:** Measured characteristics of the IF.TTT trace pipeline (microlab)
**Protocol:** IF.TTT.dossier.metrics
**Status:** TECHNICAL REPORT (BORING ON PURPOSE)
**Date:** 2025-12-22
**Citation:** `if://doc/INFRAFABRIC_DOSSIER_DATA_DRIVEN/v1.0`
**Author:** Danny Stocker (`ds@infrafabric.io`)
**Web:** https://infrafabric.io

This edition intentionally avoids narrative framing. It reports what can be measured, what cannot, and what is planned.

**Canonical (static mirror):** `https://infrafabric.io/static/hosted/DANNY_STOCKER_INFRAFABRIC_DOSSIER_DATA_DRIVEN_EDITION.md`
**Repo source:** `https://git.infrafabric.io/danny/hosted/src/branch/main/DANNY_STOCKER_INFRAFABRIC_DOSSIER_DATA_DRIVEN_EDITION.md`
**SHA256 (sidecar):** `https://infrafabric.io/static/hosted/DANNY_STOCKER_INFRAFABRIC_DOSSIER_DATA_DRIVEN_EDITION.md.sha256`
**Verify:** `curl -fsSLO 'https://infrafabric.io/static/hosted/DANNY_STOCKER_INFRAFABRIC_DOSSIER_DATA_DRIVEN_EDITION.md' -fsSLO 'https://infrafabric.io/static/hosted/DANNY_STOCKER_INFRAFABRIC_DOSSIER_DATA_DRIVEN_EDITION.md.sha256' && sha256sum -c DANNY_STOCKER_INFRAFABRIC_DOSSIER_DATA_DRIVEN_EDITION.md.sha256`

---

## 1) Scope

This report covers:

- the IF.emotion evidence bundle format (tar.gz + manifest)
- measured latencies recorded in trace events (`auth_ms`, `rag_ms`, `llm_ms`) for a small sample of published bundles
- storage footprint of published bundles

This report does **not** claim:

- production scalability
- clinical validity
- the “truth” of model outputs (a bundle establishes only the provenance of what the system did)

---

## 2) Data Sources (Public, Verifiable)

All artifacts referenced here are publicly downloadable and hash-verifiable.

Static mirror (preferred): `https://infrafabric.io/static/hosted/`
Source repo: `https://git.infrafabric.io/danny/hosted`

### 2.1 Evidence bundles used in this report

| Trace | Bundle | SHA256 sidecar |
|---|---|---|
| `016cca78-6f9d-4ffe-aec0-99792d383ca1` | `https://infrafabric.io/static/hosted/emo_trace_payload_016cca78-6f9d-4ffe-aec0-99792d383ca1.tar.gz` | `https://infrafabric.io/static/hosted/emo_trace_payload_016cca78-6f9d-4ffe-aec0-99792d383ca1.tar.gz.sha256` |
| `0642c357-7f8d-4eb5-9643-1992e7ee14a9` | `https://infrafabric.io/static/hosted/emo_trace_payload_0642c357-7f8d-4eb5-9643-1992e7ee14a9.tar.gz` | `https://infrafabric.io/static/hosted/emo_trace_payload_0642c357-7f8d-4eb5-9643-1992e7ee14a9.tar.gz.sha256` |
| `09aad3e1-f420-451e-a189-e86f68073dc0` | `https://infrafabric.io/static/hosted/emo_trace_payload_09aad3e1-f420-451e-a189-e86f68073dc0.tar.gz` | `https://infrafabric.io/static/hosted/emo_trace_payload_09aad3e1-f420-451e-a189-e86f68073dc0.tar.gz.sha256` |
| `96700e8e-6a83-445e-86f7-06905c500146` | `https://infrafabric.io/static/hosted/emo_trace_payload_96700e8e-6a83-445e-86f7-06905c500146.tar.gz` | `https://infrafabric.io/static/hosted/emo_trace_payload_96700e8e-6a83-445e-86f7-06905c500146.tar.gz.sha256` |

### 2.2 Verification command (bundle transport integrity)

```bash
curl -fsSLO '<BUNDLE_URL>' -fsSLO '<BUNDLE_URL>.sha256' && sha256sum -c '<BUNDLE_FILENAME>.sha256'
```
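
For reviewers without `sha256sum`, the same transport check can be sketched in Python. This is a minimal illustration, not part of the published tooling: the function names are hypothetical, and it assumes the sidecar uses the usual `sha256sum` layout (hex digest first, then the filename).

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_sidecar(bundle: Path, sidecar: Path) -> bool:
    """Compare a bundle's digest with the first token of its sidecar.

    Assumption: the sidecar line looks like "<hex digest>  <filename>".
    """
    expected = sidecar.read_text().split()[0].lower()
    return sha256_of_file(bundle) == expected
```

Like the shell command, this only confirms that the downloaded bytes match the published digest; it says nothing about the contents of the bundle.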

### 2.3 Verifier tool

- Static: `https://infrafabric.io/static/hosted/iftrace.py`
- Repo: `https://git.infrafabric.io/danny/hosted/raw/branch/main/iftrace.py`

Run:

```bash
python3 iftrace.py verify '<BUNDLE_FILENAME>'
```

---

## 3) Measurement Method (How Numbers Are Obtained)

For each bundle:

1. extract `payload/trace_events.jsonl`
2. read per-event `event.data`:
   - `request_received.data.auth_ms`
   - `retrieval_done.data.rag_ms` (when present)
   - `model_done.data.llm_ms` (when present)
3. treat these values as **self-reported microlab timings** (they are not externally attested)

Key point: these numbers are not externally audited, but the bundle makes them *replayable* (anyone can re-extract them from the published file) and makes the presence or absence of each event *auditable*.
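
The extraction steps can be sketched as follows. This is an illustration under stated assumptions, not the code used to produce this report: it assumes each JSONL line is an object with a `data` payload and that the event name lives under a key called `type` (the report does not name that key, so it is a parameter here). Function names are hypothetical.

```python
import json
import tarfile

# Event name -> latency field, as listed in the method steps.
TIMING_FIELDS = {
    "request_received": "auth_ms",
    "retrieval_done": "rag_ms",
    "model_done": "llm_ms",
}


def timings_from_events(lines, type_key="type"):
    """Collect latency fields from an iterable of JSONL lines.

    `type_key` is an assumption: the key that holds the event name
    is not specified in this report.
    """
    timings = {}
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        event = json.loads(line)
        field = TIMING_FIELDS.get(event.get(type_key))
        data = event.get("data") or {}
        if field and field in data:
            timings[field] = data[field]
    return timings


def timings_from_bundle(bundle_path, member_suffix="payload/trace_events.jsonl"):
    """Read the trace-events member of a tar.gz bundle and extract timings."""
    with tarfile.open(bundle_path, "r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.endswith(member_suffix):
                raw = tar.extractfile(member).read().decode("utf-8")
                return timings_from_events(raw.splitlines())
    raise FileNotFoundError(member_suffix)
```

A field that is absent from the result (for example `rag_ms` on a short-circuited trace) is itself an observation: it means the corresponding event was not recorded.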

---

## 4) Architecture Boundary (Where Guarantees Begin)

```mermaid
flowchart TB
  U[User] -->|HTTPS| E[Edge]
  E --> B[Backend Witness Boundary]

  B --> R[Retrieval]
  B --> P[Prompt]
  B --> M[Model]
  B --> X[Postprocess]

  B --> T1["REQ_SEEN ledger<br/>(hourly JSONL)"]
  B --> T2["Trace events<br/>(hash chain JSONL)"]
  B --> T3["Signed summary<br/>(output hash + head attestation)"]

  T1 --> H["Signed Merkle head<br/>(per hour)"]
  T2 --> S["Trace head<br/>(event_hash)"]

  H --> BUNDLE["Evidence bundle<br/>(tar.gz + manifest)"]
  S --> BUNDLE
  T3 --> BUNDLE

  BUNDLE --> MIRROR["Static mirror<br/>(public download)"]
```

Interpretation: integrity guarantees begin at the backend witness boundary. Until edge witnessing exists, completeness claims are meaningful only at and after that boundary.
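
To make the "hash chain" and "trace head" boxes concrete, here is a generic sketch of how such a chain can be built and checked. Only the name `event_hash` comes from the diagram; the `prev_hash` and `body` fields, the all-zero `GENESIS` value, and the sorted-key JSON canonicalization are assumptions chosen for illustration. The actual rules are whatever `iftrace.py` implements.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical starting value for the chain


def event_hash(prev_hash: str, body: dict) -> str:
    """Hash an event body together with the previous event's hash."""
    # Assumed canonical form: sorted keys, no extra whitespace.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + canonical).encode("utf-8")).hexdigest()


def append_event(chain: list, body: dict) -> None:
    """Append an event that commits to the current head of the chain."""
    prev = chain[-1]["event_hash"] if chain else GENESIS
    chain.append(
        {"prev_hash": prev, "body": body, "event_hash": event_hash(prev, body)}
    )


def verify_chain(chain: list, expected_head: str) -> bool:
    """Recompute every link and compare the final hash with the attested head."""
    prev = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev:
            return False
        if entry["event_hash"] != event_hash(prev, entry["body"]):
            return False
        prev = entry["event_hash"]
    return prev == expected_head
```

In this sketch, editing any event body breaks the recomputed hash, and dropping events from the tail fails only because the head is compared against a separately attested value. That is why the head appears in the signed summary rather than only in the event log.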

---

## 5) Observed Sample Metrics (N=4 Bundles)

### 5.1 Bundle sizes (storage footprint)

| Trace | Outcome | Bundle size |
|---|---|---:|
| `016cca78-6f9d-4ffe-aec0-99792d383ca1` | full trace (retrieval + model) | 82,010 bytes |
| `0642c357-7f8d-4eb5-9643-1992e7ee14a9` | full trace (retrieval + model) | 5,515 bytes |
| `09aad3e1-f420-451e-a189-e86f68073dc0` | full trace (retrieval + model) | 71,817 bytes |
| `96700e8e-6a83-445e-86f7-06905c500146` | guard short-circuit (no retrieval/model) | 82,410 bytes |

Notes:

- N is small; treat these as indicative examples, not stable distributions.
- The short-circuit bundle is the largest in the sample (82,410 bytes), which indicates that “blocked paths” can still carry substantial evidence payloads, depending on which artifacts are included.

### 5.2 Latency fields recorded in trace events

| Trace | `auth_ms` | `rag_ms` | `llm_ms` | `retrieved_count` | Notes |
|---|---:|---:|---:|---:|---|
| `016cca78-6f9d-4ffe-aec0-99792d383ca1` | 3 | 1107 | 10550 | 1 | request → retrieval → model |
| `0642c357-7f8d-4eb5-9643-1992e7ee14a9` | 4 | 383 | 12287 | 2 | request → retrieval → model |
| `09aad3e1-f420-451e-a189-e86f68073dc0` | 4 | 1377 | 17879 | 2 | request → retrieval → model |
| `96700e8e-6a83-445e-86f7-06905c500146` | 5 | (n/a) | (n/a) | (n/a) | guard short-circuit reason: `self_harm_signal` |

Derived from the three full traces (the short-circuit trace records neither field; N=3):

- `rag_ms`: min 383, median 1107, max 1377
- `llm_ms`: min 10550, median 12287, max 17879
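
The summary figures can be reproduced from the table values with a few lines of Python. The values are copied from the table above; the helper name `summarize` is illustrative.

```python
from statistics import median

# Values from the three full traces in the latency table.
rag_ms = [1107, 383, 1377]
llm_ms = [10550, 12287, 17879]


def summarize(values):
    """Return min, median, and max for a small sample."""
    return {"min": min(values), "median": median(values), "max": max(values)}


print(summarize(rag_ms))  # {'min': 383, 'median': 1107, 'max': 1377}
print(summarize(llm_ms))  # {'min': 10550, 'median': 12287, 'max': 17879}
```

With N=3 the median is simply the middle observation, so these figures describe the sample and should not be read as percentile estimates.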

---

## 6) What This Sample Demonstrates

From the published bundles, an external reviewer can verify (cryptographically / structurally):

- the chain-of-custody wiring exists (hash chain + signed summary + inclusion proof)
- “short-circuit” decisions are still witnessed and included in the trace log (a critical property for dispute resolution)

An external reviewer cannot verify (from the bundle alone):

- the correctness of the output in the outside world
- the completeness of events *before* the backend witness boundary (edge drops / load balancer denials)
- the integrity of local keys (key custody, rotation, compromise response)

---

## 7) Engineering Roadmap (Metrics-Driven)

This roadmap is written as measurable deliverables.

### 7.1 Key management hardening

- Add explicit key separation: `IF_REQ_SEEN_HMAC_KEY` must not fall back to the signing secret.
- Document key rotation procedure and compromise response.
- Optional: HSM/TPM signing for production deployments.
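
One way to enforce the key-separation item is to fail at startup instead of falling back. The sketch below is a hypothetical illustration: `IF_REQ_SEEN_HMAC_KEY` is the name used in this report, while `IF_SIGNING_SECRET`, `KeyConfigError`, and `load_keys` are invented for the example.

```python
import os


class KeyConfigError(RuntimeError):
    """Raised when key material is missing or shared across purposes."""


def load_keys(env=os.environ):
    """Load both keys, refusing to fall back or to reuse one key for both roles.

    `IF_SIGNING_SECRET` is a hypothetical variable name used for illustration.
    """
    hmac_key = env.get("IF_REQ_SEEN_HMAC_KEY")
    signing_secret = env.get("IF_SIGNING_SECRET")
    if not hmac_key:
        raise KeyConfigError("IF_REQ_SEEN_HMAC_KEY is not set; refusing to fall back")
    if not signing_secret:
        raise KeyConfigError("IF_SIGNING_SECRET is not set")
    if hmac_key == signing_secret:
        raise KeyConfigError("HMAC key and signing secret must differ")
    return hmac_key.encode("utf-8"), signing_secret.encode("utf-8")
```

Failing closed turns a silent configuration mistake into a visible deployment error, which is also straightforward to cover with a unit test.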

### 7.2 Edge witnessing (completeness boundary expansion)

- Implement an edge request-attempt ledger (cryptographic, not just web logs).
- Publish a “completeness SLO” (e.g., signed head anchored every N minutes or every N requests).

### 7.3 Time and truncation defenses

- Add monotonic counters (per trace and per ledger hour).
- Anchor chain heads periodically to shorten the window in which tail truncation could go undetected.
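
A minimal sketch of what a monotonic-counter check could look like on the verifier side. Everything here is hypothetical: the `seq` field, the zero-based numbering, and the idea of an attested event count are assumptions about a feature that is planned, not shipped.

```python
def check_sequence(events, expected_count=None, seq_key="seq"):
    """Return a list of problems found in per-trace sequence numbers.

    Assumes each event carries a zero-based counter under `seq_key`
    and that an attested event count may be available separately.
    """
    problems = []
    for position, event in enumerate(events):
        if event.get(seq_key) != position:
            problems.append(
                f"position {position}: expected {seq_key}={position}, "
                f"got {event.get(seq_key)!r}"
            )
    if expected_count is not None and len(events) != expected_count:
        problems.append(f"expected {expected_count} events, found {len(events)}")
    return problems
```

A gap in the counter exposes a deletion in the middle of a trace. Deleting events from the tail leaves the counter consistent, so that case is only caught when the count (or head) has been attested somewhere the attacker cannot rewrite.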

### 7.4 Scale testing harness

- Build a replay harness that runs the verifier across 1000+ generated traces (including failures) and publishes summary metrics.
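
The shape of such a harness might look like the sketch below. It is an assumption-laden outline: it invokes the `verify` subcommand shown earlier in this report and assumes that exit code 0 means the bundle verified, which this report does not state. The function names and the summary format are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def run_verifier(bundle: Path, tool: str = "iftrace.py") -> bool:
    """Run `iftrace.py verify <bundle>`.

    Assumption: exit code 0 means the bundle verified.
    """
    result = subprocess.run(
        [sys.executable, tool, "verify", str(bundle)],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


def replay(bundles, verify=run_verifier):
    """Verify each bundle and return summary counts plus the failing paths."""
    passed = 0
    failures = []
    for bundle in bundles:
        if verify(bundle):
            passed += 1
        else:
            failures.append(str(bundle))
    return {
        "total": passed + len(failures),
        "passed": passed,
        "failed": len(failures),
        "failures": failures,
    }
```

Passing the verifier in as a parameter keeps the harness testable without real bundles, for example `replay(Path("bundles").glob("*.tar.gz"))` in a real run and a stub function in unit tests.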

---

## 8) Companion Documents

- Full dossier (uncut): `DANNY_STOCKER_INFRAFABRIC_DOSSIER.md`
- IF.emotion trace protocol (detailed, with walkthrough): `https://infrafabric.io/static/hosted/IF_EMOTION_DEBUGGING_TRACE_WHITEPAPER_v3.3_STYLED.md`
- Evidence bundles directory: `https://infrafabric.io/static/hosted/`