# Indexes

This project builds small, fast indexes so the runtime can answer questions like:

* “Which rules mention *en dash*?”
* “Which rules cite *CMOS18 §6.88 p412*?”
* “Which rules apply to `postrender` QA?”
* “Which rules are overridden by the `print_pdf` profile?”

Indexes are derived artifacts (rebuildable) and should not be hand-edited.

## Indexes the app will build

### 1) keyword → rule IDs

**Purpose:** fast search/autocomplete and lint explanations.

* **Path:** `spec/indexes/keywords_all.json` and per-category deltas:
  * `spec/indexes/keywords_<category>.json`
* **Format (JSON):**
  * keys: normalized keyword (lowercased)
  * values: array of rule IDs in stable lexicographic order

Normalization (default):

* Unicode NFKC
* lowercase
* collapse whitespace
* strip surrounding punctuation
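The normalization steps above can be sketched as follows. This is a minimal illustration, not the project's implementation: the rule fields `id` and `keywords` are hypothetical, and "punctuation" is assumed to mean the Unicode `P*` general categories.

```python
import unicodedata


def normalize_keyword(raw: str) -> str:
    """Apply the default normalization steps in the order listed above."""
    text = unicodedata.normalize("NFKC", raw)
    text = text.lower()
    text = " ".join(text.split())  # collapse whitespace runs and trim the ends

    # Strip surrounding punctuation only; inner punctuation is preserved.
    # Assumption: "punctuation" means Unicode general categories starting with "P".
    start, end = 0, len(text)
    while start < end and unicodedata.category(text[start]).startswith("P"):
        start += 1
    while end > start and unicodedata.category(text[end - 1]).startswith("P"):
        end -= 1
    return text[start:end].strip()


def build_keyword_index(rules):
    """Map each normalized keyword to a lexicographically sorted list of rule IDs.

    `rules` is an iterable of dicts with hypothetical fields `id` and `keywords`.
    """
    index = {}
    for rule in rules:
        for keyword in rule.get("keywords", []):
            key = normalize_keyword(keyword)
            if key:  # skip keywords that normalize to the empty string
                index.setdefault(key, set()).add(rule["id"])
    return {key: sorted(ids) for key, ids in sorted(index.items())}
```

Because every keyword goes through the same function at build time and at query time, a search for `“En   Dash”` and one for `en dash` would resolve to the same key.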

### 2) source_ref → rule IDs

**Purpose:** audit trail back to references without embedding book text.

* **Path:** `spec/indexes/source_refs_all.json` and per-category deltas:
  * `spec/indexes/source_refs_<category>.json`
* **Format (JSON):**
  * keys: exact `source_ref` pointer strings
  * values: array of rule IDs
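As a rough sketch of how such an inverted index might be produced, the snippet below maps each pointer string to the rules that cite it. The rule fields `id` and `source_refs` are hypothetical, and sorting the keys and values is an assumption made here to keep the output reproducible.

```python
import json
from collections import defaultdict


def build_source_ref_index(rules):
    """Invert rule -> source_refs into source_ref -> rule IDs.

    Keys are the exact pointer strings; no normalization is applied.
    """
    index = defaultdict(set)
    for rule in rules:
        for ref in rule.get("source_refs", []):  # hypothetical field name
            index[ref].add(rule["id"])
    # Assumption: sort keys and values so repeated builds give identical output.
    return {ref: sorted(ids) for ref, ids in sorted(index.items())}


def dump_index(index) -> str:
    """Serialize with stable key order; keep non-ASCII such as '§' readable."""
    return json.dumps(index, ensure_ascii=False, indent=2, sort_keys=True) + "\n"
```

The index stores only pointer strings and rule IDs, so no book text ends up in the artifact.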

### 3) category → rule IDs

**Purpose:** batch reporting, extraction coverage, profile scoping.

* **Path:** `spec/indexes/category.json`
* **Format (JSON):**
  * keys: category name
  * values: array of rule IDs

### 4) enforcement → rule IDs

**Purpose:** quickly decide which engine (lint/typeset/postrender/manual) handles which rules.

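* **Path:** `spec/indexes/enforcement.json`
* **Format (JSON):**
  * keys: enforcement engine (`lint`, `typeset`, `postrender`, `manual`)
  * values: array of rule IDs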

### 5) profile overrides

**Purpose:** allow profiles to override severity or token parameters without editing rules.

* **Path:** `spec/indexes/profile_overrides.json`
* **Format (JSON):**
  * per profile: list of override objects (selector + action)
  * selectors may match category, tags, applies_to, or explicit rule IDs
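One possible reading of the selector + action model is sketched below. Everything about the object shape is an assumption for illustration: the keys `selector`, `action`, `rule_ids`, `category`, `tags`, `applies_to`, and `severity` are hypothetical, as is the choice that all criteria in a selector must hold and that later overrides win.

```python
def selector_matches(selector: dict, rule: dict) -> bool:
    """Return True when every criterion named in the selector is satisfied.

    Assumption: criteria combine with AND; list-valued criteria match on any overlap.
    An empty selector therefore matches every rule.
    """
    if "rule_ids" in selector and rule["id"] not in selector["rule_ids"]:
        return False
    if "category" in selector and rule.get("category") != selector["category"]:
        return False
    if "tags" in selector and not set(selector["tags"]) & set(rule.get("tags", [])):
        return False
    if "applies_to" in selector and not (
        set(selector["applies_to"]) & set(rule.get("applies_to", []))
    ):
        return False
    return True


def overridden_rule_ids(overrides: list, rules: list) -> list:
    """List the IDs of rules touched by at least one override of a profile."""
    hit = {
        rule["id"]
        for rule in rules
        for override in overrides
        if selector_matches(override["selector"], rule)
    }
    return sorted(hit)


def effective_severity(rule: dict, overrides: list):
    """Apply severity overrides in order; later entries win (assumption)."""
    severity = rule.get("severity")
    for override in overrides:
        if selector_matches(override["selector"], rule) and "severity" in override["action"]:
            severity = override["action"]["severity"]
    return severity
```

With a shape like this, the rule files stay untouched: a profile only contributes a list of overrides, and the runtime resolves the effective settings per rule.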

## Build guarantees

* Index builds are deterministic from:
  * `spec/rules/**.ndjson`
  * `spec/profiles/*.yaml`
  * `spec/manifest.yaml`

* The runtime must treat indexes as **cacheable**:
  * if an index is missing or outdated → rebuild it, or fall back to scanning the rule files.
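A minimal sketch of the cache check, under stated assumptions: staleness is detected by hashing the three input sets, the digest is stored in a hypothetical sidecar file `<name>.digest` next to the index, and `spec/rules/**.ndjson` is read as a recursive match (`rules/**/*.ndjson`). The `rebuild` callback stands in for whatever builder the project actually uses.

```python
import hashlib
import json
from pathlib import Path


def input_digest(spec_root: Path) -> str:
    """Hash the paths and contents of all build inputs in a fixed order."""
    paths = sorted(
        set(spec_root.glob("rules/**/*.ndjson"))  # assumed recursive reading of the glob
        | set(spec_root.glob("profiles/*.yaml"))
        | {spec_root / "manifest.yaml"}
    )
    hasher = hashlib.sha256()
    for path in paths:
        if path.is_file():
            hasher.update(path.relative_to(spec_root).as_posix().encode("utf-8"))
            hasher.update(b"\0")
            hasher.update(path.read_bytes())
            hasher.update(b"\0")
    return hasher.hexdigest()


def load_index(spec_root: Path, name: str, rebuild):
    """Return the cached index if it is current; otherwise rebuild and re-cache it."""
    index_path = spec_root / "indexes" / f"{name}.json"
    stamp_path = spec_root / "indexes" / f"{name}.digest"  # hypothetical sidecar file
    digest = input_digest(spec_root)

    if (
        index_path.is_file()
        and stamp_path.is_file()
        and stamp_path.read_text(encoding="utf-8").strip() == digest
    ):
        return json.loads(index_path.read_text(encoding="utf-8"))

    # Missing or outdated: rebuild from the rule files and refresh the cache.
    index = rebuild(spec_root)
    index_path.parent.mkdir(parents=True, exist_ok=True)
    index_path.write_text(
        json.dumps(index, ensure_ascii=False, indent=2, sort_keys=True) + "\n",
        encoding="utf-8",
    )
    stamp_path.write_text(digest + "\n", encoding="utf-8")
    return index
```

Hashing contents rather than comparing modification times keeps the check independent of checkout order and filesystem timestamps, at the cost of reading every input file on each load.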