navidocs/FIX_TOC.md
Danny Stocker 58b344aa31 FINAL: P0 blockers fixed + Joe Trader + ignore binaries
2025-11-13 01:29:59 +01:00


# CRITICAL FIX: TOC Extractor
**Problem:** The extractor returns only one corrupted entry. The code tries OCR first, which is broken, so the PDF outline path (which works) is never reached.
**Solution:** In `/home/setup/navidocs/server/services/toc-extractor.js`, around line 346, replace lines 346-390 with:
```javascript
// PRIORITY: Use PDF outline FIRST (Adobe approach)
const doc = db.prepare('SELECT file_path FROM documents WHERE id = ?').get(documentId);
if (doc?.file_path) {
  const outlineEntries = await extractPdfOutline(doc.file_path, documentId);
  if (outlineEntries?.length > 0) {
    db.prepare('DELETE FROM document_toc WHERE document_id = ?').run(documentId);
    const insert = db.prepare(
      'INSERT INTO document_toc (id, document_id, title, section_key, page_start, level, parent_id, order_index) VALUES (?, ?, ?, ?, ?, ?, ?, ?)'
    );
    for (const entry of outlineEntries) {
      insert.run(
        entry.id,
        documentId,
        entry.title,
        entry.sectionKey || null,
        entry.pageStart,
        entry.level,
        entry.parentId || null,
        entry.orderIndex
      );
    }
    return {
      success: true,
      entriesCount: outlineEntries.length,
      pages: [],
      message: `Extracted ${outlineEntries.length} entries from PDF outline`
    };
  }
}
// If no outline, fall back to OCR (existing code continues...)
```
Then restart the server and trigger a fresh extraction: `curl -X POST http://localhost:8001/api/documents/efb25a15-7d84-4bc3-b070-6bd7dec8d59a/toc/extract`
**Test URL:** http://172.29.75.55:8080/document/efb25a15-7d84-4bc3-b070-6bd7dec8d59a
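
**Sketch of the ordering:** The fix comes down to trying the reliable source first and only falling through when it yields nothing. The snippet below is a minimal, self-contained illustration of that ordering, not the actual NaviDocs code. `extractTocWithFallback` and the `{ name, run }` source shape are hypothetical names introduced here. In the real service the first source would wrap `extractPdfOutline` and the second would wrap the existing OCR path.

```javascript
// Hypothetical helper (not part of toc-extractor.js): tries each TOC source
// in priority order and returns the first non-empty result.
async function extractTocWithFallback(sources) {
  for (const { name, run } of sources) {
    let entries;
    try {
      entries = await run();
    } catch (err) {
      // A failing source must not block the sources after it.
      continue;
    }
    if (Array.isArray(entries) && entries.length > 0) {
      return { success: true, source: name, entriesCount: entries.length, entries };
    }
  }
  // Every source was empty or threw.
  return { success: false, source: null, entriesCount: 0, entries: [] };
}

// Usage with stub sources (assumed data, for illustration only).
const stubSources = [
  { name: 'pdf-outline', run: async () => [{ title: 'Engine' }, { title: 'Electrical' }] },
  { name: 'ocr', run: async () => [{ title: 'corrupted' }] }
];

extractTocWithFallback(stubSources).then((result) => {
  console.log(`${result.entriesCount} entries from ${result.source}`);
});
```

Because the outline source is listed first, the OCR stub is never called when the outline has entries, which is the behaviour the fix above is meant to restore.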