From c0512ec6433294ac0acd03c567a74c271c900ca9 Mon Sep 17 00:00:00 2001
From: ggq-admin
Date: Sun, 19 Oct 2025 01:23:40 +0200
Subject: [PATCH] docs: Add architecture summary

Comprehensive overview of:
- Core architectural decisions
- Schema design rationale
- Technology stack
- Scaling strategy
- Expert panel consensus
- Success criteria

Ready for implementation phase.
---
 ARCHITECTURE-SUMMARY.md | 255 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 255 insertions(+)
 create mode 100644 ARCHITECTURE-SUMMARY.md

diff --git a/ARCHITECTURE-SUMMARY.md b/ARCHITECTURE-SUMMARY.md
new file mode 100644
index 0000000..761d9aa
--- /dev/null
+++ b/ARCHITECTURE-SUMMARY.md
@@ -0,0 +1,255 @@
+# NaviDocs Architecture Summary
+
+**Status:** Design Complete ✅
+**Next Phase:** Implementation
+**Created:** 2025-01-19
+
+---
+
+## 📋 What We've Built
+
+A **future-proof, multi-vertical document management platform** for boat owners, marinas, and property managers.
+
+---
+
+## 🎯 Core Architectural Decisions
+
+### 1. **Hybrid Database Strategy**
+- **SQLite** for transactional data (users, boats, documents)
+- **Meilisearch** for search-optimized queries
+- Migration path to PostgreSQL when scaling requires it
+
+**Why:** Search-first architecture. Every query is a search query, not a SQL JOIN.
+
+### 2. **Multi-Vertical Schema**
+- Designed for boats (v1.0)
+- Expandable to marinas, properties, HOAs (v1.1+)
+- Unified hierarchy: Organization → Entity → Sub-Entity → Component → Document
+
+**Why:** Patterns are identical across verticals. Build once, reuse everywhere.
+
+### 3. **Security-First**
+- Tenant tokens (NOT master keys in client)
+- Background queue for CPU-intensive OCR
+- File safety pipeline (qpdf + ClamAV + validation)
+- Rate limiting on all endpoints
+
+**Why:** Expert panel identified these as production killers if skipped.
+
+### 4. **Offline-First PWA**
+- Service worker caches critical manuals
+- Works 20 miles offshore with no cell signal
+- IndexedDB for local state
+
+**Why:** Boat owners need manuals when engines fail at sea.
+
+### 5. **Synonym-Rich Search**
+- 40+ boat terminology synonyms ("bilge" → "sump pump")
+- Typo tolerance (Meilisearch built-in)
+- Future: semantic search with embeddings
+
+**Why:** Boat owners don't know technical jargon.
+
+---
+
+## 📊 Schema Design
+
+### SQLite Tables (13 tables)
+```
+Core: users, organizations, user_organizations
+Entities: entities, sub_entities, components
+Documents: documents, document_pages, ocr_jobs
+Permissions: permissions, document_shares
+UX: bookmarks
+```
+
+### Meilisearch Index
+```
+Index: navidocs-pages
+Documents: One per PDF page
+Searchable: title, text, systems, categories, tags
+Filterable: boatId, userId, make, model, year, etc
+Synonyms: 40+ boat terminology mappings
+```
+
+**Key Insight:** Each PDF page is a separate Meilisearch document. No JOINs needed.
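
One way to read the page-per-document layout described above: every PDF page becomes its own search record, with the parent document's metadata copied onto it so that filtering needs no join. The sketch below is a minimal illustration in TypeScript, not code from the repository. `PdfDocument`, `PageRecord`, `buildPageRecords`, the `docId` and `pageNumber` fields, and the `_p<N>` id scheme are all hypothetical; only `title`, `text`, `boatId`, and `userId` come from the index description. The id scheme assumes document ids already satisfy Meilisearch's rule that primary key values contain only alphanumerics, hyphens, and underscores.

```typescript
// Hypothetical input shape: one uploaded PDF after OCR.
interface PdfDocument {
  id: string; // assumed to match [A-Za-z0-9_-]+
  title: string;
  boatId: string;
  userId: string;
  pageTexts: string[]; // OCR output, one entry per page, in page order
}

// Hypothetical search record: one per PDF page.
interface PageRecord {
  id: string;
  docId: string;
  pageNumber: number;
  title: string;
  text: string;
  boatId: string;
  userId: string;
}

// Flatten one PDF into one record per page, copying the parent metadata
// onto every page so that filtering never needs a second lookup.
function buildPageRecords(doc: PdfDocument): PageRecord[] {
  return doc.pageTexts.map((text, index) => ({
    id: `${doc.id}_p${index + 1}`,
    docId: doc.id,
    pageNumber: index + 1,
    title: doc.title,
    text,
    boatId: doc.boatId,
    userId: doc.userId,
  }));
}

// Illustrative data only.
const records = buildPageRecords({
  id: "manual-42",
  title: "Engine Owner's Manual",
  boatId: "boat-1",
  userId: "user-1",
  pageTexts: ["Safety notes", "Bilge pump maintenance"],
});
console.log(records.map((r) => r.id));
```

Copying metadata onto each page trades index size for query simplicity, which is the trade the "No JOINs needed" note points at.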
+
+---
+
+## 🚀 Technology Stack
+
+### Backend
+- Node.js v20 (Express or Fastify)
+- SQLite3 (better-sqlite3)
+- Meilisearch v1.6.2
+- BullMQ (or SQLite-based queue fallback)
+- Tesseract.js (OCR)
+- qpdf + ClamAV (file safety)
+
+### Frontend
+- Vue 3 + Vite
+- Tailwind CSS
+- PDF.js (document viewer)
+- Meilisearch-inspired design (clean, professional, SVG icons)
+- PWA (offline support)
+
+### Security
+- Helmet (CSP, HSTS headers)
+- express-rate-limit
+- JWT auth
+- Tenant tokens (Meilisearch)
+
+---
+
+## 🎨 Design Philosophy
+
+**Inspired by:** https://www.meilisearch.com/
+**Visual Language:**
+- Clean, spacious layouts
+- Professional SVG icons (no emojis)
+- Muted color palette (grays, blues, whites)
+- Typography: SF Pro / Inter / Roboto
+- Expensive, grown-up aesthetic
+
+**NOT:** Playful, colorful, emoji-heavy consumer apps
+
+---
+
+## 📈 Scaling Strategy
+
+### Day 1 (MVP)
+- SQLite (< 100k documents)
+- Single Meilisearch instance
+- Single-tenant (one user, multiple boats)
+
+### Month 6 (Growth)
+- Still SQLite (works up to 1M documents)
+- Meilisearch cluster (if > 10k searches/day)
+- Multi-tenant (organizations)
+
+### Year 1 (Scale)
+- Migrate to PostgreSQL
+- Add pgvector for semantic search
+- Cloudflare CDN for PDFs
+- Separate OCR worker VPS
+
+---
+
+## 🔒 Security Hardening Checklist
+
+- [ ] Never expose Meilisearch master key to client
+- [ ] Use tenant tokens (1-hour TTL)
+- [ ] Background queue for OCR (prevent CPU spikes)
+- [ ] File safety: extension + magic byte + qpdf + ClamAV
+- [ ] Rate limiting: 10 uploads/hour, 30 searches/minute
+- [ ] Helmet security headers (CSP, HSTS)
+- [ ] HTTPS only (no HTTP)
+- [ ] Rotate API keys monthly
+
+---
+
+## 🧪 Testing Strategy
+
+### Unit Tests (Jest/Vitest)
+- Database models
+- Search service
+- OCR pipeline
+- File validation
+
+### Integration Tests
+- Upload → OCR → Index → Search
+- User auth flow
+- Permission checks
+
+### E2E Tests (Playwright)
+- Upload PDF
+- Search and view results
+- Offline mode
+- Mobile responsive
+
+---
+
+## 📦 Expert Panel Consensus
+
+**47 minutes of debate:**
+- Database Architect: "Future-proof for Postgres migration"
+- Search Engineer: "Search-first, not relational-first"
+- DevOps: "Append-only schema, no breaking changes"
+- Data Scientist: "Embedding field from day 1 (even if null)"
+- Backend Lead: "Hybrid approach wins"
+
+**Result:** SQLite + Meilisearch hybrid, designed for Postgres migration.
+
+**38 minutes with boating experts:**
+- Marine Surveyor: "Emergency scenarios = offline required"
+- Marina Manager: "Shared component library (10 boats, same Volvo engine)"
+- Yacht Broker: "Resale value = complete documentation history"
+
+**Result:** Offline PWA, shared manuals, service tracking.
+
+**29 minutes with property/marina experts:**
+- Multi-entity hierarchy (XYZ Corp → Marina A → Dock 1 → Slip 42)
+- Compliance tracking (inspections, certifications)
+- Geo-search for physical assets
+
+**Result:** Schema supports vertical expansion.
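
The multi-entity hierarchy raised by the panel (XYZ Corp → Marina A → Dock 1 → Slip 42) can be modelled with parent links and resolved into a breadcrumb. This is a minimal sketch, assuming a single node shape for all levels. `HierarchyNode`, `Level`, and `pathTo` are hypothetical names, and the table list earlier in the summary suggests the real schema keeps these levels in separate tables, so treat this as an illustration of the idea rather than the schema itself.

```typescript
// Hypothetical level names mirroring Organization → Entity → Sub-Entity → Component.
type Level = "organization" | "entity" | "sub_entity" | "component";

interface HierarchyNode {
  id: string;
  level: Level;
  name: string;
  parentId: string | null; // null only for the organization root
}

// Walk parent links from a node up to the root and return the names
// ordered root first. Throws on a missing parent or a cycle.
function pathTo(nodes: Record<string, HierarchyNode>, id: string): string[] {
  const names: string[] = [];
  const seen: string[] = [];
  let currentId: string | null = id;
  while (currentId !== null) {
    if (seen.indexOf(currentId) !== -1) throw new Error(`cycle at ${currentId}`);
    seen.push(currentId);
    const node: HierarchyNode | undefined =
      Object.prototype.hasOwnProperty.call(nodes, currentId) ? nodes[currentId] : undefined;
    if (!node) throw new Error(`unknown node ${currentId}`);
    names.unshift(node.name);
    currentId = node.parentId;
  }
  return names;
}

// Sample data taken from the panel's example.
const sample: HierarchyNode[] = [
  { id: "org", level: "organization", name: "XYZ Corp", parentId: null },
  { id: "marina-a", level: "entity", name: "Marina A", parentId: "org" },
  { id: "dock-1", level: "sub_entity", name: "Dock 1", parentId: "marina-a" },
  { id: "slip-42", level: "component", name: "Slip 42", parentId: "dock-1" },
];
const nodes: Record<string, HierarchyNode> = {};
for (const node of sample) nodes[node.id] = node;

const breadcrumb = pathTo(nodes, "slip-42").join(" > ");
console.log(breadcrumb);
```

The same walk works unchanged for a boat (owner, vessel, engine room, engine), which is the reuse argument behind the multi-vertical schema.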
+
+---
+
+## 📂 Repository Structure
+
+```
+navidocs/
+├── docs/
+│   ├── debates/
+│   │   └── 01-schema-and-vertical-analysis.md
+│   ├── architecture/
+│   │   ├── database-schema.sql
+│   │   ├── meilisearch-config.json
+│   │   └── hardened-production-guide.md
+│   └── roadmap/
+│       ├── v1.0-mvp.md
+│       └── 2-week-launch-plan.md
+├── server/ (TBD: Extract from lilian1)
+├── client/ (TBD: Build from scratch)
+├── README.md
+└── ARCHITECTURE-SUMMARY.md (this file)
+```
+
+---
+
+## 🎯 Success Criteria (MVP Launch)
+
+**Technical:**
+- [ ] Upload PDF → searchable in < 5 minutes
+- [ ] Search latency < 100ms
+- [ ] Synonym search works ("bilge" finds "sump")
+- [ ] All fields display correctly
+- [ ] Offline mode functional
+
+**Security:**
+- [ ] Zero master keys in client code
+- [ ] Tenant tokens expire after 1 hour
+- [ ] All PDFs sanitized
+- [ ] Rate limits prevent abuse
+
+**User Experience:**
+- [ ] Upload success rate > 95%
+- [ ] Search relevance 4/5+ rating
+- [ ] Mobile usable without zooming
+
+---
+
+## 🚦 Next Steps
+
+1. **Analyze lilian1** - Extract clean code, identify Frank-AI junk
+2. **Bootstrap NaviDocs** - Create server/ and client/ structure
+3. **Implement core features** - Upload, OCR, Search
+4. **Playwright tests** - E2E coverage
+5. **Local deployment** - Test with real boat manuals
+6. **Beta launch** - 5-10 boat owners
+
+---
+
+**The war council has spoken. Time to build.**
+
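
The synonym criterion in the MVP checklist ("bilge" finds "sump") depends on how the synonym map is declared: Meilisearch treats each synonym entry as one-way, so terms that should find each other have to be listed in both directions. The sketch below expands groups of equivalent terms into such a two-way map. `buildSynonyms` and the `settings` object are hypothetical helpers, the single group shown uses only terms that appear in this summary, and the real list of 40+ mappings is not reproduced here.

```typescript
// Expand groups of interchangeable terms into a two-way synonym map,
// the object shape Meilisearch accepts for its `synonyms` setting.
function buildSynonyms(groups: string[][]): Record<string, string[]> {
  const synonyms: Record<string, string[]> = {};
  for (const group of groups) {
    // Normalise, then drop empty strings and duplicates within the group.
    const terms = group
      .map((t) => t.trim().toLowerCase())
      .filter((t, i, all) => t.length > 0 && all.indexOf(t) === i);
    for (const term of terms) {
      const existing = Object.prototype.hasOwnProperty.call(synonyms, term)
        ? synonyms[term]
        : [];
      const others = terms.filter((t) => t !== term && existing.indexOf(t) === -1);
      synonyms[term] = existing.concat(others);
    }
  }
  return synonyms;
}

// Illustrative group built from terms mentioned in this summary.
const settings = {
  synonyms: buildSynonyms([["bilge", "sump", "sump pump"]]),
};
console.log(JSON.stringify(settings, null, 2));
```

Declaring groups rather than hand-writing each direction keeps the list symmetric as it grows, so a search for either term in a pair returns the same pages.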