navidocs/intelligence/session-2/codebase-architecture-map.md
Claude d250dc334e
Session 2: Complete technical architecture from 11 Haiku agents
All 11 agents (S2-H01 through S2-H09 + S2-H03A + S2-H07A) have completed
their technical specifications:

- S2-H01: NaviDocs codebase architecture analysis
- S2-H02: Inventory tracking system (€15K-€50K value recovery)
- S2-H03: Maintenance log & reminder system
- S2-H04: Camera & Home Assistant integration
- S2-H05: Contact management system
- S2-H06: Accounting module & receipt OCR integration
- S2-H07: Impeccable search UX (Meilisearch facets)
- S2-H08: WhatsApp Business API + AI agent integration
- S2-H09: Document versioning with IF.TTT compliance
- S2-H03A: VAT/tax jurisdiction tracking & compliance
- S2-H07A: Multi-calendar system (4 calendar types)

Total: ~15,600 lines of technical specifications
Status: Ready for S2-H10 synthesis (awaiting Session 1 completion)
IF.bus: All inter-agent communications documented
2025-11-13 01:57:25 +00:00


# NaviDocs Codebase Architecture Map
**Analysis Date:** 2025-11-13
**Agent:** S2-H01
**Status:** Complete

---
## 1. Database Schema Summary
### Core Entities
The NaviDocs database uses SQLite (v3) with a schema designed for future PostgreSQL migration. All timestamps use Unix epoch (seconds).
#### User Management
```
- users (id: TEXT PRIMARY KEY)
- id: UUID
- email: TEXT UNIQUE
- password_hash: TEXT (bcrypt)
- name: TEXT
- status: TEXT (active, suspended, deleted)
- email_verified: BOOLEAN
- created_at, updated_at: INTEGER
- last_login_at: INTEGER
- failed_login_attempts, locked_until: Security fields
```
#### Organization Structure (Multi-tenant)
```
- organizations (id: TEXT PRIMARY KEY)
- id: UUID
- name: TEXT
- type: TEXT (personal, commercial, hoa)
- created_at, updated_at: INTEGER
- user_organizations (user_id + organization_id PRIMARY KEY)
- role: TEXT (admin, manager, member, viewer)
- joined_at: INTEGER
```
#### Entity Management (Boats, Marinas, Properties)
```
- entities (id: TEXT PRIMARY KEY)
- id: UUID
- organization_id: FK
- user_id: FK (primary owner)
- entity_type: TEXT (boat, marina, condo, yacht-club)
- name: TEXT
Boat-specific:
- make, model, year: TEXT/INTEGER
- hull_id: TEXT
- vessel_type: TEXT (powerboat, sailboat, catamaran, trawler)
- length_feet: INTEGER
Property-specific:
- property_type: TEXT
- address: TEXT
- gps_lat, gps_lon: REAL
- metadata: TEXT (JSON)
- created_at, updated_at: INTEGER
```
#### Hierarchical Component Structure
```
- sub_entities (id: TEXT PRIMARY KEY)
- id: UUID
- entity_id: FK
- name: TEXT (system, dock, unit, facility)
- type: TEXT
- metadata: TEXT (JSON)
- components (id: TEXT PRIMARY KEY)
- id: UUID
- sub_entity_id: FK (optional)
- entity_id: FK (direct link)
- name, manufacturer, model_number, serial_number: TEXT
- install_date, warranty_expires: INTEGER
- metadata: TEXT (JSON)
```
#### Document Management
```
- documents (id: TEXT PRIMARY KEY)
- id: UUID
- organization_id: FK
- entity_id, sub_entity_id, component_id: FK (hierarchical linking)
- uploaded_by: FK (user)
- title, document_type: TEXT
- file_path, file_name, file_size: TEXT/INTEGER
- file_hash: TEXT (SHA256 for deduplication)
- mime_type: TEXT (default: application/pdf)
- page_count: INTEGER
- language: TEXT (default: en)
- status: TEXT (processing, indexed, failed, archived, deleted)
- replaced_by: TEXT (document supersession)
- is_shared: BOOLEAN
- shared_component_id: TEXT (for shared manual library)
- metadata: TEXT (JSON)
- created_at, updated_at: INTEGER
- document_pages (id: TEXT PRIMARY KEY)
- id: UUID (page_<doc_id>_<page_num>)
- document_id: FK
- page_number: INTEGER
- ocr_text: TEXT
- ocr_confidence: REAL (0-1)
- ocr_language: TEXT (default: en)
- ocr_completed_at: INTEGER
- search_indexed_at: INTEGER
- meilisearch_id: TEXT
- section: TEXT (TOC section name)
- section_key: TEXT (normalized key)
- section_order: INTEGER
- metadata: TEXT (JSON - bounding boxes, etc)
- document_images (extracted from PDFs)
- id: UUID
- documentId: FK
- pageNumber: INTEGER
- imageIndex: INTEGER
- imagePath: TEXT
- imageFormat: TEXT (png, jpeg)
- width, height: INTEGER
- position: TEXT (JSON)
- extractedText: TEXT
- textConfidence: REAL
- anchorTextBefore, anchorTextAfter: TEXT
```
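
The `file_hash` and page ID conventions above can be illustrated with Node's built-in `crypto` module. This is a minimal sketch rather than the project's code: `hashFileBuffer`, `buildPageId`, and `findDuplicate` are hypothetical names, and the `Map` is an in-memory stand-in for a lookup against `idx_documents_hash`.

```javascript
const crypto = require('node:crypto');

// SHA-256 hex digest of the uploaded bytes (the kind of value held in documents.file_hash).
function hashFileBuffer(buffer) {
  return crypto.createHash('sha256').update(buffer).digest('hex');
}

// Page IDs follow the page_<doc_id>_<page_num> pattern noted in the schema.
function buildPageId(documentId, pageNumber) {
  return `page_${documentId}_${pageNumber}`;
}

// Hypothetical duplicate check: knownHashes maps file_hash -> document id.
function findDuplicate(knownHashes, buffer) {
  const hash = hashFileBuffer(buffer);
  return knownHashes.has(hash) ? knownHashes.get(hash) : null;
}
```

Hashing the raw bytes rather than the file name means a re-upload under a different name is still detected as the same document.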
#### Background Jobs
```
- ocr_jobs (id: TEXT PRIMARY KEY)
- id: UUID
- document_id: FK
- status: TEXT (pending, processing, completed, failed)
- progress: INTEGER (0-100%)
- error: TEXT
- started_at, completed_at: INTEGER
- created_at: INTEGER
```
#### Permissions & Sharing
```
- permissions (granular access control)
- id: UUID
- resource_type: TEXT (document, entity, organization)
- resource_id: FK
- user_id: FK
- permission: TEXT (read, write, share, delete, admin)
- granted_by, granted_at: FK + INTEGER
- expires_at: INTEGER (optional)
- entity_permissions (entity-level access)
- id: UUID
- user_id, entity_id: FK
- permission_level: TEXT (viewer, editor, manager, admin)
- granted_by, granted_at: FK + INTEGER
- expires_at: INTEGER
- document_shares (simplified document sharing)
- id: UUID
- document_id, shared_by, shared_with: FK
- permission: TEXT (read, write)
- created_at: INTEGER
- refresh_tokens (JWT session management)
- id: UUID
- user_id: FK
- token_hash: TEXT (SHA256)
- device_info, ip_address: TEXT
- expires_at: INTEGER
- revoked: BOOLEAN
- created_at, revoked_at: INTEGER
- password_reset_tokens
- id: UUID
- user_id: FK
- token_hash: TEXT (SHA256)
- expires_at: INTEGER
- used: BOOLEAN
- ip_address: TEXT
- used_at: INTEGER
```
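
Storing only a SHA256 `token_hash` means the raw refresh token never sits in the database. The sketch below shows one way that could work, assuming a row shaped like `refresh_tokens`; `issueRefreshToken` and `isRefreshTokenValid` are hypothetical helpers, not functions confirmed in `auth.service.js`.

```javascript
const crypto = require('node:crypto');

function sha256(value) {
  return crypto.createHash('sha256').update(value).digest();
}

// Returns the raw token (sent to the client once) and the hash to persist.
function issueRefreshToken() {
  const token = crypto.randomBytes(32).toString('hex');
  return { token, tokenHash: sha256(token).toString('hex') };
}

// row mirrors the refresh_tokens columns: token_hash, revoked, expires_at (epoch seconds).
function isRefreshTokenValid(presentedToken, row, nowSeconds) {
  if (row.revoked || row.expires_at <= nowSeconds) return false;
  const presented = sha256(presentedToken);
  const stored = Buffer.from(row.token_hash, 'hex');
  // Constant-time comparison; lengths must match before timingSafeEqual is called.
  return stored.length === presented.length && crypto.timingSafeEqual(presented, stored);
}
```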
#### User Preferences
```
- bookmarks (quick access)
- id: UUID
- user_id, document_id: FK
- page_id: FK (optional - specific page)
- label: TEXT
- quick_access: BOOLEAN (pin to homepage)
- created_at: INTEGER
```
#### Audit Trail (Optional)
```
- audit_events (not shown in schema but referenced in code)
- Logs all significant operations for compliance
- user_id, event_type, resource_type, resource_id
- status, ip_address, user_agent, metadata
```
#### Settings/Configuration
```
- settings (key-value store)
- key: TEXT PRIMARY KEY
- value: TEXT (JSON)
- description: TEXT
- category: TEXT
```
### Key Indexes
- `idx_entities_org`, `idx_entities_user`, `idx_entities_type`
- `idx_documents_org`, `idx_documents_entity`, `idx_documents_status`, `idx_documents_hash`, `idx_documents_shared`
- `idx_pages_document`, `idx_pages_indexed`
- `idx_jobs_status`, `idx_jobs_document`
- `idx_permissions_user`, `idx_permissions_resource`
- `idx_bookmarks_user`
---
## 2. API Endpoints (Grouped by Feature)
### Authentication Endpoints (`/api/auth`)
**File:** `server/routes/auth.routes.js`
```
POST /api/auth/register
- Input: email, password, name
- Output: userId, email, verificationToken
- Logging: audit.service logs user.register
POST /api/auth/login
- Input: email, password, deviceInfo, ipAddress
- Output: accessToken (JWT), refreshToken, user object
- Auth: None (initial login)
- Side Effects: Updates failed_login_attempts, triggers account lock after 5 failures
POST /api/auth/refresh
- Input: refreshToken
- Output: new accessToken, user object
- Auth: None (token-based)
POST /api/auth/logout
- Input: refreshToken
- Output: success message
- Side Effects: Revokes refresh token
POST /api/auth/logout-all
- Input: None (uses JWT)
- Output: success message
- Side Effects: Revokes all user tokens
- Auth: JWT required
POST /api/auth/password/reset-request
- Input: email
- Output: generic success (doesn't reveal email exists)
- Side Effects: Creates password_reset_tokens entry
POST /api/auth/password/reset
- Input: token, newPassword
- Output: success message
- Side Effects: Updates password, revokes all refresh tokens
POST /api/auth/email/verify
- Input: token
- Output: email, success message
- Side Effects: Sets email_verified = 1
GET /api/auth/me
- Input: None (JWT)
- Output: user object (id, email, name, status, emailVerified, createdAt, lastLoginAt)
- Auth: JWT required
```
### Organization Management (`/api/organizations`)
**File:** `server/routes/organization.routes.js`
```
POST /api/organizations
- Input: name, type (optional), metadata (optional)
- Output: organization object
- Auth: JWT required
GET /api/organizations
- Input: None
- Output: Array of user's organizations with role
- Auth: JWT required
GET /api/organizations/:organizationId
- Input: organizationId in params
- Output: organization details with userRole
- Auth: JWT + requireOrganizationMember
PUT /api/organizations/:organizationId
- Input: name, type, metadata
- Output: updated organization
- Auth: JWT + requireOrganizationRole('manager')
DELETE /api/organizations/:organizationId
- Input: organizationId
- Output: success message with deleted count
- Auth: JWT + requireOrganizationRole('admin')
GET /api/organizations/:organizationId/members
- Input: organizationId
- Output: Array of members with roles
- Auth: JWT + requireOrganizationMember
POST /api/organizations/:organizationId/members
- Input: userId, role (optional)
- Output: success message
- Auth: JWT + requireOrganizationRole('manager')
- Side Effects: Adds or updates user role
DELETE /api/organizations/:organizationId/members/:userId
- Input: organizationId, userId
- Output: success message with removed role
- Auth: JWT + requireOrganizationRole('manager')
GET /api/organizations/:organizationId/stats
- Input: organizationId
- Output: organization statistics (document count, member count, etc)
- Auth: JWT + requireOrganizationMember
```
### Permission Management (`/api/permissions`)
**File:** `server/routes/permission.routes.js` (referenced but not fully reviewed)
```
Expected endpoints:
- POST /api/permissions/grant (grant permission to user)
- DELETE /api/permissions/revoke (revoke permission)
- GET /api/permissions/check (check permission)
```
### Document Management (`/api/documents`)
**File:** `server/routes/documents.js`
```
POST /api/upload
- Input: file (PDF), title, documentType, organizationId, entityId (optional), componentId (optional), subEntityId (optional)
- Output: jobId, documentId, message
- Auth: None (TODO: should be JWT)
- Side Effects:
* Validates file safety (file-safety.service)
* Generates SHA256 hash for deduplication
* Creates documents and ocr_jobs records
* Adds OCR job to BullMQ queue
GET /api/documents
- Input: organizationId, entityId, documentType, status, limit, offset (query params)
- Output: { documents: [], pagination: { total, limit, offset, hasMore } }
- Auth: None (TODO: should verify organization membership)
GET /api/documents/:id
- Input: documentId in params
- Output: Full document metadata + pages array + entity + component info
- Auth: Checks organization membership, document ownership, or share access
- Side Effects: Parses metadata JSON
GET /api/documents/:id/pdf
- Input: documentId
- Output: PDF file stream (inline)
- Auth: Same as GET /api/documents/:id
- Security: Path traversal protection
DELETE /api/documents/:id
- Input: documentId
- Output: success message with document title
- Auth: None (TODO: should verify ownership)
- Side Effects:
* Deletes from Meilisearch index
* Deletes from database (CASCADE deletes document_pages, ocr_jobs)
* Deletes file from filesystem
```
### Upload Routes (`/api/upload`)
**File:** `server/routes/upload.js`
```
POST /api/upload (same endpoint as listed under Document Management; implemented in this dedicated route file)
- Multer configuration: 50MB limit, memory storage
- Creates document in processing state
- Queues OCR job via queue.service
```
### Quick OCR Route (`/api/upload/quick-ocr`)
**File:** `server/routes/quick-ocr.js` (referenced but not fully reviewed)
```
Expected endpoint:
- POST /api/upload/quick-ocr (rapid OCR without document creation)
```
### Job Management (`/api/jobs`)
**File:** `server/routes/jobs.js`
```
GET /api/jobs/:id
- Input: jobId
- Output: { jobId, documentId, status, progress, error, startedAt, completedAt, createdAt, document? }
- Auth: None (TODO)
- Status values: pending, processing, completed, failed
- Document info included only if status === completed
GET /api/jobs
- Input: status (optional), limit (default 50), offset (default 0)
- Output: { jobs: [], pagination: { limit, offset } }
- Auth: Filters to current user's jobs
- Status filtering: Only allows pending|processing|completed|failed
```
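
The query handling described for `GET /api/jobs` (status allowlist, `limit` default 50, `offset` default 0) could be sketched as follows. `parseJobListQuery` is a hypothetical helper, and silently ignoring an unknown status instead of returning an error is an assumption.

```javascript
const JOB_STATUSES = new Set(['pending', 'processing', 'completed', 'failed']);

// Parses a query-string value into a non-negative integer, or returns the fallback.
function toNonNegativeInt(value, fallback) {
  const n = Number.parseInt(value, 10);
  return Number.isInteger(n) && n >= 0 ? n : fallback;
}

function parseJobListQuery(query) {
  return {
    // Only the four documented status values pass through.
    status: JOB_STATUSES.has(query.status) ? query.status : undefined,
    limit: toNonNegativeInt(query.limit, 50) || 50, // a limit of 0 falls back to the default
    offset: toNonNegativeInt(query.offset, 0),
  };
}
```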
### Search (`/api/search`)
**File:** `server/routes/search.js`
```
POST /api/search/token
- Input: expiresIn (seconds, default 3600, max 86400)
- Output: { token, expiresAt, indexName, searchUrl, mode }
- Auth: JWT (gets user's organizations)
- Modes: 'tenant' (preferred) or 'search-key' (fallback)
- Side Effects: Generates Meilisearch tenant token with organization filters
POST /api/search
- Input: q (query string), filters? (documentType, entityId, language), limit, offset
- Output: { hits, estimatedTotalHits, query, processingTimeMs, limit, offset }
- Auth: JWT
- Meilisearch filters: userId or organizationId membership
- Additional filters: documentType, entityId, language
GET /api/search/health
- Input: None
- Output: { status, meilisearch: <health_response> }
- Auth: None
```
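
The organization scoping and optional filters above map naturally onto a Meilisearch filter expression. The following is a sketch under assumptions: `clampExpiresIn` and `buildSearchFilter` are hypothetical names, only the organization branch of the "userId or organizationId" rule is shown, and the exact expression used in `server/routes/search.js` was not reviewed.

```javascript
const DEFAULT_EXPIRES_IN = 3600; // seconds
const MAX_EXPIRES_IN = 86400; // seconds (24 hours)

// Applies the documented default and maximum for tenant token lifetime.
function clampExpiresIn(expiresIn) {
  const n = Number(expiresIn);
  return Number.isFinite(n) && n > 0 ? Math.min(Math.floor(n), MAX_EXPIRES_IN) : DEFAULT_EXPIRES_IN;
}

// Quotes a value for a filter expression, escaping backslashes and double quotes.
function quote(value) {
  return `"${String(value).replace(/\\/g, '\\\\').replace(/"/g, '\\"')}"`;
}

function buildSearchFilter(organizationIds, filters = {}) {
  if (organizationIds.length === 0) throw new Error('User belongs to no organizations');
  const clauses = [`organizationId IN [${organizationIds.map(quote).join(', ')}]`];
  for (const key of ['documentType', 'entityId', 'language']) {
    if (filters[key]) clauses.push(`${key} = ${quote(filters[key])}`);
  }
  return clauses.join(' AND ');
}
```

Building the organization clause on the server, rather than trusting a client-supplied filter, keeps tenants from widening their own search scope.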
### Image Management (`/api/images`)
**File:** `server/routes/images.js`
```
GET /api/documents/:id/images
- Input: documentId
- Output: { documentId, imageCount, images: [{ id, pageNumber, imageIndex, format, width, height, position, extractedText, confidence, imageUrl }] }
- Auth: Verifies document access
- Side Effects: Parses position JSON
GET /api/documents/:id/pages/:pageNum/images
- Input: documentId, pageNumber
- Output: { documentId, pageNumber, imageCount, images: [] }
- Auth: Verifies document and page exist
- Validation: pageNumber must be >= 1
GET /api/images/:imageId
- Input: imageId (img_<uuid>_p<page>_<index>_<timestamp> or UUID)
- Output: Image file stream (PNG or JPEG)
- Auth: Verifies document access
- Rate Limiting: 200 requests per minute (more permissive than API)
- Security: Path traversal prevention (normalizes path, checks within /uploads)
```
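
The path traversal prevention noted for image and PDF streaming (normalize the path, then check it stays within the uploads directory) can be sketched with Node's `path` module. `resolveInsideUploads` is a hypothetical helper; the actual check in `server/routes/images.js` may differ.

```javascript
const path = require('node:path');

// Returns the absolute path if it stays strictly inside uploadRoot, otherwise null.
function resolveInsideUploads(uploadRoot, requestedPath) {
  const root = path.resolve(uploadRoot);
  const target = path.resolve(root, requestedPath);
  const relative = path.relative(root, target);
  const escapes =
    relative === '' || // the root itself is not a file
    relative === '..' ||
    relative.startsWith('..' + path.sep) ||
    path.isAbsolute(relative); // e.g. a different drive on Windows
  return escapes ? null : target;
}
```

Comparing via `path.relative` avoids the classic prefix bug where `/uploads-evil` passes a naive `startsWith('/uploads')` test.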
### Table of Contents (`/api/documents/:documentId/toc`)
**File:** `server/routes/toc.js`
```
GET /api/documents/:documentId/toc
- Input: documentId, format? (flat|tree, default flat)
- Output: { entries: [], format, count }
- Auth: None (TODO)
- Caching: LRU cache (200 max, 30 min TTL)
- Side Effects: Builds tree structure if format=tree
POST /api/documents/:documentId/toc/extract
- Input: documentId
- Output: { success, entriesCount, tocPages: [], message }
- Auth: None (TODO)
- Side Effects:
* Calls extractTocFromDocument (section-extractor.service)
* Invalidates LRU cache entries
```
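
The TOC caching behavior (200 entries maximum, 30 minute TTL, invalidation after extraction) can be illustrated with a small `Map`-based cache. This is a dependency-free stand-in for illustration only; the project lists the `lru-cache` package in its stack, and the `documentId:format` key scheme here is an assumption.

```javascript
class TtlLruCache {
  constructor({ max, ttlMs, now = Date.now }) {
    this.max = max;
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, useful for tests
    this.map = new Map(); // Map preserves insertion order: oldest entry first
  }

  get(key) {
    const entry = this.map.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.map.delete(key);
      return undefined;
    }
    // Re-insert to mark the entry as most recently used.
    this.map.delete(key);
    this.map.set(key, entry);
    return entry.value;
  }

  set(key, value) {
    this.map.delete(key);
    this.map.set(key, { value, expiresAt: this.now() + this.ttlMs });
    if (this.map.size > this.max) this.map.delete(this.map.keys().next().value);
  }

  delete(key) {
    return this.map.delete(key);
  }
}

const tocCache = new TtlLruCache({ max: 200, ttlMs: 30 * 60 * 1000 });

// After a TOC re-extraction, drop both cached representations of the document.
function invalidateToc(cache, documentId) {
  for (const format of ['flat', 'tree']) cache.delete(`${documentId}:${format}`);
}
```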
### Statistics (`/api/stats`)
**File:** `server/routes/stats.js` (referenced but not fully reviewed)
```
Expected endpoints:
- GET /api/stats/organization/:organizationId
- GET /api/stats/documents
- GET /api/stats/search
```
### Settings (`/api/admin/settings`)
**File:** `server/routes/settings.routes.js` (referenced but not fully reviewed)
```
Expected endpoints:
- GET /api/admin/settings (get all settings)
- PUT /api/admin/settings/:key (update setting)
- GET /api/settings/public/app (public app settings - no auth)
```
### Health Check
```
GET /health
- Output: { status, timestamp, uptime }
- Auth: None
```
---
## 3. Service Layer Architecture
### Authentication Service
**File:** `server/services/auth.service.js`
**Key Functions:**
- `register(email, password, name)` - User registration with bcrypt hashing (12 rounds)
- `login(email, password, deviceInfo, ipAddress)` - JWT + refresh token generation
- `refreshAccessToken(refreshToken)` - Generate new JWT from refresh token
- `revokeRefreshToken(refreshToken)` - Revoke single token (logout)
- `revokeAllUserTokens(userId)` - Logout all devices
- `requestPasswordReset(email, ipAddress)` - Generate reset token
- `resetPassword(token, newPassword)` - Validate token and update password
- `verifyEmail(token)` - Mark email as verified
- `getUserById(userId)` - Fetch user details
- `verifyAccessToken(token)` - Validate JWT
**Token Management:**
- JWT Access Token: `expiresIn` from env (default 15m)
- Refresh Token: 7 days in seconds (604800)
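- Refresh tokens are stored as hashes rather than in plaintext; this analysis noted bcrypt here, while the `refresh_tokens.token_hash` column is documented as SHA256, so the scheme should be confirmed against `auth.service.js`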
- JWT Secret: `process.env.JWT_SECRET` (must change in production)
**Security Features:**
- Password minimum 8 characters
- Account lockout after 5 failed login attempts (15 min lock)
- Refresh token revocation on password reset
- Email verification token support
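
The lockout rule (5 failed attempts, 15 minute lock) can be expressed as pure functions over the `users` security fields. This is a sketch: `isLocked`, `recordFailedLogin`, and `recordSuccessfulLogin` are hypothetical names, and resetting the counter on success is an assumption.

```javascript
const MAX_FAILED_ATTEMPTS = 5;
const LOCK_SECONDS = 15 * 60;

// user carries failed_login_attempts and locked_until (epoch seconds or null).
function isLocked(user, nowSeconds) {
  return user.locked_until != null && user.locked_until > nowSeconds;
}

function recordFailedLogin(user, nowSeconds) {
  const attempts = user.failed_login_attempts + 1;
  return {
    ...user,
    failed_login_attempts: attempts,
    locked_until: attempts >= MAX_FAILED_ATTEMPTS ? nowSeconds + LOCK_SECONDS : user.locked_until,
  };
}

// Assumption: a successful login clears the counter and any lock.
function recordSuccessfulLogin(user, nowSeconds) {
  return { ...user, failed_login_attempts: 0, locked_until: null, last_login_at: nowSeconds };
}
```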
### Authorization Service
**File:** `server/services/authorization.service.js`
**Key Functions:**
- `grantEntityPermission(userId, entityId, permissionLevel, grantedBy, expiresAt)` - Grant entity access
- `revokeEntityPermission(userId, entityId, revokedBy)` - Revoke entity access
- `checkEntityPermission(userId, entityId, minimumPermission)` - Check if user has permission
- `getUserEntityPermissions(userId, options)` - Get all user's entity permissions
- `getEntityPermissions(entityId, options)` - Get all entity's permissions
- `addOrganizationMember(userId, organizationId, role, addedBy)` - Add to organization
- `removeOrganizationMember(userId, organizationId, removedBy)` - Remove from organization
- `checkOrganizationMembership(userId, organizationId, minimumRole)` - Check membership
- `getOrganizationMembers(organizationId)` - List org members
- `getUserOrganizations(userId)` - Get user's organizations
- `cleanupExpiredPermissions()` - Cleanup task
**Permission Hierarchy:**
```
Entity Permissions: viewer (0) < editor (1) < manager (2) < admin (3)
Organization Roles: viewer (0) < member (1) < manager (2) < admin (3)
```
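
A minimum-permission check over these hierarchies reduces to comparing ranks. The sketch below assumes the numeric ranks shown above; `meetsMinimum` and `isGrantActive` are hypothetical helpers, not the signatures of `checkEntityPermission` or `checkOrganizationMembership`.

```javascript
const ENTITY_LEVELS = Object.freeze({ viewer: 0, editor: 1, manager: 2, admin: 3 });
const ORGANIZATION_ROLES = Object.freeze({ viewer: 0, member: 1, manager: 2, admin: 3 });

// True when `actual` ranks at or above `minimum`; unknown names are denied.
function meetsMinimum(ranks, actual, minimum) {
  if (!Object.hasOwn(ranks, actual) || !Object.hasOwn(ranks, minimum)) return false;
  return ranks[actual] >= ranks[minimum];
}

// expires_at is optional; a missing value means the grant does not expire.
function isGrantActive(grant, nowSeconds) {
  return grant.expires_at == null || grant.expires_at > nowSeconds;
}
```

For example, `meetsMinimum(ORGANIZATION_ROLES, 'member', 'manager')` is false, which matches the `requireOrganizationRole('manager')` guard rejecting ordinary members.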
**Audit Integration:**
- All permission grants/revokes logged via `logAuditEvent()`
### Organization Service
**File:** `server/services/organization.service.js` (referenced but not fully reviewed)
**Expected Functions:**
- `createOrganization(name, type, metadata, createdBy)`
- `updateOrganization(organizationId, name, type, metadata, updatedBy)`
- `deleteOrganization(organizationId, deletedBy)`
- `getOrganizationById(organizationId)`
- `getOrganizationStats(organizationId)`
### Search Service (Meilisearch Integration)
**File:** `server/services/search.js`
**Key Functions:**
- `indexDocumentPage(pageId, documentId, pageNumber, text, confidence)` - Index page in Meilisearch
- `generateTenantToken(userId, organizationIds, expiresIn)` - Generate tenant-scoped token
**Meilisearch Index:**
- Index name: `navidocs-pages` (env configurable)
- Searchable attributes: ocr text, metadata
- Filtering: organizationId, userId, documentType, entityId, language
- Document structure:
```
{
id: string (unique page ID),
docId: string (document UUID),
pageNumber: integer,
organizationId: string,
userId: string,
documentType: string,
text: string (OCR content),
language: string,
ocrConfidence: number,
createdAt: integer,
updatedAt: integer
}
```
**Tenant Token Support:**
- Scoped search to user's organizations
- Expiration support (max 24 hours)
- Fallback to search API key if tenant token fails
### Queue Service (BullMQ)
**File:** `server/services/queue.js`
**Key Functions:**
- `getOcrQueue()` - Get singleton queue instance
- `addOcrJob(documentId, jobId, data)` - Add OCR job to queue
- `getJobStatus(jobId)` - Get BullMQ job status
- `closeQueue()` - Graceful shutdown
**Queue Configuration:**
- Redis connection: `REDIS_HOST` (default 127.0.0.1), `REDIS_PORT` (default 6379)
- Queue name: `ocr-processing`
- Job retry: 3 attempts with exponential backoff (2s base)
- Cleanup: Complete jobs kept 24h, failed jobs kept 7 days
- Job options: priority support
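
The retry and cleanup settings above correspond to a job-options object of roughly this shape. This is a sketch that does not import BullMQ: the option names (`attempts`, `backoff`, `removeOnComplete`, `removeOnFail`) are BullMQ job options, but the values are reconstructed from this summary rather than copied from `queue.js`, and `retryDelayMs` assumes a plain doubling schedule.

```javascript
const ocrJobOptions = {
  attempts: 3,
  backoff: { type: 'exponential', delay: 2000 }, // 2s base delay
  removeOnComplete: { age: 24 * 60 * 60 }, // keep completed jobs for 24h (seconds)
  removeOnFail: { age: 7 * 24 * 60 * 60 }, // keep failed jobs for 7 days (seconds)
};

// Assumed doubling schedule: the delay before the retry that follows failed attempt N.
function retryDelayMs(attemptsMade, baseDelayMs = ocrJobOptions.backoff.delay) {
  return baseDelayMs * 2 ** (attemptsMade - 1);
}
```

Under that assumption, three attempts mean at most two waits, of about 2s and 4s, before the job is marked failed.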
**Job Data Structure:**
```
{
documentId: string,
jobId: string,
filePath: string,
fileName: string,
organizationId: string,
userId: string,
priority: number (optional)
}
```
### OCR Service
**File:** `server/services/ocr.js` (referenced)
**Expected Functions:**
- `extractTextFromImage(imagePath, language)` - Tesseract.js OCR on images
- `cleanOCRText(text)` - Clean and normalize OCR output
### OCR Hybrid Service
**File:** `server/services/ocr-hybrid.js` (referenced)
**Expected Functions:**
- `extractTextFromPDF(filePath, options)` - Extract text from PDF with progress callback
- Returns: `[{ pageNumber, text, confidence, error }]`
### OCR Google Vision Service
**File:** `server/services/ocr-google-vision.js` (referenced)
**Expected Functions:**
- Alternative OCR provider (Google Cloud Vision)
### OCR Client Service
**File:** `server/services/ocr-client.js` (referenced)
**Expected Functions:**
- Client-side OCR coordination
### Section Extractor Service
**File:** `server/services/section-extractor.js` (referenced)
**Expected Functions:**
- `extractSections(filePath, ocrResults)` - Extract document sections/headings
- `mapPagesToSections(sections, totalPages)` - Map pages to TOC sections
### TOC Extractor Service
**File:** `server/services/toc-extractor.js` (referenced)
**Expected Functions:**
- `getDocumentToc(documentId)` - Fetch TOC from database
- `buildTocTree(entries)` - Build hierarchical tree from flat list
- `extractTocFromDocument(documentId)` - Extract TOC from PDF
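
Building a hierarchical tree from a flat TOC list is commonly done with a stack of open ancestors. The sketch below shows that technique under an assumed entry shape (`{ title, level, pageNumber }` in document order); the real `buildTocTree` was not reviewed, so `buildTocTreeSketch` is a hypothetical name.

```javascript
function buildTocTreeSketch(entries) {
  const roots = [];
  const stack = []; // nodes on the path from a root to the most recent entry

  for (const entry of entries) {
    const node = { ...entry, children: [] };
    // Close every open node that is at the same depth or deeper.
    while (stack.length > 0 && stack[stack.length - 1].level >= node.level) stack.pop();
    if (stack.length === 0) roots.push(node);
    else stack[stack.length - 1].children.push(node);
    stack.push(node);
  }
  return roots;
}
```

Each entry is pushed and popped at most once, so the conversion is linear in the number of TOC entries.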
### Audit Service
**File:** `server/services/audit.service.js` (referenced)
**Expected Functions:**
- `logAuditEvent(userId, eventType, status, ipAddress, userAgent, metadata, resourceType, resourceId)`
- Logs all security-relevant actions
### Settings Service
**File:** `server/services/settings.service.js` (referenced)
**Expected Functions:**
- `getSetting(key)` - Get setting by key
- `setSetting(key, value)` - Set/update setting
- `getAllSettings()` - Get all settings
### File Safety Service
**File:** `server/services/file-safety.js`
**Expected Functions:**
- `validateFile(file)` - Validate file type, size, etc.
- `sanitizeFilename(filename)` - Remove dangerous characters
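
Since this service's functions are listed as expected rather than reviewed, the following is only a sketch of what filename sanitizing and a basic PDF check might look like. `sanitizeFilenameSketch` and `looksLikePdf` are hypothetical, the 200-character cap is an assumption, and the magic-byte test is a simplified stand-in for the `file-type` package listed in the stack.

```javascript
// Keeps only the final path segment and replaces anything outside a safe character set.
function sanitizeFilenameSketch(filename) {
  const base = String(filename).split(/[\\/]/).pop();
  const cleaned = base.replace(/[^A-Za-z0-9._-]/g, '_').replace(/^\.+/, '');
  return cleaned.slice(0, 200) || 'file';
}

// PDF files begin with the ASCII signature "%PDF-".
function looksLikePdf(buffer) {
  return buffer.length >= 5 && buffer.subarray(0, 5).toString('latin1') === '%PDF-';
}
```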
---
## 4. Background Job Patterns (BullMQ Usage)
### OCR Worker
**File:** `server/workers/ocr-worker.js`
**Job Processing Pipeline:**
1. **Job Initialization**
- Receives `{ documentId, jobId, filePath, fileName, organizationId, userId, priority }`
- Updates ocr_jobs: status = 'processing', progress = 0, started_at = now
2. **PDF Text Extraction** (60-70% of job)
- Calls `extractTextFromPDF()` with progress callback
- Returns: `[{ pageNumber, text, confidence, error }]`
- Concurrency: 2 documents at a time (env: OCR_CONCURRENCY)
- Limiter: 5 jobs per minute (prevents Tesseract overload)
3. **Page Processing** (per page)
- Clean OCR text via `cleanOCRText()`
- Insert/update document_pages
- Index in Meilisearch via `indexDocumentPage()`
- Store confidence scores and language
4. **Image Extraction** (per page)
- Extract images via `extractImagesFromPage()`
- Run Tesseract on each image
- Store in document_images table
- Index image text in Meilisearch with `documentType: 'image'`
5. **Section/TOC Extraction** (post-processing)
- Call `extractSections()` and `mapPagesToSections()`
- Update document_pages with section metadata (section, section_key, section_order)
- Call `extractTocFromDocument()` for TOC entries
6. **Completion**
- Update documents: status = 'indexed', imagesExtracted = 1
- Update ocr_jobs: status = 'completed', progress = 100, completed_at = now
- Return: `{ success: true, documentId, pagesProcessed }`
7. **Error Handling**
- On failure: status = 'failed', error = error.message
- Continues processing other pages on individual page failures
- Re-throws to mark BullMQ job as failed
- Retries up to 3 times with exponential backoff
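
The "continue on individual page failures" behavior in the pipeline above can be sketched as a loop that records per-page errors instead of aborting. `processPagesSketch` is hypothetical; `processPage` stands in for the clean, insert, and index steps, and the linear progress calculation is an assumption that ignores the 60-70% weighting mentioned for text extraction.

```javascript
// pages: [{ pageNumber, text, confidence, error }] as returned by the extraction step.
async function processPagesSketch(pages, processPage, onProgress = () => {}) {
  const failures = [];
  let pagesProcessed = 0;

  for (const [index, page] of pages.entries()) {
    try {
      if (page.error) throw new Error(page.error); // extraction already failed for this page
      await processPage(page);
      pagesProcessed += 1;
    } catch (err) {
      // A single bad page is recorded and skipped; the job keeps going.
      failures.push({ pageNumber: page.pageNumber, error: err.message });
    }
    onProgress(Math.round(((index + 1) / pages.length) * 100));
  }
  return { pagesProcessed, failures };
}
```

Errors thrown outside this loop, such as a missing file, would still propagate to the worker so that BullMQ can mark the job failed and schedule a retry.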
**Event Handlers:**
```
worker.on('completed', (job, result) => { /* log */ })
worker.on('failed', (job, error) => { /* log error */ })
worker.on('error', (error) => { /* worker crash */ })
worker.on('ready', () => { /* worker ready */ })
```
**Graceful Shutdown:**
- `SIGTERM` / `SIGINT` handlers
- Calls `worker.close()` and `connection.quit()`
### Image Extractor Worker
**File:** `server/workers/image-extractor.js`
**Expected Functionality:**
- `extractImagesFromPage(filePath, pageNumber, documentId)` - Extract images from PDF page
- Returns: `[{ id, path, format, width, height, imageIndex, position }]`
---
## 5. Integration Points for New Features
### Inventory Management Feature
**Integration Points:**
1. **Database Schema:**
- Extend `components` table with inventory fields:
```sql
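-- SQLite adds one column per ALTER TABLE statement
ALTER TABLE components ADD COLUMN quantity_available INTEGER DEFAULT 0;
ALTER TABLE components ADD COLUMN reorder_level INTEGER;
ALTER TABLE components ADD COLUMN supplier_info TEXT; -- JSON with supplier contacts
ALTER TABLE components ADD COLUMN last_purchased_date INTEGER;
ALTER TABLE components ADD COLUMN purchase_cost REAL;
ALTER TABLE components ADD COLUMN location_storage TEXT;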
```
- Create `inventory_transactions` table for audit trail
2. **API Endpoints:**
- `POST /api/inventory/items` - Create inventory item (link to component)
- `GET /api/inventory/items` - List inventory with filters
- `PUT /api/inventory/items/:id` - Update quantity/location
- `POST /api/inventory/items/:id/transactions` - Record transaction (purchase, use, transfer)
- `GET /api/inventory/alerts` - Get low-stock alerts
3. **Service Layer:**
- Create `server/services/inventory.service.js`:
- `createInventoryItem(componentId, quantity, reorderLevel, supplier)`
- `updateInventoryQuantity(itemId, change, reason, userId)`
- `getInventoryAlerts(organizationId)`
- `calculateReorderPoints()`
4. **Route File:**
- Create `server/routes/inventory.routes.js`
- Add to `server/index.js`: `app.use('/api/inventory', inventoryRoutes);`
5. **BullMQ Job (Optional):**
- Create background job for inventory replenishment alerts
- Queue in `server/workers/inventory-alerts.js`
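
The core of the proposed inventory service is a quantity change that always produces a matching transaction record, plus a low-stock filter. The sketch below works on plain objects; `applyInventoryChange` and `lowStockAlerts` are hypothetical, and the transaction fields are assumptions because the `inventory_transactions` table is not yet specified.

```javascript
// item mirrors the proposed components columns: id, quantity_available, reorder_level.
function applyInventoryChange(item, change, reason, userId, nowSeconds) {
  const quantity = item.quantity_available + change;
  if (quantity < 0) throw new RangeError('Inventory cannot go below zero');
  return {
    item: { ...item, quantity_available: quantity },
    // Assumed shape of an inventory_transactions row.
    transaction: { component_id: item.id, change, reason, user_id: userId, created_at: nowSeconds },
  };
}

// Items at or below their reorder level; items without a reorder level never alert.
function lowStockAlerts(items) {
  return items.filter((i) => i.reorder_level != null && i.quantity_available <= i.reorder_level);
}
```

In a real service both writes would belong in one database transaction so the stock level and its audit trail cannot drift apart.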
### Maintenance Tracking Feature
**Integration Points:**
1. **Database Schema:**
- Extend `components` table:
```sql
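-- SQLite adds one column per ALTER TABLE statement
ALTER TABLE components ADD COLUMN maintenance_interval_days INTEGER;
ALTER TABLE components ADD COLUMN last_maintenance_date INTEGER;
ALTER TABLE components ADD COLUMN next_maintenance_date INTEGER;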
```
- Create `maintenance_logs` table:
```sql
CREATE TABLE maintenance_logs (
  id TEXT PRIMARY KEY,
  component_id TEXT REFERENCES components(id),
  entity_id TEXT REFERENCES entities(id),
  performed_by TEXT REFERENCES users(id),
  maintenance_type TEXT CHECK (maintenance_type IN ('inspection', 'service', 'repair', 'replacement')),
  description TEXT,
  cost REAL,
  duration_hours REAL,
  next_scheduled_date INTEGER,
  document_id TEXT REFERENCES documents(id), -- reference manual
  created_at INTEGER
);
```
2. **API Endpoints:**
- `POST /api/maintenance/logs` - Log maintenance event
- `GET /api/maintenance/logs` - List maintenance history
- `GET /api/maintenance/schedule` - Get upcoming maintenance
- `PUT /api/maintenance/logs/:id` - Update log
- `DELETE /api/maintenance/logs/:id` - Remove log
3. **Service Layer:**
- Create `server/services/maintenance.service.js`:
- `logMaintenance(componentId, type, description, performedBy)`
- `getMaintenanceHistory(componentId, limit)`
- `getUpcomingMaintenance(organizationId)`
- `calculateNextMaintenanceDate(componentId)`
4. **Route File:**
- Create `server/routes/maintenance.routes.js`
- Add to `server/index.js`: `app.use('/api/maintenance', maintenanceRoutes);`
5. **Background Job:**
- Create `server/workers/maintenance-reminders.js`
- BullMQ cron job to check and send alerts
6. **Search Integration:**
- Index maintenance logs in Meilisearch for searchability
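
Because all timestamps are Unix epoch seconds, the next maintenance date is the last date plus the interval converted to seconds. A minimal sketch, assuming the proposed `components` columns; `nextMaintenanceDate` and `upcomingMaintenance` are hypothetical names for what `calculateNextMaintenanceDate` and `getUpcomingMaintenance` might compute.

```javascript
const SECONDS_PER_DAY = 86400;

// Returns epoch seconds, or null when the component has no schedule or history.
function nextMaintenanceDate(component) {
  const { last_maintenance_date: last, maintenance_interval_days: interval } = component;
  if (last == null || interval == null) return null;
  return last + interval * SECONDS_PER_DAY;
}

// Components due within the horizon (including overdue ones), soonest first.
function upcomingMaintenance(components, nowSeconds, horizonDays) {
  const cutoff = nowSeconds + horizonDays * SECONDS_PER_DAY;
  return components
    .map((c) => ({ ...c, next_maintenance_date: nextMaintenanceDate(c) }))
    .filter((c) => c.next_maintenance_date != null && c.next_maintenance_date <= cutoff)
    .sort((a, b) => a.next_maintenance_date - b.next_maintenance_date);
}
```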
### Camera/Document Capture Feature
**Integration Points:**
1. **Database Schema:**
- Extend `documents` table:
```sql
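-- SQLite adds one column per ALTER TABLE statement
ALTER TABLE documents ADD COLUMN capture_method TEXT; -- upload, camera, screenshot, scan
ALTER TABLE documents ADD COLUMN camera_device_info TEXT; -- JSON with device metadata
ALTER TABLE documents ADD COLUMN capture_timestamp INTEGER;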
```
- Create `camera_sessions` table:
```sql
CREATE TABLE camera_sessions (
  id TEXT PRIMARY KEY,
  user_id TEXT REFERENCES users(id),
  organization_id TEXT REFERENCES organizations(id),
  device_info TEXT, -- JSON
  started_at INTEGER,
  ended_at INTEGER,
  capture_count INTEGER
);
```
2. **API Endpoints:**
- `POST /api/capture/camera-session` - Start camera session
- `POST /api/capture/upload-frame` - Upload single camera frame
- `GET /api/capture/sessions` - List capture sessions
- `POST /api/capture/batch-process` - Process batch of frames as single document
3. **Service Layer:**
- Create `server/services/capture.service.js`:
- `createCameraSession(userId, organizationId, deviceInfo)`
- `uploadCaptureFrame(sessionId, imageBuffer, frameNumber)`
- `processCaptureSession(sessionId)` - Convert frames to PDF
- `getSessionCaptures(sessionId)`
4. **Route File:**
- Create `server/routes/capture.routes.js`
- Add to `server/index.js`: `app.use('/api/capture', captureRoutes);`
5. **Background Job:**
- Extend OCR worker to handle batch-captured images
- Create `server/workers/batch-processor.js` for frame-to-PDF conversion
6. **Client Integration:**
- Camera API integration in Vue 3 frontend
- WebRTC support for real-time preview
### New Feature Route Registration Pattern
**Standard Integration Checklist:**
```javascript
// 1. Create service file: server/services/[feature].service.js
// 2. Create route file: server/routes/[feature].routes.js
// 3. Add to server/index.js (shown here for [feature] = inventory):
import inventoryRoutes from './routes/inventory.routes.js';
app.use('/api/inventory', inventoryRoutes);
// 4. If background job needed:
// - Create server/workers/[feature]-worker.js
// - Extend queue.service.js with get[Feature]Queue()
// 5. If search needed:
// - Index documents via Meilisearch client in service layer
// 6. Database schema changes:
// - Add migration file or update schema.sql comments
// - Test with db/init.js
```
---
## 6. Tech Stack Validation
### Backend Stack
| Technology | Version | Purpose | Status |
|-----------|---------|---------|--------|
| **Node.js** | 18+ | Runtime | Running |
| **Express.js** | ^5.0.0 | Web framework | Active |
| **SQLite (better-sqlite3)** | ^11.0.0 | Database | Active |
| **PostgreSQL** | - | Planned migration target | Not yet |
| **Redis (ioredis)** | ^5.0.0 | Queue backend | Required |
| **BullMQ** | ^5.0.0 | Job queue | Active |
| **JWT (jsonwebtoken)** | ^9.0.2 | Authentication | Active |
| **Bcryptjs** | ^3.0.2 | Password hashing | Active |
| **Meilisearch** | ^0.41.0 | Full-text search | Active |
| **Tesseract.js** | ^5.0.0 | OCR engine | Active |
| **PDF processing** | - | - | - |
| ├─ pdf-parse | ^1.1.1 | PDF parsing | Active |
| ├─ pdf-img-convert | ^2.0.0 | PDF to image | Active |
| ├─ pdfjs-dist | ^4.0.0 | PDF viewer lib | Client |
| **Image processing** | - | - | - |
| ├─ sharp | ^0.34.4 | Image optimization | Active |
| **Multer** | ^1.4.5-lts.1 | File upload | Active |
| **file-type** | ^19.0.0 | File validation | Active |
| **Helmet** | ^7.0.0 | Security headers | Active |
| **CORS** | ^2.8.5 | Cross-origin | Active |
| **Rate-limit** | ^7.0.0 | Request limiting | Active |
| **LRU-Cache** | ^11.2.2 | TOC caching | Active |
| **UUID** | ^10.0.0 | ID generation | Active |
| **dotenv** | ^16.0.0 | Config management | Active |
### Frontend Stack
| Technology | Version | Purpose | Status |
|-----------|---------|---------|--------|
| **Vue.js** | ^3.5.0 | UI framework | Active |
| **Vue Router** | ^4.4.0 | Client routing | Active |
| **Pinia** | ^2.2.0 | State management | Active |
| **Vue i18n** | ^9.14.5 | Internationalization | Active |
| **Vite** | ^5.0.0 | Build tool | Active |
| **Tailwind CSS** | ^3.4.0 | Styling | Active |
| **PostCSS** | ^8.4.0 | CSS processing | Active |
| **Meilisearch SDK** | ^0.41.0 | Client search | Active |
| **PDF.js** | ^4.0.0 | PDF viewer | Active |
| **Playwright** | ^1.40.0 | Testing | Dev |
### Infrastructure Requirements
| Service | Configuration | Purpose |
|---------|--------------|---------|
| **Database** | SQLite file (or PostgreSQL) | Primary data store |
| **Redis** | `REDIS_HOST` (default 127.0.0.1:6379) | BullMQ backend |
| **Meilisearch** | `MEILISEARCH_HOST` (default http://127.0.0.1:7700) | Search service |
| **File Storage** | `/uploads` directory | PDF and image storage |
### Environment Variables (Key)
```
# Server
PORT=3001
NODE_ENV=development
ALLOWED_ORIGINS=http://localhost:5173
# Database
DATABASE_PATH=./navidocs.db
# Redis
REDIS_HOST=127.0.0.1
REDIS_PORT=6379
# Meilisearch
MEILISEARCH_HOST=http://127.0.0.1:7700
MEILISEARCH_MASTER_KEY=<key>
MEILISEARCH_SEARCH_KEY=<key>
MEILISEARCH_INDEX_NAME=navidocs-pages
# JWT
JWT_SECRET=your-secret-key-change-in-production
JWT_EXPIRES_IN=15m
# File Upload
UPLOAD_DIR=./uploads
MAX_FILE_SIZE=52428800 # 50MB
# OCR
OCR_CONCURRENCY=2
# Rate Limiting
RATE_LIMIT_WINDOW_MS=900000 # 15 minutes
RATE_LIMIT_MAX_REQUESTS=100
IMAGE_RATE_LIMIT_MAX_REQUESTS=200
```
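
A small loader can turn these variables into typed configuration and enforce the note that `JWT_SECRET` must change in production. This is a sketch: `loadConfig` is hypothetical, and treating the example values for `PORT`, `MAX_FILE_SIZE`, and the rate-limit settings as defaults is an assumption (the Redis, Meilisearch, JWT expiry, and OCR concurrency defaults are the ones stated in this document).

```javascript
const PLACEHOLDER_JWT_SECRET = 'your-secret-key-change-in-production';

function loadConfig(env = process.env) {
  // Reads an integer variable, falling back when it is unset or not numeric.
  const int = (name, fallback) => {
    const parsed = Number.parseInt(env[name] ?? '', 10);
    return Number.isNaN(parsed) ? fallback : parsed;
  };

  const jwtSecret = env.JWT_SECRET ?? PLACEHOLDER_JWT_SECRET;
  if (env.NODE_ENV === 'production' && jwtSecret === PLACEHOLDER_JWT_SECRET) {
    throw new Error('JWT_SECRET must be set to a real secret in production');
  }

  return {
    port: int('PORT', 3001),
    redis: { host: env.REDIS_HOST ?? '127.0.0.1', port: int('REDIS_PORT', 6379) },
    meilisearch: {
      host: env.MEILISEARCH_HOST ?? 'http://127.0.0.1:7700',
      indexName: env.MEILISEARCH_INDEX_NAME ?? 'navidocs-pages',
    },
    jwt: { secret: jwtSecret, expiresIn: env.JWT_EXPIRES_IN ?? '15m' },
    maxFileSize: int('MAX_FILE_SIZE', 52428800),
    ocrConcurrency: int('OCR_CONCURRENCY', 2),
    rateLimit: {
      windowMs: int('RATE_LIMIT_WINDOW_MS', 900000),
      maxRequests: int('RATE_LIMIT_MAX_REQUESTS', 100),
      imageMaxRequests: int('IMAGE_RATE_LIMIT_MAX_REQUESTS', 200),
    },
  };
}
```

Failing at startup on a placeholder secret is cheaper than discovering forged tokens later.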
### Validation Summary
**Confirmed Technologies:**
- Vue 3: ✓ Installed (^3.5.0)
- Express.js: ✓ Installed (^5.0.0)
- SQLite: ✓ Installed via better-sqlite3 (^11.0.0)
- Redis: ✓ Installed via ioredis (^5.0.0)
- Meilisearch: ✓ Installed (^0.41.0)
- Tesseract: ✓ Installed via tesseract.js (^5.0.0)
**Status:** All core tech stack components present and correctly configured.
---
## 7. Architecture Diagram (Text-based)
```
┌─────────────────────────────────────────────────────────────────┐
│                      CLIENT LAYER (Vue 3)                       │
├─────────────────────────────────────────────────────────────────┤
│  • Vue Router (SPA navigation)                                  │
│  • Pinia (state management)                                     │
│  • Meilisearch Client SDK (full-text search UI)                 │
│  • PDF.js (document viewer)                                     │
│  • Tailwind CSS (styling)                                       │
└─────────────────────────────────────────────────────────────────┘
                                ↓ HTTP/REST
┌─────────────────────────────────────────────────────────────────┐
│                      EXPRESS.JS API LAYER                       │
├─────────────────────────────────────────────────────────────────┤
│  Routes: /api/auth, /api/documents, /api/search, /api/upload,   │
│          /api/organizations, /api/jobs, /api/maintenance, etc   │
│                                                                 │
│  Middleware: Authentication (JWT), Authorization, Rate Limiting │
│              Request Logging, Security Headers (Helmet)         │
│                                                                 │
│  Response: JSON (documents, images, search results)             │
└─────────────────────────────────────────────────────────────────┘
                ↓                ↓                ↓
        ┌─────────────────────────────────────────────────┐
        │         SERVICE LAYER (Business Logic)          │
        ├─────────────────────────────────────────────────┤
        │ • auth.service.js - JWT, password hashing       │
        │ • authorization.service.js - Permission checks  │
        │ • search.js - Meilisearch indexing              │
        │ • queue.js - BullMQ job management              │
        │ • ocr-hybrid.js - PDF text extraction           │
        │ • inventory.service.js - (new feature)          │
        │ • maintenance.service.js - (new feature)        │
        │ • capture.service.js - (new feature)            │
        └─────────────────────────────────────────────────┘
           ↓                      ↓                      ↓
┌────────────────────┐ ┌──────────────────────┐ ┌─────────────────┐
│ SQLite DB          │ │ Redis Queue          │ │ Meilisearch     │
├────────────────────┤ ├──────────────────────┤ ├─────────────────┤
│ • users            │ │ ocr-processing queue │ │ Full-text index │
│ • organizations    │ │ job data + status    │ │ Page documents  │
│ • documents        │ │ (in-memory)          │ │ Image text      │
│ • entities         │ │                      │ │                 │
│ • components       │ │                      │ │                 │
│ • permissions      │ │                      │ │                 │
│ • maintenance_logs │ │                      │ │                 │
│ • inventory_items  │ │                      │ │                 │
└────────────────────┘ └──────────────────────┘ └─────────────────┘
                    ┌─────────────────────────┐
                    │ Background Workers      │
                    ├─────────────────────────┤
                    │ • ocr-worker.js         │
                    │   - PDF → text          │
                    │   - Tesseract.js OCR    │
                    │   - Index to MS         │
                    │   - Extract images      │
                    │   - Extract TOC         │
                    │                         │
                    │ • inventory-alerts      │
                    │ • maintenance-reminders │
                    │ • batch-processor       │
                    └─────────────────────────┘
                    ┌─────────────────────────┐
                    │ File System             │
                    ├─────────────────────────┤
                    │ /uploads/               │
                    │ • PDF documents         │
                    │ • Extracted images      │
                    │ • Temporary files       │
                    └─────────────────────────┘
```
---
## 8. Data Flow Examples
### Document Upload & OCR Processing Flow
```
1. User uploads PDF via POST /api/upload
├─ Multer stores file in memory
├─ File validation (size, type)
├─ SHA256 hash for deduplication
├─ File saved to disk (/uploads/:docId.pdf)
├─ Document record created (status: processing)
├─ ocr_job record created (status: pending)
└─ Response: { jobId, documentId }
2. API queues OCR job via queue.service.addOcrJob()
└─ BullMQ adds to Redis 'ocr-processing' queue
3. OCR Worker picks up job
├─ extractTextFromPDF() using pdf-parse + Tesseract.js
├─ Per page:
│ ├─ cleanOCRText()
│ ├─ Insert document_page record
│ ├─ Index in Meilisearch
│ ├─ extractImagesFromPage()
│ │ ├─ Convert page to image
│ │ ├─ Extract embedded images
│ │ └─ Run OCR on each image
│ └─ Store image metadata
├─ extractSections() for TOC
├─ Update document status: indexed
└─ Update ocr_job: completed
4. User polls GET /api/jobs/:jobId
├─ Checks database ocr_jobs record
└─ Response: { status, progress, documentId }
5. Document now searchable
├─ POST /api/search/token → Meilisearch auth
├─ POST /api/search → Full-text search results
└─ GET /api/documents/:id → Page list with OCR
```
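The deduplication step in the upload flow can be illustrated with Node's built-in `crypto` module. This is a sketch under stated assumptions: `sha256Hex` and `createDedupIndex` are hypothetical names, and the in-memory `Map` stands in for whatever "find document by hash" query the real upload route runs against the database.

```javascript
const crypto = require('node:crypto');

// Hex SHA-256 digest of an uploaded file held in memory
// (the form in which Multer's memory storage hands over the file).
function sha256Hex(buffer) {
  return crypto.createHash('sha256').update(buffer).digest('hex');
}

// Hypothetical in-memory stand-in for a "find document by hash" lookup.
function createDedupIndex() {
  const byHash = new Map(); // hash -> documentId
  return {
    // Returns the existing documentId for duplicate content,
    // or registers the new documentId for unseen content.
    register(buffer, documentId) {
      const hash = sha256Hex(buffer);
      const existing = byHash.get(hash);
      if (existing !== undefined) {
        return { duplicate: true, documentId: existing, hash };
      }
      byHash.set(hash, documentId);
      return { duplicate: false, documentId, hash };
    },
  };
}
```

Hashing the bytes rather than the filename means a re-upload under a different name is still recognized as the same document.
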
### Search & Document Retrieval Flow
```
1. User requests search token
POST /api/search/token
├─ Verifies user's organizations
├─ Generates Meilisearch tenant token (org-scoped)
└─ Response: { token, expiresAt, searchUrl }
2. Client calls Meilisearch directly with token
├─ Client library: meilisearch.index().search(q)
└─ Results filtered by organization
3. User clicks document result
GET /api/documents/:id
├─ Verify ownership/access
├─ Fetch document + pages + entity/component
└─ Response: Full metadata + page list
4. User views PDF
GET /api/documents/:id/pdf
├─ Verify access
├─ Stream file from /uploads/:id.pdf
└─ Response: PDF stream
5. User views document images
GET /api/documents/:id/images
├─ Query document_images table
└─ Response: Image metadata + URLs
6. Client fetches image
GET /api/images/:imageId
├─ Verify access
├─ Rate limit (200/min)
├─ Path traversal check
└─ Stream: /uploads/:docId/image_*.png
```
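The path traversal check in step 6 can be sketched with Node's `path` module. `resolveInsideUploads` is a hypothetical helper name, not taken from the codebase; it illustrates one common approach (resolve, then compare against the upload root) and may differ from what the image route actually does.

```javascript
const path = require('node:path');

// Resolve a requested relative path inside uploadRoot.
// Returns the absolute path, or null if the request would escape the root
// (or points at the root directory itself).
function resolveInsideUploads(uploadRoot, requestedPath) {
  const root = path.resolve(uploadRoot);
  const target = path.resolve(root, requestedPath);
  const relative = path.relative(root, target);
  const escapes =
    relative === '' ||
    relative === '..' ||
    relative.startsWith('..' + path.sep) ||
    path.isAbsolute(relative);
  return escapes ? null : target;
}
```

Comparing the relative path, rather than testing whether the target string starts with the root string, avoids accepting sibling directories such as `/uploads-backup` when the root is `/uploads`.
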
### Permission & Sharing Flow
```
1. Document Owner Shares Document
POST /api/documents/:id/share
├─ Create document_shares record
├─ Audit log: document.share event
└─ Response: { success, sharedWith }
2. Recipient Accesses Document
GET /api/documents/:id
├─ Check access via:
│ ├─ user_organizations (org membership)
│ ├─ documents.uploaded_by (owner)
│ └─ document_shares (shared with)
├─ Grant read/write permission
└─ Return document + pages
3. Manager Grants Entity Permission
POST /api/permissions/grant
├─ Create entity_permissions record
├─ Set permission_level (viewer|editor|manager|admin)
├─ Optional expiration
├─ Audit log
└─ Response: Permission ID
4. Check Permission
checkEntityPermission(userId, entityId, minimumLevel)
├─ Query entity_permissions table
├─ Verify expiration
├─ Check permission hierarchy
└─ Return: { hasPermission, level }
```
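The hierarchy and expiration checks in step 4 can be sketched as follows. This version deviates from the signature shown above: it takes an in-memory `grants` array as its first argument instead of querying the `entity_permissions` table, and the grant shape (`userId`, `entityId`, `level`, `expiresAt` in Unix seconds or `null`) is an assumption made for illustration.

```javascript
// Permission levels from lowest to highest, as listed in the flow above.
const LEVELS = ['viewer', 'editor', 'manager', 'admin'];

// Sketch of the check against an in-memory list of grants.
// Each grant: { userId, entityId, level, expiresAt } (hypothetical shape).
function checkEntityPermission(
  grants,
  userId,
  entityId,
  minimumLevel,
  nowSeconds = Math.floor(Date.now() / 1000)
) {
  const required = LEVELS.indexOf(minimumLevel);
  if (required === -1) {
    throw new Error(`Unknown permission level: ${minimumLevel}`);
  }
  let best = -1;
  for (const grant of grants) {
    if (grant.userId !== userId || grant.entityId !== entityId) continue;
    // Skip grants whose expiration time has passed.
    if (grant.expiresAt !== null && grant.expiresAt <= nowSeconds) continue;
    best = Math.max(best, LEVELS.indexOf(grant.level));
  }
  return {
    hasPermission: best >= required,
    level: best === -1 ? null : LEVELS[best],
  };
}
```

Because levels are ordered, a single index comparison covers every "at least this level" question; an expired `admin` grant falls back to whatever unexpired grant remains.
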
---
## 9. Security Implementation
### Authentication & Authorization
**JWT Strategy:**
- Access Token: 15 minutes (short-lived)
- Refresh Token: 7 days (stored in DB with hash)
- Tokens revoked on password reset
- Account lockout: 15 min after 5 failed attempts
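
The lockout rule can be sketched against the `failed_login_attempts` and `locked_until` fields of the `users` table (timestamps in Unix seconds). The function names are hypothetical, and resetting the counter when the lock is applied is an assumption; the actual service may keep counting instead.

```javascript
const MAX_FAILED_ATTEMPTS = 5;
const LOCKOUT_SECONDS = 15 * 60;

// True while the account is still inside its lockout window.
function isLocked(user, nowSeconds) {
  return user.locked_until !== null && user.locked_until > nowSeconds;
}

// Returns an updated copy of the user row after a failed login.
function recordFailedLogin(user, nowSeconds) {
  const attempts = user.failed_login_attempts + 1;
  if (attempts >= MAX_FAILED_ATTEMPTS) {
    // Assumption: the counter restarts once the lock is applied.
    return {
      ...user,
      failed_login_attempts: 0,
      locked_until: nowSeconds + LOCKOUT_SECONDS,
    };
  }
  return { ...user, failed_login_attempts: attempts };
}

// Returns an updated copy of the user row after a successful login.
function recordSuccessfulLogin(user, nowSeconds) {
  return {
    ...user,
    failed_login_attempts: 0,
    locked_until: null,
    last_login_at: nowSeconds,
  };
}
```

A login handler would call `isLocked` before verifying the password, so a locked account is rejected without running bcrypt at all.
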
**Password Security:**
- Bcrypt with 12 rounds
- Minimum 8 characters
- Hashing on register and reset
**Session Management:**
- Refresh tokens tracked in database
- Device info and IP logging
- Logout-all support
**Role-Based Access Control (RBAC):**
```
Organization Roles:
• viewer: Read-only access
• member: Can upload documents
• manager: Can add members, update org
• admin: Full org control + deletion
Entity Permissions:
• viewer: Read-only
• editor: Can modify/share
• manager: All + member management
• admin: Full control
Default Flow:
User → Organization (role) → Entities (permissions)
```
### API Security
**Middleware Stack:**
1. **Helmet**: Security headers (CSP, X-Frame-Options, etc)
2. **CORS**: Whitelisted origins (production)
3. **Rate Limiting**: 100 req/15min per IP (configurable)
4. **Authentication**: JWT verification on protected routes
5. **Authorization**: Role/permission checks in handlers
6. **Input Validation**: UUID format, file type, size limits
7. **Path Traversal Prevention**: Normalized path checks for file serving
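
To make the "100 req/15min per IP" rule concrete, here is a simplified fixed-window counter. It is a stand-in written for illustration, not the rate-limit middleware the project uses; `createRateLimiter` is a hypothetical name and the defaults mirror `RATE_LIMIT_WINDOW_MS` and `RATE_LIMIT_MAX_REQUESTS`.

```javascript
// Fixed-window request counter keyed by client identifier (for example an IP).
function createRateLimiter({ windowMs = 15 * 60 * 1000, max = 100 } = {}) {
  const windows = new Map(); // key -> { start, count }
  return function check(key, nowMs = Date.now()) {
    let current = windows.get(key);
    // Start a fresh window on first sight or once the old window has elapsed.
    if (!current || nowMs - current.start >= windowMs) {
      current = { start: nowMs, count: 0 };
      windows.set(key, current);
    }
    current.count += 1;
    return {
      allowed: current.count <= max,
      remaining: Math.max(0, max - current.count),
      resetAt: current.start + windowMs,
    };
  };
}
```

The `Map` here grows with the number of distinct clients and is never pruned, so a long-running process would need periodic cleanup or a shared store such as Redis.
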
**File Upload Security:**
- Multer memory storage (uploads are held in memory and validated before anything is written to disk)
- File type validation via file-type library
- Size limit: 50MB (configurable)
- SHA256 hash for deduplication
- Filename sanitization (remove dangerous chars)
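
Filename sanitization could look like the sketch below. `sanitizeFilename` is a hypothetical helper, and the conservative ASCII allow-list is an assumption; the real implementation may keep more characters.

```javascript
// Reduce a client-supplied filename to a safe base name.
function sanitizeFilename(original, maxLength = 200) {
  // Drop any directory part, handling both "/" and "\" separators.
  const base = String(original).split(/[\\/]/).pop() || '';
  const cleaned = base
    .replace(/[^A-Za-z0-9._-]/g, '_') // anything outside the allow-list
    .replace(/^\.+/, ''); // leading dots (hidden files, "..")
  const trimmed = cleaned.slice(0, maxLength);
  // Fall back to a fixed name if nothing usable is left.
  return trimmed === '' ? 'upload' : trimmed;
}
```

This allow-list is lossy for non-ASCII names, which matters for a multilingual product; since files are stored under the document ID, the original name can be kept separately as display metadata.
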
### Data Protection
**In Transit:**
- HTTPS enforced (production)
- TLS/SSL certificates
- Secure cookies for JWT
**At Rest:**
- SQLite encryption (optional; stock SQLite has no built-in encryption, so this requires an extension such as SQLCipher)
- Bcrypt password hashing
- No plaintext credentials in code
**Audit Trail:**
- All permission changes logged
- User actions tracked (audit_events)
- Login/logout recorded
---
## 10. Performance Considerations
### Database Optimization
- Indexes on common query columns (org, entity, status, hash)
- Prepared statements via better-sqlite3
- No connection pooling: the current setup uses a single better-sqlite3 connection
### Search Optimization
- Meilisearch for full-text indexing (not SQLite FTS)
- Async indexing in OCR worker
- Tenant tokens for client-side search
- 30-min LRU cache for TOC queries
### OCR Processing
- Concurrency: 2 documents (configurable via OCR_CONCURRENCY)
- Limiter: 5 jobs/minute (prevents Tesseract overload)
- Progress tracking (0-100%)
- Batch image processing
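
The concurrency and limiter settings map naturally onto BullMQ's Worker options (`concurrency`, `limiter.max`, `limiter.duration`). The sketch below only builds the plain options object, so it runs without BullMQ installed. `buildOcrWorkerOptions` and `progressPercent` are hypothetical names, and computing progress from page counts is an assumption about how the 0-100% value is derived.

```javascript
// Builds an options object shaped like BullMQ's Worker options.
// No BullMQ import is needed to run this sketch.
function buildOcrWorkerOptions(env = process.env) {
  const parsed = Number.parseInt(env.OCR_CONCURRENCY ?? '', 10);
  // Fall back to the documented default of 2 for missing or invalid values.
  const concurrency = Number.isInteger(parsed) && parsed > 0 ? parsed : 2;
  return {
    concurrency,
    limiter: { max: 5, duration: 60 * 1000 }, // at most 5 jobs per 60 seconds
  };
}

// Assumption: progress is the share of pages processed so far, as 0-100.
function progressPercent(pagesDone, totalPages) {
  if (totalPages <= 0) return 0;
  return Math.min(100, Math.round((pagesDone / totalPages) * 100));
}
```

In a real worker the returned object would be merged with the Redis connection settings and passed as the third argument to BullMQ's `Worker` constructor.
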
### Memory Management
- Streaming responses for large PDFs
- Image compression via sharp
- LRU cache cleanup (30 min TTL)
- Job cleanup: completed jobs removed after 24 hours, failed jobs after 7 days
### Scalability Bottlenecks
- **Single SQLite connection**: Switch to PostgreSQL for concurrent writes
- **Local file storage**: Switch to S3/cloud storage
- **Tesseract CPU usage**: Distribute workers across machines
- **Meilisearch scale**: Deploy cluster for high traffic
---
## 11. Known Issues & TODOs
### Authentication
- [ ] Authentication middleware incomplete (req.user often hardcoded as 'test-user-id')
- [ ] Email verification not sent (template needed)
- [ ] Password reset email not sent (template needed)
### Authorization
- [ ] Some endpoints missing auth checks
- [ ] Entity-level permissions not fully integrated
- [ ] Document-level permissions incomplete
### Database
- [ ] Password reset tokens table missing from schema
- [ ] Refresh tokens table missing from schema
- [ ] Audit events table not defined
- [ ] Document images table not in schema.sql
- [ ] Document metadata handling inconsistent
### OCR Worker
- [ ] Image extraction may fail silently
- [ ] Section extraction error handling needs improvement
- [ ] TOC extraction is treated as optional because of timing issues (should be made robust)
### Frontend
- [ ] Client-side image upload/capture not implemented
- [ ] Multilingual search needs testing
- [ ] Rate limiting feedback incomplete
---
## 12. Integration Roadmap for New Features
### Phase 1: Inventory Management
**Dependencies:**
- Components schema (exists)
- Basic CRUD API patterns (exist)
- Database migrations (setup required)
**Estimated effort:** 3-4 days
**New files:** 3 (service, routes, worker)
**Database changes:** +2 tables
### Phase 2: Maintenance Tracking
**Dependencies:**
- Inventory feature (Phase 1)
- Meilisearch indexing (exists)
- Audit logging (partial)
**Estimated effort:** 2-3 days
**New files:** 3 (service, routes, worker)
**Database changes:** +1 table
### Phase 3: Camera/Capture Feature
**Dependencies:**
- Upload API (exists)
- PDF processing (exists)
- WebRTC/Camera API (client)
**Estimated effort:** 4-5 days
**New files:** 4 (service, routes, worker, batch-processor)
**Database changes:** +2 tables
### Phase 4: Enhanced Search & Analytics
**Dependencies:**
- Meilisearch integration (exists)
- Audit trail (Phase 2+)
- Statistics API (exists)
**Estimated effort:** 2-3 days
**New files:** 2 (service, routes)
---
## Conclusion
The NaviDocs codebase is well-structured with clear separation of concerns:
- **Database**: Comprehensive schema supporting multi-entity, multi-tenant architecture
- **API**: RESTful endpoints organized by feature with consistent patterns
- **Services**: Business logic isolated from routes with dependency injection
- **Workers**: Background OCR processing via BullMQ + Redis
- **Frontend**: Vue 3 SPA with Meilisearch client-side search
**Ready for integration of:**
- Inventory management
- Maintenance tracking
- Camera/document capture
- Enhanced analytics
All integration points identified and documented above.