All 11 agents (S2-H01 through S2-H09 + S2-H03A + S2-H07A) have completed their technical specifications:

- S2-H01: NaviDocs codebase architecture analysis
- S2-H02: Inventory tracking system (€15K-€50K value recovery)
- S2-H03: Maintenance log & reminder system
- S2-H04: Camera & Home Assistant integration
- S2-H05: Contact management system
- S2-H06: Accounting module & receipt OCR integration
- S2-H07: Impeccable search UX (Meilisearch facets)
- S2-H08: WhatsApp Business API + AI agent integration
- S2-H09: Document versioning with IF.TTT compliance
- S2-H03A: VAT/tax jurisdiction tracking & compliance
- S2-H07A: Multi-calendar system (4 calendar types)

Total: ~15,600 lines of technical specifications
Status: Ready for S2-H10 synthesis (awaiting Session 1 completion)
IF.bus: All inter-agent communications documented
# NaviDocs Codebase Architecture Map

**Analysis Date:** 2025-11-13
**Agent:** S2-H01
**Status:** Complete

---

## 1. Database Schema Summary

### Core Entities

The NaviDocs database uses SQLite (v3) with a schema designed for future PostgreSQL migration. All timestamps use Unix epoch (seconds).

#### User Management
```
- users (id: TEXT PRIMARY KEY)
  - id: UUID
  - email: TEXT UNIQUE
  - password_hash: TEXT (bcrypt)
  - name: TEXT
  - status: TEXT (active, suspended, deleted)
  - email_verified: BOOLEAN
  - created_at, updated_at: INTEGER
  - last_login_at: INTEGER
  - failed_login_attempts, locked_until: Security fields
```

#### Organization Structure (Multi-tenant)
```
- organizations (id: TEXT PRIMARY KEY)
  - id: UUID
  - name: TEXT
  - type: TEXT (personal, commercial, hoa)
  - created_at, updated_at: INTEGER

- user_organizations (user_id + organization_id PRIMARY KEY)
  - role: TEXT (admin, manager, member, viewer)
  - joined_at: INTEGER
```

#### Entity Management (Boats, Marinas, Properties)
```
- entities (id: TEXT PRIMARY KEY)
  - id: UUID
  - organization_id: FK
  - user_id: FK (primary owner)
  - entity_type: TEXT (boat, marina, condo, yacht-club)
  - name: TEXT

  Boat-specific:
  - make, model, year: TEXT/INTEGER
  - hull_id: TEXT
  - vessel_type: TEXT (powerboat, sailboat, catamaran, trawler)
  - length_feet: INTEGER

  Property-specific:
  - property_type: TEXT
  - address: TEXT
  - gps_lat, gps_lon: REAL

  - metadata: TEXT (JSON)
  - created_at, updated_at: INTEGER
```

#### Hierarchical Component Structure
```
- sub_entities (id: TEXT PRIMARY KEY)
  - id: UUID
  - entity_id: FK
  - name: TEXT (system, dock, unit, facility)
  - type: TEXT
  - metadata: TEXT (JSON)

- components (id: TEXT PRIMARY KEY)
  - id: UUID
  - sub_entity_id: FK (optional)
  - entity_id: FK (direct link)
  - name, manufacturer, model_number, serial_number: TEXT
  - install_date, warranty_expires: INTEGER
  - metadata: TEXT (JSON)
```

#### Document Management
```
- documents (id: TEXT PRIMARY KEY)
  - id: UUID
  - organization_id: FK
  - entity_id, sub_entity_id, component_id: FK (hierarchical linking)
  - uploaded_by: FK (user)
  - title, document_type: TEXT
  - file_path, file_name, file_size: TEXT/INTEGER
  - file_hash: TEXT (SHA256 for deduplication)
  - mime_type: TEXT (default: application/pdf)
  - page_count: INTEGER
  - language: TEXT (default: en)
  - status: TEXT (processing, indexed, failed, archived, deleted)
  - replaced_by: TEXT (document supersession)
  - is_shared: BOOLEAN
  - shared_component_id: TEXT (for shared manual library)
  - metadata: TEXT (JSON)
  - created_at, updated_at: INTEGER

- document_pages (id: TEXT PRIMARY KEY)
  - id: UUID (page_<doc_id>_<page_num>)
  - document_id: FK
  - page_number: INTEGER
  - ocr_text: TEXT
  - ocr_confidence: REAL (0-1)
  - ocr_language: TEXT (default: en)
  - ocr_completed_at: INTEGER
  - search_indexed_at: INTEGER
  - meilisearch_id: TEXT
  - section: TEXT (TOC section name)
  - section_key: TEXT (normalized key)
  - section_order: INTEGER
  - metadata: TEXT (JSON - bounding boxes, etc)

- document_images (extracted from PDFs)
  - id: UUID
  - documentId: FK
  - pageNumber: INTEGER
  - imageIndex: INTEGER
  - imagePath: TEXT
  - imageFormat: TEXT (png, jpeg)
  - width, height: INTEGER
  - position: TEXT (JSON)
  - extractedText: TEXT
  - textConfidence: REAL
  - anchorTextBefore, anchorTextAfter: TEXT
```

#### Background Jobs
```
- ocr_jobs (id: TEXT PRIMARY KEY)
  - id: UUID
  - document_id: FK
  - status: TEXT (pending, processing, completed, failed)
  - progress: INTEGER (0-100%)
  - error: TEXT
  - started_at, completed_at: INTEGER
  - created_at: INTEGER
```

#### Permissions & Sharing
```
- permissions (granular access control)
  - id: UUID
  - resource_type: TEXT (document, entity, organization)
  - resource_id: FK
  - user_id: FK
  - permission: TEXT (read, write, share, delete, admin)
  - granted_by, granted_at: FK + INTEGER
  - expires_at: INTEGER (optional)

- entity_permissions (entity-level access)
  - id: UUID
  - user_id, entity_id: FK
  - permission_level: TEXT (viewer, editor, manager, admin)
  - granted_by, granted_at: FK + INTEGER
  - expires_at: INTEGER

- document_shares (simplified document sharing)
  - id: UUID
  - document_id, shared_by, shared_with: FK
  - permission: TEXT (read, write)
  - created_at: INTEGER

- refresh_tokens (JWT session management)
  - id: UUID
  - user_id: FK
  - token_hash: TEXT (SHA256)
  - device_info, ip_address: TEXT
  - expires_at: INTEGER
  - revoked: BOOLEAN
  - created_at, revoked_at: INTEGER

- password_reset_tokens
  - id: UUID
  - user_id: FK
  - token_hash: TEXT (SHA256)
  - expires_at: INTEGER
  - used: BOOLEAN
  - ip_address: TEXT
  - used_at: INTEGER
```

#### User Preferences
```
- bookmarks (quick access)
  - id: UUID
  - user_id, document_id: FK
  - page_id: FK (optional - specific page)
  - label: TEXT
  - quick_access: BOOLEAN (pin to homepage)
  - created_at: INTEGER
```

#### Audit Trail (Optional)
```
- audit_events (not shown in schema but referenced in code)
  - Logs all significant operations for compliance
  - user_id, event_type, resource_type, resource_id
  - status, ip_address, user_agent, metadata
```

#### Settings/Configuration
```
- settings (key-value store)
  - key: TEXT PRIMARY KEY
  - value: TEXT (JSON)
  - description: TEXT
  - category: TEXT
```

### Key Indexes
- `idx_entities_org`, `idx_entities_user`, `idx_entities_type`
- `idx_documents_org`, `idx_documents_entity`, `idx_documents_status`, `idx_documents_hash`, `idx_documents_shared`
- `idx_pages_document`, `idx_pages_indexed`
- `idx_jobs_status`, `idx_jobs_document`
- `idx_permissions_user`, `idx_permissions_resource`
- `idx_bookmarks_user`

---

## 2. API Endpoints (Grouped by Feature)

### Authentication Endpoints (`/api/auth`)
**File:** `server/routes/auth.routes.js`

```
POST /api/auth/register
- Input: email, password, name
- Output: userId, email, verificationToken
- Logging: audit.service logs user.register

POST /api/auth/login
- Input: email, password, deviceInfo, ipAddress
- Output: accessToken (JWT), refreshToken, user object
- Auth: None (initial login)
- Side Effects: Updates failed_login_attempts, triggers account lock after 5 failures

POST /api/auth/refresh
- Input: refreshToken
- Output: new accessToken, user object
- Auth: None (token-based)

POST /api/auth/logout
- Input: refreshToken
- Output: success message
- Side Effects: Revokes refresh token

POST /api/auth/logout-all
- Input: None (uses JWT)
- Output: success message
- Side Effects: Revokes all user tokens
- Auth: JWT required

POST /api/auth/password/reset-request
- Input: email
- Output: generic success (doesn't reveal email exists)
- Side Effects: Creates password_reset_tokens entry

POST /api/auth/password/reset
- Input: token, newPassword
- Output: success message
- Side Effects: Updates password, revokes all refresh tokens

POST /api/auth/email/verify
- Input: token
- Output: email, success message
- Side Effects: Sets email_verified = 1

GET /api/auth/me
- Input: None (JWT)
- Output: user object (id, email, name, status, emailVerified, createdAt, lastLoginAt)
- Auth: JWT required
```

### Organization Management (`/api/organizations`)
**File:** `server/routes/organization.routes.js`

```
POST /api/organizations
- Input: name, type (optional), metadata (optional)
- Output: organization object
- Auth: JWT required

GET /api/organizations
- Input: None
- Output: Array of user's organizations with role
- Auth: JWT required

GET /api/organizations/:organizationId
- Input: organizationId in params
- Output: organization details with userRole
- Auth: JWT + requireOrganizationMember

PUT /api/organizations/:organizationId
- Input: name, type, metadata
- Output: updated organization
- Auth: JWT + requireOrganizationRole('manager')

DELETE /api/organizations/:organizationId
- Input: organizationId
- Output: success message with deleted count
- Auth: JWT + requireOrganizationRole('admin')

GET /api/organizations/:organizationId/members
- Input: organizationId
- Output: Array of members with roles
- Auth: JWT + requireOrganizationMember

POST /api/organizations/:organizationId/members
- Input: userId, role (optional)
- Output: success message
- Auth: JWT + requireOrganizationRole('manager')
- Side Effects: Adds or updates user role

DELETE /api/organizations/:organizationId/members/:userId
- Input: organizationId, userId
- Output: success message with removed role
- Auth: JWT + requireOrganizationRole('manager')

GET /api/organizations/:organizationId/stats
- Input: organizationId
- Output: organization statistics (document count, member count, etc)
- Auth: JWT + requireOrganizationMember
```

### Permission Management (`/api/permissions`)
**File:** `server/routes/permission.routes.js` (referenced but not fully reviewed)

```
Expected endpoints:
- POST /api/permissions/grant (grant permission to user)
- DELETE /api/permissions/revoke (revoke permission)
- GET /api/permissions/check (check permission)
```

### Document Management (`/api/documents`)
**File:** `server/routes/documents.js`

```
POST /api/upload
- Input: file (PDF), title, documentType, organizationId, entityId (optional), componentId (optional), subEntityId (optional)
- Output: jobId, documentId, message
- Auth: None (TODO: should be JWT)
- Side Effects:
  * Validates file safety (file-safety.service)
  * Generates SHA256 hash for deduplication
  * Creates documents and ocr_jobs records
  * Adds OCR job to BullMQ queue

GET /api/documents
- Input: organizationId, entityId, documentType, status, limit, offset (query params)
- Output: { documents: [], pagination: { total, limit, offset, hasMore } }
- Auth: None (TODO: should verify organization membership)

GET /api/documents/:id
- Input: documentId in params
- Output: Full document metadata + pages array + entity + component info
- Auth: Checks organization membership, document ownership, or share access
- Side Effects: Parses metadata JSON

GET /api/documents/:id/pdf
- Input: documentId
- Output: PDF file stream (inline)
- Auth: Same as GET /api/documents/:id
- Security: Path traversal protection

DELETE /api/documents/:id
- Input: documentId
- Output: success message with document title
- Auth: None (TODO: should verify ownership)
- Side Effects:
  * Deletes from Meilisearch index
  * Deletes from database (CASCADE deletes document_pages, ocr_jobs)
  * Deletes file from filesystem
```
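
The `pagination` envelope returned by `GET /api/documents` can be sketched as below. This is a minimal illustration, not the code in `documents.js`: `buildPagination` is a hypothetical helper, and it assumes `hasMore` simply means that rows remain beyond the current window.

```javascript
// Hypothetical helper: builds the { total, limit, offset, hasMore } object
// described for GET /api/documents.
function buildPagination(total, limit, offset) {
  return {
    total,
    limit,
    offset,
    // Assumption: more rows exist when the current window ends before the total.
    hasMore: offset + limit < total,
  };
}

// Example: 125 matching documents, second page of 50.
const examplePage = buildPagination(125, 50, 50);
console.log(examplePage);
```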

### Upload Routes (`/api/upload`)
**File:** `server/routes/upload.js`

```
POST /api/upload (same as above but dedicated file)
- Multer configuration: 50MB limit, memory storage
- Creates document in processing state
- Queues OCR job via queue.service
```
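
The SHA256 deduplication step mentioned for uploads can be sketched with Node's built-in `crypto` module. The `knownHashes` map and `findOrRegister` function are hypothetical stand-ins for a lookup on `documents.file_hash`; the real route is not shown in this analysis.

```javascript
const crypto = require('node:crypto');

// Hash the uploaded bytes; identical files always produce the same digest.
function hashFile(buffer) {
  return crypto.createHash('sha256').update(buffer).digest('hex');
}

// Hypothetical in-memory stand-in for a query on documents.file_hash.
const knownHashes = new Map();

// Returns the existing document ID if the same bytes were seen before,
// otherwise registers the new document under its hash.
function findOrRegister(buffer, documentId) {
  const fileHash = hashFile(buffer);
  const existing = knownHashes.get(fileHash);
  if (existing !== undefined) {
    return { duplicateOf: existing, fileHash };
  }
  knownHashes.set(fileHash, documentId);
  return { duplicateOf: null, fileHash };
}
```

With memory storage (as in the Multer configuration above), the buffer would be `req.file.buffer`, so hashing needs no extra disk read.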

### Quick OCR Route (`/api/upload/quick-ocr`)
**File:** `server/routes/quick-ocr.js` (referenced but not fully reviewed)

```
Expected endpoint:
- POST /api/upload/quick-ocr (rapid OCR without document creation)
```

### Job Management (`/api/jobs`)
**File:** `server/routes/jobs.js`

```
GET /api/jobs/:id
- Input: jobId
- Output: { jobId, documentId, status, progress, error, startedAt, completedAt, createdAt, document? }
- Auth: None (TODO)
- Status values: pending, processing, completed, failed
- Document info included only if status === completed

GET /api/jobs
- Input: status (optional), limit (default 50), offset (default 0)
- Output: { jobs: [], pagination: { limit, offset } }
- Auth: Filters to current user's jobs
- Status filtering: Only allows pending|processing|completed|failed
```

### Search (`/api/search`)
**File:** `server/routes/search.js`

```
POST /api/search/token
- Input: expiresIn (seconds, default 3600, max 86400)
- Output: { token, expiresAt, indexName, searchUrl, mode }
- Auth: JWT (gets user's organizations)
- Modes: 'tenant' (preferred) or 'search-key' (fallback)
- Side Effects: Generates Meilisearch tenant token with organization filters

POST /api/search
- Input: q (query string), filters? (documentType, entityId, language), limit, offset
- Output: { hits, estimatedTotalHits, query, processingTimeMs, limit, offset }
- Auth: JWT
- Meilisearch filters: userId or organizationId membership
- Additional filters: documentType, entityId, language

GET /api/search/health
- Input: None
- Output: { status, meilisearch: <health_response> }
- Auth: None
```

### Image Management (`/api/images`)
**File:** `server/routes/images.js`

```
GET /api/documents/:id/images
- Input: documentId
- Output: { documentId, imageCount, images: [{ id, pageNumber, imageIndex, format, width, height, position, extractedText, confidence, imageUrl }] }
- Auth: Verifies document access
- Side Effects: Parses position JSON

GET /api/documents/:id/pages/:pageNum/images
- Input: documentId, pageNumber
- Output: { documentId, pageNumber, imageCount, images: [] }
- Auth: Verifies document and page exist
- Validation: pageNumber must be >= 1

GET /api/images/:imageId
- Input: imageId (img_<uuid>_p<page>_<index>_<timestamp> or UUID)
- Output: Image file stream (PNG or JPEG)
- Auth: Verifies document access
- Rate Limiting: 200 requests per minute (more permissive than API)
- Security: Path traversal prevention (normalizes path, checks within /uploads)
```
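
The path traversal check described for `GET /api/images/:imageId` (normalize, then confirm the result stays under the uploads directory) can be sketched with Node's `path` module. `resolveInsideUploads` is a hypothetical name; the actual check in `images.js` may differ.

```javascript
const path = require('node:path');

// Returns the absolute path if it stays strictly inside uploadsDir,
// otherwise null. Normalization happens inside path.resolve().
function resolveInsideUploads(uploadsDir, requestedPath) {
  const root = path.resolve(uploadsDir);
  const target = path.resolve(root, requestedPath);
  const relative = path.relative(root, target);

  const escapesRoot =
    relative === '..' ||
    relative.startsWith('..' + path.sep) ||
    path.isAbsolute(relative); // e.g. a different drive on Windows

  // An empty relative path means the uploads directory itself, not a file.
  return escapesRoot || relative === '' ? null : target;
}
```

Comparing via `path.relative` avoids the prefix pitfall of a plain `startsWith(root)` test, where `/uploads-private` would wrongly pass for a root of `/uploads`.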

### Table of Contents (`/api/documents/:documentId/toc`)
**File:** `server/routes/toc.js`

```
GET /api/documents/:documentId/toc
- Input: documentId, format? (flat|tree, default flat)
- Output: { entries: [], format, count }
- Auth: None (TODO)
- Caching: LRU cache (200 max, 30 min TTL)
- Side Effects: Builds tree structure if format=tree

POST /api/documents/:documentId/toc/extract
- Input: documentId
- Output: { success, entriesCount, tocPages: [], message }
- Auth: None (TODO)
- Side Effects:
  * Calls extractTocFromDocument (section-extractor.service)
  * Invalidates LRU cache entries
```

### Statistics (`/api/stats`)
**File:** `server/routes/stats.js` (referenced but not fully reviewed)

```
Expected endpoints:
- GET /api/stats/organization/:organizationId
- GET /api/stats/documents
- GET /api/stats/search
```

### Settings (`/api/admin/settings`)
**File:** `server/routes/settings.routes.js` (referenced but not fully reviewed)

```
Expected endpoints:
- GET /api/admin/settings (get all settings)
- PUT /api/admin/settings/:key (update setting)
- GET /api/settings/public/app (public app settings - no auth)
```

### Health Check
```
GET /health
- Output: { status, timestamp, uptime }
- Auth: None
```

---

## 3. Service Layer Architecture

### Authentication Service
**File:** `server/services/auth.service.js`

**Key Functions:**
- `register(email, password, name)` - User registration with bcrypt hashing (12 rounds)
- `login(email, password, deviceInfo, ipAddress)` - JWT + refresh token generation
- `refreshAccessToken(refreshToken)` - Generate new JWT from refresh token
- `revokeRefreshToken(refreshToken)` - Revoke single token (logout)
- `revokeAllUserTokens(userId)` - Logout all devices
- `requestPasswordReset(email, ipAddress)` - Generate reset token
- `resetPassword(token, newPassword)` - Validate token and update password
- `verifyEmail(token)` - Mark email as verified
- `getUserById(userId)` - Fetch user details
- `verifyAccessToken(token)` - Validate JWT

**Token Management:**
- JWT Access Token: `expiresIn` from env (default 15m)
- Refresh Token: 7 days in seconds (604800)
- Refresh tokens are stored hashed (bcrypt), never in plaintext
- JWT Secret: `process.env.JWT_SECRET` (the default must be changed in production)
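
A minimal sketch of issuing and checking a refresh token with the 7-day lifetime above. It follows the `refresh_tokens` schema, which annotates `token_hash` as SHA256; that choice, the function names, and the record shape are assumptions for illustration, not the code in `auth.service.js`.

```javascript
const crypto = require('node:crypto');

const REFRESH_TTL_SECONDS = 604800; // 7 days, as listed under Token Management

function sha256Hex(value) {
  return crypto.createHash('sha256').update(value).digest('hex');
}

// Creates an opaque token for the client plus the row that would be persisted.
// Only the hash is stored, so a database leak does not expose usable tokens.
function issueRefreshToken(userId, nowSeconds = Math.floor(Date.now() / 1000)) {
  const token = crypto.randomBytes(32).toString('hex');
  return {
    token, // returned to the client once, never stored
    record: {
      user_id: userId,
      token_hash: sha256Hex(token),
      expires_at: nowSeconds + REFRESH_TTL_SECONDS,
      revoked: 0,
    },
  };
}

// A presented token is valid if it hashes to the stored value,
// has not been revoked, and has not expired.
function isRefreshTokenValid(record, presentedToken, nowSeconds) {
  return (
    !record.revoked &&
    record.expires_at > nowSeconds &&
    sha256Hex(presentedToken) === record.token_hash
  );
}
```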

**Security Features:**
- Password minimum 8 characters
- Account lockout after 5 failed login attempts (15 min lock)
- Refresh token revocation on password reset
- Email verification token support
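
The lockout rule (5 failed attempts, 15-minute lock) maps onto the `failed_login_attempts` and `locked_until` columns roughly as follows. This is a sketch with hypothetical function names; whether the real service resets the counter on success or on lock expiry is not covered by this analysis.

```javascript
const MAX_FAILED_ATTEMPTS = 5;
const LOCK_SECONDS = 15 * 60;

// Returns updated security fields after one failed login (times in epoch seconds).
function registerFailedLogin(user, nowSeconds) {
  const attempts = user.failed_login_attempts + 1;
  const shouldLock = attempts >= MAX_FAILED_ATTEMPTS;
  return {
    ...user,
    failed_login_attempts: attempts,
    locked_until: shouldLock ? nowSeconds + LOCK_SECONDS : user.locked_until,
  };
}

// The account is locked while locked_until lies in the future.
function isLocked(user, nowSeconds) {
  return user.locked_until !== null && user.locked_until > nowSeconds;
}
```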

### Authorization Service
**File:** `server/services/authorization.service.js`

**Key Functions:**
- `grantEntityPermission(userId, entityId, permissionLevel, grantedBy, expiresAt)` - Grant entity access
- `revokeEntityPermission(userId, entityId, revokedBy)` - Revoke entity access
- `checkEntityPermission(userId, entityId, minimumPermission)` - Check if user has permission
- `getUserEntityPermissions(userId, options)` - Get all user's entity permissions
- `getEntityPermissions(entityId, options)` - Get all entity's permissions
- `addOrganizationMember(userId, organizationId, role, addedBy)` - Add to organization
- `removeOrganizationMember(userId, organizationId, removedBy)` - Remove from organization
- `checkOrganizationMembership(userId, organizationId, minimumRole)` - Check membership
- `getOrganizationMembers(organizationId)` - List org members
- `getUserOrganizations(userId)` - Get user's organizations
- `cleanupExpiredPermissions()` - Cleanup task

**Permission Hierarchy:**
```
Entity Permissions: viewer (0) < editor (1) < manager (2) < admin (3)
Organization Roles: viewer (0) < member (1) < manager (2) < admin (3)
```
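
The two hierarchies above reduce to a rank comparison. The sketch below shows the idea behind a `minimumPermission` or `minimumRole` check; `meetsMinimum` is a hypothetical helper, and rejecting unknown level names is an assumption.

```javascript
// Numeric ranks taken from the hierarchy listed above.
const ENTITY_LEVELS = new Map([['viewer', 0], ['editor', 1], ['manager', 2], ['admin', 3]]);
const ORG_ROLES = new Map([['viewer', 0], ['member', 1], ['manager', 2], ['admin', 3]]);

// True if `actual` ranks at or above `minimum`; unknown names never pass.
function meetsMinimum(ranks, actual, minimum) {
  if (!ranks.has(actual) || !ranks.has(minimum)) return false;
  return ranks.get(actual) >= ranks.get(minimum);
}

console.log(meetsMinimum(ENTITY_LEVELS, 'manager', 'editor'));
console.log(meetsMinimum(ORG_ROLES, 'viewer', 'member'));
```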

**Audit Integration:**
- All permission grants/revokes logged via `logAuditEvent()`

### Organization Service
**File:** `server/services/organization.service.js` (referenced but not fully reviewed)

**Expected Functions:**
- `createOrganization(name, type, metadata, createdBy)`
- `updateOrganization(organizationId, name, type, metadata, updatedBy)`
- `deleteOrganization(organizationId, deletedBy)`
- `getOrganizationById(organizationId)`
- `getOrganizationStats(organizationId)`

### Search Service (Meilisearch Integration)
**File:** `server/services/search.js`

**Key Functions:**
- `indexDocumentPage(pageId, documentId, pageNumber, text, confidence)` - Index page in Meilisearch
- `generateTenantToken(userId, organizationIds, expiresIn)` - Generate tenant-scoped token

**Meilisearch Index:**
- Index name: `navidocs-pages` (env configurable)
- Searchable attributes: ocr text, metadata
- Filtering: organizationId, userId, documentType, entityId, language
- Document structure:
  ```
  {
    id: string (unique page ID),
    docId: string (document UUID),
    pageNumber: integer,
    organizationId: string,
    userId: string,
    documentType: string,
    text: string (OCR content),
    language: string,
    ocrConfidence: number,
    createdAt: integer,
    updatedAt: integer
  }
  ```

**Tenant Token Support:**
- Scoped search to user's organizations
- Expiration support (max 24 hours)
- Fallback to search API key if tenant token fails
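
Tenant scoping comes down to two inputs for the Meilisearch token: a per-index search rule with an organization filter, and a capped expiry. The sketch below builds both. `buildTenantTokenParams` is a hypothetical helper, the exact filter used by `generateTenantToken` is not shown in this analysis, and rejecting an empty organization list is an assumption.

```javascript
const DEFAULT_EXPIRES_IN = 3600; // seconds
const MAX_EXPIRES_IN = 86400;    // 24 hours

// Builds the search rules and expiry that would be passed to a tenant-token
// generator. Index name defaults to the one listed above.
function buildTenantTokenParams(
  organizationIds,
  expiresIn = DEFAULT_EXPIRES_IN,
  nowMs = Date.now(),
  indexName = 'navidocs-pages',
) {
  if (organizationIds.length === 0) {
    throw new Error('User belongs to no organizations; nothing to scope to');
  }
  // Clamp the requested lifetime into [1, MAX_EXPIRES_IN] seconds.
  const ttl = Math.min(Math.max(1, expiresIn), MAX_EXPIRES_IN);
  // Quote each ID so values containing spaces or punctuation stay one token.
  const quoted = organizationIds.map((id) => JSON.stringify(String(id))).join(', ');
  return {
    searchRules: { [indexName]: { filter: `organizationId IN [${quoted}]` } },
    expiresAt: new Date(nowMs + ttl * 1000),
  };
}
```

Because the filter is embedded in the signed token, a client cannot widen its own search scope by editing request parameters.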

### Queue Service (BullMQ)
**File:** `server/services/queue.js`

**Key Functions:**
- `getOcrQueue()` - Get singleton queue instance
- `addOcrJob(documentId, jobId, data)` - Add OCR job to queue
- `getJobStatus(jobId)` - Get BullMQ job status
- `closeQueue()` - Graceful shutdown

**Queue Configuration:**
- Redis connection: `REDIS_HOST` (default 127.0.0.1), `REDIS_PORT` (default 6379)
- Queue name: `ocr-processing`
- Job retry: 3 attempts with exponential backoff (2s base)
- Cleanup: Complete jobs kept 24h, failed jobs kept 7 days
- Job options: priority support
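
The retry and cleanup settings above correspond to a BullMQ job-options object along these lines. The object is a sketch of what `queue.js` might pass, not a copy of it, and `retryDelays` is a hypothetical helper that assumes the backoff doubles from the 2-second base on each retry.

```javascript
// Job options mirroring the configuration listed above.
// `age` values are in seconds.
const ocrJobOptions = {
  attempts: 3,
  backoff: { type: 'exponential', delay: 2000 },
  removeOnComplete: { age: 24 * 3600 },
  removeOnFail: { age: 7 * 24 * 3600 },
};

// Assumed doubling schedule: delay * 2^(retry - 1) for each retry.
// With 3 attempts there are 2 retries after the initial try.
function retryDelays({ attempts, backoff }) {
  const delays = [];
  for (let retry = 1; retry < attempts; retry += 1) {
    delays.push(backoff.delay * 2 ** (retry - 1));
  }
  return delays;
}

console.log(retryDelays(ocrJobOptions));
```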

**Job Data Structure:**
```
{
  documentId: string,
  jobId: string,
  filePath: string,
  fileName: string,
  organizationId: string,
  userId: string,
  priority: number (optional)
}
```

### OCR Service
**File:** `server/services/ocr.js` (referenced)

**Expected Functions:**
- `extractTextFromImage(imagePath, language)` - Tesseract.js OCR on images
- `cleanOCRText(text)` - Clean and normalize OCR output

### OCR Hybrid Service
**File:** `server/services/ocr-hybrid.js` (referenced)

**Expected Functions:**
- `extractTextFromPDF(filePath, options)` - Extract text from PDF with progress callback
- Returns: `[{ pageNumber, text, confidence, error }]`

### OCR Google Vision Service
**File:** `server/services/ocr-google-vision.js` (referenced)

**Expected Functions:**
- Alternative OCR provider (Google Cloud Vision)

### OCR Client Service
**File:** `server/services/ocr-client.js` (referenced)

**Expected Functions:**
- Client-side OCR coordination

### Section Extractor Service
**File:** `server/services/section-extractor.js` (referenced)

**Expected Functions:**
- `extractSections(filePath, ocrResults)` - Extract document sections/headings
- `mapPagesToSections(sections, totalPages)` - Map pages to TOC sections

### TOC Extractor Service
**File:** `server/services/toc-extractor.js` (referenced)

**Expected Functions:**
- `getDocumentToc(documentId)` - Fetch TOC from database
- `buildTocTree(entries)` - Build hierarchical tree from flat list
- `extractTocFromDocument(documentId)` - Extract TOC from PDF
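
Building a tree from a flat TOC list is typically done with a stack of open ancestors. The sketch below assumes each entry carries a numeric `level` (1 = top level) and arrives in document order; both the field names and the function `buildTocTreeSketch` are hypothetical, since the real `buildTocTree` was not reviewed.

```javascript
// Converts a flat, ordered list of { title, level, ... } entries into a
// nested tree where each node has a `children` array.
function buildTocTreeSketch(entries) {
  const root = { level: 0, children: [] };
  const stack = [root]; // chain of currently open ancestors

  for (const entry of entries) {
    const node = { ...entry, children: [] };
    // Close every open node that is not a strict ancestor of this entry.
    while (stack.length > 1 && stack[stack.length - 1].level >= node.level) {
      stack.pop();
    }
    stack[stack.length - 1].children.push(node);
    stack.push(node);
  }
  return root.children;
}

const tocTree = buildTocTreeSketch([
  { title: 'Engine', level: 1 },
  { title: 'Starting', level: 2 },
  { title: 'Winterizing', level: 2 },
  { title: 'Electrical', level: 1 },
]);
console.log(JSON.stringify(tocTree, null, 2));
```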

### Audit Service
**File:** `server/services/audit.service.js` (referenced)

**Expected Functions:**
- `logAuditEvent(userId, eventType, status, ipAddress, userAgent, metadata, resourceType, resourceId)`
- Logs all security-relevant actions

### Settings Service
**File:** `server/services/settings.service.js` (referenced)

**Expected Functions:**
- `getSetting(key)` - Get setting by key
- `setSetting(key, value)` - Set/update setting
- `getAllSettings()` - Get all settings

### File Safety Service
**File:** `server/services/file-safety.js`

**Expected Functions:**
- `validateFile(file)` - Validate file type, size, etc.
- `sanitizeFilename(filename)` - Remove dangerous characters

---

## 4. Background Job Patterns (BullMQ Usage)

### OCR Worker
**File:** `server/workers/ocr-worker.js`

**Job Processing Pipeline:**

1. **Job Initialization**
   - Receives `{ documentId, jobId, filePath, fileName, organizationId, userId, priority }`
   - Updates ocr_jobs: status = 'processing', progress = 0, started_at = now

2. **PDF Text Extraction** (60-70% of job)
   - Calls `extractTextFromPDF()` with progress callback
   - Returns: `[{ pageNumber, text, confidence, error }]`
   - Concurrency: 2 documents at a time (env: OCR_CONCURRENCY)
   - Limiter: 5 jobs per minute (prevents Tesseract overload)

3. **Page Processing** (per page)
   - Clean OCR text via `cleanOCRText()`
   - Insert/update document_pages
   - Index in Meilisearch via `indexDocumentPage()`
   - Store confidence scores and language

4. **Image Extraction** (per page)
   - Extract images via `extractImagesFromPage()`
   - Run Tesseract on each image
   - Store in document_images table
   - Index image text in Meilisearch with `documentType: 'image'`

5. **Section/TOC Extraction** (post-processing)
   - Call `extractSections()` and `mapPagesToSections()`
   - Update document_pages with section metadata (section, section_key, section_order)
   - Call `extractTocFromDocument()` for TOC entries

6. **Completion**
   - Update documents: status = 'indexed', imagesExtracted = 1
   - Update ocr_jobs: status = 'completed', progress = 100, completed_at = now
   - Return: `{ success: true, documentId, pagesProcessed }`

7. **Error Handling**
   - On failure: status = 'failed', error = error.message
   - Continues processing other pages on individual page failures
   - Re-throws to mark BullMQ job as failed
   - Retries up to 3 times with exponential backoff

**Event Handlers:**
```
worker.on('completed', (job, result) => { /* log */ })
worker.on('failed', (job, error) => { /* log error */ })
worker.on('error', (error) => { /* worker crash */ })
worker.on('ready', () => { /* worker ready */ })
```

**Graceful Shutdown:**
- `SIGTERM` / `SIGINT` handlers
- Calls `worker.close()` and `connection.quit()`

### Image Extractor Worker
**File:** `server/workers/image-extractor.js`

**Expected Functionality:**
- `extractImagesFromPage(filePath, pageNumber, documentId)` - Extract images from PDF page
- Returns: `[{ id, path, format, width, height, imageIndex, position }]`

---

## 5. Integration Points for New Features

### Inventory Management Feature

**Integration Points:**

1. **Database Schema:**
   - Extend `components` table with inventory fields:
     ```sql
     -- SQLite accepts only one column per ALTER TABLE ... ADD COLUMN statement
     ALTER TABLE components ADD COLUMN quantity_available INTEGER DEFAULT 0;
     ALTER TABLE components ADD COLUMN reorder_level INTEGER;
     ALTER TABLE components ADD COLUMN supplier_info TEXT; -- JSON with supplier contacts
     ALTER TABLE components ADD COLUMN last_purchased_date INTEGER;
     ALTER TABLE components ADD COLUMN purchase_cost REAL;
     ALTER TABLE components ADD COLUMN location_storage TEXT;
     ```
   - Create `inventory_transactions` table for audit trail

2. **API Endpoints:**
   - `POST /api/inventory/items` - Create inventory item (link to component)
   - `GET /api/inventory/items` - List inventory with filters
   - `PUT /api/inventory/items/:id` - Update quantity/location
   - `POST /api/inventory/items/:id/transactions` - Record transaction (purchase, use, transfer)
   - `GET /api/inventory/alerts` - Get low-stock alerts

3. **Service Layer:**
   - Create `server/services/inventory.service.js`:
     - `createInventoryItem(componentId, quantity, reorderLevel, supplier)`
     - `updateInventoryQuantity(itemId, change, reason, userId)`
     - `getInventoryAlerts(organizationId)`
     - `calculateReorderPoints()`

4. **Route File:**
   - Create `server/routes/inventory.routes.js`
   - Add to `server/index.js`: `app.use('/api/inventory', inventoryRoutes);`

5. **BullMQ Job (Optional):**
   - Create background job for inventory replenishment alerts
   - Queue in `server/workers/inventory-alerts.js`
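
The low-stock alert behind `GET /api/inventory/alerts` can be sketched as a filter over component rows using the proposed `quantity_available` and `reorder_level` columns. `findLowStock` is a hypothetical function; treating "at or below the reorder level" as the trigger, and skipping rows without a reorder level, are assumptions.

```javascript
// Returns one alert per component whose stock is at or below its reorder level.
// Components without a reorder level are not tracked and never alert.
function findLowStock(components) {
  return components
    .filter((c) => c.reorder_level != null && c.quantity_available <= c.reorder_level)
    .map((c) => ({
      componentId: c.id,
      name: c.name,
      shortBy: c.reorder_level - c.quantity_available,
    }));
}

const stockAlerts = findLowStock([
  { id: 'c1', name: 'Oil filter', quantity_available: 1, reorder_level: 2 },
  { id: 'c2', name: 'Impeller', quantity_available: 4, reorder_level: 2 },
  { id: 'c3', name: 'Flare kit', quantity_available: 0, reorder_level: null },
]);
console.log(stockAlerts);
```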

### Maintenance Tracking Feature

**Integration Points:**

1. **Database Schema:**
   - Extend `components` table:
     ```sql
     -- SQLite accepts only one column per ALTER TABLE ... ADD COLUMN statement
     ALTER TABLE components ADD COLUMN maintenance_interval_days INTEGER;
     ALTER TABLE components ADD COLUMN last_maintenance_date INTEGER;
     ALTER TABLE components ADD COLUMN next_maintenance_date INTEGER;
     ```
   - Create `maintenance_logs` table:
     ```sql
     -- Foreign-key targets are inferred from the column names
     CREATE TABLE maintenance_logs (
       id TEXT PRIMARY KEY,
       component_id TEXT REFERENCES components(id),
       entity_id TEXT REFERENCES entities(id),
       performed_by TEXT REFERENCES users(id),
       maintenance_type TEXT CHECK (maintenance_type IN ('inspection', 'service', 'repair', 'replacement')),
       description TEXT,
       cost REAL,
       duration_hours REAL,
       next_scheduled_date INTEGER,
       document_id TEXT REFERENCES documents(id), -- reference manual
       created_at INTEGER
     );
     ```

2. **API Endpoints:**
   - `POST /api/maintenance/logs` - Log maintenance event
   - `GET /api/maintenance/logs` - List maintenance history
   - `GET /api/maintenance/schedule` - Get upcoming maintenance
   - `PUT /api/maintenance/logs/:id` - Update log
   - `DELETE /api/maintenance/logs/:id` - Remove log

3. **Service Layer:**
   - Create `server/services/maintenance.service.js`:
     - `logMaintenance(componentId, type, description, performedBy)`
     - `getMaintenanceHistory(componentId, limit)`
     - `getUpcomingMaintenance(organizationId)`
     - `calculateNextMaintenanceDate(componentId)`

4. **Route File:**
   - Create `server/routes/maintenance.routes.js`
   - Add to `server/index.js`: `app.use('/api/maintenance', maintenanceRoutes);`

5. **Background Job:**
   - Create `server/workers/maintenance-reminders.js`
   - BullMQ cron job to check and send alerts

6. **Search Integration:**
   - Index maintenance logs in Meilisearch for searchability
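
Since all timestamps are Unix epoch seconds, the next maintenance date is plain integer arithmetic on the proposed `last_maintenance_date` and `maintenance_interval_days` columns. The helpers below are hypothetical and take the values directly rather than a `componentId`, so the sketch stays independent of the database layer.

```javascript
const SECONDS_PER_DAY = 86400;

// Returns the next due date in epoch seconds, or null when the component
// has no recorded maintenance or no interval configured.
function nextMaintenanceDate(lastMaintenanceDate, intervalDays) {
  if (lastMaintenanceDate == null || !intervalDays) return null;
  return lastMaintenanceDate + intervalDays * SECONDS_PER_DAY;
}

// A reminder is due once the computed date is now or in the past.
function isMaintenanceDue(nextDate, nowSeconds) {
  return nextDate !== null && nextDate <= nowSeconds;
}

const dueAt = nextMaintenanceDate(1700000000, 90);
console.log(dueAt, isMaintenanceDue(dueAt, 1700000000));
```

A reminder worker could run this comparison on a schedule and enqueue one alert per due component.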

### Camera/Document Capture Feature

**Integration Points:**

1. **Database Schema:**
   - Extend `documents` table:
     ```sql
     -- SQLite accepts only one column per ALTER TABLE ... ADD COLUMN statement
     ALTER TABLE documents ADD COLUMN capture_method TEXT; -- upload, camera, screenshot, scan
     ALTER TABLE documents ADD COLUMN camera_device_info TEXT; -- JSON with device metadata
     ALTER TABLE documents ADD COLUMN capture_timestamp INTEGER;
     ```
   - Create `camera_sessions` table:
     ```sql
     -- Foreign-key targets are inferred from the column names
     CREATE TABLE camera_sessions (
       id TEXT PRIMARY KEY,
       user_id TEXT REFERENCES users(id),
       organization_id TEXT REFERENCES organizations(id),
       device_info TEXT, -- JSON
       started_at INTEGER,
       ended_at INTEGER,
       capture_count INTEGER
     );
     ```

2. **API Endpoints:**
   - `POST /api/capture/camera-session` - Start camera session
   - `POST /api/capture/upload-frame` - Upload single camera frame
   - `GET /api/capture/sessions` - List capture sessions
   - `POST /api/capture/batch-process` - Process batch of frames as single document

3. **Service Layer:**
   - Create `server/services/capture.service.js`:
     - `createCameraSession(userId, organizationId, deviceInfo)`
     - `uploadCaptureFrame(sessionId, imageBuffer, frameNumber)`
     - `processCaptureSession(sessionId)` - Convert frames to PDF
     - `getSessionCaptures(sessionId)`

4. **Route File:**
   - Create `server/routes/capture.routes.js`
   - Add to `server/index.js`: `app.use('/api/capture', captureRoutes);`

5. **Background Job:**
   - Extend OCR worker to handle batch-captured images
   - Create `server/workers/batch-processor.js` for frame-to-PDF conversion

6. **Client Integration:**
   - Camera API integration in Vue 3 frontend
   - WebRTC support for real-time preview

### New Feature Route Registration Pattern

**Standard Integration Checklist:**

```javascript
// 1. Create service file: server/services/[feature].service.js
// 2. Create route file: server/routes/[feature].routes.js
// 3. Add to server/index.js (replace "feature" with the feature name):
import featureRoutes from './routes/feature.routes.js';
app.use('/api/feature', featureRoutes);

// 4. If background job needed:
//    - Create server/workers/[feature]-worker.js
//    - Extend queue.service.js with get[Feature]Queue()

// 5. If search needed:
//    - Index documents via Meilisearch client in service layer

// 6. Database schema changes:
//    - Add migration file or update schema.sql comments
//    - Test with db/init.js
```

---

## 6. Tech Stack Validation

### Backend Stack

| Technology | Version | Purpose | Status |
|-----------|---------|---------|--------|
| **Node.js** | 18+ | Runtime | Running |
| **Express.js** | ^5.0.0 | Web framework | Active |
| **SQLite (better-sqlite3)** | ^11.0.0 | Database | Active |
| **PostgreSQL** | - | Planned migration target | Not yet |
| **Redis (ioredis)** | ^5.0.0 | Queue backend | Required |
| **BullMQ** | ^5.0.0 | Job queue | Active |
| **JWT (jsonwebtoken)** | ^9.0.2 | Authentication | Active |
| **Bcryptjs** | ^3.0.2 | Password hashing | Active |
| **Meilisearch** | ^0.41.0 | Full-text search | Active |
| **Tesseract.js** | ^5.0.0 | OCR engine | Active |
| **PDF processing** | - | - | - |
| ├─ pdf-parse | ^1.1.1 | PDF parsing | Active |
| ├─ pdf-img-convert | ^2.0.0 | PDF to image | Active |
| └─ pdfjs-dist | ^4.0.0 | PDF viewer lib | Client |
| **Image processing** | - | - | - |
| └─ sharp | ^0.34.4 | Image optimization | Active |
| **Multer** | ^1.4.5-lts.1 | File upload | Active |
| **file-type** | ^19.0.0 | File validation | Active |
| **Helmet** | ^7.0.0 | Security headers | Active |
| **CORS** | ^2.8.5 | Cross-origin | Active |
| **Rate-limit** | ^7.0.0 | Request limiting | Active |
| **LRU-Cache** | ^11.2.2 | TOC caching | Active |
| **UUID** | ^10.0.0 | ID generation | Active |
| **dotenv** | ^16.0.0 | Config management | Active |

### Frontend Stack

| Technology | Version | Purpose | Status |
|-----------|---------|---------|--------|
| **Vue.js** | ^3.5.0 | UI framework | Active |
| **Vue Router** | ^4.4.0 | Client routing | Active |
| **Pinia** | ^2.2.0 | State management | Active |
| **Vue i18n** | ^9.14.5 | Internationalization | Active |
| **Vite** | ^5.0.0 | Build tool | Active |
| **Tailwind CSS** | ^3.4.0 | Styling | Active |
| **PostCSS** | ^8.4.0 | CSS processing | Active |
| **Meilisearch SDK** | ^0.41.0 | Client search | Active |
| **PDF.js** | ^4.0.0 | PDF viewer | Active |
| **Playwright** | ^1.40.0 | Testing | Dev |

### Infrastructure Requirements

| Service | Configuration | Purpose |
|---------|--------------|---------|
| **Database** | SQLite file at `DATABASE_PATH` (PostgreSQL planned) | Primary data store |
| **Redis** | `REDIS_HOST`, `REDIS_PORT` (default 127.0.0.1:6379) | BullMQ backend |
| **Meilisearch** | `MEILISEARCH_HOST` (default http://127.0.0.1:7700) | Search service |
| **File Storage** | `UPLOAD_DIR` directory (default `./uploads`) | PDF and image storage |

### Environment Variables (Key)

```
# Server
PORT=3001
NODE_ENV=development
ALLOWED_ORIGINS=http://localhost:5173

# Database
DATABASE_PATH=./navidocs.db

# Redis
REDIS_HOST=127.0.0.1
REDIS_PORT=6379

# Meilisearch
MEILISEARCH_HOST=http://127.0.0.1:7700
MEILISEARCH_MASTER_KEY=<key>
MEILISEARCH_SEARCH_KEY=<key>
MEILISEARCH_INDEX_NAME=navidocs-pages

# JWT
JWT_SECRET=your-secret-key-change-in-production
JWT_EXPIRES_IN=15m

# File Upload
UPLOAD_DIR=./uploads
MAX_FILE_SIZE=52428800 # 50MB

# OCR
OCR_CONCURRENCY=2

# Rate Limiting
RATE_LIMIT_WINDOW_MS=900000 # 15 minutes
RATE_LIMIT_MAX_REQUESTS=100
IMAGE_RATE_LIMIT_MAX_REQUESTS=200
```

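
One way to consume the variables above is a small loader that applies the documented defaults and fails fast on bad values. The sketch below is illustrative only: `loadConfig` and `intFromEnv` are hypothetical names, the comma-separated format for `ALLOWED_ORIGINS` is an assumption, and the production check on `JWT_SECRET` is a suggested safeguard rather than something the codebase is known to do.

```javascript
// Hypothetical config loader; names and the production check are assumptions.
function intFromEnv(env, name, fallback) {
  const raw = env[name];
  if (raw === undefined || raw === '') return fallback;
  const value = Number.parseInt(raw, 10);
  if (Number.isNaN(value)) {
    throw new Error(`${name} must be an integer, got "${raw}"`);
  }
  return value;
}

function loadConfig(env = process.env) {
  const config = {
    port: intFromEnv(env, 'PORT', 3001),
    nodeEnv: env.NODE_ENV || 'development',
    // Assumption: multiple origins are separated by commas.
    allowedOrigins: (env.ALLOWED_ORIGINS || 'http://localhost:5173')
      .split(',')
      .map((origin) => origin.trim()),
    redis: {
      host: env.REDIS_HOST || '127.0.0.1',
      port: intFromEnv(env, 'REDIS_PORT', 6379),
    },
    maxFileSize: intFromEnv(env, 'MAX_FILE_SIZE', 52428800), // 50MB
    ocrConcurrency: intFromEnv(env, 'OCR_CONCURRENCY', 2),
    rateLimitWindowMs: intFromEnv(env, 'RATE_LIMIT_WINDOW_MS', 900000), // 15 minutes
    rateLimitMax: intFromEnv(env, 'RATE_LIMIT_MAX_REQUESTS', 100),
    jwtSecret: env.JWT_SECRET,
  };

  // Refuse to start in production with a missing or placeholder secret.
  const placeholder = 'your-secret-key-change-in-production';
  if (config.nodeEnv === 'production' &&
      (!config.jwtSecret || config.jwtSecret === placeholder)) {
    throw new Error('JWT_SECRET must be set to a real secret in production');
  }
  return config;
}
```

Passing `env` as a parameter keeps the loader testable without mutating `process.env`.
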
### Validation Summary

**Confirmed Technologies:**
- Vue 3: ✓ Installed (^3.5.0)
- Express.js: ✓ Installed (^5.0.0)
- SQLite: ✓ Installed via better-sqlite3 (^11.0.0)
- Redis: ✓ Installed via ioredis (^5.0.0)
- Meilisearch: ✓ Installed (^0.41.0)
- Tesseract: ✓ Installed via tesseract.js (^5.0.0)

**Status:** All core tech stack components present and correctly configured.

---

## 7. Architecture Diagram (Text-based)

```
┌─────────────────────────────────────────────────────────────────┐
│  CLIENT LAYER (Vue 3)                                           │
├─────────────────────────────────────────────────────────────────┤
│  • Vue Router (SPA navigation)                                  │
│  • Pinia (state management)                                     │
│  • Meilisearch Client SDK (full-text search UI)                 │
│  • PDF.js (document viewer)                                     │
│  • Tailwind CSS (styling)                                       │
└─────────────────────────────────────────────────────────────────┘
                              ↓ HTTP/REST
┌─────────────────────────────────────────────────────────────────┐
│  EXPRESS.JS API LAYER                                           │
├─────────────────────────────────────────────────────────────────┤
│  Routes: /api/auth, /api/documents, /api/search, /api/upload,   │
│          /api/organizations, /api/jobs, /api/maintenance, etc   │
│                                                                 │
│  Middleware: Authentication (JWT), Authorization, Rate Limiting │
│              Request Logging, Security Headers (Helmet)         │
│                                                                 │
│  Response: JSON (documents, images, search results)             │
└─────────────────────────────────────────────────────────────────┘
          ↓                      ↓                      ↓
        ┌─────────────────────────────────────────────────┐
        │  SERVICE LAYER (Business Logic)                 │
        ├─────────────────────────────────────────────────┤
        │  • auth.service.js - JWT, password hashing      │
        │  • authorization.service.js - Permission checks │
        │  • search.js - Meilisearch indexing             │
        │  • queue.js - BullMQ job management             │
        │  • ocr-hybrid.js - PDF text extraction          │
        │  • inventory.service.js - (new feature)         │
        │  • maintenance.service.js - (new feature)       │
        │  • capture.service.js - (new feature)           │
        └─────────────────────────────────────────────────┘
          ↓                      ↓                      ↓
┌────────────────────┐ ┌──────────────────────┐ ┌─────────────────┐
│ SQLite DB          │ │ Redis Queue          │ │ Meilisearch     │
├────────────────────┤ ├──────────────────────┤ ├─────────────────┤
│ • users            │ │ ocr-processing queue │ │ Full-text index │
│ • organizations    │ │ job data + status    │ │ Page documents  │
│ • documents        │ │ (in-memory)          │ │ Image text      │
│ • entities         │ │                      │ │                 │
│ • components       │ │                      │ │                 │
│ • permissions      │ │                      │ │                 │
│ • maintenance_logs │ │                      │ │                 │
│ • inventory_items  │ │                      │ │                 │
└────────────────────┘ └──────────────────────┘ └─────────────────┘
                                  ↓
                     ┌──────────────────────────┐
                     │  Background Workers      │
                     ├──────────────────────────┤
                     │  • ocr-worker.js         │
                     │    - PDF → text          │
                     │    - Tesseract.js OCR    │
                     │    - Index to MS         │
                     │    - Extract images      │
                     │    - Extract TOC         │
                     │                          │
                     │  • inventory-alerts      │
                     │  • maintenance-reminders │
                     │  • batch-processor       │
                     └──────────────────────────┘
                                  ↓
                     ┌──────────────────────────┐
                     │  File System             │
                     ├──────────────────────────┤
                     │  /uploads/               │
                     │  • PDF documents         │
                     │  • Extracted images      │
                     │  • Temporary files       │
                     └──────────────────────────┘
```

---

## 8. Data Flow Examples

### Document Upload & OCR Processing Flow

```
1. User uploads PDF via POST /api/upload
   ├─ Multer stores file in memory
   ├─ File validation (size, type)
   ├─ SHA256 hash for deduplication
   ├─ File saved to disk (/uploads/:docId.pdf)
   ├─ Document record created (status: processing)
   ├─ ocr_job record created (status: pending)
   └─ Response: { jobId, documentId }

2. API queues OCR job via queue.service.addOcrJob()
   └─ BullMQ adds to Redis 'ocr-processing' queue

3. OCR Worker picks up job
   ├─ extractTextFromPDF() using pdf-parse + Tesseract.js
   ├─ Per page:
   │  ├─ cleanOCRText()
   │  ├─ Insert document_page record
   │  ├─ Index in Meilisearch
   │  ├─ extractImagesFromPage()
   │  │  ├─ Convert page to image
   │  │  ├─ Extract embedded images
   │  │  └─ Run OCR on each image
   │  └─ Store image metadata
   ├─ extractSections() for TOC
   ├─ Update document status: indexed
   └─ Update ocr_job: completed

4. User polls GET /api/jobs/:jobId
   ├─ Checks database ocr_jobs record
   └─ Response: { status, progress, documentId }

5. Document now searchable
   ├─ GET /api/search/token → Meilisearch auth
   ├─ POST /api/search → Full-text search results
   └─ GET /api/documents/:id → Page list with OCR
```

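
The "SHA256 hash for deduplication" step in the upload flow can be sketched with Node's built-in `crypto` module. The `knownHashes` map and `registerUpload` function below are hypothetical stand-ins for a lookup against a hash column on the `documents` table; how the real upload route handles a duplicate (reject, or return the existing document) is not specified in this document.

```javascript
// Sketch of hash-based deduplication; the Map stands in for a database lookup.
const { createHash } = require('node:crypto');

function sha256Hex(buffer) {
  return createHash('sha256').update(buffer).digest('hex');
}

const knownHashes = new Map(); // hash -> documentId (hypothetical in-memory index)

function registerUpload(buffer, documentId) {
  const hash = sha256Hex(buffer);
  const existingId = knownHashes.get(hash);
  if (existingId !== undefined) {
    // Same bytes were uploaded before: report the existing document instead.
    return { duplicate: true, documentId: existingId, hash };
  }
  knownHashes.set(hash, documentId);
  return { duplicate: false, documentId, hash };
}
```

Because Multer keeps the upload in memory, the hash can be computed before anything is written to disk, so a duplicate never needs cleanup.
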
### Search & Document Retrieval Flow

```
1. User requests search token
   POST /api/search/token
   ├─ Verifies user's organizations
   ├─ Generates Meilisearch tenant token (org-scoped)
   └─ Response: { token, expiresAt, searchUrl }

2. Client calls Meilisearch directly with token
   ├─ Client library: meilisearch.index().search(q)
   └─ Results filtered by organization

3. User clicks document result
   GET /api/documents/:id
   ├─ Verify ownership/access
   ├─ Fetch document + pages + entity/component
   └─ Response: Full metadata + page list

4. User views PDF
   GET /api/documents/:id/pdf
   ├─ Verify access
   ├─ Stream file from /uploads/:id.pdf
   └─ Response: PDF stream

5. User views document images
   GET /api/documents/:id/images
   ├─ Query document_images table
   └─ Response: Image metadata + URLs

6. Client fetches image
   GET /api/images/:imageId
   ├─ Verify access
   ├─ Rate limit (200/min)
   ├─ Path traversal check
   └─ Stream: /uploads/:docId/image_*.png
```

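
The "Path traversal check" in step 6 is typically done by resolving the requested path and confirming it still sits under the upload directory. The helper below, `resolveInside`, is a hypothetical name and a sketch of that general technique, not the project's actual check.

```javascript
// Sketch of a normalized-path check for file serving (hypothetical helper).
const path = require('node:path');

// Returns the absolute path if it stays inside baseDir, otherwise null.
function resolveInside(baseDir, requestedPath) {
  const base = path.resolve(baseDir);
  const target = path.resolve(base, requestedPath);
  const relative = path.relative(base, target);
  const escapes =
    relative === '' ||                        // the directory itself, not a file
    relative === '..' ||
    relative.startsWith('..' + path.sep) ||   // climbs above baseDir
    path.isAbsolute(relative);                // e.g. a different drive on Windows
  return escapes ? null : target;
}
```

A route handler would respond with 400 or 404 when the helper returns `null` and only stream the file otherwise.
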
### Permission & Sharing Flow

```
1. Document Owner Shares Document
   POST /api/documents/:id/share
   ├─ Create document_shares record
   ├─ Audit log: document.share event
   └─ Response: { success, sharedWith }

2. Recipient Accesses Document
   GET /api/documents/:id
   ├─ Check access via:
   │  ├─ user_organizations (org membership)
   │  ├─ documents.uploaded_by (owner)
   │  └─ document_shares (shared with)
   ├─ Grant read/write permission
   └─ Return document + pages

3. Manager Grants Entity Permission
   POST /api/permissions/grant
   ├─ Create entity_permissions record
   ├─ Set permission_level (viewer|editor|manager|admin)
   ├─ Optional expiration
   ├─ Audit log
   └─ Response: Permission ID

4. Check Permission
   checkEntityPermission(userId, entityId, minimumLevel)
   ├─ Query entity_permissions table
   ├─ Verify expiration
   ├─ Check permission hierarchy
   └─ Return: { hasPermission, level }
```

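
Step 4 of the permission flow (query, verify expiration, check hierarchy) might look like the sketch below. It is an assumption-laden illustration: the real `checkEntityPermission` queries the `entity_permissions` table, whereas this version takes the rows as a first argument so that it runs standalone, and the field names `userId`, `entityId`, `permissionLevel`, and `expiresAt` are hypothetical camel-case equivalents of the table's columns.

```javascript
// Sketch of hierarchy + expiration checks; rows are passed in instead of queried.
const LEVELS = ['viewer', 'editor', 'manager', 'admin']; // lowest to highest

function checkEntityPermission(rows, userId, entityId, minimumLevel,
                               now = Math.floor(Date.now() / 1000)) {
  const required = LEVELS.indexOf(minimumLevel);
  if (required === -1) throw new Error(`Unknown permission level: ${minimumLevel}`);

  let best = -1; // index of the highest unexpired level found
  for (const row of rows) {
    if (row.userId !== userId || row.entityId !== entityId) continue;
    // expiresAt is optional; null/undefined means the grant never expires.
    if (row.expiresAt != null && row.expiresAt <= now) continue;
    best = Math.max(best, LEVELS.indexOf(row.permissionLevel));
  }

  return {
    hasPermission: best >= required,
    level: best === -1 ? null : LEVELS[best],
  };
}
```
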

---

## 9. Security Implementation

### Authentication & Authorization

**JWT Strategy:**
- Access Token: 15 minutes (short-lived)
- Refresh Token: 7 days (stored in DB with hash)
- Tokens revoked on password reset
- Account lockout: 15 min after 5 failed attempts

**Password Security:**
- Bcrypt with 12 rounds
- Minimum 8 characters
- Hashing on register and reset

**Session Management:**
- Refresh tokens tracked in database
- Device info and IP logging
- Logout-all support

**Role-Based Access Control (RBAC):**
```
Organization Roles:
  • viewer: Read-only access
  • member: Can upload documents
  • manager: Can add members, update org
  • admin: Full org control + deletion

Entity Permissions:
  • viewer: Read-only
  • editor: Can modify/share
  • manager: All + member management
  • admin: Full control

Default Flow:
  User → Organization (role) → Entities (permissions)
```

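
The account lockout rule under "JWT Strategy" (15 minutes after 5 failed attempts) maps onto the `failed_login_attempts` and `locked_until` fields of the `users` table. The following is a sketch of one possible policy, not the code in `auth.service.js`: the function names are hypothetical, and resetting the counter when the lock is applied is a design assumption.

```javascript
// Sketch of the lockout rule; timestamps are Unix epoch seconds.
const MAX_FAILED_ATTEMPTS = 5;
const LOCKOUT_SECONDS = 15 * 60;

function isLocked(user, now) {
  return user.lockedUntil != null && user.lockedUntil > now;
}

// Returns an updated copy of the user record after a login attempt.
function recordLoginAttempt(user, succeeded, now) {
  if (isLocked(user, now)) return { ...user }; // attempts during lockout change nothing

  if (succeeded) {
    return { ...user, failedLoginAttempts: 0, lockedUntil: null, lastLoginAt: now };
  }

  const failed = user.failedLoginAttempts + 1;
  if (failed >= MAX_FAILED_ATTEMPTS) {
    // Assumption: the counter restarts once the lock is applied.
    return { ...user, failedLoginAttempts: 0, lockedUntil: now + LOCKOUT_SECONDS };
  }
  return { ...user, failedLoginAttempts: failed };
}
```

Returning a new record instead of mutating keeps the rule easy to unit test; the caller decides how to persist it.
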
### API Security

**Middleware Stack:**
1. **Helmet**: Security headers (CSP, X-Frame-Options, etc)
2. **CORS**: Whitelisted origins (production)
3. **Rate Limiting**: 100 req/15min per IP (configurable)
4. **Authentication**: JWT verification on protected routes
5. **Authorization**: Role/permission checks in handlers
6. **Input Validation**: UUID format, file type, size limits
7. **Path Traversal Prevention**: Normalized path checks for file serving

**File Upload Security:**
- Multer memory storage (prevents direct disk write)
- File type validation via file-type library
- Size limit: 50MB (configurable)
- SHA256 hash for deduplication
- Filename sanitization (remove dangerous chars)

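
To make the "100 req/15min per IP" rule concrete, here is a minimal fixed-window counter. It is a conceptual sketch only and is not how the rate-limiting middleware listed in the tech stack is implemented or configured; `createRateLimiter` is a hypothetical name, and the defaults mirror `RATE_LIMIT_WINDOW_MS` and `RATE_LIMIT_MAX_REQUESTS`.

```javascript
// Conceptual fixed-window rate limiter (illustration, not the middleware in use).
function createRateLimiter({ windowMs = 900000, max = 100 } = {}) {
  const hits = new Map(); // key (e.g. client IP) -> { count, windowStart }

  // Returns true if the request is allowed, false if the limit is exceeded.
  return function allow(key, now = Date.now()) {
    const entry = hits.get(key);
    if (!entry || now - entry.windowStart >= windowMs) {
      hits.set(key, { count: 1, windowStart: now }); // start a fresh window
      return true;
    }
    entry.count += 1;
    return entry.count <= max;
  };
}
```

In-memory counters like this only hold for a single process; with several API instances the counts would need a shared store such as the Redis instance already required for BullMQ.
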
### Data Protection

**In Transit:**
- HTTPS enforced (production)
- TLS/SSL certificates
- Secure cookies for JWT

**At Rest:**
- SQLite encryption (optional setup)
- Bcrypt password hashing
- No plaintext credentials in code

**Audit Trail:**
- All permission changes logged
- User actions tracked (audit_events)
- Login/logout recorded

---

## 10. Performance Considerations

### Database Optimization
- Indexes on common query columns (org, entity, status, hash)
- Prepared statements via better-sqlite3
- No connection pooling: the current setup uses a single connection

### Search Optimization
- Meilisearch for full-text indexing (not SQLite FTS)
- Async indexing in OCR worker
- Tenant tokens for client-side search
- 30-min LRU cache for TOC queries

### OCR Processing
- Concurrency: 2 documents (configurable via `OCR_CONCURRENCY`)
- Limiter: 5 jobs/minute (prevents Tesseract overload)
- Progress tracking (0-100%)
- Batch image processing

### Memory Management
- Streaming responses for large PDFs
- Image compression via sharp
- LRU cache cleanup (30 min TTL)
- Job cleanup: completed jobs after 24 hours, failed jobs after 7 days

### Scalability Bottlenecks
- **Single SQLite connection**: Switch to PostgreSQL for concurrent writes
- **Local file storage**: Switch to S3/cloud storage
- **Tesseract CPU usage**: Distribute workers across machines
- **Meilisearch scale**: Deploy cluster for high traffic

---

## 11. Known Issues & TODOs

### Authentication
- [ ] Authentication middleware incomplete (`req.user` often hardcoded as `'test-user-id'`)
- [ ] Email verification not sent (template needed)
- [ ] Password reset email not sent (template needed)

### Authorization
- [ ] Some endpoints missing auth checks
- [ ] Entity-level permissions not fully integrated
- [ ] Document-level permissions incomplete

### Database
- [ ] Password reset tokens table missing from schema
- [ ] Refresh tokens table missing from schema
- [ ] Audit events table not defined
- [ ] Document images table not in schema.sql
- [ ] Document metadata handling inconsistent

### OCR Worker
- [ ] Image extraction may fail silently
- [ ] Section extraction error handling needs improvement
- [ ] TOC extraction timing makes it optional (should be robust)

### Frontend
- [ ] Client-side image upload/capture not implemented
- [ ] Multilingual search needs testing
- [ ] Rate limiting feedback incomplete

---

## 12. Integration Roadmap for New Features

### Phase 1: Inventory Management
**Dependencies:**
- Components schema (exists)
- Basic CRUD API patterns (exist)
- Database migrations (setup required)

- **Estimated effort:** 3-4 days
- **New files:** 3 (service, routes, worker)
- **Database changes:** +2 tables

### Phase 2: Maintenance Tracking
**Dependencies:**
- Inventory feature (Phase 1)
- Meilisearch indexing (exists)
- Audit logging (partial)

- **Estimated effort:** 2-3 days
- **New files:** 3 (service, routes, worker)
- **Database changes:** +1 table

### Phase 3: Camera/Capture Feature
**Dependencies:**
- Upload API (exists)
- PDF processing (exists)
- WebRTC/Camera API (client)

- **Estimated effort:** 4-5 days
- **New files:** 4 (service, routes, worker, batch-processor)
- **Database changes:** +2 tables

### Phase 4: Enhanced Search & Analytics
**Dependencies:**
- Meilisearch integration (exists)
- Audit trail (Phase 2+)
- Statistics API (exists)

- **Estimated effort:** 2-3 days
- **New files:** 2 (service, routes)

---

## Conclusion

The NaviDocs codebase is well-structured with clear separation of concerns:
- **Database**: Comprehensive schema supporting multi-entity, multi-tenant architecture
- **API**: RESTful endpoints organized by feature with consistent patterns
- **Services**: Business logic isolated from routes with dependency injection
- **Workers**: Background OCR processing via BullMQ + Redis
- **Frontend**: Vue 3 SPA with Meilisearch client-side search

**Ready for integration of:**
- Inventory management
- Maintenance tracking
- Camera/document capture
- Enhanced analytics

All integration points identified and documented above.