# NaviDocs Developer Guide

**Version:** 1.0

**Tech Stack:** Node.js + Express + Vue 3 + SQLite + Meilisearch

---

## Architecture

### Backend (Express.js)

```
server/
├── index.js                      # Main server
├── config/
│   └── db.js                     # SQLite connection
├── routes/
│   ├── upload.js                 # File upload API
│   ├── search.js                 # Search API
│   ├── timeline.js               # Timeline API
│   └── auth.routes.js            # Authentication
├── services/
│   ├── ocr.js                    # OCR processing
│   ├── pdf-text-extractor.js     # Native PDF text extraction
│   ├── document-processor.js     # Multi-format routing
│   ├── activity-logger.js        # Timeline logging
│   └── file-safety.js            # File validation
├── workers/
│   └── ocr-worker.js             # Background OCR jobs
└── migrations/
    └── 010_activity_timeline.sql
```

### Frontend (Vue 3 + Vite)

```
client/src/
├── views/
│   ├── HomeView.vue
│   ├── SearchView.vue
│   ├── Timeline.vue
│   └── DocumentView.vue
├── components/
│   ├── CompactNav.vue
│   ├── UploadModal.vue
│   └── TocSidebar.vue
├── router.js
└── App.vue
```

---

## Key Features

### 1. Smart OCR (Session 1)

**Problem:** 100-page PDFs took 3+ minutes with Tesseract

**Solution:** Hybrid approach

- Extract native PDF text first (pdfjs-dist)
- Only OCR pages with fewer than 50 characters of native text
- Performance: 180s → 5s (36x speedup)

**Implementation:**

```javascript
// server/services/pdf-text-extractor.js
export async function extractNativeTextPerPage(pdfPath) {
  const data = new Uint8Array(readFileSync(pdfPath));
  const pdf = await pdfjsLib.getDocument({ data }).promise;
  // Extract text from each page
}

// server/services/ocr.js
if (await hasNativeText(pdfPath)) {
  // Use native text
} else {
  // Fall back to OCR
}
```

### 2. Multi-Format Upload (Session 2)

**Supported Formats:**

- PDF: Native text + OCR fallback
- Images: Tesseract OCR
- Word (DOCX): Mammoth text extraction
- Excel (XLSX): Sheet-to-CSV conversion
- Text (TXT, MD): Direct read

**Implementation:**

```javascript
// server/services/document-processor.js
export async function processDocument(filePath, options) {
  const category = getFileCategory(filePath);

  switch (category) {
    case 'pdf':
      return await extractTextFromPDF(filePath, options);
    case 'image':
      return await processImageFile(filePath, options);
    case 'word':
      return await processWordDocument(filePath, options);
    case 'excel':
      return await processExcelDocument(filePath, options);
    case 'text':
      return await processTextFile(filePath, options);
  }
}
```

### 3. Timeline Feature (Session 3)

**Database Schema:**

```sql
CREATE TABLE activity_log (
  id TEXT PRIMARY KEY,
  organization_id TEXT NOT NULL,
  user_id TEXT NOT NULL,
  event_type TEXT NOT NULL,
  event_title TEXT NOT NULL,
  created_at INTEGER NOT NULL
);
```

**Auto-logging:**

```javascript
// After successful upload
await logActivity({
  organizationId: orgId,
  userId: req.user.id,
  eventType: 'document_upload',
  eventTitle: title,
  referenceId: documentId,
  referenceType: 'document'
});
```

---

## API Endpoints

### Authentication

- `POST /api/auth/login` - User login
- `POST /api/auth/register` - User registration
- `GET /api/auth/me` - Get current user

### Documents

- `POST /api/upload` - Upload document (multipart/form-data)
- `GET /api/documents` - List documents
- `GET /api/documents/:id` - Get document details
- `DELETE /api/documents/:id` - Delete document

### Search

- `POST /api/search` - Search documents (JSON body: `{q, limit, offset}`)

### Timeline

- `GET /api/organizations/:orgId/timeline` - Get activity timeline

---

## Environment Variables

```env
NODE_ENV=production
PORT=8001
DATABASE_PATH=./navidocs.db
JWT_SECRET=[64-char hex]
MEILISEARCH_HOST=http://localhost:7700
UPLOAD_DIR=./uploads
MAX_FILE_SIZE=52428800
OCR_MIN_TEXT_THRESHOLD=50
```

---

## Development Setup

```bash
# Clone repo
git clone https://github.com/dannystocker/navidocs.git
cd navidocs

# Install dependencies (subshells keep the working directory at the repo root)
(cd server && npm install)
(cd client && npm install)

# Create .env
cp server/.env.example server/.env

# Run migrations
(cd server && node scripts/run-migration.js db/schema.sql)

# Start services
./start-all.sh

# Backend: http://localhost:8001
# Frontend: http://localhost:8081
```

---

## Testing

### Manual Testing

```bash
# Upload test
curl -X POST http://localhost:8001/api/upload \
  -H "Authorization: Bearer $TOKEN" \
  -F "file=@test.pdf"

# Search test (the JSON body needs an explicit Content-Type)
curl -X POST http://localhost:8001/api/search \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"q":"bilge"}'
```

### E2E Testing

```bash
cd client
npm run test:e2e
```

---

## Deployment

### Production Checklist

- [ ] Update .env.production with secure secrets
- [ ] Build frontend: `cd client && npm run build`
- [ ] Run database migrations
- [ ] Configure SSL certificate
- [ ] Set up PM2 for process management
- [ ] Configure Nginx reverse proxy
- [ ] Set up daily backups (cron job)
- [ ] Configure monitoring (PM2 logs)

### Deploy to StackCP

```bash
./deploy-stackcp.sh
```

---

## Performance

### Benchmarks

| Operation | Before | After | Improvement |
|-----------|--------|-------|-------------|
| Native PDF (100 pages) | 180s | 5s | 36x |
| Image OCR | 3s | 3s | - |
| Word doc upload | N/A | 0.8s | New |
| Search query | <10ms | <10ms | - |

### Optimization Tips

- Use smart OCR for PDFs
- Index documents in background workers
- Cache search results in Redis
- Compress images before upload

---

## Troubleshooting

### OCR Worker Not Processing

```bash
# Check worker status
ps aux | grep ocr-worker

# View logs
tail -f /tmp/navidocs-ocr-worker.log

# Restart worker
pm2 restart navidocs-ocr-worker
```

### Meilisearch Not Responding

```bash
# Check status
curl http://localhost:7700/health

# Restart
pm2 restart meilisearch
```

### Database Locked

```bash
# Check for stale processes holding the database file
lsof | grep navidocs.db

# Kill the stale process (replace [PID] with the process ID from lsof)
kill -9 [PID]
```

---

## Contributing

1. Create feature branch: `git checkout -b feature/your-feature`
2. Make changes with tests
3. Commit: `git commit -m "[FEATURE] Your feature description"`
4. Push: `git push origin feature/your-feature`
5. Create Pull Request

---

## License

Proprietary - All rights reserved

---

**Questions? Contact the development team.**
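
---

## Appendix: Page-Level OCR Fallback Sketch

The Smart OCR rule under Key Features (use native PDF text where available, and OCR only pages with fewer than 50 characters) can be illustrated with a small sketch. This is not the NaviDocs implementation: `planPageProcessing` and `pagesNeedingOcr` are hypothetical names, and the input is assumed to be an array holding one extracted text string per page. The default threshold mirrors `OCR_MIN_TEXT_THRESHOLD=50` from the environment variables.

```javascript
// Hypothetical sketch, not code from the NaviDocs repository.
// Assumes native text has already been extracted into one string per page.
const DEFAULT_MIN_TEXT_THRESHOLD = 50; // mirrors OCR_MIN_TEXT_THRESHOLD

// Decide, for every page, whether the native text is enough or OCR is needed.
function planPageProcessing(pageTexts, threshold = DEFAULT_MIN_TEXT_THRESHOLD) {
  return pageTexts.map((text, index) => {
    // Treat missing text as empty and ignore surrounding whitespace.
    const charCount = (text ?? '').trim().length;
    return {
      page: index + 1, // 1-based page number
      charCount,
      method: charCount < threshold ? 'ocr' : 'native',
    };
  });
}

// Collect the page numbers that should be sent to the OCR worker.
function pagesNeedingOcr(plan) {
  return plan
    .filter((entry) => entry.method === 'ocr')
    .map((entry) => entry.page);
}

// Example: a text page, a scanned (empty) page, and a page with only a page number.
const plan = planPageProcessing([
  'Bilge pump maintenance schedule: inspect the float switch every month.',
  '',
  '  3  ',
]);
console.log(pagesNeedingOcr(plan)); // pages 2 and 3 fall below the threshold
```

Deciding per page rather than per document means a mostly digital PDF with a few scanned inserts pays the OCR cost only for those inserts, which is consistent with the 180s → 5s figure reported for native PDFs.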