# NaviDocs OCR Pipeline - Quick Start

## 1. Install Dependencies

```bash
# System dependencies
sudo apt-get install -y poppler-utils imagemagick tesseract-ocr tesseract-ocr-eng

# Node dependencies (already in package.json)
cd server && npm install
```

## 2. Start Services

```bash
# Redis
docker run -d -p 6379:6379 --name navidocs-redis redis:alpine

# Meilisearch
docker run -d -p 7700:7700 --name navidocs-meilisearch \
  -e MEILI_MASTER_KEY=masterKey \
  getmeili/meilisearch:latest
```

## 3. Configure Environment

```bash
# Skip this cd if step 1 already left you in server/; later steps also run from server/
cd server
cat > .env << EOF
DATABASE_PATH=./db/navidocs.db
REDIS_HOST=127.0.0.1
REDIS_PORT=6379
MEILISEARCH_HOST=http://127.0.0.1:7700
MEILISEARCH_MASTER_KEY=masterKey
OCR_CONCURRENCY=2
EOF
```
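
The worker and API read these variables at startup. As a sketch of how they can be parsed and validated, assuming they have already been loaded into `process.env` (for example by dotenv), the hypothetical `loadConfig` below applies defaults that mirror the `.env` file above. The repository's actual configuration module is not shown in this guide and may differ.

```javascript
// config-sketch.js: hypothetical illustration of reading the variables above.
// Assumes they are already in process.env (e.g. loaded by dotenv); the
// server's real configuration module may differ.

// Parse an integer setting and reject anything outside [min, max].
function intSetting(name, raw, fallback, min, max) {
  const value = raw === undefined ? fallback : Number(raw);
  if (!Number.isInteger(value) || value < min || value > max) {
    throw new Error(`${name} must be an integer in [${min}, ${max}], got "${raw}"`);
  }
  return value;
}

// Defaults mirror the .env file written in step 3.
function loadConfig(env = process.env) {
  return {
    databasePath: env.DATABASE_PATH ?? './db/navidocs.db',
    redis: {
      host: env.REDIS_HOST ?? '127.0.0.1',
      port: intSetting('REDIS_PORT', env.REDIS_PORT, 6379, 1, 65535),
    },
    meilisearch: {
      host: env.MEILISEARCH_HOST ?? 'http://127.0.0.1:7700',
      masterKey: env.MEILISEARCH_MASTER_KEY, // undefined when unset
    },
    // Presumably the number of OCR jobs the worker runs in parallel
    ocrConcurrency: intSetting('OCR_CONCURRENCY', env.OCR_CONCURRENCY, 2, 1, Number.MAX_SAFE_INTEGER),
  };
}

module.exports = { loadConfig };
```

Failing fast on a malformed `OCR_CONCURRENCY` or `REDIS_PORT` is cheaper than discovering it later as a worker that never picks up jobs.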

## 4. Initialize Database

```bash
node db/init.js
```

## 5. Start the OCR Worker and API Server

```bash
# Terminal 1: Start worker
node workers/ocr-worker.js

# Terminal 2: Start API server
npm start
```

## 6. Test the Pipeline

```bash
# Verify setup
node scripts/test-ocr.js

# Run examples
node examples/ocr-integration.js
```

## Usage Example

```javascript
// Run from server/ as an ES module (top-level await is used below)
import { v4 as uuidv4 } from 'uuid';
import { addOcrJob } from './services/queue.js';
import { getDb } from './config/db.js';

// Timestamps are stored as Unix seconds; round to whole seconds
const nowSeconds = () => Math.floor(Date.now() / 1000);

// Create document
const documentId = uuidv4();
const jobId = uuidv4();
const db = getDb();

db.prepare(`
  INSERT INTO documents (id, organization_id, uploaded_by, title, file_path, status, created_at, updated_at)
  VALUES (?, ?, ?, ?, ?, 'processing', ?, ?)
`).run(documentId, 'org123', 'user456', 'Boat Manual', '/uploads/manual.pdf', nowSeconds(), nowSeconds());

// Create OCR job
db.prepare(`
  INSERT INTO ocr_jobs (id, document_id, status, created_at)
  VALUES (?, ?, 'pending', ?)
`).run(jobId, documentId, nowSeconds());

// Queue for processing
await addOcrJob(documentId, jobId, { filePath: '/uploads/manual.pdf' });

// Monitor progress and stop polling once the job finishes.
// Assumption: the worker marks finished jobs as 'completed' or 'failed'.
const timer = setInterval(() => {
  const job = db.prepare('SELECT status, progress FROM ocr_jobs WHERE id = ?').get(jobId);
  if (!job) return;
  console.log(`${job.status}: ${job.progress}%`);
  if (job.status === 'completed' || job.status === 'failed') clearInterval(timer);
}, 2000);
```
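
Polling can also be wrapped in a promise so that callers can `await` completion and get a timeout instead of an open-ended interval. In this sketch `waitForJob` and its options are hypothetical, and the `'completed'`/`'failed'` terminal statuses are an assumption, since this guide only shows the `'pending'` and `'processing'` states.

```javascript
// Hypothetical helper: poll `readJob` until the job reaches a terminal status
// or the timeout expires. `readJob` is any function returning
// { status, progress } or undefined, e.g. the SELECT on ocr_jobs above.
function waitForJob(readJob, {
  intervalMs = 2000,
  timeoutMs = 10 * 60 * 1000,
  terminal = ['completed', 'failed'], // assumed status names
  onProgress = () => {},
} = {}) {
  return new Promise((resolve, reject) => {
    const started = Date.now();
    const tick = () => {
      let job;
      try {
        job = readJob();
      } catch (err) {
        reject(err); // e.g. a database error
        return;
      }
      if (job) onProgress(job);
      if (job && terminal.includes(job.status)) {
        resolve(job);
      } else if (Date.now() - started >= timeoutMs) {
        reject(new Error(`Job did not finish within ${timeoutMs} ms`));
      } else {
        setTimeout(tick, intervalMs); // schedule the next poll only after this one
      }
    };
    tick();
  });
}

module.exports = { waitForJob };
```

With the example above, `await waitForJob(() => db.prepare('SELECT status, progress FROM ocr_jobs WHERE id = ?').get(jobId))` would replace the interval. Chaining `setTimeout` also means no timer is left running once the promise settles.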

## Search Example

```javascript
import { searchPages } from './services/search.js';

const results = await searchPages('bilge pump maintenance', {
  filter: `userId = "user123"`,
  limit: 10
});

results.hits.forEach(hit => {
  console.log(`Page ${hit.pageNumber}: ${hit.title}`);
  console.log(`Confidence: ${(hit.ocrConfidence * 100).toFixed(0)}%`);
});
```
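
The filter above hard-codes a user ID. When the value comes from a request, quoting it before splicing it into the filter string avoids broken or widened queries. The helpers below are hypothetical (not part of `services/search.js`) and assume Meilisearch's filter syntax, in which a double quote inside a double-quoted value is written as `\"`.

```javascript
// Hypothetical helpers (not part of services/search.js) for building
// Meilisearch filter strings from user-supplied values.
// Assumption: inside a double-quoted filter value, a literal `"` is written
// as `\"`. Backslashes are passed through unchanged, so a value ending in
// `\` is not handled by this sketch.
function quote(value) {
  return `"${String(value).replace(/"/g, '\\"')}"`;
}

// attribute = "value"
function eq(attribute, value) {
  return `${attribute} = ${quote(value)}`;
}

// attribute IN ["a", "b"]
function inList(attribute, values) {
  return `${attribute} IN [${values.map(quote).join(', ')}]`;
}

module.exports = { eq, inList };
```

For example, `searchPages(query, { filter: eq('userId', currentUserId), limit: 10 })`, where `currentUserId` stands in for however the API identifies the caller.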

## File Locations

| File | Purpose |
|------|---------|
| `/home/setup/navidocs/server/services/ocr.js` | OCR text extraction |
| `/home/setup/navidocs/server/services/search.js` | Meilisearch indexing |
| `/home/setup/navidocs/server/workers/ocr-worker.js` | Background processor |
| `/home/setup/navidocs/OCR_PIPELINE_SETUP.md` | Complete documentation |

## Troubleshooting

| Problem | Solution |
|---------|----------|
| PDF conversion fails | Install: `sudo apt-get install poppler-utils` |
| Redis connection error | Start: `docker run -d -p 6379:6379 redis:alpine` |
| Meilisearch not found | Start: `docker run -d -p 7700:7700 -e MEILI_MASTER_KEY=masterKey getmeili/meilisearch` (the key must match `.env`) |
| Worker not processing | Check the worker's terminal output (or `pm2 logs ocr-worker` if it runs under pm2) |

## Next Steps

1. Read full documentation: `OCR_PIPELINE_SETUP.md`
2. Review examples: `server/examples/ocr-integration.js`
3. Check service docs: `server/services/README.md`
4. Review worker docs: `server/workers/README.md`