NaviDocs OCR Pipeline - Quick Start
1. Install Dependencies
# System dependencies
sudo apt-get install -y poppler-utils imagemagick tesseract-ocr tesseract-ocr-eng
# Node dependencies (already in package.json)
cd server && npm install
2. Start Services
# Redis
docker run -d -p 6379:6379 --name navidocs-redis redis:alpine
# Meilisearch
docker run -d -p 7700:7700 --name navidocs-meilisearch \
  -e MEILI_MASTER_KEY=masterKey \
  getmeili/meilisearch:latest
3. Configure Environment
# Skip this cd if you are still in server/ from step 1
cd server
cat > .env << EOF
DATABASE_PATH=./db/navidocs.db
REDIS_HOST=127.0.0.1
REDIS_PORT=6379
MEILISEARCH_HOST=http://127.0.0.1:7700
MEILISEARCH_MASTER_KEY=masterKey
OCR_CONCURRENCY=2
EOF
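A misspelled or missing variable in .env tends to surface later as a confusing connection error, so it can help to check the values at startup. The following is a minimal sketch, not part of NaviDocs: validateEnv is a hypothetical helper, and the rule that REDIS_PORT and OCR_CONCURRENCY must be positive integers is an assumption based on the sample values above.

```javascript
// Keys taken from the .env file above; the validation rules are assumptions.
const REQUIRED_KEYS = [
  'DATABASE_PATH',
  'REDIS_HOST',
  'REDIS_PORT',
  'MEILISEARCH_HOST',
  'MEILISEARCH_MASTER_KEY',
  'OCR_CONCURRENCY',
];

// Hypothetical helper: returns a list of human-readable problems (empty if none).
function validateEnv(env) {
  const problems = [];
  for (const key of REQUIRED_KEYS) {
    if (!env[key] || String(env[key]).trim() === '') {
      problems.push(`${key} is missing or empty`);
    }
  }
  // Numeric settings must parse as positive integers.
  for (const key of ['REDIS_PORT', 'OCR_CONCURRENCY']) {
    const raw = env[key];
    if (raw && !/^[1-9]\d*$/.test(String(raw).trim())) {
      problems.push(`${key} must be a positive integer, got "${raw}"`);
    }
  }
  return problems;
}

// Usage: check the current process environment once at startup.
const envProblems = validateEnv(process.env);
if (envProblems.length > 0) {
  console.warn('Environment problems:\n' + envProblems.join('\n'));
}
```

The check only warns, so it will not stop the server from starting; throwing instead would be the stricter choice.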
4. Initialize Database
node db/init.js
5. Start the OCR Worker and API Server
# Terminal 1: Start worker
node workers/ocr-worker.js
# Terminal 2: Start API server
npm start
6. Test the Pipeline
# Verify setup
node scripts/test-ocr.js
# Run examples
node examples/ocr-integration.js
Usage Example
import { v4 as uuidv4 } from 'uuid';
import { addOcrJob } from './services/queue.js';
import { getDb } from './config/db.js';
// Create document
const documentId = uuidv4();
const jobId = uuidv4();
const db = getDb();
// Unix timestamp in whole seconds (Date.now() returns milliseconds)
const now = Math.floor(Date.now() / 1000);
db.prepare(`
  INSERT INTO documents (id, organization_id, uploaded_by, title, file_path, status, created_at, updated_at)
  VALUES (?, ?, ?, ?, ?, 'processing', ?, ?)
`).run(documentId, 'org123', 'user456', 'Boat Manual', '/uploads/manual.pdf', now, now);
// Create OCR job
db.prepare(`
  INSERT INTO ocr_jobs (id, document_id, status, created_at)
  VALUES (?, ?, 'pending', ?)
`).run(jobId, documentId, now);
// Queue for processing
await addOcrJob(documentId, jobId, { filePath: '/uploads/manual.pdf' });
// Monitor progress (call clearInterval(timer) once the job has finished)
const timer = setInterval(() => {
  const job = db.prepare('SELECT status, progress FROM ocr_jobs WHERE id = ?').get(jobId);
  console.log(`${job.status}: ${job.progress}%`);
}, 2000);
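The polling loop above keeps running until something stops it. One way to make it stop on its own is to wrap the polling in a promise that settles when the job reaches a final status or a timeout passes. This is a sketch under stated assumptions: waitForJob is a hypothetical helper, and the status names 'completed' and 'failed' are guesses, since this guide only shows 'pending' and 'processing'. Adjust them to match the ocr_jobs schema.

```javascript
// Assumed terminal status names; match them to your ocr_jobs schema.
const TERMINAL_STATUSES = new Set(['completed', 'failed']);

// Hypothetical helper: calls getJob() every intervalMs until the job is done.
// Resolves with the final job row, rejects on timeout or if getJob() throws.
function waitForJob(getJob, { intervalMs = 2000, timeoutMs = 10 * 60 * 1000 } = {}) {
  return new Promise((resolve, reject) => {
    const startedAt = Date.now();
    const timer = setInterval(() => {
      let job;
      try {
        job = getJob();
      } catch (err) {
        clearInterval(timer);
        reject(err);
        return;
      }
      if (job && TERMINAL_STATUSES.has(job.status)) {
        clearInterval(timer);
        resolve(job);
      } else if (Date.now() - startedAt >= timeoutMs) {
        clearInterval(timer);
        reject(new Error(`Job still not finished after ${timeoutMs} ms`));
      } else if (job) {
        console.log(`${job.status}: ${job.progress}%`);
      }
    }, intervalMs);
  });
}

// Usage with the db and jobId from the example above:
// const finalJob = await waitForJob(() =>
//   db.prepare('SELECT status, progress FROM ocr_jobs WHERE id = ?').get(jobId));
```

Passing the lookup in as a function keeps the helper independent of the database driver, which also makes it easy to exercise with a fake job source.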
Search Example
import { searchPages } from './services/search.js';
const results = await searchPages('bilge pump maintenance', {
  filter: `userId = "user123"`,
  limit: 10
});
results.hits.forEach(hit => {
  console.log(`Page ${hit.pageNumber}: ${hit.title}`);
  console.log(`Confidence: ${(hit.ocrConfidence * 100).toFixed(0)}%`);
});
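The filter in the search example is a hand-written string, which becomes fragile once values come from user input. The sketch below shows one way to assemble such a string from exact-match conditions. buildFilter is a hypothetical helper rather than part of services/search.js. It assumes string values and backslash escaping of embedded quotes, so verify that behavior against the filter documentation for your Meilisearch version.

```javascript
// Hypothetical helper: builds a Meilisearch filter expression from exact-match
// conditions, quoting each value and escaping embedded backslashes and quotes.
// Conditions whose value is null or undefined are skipped.
function buildFilter(conditions) {
  return Object.entries(conditions)
    .filter(([, value]) => value !== undefined && value !== null)
    .map(([attribute, value]) => {
      const escaped = String(value).replace(/\\/g, '\\\\').replace(/"/g, '\\"');
      return `${attribute} = "${escaped}"`;
    })
    .join(' AND ');
}

console.log(buildFilter({ userId: 'user123' }));
// prints: userId = "user123"
```

The result can be passed as the filter option of searchPages. Any attribute named here must also be configured as filterable in the index.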
File Locations
| File | Purpose |
| --- | --- |
| /home/setup/navidocs/server/services/ocr.js | OCR text extraction |
| /home/setup/navidocs/server/services/search.js | Meilisearch indexing |
| /home/setup/navidocs/server/workers/ocr-worker.js | Background processor |
| /home/setup/navidocs/OCR_PIPELINE_SETUP.md | Complete documentation |
Troubleshooting
| Problem | Solution |
| --- | --- |
| PDF conversion fails | Install: sudo apt-get install poppler-utils |
| Redis connection error | Start: docker run -d -p 6379:6379 redis:alpine |
| Meilisearch not found | Start: docker run -d -p 7700:7700 getmeili/meilisearch |
| Worker not processing | Check the terminal running node workers/ocr-worker.js, or pm2 logs ocr-worker if the worker runs under pm2 |
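Before restarting containers, it can be useful to confirm whether Meilisearch is actually reachable. The sketch below queries Meilisearch's GET /health endpoint, which reports a status of "available" when the server is up. checkMeilisearch is a hypothetical helper, and the default URL assumes the MEILISEARCH_HOST value from step 3. It relies on the global fetch available in Node 18 and later.

```javascript
// Hypothetical helper: reports whether Meilisearch answers GET /health.
// fetchImpl defaults to the global fetch (Node 18+) and can be swapped for testing.
async function checkMeilisearch(baseUrl = 'http://127.0.0.1:7700', fetchImpl = fetch) {
  try {
    const response = await fetchImpl(new URL('/health', baseUrl));
    if (!response.ok) {
      return { ok: false, detail: `HTTP ${response.status}` };
    }
    const body = await response.json();
    return { ok: body.status === 'available', detail: body.status };
  } catch (err) {
    // Typically a refused connection, meaning the container is not running.
    return { ok: false, detail: err.message };
  }
}

// Usage:
// const result = await checkMeilisearch(process.env.MEILISEARCH_HOST);
// console.log(result.ok ? 'Meilisearch is up' : `Check failed: ${result.detail}`);
```

A failed check with a connection error points at the container, while an HTTP error status suggests the server is running but misconfigured.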
Next Steps
- Read full documentation: OCR_PIPELINE_SETUP.md
- Review examples: server/examples/ocr-integration.js
- Check service docs: server/services/README.md
- Review worker docs: server/workers/README.md