Cloud coordination system prepared:
- Session handover doc for new Claude
- Cloud session 1 prompt (Smart OCR)
- v0.5-demo-ready tag pushed to GitHub
- 5 cloud sessions ready for parallel deployment
Next: Create sessions 2-5 prompts, update agents.md
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
Welcome Cloud Session 1: Smart OCR Engineer
Your Role: OCR Optimization Specialist
Your Machine: Browser-based Claude Code Cloud (claude.ai)
Session ID: session-1
Coordination: Hub-and-spoke (report to local Sonnet orchestrator)
Communication: SSH file sync to StackCP server
Quick Start (Copy-Paste This)
Hi Claude! You're Session 1 in a 5-session cloud deployment for NaviDocs. Your job: Implement smart OCR that skips unnecessary Tesseract processing for PDFs with native text.
Context
Project: NaviDocs - Boat documentation management system
Tech Stack: Node.js (Express) + Vue 3 + SQLite + Meilisearch
Current Problem: 100-page PDF with native text takes 3+ minutes to OCR (should be 5 seconds)
Your Fix: Add pdfjs-dist to extract native text first, only OCR scanned pages
Performance Goal: 36x speed improvement (180s → 5s)
GitHub Repo: https://github.com/dannystocker/navidocs
Branch: navidocs-cloud-coordination (v0.5-demo-ready tag)
Your Feature Branch: feature/smart-ocr
Your Task Specification
Files to Create/Modify
- server/services/pdf-text-extractor.js (NEW)
  - Function: extractNativeTextPerPage(pdfPath)
  - Function: hasNativeText(pdfPath, minChars = 100)
  - Uses: pdfjs-dist library
- server/services/ocr.js (MODIFY lines 36-96)
  - Add import: pdf-text-extractor.js
  - Add hybrid logic: Try native text first
    - If page has >50 chars native text, use it (confidence: 0.99)
    - If page has <50 chars, run Tesseract OCR
  - Add method field: 'native-extraction' or 'tesseract-ocr'
- server/.env (ADD)
  OCR_MIN_TEXT_THRESHOLD=50
  FORCE_OCR_ALL_PAGES=false
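The hybrid logic and the two .env settings above could be wired together roughly as follows. This is a minimal sketch, not the existing ocr.js code: loadOcrConfig, choosePageMethod, processPages and the injected runTesseract callback are hypothetical names, and the result shape ({ pageNumber, text, confidence, method }) is assumed. The spec does not say what happens at exactly 50 characters, so the sketch sends that case to OCR.

```javascript
// Sketch only: all function names here are hypothetical, not from the NaviDocs codebase.

// Read the two settings from an env-like object (for example process.env).
function loadOcrConfig(env) {
  const parsed = Number.parseInt(env.OCR_MIN_TEXT_THRESHOLD ?? '', 10);
  return {
    minTextThreshold: Number.isNaN(parsed) ? 50 : parsed,
    forceOcrAllPages: String(env.FORCE_OCR_ALL_PAGES).toLowerCase() === 'true',
  };
}

// Decide how a single page should be handled.
// Assumption: a page with exactly `minTextThreshold` characters goes to OCR,
// because the spec only defines the ">50" and "<50" cases.
function choosePageMethod(nativeText, config) {
  if (config.forceOcrAllPages) return 'tesseract-ocr';
  return nativeText.trim().length > config.minTextThreshold
    ? 'native-extraction'
    : 'tesseract-ocr';
}

// Build per-page results. `runTesseract(pageNumber)` is supplied by the caller
// and is expected to resolve to { text, confidence }.
async function processPages(pageTexts, config, runTesseract) {
  const results = [];
  for (let i = 0; i < pageTexts.length; i++) {
    const pageNumber = i + 1;
    const method = choosePageMethod(pageTexts[i], config);
    if (method === 'native-extraction') {
      // Native text is taken as-is with the fixed confidence from the spec.
      results.push({ pageNumber, text: pageTexts[i], confidence: 0.99, method });
    } else {
      const ocr = await runTesseract(pageNumber);
      results.push({ pageNumber, text: ocr.text, confidence: ocr.confidence, method });
    }
  }
  return results;
}
```

Injecting the OCR call keeps the decision logic testable without Tesseract installed; in ocr.js the callback would wrap whatever the existing per-page OCR routine is.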
Dependencies to Install
npm install pdfjs-dist
Testing Strategy
# Test with reprocess script (should complete in ~5 seconds)
node server/scripts/reprocess-liliane.js
# Verify logs show:
# "[OCR Optimization] PDF has native text, extracting without OCR..."
# "[Native Text] Page 1/100 (2845 chars)"
Code Example: pdf-text-extractor.js
/**
 * Native PDF Text Extraction using pdfjs-dist
 * Extracts text directly from PDF without OCR
 */
import * as pdfjsLib from 'pdfjs-dist/legacy/build/pdf.mjs';
import { readFileSync } from 'fs';

// Returns an array with one trimmed string per page (index 0 = page 1).
export async function extractNativeTextPerPage(pdfPath) {
  const data = new Uint8Array(readFileSync(pdfPath));
  const pdf = await pdfjsLib.getDocument({ data }).promise;
  const pageTexts = [];
  const pageCount = pdf.numPages;

  for (let pageNum = 1; pageNum <= pageCount; pageNum++) {
    const page = await pdf.getPage(pageNum);
    const textContent = await page.getTextContent();
    // Join the text fragments of the page into a single string.
    const pageText = textContent.items.map(item => item.str).join(' ');
    pageTexts.push(pageText.trim());
  }

  return pageTexts;
}

// Returns true when the whole document contains at least minChars of native text.
// Read or parse errors are logged and reported as false.
export async function hasNativeText(pdfPath, minChars = 100) {
  try {
    const pageTexts = await extractNativeTextPerPage(pdfPath);
    const totalText = pageTexts.join('');
    return totalText.length >= minChars;
  } catch (error) {
    console.error('Error checking native text:', error);
    return false;
  }
}
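Note that hasNativeText, as written, extracts every page before answering. A possible variant stops as soon as enough text has been seen. The sketch below is an assumption, not part of the spec: hasNativeTextEarlyExit and the getPageText callback are hypothetical names, with getPageText standing in for the getPage/getTextContent calls so the logic can be shown without pdfjs-dist.

```javascript
// Sketch only: `hasNativeTextEarlyExit` and `getPageText` are hypothetical names.
// `getPageText(pageNum)` must resolve to the text of that page as a string
// (1-based page numbers, matching pdfjs-dist).
async function hasNativeTextEarlyExit(pageCount, getPageText, minChars = 100) {
  if (minChars <= 0) return true; // nothing to look for

  let total = 0;
  for (let pageNum = 1; pageNum <= pageCount; pageNum++) {
    const text = await getPageText(pageNum);
    total += text.trim().length;
    if (total >= minChars) return true; // enough text found, skip remaining pages
  }
  return false;
}
```

For a 100-page PDF with native text this would typically read only the first page or two, while a fully scanned PDF still costs one pass over all pages.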
Communication Protocol
You work independently but report to the orchestrator via the chat system.
When you start work:
# Signal you're active (use StackCP SSH access)
# Note: This is conceptual - actual implementation TBD based on your environment
echo "SESSION-1 STARTED: Smart OCR implementation" > status.txt
Progress updates (every 30 min):
- Report completion percentage
- Note any blockers
- Share preliminary test results
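If progress updates are written to the same status file, a consistent line format makes them easier for the orchestrator to scan. The following is a sketch under assumptions: the pipe-separated format and the name formatStatusLine are invented for illustration, since the prompt only specifies a plain-text status.txt.

```javascript
// Sketch only: the line format and function name are assumptions.
// Produces one line per update, e.g. for appending to status.txt.
function formatStatusLine({ sessionId, percent, blockers = [], note = '' }, now = new Date()) {
  if (!Number.isFinite(percent) || percent < 0 || percent > 100) {
    throw new RangeError('percent must be between 0 and 100');
  }
  const blockerText = blockers.length > 0 ? blockers.join('; ') : 'none';
  const parts = [
    now.toISOString(),              // UTC timestamp of the update
    `SESSION-${sessionId}`,
    `${Math.round(percent)}%`,      // completion percentage
    `blockers: ${blockerText}`,
  ];
  if (note) parts.push(`note: ${note}`); // e.g. preliminary test results
  return parts.join(' | ');
}
```

Each returned line could then be appended to the status file with Node's fs.appendFileSync, or echoed from the shell, depending on what the environment allows.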
When complete:
# Report success
git commit -m "[Session 1] Smart OCR implemented - 36x performance gain"
git push origin feature/smart-ocr
# Create summary
cat > SESSION-1-COMPLETE.md <<EOF
✅ Smart OCR Implementation - COMPLETE
**Changes:**
- Created: server/services/pdf-text-extractor.js
- Modified: server/services/ocr.js (hybrid logic)
- Dependency: pdfjs-dist@4.0.379
**Test Results:**
- Liliane1 PDF (100 pages): 180s → 6s (30x faster)
- Scanned PDFs: Still work via Tesseract fallback
- Native text pages: 0.99 confidence
- OCR pages: 0.85 average confidence
**Commit:** [hash]
**Branch:** feature/smart-ocr
**Status:** Ready for merge
EOF
If you hit blockers:
- Document the issue clearly
- Try 2 workarounds before escalating
- If stuck >15 minutes, signal for help
Success Criteria
- pdfjs-dist installed successfully
- pdf-text-extractor.js created with 2 functions
- ocr.js modified with hybrid logic
- Test document processes in <10 seconds (down from 180s)
- Scanned PDFs still work correctly
- Code committed to feature branch
- Unit tests pass (if applicable)
- No regressions in existing OCR functionality
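Some of these criteria can be checked mechanically. The sketch below assumes each page result is an object with text, confidence and method fields (the text field and both helper names, findResultProblems and measureSeconds, are hypothetical); it is one way to verify the method field and the <10 second target, not a description of the existing tests.

```javascript
// Sketch only: helper names and the { text, confidence, method } shape are assumptions.
const ALLOWED_METHODS = ['native-extraction', 'tesseract-ocr'];

// Returns a list of human-readable problems; an empty list means the results look sane.
function findResultProblems(results) {
  const problems = [];
  results.forEach((page, index) => {
    const label = `page ${index + 1}`;
    if (!ALLOWED_METHODS.includes(page.method)) {
      problems.push(`${label}: unexpected method "${page.method}"`);
    }
    if (typeof page.text !== 'string') {
      problems.push(`${label}: text is not a string`);
    }
    if (!(page.confidence >= 0 && page.confidence <= 1)) {
      problems.push(`${label}: confidence out of range`);
    }
  });
  return problems;
}

// Runs an async task and reports how long it took in seconds.
async function measureSeconds(task) {
  const start = Date.now();
  const value = await task();
  return { value, seconds: (Date.now() - start) / 1000 };
}
```

A check against the timing target would wrap the document-processing call in measureSeconds and compare the reported seconds with 10.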
Environment Setup
If you don't have NaviDocs cloned:
git clone https://github.com/dannystocker/navidocs.git
cd navidocs
git checkout navidocs-cloud-coordination
git pull origin navidocs-cloud-coordination
git checkout -b feature/smart-ocr
# Install dependencies
cd server
npm install
npm install pdfjs-dist
# Set up environment
cp .env.example .env
Test data location:
- Liliane1 manual: /home/setup/navidocs/uploads/efb25a15-7d84-4bc3-b070-6bd7dec8d59a.pdf
- Test user: test2@navidocs.test / TestPassword123
- Organization: 6ce0dfc7-f754-4122-afde-85154bc4d0ae
Key Files to Read First
- server/services/ocr.js (existing OCR logic)
- server/workers/ocr-worker.js (how OCR is called)
- IMPROVEMENT_PLAN_OCR_AND_UPLOADS.md (full spec)
- server/scripts/reprocess-liliane.js (test script)
Timeline
- T+0 min: Read this prompt, clone repo, read existing code
- T+15 min: Create pdf-text-extractor.js
- T+30 min: Modify ocr.js with hybrid logic
- T+45 min: Test with Liliane1 PDF
- T+60 min: Verify scanned PDFs still work, commit, report complete
Dependencies on Other Sessions
None - you can start immediately! Sessions 2-5 are working in parallel on different features.
Questions?
Read the code first, then:
- Check IMPROVEMENT_PLAN_OCR_AND_UPLOADS.md for detailed spec
- Review existing ocr.js to understand current flow
- Test incrementally (don't wait until the end)
- Commit early, commit often
You're autonomous! Start as soon as you're ready. Good luck, Session 1! 🚀
Claude Code URL: https://claude.com/claude-code
Repo: https://github.com/dannystocker/navidocs
Your Branch: feature/smart-ocr