- Add AUTONOMOUS-NEXT-TASKS.md (tasks for next agent execution)
- Add cloud session prompt documents (1-4)
- Add GITHUB_READINESS_REPORT.md (deployment status)
- Add GIT_STATE_REPORT.md (git state verification)
- Add feature-selector-complete.html (demo UI)
- Add demo-data/ directory (sample data for demo)
- Add .github/ workflows (CI/CD configuration)
Ready for cloud session launch.
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
Cloud Session 2: Advanced Document Search
Session ID: CLOUD-2-DOCUMENT-SEARCH
Timeline: 90 minutes
Deadline: 4 hours from now (Riviera Plaisance presentation)
Target: Ship OCR search + document classifier improvements
Your Mission
Improve the document search experience so boat owners can find maintenance manuals, insurance papers, and warranty docs instantly. The current system works but needs:
- Sticky engagement: "Where's my engine manual?" → answers in 2 seconds
- OCR accuracy: Text extraction from boat documents (technical manuals, warranty cards, insurance papers)
- Smart grouping: Show warranty + insurance + service history together
- Auto-tagging: Classify documents by type (engine, electrical, safety equipment, etc.)
This prevents a "Where's the engine manual?" crisis during mechanical emergencies.
Quick Start
- Clone repo: git clone https://github.com/dannystocker/navidocs.git && cd navidocs
- Read context:
  - OCR_PIPELINE_SETUP.md - Current OCR implementation (Tesseract + Google Vision)
  - BUILD_COMPLETE.md - What search features already work
- Check Meilisearch status: curl http://localhost:7700/health
- Review OCR API: grep -r "ocr\|vision" src/api/
Your Task List
- Diagnostic: Review current OCR results quality
  - Upload 5 test documents (warranty card, engine manual, insurance doc)
  - Check extraction quality (confidence scores, missing text?)
  - Document findings in SEARCH_QUALITY_REPORT.md
- Implement: Smart document classifier
  - Add documentType field to Document table (engine, electrical, hull, interior, warranty, insurance, service, safety)
  - Create classifier endpoint: POST /api/documents/classify (reads OCR text → returns type)
  - Support manual override (user selects type if AI wrong)
- Improve: Search results ranking
  - Boost warranty + service docs to top
  - Show document type icon + confidence score
  - Group results by document type
- Test: Search UX with real documents
  - "engine manual" → Find service manuals
  - "warranty" → Find all warranty cards + service plans
  - "electrical" → Find electrical system diagrams + parts docs
- API endpoints:
  - POST /api/documents/classify - Auto-classify document type
  - GET /api/documents/by-type/:type - Filter by type
  - GET /api/search/advanced - Enhanced search with type + relevance ranking
- Git commit: [AGENT-2] Add document classifier and search ranking
- Create issue: [AGENT-2] DEPLOY-READY: Document Search Improvements with:
  - Test results (5 documents, accuracy %)
  - Search quality report
  - Performance metrics (search latency)
  - Deployment checklist
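The classifier step in the task list (OCR text in, document type out, with a manual override) could start from something as simple as keyword counting. The sketch below is only an illustration under stated assumptions: the keyword lists, the `Classification` shape, and the function names are hypothetical and not taken from the navidocs codebase, and the confidence value is just the share of keyword hits that belong to the winning type.

```typescript
// Document types from the design specs; "other" is the fallback when nothing matches.
type DocumentType =
  | "engine" | "electrical" | "hull" | "interior"
  | "warranty" | "insurance" | "service" | "safety" | "other";

interface Classification {
  type: DocumentType;
  confidence: number; // share of keyword hits owned by the winning type, 0..1
  source: "auto" | "manual";
}

// Hypothetical keyword lists: illustrative only, not tuned on real boat documents.
const KEYWORDS: Record<Exclude<DocumentType, "other">, string[]> = {
  engine: ["engine", "rpm", "impeller", "coolant", "oil filter"],
  electrical: ["wiring", "battery", "fuse", "voltage", "circuit"],
  hull: ["hull", "gelcoat", "antifouling", "keel"],
  interior: ["upholstery", "cabin", "galley", "berth"],
  warranty: ["warranty", "coverage period", "registration card"],
  insurance: ["policy number", "insurer", "premium", "deductible"],
  service: ["service record", "parts replaced", "labor", "invoice"],
  safety: ["life jacket", "flare", "fire extinguisher", "epirb"],
};

// Counts non-overlapping occurrences of `needle` in `text`.
function countOccurrences(text: string, needle: string): number {
  return text.split(needle).length - 1;
}

// A manual type, when given, always wins over the automatic guess.
function classifyDocument(ocrText: string, manualType?: DocumentType): Classification {
  if (manualType) return { type: manualType, confidence: 1, source: "manual" };

  const text = ocrText.toLowerCase();
  let best: DocumentType = "other";
  let bestHits = 0;
  let totalHits = 0;
  for (const [type, words] of Object.entries(KEYWORDS) as [DocumentType, string[]][]) {
    const hits = words.reduce((n, w) => n + countOccurrences(text, w), 0);
    totalHits += hits;
    if (hits > bestHits) {
      best = type;
      bestHits = hits;
    }
  }
  const confidence = totalHits === 0 ? 0 : bestHits / totalHits;
  return { type: best, confidence, source: "auto" };
}
```

A POST /api/documents/classify handler could call `classifyDocument` with the stored OCR text and pass the user's selection as `manualType` when one exists. A keyword baseline like this is easy to inspect and test, which may matter more than sophistication within a 90-minute session.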
Technical Context
Current Stack:
- OCR Pipeline: Tesseract (local) + Google Vision API (backup)
- Search Engine: Meilisearch (localhost:7700)
- Database: PostgreSQL - Document table with content field (OCR extracted text)
- Frontend: Next.js search UI component
Key Files:
- src/api/ocr/route.ts - Current OCR implementation
- src/api/search/route.ts - Search endpoint
- src/components/DocumentSearch.tsx - Search UI
- prisma/schema.prisma - Document model
Design Specs:
- Document types: engine, electrical, hull, interior, warranty, insurance, service, safety, other
- OCR text stored in Document.content (PostgreSQL)
- Meilisearch index includes: title, type, confidence, upload_date
- Search results show: document title, type badge, 2-line preview, relevance score
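The result fields above, together with the ranking task (boost warranty and service documents, group by type), can be sketched as a post-processing step over search hits. This is a minimal illustration, not the project's search code: the `SearchHit` shape, the `TYPE_BOOST` value, and the assumption that relevance is a 0..1 score are all hypothetical.

```typescript
// Hypothetical shape of one search hit, mirroring the fields listed in the design specs.
interface SearchHit {
  title: string;
  type: string;      // document type badge, e.g. "warranty"
  preview: string;   // short text preview
  relevance: number; // engine relevance score, assumed here to be in 0..1
}

// Types the task list asks to boost to the top of the results.
const BOOSTED_TYPES = new Set(["warranty", "service"]);
const TYPE_BOOST = 0.2; // hypothetical boost; would need tuning against real queries

// Returns a new array sorted by boosted score, highest first. The input is not mutated.
function rankHits(hits: SearchHit[]): SearchHit[] {
  const score = (h: SearchHit): number =>
    h.relevance + (BOOSTED_TYPES.has(h.type) ? TYPE_BOOST : 0);
  return [...hits].sort((a, b) => score(b) - score(a));
}

// Groups hits by document type, keeping the incoming order within each group.
function groupByType(hits: SearchHit[]): Map<string, SearchHit[]> {
  const groups = new Map<string, SearchHit[]>();
  for (const hit of hits) {
    const bucket = groups.get(hit.type) ?? [];
    bucket.push(hit);
    groups.set(hit.type, bucket);
  }
  return groups;
}
```

Calling `groupByType(rankHits(hits))` yields groups whose order follows the best-ranked hit of each type, because a `Map` iterates in insertion order. Doing the boost inside Meilisearch ranking configuration instead is a reasonable alternative; the client-side version is shown only because it needs no index changes.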
Sample Test Documents
Create these for testing OCR quality:
- Engine Manual - Technical specifications, maintenance schedule
- Warranty Card - Registration, coverage terms, contact info
- Insurance Document - Policy details, coverage limits
- Service Record - Date, service performed, parts replaced
- Electrical Diagram - System schematic with part numbers
Quality Thresholds:
- OCR confidence >85% = no review needed
- 70-85% = flag for manual review
- <70% = exclude from search (mark as low-confidence)
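The three thresholds above map directly onto a small function. In this sketch the status names and function names are hypothetical, and confidence is assumed to be a percentage in 0..100; a score of exactly 85 falls into the manual-review band, following the ranges as written.

```typescript
// Hypothetical status labels for the three confidence bands.
type ReviewStatus = "ok" | "needs_review" | "low_confidence";

// Maps an OCR confidence percentage (0..100) to a review status.
function reviewStatus(confidencePercent: number): ReviewStatus {
  if (confidencePercent > 85) return "ok";            // >85%: no review needed
  if (confidencePercent >= 70) return "needs_review"; // 70-85%: flag for manual review
  return "low_confidence";                            // <70%: keep out of search
}

// Only documents at or above the 70% band are indexed for search.
function isSearchable(confidencePercent: number): boolean {
  return reviewStatus(confidencePercent) !== "low_confidence";
}
```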
Critical Notes
- Boat owner pain point: Mechanical emergency at 2am, need engine manual NOW
- Search must be fast: <500ms response time (cached results)
- OCR accuracy matters: Wrong document type = wrong answers
- Offline support: Downloaded documents searchable without internet
- Mobile first: Search on small screens must work perfectly
GitHub Access
- Repo: https://github.com/dannystocker/navidocs
- Branch: feature/document-search (create from main)
- Base for PR: main branch
Success Criteria
✅ Document classifier working (type detection >80% accurate)
✅ Search results ranked by type + relevance
✅ Test documents fully searchable
✅ OCR quality report completed
✅ API endpoints tested and working
✅ No console errors
✅ Git commit with [AGENT-2] tag
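The ">80% accurate" criterion can be checked mechanically against the labeled test documents. The sketch below assumes a hypothetical `LabeledResult` pair of expected and predicted types; the names are illustrative only.

```typescript
// Hypothetical record pairing the hand-labeled type with the classifier's output.
interface LabeledResult {
  expected: string;
  predicted: string;
}

// Percentage of documents whose predicted type matches the expected type.
function accuracyPercent(results: LabeledResult[]): number {
  if (results.length === 0) return 0;
  const correct = results.filter((r) => r.expected === r.predicted).length;
  return (correct * 100) / results.length;
}

// The criterion is strictly greater than 80%.
function meetsAccuracyTarget(results: LabeledResult[]): boolean {
  return accuracyPercent(results) > 80;
}
```

With only five test documents, four correct gives exactly 80%, which does not exceed the target, so all five would need to be classified correctly. A larger labeled set would make the figure more meaningful.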
If Blocked
- Check Google Vision API credentials: echo $GOOGLE_VISION_API_KEY
- Verify Tesseract installed: tesseract --version
- Review current OCR: cat OCR_PIPELINE_SETUP.md
- Check Meilisearch index: curl http://localhost:7700/indexes/documents/stats
- Create blocker issue: [AGENT-2] BLOCKER: [description]
Reference Files
- OCR_PIPELINE_SETUP.md - Complete OCR setup guide
- ARCHITECTURE-SUMMARY.md - System architecture
- SMOKE_TEST_CHECKLIST.md - Testing procedures