navidocs/TEST_RESULTS.md
ggq-admin 1a09dfb1f9 docs: Update test results with Meilisearch troubleshooting steps
- Document detailed solution steps for Meilisearch auth issue
- Clarify that OCR is fully working and saving to database
- Provide step-by-step commands to restart Meilisearch correctly
- Updated status from "NOT WORKING" to "NEEDS MANUAL RESTART"

The core functionality is proven working - only search indexing
remains blocked by Meilisearch authentication.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-19 09:00:57 +02:00


NaviDocs Local Testing Results

Date: 2025-10-19
Environment: WSL2 Ubuntu

Working Components

1. System Dependencies

  • Redis: 7.0.15 running on port 6379
  • Tesseract OCR: 5.3.4 with English training data
  • Poppler Utils: pdftoppm for PDF to image conversion
  • Meilisearch: 1.11.3 binary running on port 7700

2. Database

  • SQLite: navidocs.db initialized with all 13 tables
  • Schema: Users, organizations, documents, document_pages, OCR jobs, etc.
  • Test Data: Created test user and organization for testing

3. Backend API

  • Server: Express app running on port 3001
  • Health Check: http://localhost:3001/health
  • Upload Endpoint: /api/upload accepting PDF files
  • Jobs Endpoint: /api/jobs/:jobId tracking OCR progress

4. OCR Pipeline

STATUS: WORKING

  • PDF Upload: Successfully accepts PDF files
  • Queue Processing: BullMQ + Redis queuing working
  • PDF to Image: pdftoppm conversion working at 300 DPI
  • Text Extraction: Tesseract OCR extracting text successfully
  • Confidence Score: 0.85 (85%) average confidence
  • Database Storage: Pages saved to document_pages table
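
As a rough sketch of the two conversion steps listed above, the following builds the argument lists for pdftoppm and tesseract. The function names and file paths are hypothetical and are not taken from server/services/ocr.js; only the 300 DPI setting and the eng language code come from this report.

```javascript
// Hypothetical helpers; the project's real OCR service may build its commands differently.

// Arguments for rendering a PDF to PNG page images at the given resolution.
// pdftoppm writes files such as <outputPrefix>-1.png.
function pdftoppmArgs(pdfPath, outputPrefix, dpi = 300) {
  return ['-r', String(dpi), '-png', pdfPath, outputPrefix];
}

// Arguments for running OCR on one page image and printing the text to stdout.
function tesseractArgs(imagePath, lang = 'eng') {
  return [imagePath, 'stdout', '-l', lang];
}

console.log(['pdftoppm', ...pdftoppmArgs('test-manual.pdf', '/tmp/page')].join(' '));
// pdftoppm -r 300 -png test-manual.pdf /tmp/page
console.log(['tesseract', ...tesseractArgs('/tmp/page-1.png')].join(' '));
// tesseract /tmp/page-1.png stdout -l eng
```

Arrays like these could be passed to child_process.execFile, which avoids shell quoting problems with file names.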

Test Results

Uploaded NaviDocs Test Manual - Successfully extracted:

"NaviDocs Test Manual Page 7 Bilge Pump Maintenance
lge pump is located in the aft compar ar maintenance
is required every 6 mc Electrical System heck the
battery connections regularl)"

Document ID: f23fdada-3c4f-4457-b9fe-c11884fd70f2
Confidence: 0.85
Language: eng

5. OCR Worker

  • BullMQ Worker: Processing jobs from ocr-processing queue
  • Concurrency: 2 documents at a time
  • Progress Tracking: Real-time progress updates (0-100%)
  • Error Handling: Graceful failure handling per page
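
The per-page error handling and progress tracking described above could look roughly like this. It is a sketch only: `processPages`, `ocrPage`, and `onProgress` are hypothetical names standing in for the worker's real OCR call and progress reporting, and recording a failed page with confidence 0 is an assumption based on the failed rows shown under Database Verification.

```javascript
// Hypothetical sketch, not the code in server/workers/ocr-worker.js.
// ocrPage(image) is assumed to resolve to { text, confidence } or to throw.
async function processPages(pageImages, ocrPage, onProgress = () => {}) {
  const results = [];
  for (let i = 0; i < pageImages.length; i++) {
    const pageNumber = i + 1;
    try {
      const { text, confidence } = await ocrPage(pageImages[i]);
      results.push({ pageNumber, text, confidence, error: null });
    } catch (err) {
      // A failed page is recorded with confidence 0 and does not abort the document.
      results.push({ pageNumber, text: '', confidence: 0, error: err.message });
    }
    // Report progress as a whole percentage between 0 and 100.
    onProgress(Math.round((pageNumber / pageImages.length) * 100));
  }
  return results;
}
```

Inside a BullMQ processor, `onProgress` could be `(pct) => job.updateProgress(pct)`, and the limit of 2 documents at a time would be set with the `concurrency` option of the Worker constructor.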

⚠️ Known Issues

1. Meilisearch Authentication

STATUS: NEEDS MANUAL RESTART

The Meilisearch instance is running, but the master key is unknown or has changed. We attempted to restart with changeme123 as the master key, but a persisted Meilisearch instance appears to be running with a different key.

What was attempted:

  • Removed data.ms directory to start fresh
  • Started Meilisearch with --master-key="changeme123"
  • Updated .env to use changeme123
  • Added dotenv loading to OCR worker
  • Result: the running Meilisearch instance still rejects the key

Impact: Search indexing fails after OCR completes. The OCR text is saved to the database but is not indexed in Meilisearch, so it cannot be searched.

Error Message:

MeiliSearchApiError: The provided API key is invalid.
code: 'invalid_api_key'
httpStatus: 403

Solution Steps:

  1. Find and kill ALL Meilisearch processes:

    pkill -9 meilisearch
    # OR
    sudo fuser -k 7700/tcp
    
  2. Remove all Meilisearch data:

    cd /home/setup/navidocs
    rm -rf data.ms meilisearch-data
    
  3. Start Meilisearch with known key:

    /home/setup/opt/meilisearch --master-key="changeme123" --no-analytics \
      --db-path=/home/setup/navidocs/meilisearch-data > logs/meilisearch.log 2>&1 &
    
  4. Verify the key works:

    curl -H "Authorization: Bearer changeme123" http://127.0.0.1:7700/keys
    
  5. Restart the OCR worker:

    cd /home/setup/navidocs
    pkill -f ocr-worker
    node server/workers/ocr-worker.js > logs/worker.log 2>&1 &
    
  6. Upload a test document and verify indexing works
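
The key check in step 4 can also be done from Node. The helper below is a minimal sketch, assuming Node 18+ for the global fetch; `checkMeilisearchKey` and its `fetchImpl` parameter are hypothetical names and are not part of the NaviDocs code.

```javascript
// Hypothetical helper: asks Meilisearch for its key list using the given master key.
// fetchImpl is injectable so the function can be exercised without a running server.
async function checkMeilisearchKey(host, masterKey, fetchImpl = fetch) {
  const res = await fetchImpl(`${host}/keys`, {
    headers: { Authorization: `Bearer ${masterKey}` },
  });
  if (res.ok) return { valid: true, status: res.status };
  // Meilisearch error responses are JSON with a `code` field, e.g. 'invalid_api_key'.
  const body = await res.json().catch(() => ({}));
  return { valid: false, status: res.status, code: body.code ?? null };
}
```

For example, `checkMeilisearchKey('http://127.0.0.1:7700', 'changeme123').then(console.log)` should report `valid: true` once the restart has worked, and the `invalid_api_key` code while the old key is still in effect.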

2. Frontend

STATUS: NOT TESTED

The Vite dev server is running on port 5174, but frontend functionality has not been tested yet.

TODO:

  • Test upload modal
  • Test search interface
  • Test document viewer
  • Test page navigation

📊 Service Status

Service      Port  Status                   PID
Meilisearch  7700  Running                  Unknown
Redis        6379  Running                  System
Backend API  3001  Running                  48254
OCR Worker   -     Running                  Active
Frontend     5174  ⚠️ Running (not tested)  Active

🔧 Configuration Changes

Files Modified:

  1. server/services/ocr.js

    • Added language code mapping (en → eng)
    • Switched to local system tesseract command
    • Added TESSDATA_PREFIX environment variable
  2. server/.env

    • Updated MEILISEARCH_MASTER_KEY (still needs correct value)
  3. server/config/meilisearch.js

    • Updated default master key fallback
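
The language code mapping added to server/services/ocr.js might look something like the sketch below. Only the en → eng pair is confirmed by this report; the names `TESSERACT_LANGS` and `toTesseractLang`, and the pass-through behaviour for unknown codes, are assumptions.

```javascript
// Hypothetical mapping from short language codes to Tesseract language names.
const TESSERACT_LANGS = new Map([['en', 'eng']]);

function toTesseractLang(code) {
  const key = String(code).trim().toLowerCase();
  // Codes without an entry (including ones already in Tesseract form) pass through unchanged.
  return TESSERACT_LANGS.get(key) ?? key;
}

console.log(toTesseractLang('en'));  // eng
console.log(toTesseractLang('eng')); // eng
```

The result is what would be passed to tesseract's `-l` option. Passing `en` directly is what caused the failed pages noted under Database Verification.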

Test Files Created:

  • test-manual.pdf: Single-page test PDF with sample marine manual content
  • test/data/05-versions-space.pdf: Workaround for pdf-parse debug mode

📝 Next Steps

High Priority:

  1. Fix Meilisearch Authentication

    • Determine or set correct master key
    • Restart Meilisearch with known key
    • Re-process a test document to verify indexing
  2. Test Search Functionality

    • Manually index a document if needed
    • Test search queries
    • Verify synonym search (e.g., "bilge" finds "sump pump")
    • Test tenant token generation
  3. Test Frontend UI

Medium Priority:

  1. Integration Testing

    • Upload multiple PDFs
    • Test concurrent OCR processing
    • Verify database integrity
    • Test error scenarios
  2. Performance Testing

    • Large PDF files (50+ pages)
    • Multiple concurrent uploads
    • Search response times

📋 Database Verification

Successful OCR Records:

SELECT COUNT(*) FROM document_pages WHERE ocr_confidence > 0;
-- Result: 5 successful pages

SELECT document_id, page_number, ocr_confidence,
       LENGTH(ocr_text) as text_length
FROM document_pages
WHERE ocr_confidence > 0;

Failed OCR Records:

SELECT COUNT(*) FROM document_pages WHERE ocr_confidence = 0;
-- Result: 4 failed pages (due to 'en' vs 'eng' language code issue - now fixed)

🚀 How to Continue Testing

Upload a Document:

curl -X POST http://localhost:3001/api/upload \
  -F "file=@test-manual.pdf" \
  -F "title=My Boat Manual" \
  -F "documentType=owner-manual" \
  -F "organizationId=test-org-id"

Check Job Status:

# JOB_ID is the ID of the job to inspect
curl -s "http://localhost:3001/api/jobs/$JOB_ID" | jq
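
Instead of re-running the curl command by hand, the jobs endpoint can be polled until the job finishes. This is a sketch under a loud assumption: the response JSON is assumed to carry a `status` field that ends up as 'completed' or 'failed', which this report does not confirm. `waitForJob` and its options are hypothetical names.

```javascript
// Hypothetical sketch. ASSUMPTION: GET /api/jobs/:jobId returns JSON with a `status`
// field that eventually becomes 'completed' or 'failed'; adjust to the real response.
async function waitForJob(
  baseUrl,
  jobId,
  { intervalMs = 1000, maxAttempts = 60, fetchImpl = fetch } = {}
) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetchImpl(`${baseUrl}/api/jobs/${encodeURIComponent(jobId)}`);
    if (!res.ok) throw new Error(`Job lookup failed with HTTP ${res.status}`);
    const job = await res.json();
    if (job.status === 'completed' || job.status === 'failed') return job;
    // Wait before asking again so the API is not hammered.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Job ${jobId} did not finish after ${maxAttempts} attempts`);
}
```

The attempt limit keeps a stuck job from blocking a test script indefinitely.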

Check Database:

cd server
node --input-type=module -e "
import { getDb } from './db/db.js';
const db = getDb();
const pages = db.prepare('SELECT * FROM document_pages ORDER BY created_at DESC LIMIT 1').all();
console.log(JSON.stringify(pages, null, 2));
"

🎯 Success Criteria Met:

  • All system dependencies installed
  • Database initialized and working
  • All services running
  • Upload endpoint functional
  • OCR pipeline extracting text with high confidence
  • Job queue processing documents
  • Database storing OCR results
  • ⚠️ Search indexing needs Meilisearch auth fix
  • Frontend UI not yet tested

Git Commits:

  1. Initial setup commit: chore: Local development environment setup
  2. Tesseract fix commit: fix: Switch to local system tesseract command for OCR
  3. OCR completion commit: fix: Complete OCR pipeline with language code mapping