ggq-admin 1a09dfb1f9 docs: Update test results with Meilisearch troubleshooting steps

- Document detailed solution steps for Meilisearch auth issue
- Clarify that OCR is fully working and saving to database
- Provide step-by-step commands to restart Meilisearch correctly
- Updated status from "NOT WORKING" to "NEEDS MANUAL RESTART"

The core functionality is proven working - only search indexing
remains blocked by Meilisearch authentication.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-10-19 09:00:57 +02:00


NaviDocs Local Testing Results

Date: 2025-10-19
Environment: WSL2 Ubuntu

✅ Working Components

1. System Dependencies

Redis: 7.0.15 running on port 6379
Tesseract OCR: 5.3.4 with English training data
Poppler Utils: pdftoppm for PDF to image conversion
Meilisearch: 1.11.3 binary running on port 7700

2. Database

SQLite: navidocs.db initialized with all 13 tables
Schema: Users, organizations, documents, document_pages, OCR jobs, etc.
Test Data: Created test user and organization for testing

3. Backend API

Server: Express app running on port 3001
Health Check: http://localhost:3001/health ✅
Upload Endpoint: /api/upload accepting PDF files ✅
Jobs Endpoint: /api/jobs/:jobId tracking OCR progress ✅

4. OCR Pipeline

STATUS: ✅ WORKING

PDF Upload: Successfully accepts PDF files
Queue Processing: BullMQ + Redis queuing working
PDF to Image: pdftoppm conversion working at 300 DPI
Text Extraction: Tesseract OCR extracting text successfully
Confidence Score: 0.85 (85%) average
Database Storage: Pages saved to document_pages table
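The PDF-to-image and text-extraction steps listed above can be sketched for a single page as follows. This is a minimal illustration, not the code in server/services/ocr.js: the function names and the working-directory layout are hypothetical, and it assumes `pdftoppm` and `tesseract` are on the PATH. It returns text only and does not compute a confidence score.

```javascript
// Arguments for rendering exactly one page to `<outPrefix>.png`.
function buildPdftoppmArgs(pdfPath, outPrefix, page, dpi = 300) {
  return [
    '-r', String(dpi),   // resolution in DPI (300, as in this report)
    '-f', String(page),  // first page to render
    '-l', String(page),  // last page to render
    '-singlefile',       // write <outPrefix>.png with no page-number suffix
    '-png',
    pdfPath,
    outPrefix,
  ];
}

// Arguments for printing the recognized text to stdout.
function buildTesseractArgs(imagePath, lang = 'eng') {
  return [imagePath, 'stdout', '-l', lang];
}

// Hypothetical helper: render one page, then OCR the resulting image.
async function ocrPage(pdfPath, page, workDir) {
  const { execFile } = await import('node:child_process');
  const { promisify } = await import('node:util');
  const run = promisify(execFile);

  const outPrefix = `${workDir}/page-${page}`;
  await run('pdftoppm', buildPdftoppmArgs(pdfPath, outPrefix, page));
  const { stdout } = await run('tesseract', buildTesseractArgs(`${outPrefix}.png`));
  return stdout.trim();
}
```

Calling `ocrPage('test-manual.pdf', 1, '/tmp')` should resolve to the extracted text of page 1, provided both tools are installed.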

Test Results

Uploaded NaviDocs Test Manual - Successfully extracted:

"NaviDocs Test Manual Page 7 Bilge Pump Maintenance
lge pump is located in the aft compar ar maintenance
is required every 6 mc Electrical System heck the
battery connections regularl)"

Document ID: f23fdada-3c4f-4457-b9fe-c11884fd70f2
Confidence: 0.85
Language: eng

5. OCR Worker

BullMQ Worker: Processing jobs from ocr-processing queue
Concurrency: 2 documents at a time
Progress Tracking: Real-time progress updates (0-100%)
Error Handling: Graceful failure handling per page
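The per-page error handling and 0-100% progress reporting described above might look roughly like the sketch below. It is dependency-free and is not the actual worker code: `ocrFn` and `onProgress` are hypothetical callbacks, and recording a failed page with confidence 0 is an assumption based on the failed-record query later in this report. In a BullMQ processor, `onProgress` would typically forward to `job.updateProgress`.

```javascript
// Process pages one by one; a failure on one page does not abort the others.
async function processPages(pageNumbers, ocrFn, onProgress = () => {}) {
  const results = [];
  for (let i = 0; i < pageNumbers.length; i++) {
    const pageNumber = pageNumbers[i];
    try {
      const { text, confidence } = await ocrFn(pageNumber);
      results.push({ pageNumber, text, confidence, error: null });
    } catch (err) {
      // Assumption: a failed page is stored with empty text and confidence 0.
      results.push({ pageNumber, text: '', confidence: 0, error: String(err.message || err) });
    }
    // Report progress as a whole percentage of pages attempted so far.
    onProgress(Math.round(((i + 1) / pageNumbers.length) * 100));
  }
  return results;
}
```

Keeping the loop independent of the queue library makes it easy to exercise with a fake `ocrFn` before wiring it into the worker.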

⚠️ Known Issues

1. Meilisearch Authentication

STATUS: NEEDS MANUAL RESTART

The Meilisearch instance is running, but it rejects the configured master key. We attempted to restart it with changeme123 as the master key, but an earlier Meilisearch process appears to still be running with a different key.

What was attempted:

Removed data.ms directory to start fresh
Started Meilisearch with --master-key="changeme123"
Updated .env to use changeme123
Added dotenv loading to OCR worker
Result: the running Meilisearch instance still rejects the key

Impact: Search indexing fails after OCR completion. OCR text is saved to the database, but it is not indexed in Meilisearch and therefore cannot be searched.

Error Message:

MeiliSearchApiError: The provided API key is invalid.
code: 'invalid_api_key'
httpStatus: 403

Solution Steps:

Find and kill ALL Meilisearch processes:

pkill -9 meilisearch
# OR
sudo fuser -k 7700/tcp

Remove all Meilisearch data:

cd /home/setup/navidocs
rm -rf data.ms meilisearch-data

Start Meilisearch with known key:

/home/setup/opt/meilisearch --master-key="changeme123" --no-analytics \
  --db-path=/home/setup/navidocs/meilisearch-data > logs/meilisearch.log 2>&1 &

Verify the key works:

curl -H "Authorization: Bearer changeme123" http://127.0.0.1:7700/keys

Restart the OCR worker:

cd /home/setup/navidocs
pkill -f ocr-worker
node server/workers/ocr-worker.js > logs/worker.log 2>&1 &

Upload a test document and verify indexing works
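The "verify the key works" step can also be scripted, which helps tell a wrong key apart from a missing header. The sketch below assumes Node 18+ (global `fetch`); the helper names are hypothetical, and the defaults are simply the URL and key used in this report.

```javascript
// Map the HTTP response from GET /keys to a short diagnosis.
function diagnoseKeyCheck(status, body = {}) {
  const code = body && body.code;
  if (status === 200) return 'ok';
  if (status === 403 && code === 'invalid_api_key') return 'wrong-key';
  if (status === 401 && code === 'missing_authorization_header') return 'no-key-sent';
  return 'unexpected';
}

// Hypothetical helper: ask Meilisearch whether it accepts the given master key.
async function checkMasterKey(baseUrl = 'http://127.0.0.1:7700', key = 'changeme123') {
  const res = await fetch(`${baseUrl}/keys`, {
    headers: { Authorization: `Bearer ${key}` },
  });
  // Fall back to an empty object if the body is not valid JSON.
  const body = await res.json().catch(() => ({}));
  return diagnoseKeyCheck(res.status, body);
}
```

If `checkMasterKey()` resolves to 'wrong-key' after the restart, a different Meilisearch process is probably still answering on port 7700.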

2. Frontend

STATUS: NOT TESTED

The Vite dev server is running on port 5174 but frontend functionality has not been tested yet.

TODO:

Test upload modal
Test search interface
Test document viewer
Test page navigation

📊 Service Status

Service	Port	Status	PID
Meilisearch	7700	✅ Running	Unknown
Redis	6379	✅ Running	System
Backend API	3001	✅ Running	48254
OCR Worker	-	✅ Running	Active
Frontend	5174	⚠️ Running (not tested)	Active

🔧 Configuration Changes

Files Modified:

server/services/ocr.js
- Added language code mapping (en → eng)
- Switched to local system tesseract command
- Added TESSDATA_PREFIX environment variable
server/.env
- Updated MEILISEARCH_MASTER_KEY (still needs correct value)
server/config/meilisearch.js
- Updated default master key fallback

Test Files Created:

test-manual.pdf: Single-page test PDF with sample marine manual content
test/data/05-versions-space.pdf: Workaround for pdf-parse debug mode

📝 Next Steps

High Priority:

Fix Meilisearch Authentication
- Determine or set correct master key
- Restart Meilisearch with known key
- Re-process a test document to verify indexing
Test Search Functionality
- Manually index a document if needed
- Test search queries
- Verify synonym search (e.g., "bilge" finds "sump pump")
- Test tenant token generation
Test Frontend UI
- Open http://localhost:5174
- Test document upload flow
- Test search interface
- Test document viewer
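For the synonym check listed under Test Search Functionality, the synonyms have to exist in the index settings first. The sketch below builds a mutual synonym map and sends it to the Meilisearch synonyms settings endpoint. The index name is not given in this report, so `indexUid` is a placeholder supplied by the caller, and both function names are hypothetical.

```javascript
// Build a mutual synonyms object: every term in a group maps to the others.
function mutualSynonyms(groups) {
  const out = {};
  for (const group of groups) {
    for (const term of group) {
      out[term] = group.filter((t) => t !== term);
    }
  }
  return out;
}

// Hypothetical helper: replace the synonyms setting of one index.
async function updateSynonyms(indexUid, synonyms, baseUrl = 'http://127.0.0.1:7700', key = 'changeme123') {
  const url = `${baseUrl}/indexes/${encodeURIComponent(indexUid)}/settings/synonyms`;
  const res = await fetch(url, {
    method: 'PUT',
    headers: { Authorization: `Bearer ${key}`, 'Content-Type': 'application/json' },
    body: JSON.stringify(synonyms),
  });
  // Settings updates are asynchronous; the reply is a task summary.
  return res.json();
}
```

`mutualSynonyms([['bilge', 'sump pump']])` yields an entry in each direction, so a search for either term can match documents containing the other.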

Medium Priority:

Integration Testing
- Upload multiple PDFs
- Test concurrent OCR processing
- Verify database integrity
- Test error scenarios
Performance Testing
- Large PDF files (50+ pages)
- Multiple concurrent uploads
- Search response times

📋 Database Verification

Successful OCR Records:

SELECT COUNT(*) FROM document_pages WHERE ocr_confidence > 0;
-- Result: 5 successful pages

SELECT document_id, page_number, ocr_confidence,
       LENGTH(ocr_text) as text_length
FROM document_pages
WHERE ocr_confidence > 0;

Failed OCR Records:

SELECT COUNT(*) FROM document_pages WHERE ocr_confidence = 0;
-- Result: 4 failed pages (due to 'en' vs 'eng' language code issue - now fixed)

🚀 How to Continue Testing

Upload a Document:

curl -X POST http://localhost:3001/api/upload \
  -F "file=@test-manual.pdf" \
  -F "title=My Boat Manual" \
  -F "documentType=owner-manual" \
  -F "organizationId=test-org-id"
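The same upload can be done from Node for scripted tests. This is a sketch assuming Node 18+ (global `fetch`, `FormData`, and `Blob`). The field names mirror the curl command above, while the helper names and the assumption that the endpoint replies with JSON are not confirmed by this report.

```javascript
// Build the multipart form with the same fields as the curl example.
function buildUploadForm(pdfBytes, filename, fields) {
  const form = new FormData();
  form.set('file', new Blob([pdfBytes], { type: 'application/pdf' }), filename);
  for (const [name, value] of Object.entries(fields)) {
    form.set(name, value);
  }
  return form;
}

// Hypothetical helper: read a PDF from disk and POST it to /api/upload.
async function uploadPdf(path, fields, baseUrl = 'http://localhost:3001') {
  const { readFile } = await import('node:fs/promises');
  const { basename } = await import('node:path');

  const form = buildUploadForm(await readFile(path), basename(path), fields);
  const res = await fetch(`${baseUrl}/api/upload`, { method: 'POST', body: form });
  if (!res.ok) throw new Error(`Upload failed: HTTP ${res.status}`);
  return res.json(); // assumption: the response body is JSON
}
```

For example, `uploadPdf('test-manual.pdf', { title: 'My Boat Manual', documentType: 'owner-manual', organizationId: 'test-org-id' })` sends the same request as the curl command.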

Check Job Status:

curl http://localhost:3001/api/jobs/{jobId} | jq
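Instead of re-running curl by hand, the job endpoint can be polled until OCR finishes. The response shape is not documented in this report, so the `status` values 'completed' and 'failed' below are assumptions, as are the function names and the polling limits.

```javascript
// Assumed response shape: { status: 'completed' | 'failed' | ..., progress: 0-100 }.
function isTerminal(job) {
  return job.status === 'completed' || job.status === 'failed';
}

// Hypothetical helper: poll /api/jobs/:jobId until the job reaches a final state.
async function waitForJob(jobId, { baseUrl = 'http://localhost:3001', intervalMs = 1000, maxAttempts = 120 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetch(`${baseUrl}/api/jobs/${encodeURIComponent(jobId)}`);
    if (!res.ok) throw new Error(`Job lookup failed: HTTP ${res.status}`);
    const job = await res.json();
    if (isTerminal(job)) return job;
    // Wait before the next poll.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Job ${jobId} did not finish after ${maxAttempts} polls`);
}
```

The attempt limit keeps the script from hanging if the worker is down; adjust the status names once the real response format has been checked.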

Check Database:

cd server
node --input-type=module -e "
import { getDb } from './db/db.js';
const db = getDb();
const pages = db.prepare('SELECT * FROM document_pages ORDER BY created_at DESC LIMIT 1').all();
console.log(JSON.stringify(pages, null, 2));
"

🎯 Success Criteria Met:

✅ All system dependencies installed
✅ Database initialized and working
✅ All services running
✅ Upload endpoint functional
✅ OCR pipeline extracting text with high confidence
✅ Job queue processing documents
✅ Database storing OCR results
⚠️ Search indexing needs Meilisearch auth fix
❓ Frontend UI not yet tested

Git Commits:

Initial setup commit: chore: Local development environment setup
Tesseract fix commit: fix: Switch to local system tesseract command for OCR
OCR completion commit: fix: Complete OCR pipeline with language code mapping
