navidocs/docs/analysis/DISAPPEARING_DOCUMENTS_BUG_REPORT.md
Danny Stocker 58b344aa31 FINAL: P0 blockers fixed + Joe Trader + ignore binaries
Fixed:
- Price: €800K-€1.5M, Sunseeker added
- Agent 1: Joe Trader persona + actual sale ads research
- Ignored meilisearch binary + data/ (too large for GitHub)
- SESSION_DEBUG_BLOCKERS.md created

Ready for Session 1 launch.

🤖 Generated with Claude Code
2025-11-13 01:29:59 +01:00

20 KiB
Raw Export PDF Blame History

Disappearing Documents Bug Report

Date: 2025-10-23 Priority: HIGH Status: Investigation Complete


Executive Summary

After thorough investigation of the NaviDocs backend codebase, NO CRITICAL BUGS were found that would cause documents to systematically disappear. However, several potential issues and areas of concern were identified that could lead to data loss under specific circumstances.


Investigation Findings

1. Database Configuration - LOW RISK

Location: /home/setup/navidocs/server/db/db.js and /home/setup/navidocs/server/config/db.js

Finding: Database is correctly configured with:

  • WAL mode enabled (journal_mode = WAL) - Good for concurrency
  • Foreign keys enabled (foreign_keys = ON)
  • Proper CASCADE and SET NULL rules on foreign keys

Status: NO ISSUES FOUND


2. Document Status Transitions - MEDIUM RISK

Locations:

  • /home/setup/navidocs/server/routes/upload.js (Line 140)
  • /home/setup/navidocs/server/workers/ocr-worker.js (Lines 332-391)

Issue Found: Documents can get stuck in "processing" or "failed" state

Flow:

  1. Document uploaded → status set to 'processing' (upload.js:140)
  2. OCR job processes document → status should become 'indexed' (ocr-worker.js:334)
  3. IF OCR FAILS → status becomes 'failed' (ocr-worker.js:388)

Problem Scenarios:

  • If the OCR worker crashes mid-processing, documents remain in "processing" state forever
  • Failed documents (status='failed') are not retried automatically
  • No timeout mechanism to mark hung jobs as failed
  • Users may think documents with status='failed' are "missing" when they're actually just failed

Code Evidence:

// upload.js:140 - Initial status
status: 'processing'

// ocr-worker.js:385-391 - Failure handling
db.prepare(`
  UPDATE documents
  SET status = 'failed',
      updated_at = ?
  WHERE id = ?
`).run(now, documentId);

Risk Level: MEDIUM - Documents don't disappear but become invisible if queries filter by status


3. Hard Delete Endpoint - HIGH RISK

Location: /home/setup/navidocs/server/routes/documents.js (Lines 350-414)

Issue Found: DELETE endpoint performs hard deletion (no soft delete)

What It Does:

  1. Deletes from Meilisearch index (line 375)
  2. Deletes from database with CASCADE (line 383-384)
  3. Deletes entire document folder from filesystem (line 392)

Code:

router.delete('/:id', async (req, res) => {
  // ... authentication checks ...

  // Delete from Meilisearch
  await index.deleteDocuments({ filter });

  // Delete from database (CASCADE deletes pages, jobs, etc)
  db.prepare('DELETE FROM documents WHERE id = ?').run(id);

  // Delete from filesystem
  await rm(docFolder, { recursive: true, force: true });
});

Concerns:

  1. No authentication/authorization checks - Anyone with the endpoint can delete (TODO comment on line 352: "simplified permissions")
  2. No soft delete - No recovery possible after deletion
  3. No confirmation required - Single API call deletes everything
  4. Continues on Meilisearch failure - Comment on line 379: "Continue with deletion even if search cleanup fails"

Risk Level: HIGH - If endpoint is called (intentionally or accidentally), documents are permanently deleted


4. Cleanup Scripts - CRITICAL RISK

Locations:

  • /home/setup/navidocs/server/scripts/clean-duplicates.js
  • /home/setup/navidocs/server/scripts/keep-last-n.js

Issue Found: Manual cleanup scripts exist that delete documents in bulk

clean-duplicates.js:

  • Finds documents with duplicate titles
  • Keeps newest, deletes older ones
  • No confirmation prompt before deletion
  • Deletes from DB, filesystem, and Meilisearch

keep-last-n.js:

  • Keeps only N most recent documents (default N=2)
  • Deletes ALL others
  • Takes command line argument: node keep-last-n.js 5

Code Evidence:

// keep-last-n.js:20
const KEEP_COUNT = parseInt(process.argv[2]) || 2;

// keep-last-n.js:77
const deleteStmt = db.prepare(`DELETE FROM documents WHERE id = ?`);

CRITICAL CONCERN: If someone accidentally runs:

node scripts/keep-last-n.js

Without arguments, it will delete ALL documents except the 2 most recent!

Risk Level: CRITICAL - These scripts can delete all user documents


5. Meilisearch Sync Issues - LOW RISK

Location: /home/setup/navidocs/server/workers/ocr-worker.js (Lines 168-184)

Issue Found: Indexing failures are logged but don't fail the job

Code:

// Line 180-183
catch (indexError) {
  console.error(`[OCR Worker] Failed to index page ${pageNumber}:`, indexError.message);
  // Continue processing other pages even if indexing fails
}

Consequence:

  • Documents complete successfully but pages may be missing from search
  • Users search and can't find documents that exist in the database
  • Appears like documents are "missing" but they're just not indexed

Risk Level: LOW - Documents exist but aren't searchable


6. CASCADE Deletion Behavior - MEDIUM RISK

Location: /home/setup/navidocs/server/db/schema.sql

Foreign Key Rules Found:

-- Line 144: Organization deletion cascades to documents
FOREIGN KEY (organization_id) REFERENCES organizations(id) ON DELETE CASCADE

-- Line 173: Document deletion cascades to pages
FOREIGN KEY (document_id) REFERENCES documents(id) ON DELETE CASCADE

-- Line 193: Document deletion cascades to OCR jobs
FOREIGN KEY (document_id) REFERENCES documents(id) ON DELETE CASCADE

Issue: If an organization is deleted, ALL documents in that organization are deleted

Code:

// services/organization.service.js:182
db.prepare('DELETE FROM organizations WHERE id = ?').run(organizationId);

Risk Level: MEDIUM - Single organization deletion cascades to all documents


7. Duplicate Detection Logic - LOW RISK

Location: /home/setup/navidocs/server/routes/upload.js (Lines 104-113)

Finding: Duplicate check exists but doesn't prevent upload

// Lines 105-106
const duplicateCheck = db.prepare(
  'SELECT id, title, file_path FROM documents WHERE file_hash = ? AND organization_id = ? AND status != ?'
).get(fileHash, organizationId, 'deleted');

if (duplicateCheck) {
  // Lines 110-112
  console.log(`Duplicate file detected: ${duplicateCheck.id}, proceeding with new upload`);
}

Issue: Duplicates are detected but allowed. Note the exclusion of status != 'deleted', suggesting soft delete was planned but not implemented.

Risk Level: LOW - Not a bug, but indicates incomplete feature


Root Cause Analysis

Most Likely Causes of "Disappearing Documents"

  1. Accidental Script Execution (HIGH PROBABILITY)

    • User/admin runs node scripts/keep-last-n.js without arguments
    • Deletes all but 2 most recent documents
    • No undo available
  2. Status Filter Confusion (MEDIUM PROBABILITY)

    • Documents in 'failed' or 'processing' state
    • UI filters only show 'indexed' documents
    • Users think documents are gone but they're just in wrong state
  3. Organization Deletion (MEDIUM PROBABILITY)

    • Admin deletes organization
    • CASCADE deletes all documents
    • Users see their documents gone
  4. Manual DELETE API Call (LOW PROBABILITY)

    • Someone with API access calls DELETE endpoint
    • No authorization checks prevent this
    • Documents permanently deleted
  5. Search Index Out of Sync (LOW PROBABILITY)

    • Documents exist in database
    • Not indexed in Meilisearch due to indexing errors
    • Users can't find via search, think they're gone

Priority 1: CRITICAL - Protect Against Bulk Deletion

Fix 1.1: Add Safety to keep-last-n.js

// scripts/keep-last-n.js
const KEEP_COUNT = parseInt(process.argv[2]);

// Add validation
if (!KEEP_COUNT || KEEP_COUNT < 5) {
  console.error('ERROR: Must specify KEEP_COUNT >= 5');
  console.error('Usage: node keep-last-n.js <number>');
  console.error('Example: node keep-last-n.js 10');
  process.exit(1);
}

// Add confirmation prompt
if (toDelete.length > 0) {
  console.log(`\n⚠  WARNING: About to delete ${toDelete.length} documents`);
  console.log('This action cannot be undone!');
  console.log('Type "DELETE" to confirm: ');

  // Add readline confirmation here
}

Fix 1.2: Add Confirmation to clean-duplicates.js

// scripts/clean-duplicates.js
if (documentsToDelete.length > 0) {
  console.log(`\n⚠  WARNING: About to delete ${documentsToDelete.length} documents`);
  console.log('Type "CONFIRM" to proceed: ');

  // Add readline confirmation
}

Priority 2: HIGH - Implement Soft Delete

Fix 2.1: Change DELETE endpoint to soft delete

Location: /home/setup/navidocs/server/routes/documents.js

router.delete('/:id', async (req, res) => {
  const { id } = req.params;

  try {
    logger.info(`Soft deleting document ${id}`);

    const db = getDb();

    // Get document info
    const document = db.prepare('SELECT * FROM documents WHERE id = ?').get(id);

    if (!document) {
      return res.status(404).json({ error: 'Document not found' });
    }

    // ADD AUTHORIZATION CHECK HERE
    const userId = req.user?.id || 'test-user-id';
    // Verify user has permission to delete

    // Soft delete - just update status
    const now = Math.floor(Date.now() / 1000);
    db.prepare(`
      UPDATE documents
      SET status = 'deleted',
          updated_at = ?
      WHERE id = ?
    `).run(now, id);

    // Optionally remove from search index
    try {
      const searchClient = getMeilisearchClient();
      const index = await searchClient.getIndex(MEILISEARCH_INDEX_NAME);
      await index.deleteDocuments({ filter: `docId = "${id}"` });
    } catch (err) {
      logger.warn(`Search cleanup failed for ${id}:`, err);
    }

    logger.info(`Document ${id} soft deleted successfully`);

    res.json({
      success: true,
      message: 'Document deleted successfully',
      documentId: id,
      title: document.title
    });

  } catch (error) {
    logger.error(`Failed to delete document ${id}`, error);
    res.status(500).json({
      error: 'Failed to delete document',
      message: error.message
    });
  }
});

Fix 2.2: Add hard delete endpoint for admins only

router.delete('/:id/permanent', requireAdmin, async (req, res) => {
  // Current hard delete logic here
  // Only accessible to system admins
});

Priority 3: MEDIUM - Fix Status Transition Issues

Fix 3.1: Add job timeout mechanism

Location: /home/setup/navidocs/server/workers/ocr-worker.js

Add stale job detection:

// New function to detect and mark stale jobs
export async function detectStaleJobs() {
  const db = getDb();
  const now = Math.floor(Date.now() / 1000);
  const TIMEOUT = 30 * 60; // 30 minutes

  // Find jobs stuck in 'processing' for > 30 minutes
  const staleJobs = db.prepare(`
    SELECT id, document_id
    FROM ocr_jobs
    WHERE status = 'processing'
      AND started_at < ?
  `).all(now - TIMEOUT);

  for (const job of staleJobs) {
    // Mark job as failed
    db.prepare(`
      UPDATE ocr_jobs
      SET status = 'failed',
          error = 'Job timeout - exceeded 30 minutes',
          completed_at = ?
      WHERE id = ?
    `).run(now, job.id);

    // Mark document as failed
    db.prepare(`
      UPDATE documents
      SET status = 'failed',
          updated_at = ?
      WHERE id = ?
    `).run(now, job.document_id);

    console.log(`Marked stale job ${job.id} as failed`);
  }

  return staleJobs.length;
}

// Run every 5 minutes
setInterval(detectStaleJobs, 5 * 60 * 1000);

Fix 3.2: Add retry mechanism for failed jobs

// New endpoint to retry failed documents
router.post('/documents/:id/retry', async (req, res) => {
  const { id } = req.params;
  const db = getDb();

  const doc = db.prepare('SELECT * FROM documents WHERE id = ? AND status = ?')
    .get(id, 'failed');

  if (!doc) {
    return res.status(404).json({ error: 'No failed document found' });
  }

  // Create new OCR job
  const jobId = uuidv4();
  const now = Math.floor(Date.now() / 1000);

  db.prepare(`
    INSERT INTO ocr_jobs (id, document_id, status, progress, created_at)
    VALUES (?, ?, 'pending', 0, ?)
  `).run(jobId, id, now);

  // Update document status
  db.prepare(`
    UPDATE documents
    SET status = 'processing', updated_at = ?
    WHERE id = ?
  `).run(now, id);

  // Queue job
  await addOcrJob(id, jobId, {
    filePath: doc.file_path,
    fileName: doc.file_name,
    organizationId: doc.organization_id,
    userId: doc.uploaded_by
  });

  res.json({ success: true, jobId, documentId: id });
});

Priority 4: MEDIUM - Add Authorization to DELETE

Fix 4: Implement proper authorization

Location: /home/setup/navidocs/server/routes/documents.js

router.delete('/:id', async (req, res) => {
  const { id } = req.params;
  const userId = req.user?.id;

  if (!userId) {
    return res.status(401).json({ error: 'Authentication required' });
  }

  const db = getDb();
  const document = db.prepare('SELECT * FROM documents WHERE id = ?').get(id);

  if (!document) {
    return res.status(404).json({ error: 'Document not found' });
  }

  // Check authorization
  const isAuthorized = db.prepare(`
    SELECT 1 FROM user_organizations
    WHERE user_id = ? AND organization_id = ?
  `).get(userId, document.organization_id);

  const isUploader = document.uploaded_by === userId;

  if (!isAuthorized && !isUploader) {
    return res.status(403).json({
      error: 'Forbidden',
      message: 'You do not have permission to delete this document'
    });
  }

  // Proceed with deletion
  // ...
});

Priority 5: LOW - Improve Search Index Reliability

Fix 5: Make indexing failures more visible

Location: /home/setup/navidocs/server/workers/ocr-worker.js

// Track indexing failures in document metadata
const indexingFailures = [];

for (const pageResult of ocrResults) {
  // ... page processing ...

  if (cleanedText && !error) {
    try {
      await indexDocumentPage({ ... });
    } catch (indexError) {
      console.error(`Failed to index page ${pageNumber}:`, indexError.message);
      indexingFailures.push({
        page: pageNumber,
        error: indexError.message
      });
    }
  }
}

// Update document with indexing status
if (indexingFailures.length > 0) {
  db.prepare(`
    UPDATE documents
    SET status = 'indexed_partial',
        metadata = ?
    WHERE id = ?
  `).run(JSON.stringify({ indexingFailures }), documentId);

  console.warn(`Document ${documentId} indexed with ${indexingFailures.length} failures`);
}

Priority 6: LOW - Add Document Recovery

Fix 6: Create recovery endpoint for soft-deleted documents

// New endpoint
router.post('/documents/:id/restore', requireAuth, async (req, res) => {
  const { id } = req.params;
  const db = getDb();

  const doc = db.prepare('SELECT * FROM documents WHERE id = ? AND status = ?')
    .get(id, 'deleted');

  if (!doc) {
    return res.status(404).json({ error: 'No deleted document found' });
  }

  // Check authorization
  // ...

  // Restore document
  const now = Math.floor(Date.now() / 1000);
  db.prepare(`
    UPDATE documents
    SET status = 'indexed', updated_at = ?
    WHERE id = ?
  `).run(now, id);

  // Re-index in Meilisearch
  // ...

  res.json({ success: true, documentId: id, message: 'Document restored' });
});

Testing Scenarios

Test 1: Verify Soft Delete

# Upload document
curl -X POST http://localhost:3001/api/upload \
  -F "file=@test.pdf" \
  -F "title=Test Document" \
  -F "documentType=manual" \
  -F "organizationId=test-org"

# Delete document
curl -X DELETE http://localhost:3001/api/documents/<doc-id>

# Verify status is 'deleted', not removed
sqlite3 db/navidocs.db "SELECT id, status FROM documents WHERE id = '<doc-id>'"
# Should return: <doc-id>|deleted

# Verify file still exists
ls uploads/<doc-id>/
# Should still exist

Test 2: Verify Stale Job Detection

# Manually create stale job
sqlite3 db/navidocs.db "
  UPDATE ocr_jobs
  SET status = 'processing',
      started_at = strftime('%s', 'now') - 3600
  WHERE id = '<job-id>'
"

# Wait for stale job detector (5 minutes) or call manually
# Verify job marked as failed
sqlite3 db/navidocs.db "SELECT status FROM ocr_jobs WHERE id = '<job-id>'"
# Should return: failed

Test 3: Verify Authorization

# Try to delete document without auth
curl -X DELETE http://localhost:3001/api/documents/<doc-id>
# Should return: 401 Unauthorized

# Try to delete document from different organization
curl -X DELETE http://localhost:3001/api/documents/<doc-id> \
  -H "Authorization: Bearer <wrong-user-token>"
# Should return: 403 Forbidden

Test 4: Verify Script Safety

# Try to run keep-last-n without argument
node scripts/keep-last-n.js
# Should return: ERROR message and exit

# Try with small number
node scripts/keep-last-n.js 2
# Should return: ERROR: Must specify KEEP_COUNT >= 5

Test 5: Verify Duplicate Handling

# Upload same file twice
curl -X POST http://localhost:3001/api/upload \
  -F "file=@test.pdf" \
  -F "title=Test Doc" \
  -F "documentType=manual" \
  -F "organizationId=test-org"

# Upload again
curl -X POST http://localhost:3001/api/upload \
  -F "file=@test.pdf" \
  -F "title=Test Doc 2" \
  -F "documentType=manual" \
  -F "organizationId=test-org"

# Verify both exist
sqlite3 db/navidocs.db "SELECT COUNT(*) FROM documents WHERE file_hash = '<hash>'"
# Should return: 2

Monitoring Recommendations

1. Add Document Count Metrics

// routes/stats.js - Add endpoint
router.get('/document-counts', async (req, res) => {
  const db = getDb();

  const counts = db.prepare(`
    SELECT
      status,
      COUNT(*) as count
    FROM documents
    GROUP BY status
  `).all();

  res.json({
    byStatus: counts,
    total: counts.reduce((sum, c) => sum + c.count, 0)
  });
});

2. Add Audit Logging for Deletions

// Before deletion
await auditLog.log({
  action: 'document.delete',
  userId: req.user.id,
  resourceId: documentId,
  resourceType: 'document',
  metadata: {
    title: document.title,
    organizationId: document.organization_id
  }
});

3. Set Up Alerts

  • Alert if document count drops by >10% in 1 hour
  • Alert if >5 documents marked as 'failed' in 1 hour
  • Alert if any cleanup script is run in production

Prevention Checklist

  • Implement soft delete (Priority 2)
  • Add confirmation prompts to cleanup scripts (Priority 1)
  • Add authorization checks to DELETE endpoint (Priority 4)
  • Implement stale job detection (Priority 3)
  • Add document restoration endpoint (Priority 6)
  • Add audit logging for deletions
  • Set up monitoring alerts
  • Document recovery procedures
  • Add integration tests for delete scenarios
  • Create backup/restore documentation

Conclusion

The "disappearing documents" bug is most likely caused by:

  1. Accidental execution of cleanup scripts without proper safeguards
  2. Documents getting stuck in 'failed' or 'processing' states and appearing missing
  3. Lack of soft delete causing permanent data loss
  4. Missing authorization checks allowing unauthorized deletions

The database configuration and CASCADE rules are working correctly. The primary issues are around operational safety, status management, and lack of recovery mechanisms.

Immediate Actions:

  1. Add confirmation prompts to cleanup scripts
  2. Implement soft delete
  3. Add stale job detection
  4. Add proper authorization to DELETE endpoint

Next Steps:

  1. Review production logs for DELETE operations
  2. Check for any scheduled cron jobs running cleanup scripts
  3. Interview users to understand exact scenarios where documents disappeared
  4. Implement monitoring and alerting

Report Prepared By: Claude Code Investigation Date: 2025-10-23 Files Analyzed: 15+ source files Lines of Code Reviewed: ~5,000+