Implements multi-format document upload, extending support beyond PDFs.
Changes:
- server/package.json: Add mammoth (DOCX) and xlsx (Excel) dependencies
- server/services/file-safety.js: Expand allowed file types and MIME types
  - Add getFileCategory() to classify uploads by file type
  - Support for images, Office docs, and text files
  - Flexible MIME validation for text files
- server/services/document-processor.js: NEW routing service
  - processImageFile(): Tesseract OCR for JPG/PNG/WebP
  - processWordDocument(): Mammoth for DOCX text extraction
  - processExcelDocument(): XLSX for spreadsheet data extraction
  - processTextFile(): Native reading for TXT/MD files
  - Unified interface with processDocument() router
- server/workers/ocr-worker.js: Switch from extractTextFromPDF to processDocument
  - Now handles all file types through the unified processor
- client/src/components/UploadModal.vue: Update UI for multi-format
  - File input now accepts all new file types
  - Updated help text to show supported formats
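The file-safety.js changes (allowed types, getFileCategory(), looser MIME checks for text) can be sketched as below. This is a minimal illustration, not the committed code: the lookup table, the category names, and the helper `isAllowedUpload` are hypothetical, and the "any text/* or empty MIME" rule for TXT/MD is an assumed reading of "flexible MIME validation". The DOCX and XLSX strings are the standard Office Open XML MIME types.

```javascript
// Hypothetical sketch of the file-safety.js lookup table and checks.
// Extensions and MIME strings are assumptions based on the supported formats.
const FILE_TYPES = new Map([
  ['pdf', { category: 'pdf', mimes: ['application/pdf'] }],
  ['jpg', { category: 'image', mimes: ['image/jpeg'] }],
  ['jpeg', { category: 'image', mimes: ['image/jpeg'] }],
  ['png', { category: 'image', mimes: ['image/png'] }],
  ['webp', { category: 'image', mimes: ['image/webp'] }],
  ['docx', { category: 'word', mimes: ['application/vnd.openxmlformats-officedocument.wordprocessingml.document'] }],
  ['xlsx', { category: 'excel', mimes: ['application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'] }],
  ['txt', { category: 'text', mimes: ['text/plain'] }],
  ['md', { category: 'text', mimes: ['text/markdown'] }],
]);

// Lower-cased extension without the dot, or '' when there is none.
function getExtension(filename) {
  const dot = filename.lastIndexOf('.');
  return dot > 0 ? filename.slice(dot + 1).toLowerCase() : '';
}

// Classify a file by extension; returns null for unsupported types.
function getFileCategory(filename) {
  const entry = FILE_TYPES.get(getExtension(filename));
  return entry ? entry.category : null;
}

// Allow an upload only if its extension is known and its MIME type matches.
// Text files get a looser check (assumed policy): any text/* type, or an
// empty type, is accepted because clients label .txt/.md files inconsistently.
function isAllowedUpload(filename, mimeType) {
  const entry = FILE_TYPES.get(getExtension(filename));
  if (!entry) return false;
  const mime = (mimeType || '').toLowerCase().trim();
  if (entry.mimes.includes(mime)) return true;
  return entry.category === 'text' && (mime === '' || mime.startsWith('text/'));
}
```

A `Map` is used instead of a plain object so that an extension such as `.constructor` cannot match an inherited property. Both the extension and the MIME type come from the client, so a production check would likely also inspect file contents; this sketch does not.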
Supported formats: PDF, JPG, PNG, WebP, DOCX, XLSX, TXT, MD
Text extraction methods: mammoth (DOCX), xlsx (XLSX), native file reads (TXT/MD), Tesseract OCR (images), PDF.js (PDFs)
Search indexing: All file types processed and indexed in Meilisearch
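The processDocument() router that dispatches each file to one of the extractors above could look roughly like this. It is a sketch under stated assumptions: the extractors are passed in as an object so the routing runs without mammoth, xlsx, Tesseract, or PDF.js installed, and the `extractors` parameter and the `{ category, text }` return shape are hypothetical rather than taken from document-processor.js.

```javascript
// Hypothetical sketch of a processDocument() router. Extractors are injected
// so the routing logic can run without the real extraction libraries.
const CATEGORY_BY_EXT = new Map(Object.entries({
  pdf: 'pdf', jpg: 'image', jpeg: 'image', png: 'image', webp: 'image',
  docx: 'word', xlsx: 'excel', txt: 'text', md: 'text',
}));

// Which extractor handles each category (names mirror the functions above).
const HANDLER_BY_CATEGORY = new Map([
  ['pdf', 'extractTextFromPDF'],
  ['image', 'processImageFile'],
  ['word', 'processWordDocument'],
  ['excel', 'processExcelDocument'],
  ['text', 'processTextFile'],
]);

// Classify by lower-cased extension; null means unsupported.
function getFileCategory(filename) {
  const dot = filename.lastIndexOf('.');
  const ext = dot > 0 ? filename.slice(dot + 1).toLowerCase() : '';
  return CATEGORY_BY_EXT.get(ext) ?? null;
}

// Route a file to its extractor and return a uniform result.
async function processDocument(filePath, extractors) {
  const category = getFileCategory(filePath);
  if (!category) {
    throw new Error(`Unsupported file type: ${filePath}`);
  }
  const extract = extractors[HANDLER_BY_CATEGORY.get(category)];
  if (typeof extract !== 'function') {
    throw new Error(`No extractor registered for category "${category}"`);
  }
  const text = await extract(filePath);
  return { category, text: String(text ?? '').trim() };
}

// Example wiring with stub extractors standing in for the real libraries.
const stubExtractors = {
  extractTextFromPDF: async () => 'pdf text',
  processImageFile: async () => '  ocr text \n',
  processWordDocument: async () => 'docx text',
  processExcelDocument: async () => 'sheet1,a,b',
  processTextFile: async () => '# Notes',
};

processDocument('scan.PNG', stubExtractors).then((r) => console.log(r));
```

Keeping one entry point means the OCR worker only needs a single call regardless of format. In a real module, a DOCX extractor could wrap `mammoth.extractRawText({ path })` (the text is on the result's `value`), and an XLSX extractor could load the workbook with `XLSX.readFile` and flatten each sheet with `XLSX.utils.sheet_to_csv`; whether the commit wires them exactly this way is not stated.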
Session: Cloud Session 2 - Multi-Format Upload Support
Branch: feature/multiformat
Status: Complete - Ready for testing