navidocs

Author	SHA1	Message	Date
Claude	33a4d49924	Fix: Remove pdf-img-convert dependency + Implementation docs Resolves canvas dependency installation issue that was blocking npm install. Changes: - server/package.json: Remove pdf-img-convert (unused, caused canvas build errors) - pdf-img-convert requires canvas with native system libraries (pangocairo, cairo) - Package was not imported anywhere in codebase - After removal, npm install completes successfully (272 packages) - server/MULTIFORMAT_IMPLEMENTATION.md: Complete implementation documentation - Full technical summary of multi-format upload feature - Processing flow diagrams and code examples - Issue resolution details (canvas dependency) - Integration instructions for Session 1 - Success criteria verification Verification: ✅ npm install completes without errors ✅ mammoth and xlsx dependencies installed successfully ✅ All 272 packages installed in 7s ✅ Implementation ready for integration testing Status: Multi-format upload feature COMPLETE Session: Cloud Session 2 (011CV53B2oMH6VqjaePrFZgb)	2025-11-13 13:03:08 +00:00
Claude	f0096a6bd6	Feature: Multi-format upload support (JPG, PNG, DOCX, XLSX, TXT, MD) Implements multi-format document upload capability expanding beyond PDFs. Changes: - server/package.json: Add mammoth (DOCX) and xlsx (Excel) dependencies - server/services/file-safety.js: Expand allowed file types and MIME types - Added getFileCategory() function to classify file types - Support for images, Office docs, and text files - Flexible MIME validation for text files - server/services/document-processor.js: NEW routing service - processImageFile(): Tesseract OCR for JPG/PNG/WebP - processWordDocument(): Mammoth for DOCX text extraction - processExcelDocument(): XLSX for spreadsheet data extraction - processTextFile(): Native reading for TXT/MD files - Unified interface with processDocument() router - server/workers/ocr-worker.js: Switch from extractTextFromPDF to processDocument - Now handles all file types through unified processor - client/src/components/UploadModal.vue: Update UI for multi-format - File input accepts all new file types - Updated help text to show supported formats Supported formats: PDF, JPG, PNG, WebP, DOCX, XLSX, TXT, MD Text extraction methods: Native (Office/text), Tesseract OCR (images), PDF.js (PDFs) Search indexing: All file types processed and indexed in Meilisearch Session: Cloud Session 2 - Multi-Format Upload Support Branch: feature/multiformat Status: Complete - Ready for testing	2025-11-13 12:54:44 +00:00
Danny Stocker	58b344aa31	FINAL: P0 blockers fixed + Joe Trader + ignore binaries Fixed: - Price: €800K-€1.5M, Sunseeker added - Agent 1: Joe Trader persona + actual sale ads research - Ignored meilisearch binary + data/ (too large for GitHub) - SESSION_DEBUG_BLOCKERS.md created Ready for Session 1 launch. 🤖 Generated with Claude Code	2025-11-13 01:29:59 +01:00
ggq-admin	fb88b291de	feat: Add interactive Table of Contents navigation with i18n support Implements complete TOC feature for document navigation with bilingual support. ## TOC Detection & Extraction - Pattern-based TOC detection with 3 regex patterns - Heuristic validation (30%+ match ratio, 5+ entries, sequential pages) - Hierarchical section key parsing (e.g., "4.1.2" → level 3, parent "4.1") - Database schema with parent-child relationships - Automatic extraction during OCR post-processing - Server-side LRU caching (200 entries, 30min TTL) ## UI Components - TocSidebar: Collapsible sidebar (320px) with auto-open on TOC presence - TocEntry: Recursive component for hierarchical rendering - Flex layout: Sidebar + PDF viewer side-by-side - Active page highlighting with real-time sync - localStorage persistence for sidebar state ## Navigation Features - Click TOC entry → PDF jumps to page - Deep link support: URL hash format #p=12 - Page change events: navidocs:pagechange custom event - URL hash updates on all navigation (next/prev/goTo/TOC) - Hash change listener for external navigation - Page clamping and validation ## Search Integration - "Jump to section" button in search results - Shows when result has section field - Navigates to document with page number and hash ## Accessibility - ARIA attributes: role, aria-label, aria-expanded, aria-current - Keyboard navigation: Enter/Space on entries, Tab focus - Screen reader support with aria-live regions - Semantic HTML with proper list/listitem roles ## Internationalization (i18n) - Vue I18n integration with vue-i18n package - English and French translations - 8 TOC-specific translation keys - Language switcher component in document viewer - Locale persistence in localStorage ## Error Handling - Specific error messages for each failure case - Validation before processing (doc exists, has pages, has OCR) - Non-blocking TOC extraction (doesn't fail OCR jobs) - Detailed error returns: {success, error, entriesCount, pages} ## API Endpoints - GET /api/documents/:id/toc?format=flat|tree - POST /api/documents/:id/toc/extract - Cache invalidation on re-extraction ## Testing - Smoke test script: 9 comprehensive tests - E2E testing guide with 5 manual scenarios - Tests cover: API, caching, validation, navigation, search ## Database - Migration 002: document_toc table - Fields: id, document_id, title, section_key, page_start, level, parent_id, order_index - Foreign keys with CASCADE delete ## Files Changed - New: TocSidebar.vue, TocEntry.vue, LanguageSwitcher.vue - New: toc-extractor.js, toc.js routes, i18n setup - Modified: DocumentView.vue (sidebar, deep links, events) - Modified: SearchView.vue (Jump to section button) - Modified: ocr-worker.js (TOC post-processing) - New: toc-smoke-test.sh, TOC_E2E_TEST.md Generated with Claude Code (https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-20 13:22:45 +02:00
ggq-admin	09d9f1b601	Implement PDF image extraction with OCR in OCR worker This commit adds comprehensive image extraction and OCR functionality to the OCR worker: Features: - Created image-extractor.js worker module with extractImagesFromPage() function - Uses pdftoppm (with ImageMagick fallback) to convert PDF pages to high-res images - Images saved to /uploads/{documentId}/images/page-{N}-img-{M}.png - Returns image metadata: id, path, position, width, height OCR Worker Integration: - Imports image-extractor module and extractTextFromImage from OCR service - After processing page text, extracts images from each page - Runs Tesseract OCR on extracted images - Stores image data in document_images table with extracted text and confidence - Indexes images in Meilisearch with type='image' for searchability - Updates document.imageCount and sets imagesExtracted flag Database: - Uses existing document_images table from migration 004 - Stores image metadata, OCR text, and confidence scores Dependencies: - Added pdf-img-convert and sharp packages - Uses system tools (pdftoppm/ImageMagick) for reliable PDF conversion Testing: - Created test-image-extraction.js to verify image extraction - Created test-full-pipeline.js to test end-to-end extraction + OCR - Successfully tested with 05-versions-space.pdf test document Error Handling: - Graceful degradation if image extraction fails - Continues OCR processing even if images cannot be extracted - Comprehensive logging for debugging Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-19 19:54:25 +02:00
ggq-admin	155a8c0305	feat: NaviDocs MVP - Complete codebase extraction from lilian1 ## Backend (server/) - Express 5 API with security middleware (helmet, rate limiting) - SQLite database with WAL mode (schema from docs/architecture/) - Meilisearch integration with tenant tokens - BullMQ + Redis background job queue - OCR pipeline with Tesseract.js - File safety validation (extension, MIME, size) - 4 API route modules: upload, jobs, search, documents ## Frontend (client/) - Vue 3 with Composition API (<script setup>) - Vite 5 build system with HMR - Tailwind CSS (Meilisearch-inspired design) - UploadModal with drag-and-drop - FigureZoom component (ported from lilian1) - Meilisearch search integration with tenant tokens - Job polling composable - Clean SVG icons (no emojis) ## Code Extraction - ✅ manuals.js → UploadModal.vue, useJobPolling.js - ✅ figure-zoom.js → FigureZoom.vue - ✅ service-worker.js → client/public/service-worker.js (TODO) - ✅ glossary.json → Merged into Meilisearch synonyms - ❌ Discarded: quiz.js, persona.js, gamification.js (Frank-AI junk) ## Documentation - Complete extraction plan in docs/analysis/ - README with quick start guide - Architecture summary in docs/architecture/ ## Build Status - Server dependencies: ✅ Installed (234 packages) - Client dependencies: ✅ Installed (160 packages) - Client build: ✅ Successful (2.63s) 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-19 01:55:44 +02:00

6 commits
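
Note on commit fb88b291de: its message describes hierarchical section key parsing ("4.1.2" → level 3, parent "4.1") and `#p=12` deep links with page clamping, but the log does not show the code. The sketch below is one way those two rules could look. It is not the contents of toc-extractor.js or DocumentView.vue. The function names `parseSectionKey` and `pageFromHash` are hypothetical, and falling back to page 1 for a missing or malformed hash is an assumption.

```javascript
// Hypothetical helper (not from the repository): derive nesting level and
// parent key from a dotted section key such as "4.1.2".
function parseSectionKey(sectionKey) {
  // Drop empty segments so a trailing dot ("4.1.") does not inflate the level.
  const parts = String(sectionKey)
    .split(".")
    .filter((part) => part.length > 0);

  if (parts.length === 0) {
    return { level: 0, parent: null };
  }

  return {
    // One level per dotted segment: "4" is level 1, "4.1.2" is level 3.
    level: parts.length,
    // The parent is the key minus its last segment; top-level keys have none.
    parent: parts.length > 1 ? parts.slice(0, -1).join(".") : null,
  };
}

// Hypothetical helper (not from the repository): read a "#p=12" style hash
// and clamp the page into [1, pageCount]. Assumes pageCount >= 1.
function pageFromHash(hash, pageCount) {
  const match = /^#p=(\d+)$/.exec(hash);
  if (!match) {
    return 1; // Assumed fallback when the hash is absent or malformed.
  }
  const page = Number.parseInt(match[1], 10);
  return Math.min(Math.max(page, 1), pageCount);
}
```

For example, `parseSectionKey("4.1.2")` yields level 3 with parent `"4.1"`, matching the example in the commit message, and `pageFromHash("#p=999", 40)` clamps to the last page.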