Session 4 (Implementation Planning) has completed comprehensive 4-week sprint planning: Deliverables: - Week 1-4 detailed schedules (162 total hours) - 24 API endpoints (OpenAPI 3.0 specification) - 5 database migrations (100% rollback coverage) - Testing strategy (70% unit, 50% integration, 10 E2E flows) - 28 Gherkin acceptance criteria scenarios - Dependency graph with critical path analysis - Zero-downtime deployment runbook Agents: S4-H01 through S4-H10 (all complete) Token Cost: $2.66 (82% under $15 budget) Efficiency: 82% Haiku delegation Status: Ready for Week 1 implementation kickoff
# NaviDocs Deployment Runbook
## 4-Week Sprint Production Deployment Guide

**Document Version:** 1.0
**Last Updated:** 2025-11-13
**Status:** Phase 1 - Ready for Implementation
**Owner:** S4-H10 (Deployment Checklist Creator & Synthesis Agent)

---

## Executive Summary

This runbook provides step-by-step procedures for deploying the NaviDocs 4-week sprint (Nov 13 - Dec 10, 2025) to production. It covers:

- **Pre-deployment validation** (tests, backups, configuration)
- **Zero-downtime deployment** (rolling updates, worker coordination)
- **Post-deployment smoke tests** (critical flow validation)
- **Rollback procedures** (emergency recovery)
- **Monitoring & logging** (incident response)

**Target Deployment Window:** December 8-10, 2025 (after Week 4 completion)
**Estimated Deployment Time:** 30-45 minutes
**Expected Downtime:** <2 minutes (for database migration only)

---
## Part 1: Pre-Deployment Checklist

### A. Test Coverage Validation

**Objective:** Ensure code quality and feature completeness before deploying to production.

#### Unit Tests
```bash
# Run all unit tests
npm run test:unit

# Expected output
# PASS test/services/warranty.service.test.js
# PASS test/services/event-bus.service.test.js
# PASS test/services/webhook.service.test.js
# PASS test/services/notification.service.test.js
# PASS test/services/sale-workflow.service.test.js
# PASS test/services/home-assistant.service.test.js
# PASS test/services/yachtworld.service.test.js
# ============================================
# Test Suites: 7 passed, 7 total
# Tests: 87 passed, 87 total
# Coverage: 75% statements, 82% branches, 68% functions
```

**Pass Criteria:**
- [ ] All test suites passing
- [ ] Coverage >70% statements
- [ ] Zero critical failures

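
The coverage criterion above can be turned into a scriptable gate. The sketch below is a minimal illustration, not part of the NaviDocs tooling: `coverage_meets_threshold` is a hypothetical helper name, and the percentage is assumed to be copied from the test runner's summary line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when a coverage percentage is strictly
# above the required threshold. awk is used so decimals such as 70.5
# compare correctly, which [ -gt ] cannot do.
coverage_meets_threshold() {
  local actual="$1" required="$2"
  awk -v a="$actual" -v r="$required" 'BEGIN { exit !(a > r) }'
}

# Statement coverage from the sample output (75%) against the 70% criterion
if coverage_meets_threshold 75 70; then
  echo "coverage gate: pass"
else
  echo "coverage gate: fail"
fi
```

Because the helper only sets an exit status, it can be chained with `&&` in a larger pre-deployment script.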
#### Integration Tests
```bash
# Run integration tests (requires test database)
npm run test:integration

# Expected output
# PASS test/routes/warranty.routes.test.js
# PASS test/routes/integrations.routes.test.js
# PASS test/routes/sales.routes.test.js
# PASS test/workers/warranty-expiration.worker.test.js
# ============================================
# Test Suites: 4 passed, 4 total
# Tests: 42 passed, 42 total
# Coverage: 68% statements, 75% branches
```

**Pass Criteria:**
- [ ] All API routes tested
- [ ] Database operations verified
- [ ] Background workers functional

#### E2E Tests
```bash
# Run end-to-end tests against staging environment
npm run test:e2e

# Expected output
# PASS e2e/warranty-tracking.spec.js (warranty creation, alerts, claim package)
# PASS e2e/sale-workflow.spec.js (initiate, package generation, transfer)
# PASS e2e/home-assistant.spec.js (webhook registration, event delivery)
# PASS e2e/critical-flows.spec.js (login, document upload, export)
# ============================================
# Test Suites: 4 passed, 4 total
# Tests: 18 passed, 18 total
# Duration: 2m 34s
```

**Pass Criteria:**
- [ ] All critical user flows pass
- [ ] No timeout failures
- [ ] Performance within acceptable ranges

#### Security Audit
```bash
# Check dependencies for vulnerabilities
npm audit

# Expected output
# 0 vulnerabilities (after fixes applied)

# If vulnerabilities found, fix them:
npm audit fix
# npm audit fix --force # Only if necessary and reviewed (may apply breaking major-version upgrades)
```

**Pass Criteria:**
- [ ] Zero critical vulnerabilities
- [ ] Zero high severity vulnerabilities
- [ ] All audits passed

### B. Database & Environment Setup

#### Database Backup
```bash
# Create timestamped backup before any operations
BACKUP_TIMESTAMP=$(date +%Y%m%d-%H%M%S)
BACKUP_FILE=/var/www/navidocs/backups/navidocs.db.backup-${BACKUP_TIMESTAMP}

# Use SQLite's online backup rather than cp: copying a live database
# file can capture a half-written state
sqlite3 /var/www/navidocs/navidocs.db ".backup '${BACKUP_FILE}'"

# Verify backup integrity
sqlite3 "${BACKUP_FILE}" "PRAGMA integrity_check;"
# Expected output: ok
sqlite3 "${BACKUP_FILE}" ".tables"

# Expected output: Should list all existing tables
# boats documents organization_settings organizations users warranty_tracking webhooks
```

**Backup Verification Checklist:**
- [ ] Backup file created successfully
- [ ] Backup file size > 100KB (contains data)
- [ ] Backup file readable (sqlite3 can open it)
- [ ] Backup location: `/var/www/navidocs/backups/`

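
The "size > 100KB" item in the checklist above can be checked mechanically. This is a minimal sketch under stated assumptions: `backup_size_ok` is a hypothetical helper, and 102400 bytes is used as the reading of "100KB".

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the file exists and is larger than
# min_bytes (default 102400 bytes, i.e. 100 KiB).
backup_size_ok() {
  local file="$1" min_bytes="${2:-102400}"
  [ -f "$file" ] || return 1
  local size
  # Arithmetic expansion strips the padding some wc implementations add
  size=$(( $(wc -c < "$file") ))
  [ "$size" -gt "$min_bytes" ]
}
```

A typical call would be `backup_size_ok "/var/www/navidocs/backups/navidocs.db.backup-${BACKUP_TIMESTAMP}" || echo "Backup too small"`.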
#### Environment Variables Configuration

**File:** `.env.production`

```bash
# Required for production deployment
cat > .env.production << 'EOF'
# Application
NODE_ENV=production
PORT=3000
API_BASE_URL=https://api.navidocs.app
APP_BASE_URL=https://app.navidocs.app

# Database
DATABASE_URL=/var/www/navidocs/navidocs.db
DATABASE_BACKUP_DIR=/var/www/navidocs/backups

# Authentication
JWT_SECRET=<use_strong_secret_from_vault>
JWT_EXPIRATION=24h
REFRESH_TOKEN_EXPIRATION=7d

# Email Configuration
SMTP_HOST=<email_provider_host>
SMTP_PORT=587
SMTP_USER=<email_service_account>
SMTP_PASSWORD=<use_password_from_vault>
SMTP_FROM=notifications@navidocs.app
SMTP_FROM_NAME="NaviDocs Notifications"

# Webhook Configuration
WEBHOOK_SIGNATURE_SECRET=<use_strong_secret_from_vault>
WEBHOOK_TIMEOUT_MS=30000
WEBHOOK_MAX_RETRIES=3

# Home Assistant Integration
HOME_ASSISTANT_WEBHOOK_TIMEOUT=5000

# Redis/Queue Configuration
REDIS_URL=redis://<redis_host>:6379/0
QUEUE_PREFIX=navidocs:queue:

# MLS Integrations
YACHTWORLD_API_KEY=<get_from_partner>
YACHTWORLD_API_BASE=https://api.yachtworld.com
BOAT_TRADER_API_KEY=<get_from_partner>
BOAT_TRADER_API_BASE=https://api.boattrader.com

# Logging & Monitoring
LOG_LEVEL=info
SENTRY_DSN=<get_from_sentry>
NEW_RELIC_LICENSE_KEY=<get_from_new_relic>

# Security
CORS_ORIGIN=https://app.navidocs.app
RATE_LIMIT_WINDOW_MS=900000
RATE_LIMIT_MAX_REQUESTS=100

# Deployment (values appended after this heredoc)
EOF

# The quoted 'EOF' delimiter disables $(...) expansion inside the heredoc,
# so the deployment metadata is appended separately to get real values
{
  echo "DEPLOYMENT_VERSION=$(git rev-parse --short HEAD)"
  echo "DEPLOYMENT_TIMESTAMP=$(date -u +'%Y-%m-%dT%H:%M:%SZ')"
} >> .env.production
```

**Environment Validation Checklist:**
- [ ] All required variables defined
- [ ] No hardcoded secrets in code
- [ ] Secrets sourced from vault/secret manager
- [ ] SSL certificate path configured
- [ ] CORS origins correct

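
A quick way to catch unfilled values before deployment is to scan the env file for missing keys and leftover `<placeholder>` markers. The sketch below assumes the `KEY=value` layout shown above; `check_env_file` is a hypothetical helper, and the list of required keys is whatever the caller passes in.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report required keys that are missing, empty, or
# still contain a <placeholder> marker. Returns non-zero if any are found.
check_env_file() {
  local file="$1"; shift
  local key value status=0
  for key in "$@"; do
    # First assignment of the key; everything after the first "=" is the value
    value=$(grep -E "^${key}=" "$file" | head -n 1 | cut -d= -f2-)
    if [ -z "$value" ]; then
      echo "MISSING: ${key}"
      status=1
    elif [[ "$value" == *\<*\>* ]]; then
      echo "PLACEHOLDER: ${key}"
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_env_file .env.production NODE_ENV JWT_SECRET SMTP_HOST REDIS_URL` prints one line per problem key and exits non-zero until every placeholder has been replaced.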
#### SSL Certificate Verification
```bash
# Check certificate expiration date
openssl x509 -in /etc/ssl/certs/navidocs.crt -noout -dates

# Expected output similar to:
# notBefore=Nov 13 00:00:00 2024 GMT
# notAfter=Nov 13 23:59:59 2025 GMT

# Exit status is non-zero if the certificate expires within 30 days (2592000 seconds)
openssl x509 -in /etc/ssl/certs/navidocs.crt -noout -checkend 2592000

# If certificate expires within 30 days, renew immediately
# Renew using Let's Encrypt (automated)
certbot renew
```

**SSL Checklist:**
- [ ] Certificate valid (not expired)
- [ ] Certificate expires >30 days in future
- [ ] Private key exists and is readable
- [ ] Certificate matches domain

### C. Code Review & Quality Gates

#### Code Review Checklist
- [ ] All pull requests reviewed (minimum 2 reviewers)
- [ ] All review comments resolved
- [ ] No blocking feedback remaining
- [ ] Approval from tech lead obtained

#### Linting & Format Check
```bash
# Check code style
npm run lint

# Expected: 0 errors, 0 warnings

# Auto-format code if needed
npm run format
```

**Linting Checklist:**
- [ ] No ESLint errors
- [ ] No Prettier formatting issues
- [ ] No TypeScript type errors (if using TS)

#### Dependency Check
```bash
# Review dependency updates (document major version updates for next sprint)
npm outdated

# Update minor/patch versions if safe
npm update

# Check for unmet or missing peer dependencies
npm ls | grep -E "UNMET|peer"
```

**Dependency Checklist:**
- [ ] No unmet peer dependencies
- [ ] Critical security patches applied
- [ ] Major version updates documented for future

---
## Part 2: Deployment Procedure (Zero-Downtime)

### Pre-Deployment Verification (5 minutes)

```bash
# 1. Confirm current production state
pm2 list
# Should show both navidocs-api and navidocs-worker running

# 2. Check production database size (to estimate backup/migration time)
du -sh /var/www/navidocs/navidocs.db

# 3. Check system resources
free -h   # RAM available
df -h     # Disk space available (minimum 1GB for backup)
uptime    # System load
```

**Pre-Deployment Criteria:**
- [ ] Both services running
- [ ] >1GB disk space available
- [ ] System load <80%
- [ ] No active user sessions (off-peak deployment recommended)

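
`uptime` reports load averages rather than a percentage, so the "System load <80%" criterion above needs an interpretation. The sketch below assumes it means the 1-minute load average divided by the number of CPU cores; that reading, and the helper name `load_within_limit`, are assumptions rather than something the runbook defines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when load / cores, as a percentage,
# is below max_pct (default 80).
load_within_limit() {
  local load="$1" cores="$2" max_pct="${3:-80}"
  awk -v l="$load" -v c="$cores" -v m="$max_pct" \
    'BEGIN { exit !(l / c * 100 < m) }'
}

# On Linux the inputs could come from /proc/loadavg and nproc:
# load_within_limit "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```

With a load average of 1.5 on a 4-core host the ratio is 37.5%, so the check passes; at 3.9 it is 97.5% and the check fails.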
### Step 1: Notify Stakeholders & Prepare (2 minutes)

```bash
# Send deployment notification to monitoring/alerting
# Notify users of upcoming maintenance window (if necessary)

# Example notification:
cat > /tmp/deployment_notice.txt << 'EOF'
DEPLOYMENT IN PROGRESS
Time: 2025-12-08 02:00 UTC
Duration: ~30 minutes
Services: Will be briefly unavailable (~2 minutes for DB migration)
Impact: All users affected during migration window
Status Page: https://status.navidocs.app
EOF

# Post to Slack/Teams if integrated (Slack incoming webhooks expect a JSON
# body, so the plain-text notice is wrapped in {"text": ...} first)
# jq -Rs '{text: .}' /tmp/deployment_notice.txt | \
#   curl -X POST -H 'Content-type: application/json' \
#   --data @- \
#   https://hooks.slack.com/services/YOUR/WEBHOOK/URL
```

### Step 2: Stop Background Workers (3 minutes)

```bash
# CRITICAL: Stop workers first to prevent job processing during migration
pm2 stop navidocs-worker

# Verify workers are stopped
pm2 list | grep navidocs-worker
# Should show: stopped

# Wait for any in-flight jobs to complete (max 2 minutes)
sleep 120

# Check for any stuck jobs
redis-cli LLEN navidocs:queue:default

# If queue length > 0, wait additional 30 seconds
# redis-cli LLEN navidocs:queue:default
```

**Worker Stop Checklist:**
- [ ] navidocs-worker process stopped
- [ ] No new jobs being queued
- [ ] In-flight jobs completed or timed out
- [ ] Queue is empty or nearly empty

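
The fixed `sleep 120` plus manual re-check in Step 2 could be replaced by a polling loop that returns as soon as the queue is empty. This is a sketch only: `wait_for_empty_queue` and `queue_length` are hypothetical names, and the queue key is the one used in the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a command that prints the queue length until it
# reports 0, or give up after roughly timeout seconds.
wait_for_empty_queue() {
  local length_cmd="$1" timeout="${2:-150}" interval="${3:-5}"
  local waited=0 length
  while :; do
    length=$("$length_cmd")
    # An empty reading (e.g. Redis unreachable) is treated as "not drained"
    if [ "${length:-1}" -eq 0 ]; then
      return 0
    fi
    if [ "$waited" -ge "$timeout" ]; then
      echo "Queue still holds ${length} job(s) after ${waited}s" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}

# Queue-length source matching the redis-cli command used in this step
queue_length() { redis-cli LLEN navidocs:queue:default; }

# Example call (150s = the 120s wait plus the extra 30s mentioned above):
# wait_for_empty_queue queue_length 150 5
```

Passing the length command as an argument keeps the loop testable without a running Redis instance.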
### Step 3: Create Production Backup (5 minutes)

```bash
# Create timestamped backup with full verification
BACKUP_TIMESTAMP=$(date +%Y%m%d-%H%M%S)
BACKUP_DIR=/var/www/navidocs/backups
BACKUP_FILE="${BACKUP_DIR}/navidocs.db.backup-${BACKUP_TIMESTAMP}"

# Backup with file locking (SQLite safe copy)
sqlite3 /var/www/navidocs/navidocs.db ".backup '${BACKUP_FILE}'"

# Verify backup size
BACKUP_SIZE=$(du -s "${BACKUP_FILE}" | cut -f1)
ORIGINAL_SIZE=$(du -s /var/www/navidocs/navidocs.db | cut -f1)

echo "Original DB: ${ORIGINAL_SIZE}KB"
echo "Backup File: ${BACKUP_SIZE}KB"

# Verify backup integrity (attempt to query)
BACKUP_TABLES=$(sqlite3 "${BACKUP_FILE}" ".tables" 2>/dev/null | wc -w)
ORIGINAL_TABLES=$(sqlite3 /var/www/navidocs/navidocs.db ".tables" 2>/dev/null | wc -w)

echo "Original tables: ${ORIGINAL_TABLES}"
echo "Backup tables: ${BACKUP_TABLES}"

if [ "${BACKUP_TABLES}" -ne "${ORIGINAL_TABLES}" ]; then
  echo "ERROR: Backup verification failed!"
  exit 1
fi

# Keep only last 5 backups (clean up old ones)
cd "${BACKUP_DIR}"
ls -t navidocs.db.backup-* | tail -n +6 | xargs rm -f

echo "Backup created successfully: ${BACKUP_FILE}"
```

**Backup Verification Checklist:**
- [ ] Backup file created
- [ ] Backup size reasonable (within 90-110% of original)
- [ ] Backup integrity verified (same table count)
- [ ] Old backups cleaned up (keeping last 5)
- [ ] Backup timestamp recorded for rollback

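
The script in Step 3 prints both sizes but leaves the 90-110% comparison from the checklist to the operator. A minimal sketch of that comparison follows; `size_within_tolerance` is a hypothetical helper, and it takes the two kilobyte values produced by the `du -s` calls.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the backup size is within 90-110% of
# the original size. Both arguments are integers in the same unit.
size_within_tolerance() {
  local original_kb="$1" backup_kb="$2"
  [ "$original_kb" -gt 0 ] || return 1
  # Integer percentage of the original, rounded down
  local ratio=$(( backup_kb * 100 / original_kb ))
  [ "$ratio" -ge 90 ] && [ "$ratio" -le 110 ]
}
```

It would slot in after the two `echo` lines, for example `size_within_tolerance "${ORIGINAL_SIZE}" "${BACKUP_SIZE}" || { echo "ERROR: backup size out of range"; exit 1; }`.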
### Step 4: Deploy Code (8 minutes)

```bash
# Navigate to production directory
cd /var/www/navidocs

# Fetch latest code from repository
git fetch origin main
git status
# Should show "Your branch is behind 'origin/main'"

# Review changes before merging
git diff HEAD origin/main --stat
# Shows files changed

# Checkout main and pull (assuming CI/CD passed)
git checkout main
git pull origin main

# Expected: "Fast-forward" message

# Verify deployment branch
git log -1 --oneline
# Should match the release commit hash
```

**Code Deployment Checklist:**
- [ ] git fetch successful
- [ ] Changes reviewed (diff --stat)
- [ ] No merge conflicts
- [ ] Correct branch deployed (main)
- [ ] Deployment commit hash recorded

### Step 5: Install/Update Dependencies (4 minutes)

```bash
# Install production dependencies only
npm install --production

# Verify installation
npm list --depth=0
# Should show all required packages

# Check for any installation errors
npm ls --all 2>&1 | grep -i "error\|unmet"

# If errors found, investigate before proceeding
```

**Dependency Installation Checklist:**
- [ ] npm install completes without errors
- [ ] No peer dependency warnings
- [ ] node_modules directory created
- [ ] package-lock.json consistent

### Step 6: Build Application (3 minutes)

```bash
# Build frontend/backend assets if applicable; abort the deployment if the
# build fails (the exit status must be tested on the build command itself)
if ! npm run build; then
  echo "Build failed! Rolling back..."
  # ORIG_HEAD is the commit that was checked out before the pull in Step 4
  git reset --hard ORIG_HEAD
  npm install --production
  exit 1
fi

# Verify build output
ls -la dist/
# Should contain compiled assets

# Check build size (ensure no unexpected bloat)
du -sh dist/
# Should be <50MB for typical Node.js app
```

**Build Verification Checklist:**
- [ ] Build completes successfully
- [ ] Dist directory created with assets
- [ ] Build size reasonable (<50MB)
- [ ] No build warnings (or documented)

### Step 7: Run Database Migrations (5 minutes) - CRITICAL

```bash
# List pending migrations
npm run migrate:status

# Expected output showing 5 new migrations:
# Pending migrations:
# 1. migrations/20251113_add_warranty_tracking.sql
# 2. migrations/20251113_add_webhooks.sql
# 3. migrations/20251113_add_sale_workflows.sql
# 4. migrations/20251113_add_notification_templates.sql
# 5. migrations/20251120_add_home_assistant_config.sql

# Apply migrations (this is the brief downtime window ~2 minutes).
# The exit status is tested on migrate:up itself so a failure triggers rollback.
echo "=== MIGRATION START TIME: $(date) ==="
if ! npm run migrate:up; then
  echo "ERROR: Migration failed! Rolling back..."
  npm run migrate:down
  exit 1
fi

# Expected output:
# Running migration: 20251113_add_warranty_tracking.sql
# Running migration: 20251113_add_webhooks.sql
# Running migration: 20251113_add_sale_workflows.sql
# Running migration: 20251113_add_notification_templates.sql
# Running migration: 20251120_add_home_assistant_config.sql
# ✓ All migrations completed successfully
echo "=== MIGRATION END TIME: $(date) ==="

# Verify migration success
sqlite3 /var/www/navidocs/navidocs.db ".schema warranty_tracking"
# Should output warranty_tracking schema
```

**Migration Verification Checklist:**
- [ ] All migrations listed (npm run migrate:status)
- [ ] Migration execution successful
- [ ] New tables created (verify with sqlite3 .schema)
- [ ] New indexes created
- [ ] Data integrity maintained (row counts match)

### Step 8: Restart API Server (2 minutes)

```bash
# Restart the API with graceful shutdown (a restart also starts from a
# fresh Node.js module cache, so no separate cache-clearing step is needed)
pm2 restart navidocs-api --wait-ready --listen-timeout 5000

# Verify API is running
pm2 list | grep navidocs-api
# Should show: "online"

# Wait for server to be ready (health check)
RETRY_COUNT=0
MAX_RETRIES=30 # 30 * 2 seconds = 60 seconds max wait

while [ $RETRY_COUNT -lt $MAX_RETRIES ]; do
  if curl -sf http://localhost:3000/api/health > /dev/null; then
    echo "✓ API server is responding to health checks"
    break
  fi
  RETRY_COUNT=$((RETRY_COUNT+1))
  echo "Waiting for API server... ($RETRY_COUNT/$MAX_RETRIES)"
  sleep 2
done

if [ $RETRY_COUNT -eq $MAX_RETRIES ]; then
  echo "ERROR: API server failed to start!"
  exit 1
fi
```

**API Server Startup Checklist:**
- [ ] Process restarted (pm2 restart)
- [ ] Process shows "online" status
- [ ] Health check endpoint returns 200
- [ ] No errors in logs (pm2 logs)

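
The health-check wait in Step 8 is repeated almost verbatim in the rollback script, so it may be worth factoring into a reusable function. The sketch below is one way to do that; `retry_until` is a hypothetical helper, not an existing NaviDocs script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command up to max_attempts times, sleeping
# delay seconds between tries. Succeeds as soon as the command succeeds.
retry_until() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "Attempt ${attempt}/${max_attempts} failed" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 1
}

# Equivalent of the loop above (30 tries, 2 seconds apart):
# retry_until 30 2 curl -sf -o /dev/null http://localhost:3000/api/health \
#   || { echo "ERROR: API server failed to start!"; exit 1; }
```

Because the command is passed as arguments rather than a string, quoting of the URL and flags is preserved without `eval`.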
### Step 9: Restart Background Workers (2 minutes)

```bash
# Restart workers with the new code
pm2 restart navidocs-worker --wait-ready --listen-timeout 5000

# Verify worker is running
pm2 list | grep navidocs-worker
# Should show: "online"

# Check worker logs for startup messages
pm2 logs navidocs-worker --lines 10 --nostream
# Should show "Worker started" messages

# Monitor queue for 30 seconds (verify jobs are being processed)
for i in {1..15}; do
  QUEUE_SIZE=$(redis-cli LLEN navidocs:queue:default 2>/dev/null || echo "0")
  echo "Queue size: $QUEUE_SIZE (check $i/15)"
  sleep 2
done
```

**Worker Startup Checklist:**
- [ ] Process restarted (pm2 restart)
- [ ] Process shows "online" status
- [ ] No errors in logs
- [ ] Jobs being processed from queue

---
## Part 3: Post-Deployment Validation (10 minutes)

### A. Health Check (2 minutes)

```bash
# 1. Health endpoint
curl -v http://localhost:3000/api/health

# Expected response:
# HTTP/1.1 200 OK
# Content-Type: application/json
# {
#   "status": "ok",
#   "timestamp": "2025-12-08T02:35:00Z",
#   "database": "connected",
#   "redis": "connected",
#   "workers": "running"
# }
```

**Health Check Criteria:**
- [ ] HTTP 200 response
- [ ] All services showing as "connected" or "running"
- [ ] No error messages in response

### B. Critical Endpoint Tests (3 minutes)

```bash
# Test authentication endpoint and capture the token used by the calls below
LOGIN_RESPONSE=$(curl -s -X POST http://localhost:3000/api/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email":"demo@navidocs.app","password":"test"}')
echo "${LOGIN_RESPONSE}" | jq '.'
AUTH_TOKEN=$(echo "${LOGIN_RESPONSE}" | jq -r '.token')

# Expected: { "token": "...", "user": {...} }
# HTTP 200 or 401 (depending on demo account)

# Test boat listing endpoint
curl -H "Authorization: Bearer ${AUTH_TOKEN}" \
  http://localhost:3000/api/boats \
  | jq '.length'

# Expected: Numeric count (could be 0 if no boats)

# Test warranty endpoint
curl -H "Authorization: Bearer ${AUTH_TOKEN}" \
  http://localhost:3000/api/warranties/expiring \
  | jq '.'

# Expected: Array of warranties (could be empty [])

# Test warranty creation
curl -X POST -H "Authorization: Bearer ${AUTH_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "boat_id":"test-boat",
    "item_name":"Engine",
    "purchase_date":"2023-01-15",
    "warranty_period_months":24
  }' \
  http://localhost:3000/api/warranties

# Expected: { "id": "...", "expiration_date": "2025-01-15" }
```

**Endpoint Test Checklist:**
- [ ] /api/health returns 200
- [ ] /api/auth/login responds (200 or 401)
- [ ] /api/boats returns data or empty array
- [ ] /api/warranties/expiring returns array
- [ ] POST /api/warranties creates warranty successfully

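
The expected `expiration_date` in the warranty creation test is the purchase date plus the warranty period. The sketch below reproduces that arithmetic so the expected value can be derived for other inputs. It assumes GNU `date` (the `-d` option is not available in BSD/macOS `date`), and `warranty_expiration` is a hypothetical helper; how the API itself computes the date is not shown in this runbook.

```shell
#!/usr/bin/env bash
# Hypothetical helper: purchase date (YYYY-MM-DD) plus N months, printed as
# YYYY-MM-DD. Requires GNU date for the -d relative-date syntax.
warranty_expiration() {
  local purchase_date="$1" months="$2"
  date -d "${purchase_date} +${months} months" +%F
}

# Inputs from the warranty creation request above
warranty_expiration 2023-01-15 24
```

For the request body shown above this yields 2025-01-15, matching the expected response.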
### C. Database Verification (2 minutes)

```bash
# Verify all new tables exist
sqlite3 /var/www/navidocs/navidocs.db << 'EOF'
.mode column
.headers on

-- Check warranty_tracking table
SELECT COUNT(*) as warranty_count FROM warranty_tracking;

-- Check webhooks table
SELECT COUNT(*) as webhook_count FROM webhooks;

-- Check sale_workflows table
SELECT COUNT(*) as sale_count FROM sale_workflows;

-- Check notification_templates table
SELECT COUNT(*) as template_count FROM notification_templates;

-- Verify indexes created
SELECT COUNT(*) as index_count FROM sqlite_master
WHERE type='index' AND tbl_name IN (
  'warranty_tracking', 'webhooks', 'sale_workflows'
);
EOF

# Expected output:
# warranty_count: 0 (or >0 if test data inserted)
# webhook_count: 0
# sale_count: 0
# template_count: >0 (seed templates inserted)
# index_count: >5 (all required indexes)
```

**Database Verification Checklist:**
- [ ] warranty_tracking table exists
- [ ] webhooks table exists
- [ ] sale_workflows table exists
- [ ] notification_templates table exists
- [ ] All indexes created successfully

### D. Smoke Tests (3 minutes)

```bash
# Run critical smoke tests
npm run test:smoke

# Expected output:
# PASS smoke-tests/warranty-creation.spec.js
# PASS smoke-tests/webhook-delivery.spec.js
# PASS smoke-tests/notification-sending.spec.js
# PASS smoke-tests/database-operations.spec.js
# ============================================
# Smoke Tests: 4 passed, 4 total
# Duration: 1m 30s

# If smoke tests fail, check logs:
pm2 logs navidocs-api --lines 50
```

**Smoke Test Criteria:**
- [ ] All smoke tests pass
- [ ] No timeout errors
- [ ] No database connectivity errors
- [ ] No authentication errors

### E. Error Rate & Logs Monitoring (Continuous for 30 minutes)

```bash
# Monitor application logs for errors
pm2 logs navidocs-api --lines 20

# Monitor worker logs for failed jobs
pm2 logs navidocs-worker --lines 20

# Check error rate in monitoring system
# Example query (if using Sentry/New Relic):
# SELECT COUNT(*) FROM errors WHERE timestamp > now() - 30 minutes

# Alert if:
# - Error rate > 1% of requests
# - Any critical errors in logs
# - Worker jobs consistently failing

# If issues detected:
# 1. Check logs for root cause
# 2. If severe, proceed to ROLLBACK
# 3. If minor, create incident ticket for next sprint
```

**Log Monitoring Checklist:**
- [ ] No critical errors in logs
- [ ] Error rate <1% of requests
- [ ] Worker processing jobs successfully
- [ ] No database connection errors
- [ ] No memory leaks (consistent RAM usage)

---
## Part 4: Rollback Procedure (Emergency Recovery)

### When to Rollback

Initiate rollback immediately if:
- API server won't start (after 5 minutes)
- Database migrations fail
- Health check endpoints fail
- Critical business logic broken
- Error rate >5% of requests
- Database corrupted or locked

**Do NOT rollback for:**
- Minor UI bugs
- Non-critical feature failures
- Cosmetic issues
- Warnings in logs (errors must be critical)

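
The error-rate trigger above (more than 5% of requests failing) is simple arithmetic on two counters. The sketch below shows that calculation; `error_rate_exceeds` is a hypothetical helper, and the error and request counts are assumed to come from whatever monitoring system is in use.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when errors / requests, as a percentage,
# is strictly greater than max_pct. No traffic counts as "not exceeded".
error_rate_exceeds() {
  local errors="$1" requests="$2" max_pct="$3"
  [ "$requests" -gt 0 ] || return 1
  awk -v e="$errors" -v r="$requests" -v m="$max_pct" \
    'BEGIN { exit !(e / r * 100 > m) }'
}

# 60 errors out of 1000 requests is 6%, above the 5% rollback threshold
if error_rate_exceeds 60 1000 5; then
  echo "Error rate above 5%: initiate rollback"
fi
```

The same function covers the 1% alerting threshold from Part 3 by passing `1` as the third argument.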
### Rollback Steps (Automated Script)

```bash
#!/bin/bash
# File: /var/www/navidocs/scripts/rollback.sh
# Emergency rollback script
# Usage: rollback.sh [previous-commit]   (defaults to HEAD~1)

set -e # Exit on any error

APP_DIR=/var/www/navidocs
BACKUP_DIR=${APP_DIR}/backups
# Minimum table count for a backup to be accepted; adjust to the
# pre-deployment schema (the sample listing in Part 1 shows 7 tables)
MIN_TABLES=${MIN_TABLES:-10}

cd "${APP_DIR}" # git commands below must run inside the repository

ROLLBACK_TIME=$(date -u +%Y-%m-%dT%H:%M:%SZ)
CURRENT_VERSION=$(git rev-parse --short HEAD)

echo "============================================"
echo "EMERGENCY ROLLBACK INITIATED"
echo "Time: $ROLLBACK_TIME"
echo "Current Version: $CURRENT_VERSION"
echo "============================================"

# Step 1: Stop all services
echo "Step 1: Stopping services..."
pm2 stop navidocs-api navidocs-worker
sleep 3

# Step 2: Find most recent backup
echo "Step 2: Finding latest backup..."
LATEST_BACKUP=$(ls -t "${BACKUP_DIR}"/navidocs.db.backup-* 2>/dev/null | head -1)

if [ -z "$LATEST_BACKUP" ]; then
  echo "ERROR: No backup found! Manual recovery required."
  exit 1
fi

echo "Using backup: $LATEST_BACKUP"

# Step 3: Verify backup before restore
echo "Step 3: Verifying backup integrity..."
BACKUP_TABLES=$(sqlite3 "$LATEST_BACKUP" ".tables" 2>/dev/null | wc -w)
if [ "$BACKUP_TABLES" -lt "$MIN_TABLES" ]; then
  echo "ERROR: Backup appears corrupted (only $BACKUP_TABLES tables)"
  exit 1
fi

# Step 4: Restore database
echo "Step 4: Restoring database from backup..."
# Remove stale WAL/SHM sidecar files (present only in WAL mode) so they
# are not applied on top of the restored database
rm -f "${APP_DIR}/navidocs.db-wal" "${APP_DIR}/navidocs.db-shm"
cp "$LATEST_BACKUP" "${APP_DIR}/navidocs.db"

# Verify restore
RESTORED_TABLES=$(sqlite3 "${APP_DIR}/navidocs.db" ".tables" 2>/dev/null | wc -w)
echo "Restored database has $RESTORED_TABLES tables"

# Step 5: Revert code to previous version
# HEAD~1 is only correct when the deployment added a single commit; pass the
# commit hash recorded before deployment as the first argument otherwise
echo "Step 5: Reverting code..."
PREVIOUS_VERSION=${1:-$(git rev-parse HEAD~1)}
git reset --hard "$PREVIOUS_VERSION"

# Step 6: Reinstall dependencies
echo "Step 6: Reinstalling dependencies..."
npm install --production

# Step 7: Restart services
echo "Step 7: Restarting services..."
pm2 start navidocs-api navidocs-worker

# Step 8: Health check
echo "Step 8: Verifying services..."
sleep 5
pm2 list

# Step 9: Final verification
echo "Step 9: Running health checks..."
RETRY_COUNT=0
while [ $RETRY_COUNT -lt 30 ]; do
  if curl -sf http://localhost:3000/api/health > /dev/null; then
    echo "✓ Rollback successful - API is responding"
    break
  fi
  RETRY_COUNT=$((RETRY_COUNT+1))
  sleep 2
done

if [ $RETRY_COUNT -eq 30 ]; then
  echo "ERROR: Rollback failed - API not responding"
  exit 1
fi

echo "============================================"
echo "ROLLBACK COMPLETE"
echo "Rolled Back From: $CURRENT_VERSION"
echo "Rolled Back To: $PREVIOUS_VERSION"
echo "Database Restored From: $LATEST_BACKUP"
echo "Time: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
echo "============================================"

# Send notification to Slack/email
# curl -X POST ... # notification code
```

### Manual Rollback (If Automated Fails)

```bash
# 1. Stop services
pm2 stop navidocs-api navidocs-worker

# 2. Restore database (replace TIMESTAMP with actual backup timestamp)
TIMESTAMP="20251208-020000" # Example from backup
cp /var/www/navidocs/backups/navidocs.db.backup-${TIMESTAMP} \
  /var/www/navidocs/navidocs.db

# 3. Verify database integrity
sqlite3 /var/www/navidocs/navidocs.db ".tables"

# 4. Revert code
cd /var/www/navidocs
git log --oneline -5 # Find previous good commit
git reset --hard <previous-commit-hash>

# 5. Reinstall dependencies
npm install --production

# 6. Restart services
pm2 start navidocs-api navidocs-worker

# 7. Monitor logs
pm2 logs navidocs-api --lines 50
pm2 logs navidocs-worker --lines 50

# 8. Verify health
curl http://localhost:3000/api/health
```

**Rollback Verification Checklist:**
- [ ] Services stopped cleanly
- [ ] Database restored from backup
- [ ] Database integrity verified
- [ ] Code reverted to previous version
- [ ] Dependencies reinstalled
- [ ] Services restarted successfully
- [ ] Health check passes
- [ ] No errors in logs

### Post-Rollback Actions

```bash
# After rollback is complete and verified:

# 1. Document the incident
cat > /tmp/rollback_incident.log << 'EOF'
ROLLBACK INCIDENT REPORT
Date: 2025-12-08
Time: 02:35 UTC
Duration: 25 minutes
Reason: [Root cause analysis]
Version Rolled Back From: [commit hash]
Version Restored To: [commit hash]
Data Loss: [None if no writes occurred after the backup; otherwise describe]
Actions Taken: [List all steps]
Root Cause: [Analysis]
Prevention: [How to avoid in future]
EOF

# 2. Notify team
# Email incident report to team
# Post to incident channel in Slack

# 3. Create post-mortem ticket
# Add to sprint backlog: "Post-mortem: Deployment failure on 2025-12-08"

# 4. Review deployment process
# Schedule review meeting for next day
# Document lessons learned
```

---
## Part 5: Monitoring & Support

### Real-Time Monitoring Dashboard

**Tools to Monitor:**
1. **Error Tracking:** Sentry/New Relic
   - Alert if error rate >1% within 5 minutes
   - Critical errors require immediate investigation

2. **Performance Monitoring:** New Relic/Datadog
   - API response time <200ms (p95)
   - Database query time <100ms (p95)
   - Worker job processing time <5s (p95)

3. **Infrastructure Monitoring:** CloudWatch/Datadog
   - CPU usage <80%
   - Memory usage <85%
   - Disk usage <90%
   - Network throughput normal

4. **Application Logs:** PM2/ELK Stack
   - Check for "ERROR" and "CRITICAL" messages
   - Monitor for "OutOfMemory" warnings
   - Check for "Database locked" errors

### Incident Response

**If Issues Detected During First 30 Minutes:**

```bash
# Immediate steps:
# 1. Check if issue is configuration (env var, network) or code
pm2 logs navidocs-api --lines 100
pm2 logs navidocs-worker --lines 100

# 2. If quick fix available (< 5 minutes):
#    - Apply fix
#    - Restart services
#    - Monitor for 10 minutes

# 3. If issue is critical or fix takes >5 minutes:
#    - Execute rollback (see Part 4)
#    - Create incident ticket
#    - Plan hotfix for next deployment

# 4. If issue is intermittent:
#    - Monitor for 15 additional minutes
#    - Check system resources (memory, disk, CPU)
#    - If issue persists, rollback
```

### Deployment Success Criteria

**Deployment is SUCCESSFUL if:**
- [ ] All tests pass (unit, integration, E2E)
- [ ] Deployment completes without errors
- [ ] All health checks pass
- [ ] All smoke tests pass
- [ ] Error rate <0.1% during first 24 hours
- [ ] No critical issues in logs
- [ ] Database integrity verified
- [ ] All new features working as expected

**Deployment is FAILED if:**
- [ ] Tests fail before deployment
- [ ] Deployment process errors
- [ ] Health checks fail after deployment
- [ ] Smoke tests fail
- [ ] Error rate >1% during first hour
- [ ] Critical errors in logs
- [ ] Database corruption detected
- [ ] Rollback required

---
## Appendix A: Quick Reference

### Deployment Timeline
```
Pre-Deployment Checks:       5 min (tests, backups)
Stakeholder Notification:    2 min
Stop Workers:                3 min
Database Backup:             5 min
Code Deploy:                 8 min
Dependencies:                4 min
Build:                       3 min
Migrations:                  5 min
API Restart:                 2 min
Worker Restart:              2 min
─────────────────────────────
TOTAL DOWNTIME:              ~2 min (migration window)
TOTAL TIME:                  ~39 min (with all steps)

Post-Deployment Validation:  10 min
Monitoring Period:           30 min (continuous)
```

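
The ~39 minute total in the timeline is the sum of the ten step estimates. A small sketch makes that arithmetic explicit and easy to redo if an estimate changes; the array simply restates the durations listed above.

```shell
#!/usr/bin/env bash
# Step durations in minutes, in timeline order: pre-deployment checks,
# notification, stop workers, backup, code deploy, dependencies, build,
# migrations, API restart, worker restart
step_minutes=(5 2 3 5 8 4 3 5 2 2)

total=0
for minutes in "${step_minutes[@]}"; do
  total=$((total + minutes))
done

echo "Total: ${total} min"
```

This prints `Total: 39 min`, which falls inside the 30-45 minute estimate given in the Executive Summary.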
### Critical Commands

```bash
# Health check
curl http://localhost:3000/api/health

# View logs
pm2 logs navidocs-api
pm2 logs navidocs-worker

# View process status
pm2 list

# Restart services
pm2 restart navidocs-api navidocs-worker

# Emergency rollback
/var/www/navidocs/scripts/rollback.sh

# Database backup
sqlite3 /var/www/navidocs/navidocs.db ".backup '/var/www/navidocs/backups/backup.db'"

# Check queue size
redis-cli LLEN navidocs:queue:default
```

### Emergency Contacts

```
Tech Lead: [Name/Email]
DevOps: [Name/Email]
On-Call: [Phone/Email]
Incident Channel: #incident-response (Slack)
```

---
## Appendix B: Testing Checklist Template

Use this template for deployment day:

```bash
#!/bin/bash
# Pre-Deployment Checklist - Copy and use on deployment day

# -u so the values match the "UTC" label printed in the sign-off
DEPLOYMENT_DATE=$(date -u +%Y-%m-%d)
DEPLOYMENT_TIME=$(date -u +%H:%M:%S)

echo "NaviDocs Deployment Checklist"
echo "Date: $DEPLOYMENT_DATE"
echo "Time: $DEPLOYMENT_TIME"
echo "========================================"

# Tests
echo "[ ] Unit tests passing"
echo "[ ] Integration tests passing"
echo "[ ] E2E tests passing"
echo "[ ] Security audit passed"

# Backups
echo "[ ] Database backup created"
echo "[ ] Backup verified"

# Environment
echo "[ ] .env.production configured"
echo "[ ] SSL certificate valid"
echo "[ ] Secrets in vault, not in code"

# Deployment
echo "[ ] Code reviewed and approved"
echo "[ ] Dependencies check passed"
echo "[ ] Build successful"
echo "[ ] Migrations ready"

# Deployment Steps
echo "[ ] Pre-deployment checks complete"
echo "[ ] Workers stopped"
echo "[ ] Database backed up"
echo "[ ] Code deployed"
echo "[ ] Dependencies installed"
echo "[ ] Build completed"
echo "[ ] Migrations applied"
echo "[ ] API restarted"
echo "[ ] Workers restarted"

# Post-Deployment
echo "[ ] Health checks pass"
echo "[ ] Smoke tests pass"
echo "[ ] Critical endpoints responding"
echo "[ ] Database verified"
echo "[ ] No errors in logs"

# Sign-Off
echo "========================================"
echo "Deployed by: [Your Name]"
echo "Approved by: [Tech Lead Name]"
echo "Timestamp: $DEPLOYMENT_DATE $DEPLOYMENT_TIME UTC"
```

---
**Document Status:** Ready for Phase 2 Synthesis
**Next Steps:** Await completion of agents S4-H01 through S4-H09, then synthesize all outputs in `session-4-handoff.md`