
OpenWebUI CLI - Comprehensive Code Evaluation Prompt

Use this prompt with Claude, GPT-4, or any LLM code assistant to perform a thorough evaluation of the OpenWebUI CLI implementation.


Your Mission

You are an expert code reviewer evaluating the OpenWebUI CLI project. Your task is to perform a comprehensive technical assessment covering architecture, code quality, RFC compliance, security, testing, and production readiness.

Repository: https://github.com/dannystocker/openwebui-cli
Local Path: /home/setup/openwebui-cli/
RFC Document: /home/setup/openwebui-cli/docs/RFC.md
Current Status: v0.1.0 MVP - Alpha development


Evaluation Framework

Assess the implementation across 10 critical dimensions, providing both qualitative analysis and quantitative scores (0-10 scale).


1. ARCHITECTURE & DESIGN QUALITY

Assessment Criteria

Modularity (0-10):

  • Clear separation of concerns (commands, client, config, errors)
  • Minimal coupling between modules
  • Appropriate abstraction levels
  • Extensibility for future features

Code Structure (0-10):

  • Logical file organization (package layout)
  • Consistent naming conventions
  • Appropriate use of OOP vs functional patterns
  • Dependencies are well-managed (pyproject.toml)

Error Handling (0-10):

  • Comprehensive exception handling
  • Meaningful error messages
  • Proper exit codes (0-5 defined)
  • Graceful degradation
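
To calibrate the exit-code criterion, here is a minimal sketch of an exception-to-exit-code mapping; every name in it (CLIError, the EXIT_* constants, the specific 0-5 assignment) is an illustrative assumption, not taken from the repo's errors.py:

import sys

# Hypothetical exit codes; verify the actual 0-5 mapping against errors.py.
EXIT_OK, EXIT_USAGE, EXIT_AUTH, EXIT_API, EXIT_NETWORK, EXIT_CONFIG = range(6)

class CLIError(Exception):
    """Base class carrying an exit code plus a user-facing message."""
    exit_code = EXIT_API

class AuthError(CLIError):
    exit_code = EXIT_AUTH

def run() -> None:
    # Stand-in for real command dispatch.
    raise AuthError("token expired; try: openwebui auth login")

def main() -> int:
    try:
        run()
    except CLIError as exc:
        print(f"error: {exc}", file=sys.stderr)
        return exc.exit_code
    return EXIT_OK

if __name__ == "__main__":
    sys.exit(main())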

Tasks:

  1. Map the directory structure - is it intuitive?
  2. Check openwebui_cli/ package layout - any red flags?
  3. Review errors.py - comprehensive coverage?
  4. Assess http_client.py - proper abstraction?

Score: __/30


2. RFC COMPLIANCE (v1.2)

Reference: /home/setup/openwebui-cli/docs/RFC.md

Core Features Implemented (0-10):

  • Authentication (login, logout, whoami, token storage)
  • Chat (send, streaming, continue conversation)
  • RAG (files upload, collections, search)
  • Models (list, info)
  • Config (init, show, profiles)
  • Admin commands (stats, diagnostics)

22-Step Implementation Checklist (0-10): Cross-reference the RFC's implementation checklist:

  1. Are all 22 steps addressed?
  2. Which steps are incomplete?
  3. Are there deviations from the RFC design?

CLI Interface Match (0-10): Compare actual commands vs RFC specification:

# Run these commands and verify against RFC
openwebui --help
openwebui auth --help
openwebui chat --help
openwebui rag --help
openwebui models --help
openwebui config --help
openwebui admin --help

Tasks:

  1. Read RFC.md thoroughly
  2. Check each command group exists
  3. Verify arguments match RFC specification
  4. Identify missing features

Score: __/30


3. CODE QUALITY & BEST PRACTICES

Python Standards (0-10)

Type Hints:

mypy openwebui_cli --strict
  • 100% type coverage on public APIs?
  • Proper use of Optional, Union, Generic?
  • Any mypy errors/warnings?

Code Style:

ruff check openwebui_cli
  • PEP 8 compliant?
  • Consistent formatting?
  • Any linting violations?

Documentation:

  • Docstrings on all public functions/classes?
  • Docstrings follow Google or NumPy style?
  • Inline comments where necessary?

Security Best Practices (0-10)

Authentication Storage:

  • Tokens stored securely (keyring integration)?
  • No hardcoded credentials?
  • Proper handling of secrets (env vars, config)?
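
When reviewing how tokens are stored, a minimal keyring sketch is a useful reference point; the service name and helper functions here are assumptions, not the repo's actual code:

from typing import Optional

import keyring
import keyring.errors

SERVICE = "openwebui-cli"  # hypothetical service name

def store_token(profile: str, token: str) -> None:
    # The OS keyring (Keychain, Secret Service, Windows Credential Manager)
    # keeps the token out of plaintext config files.
    keyring.set_password(SERVICE, profile, token)

def load_token(profile: str) -> Optional[str]:
    return keyring.get_password(SERVICE, profile)

def clear_token(profile: str) -> None:
    try:
        keyring.delete_password(SERVICE, profile)
    except keyring.errors.PasswordDeleteError:
        pass  # nothing was stored for this profile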

Input Validation:

  • User inputs sanitized?
  • API responses validated before use?
  • File paths properly validated (no path traversal)?
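
For the path-traversal criterion, this is the shape of check to look for in the upload code (function name and policy are illustrative; Path.is_relative_to requires Python 3.9+):

from pathlib import Path

def validated_upload_path(user_path: str, allowed_root: Path) -> Path:
    # Resolve symlinks and ".." segments first, then confirm the result
    # still lives under the allowed root.
    resolved = Path(user_path).resolve()
    root = allowed_root.resolve()
    if not resolved.is_file():
        raise ValueError(f"not a file: {user_path}")
    if not resolved.is_relative_to(root):
        raise ValueError(f"path escapes {root}: {user_path}")
    return resolved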

Dependencies:

pip-audit  # Check for known vulnerabilities
  • All dependencies up-to-date?
  • No known CVEs?

Performance (0-10)

Efficiency:

  • Streaming properly implemented (not buffering entire response)?
  • No unnecessary API calls?
  • Appropriate use of caching?

Resource Management:

  • File handles properly closed?
  • HTTP connections reused (session)?
  • Memory leaks avoided?
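
As a reference while reviewing chat.py and http_client.py, here is a sketch of non-buffering streaming over a reused session; the endpoint path and payload shape are assumptions to verify against the code, and SSE/JSON-chunk parsing is omitted:

import requests

session = requests.Session()  # one connection pool, reused across requests

def stream_chat(base_url: str, token: str, payload: dict) -> None:
    # stream=True plus iter_lines prints tokens as they arrive instead of
    # buffering the whole body in memory.
    with session.post(
        f"{base_url}/api/v1/chat/completions",
        json={**payload, "stream": True},
        headers={"Authorization": f"Bearer {token}"},
        stream=True,
        timeout=(5, 120),  # (connect, read) seconds
    ) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if line:
                print(line.decode("utf-8"), flush=True)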

Tasks:

  1. Run mypy openwebui_cli --strict - capture output
  2. Run ruff check openwebui_cli - any violations?
  3. Check auth.py - how are tokens stored?
  4. Review chat.py - is streaming efficient?

Score: __/30


4. FUNCTIONAL COMPLETENESS

Core Workflows (0-10)

Test these end-to-end workflows:

Workflow 1: First-time Setup

openwebui config init
openwebui auth login  # Interactive
openwebui auth whoami
  • Config created at correct XDG path?
  • Login prompts for username/password?
  • Token stored securely in keyring?
  • Whoami displays user info?

Workflow 2: Chat (Streaming)

openwebui chat send -m llama3.2:latest -p "Count to 10"
  • Streaming displays tokens as they arrive?
  • Ctrl-C cancels gracefully?
  • Final response saved to history?
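
For the Ctrl-C criterion, this is the behavior to look for (fake_stream is a stand-in for the real token stream):

import sys
import time

def fake_stream():
    # Stand-in for the real streaming response.
    for word in ("counting ", "to ", "ten... "):
        time.sleep(0.2)
        yield word

try:
    for chunk in fake_stream():
        print(chunk, end="", flush=True)
except KeyboardInterrupt:
    print("\n[cancelled]", file=sys.stderr)
    sys.exit(130)  # 128 + SIGINT, the conventional exit code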

Workflow 3: RAG Pipeline

openwebui rag files upload document.pdf
openwebui rag collections create "Test Docs"
openwebui chat send -m llama3.2:latest -p "Summarize doc" --file <ID>
  • File uploads successfully?
  • Collection created?
  • Chat retrieves RAG context?

Edge Cases (0-10)

Test error handling:

  • Invalid credentials (401)?
  • Network timeout (connection refused)?
  • Invalid model name (404)?
  • Malformed JSON response?
  • Disk full during file upload?
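
For the malformed-JSON case, the desired behavior is roughly the following; APIError and the message text are illustrative:

import requests

class APIError(Exception):
    pass

def parse_json(resp: requests.Response) -> dict:
    # Validate the payload before use; JSONDecodeError subclasses ValueError
    # in every json backend requests may use, so ValueError is the safe catch.
    try:
        data = resp.json()
    except ValueError as exc:
        raise APIError(f"server returned non-JSON (HTTP {resp.status_code})") from exc
    if not isinstance(data, dict):
        raise APIError("unexpected response shape (expected a JSON object)")
    return data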

Missing Features (0-10)

RFC features NOT yet implemented:

  • chat continue with conversation history?
  • --system prompt support?
  • Stdin pipe support (cat prompt.txt | openwebui chat send)?
  • --history-file loading?
  • rag search semantic search?
  • admin stats and admin diagnostics?
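
If stdin piping turns out to be missing, note that it is a small addition; a hedged sketch with flag handling simplified:

import sys
from typing import Optional

def resolve_prompt(arg_prompt: Optional[str]) -> str:
    # Prefer -p/--prompt; fall back to piped stdin so
    # `cat prompt.txt | openwebui chat send` works.
    if arg_prompt is not None:
        return arg_prompt
    if not sys.stdin.isatty():
        return sys.stdin.read().strip()
    raise SystemExit("error: no prompt given (use -p or pipe via stdin)")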

Tasks:

  1. Install CLI: pip install -e ".[dev]"
  2. Run Workflow 1, 2, 3 - document results
  3. Test 3+ error scenarios - capture behavior
  4. List ALL missing features from RFC

Score: __/30


5. API ENDPOINT ACCURACY

Verify Against OpenWebUI Source

Critical Endpoints:

| Command | Expected Endpoint | Actual Endpoint | Match? |
|---|---|---|---|
| auth login | POST /api/v1/auths/signin | ??? | ? |
| auth whoami | GET /api/v1/auths/ | ??? | ? |
| models list | GET /api/models | ??? | ? |
| chat send | POST /api/v1/chat/completions | ??? | ? |
| rag files upload | POST /api/v1/files/ | ??? | ? |
| rag collections list | GET /api/v1/knowledge/ | ??? | ? |

Tasks:

  1. Read openwebui_cli/commands/*.py files
  2. Extract API endpoints from each command
  3. Cross-reference with OpenWebUI source (if available)
  4. Flag any mismatches
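
Step 2 can be done mechanically with a heuristic grep over the command modules (rough, not exhaustive):

import re
from pathlib import Path

# Collect string literals that look like API paths from the command modules.
pattern = re.compile(r"[\"'](/api/[^\"']+)[\"']")
for py in sorted(Path("openwebui_cli").rglob("*.py")):
    for endpoint in pattern.findall(py.read_text(encoding="utf-8")):
        print(f"{py}: {endpoint}")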

Score: __/10


6. TESTING & VALIDATION

Test Coverage (0-10)

pytest tests/ -v --cov=openwebui_cli --cov-report=term-missing

Coverage Metrics:

  • Overall coverage: ___%
  • auth.py coverage: ___%
  • chat.py coverage: ___%
  • http_client.py coverage: ___%
  • config.py coverage: ___%

Target: >80% coverage for a production-ready CLI

Test Quality (0-10)

Review tests/ directory:

  • Unit tests exist for all command groups?
  • Integration tests with mocked API?
  • Error scenario tests?
  • Fixtures for common test data?
  • Clear, behavior-describing test names (e.g., a test_should_* convention)?
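
As a quality yardstick for the mocked-integration criterion, here is a minimal test using the third-party responses package; the URL and payload are assumptions:

import requests
import responses

@responses.activate
def test_should_surface_401_on_invalid_credentials():
    # Register a fake endpoint so no real server is needed.
    responses.add(
        responses.POST,
        "http://localhost:8080/api/v1/auths/signin",
        json={"detail": "invalid credentials"},
        status=401,
    )
    resp = requests.post(
        "http://localhost:8080/api/v1/auths/signin",
        json={"email": "user@example.com", "password": "wrong"},
    )
    assert resp.status_code == 401
    assert "invalid" in resp.json()["detail"]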

CI/CD (0-10)

Check for automation:

  • GitHub Actions workflow exists?
  • Tests run on every commit?
  • Linting/type checking in CI?
  • Automated releases?

Tasks:

  1. Run pytest with coverage - capture report
  2. Review test files - assess quality
  3. Check .github/workflows/ for CI config

Score: __/30


7. DOCUMENTATION QUALITY

User-Facing Docs (0-10)

README.md:

  • Clear installation instructions?
  • Comprehensive usage examples?
  • Configuration file documented?
  • Exit codes explained?
  • Links to RFC and contributing guide?

CLI Help Text:

openwebui --help
openwebui chat --help
  • Help text is clear and actionable?
  • Examples provided in --help?
  • All arguments documented?

Developer Docs (0-10)

RFC.md:

  • Design rationale explained?
  • Architecture diagrams (if applicable)?
  • Implementation checklist?
  • API endpoint mapping?

CONTRIBUTING.md:

  • Development setup guide?
  • Code style guidelines?
  • Pull request process?

Code Comments (0-10)

  • Complex logic explained with comments?
  • TODOs/FIXMEs documented?
  • API contract explained in docstrings?

Tasks:

  1. Read README.md - rate clarity (0-10)
  2. Run --help for all commands - rate usefulness
  3. Review RFC.md for completeness

Score: __/30


8. USER EXPERIENCE

CLI Ergonomics (0-10)

Intuitiveness:

  • Command names are self-explanatory?
  • Argument flags follow conventions (-m for model)?
  • Consistent flag naming across commands?

Output Formatting:

  • Readable table output (models list)?
  • Colored output for errors/success?
  • Progress indicators for long operations?
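
One common way to satisfy the table and color criteria is the third-party rich library; whether this repo actually uses it is something to verify, so treat this only as a comparison point:

from rich.console import Console
from rich.table import Table

def print_models(models: list) -> None:
    # Readable table with automatic color and width handling.
    table = Table(title="Available Models")
    table.add_column("ID")
    table.add_column("Name")
    for model in models:
        table.add_row(model["id"], model.get("name", ""))
    Console().print(table)

print_models([{"id": "llama3.2:latest", "name": "Llama 3.2"}])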

Interactive Features:

  • Password input hidden (getpass)?
  • Confirmations for destructive actions?
  • Autocomplete support (argcomplete)?
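
The hidden-password criterion usually comes down to getpass from the standard library:

import getpass

def prompt_credentials():
    # getpass suppresses terminal echo, so the password never appears
    # on screen or in shell history.
    email = input("Email: ")
    password = getpass.getpass("Password: ")
    return email, password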

Error Messages (0-10)

Test error scenarios and rate messages:

# Example: Invalid credentials
openwebui auth login  # Enter wrong password

Error Message Quality:

  • Clear description of what went wrong?
  • Actionable suggestions ("Try: openwebui auth login")?
  • Proper exit codes?
  • No stack traces shown to users (unless --debug)?

Performance Perception (0-10)

  • Startup time <500ms?
  • Streaming feels responsive (<250ms first token)?
  • No noticeable lag in interactive prompts?

Tasks:

  1. Use CLI for 5+ commands - rate intuitiveness
  2. Trigger 3+ errors - rate message quality
  3. Time startup: time openwebui --help

Score: __/30


9. PRODUCTION READINESS

Configuration Management (0-10)

Config File:

  • XDG-compliant paths (Linux/macOS)?
  • Windows support (%APPDATA%)?
  • Profile switching works?
  • Environment variable overrides?
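
A sketch of the cross-platform path resolution to check config.py against; the directory name and precedence order are assumptions:

import os
import sys
from pathlib import Path

def config_dir() -> Path:
    # XDG on Linux/macOS, %APPDATA% on Windows; env var overrides first.
    if sys.platform == "win32":
        base = Path(os.environ.get("APPDATA", str(Path.home() / "AppData" / "Roaming")))
    else:
        base = Path(os.environ.get("XDG_CONFIG_HOME", str(Path.home() / ".config")))
    return base / "openwebui"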

Deployment:

  • pyproject.toml properly configured for PyPI?
  • Dependencies pinned with version ranges?
  • Entry point (openwebui command) works?

Logging & Debugging (0-10)

  • --verbose or --debug flag?
  • Logs to file (optional)?
  • Request/response logging (for debugging)?
  • No sensitive data in logs?

Compatibility (0-10)

Python Versions:

# Check pyproject.toml
requires-python = ">=3.X"
  • Minimum Python version documented?
  • Tested on Python 3.9, 3.10, 3.11, 3.12?

Operating Systems:

  • Linux tested?
  • macOS tested?
  • Windows tested?

Tasks:

  1. Check config file creation on your OS
  2. Test profile switching
  3. Review pyproject.toml dependencies

Score: __/30


10. SECURITY AUDIT

Threat Model (0-10)

Authentication:

  • Token storage uses OS keyring (not plaintext)?
  • Tokens expire and refresh?
  • Session management secure?

Input Validation:

  • Command injection prevented?
  • Path traversal prevented (file uploads)?
  • SQL injection N/A (no direct DB access)?

Dependencies:

pip-audit
safety check
  • No known vulnerabilities?
  • Dependencies from trusted sources?

Secrets Management (0-10)

  • No API keys in code?
  • No tokens in logs?
  • Config file permissions restricted (chmod 600)?
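
For the permissions criterion, compare against this pattern, which creates the file with owner-only mode from the start rather than chmod-ing after the fact:

import os
import stat
from pathlib import Path

def write_secret_file(path: Path, content: str) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    # Open with mode 0o600 at creation so there is no window where the
    # file is world-readable. (POSIX; Windows uses ACLs instead.)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        fh.write(content)
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)  # tighten pre-existing files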

Tasks:

  1. Check auth.py - how is keyring used?
  2. Run pip-audit - any vulnerabilities?
  3. Review file upload code - path validation?

Score: __/20


FINAL EVALUATION REPORT

Scoring Summary

| Dimension | Max Score | Actual Score | Notes |
|---|---|---|---|
| 1. Architecture & Design | 30 | __ | |
| 2. RFC Compliance | 30 | __ | |
| 3. Code Quality | 30 | __ | |
| 4. Functional Completeness | 30 | __ | |
| 5. API Endpoint Accuracy | 10 | __ | |
| 6. Testing & Validation | 30 | __ | |
| 7. Documentation Quality | 30 | __ | |
| 8. User Experience | 30 | __ | |
| 9. Production Readiness | 30 | __ | |
| 10. Security Audit | 20 | __ | |
| TOTAL | 270 | __ | |

Overall Grade: ___% (Score/270 × 100)


Grading Scale

| Grade | Score Range | Assessment |
|---|---|---|
| A+ | 95-100% | Production-ready, exemplary implementation |
| A | 90-94% | Production-ready with minor refinements |
| B+ | 85-89% | Near production, needs moderate work |
| B | 80-84% | Alpha-ready, significant work remains |
| C | 70-79% | Prototype stage, major gaps |
| D | 60-69% | Early development, needs restructuring |
| F | <60% | Incomplete, fundamental issues |

CRITICAL FINDINGS

P0 (Blockers - Must Fix Before Alpha)

  1. [List any critical issues that prevent basic functionality]
  2. ...

P1 (High Priority - Should Fix Before Beta)

  1. [List important issues affecting user experience]
  2. ...

P2 (Medium Priority - Fix Before v1.0)

  1. [List nice-to-haves for production release]
  2. ...

TOP 10 RECOMMENDATIONS

Priority Order:

  1. [Recommendation #1]

    • Issue: [Description]
    • Impact: [User/Developer/Security]
    • Effort: [Low/Medium/High]
    • Suggested Fix: [Actionable steps]
  2. [Recommendation #2]

    • ...
  3. ...


IMPLEMENTATION GAPS vs RFC

RFC v1.2 features missing from the implementation:

  • Feature: chat continue --chat-id <ID>
  • Feature: --system prompt support
  • Feature: Stdin pipe support
  • Feature: --history-file loading
  • Feature: rag search semantic search
  • Feature: admin stats and admin diagnostics
  • ...

Estimated Effort to Complete RFC: __ hours


BENCHMARK COMPARISONS

Compare Against: comparable CLI tools (evaluator's choice)

Strengths of this implementation:

  1. ...

Weaknesses compared to alternatives:

  1. ...

NEXT STEPS - PRIORITIZED ROADMAP

Week 1: Critical Path

  1. Fix any P0 blockers
  2. Achieve >70% test coverage
  3. Verify all API endpoints
  4. Complete streaming implementation

Week 2: Polish

  1. Implement missing RFC features
  2. Improve error messages
  3. Add comprehensive examples to docs
  4. Set up CI/CD

Week 3: Beta Prep

  1. Security audit fixes
  2. Performance optimization
  3. Cross-platform testing
  4. Beta user testing

EVALUATION METHODOLOGY

How to Use This Prompt:

  1. Clone the repository:

    git clone https://github.com/dannystocker/openwebui-cli.git
    cd openwebui-cli
    pip install -e ".[dev]"
    
  2. Read the RFC:

    cat docs/RFC.md
    
  3. Systematically evaluate each dimension:

    • Read the relevant code files
    • Run the specified commands
    • Fill in the scoring tables
    • Document findings in each section
  4. Synthesize the report:

    • Calculate total score
    • Identify top 10 issues
    • Prioritize recommendations
    • Provide actionable roadmap
  5. Format output:

    • Use markdown tables for scores
    • Include code snippets for issues
    • Link to specific files/line numbers
    • Be specific and actionable

OUTPUT FORMAT

Provide your evaluation in this structure:

# OpenWebUI CLI - Code Evaluation Report
**Evaluator:** [Your Name/LLM Model]
**Date:** 2025-11-30
**Version Evaluated:** v0.1.0

## Executive Summary
[2-3 paragraph overview of overall assessment]

## Scoring Summary
[Table with scores for all 10 dimensions]

## Critical Findings
[P0, P1, P2 issues]

## Top 10 Recommendations
[Prioritized list with effort estimates]

## Detailed Analysis
[Section for each of the 10 dimensions with findings]

## Conclusion
[Final verdict and next steps]

BEGIN EVALUATION NOW

Systematically work through dimensions 1-10, documenting findings, assigning scores, and building the final report.