
Guardian Council Evaluation Criteria

NaviDocs Intelligence Dossier Assessment Framework

Session: Session 5 - Evidence Synthesis & Guardian Validation
Generated: 2025-11-13
Version: 1.0


Overview

Each of the 20 Guardian Council members evaluates the NaviDocs intelligence dossier across 3 dimensions, scoring 0-10 on each. The average score determines the vote:

  • Approve: Average ≥7.0
  • Abstain: Average 5.0-6.9 (needs more evidence)
  • Reject: Average <5.0 (fundamental flaws)

Target Consensus: ≥90% approval (18/20 guardians)


Dimension 1: Empirical Soundness (0-10)

Definition: Evidence quality, source verification, data reliability

Scoring Rubric

10 - Exceptional:

  • 100% of claims have ≥2 primary sources (credibility 8-10)
  • All citations include file:line, URLs with SHA-256, or git commits
  • Multi-source verification across all critical claims
  • Zero unverified claims

8-9 - Strong:

  • 90-99% of claims have ≥2 sources
  • Mix of primary (≥70%) and secondary (≤30%) sources
  • 1-2 unverified claims, clearly flagged
  • Citation database complete and traceable

7 - Good (Minimum Approval):

  • 80-89% of claims have ≥2 sources
  • Mix of primary (≥60%) and secondary (≤40%) sources
  • 3-5 unverified claims, with follow-up plan
  • Most citations traceable

5-6 - Weak (Abstain):

  • 60-79% of claims have ≥2 sources
  • Significant tertiary sources (>10%)
  • 6-10 unverified claims
  • Some citations missing line numbers or hashes

3-4 - Poor:

  • 40-59% of claims have ≥2 sources
  • Heavy reliance on tertiary sources (>20%)
  • 11-20 unverified claims
  • Many citations incomplete

0-2 - Failing:

  • <40% of claims have ≥2 sources
  • Tertiary sources dominate (>30%)
  • >20 unverified claims, or no citation database
  • Citations largely missing or unverifiable
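
The rubric above can be read as a decision table over two headline metrics: the share of claims with ≥2 sources and the count of unverified claims. The sketch below is purely illustrative (Python; the function empirical_band and its parameter names are hypothetical, not part of any dossier tooling) and maps those two metrics to a band, leaving the secondary criteria to guardian judgment.

```python
def empirical_band(pct_multi_source: float, unverified_count: int) -> tuple[int, int, str]:
    """Map the two headline Dimension 1 metrics to a score band.

    pct_multi_source: % of claims backed by >=2 sources (0-100)
    unverified_count: number of claims with no verifiable source
    Returns (low, high, label); the guardian picks the exact score within
    the band using the secondary criteria (source mix, citation completeness).
    """
    if pct_multi_source >= 100 and unverified_count == 0:
        return (10, 10, "Exceptional")
    if pct_multi_source >= 90 and unverified_count <= 2:
        return (8, 9, "Strong")
    if pct_multi_source >= 80 and unverified_count <= 5:
        return (7, 7, "Good (minimum approval)")
    if pct_multi_source >= 60 and unverified_count <= 10:
        return (5, 6, "Weak (abstain)")
    if pct_multi_source >= 40 and unverified_count <= 20:
        return (3, 4, "Poor")
    return (0, 2, "Failing")
```

For example, a dossier with 85% multi-source claims and 4 flagged unverified claims lands in the 7 band, the minimum for approval.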

Key Questions for Guardians

  1. Empiricism: "Is the market size (€2.3B) derived from observable data or speculation?"
  2. Verificationism: "Can I reproduce the ROI calculation (€8K-€33K) from the sources cited?"
  3. Russell: "Are the definitions precise enough to verify empirically?"

Dimension 2: Logical Coherence (0-10)

Definition: Internal consistency, argument validity, contradiction-free

Scoring Rubric

10 - Exceptional:

  • Zero contradictions between Sessions 1-4
  • All claims logically follow from evidence
  • Cross-session consistency verified (Agent 6 report)
  • Integration points align perfectly (market → tech → sales → implementation)

8-9 - Strong:

  • 1-2 minor contradictions, resolved with clarification
  • Arguments logically sound with explicit reasoning chains
  • Cross-session alignment validated
  • Integration points clearly documented

7 - Good (Minimum Approval):

  • 3-4 contradictions, resolved or acknowledged
  • Most arguments logically valid
  • Sessions generally consistent
  • Integration points identified

5-6 - Weak (Abstain):

  • 5-7 contradictions, some unresolved
  • Logical gaps in 10-20% of arguments
  • Sessions partially inconsistent
  • Integration points unclear

3-4 - Poor:

  • 8-12 contradictions, mostly unresolved
  • Logical fallacies present (>20% of arguments)
  • Sessions conflict significantly
  • Integration points missing

0-2 - Failing:

  • >12 contradictions, or fundamental logical errors
  • Arguments lack coherent structure
  • Sessions fundamentally incompatible
  • No integration strategy

Key Questions for Guardians

  1. Coherentism: "Do the market findings (Session 1) align with the pricing strategy (Session 3)?"
  2. Falsificationism: "Are there contradictions that falsify key claims?"
  3. Kant: "Is the logical structure universally valid?"

Dimension 3: Practical Viability (0-10)

Definition: Implementation feasibility, ROI justification, real-world applicability

Scoring Rubric

10 - Exceptional:

  • 4-week timeline validated by codebase analysis
  • ROI calculator backed by ≥3 independent sources
  • All acceptance criteria testable (Given/When/Then)
  • Zero implementation blockers identified
  • Migration scripts tested and safe

8-9 - Strong:

  • 4-week timeline realistic with minor contingencies
  • ROI calculator backed by ≥2 sources
  • 90%+ acceptance criteria testable
  • 1-2 minor blockers with clear resolutions
  • Migration scripts validated

7 - Good (Minimum Approval):

  • 4-week timeline achievable with contingency planning
  • ROI calculator backed by ≥2 sources (1 primary)
  • 80%+ acceptance criteria testable
  • 3-5 blockers with resolution paths
  • Migration scripts reviewed

5-6 - Weak (Abstain):

  • 4-week timeline optimistic, lacks contingencies
  • ROI calculator based on 1 source or assumptions
  • 60-79% acceptance criteria testable
  • 6-10 blockers, some unaddressed
  • Migration scripts not tested

3-4 - Poor:

  • 4-week timeline unrealistic
  • ROI calculator unverified
  • <60% acceptance criteria testable
  • >10 blockers or critical risks
  • Migration scripts unsafe

0-2 - Failing:

  • Timeline completely infeasible
  • ROI calculator speculative
  • Acceptance criteria missing or untestable
  • Fundamental technical blockers
  • No migration strategy

Key Questions for Guardians

  1. Pragmatism: "Does this solve real broker problems worth €8K-€33K?"
  2. Fallibilism: "What could go wrong? Are uncertainties acknowledged?"
  3. IF.sam (Dark - Pragmatic Survivor): "Will this actually generate revenue?"

Guardian-Specific Evaluation Focus Areas

Core Guardians (1-6)

1. Empiricism:

  • Focus: Evidence quality, source verification
  • Critical on: Market sizing methodology, warranty savings calculation
  • Approval bar: 90%+ verified claims, primary sources dominate

2. Verificationism:

  • Focus: Testable predictions, measurable outcomes
  • Critical on: ROI calculator verifiability, acceptance criteria
  • Approval bar: All critical claims have 2+ independent sources

3. Fallibilism:

  • Focus: Uncertainty acknowledgment, risk mitigation
  • Critical on: Timeline contingencies, assumption validation
  • Approval bar: Risks documented, failure modes addressed

4. Falsificationism:

  • Focus: Contradiction detection, refutability
  • Critical on: Cross-session consistency, conflicting claims
  • Approval bar: Zero unresolved contradictions

5. Coherentism:

  • Focus: Internal consistency, integration
  • Critical on: Session alignment, logical flow
  • Approval bar: All 4 sessions form coherent whole

6. Pragmatism:

  • Focus: Business value, ROI, real-world utility
  • Critical on: Broker pain points, revenue potential
  • Approval bar: Clear value proposition, measurable ROI

Western Philosophers (7-9)

7. Aristotle (Virtue Ethics):

  • Focus: Broker welfare, honest representation, excellence
  • Critical on: Sales pitch truthfulness, client benefit
  • Approval bar: Ethical sales practices, genuine broker value

8. Kant (Deontology):

  • Focus: Universalizability, treating brokers as ends, duty to accuracy
  • Critical on: Misleading claims, broker exploitation
  • Approval bar: No manipulative tactics, honest representation

9. Russell (Logical Positivism):

  • Focus: Logical validity, empirical verifiability, clear definitions
  • Critical on: Argument soundness, term precision
  • Approval bar: Logically valid, empirically verifiable

Eastern Philosophers (10-12)

10. Confucius (Ren/Li):

  • Focus: Relationship harmony, social benefit, propriety
  • Critical on: Broker-buyer trust, ecosystem impact
  • Approval bar: Enhances relationships, benefits community

11. Nagarjuna (Madhyamaka):

  • Focus: Dependent origination, avoiding extremes, uncertainty
  • Critical on: Market projections, economic assumptions
  • Approval bar: Acknowledges interdependence, avoids dogmatism

12. Zhuangzi (Daoism):

  • Focus: Natural flow, effortless adoption, perspective diversity
  • Critical on: User experience, forced vs organic change
  • Approval bar: Feels natural to brokers, wu wei design

IF.sam Facets (13-20)

13. Ethical Idealist (Light):

  • Focus: Mission alignment, transparency, user empowerment
  • Critical on: Marine safety advancement, broker control
  • Approval bar: Transparent claims, ethical practices

14. Visionary Optimist (Light):

  • Focus: Innovation, market expansion, long-term impact
  • Critical on: Cutting-edge features, 10-year vision
  • Approval bar: Genuinely innovative, expansion potential

15. Democratic Collaborator (Light):

  • Focus: Stakeholder input, feedback loops, open communication
  • Critical on: Broker consultation, team involvement
  • Approval bar: Stakeholders consulted, feedback mechanisms

16. Transparent Communicator (Light):

  • Focus: Clarity, honesty, evidence disclosure
  • Critical on: Pitch deck understandability, limitation acknowledgment
  • Approval bar: Clear communication, accessible citations

17. Pragmatic Survivor (Dark):

  • Focus: Competitive edge, revenue potential, risk management
  • Critical on: Market viability, profitability, competitor threats
  • Approval bar: Sustainable revenue, competitive advantage

18. Strategic Manipulator (Dark):

  • Focus: Persuasion effectiveness, objection handling, narrative control
  • Critical on: Pitch persuasiveness, objection pre-emption
  • Approval bar: Compelling narrative, handles objections

19. Ends-Justify-Means (Dark):

  • Focus: Goal achievement, efficiency, sacrifice assessment
  • Critical on: NaviDocs adoption, deployment speed
  • Approval bar: Fastest path to deployment, MVP defined

20. Corporate Diplomat (Dark):

  • Focus: Stakeholder alignment, political navigation, relationship preservation
  • Critical on: Riviera Plaisance satisfaction, no bridges burned
  • Approval bar: All stakeholders satisfied, political risks mitigated

Voting Formula

For Each Guardian:

Average Score = (Empirical + Logical + Practical) / 3

If Average ≥ 7.0: APPROVE
If 5.0 ≤ Average < 7.0: ABSTAIN
If Average < 5.0: REJECT
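
As a minimal sketch of this rule (Python; the function name guardian_vote is illustrative, not part of any existing tooling):

```python
def guardian_vote(empirical: float, logical: float, practical: float) -> str:
    """Derive a single guardian's vote from its three dimension scores."""
    average = (empirical + logical + practical) / 3
    if average >= 7.0:
        return "APPROVE"
    if average >= 5.0:
        return "ABSTAIN"
    return "REJECT"
```

For instance, scores of 8, 7, and 6 average exactly 7.0 and therefore count as an approval.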

Consensus Calculation:

Approval % = (Approve Votes) / (Total Guardians - Abstentions) * 100

Outcome Thresholds:

  • 100% Consensus: 20/20 approve (gold standard)
  • ≥95% Supermajority: 19/20 approve (subject to Contrarian veto)
  • ≥90% Strong Consensus: 18/20 approve (standard for production)
  • <90% Weak Consensus: Requires revision
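
Applying the formula above to a full set of votes might look like the following sketch (Python; the consensus function and its outcome labels mirror this document, everything else is hypothetical):

```python
def consensus(votes: list[str]) -> tuple[float, str]:
    """Approval % with abstentions excluded, plus the outcome classification."""
    approvals = votes.count("APPROVE")
    abstentions = votes.count("ABSTAIN")
    counted = len(votes) - abstentions
    approval_pct = (approvals / counted) * 100 if counted else 0.0

    if approvals == len(votes):
        label = "100% Consensus"
    elif approval_pct >= 95:
        label = "Supermajority"
    elif approval_pct >= 90:
        label = "Strong Consensus"
    else:
        label = "Weak Consensus (requires revision)"
    return approval_pct, label
```

With 18 approvals, 1 abstention, and 1 rejection, 18 of 19 counted votes approve (≈94.7%), which classifies as Strong Consensus.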

IF.sam Debate Protocol

Before voting, the 8 IF.sam facets debate:

Light Side Coalition (13-16):

  • Argues for ethical practices, transparency, stakeholder empowerment
  • Challenges: "Is this genuinely helping brokers or just extracting revenue?"

Dark Side Coalition (17-20):

  • Argues for competitive advantage, persuasive tactics, goal achievement
  • Challenges: "Will this actually close the Riviera deal and generate revenue?"

Debate Format:

  1. Light Side presents ethical concerns (5 min)
  2. Dark Side presents pragmatic concerns (5 min)
  3. Cross-debate: Light challenges Dark assumptions (5 min)
  4. Cross-debate: Dark challenges Light idealism (5 min)
  5. Synthesis: Identify common ground (5 min)
  6. Vote: Each facet scores independently

Agent 10 (S5-H10) monitors for:

  • Unresolved tensions (Light vs Dark >30% divergence)
  • Consensus emerging points (Light + Dark agree)
  • ESCALATE triggers (>20% of facets reject)

ESCALATE Triggers

Agent 10 must ESCALATE if:

  1. <80% approval: Weak consensus requires human review
  2. >20% rejection: Fundamental flaws detected
  3. IF.sam Light/Dark split >30%: Ethical vs pragmatic tension unresolved
  4. Contradictions >10: Cross-session inconsistencies
  5. Unverified claims >10%: Evidence quality below threshold
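
A hedged sketch of how these triggers could be checked programmatically (Python; the EscalationInput fields and the relative-gap reading of the Light/Dark split are assumptions, not an existing Agent 10 interface):

```python
from dataclasses import dataclass

@dataclass
class EscalationInput:
    approval_pct: float          # consensus approval %, abstentions excluded
    rejection_pct: float         # % of all 20 guardians voting REJECT
    light_avg: float             # mean average score, IF.sam facets 13-16
    dark_avg: float              # mean average score, IF.sam facets 17-20
    contradictions: int          # unresolved cross-session contradictions
    unverified_claim_pct: float  # % of claims without verifiable sources

def escalate_reasons(data: EscalationInput) -> list[str]:
    """Return every triggered ESCALATE condition; an empty list means no escalation."""
    reasons = []
    if data.approval_pct < 80:
        reasons.append("approval below 80%: human review required")
    if data.rejection_pct > 20:
        reasons.append("rejection above 20%: fundamental flaws detected")
    # Assumption: the >30% Light/Dark split is read as a relative gap
    # between the two coalitions' mean scores.
    if data.light_avg and abs(data.light_avg - data.dark_avg) / data.light_avg > 0.30:
        reasons.append("IF.sam Light/Dark split above 30%")
    if data.contradictions > 10:
        reasons.append("more than 10 cross-session contradictions")
    if data.unverified_claim_pct > 10:
        reasons.append("unverified claims above 10%")
    return reasons
```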

Success Criteria

Minimum Viable Consensus (90%):

  • 18/20 guardians approve
  • Average empirical score ≥7.0
  • Average logical score ≥7.0
  • Average practical score ≥7.0
  • IF.sam Light/Dark split <30%

Stretch Goal (100% Consensus):

  • 20/20 guardians approve
  • All 3 dimensions score ≥8.0
  • IF.sam Light + Dark aligned
  • Zero unverified claims
  • Zero contradictions

Document Signature:

if://doc/session-5/guardian-evaluation-criteria-2025-11-13
Version: 1.0
Status: READY for Guardian Council