# Guardian Council Evaluation Criteria

## NaviDocs Intelligence Dossier Assessment Framework

**Session:** Session 5 - Evidence Synthesis & Guardian Validation
**Generated:** 2025-11-13
**Version:** 1.0

---

## Overview

Each of the 20 Guardian Council members evaluates the NaviDocs intelligence dossier across 3 dimensions, scoring 0-10 on each. The average score determines the vote:

- **Approve:** Average ≥7.0
- **Abstain:** Average 5.0-6.9 (needs more evidence)
- **Reject:** Average <5.0 (fundamental flaws)

**Target Consensus:** ≥90% approval (18/20 guardians)

---

## Dimension 1: Empirical Soundness (0-10)

**Definition:** Evidence quality, source verification, data reliability

### Scoring Rubric

**10 - Exceptional:**
- 100% of claims have ≥2 primary sources (credibility 8-10)
- All citations include file:line, URLs with SHA-256, or git commits
- Multi-source verification across all critical claims
- Zero unverified claims

**8-9 - Strong:**
- 90-99% of claims have ≥2 sources
- Mix of primary (≥70%) and secondary (≤30%) sources
- 1-2 unverified claims, clearly flagged
- Citation database complete and traceable

**7 - Good (Minimum Approval):**
- 80-89% of claims have ≥2 sources
- Mix of primary (≥60%) and secondary (≤40%) sources
- 3-5 unverified claims, with follow-up plan
- Most citations traceable

**5-6 - Weak (Abstain):**
- 60-79% of claims have ≥2 sources
- Significant tertiary sources (>10%)
- 6-10 unverified claims
- Some citations missing line numbers or hashes

**3-4 - Poor:**
- 40-59% of claims have ≥2 sources
- Heavy reliance on tertiary sources (>20%)
- 11-20 unverified claims
- Many citations incomplete

**0-2 - Failing:**
- <40% of claims have ≥2 sources
- Tertiary sources dominate (>30%)
- >20 unverified claims or no citation database
- Citations largely missing or unverifiable

### Key Questions for Guardians

1. **Empiricism:** "Is the market size (€2.3B) derived from observable data or speculation?"
2. **Verificationism:** "Can I reproduce the ROI calculation (€8K-€33K) from the sources cited?"
3. **Russell:** "Are the definitions precise enough to verify empirically?"

---

## Dimension 2: Logical Coherence (0-10)

**Definition:** Internal consistency, argument validity, freedom from contradictions

### Scoring Rubric

**10 - Exceptional:**
- Zero contradictions between Sessions 1-4
- All claims logically follow from evidence
- Cross-session consistency verified (Agent 6 report)
- Integration points align perfectly (market → tech → sales → implementation)

**8-9 - Strong:**
- 1-2 minor contradictions, resolved with clarification
- Arguments logically sound with explicit reasoning chains
- Cross-session alignment validated
- Integration points clearly documented

**7 - Good (Minimum Approval):**
- 3-4 contradictions, resolved or acknowledged
- Most arguments logically valid
- Sessions generally consistent
- Integration points identified

**5-6 - Weak (Abstain):**
- 5-7 contradictions, some unresolved
- Logical gaps in 10-20% of arguments
- Sessions partially inconsistent
- Integration points unclear

**3-4 - Poor:**
- 8-12 contradictions, mostly unresolved
- Logical fallacies present (>20% of arguments)
- Sessions conflict significantly
- Integration points missing

**0-2 - Failing:**
- >12 contradictions or fundamental logical errors
- Arguments lack coherent structure
- Sessions fundamentally incompatible
- No integration strategy

### Key Questions for Guardians

1. **Coherentism:** "Do the market findings (Session 1) align with the pricing strategy (Session 3)?"
2. **Falsificationism:** "Are there contradictions that falsify key claims?"
3. **Kant:** "Is the logical structure universally valid?"

---

## Dimension 3: Practical Viability (0-10)

**Definition:** Implementation feasibility, ROI justification, real-world applicability

### Scoring Rubric

**10 - Exceptional:**
- 4-week timeline validated by codebase analysis
- ROI calculator backed by ≥3 independent sources
- All acceptance criteria testable (Given/When/Then)
- Zero implementation blockers identified
- Migration scripts tested and safe

**8-9 - Strong:**
- 4-week timeline realistic with minor contingencies
- ROI calculator backed by ≥2 sources
- 90%+ acceptance criteria testable
- 1-2 minor blockers with clear resolutions
- Migration scripts validated

**7 - Good (Minimum Approval):**
- 4-week timeline achievable with contingency planning
- ROI calculator backed by ≥2 sources (1 primary)
- 80%+ acceptance criteria testable
- 3-5 blockers with resolution paths
- Migration scripts reviewed

**5-6 - Weak (Abstain):**
- 4-week timeline optimistic, lacks contingencies
- ROI calculator based on 1 source or assumptions
- 60-79% acceptance criteria testable
- 6-10 blockers, some unaddressed
- Migration scripts not tested

**3-4 - Poor:**
- 4-week timeline unrealistic
- ROI calculator unverified
- <60% acceptance criteria testable
- >10 blockers or critical risks
- Migration scripts unsafe

**0-2 - Failing:**
- Timeline completely infeasible
- ROI calculator speculative
- Acceptance criteria missing or untestable
- Fundamental technical blockers
- No migration strategy

### Key Questions for Guardians

1. **Pragmatism:** "Does this solve real broker problems worth €8K-€33K?"
2. **Fallibilism:** "What could go wrong? Are uncertainties acknowledged?"
3. **IF.sam (Dark - Pragmatic Survivor):** "Will this actually generate revenue?"
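Taken together, the three dimensions reduce to a single vote per guardian, as defined in the Overview. The following minimal sketch shows one way to represent that record in Python; the names (`GuardianEvaluation`, `rubric_band`) are illustrative and not part of the dossier.

```python
from dataclasses import dataclass


def rubric_band(score: float) -> str:
    """Map a 0-10 dimension score to its rubric band label (illustrative helper)."""
    if score >= 10:
        return "Exceptional"
    if score >= 8:
        return "Strong"
    if score >= 7:
        return "Good (Minimum Approval)"
    if score >= 5:
        return "Weak (Abstain)"
    if score >= 3:
        return "Poor"
    return "Failing"


@dataclass
class GuardianEvaluation:
    guardian: str       # e.g. "Empiricism" or "Pragmatic Survivor (Dark)"
    empirical: float    # Dimension 1: Empirical Soundness, 0-10
    logical: float      # Dimension 2: Logical Coherence, 0-10
    practical: float    # Dimension 3: Practical Viability, 0-10

    @property
    def average(self) -> float:
        return (self.empirical + self.logical + self.practical) / 3

    @property
    def vote(self) -> str:
        # Thresholds from the Overview: >=7.0 approve, 5.0-6.9 abstain, <5.0 reject.
        if self.average >= 7.0:
            return "APPROVE"
        if self.average >= 5.0:
            return "ABSTAIN"
        return "REJECT"
```

For example, `GuardianEvaluation("Empiricism", 8, 7, 9)` averages 8.0 and votes APPROVE.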
---

## Guardian-Specific Evaluation Focuses

### Core Guardians (1-6)

**1. Empiricism:**
- Focus: Evidence quality, source verification
- Critical on: Market sizing methodology, warranty savings calculation
- Approval bar: 90%+ verified claims, primary sources dominate

**2. Verificationism:**
- Focus: Testable predictions, measurable outcomes
- Critical on: ROI calculator verifiability, acceptance criteria
- Approval bar: All critical claims have 2+ independent sources

**3. Fallibilism:**
- Focus: Uncertainty acknowledgment, risk mitigation
- Critical on: Timeline contingencies, assumption validation
- Approval bar: Risks documented, failure modes addressed

**4. Falsificationism:**
- Focus: Contradiction detection, refutability
- Critical on: Cross-session consistency, conflicting claims
- Approval bar: Zero unresolved contradictions

**5. Coherentism:**
- Focus: Internal consistency, integration
- Critical on: Session alignment, logical flow
- Approval bar: All 4 sessions form coherent whole

**6. Pragmatism:**
- Focus: Business value, ROI, real-world utility
- Critical on: Broker pain points, revenue potential
- Approval bar: Clear value proposition, measurable ROI

### Western Philosophers (7-9)

**7. Aristotle (Virtue Ethics):**
- Focus: Broker welfare, honest representation, excellence
- Critical on: Sales pitch truthfulness, client benefit
- Approval bar: Ethical sales practices, genuine broker value

**8. Kant (Deontology):**
- Focus: Universalizability, treating brokers as ends, duty to accuracy
- Critical on: Misleading claims, broker exploitation
- Approval bar: No manipulative tactics, honest representation

**9. Russell (Logical Positivism):**
- Focus: Logical validity, empirical verifiability, clear definitions
- Critical on: Argument soundness, term precision
- Approval bar: Logically valid, empirically verifiable

### Eastern Philosophers (10-12)

**10. Confucius (Ren/Li):**
- Focus: Relationship harmony, social benefit, propriety
- Critical on: Broker-buyer trust, ecosystem impact
- Approval bar: Enhances relationships, benefits community

**11. Nagarjuna (Madhyamaka):**
- Focus: Dependent origination, avoiding extremes, uncertainty
- Critical on: Market projections, economic assumptions
- Approval bar: Acknowledges interdependence, avoids dogmatism

**12. Zhuangzi (Daoism):**
- Focus: Natural flow, effortless adoption, perspective diversity
- Critical on: User experience, forced vs organic change
- Approval bar: Feels natural to brokers, wu wei design

### IF.sam Facets (13-20)

**13. Ethical Idealist (Light):**
- Focus: Mission alignment, transparency, user empowerment
- Critical on: Marine safety advancement, broker control
- Approval bar: Transparent claims, ethical practices

**14. Visionary Optimist (Light):**
- Focus: Innovation, market expansion, long-term impact
- Critical on: Cutting-edge features, 10-year vision
- Approval bar: Genuinely innovative, expansion potential

**15. Democratic Collaborator (Light):**
- Focus: Stakeholder input, feedback loops, open communication
- Critical on: Broker consultation, team involvement
- Approval bar: Stakeholders consulted, feedback mechanisms

**16. Transparent Communicator (Light):**
- Focus: Clarity, honesty, evidence disclosure
- Critical on: Pitch deck understandability, limitation acknowledgment
- Approval bar: Clear communication, accessible citations

**17. Pragmatic Survivor (Dark):**
- Focus: Competitive edge, revenue potential, risk management
- Critical on: Market viability, profitability, competitor threats
- Approval bar: Sustainable revenue, competitive advantage

**18. Strategic Manipulator (Dark):**
- Focus: Persuasion effectiveness, objection handling, narrative control
- Critical on: Pitch persuasiveness, objection pre-emption
- Approval bar: Compelling narrative, handles objections

**19. Ends-Justify-Means (Dark):**
- Focus: Goal achievement, efficiency, sacrifice assessment
- Critical on: NaviDocs adoption, deployment speed
- Approval bar: Fastest path to deployment, MVP defined

**20. Corporate Diplomat (Dark):**
- Focus: Stakeholder alignment, political navigation, relationship preservation
- Critical on: Riviera Plaisance satisfaction, no bridges burned
- Approval bar: All stakeholders satisfied, political risks mitigated

---

## Voting Formula

**For Each Guardian:**

```
Average Score = (Empirical + Logical + Practical) / 3

If Average ≥ 7.0: APPROVE
If 5.0 ≤ Average < 7.0: ABSTAIN
If Average < 5.0: REJECT
```

**Consensus Calculation:**

```
Approval % = (Approve Votes) / (Total Guardians - Abstentions) * 100
```

**Outcome Thresholds:**

- **100% Consensus:** 20/20 approve (gold standard)
- **≥95% Supermajority:** 19/20 approve (subject to Contrarian veto)
- **≥90% Strong Consensus:** 18/20 approve (standard for production)
- **<90% Weak Consensus:** Requires revision
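A minimal sketch of the consensus calculation, assuming the vote strings produced by the `GuardianEvaluation` sketch above; `approval_percentage` and `outcome` are illustrative names, not part of the dossier:

```python
def approval_percentage(votes: list[str]) -> float:
    """Approval % = approve votes / (total guardians - abstentions) * 100.

    `votes` holds one of "APPROVE", "ABSTAIN", "REJECT" per guardian.
    """
    approvals = votes.count("APPROVE")
    abstentions = votes.count("ABSTAIN")
    voting = len(votes) - abstentions  # abstentions drop out of the denominator
    return approvals / voting * 100 if voting else 0.0


def outcome(approval_pct: float) -> str:
    """Map an approval percentage to the outcome thresholds above.

    The x/20 counts in the threshold list assume no abstentions.
    """
    if approval_pct >= 100.0:
        return "100% Consensus (gold standard)"
    if approval_pct >= 95.0:
        return "Supermajority"
    if approval_pct >= 90.0:
        return "Strong Consensus (standard for production)"
    return "Weak Consensus (requires revision)"
```

For example, 18 approvals, 1 abstention, and 1 rejection yield 18/19 ≈ 94.7%: Strong Consensus, but short of a supermajority.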
---

## IF.sam Debate Protocol

**Before voting, the 8 IF.sam facets debate:**

**Light Side Coalition (13-16):**
- Argues for ethical practices, transparency, stakeholder empowerment
- Challenges: "Is this genuinely helping brokers or just extracting revenue?"

**Dark Side Coalition (17-20):**
- Argues for competitive advantage, persuasive tactics, goal achievement
- Challenges: "Will this actually close the Riviera deal and generate revenue?"

**Debate Format:**

1. Light Side presents ethical concerns (5 min)
2. Dark Side presents pragmatic concerns (5 min)
3. Cross-debate: Light challenges Dark assumptions (5 min)
4. Cross-debate: Dark challenges Light idealism (5 min)
5. Synthesis: Identify common ground (5 min)
6. Vote: Each facet scores independently

**Agent 10 (S5-H10) monitors for:**
- Unresolved tensions (Light vs Dark >30% divergence)
- Emerging consensus points (Light and Dark agree)
- ESCALATE triggers (>20% of facets reject)

---

## ESCALATE Triggers

**Agent 10 must ESCALATE if:**

1. **<80% approval:** Weak consensus requires human review
2. **>20% rejection:** Fundamental flaws detected
3. **IF.sam Light/Dark split >30%:** Ethical vs pragmatic tension unresolved
4. **Contradictions >10:** Cross-session inconsistencies
5. **Unverified claims >10%:** Evidence quality below threshold

A combined sketch of these checks follows the document signature.

---

## Success Criteria

**Minimum Viable Consensus (90%):**
- 18/20 guardians approve
- Average empirical score ≥7.0
- Average logical score ≥7.0
- Average practical score ≥7.0
- IF.sam Light/Dark split <30%

**Stretch Goal (100% Consensus):**
- 20/20 guardians approve
- All 3 dimensions score ≥8.0
- IF.sam Light + Dark aligned
- Zero unverified claims
- Zero contradictions

---

**Document Signature:**

```
if://doc/session-5/guardian-evaluation-criteria-2025-11-13
Version: 1.0
Status: READY for Guardian Council
```
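For reference, a minimal sketch of how Agent 10's ESCALATE triggers might be checked against a completed vote. The function and parameter names are illustrative, and the Light/Dark split percentage is assumed to be computed upstream, since the dossier does not specify how that divergence is measured:

```python
def escalate_reasons(
    approval_pct: float,          # from the consensus calculation sketch above
    rejection_pct: float,         # reject votes / total guardians * 100
    light_dark_split_pct: float,  # IF.sam Light vs Dark divergence, as a percentage
    contradiction_count: int,     # unresolved cross-session contradictions
    unverified_claim_pct: float,  # unverified claims / total claims * 100
) -> list[str]:
    """Return the ESCALATE triggers that fire; an empty list means no escalation."""
    reasons = []
    if approval_pct < 80.0:
        reasons.append("<80% approval: weak consensus requires human review")
    if rejection_pct > 20.0:
        reasons.append(">20% rejection: fundamental flaws detected")
    if light_dark_split_pct > 30.0:
        reasons.append("IF.sam Light/Dark split >30%: ethical vs pragmatic tension unresolved")
    if contradiction_count > 10:
        reasons.append("contradictions >10: cross-session inconsistencies")
    if unverified_claim_pct > 10.0:
        reasons.append("unverified claims >10%: evidence quality below threshold")
    return reasons
```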