Phase 1 deliverables:
- Guardian evaluation criteria (3 dimensions: Empirical, Logical, Practical)
- Guardian briefing templates for all 20 guardians
- Session 5 readiness report with IF.TTT compliance framework
Status: READY - Awaiting Sessions 1-4 handoff files before deploying 10 Haiku agents
Next: Poll for intelligence/session-{1,2,3,4}/session-X-handoff.md every 5min
# Guardian Council Evaluation Criteria
## NaviDocs Intelligence Dossier Assessment Framework

**Session:** Session 5 - Evidence Synthesis & Guardian Validation
**Generated:** 2025-11-13
**Version:** 1.0

---

## Overview

Each of the 20 Guardian Council members evaluates the NaviDocs intelligence dossier across 3 dimensions, scoring 0-10 on each. The average score determines the vote:

- **Approve:** Average ≥7.0
- **Abstain:** Average ≥5.0 and <7.0 (needs more evidence)
- **Reject:** Average <5.0 (fundamental flaws)

**Target Consensus:** ≥90% approval (at least 18/20 guardians)

---

## Dimension 1: Empirical Soundness (0-10)

**Definition:** Evidence quality, source verification, data reliability

### Scoring Rubric

**10 - Exceptional:**
- 100% of claims have ≥2 primary sources (credibility 8-10)
- All citations include file:line, URLs with SHA-256, or git commits
- Multi-source verification across all critical claims
- Zero unverified claims

**8-9 - Strong:**
- 90-99% of claims have ≥2 sources
- Mix of primary (≥70%) and secondary (≤30%) sources
- 1-2 unverified claims, clearly flagged
- Citation database complete and traceable

**7 - Good (Minimum Approval):**
- 80-89% of claims have ≥2 sources
- Mix of primary (≥60%) and secondary (≤40%) sources
- 3-5 unverified claims, with follow-up plan
- Most citations traceable

**5-6 - Weak (Abstain):**
- 60-79% of claims have ≥2 sources
- Significant tertiary sources (>10%)
- 6-10 unverified claims
- Some citations missing line numbers or hashes

**3-4 - Poor:**
- 40-59% of claims have ≥2 sources
- Heavy reliance on tertiary sources (>20%)
- 11-20 unverified claims
- Many citations incomplete

**0-2 - Failing:**
- <40% of claims have ≥2 sources
- Tertiary sources dominate (>30%)
- >20 unverified claims or no citation database
- Citations largely missing or unverifiable
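
As a rough illustration of how the source-coverage criterion alone maps onto the bands above, here is a minimal Python sketch. The names `COVERAGE_BANDS` and `empirical_band` are hypothetical, and the sketch assumes that fractional percentages (for example 99.5%) fall into the next band down. The other criteria in the rubric (source mix, unverified claims, citation completeness) are not modeled and still need a guardian's judgment.

```python
# Hypothetical lookup: share of claims with >=2 sources -> (min, max) score band.
# Floors are taken from the rubric; the band for 100% also requires primary
# sources in the rubric, which this sketch does not check.
COVERAGE_BANDS = [
    (100.0, (10, 10)),  # Exceptional
    (90.0, (8, 9)),     # Strong
    (80.0, (7, 7)),     # Good (minimum approval)
    (60.0, (5, 6)),     # Weak (abstain)
    (40.0, (3, 4)),     # Poor
]


def empirical_band(coverage_pct: float) -> tuple[int, int]:
    """Return the score band suggested by source coverage alone."""
    if not 0.0 <= coverage_pct <= 100.0:
        raise ValueError("coverage_pct must be between 0 and 100")
    for floor, band in COVERAGE_BANDS:
        if coverage_pct >= floor:
            return band
    return (0, 2)  # Failing: under 40% coverage
```

For example, `empirical_band(85)` suggests the single-score band `(7, 7)`, while `empirical_band(39.9)` falls through to `(0, 2)`.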

### Key Questions for Guardians

1. **Empiricism:** "Is the market size (€2.3B) derived from observable data or speculation?"
2. **Verificationism:** "Can I reproduce the ROI calculation (€8K-€33K) from the sources cited?"
3. **Russell:** "Are the definitions precise enough to verify empirically?"

---

## Dimension 2: Logical Coherence (0-10)

**Definition:** Internal consistency, argument validity, contradiction-free

### Scoring Rubric

**10 - Exceptional:**
- Zero contradictions between Sessions 1-4
- All claims logically follow from evidence
- Cross-session consistency verified (Agent 6 report)
- Integration points align perfectly (market → tech → sales → implementation)

**8-9 - Strong:**
- 1-2 minor contradictions, resolved with clarification
- Arguments logically sound with explicit reasoning chains
- Cross-session alignment validated
- Integration points clearly documented

**7 - Good (Minimum Approval):**
- 3-4 contradictions, resolved or acknowledged
- Most arguments logically valid
- Sessions generally consistent
- Integration points identified

**5-6 - Weak (Abstain):**
- 5-7 contradictions, some unresolved
- Logical gaps in 10-20% of arguments
- Sessions partially inconsistent
- Integration points unclear

**3-4 - Poor:**
- 8-12 contradictions, mostly unresolved
- Logical fallacies present (>20% of arguments)
- Sessions conflict significantly
- Integration points missing

**0-2 - Failing:**
- >12 contradictions or fundamental logical errors
- Arguments lack coherent structure
- Sessions fundamentally incompatible
- No integration strategy
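
The contradiction counts in the rubric above form contiguous ranges, so one way to look up a band is a binary search over the upper bound of each range. The sketch below is only an illustration: `UPPER_BOUNDS`, `BANDS`, and `logical_band` are hypothetical names, and it considers the contradiction count only, not whether contradictions are resolved or how sound the arguments are.

```python
from bisect import bisect_left

# Upper bound of each contradiction range in the rubric: 0, 1-2, 3-4, 5-7, 8-12.
UPPER_BOUNDS = [0, 2, 4, 7, 12]
# Score band for each range, plus a final band for more than 12 contradictions.
BANDS = [(10, 10), (8, 9), (7, 7), (5, 6), (3, 4), (0, 2)]


def logical_band(contradictions: int) -> tuple[int, int]:
    """Return the score band suggested by the contradiction count alone."""
    if contradictions < 0:
        raise ValueError("contradictions cannot be negative")
    # bisect_left finds the first range whose upper bound is >= the count.
    return BANDS[bisect_left(UPPER_BOUNDS, contradictions)]
```

With this table, `logical_band(4)` lands in the minimum-approval band `(7, 7)` and `logical_band(13)` in the failing band `(0, 2)`.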

### Key Questions for Guardians

1. **Coherentism:** "Do the market findings (Session 1) align with the pricing strategy (Session 3)?"
2. **Falsificationism:** "Are there contradictions that falsify key claims?"
3. **Kant:** "Is the logical structure universally valid?"

---

## Dimension 3: Practical Viability (0-10)

**Definition:** Implementation feasibility, ROI justification, real-world applicability

### Scoring Rubric

**10 - Exceptional:**
- 4-week timeline validated by codebase analysis
- ROI calculator backed by ≥3 independent sources
- All acceptance criteria testable (Given/When/Then)
- Zero implementation blockers identified
- Migration scripts tested and safe

**8-9 - Strong:**
- 4-week timeline realistic with minor contingencies
- ROI calculator backed by ≥2 sources
- 90%+ acceptance criteria testable
- 1-2 minor blockers with clear resolutions
- Migration scripts validated

**7 - Good (Minimum Approval):**
- 4-week timeline achievable with contingency planning
- ROI calculator backed by ≥2 sources (1 primary)
- 80%+ acceptance criteria testable
- 3-5 blockers with resolution paths
- Migration scripts reviewed

**5-6 - Weak (Abstain):**
- 4-week timeline optimistic, lacks contingencies
- ROI calculator based on 1 source or assumptions
- 60-79% acceptance criteria testable
- 6-10 blockers, some unaddressed
- Migration scripts not tested

**3-4 - Poor:**
- 4-week timeline unrealistic
- ROI calculator unverified
- <60% acceptance criteria testable
- >10 blockers or critical risks
- Migration scripts unsafe

**0-2 - Failing:**
- Timeline completely infeasible
- ROI calculator speculative
- Acceptance criteria missing or untestable
- Fundamental technical blockers
- No migration strategy

### Key Questions for Guardians

1. **Pragmatism:** "Does this solve real broker problems worth €8K-€33K?"
2. **Fallibilism:** "What could go wrong? Are uncertainties acknowledged?"
3. **IF.sam (Dark - Pragmatic Survivor):** "Will this actually generate revenue?"

---

## Guardian-Specific Evaluation Focuses

### Core Guardians (1-6)

**1. Empiricism:**
- Focus: Evidence quality, source verification
- Critical on: Market sizing methodology, warranty savings calculation
- Approval bar: 90%+ verified claims, primary sources dominate

**2. Verificationism:**
- Focus: Testable predictions, measurable outcomes
- Critical on: ROI calculator verifiability, acceptance criteria
- Approval bar: All critical claims have 2+ independent sources

**3. Fallibilism:**
- Focus: Uncertainty acknowledgment, risk mitigation
- Critical on: Timeline contingencies, assumption validation
- Approval bar: Risks documented, failure modes addressed

**4. Falsificationism:**
- Focus: Contradiction detection, refutability
- Critical on: Cross-session consistency, conflicting claims
- Approval bar: Zero unresolved contradictions

**5. Coherentism:**
- Focus: Internal consistency, integration
- Critical on: Session alignment, logical flow
- Approval bar: All 4 sessions form coherent whole

**6. Pragmatism:**
- Focus: Business value, ROI, real-world utility
- Critical on: Broker pain points, revenue potential
- Approval bar: Clear value proposition, measurable ROI

### Western Philosophers (7-9)

**7. Aristotle (Virtue Ethics):**
- Focus: Broker welfare, honest representation, excellence
- Critical on: Sales pitch truthfulness, client benefit
- Approval bar: Ethical sales practices, genuine broker value

**8. Kant (Deontology):**
- Focus: Universalizability, treating brokers as ends, duty to accuracy
- Critical on: Misleading claims, broker exploitation
- Approval bar: No manipulative tactics, honest representation

**9. Russell (Logical Positivism):**
- Focus: Logical validity, empirical verifiability, clear definitions
- Critical on: Argument soundness, term precision
- Approval bar: Logically valid, empirically verifiable

### Eastern Philosophers (10-12)

**10. Confucius (Ren/Li):**
- Focus: Relationship harmony, social benefit, propriety
- Critical on: Broker-buyer trust, ecosystem impact
- Approval bar: Enhances relationships, benefits community

**11. Nagarjuna (Madhyamaka):**
- Focus: Dependent origination, avoiding extremes, uncertainty
- Critical on: Market projections, economic assumptions
- Approval bar: Acknowledges interdependence, avoids dogmatism

**12. Zhuangzi (Daoism):**
- Focus: Natural flow, effortless adoption, perspective diversity
- Critical on: User experience, forced vs organic change
- Approval bar: Feels natural to brokers, wu wei design

### IF.sam Facets (13-20)

**13. Ethical Idealist (Light):**
- Focus: Mission alignment, transparency, user empowerment
- Critical on: Marine safety advancement, broker control
- Approval bar: Transparent claims, ethical practices

**14. Visionary Optimist (Light):**
- Focus: Innovation, market expansion, long-term impact
- Critical on: Cutting-edge features, 10-year vision
- Approval bar: Genuinely innovative, expansion potential

**15. Democratic Collaborator (Light):**
- Focus: Stakeholder input, feedback loops, open communication
- Critical on: Broker consultation, team involvement
- Approval bar: Stakeholders consulted, feedback mechanisms

**16. Transparent Communicator (Light):**
- Focus: Clarity, honesty, evidence disclosure
- Critical on: Pitch deck understandability, limitation acknowledgment
- Approval bar: Clear communication, accessible citations

**17. Pragmatic Survivor (Dark):**
- Focus: Competitive edge, revenue potential, risk management
- Critical on: Market viability, profitability, competitor threats
- Approval bar: Sustainable revenue, competitive advantage

**18. Strategic Manipulator (Dark):**
- Focus: Persuasion effectiveness, objection handling, narrative control
- Critical on: Pitch persuasiveness, objection pre-emption
- Approval bar: Compelling narrative, handles objections

**19. Ends-Justify-Means (Dark):**
- Focus: Goal achievement, efficiency, sacrifice assessment
- Critical on: NaviDocs adoption, deployment speed
- Approval bar: Fastest path to deployment, MVP defined

**20. Corporate Diplomat (Dark):**
- Focus: Stakeholder alignment, political navigation, relationship preservation
- Critical on: Riviera Plaisance satisfaction, no bridges burned
- Approval bar: All stakeholders satisfied, political risks mitigated

---

## Voting Formula

**For Each Guardian:**
```
Average Score = (Empirical + Logical + Practical) / 3

If Average ≥ 7.0: APPROVE
If 5.0 ≤ Average < 7.0: ABSTAIN
If Average < 5.0: REJECT
```
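
The per-guardian rule above translates directly into code. The following is a minimal Python sketch; the function name `guardian_vote` and the string return values are illustrative choices, not part of the source.

```python
from statistics import mean


def guardian_vote(empirical: float, logical: float, practical: float) -> str:
    """Apply the voting formula to one guardian's three dimension scores."""
    scores = (empirical, logical, practical)
    if any(not 0 <= score <= 10 for score in scores):
        raise ValueError("each dimension score must be between 0 and 10")
    average = mean(scores)
    if average >= 7.0:
        return "APPROVE"
    if average >= 5.0:
        return "ABSTAIN"
    return "REJECT"
```

For instance, scores of 8, 7, and 6 average exactly 7.0 and yield `APPROVE`, while 7, 7, and 6 average about 6.67 and yield `ABSTAIN`.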

**Consensus Calculation:**
```
Approval % = (Approve Votes) / (Total Guardians - Abstentions) * 100
```
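
A small Python sketch of the consensus calculation, assuming votes are recorded as the strings `"APPROVE"`, `"ABSTAIN"`, and `"REJECT"` (a representation chosen here for illustration). The function name `approval_percentage` is hypothetical, and the handling of an all-abstain council is an assumption, since the formula is undefined in that case.

```python
def approval_percentage(votes: list[str]) -> float:
    """Approvals as a percentage of non-abstaining guardians."""
    approvals = votes.count("APPROVE")
    abstentions = votes.count("ABSTAIN")
    decisive = len(votes) - abstentions
    if decisive == 0:
        # Assumption: treat an all-abstain council as an error, not as 0%.
        raise ValueError("every guardian abstained; approval % is undefined")
    return 100 * approvals / decisive
```

Because abstentions are removed from the denominator, 18 approvals with 2 abstentions give 100% under this formula, whereas 18 approvals with 2 rejections give 90%. The thresholds elsewhere in this document are phrased as counts out of 20 (for example 18/20), and the source does not say which reading takes precedence when guardians abstain.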

**Outcome Thresholds:**
- **100% Consensus:** 20/20 approve (gold standard)
- **≥95% Supermajority:** 19/20 approve (subject to Contrarian veto)
- **≥90% Strong Consensus:** 18/20 approve (standard for production)
- **<90% Weak Consensus:** Requires revision
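
The thresholds can be sketched as a simple classifier over the approval percentage. This reads the 95% and 90% boundaries as inclusive, which is what the 19/20 and 18/20 examples imply. The function name `consensus_outcome` and the returned labels are illustrative.

```python
def consensus_outcome(approval_pct: float) -> str:
    """Classify an approval percentage against the outcome thresholds."""
    if not 0.0 <= approval_pct <= 100.0:
        raise ValueError("approval_pct must be between 0 and 100")
    if approval_pct == 100.0:
        return "100% Consensus"
    if approval_pct >= 95.0:
        return "Supermajority"  # still subject to Contrarian veto
    if approval_pct >= 90.0:
        return "Strong Consensus"
    return "Weak Consensus"  # requires revision
```

With 20 voting guardians, 19 approvals (95%) classify as a supermajority and 17 approvals (85%) as weak consensus.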

---

## IF.sam Debate Protocol

**Before voting, the 8 IF.sam facets debate:**

**Light Side Coalition (13-16):**
- Argues for ethical practices, transparency, stakeholder empowerment
- Challenges: "Is this genuinely helping brokers or just extracting revenue?"

**Dark Side Coalition (17-20):**
- Argues for competitive advantage, persuasive tactics, goal achievement
- Challenges: "Will this actually close the Riviera deal and generate revenue?"

**Debate Format:**
1. Light Side presents ethical concerns (5 min)
2. Dark Side presents pragmatic concerns (5 min)
3. Cross-debate: Light challenges Dark assumptions (5 min)
4. Cross-debate: Dark challenges Light idealism (5 min)
5. Synthesis: Identify common ground (5 min)
6. Vote: Each facet scores independently

**Agent 10 (S5-H10) monitors for:**
- Unresolved tensions (Light vs Dark >30% divergence)
- Consensus emerging points (Light + Dark agree)
- ESCALATE triggers (>20% of facets reject)
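
The source does not define how the Light vs Dark divergence is measured. The sketch below assumes one possible definition: the gap between the two coalitions' mean average scores, expressed as a percentage of the 10-point scale. Both this definition and the names `light_dark_divergence` and `tension_unresolved` are assumptions made for illustration; a different measure, such as the difference in approval rates, would be equally consistent with the text.

```python
from statistics import mean


def light_dark_divergence(light_scores: list[float], dark_scores: list[float]) -> float:
    """Assumed metric: gap between coalition means as a % of the 0-10 scale."""
    gap = abs(mean(light_scores) - mean(dark_scores))
    return gap * 10  # one point on a 10-point scale is 10 percentage points


def tension_unresolved(light_scores: list[float], dark_scores: list[float],
                       threshold_pct: float = 30.0) -> bool:
    """True when the divergence exceeds the 30% monitoring threshold."""
    return light_dark_divergence(light_scores, dark_scores) > threshold_pct
```

Under this definition, Light facets averaging 8 and Dark facets averaging 4 diverge by 40% and would be flagged, while means of 7.5 and 6 diverge by 15% and would not.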

---

## ESCALATE Triggers

**Agent 10 must ESCALATE if:**
1. **<80% approval:** Weak consensus requires human review
2. **>20% rejection:** Fundamental flaws detected
3. **IF.sam Light/Dark split >30%:** Ethical vs pragmatic tension unresolved
4. **Contradictions >10:** Cross-session inconsistencies
5. **Unverified claims >10%:** Evidence quality below threshold
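
The five triggers can be checked mechanically once the underlying metrics are available. The following sketch assumes those metrics have already been computed and collected in a hypothetical `CouncilMetrics` record; the field names and the reason strings are illustrative, not taken from the source.

```python
from dataclasses import dataclass


@dataclass
class CouncilMetrics:
    """Hypothetical container for the values the triggers refer to."""
    approval_pct: float
    rejection_pct: float
    light_dark_split_pct: float
    contradictions: int
    unverified_pct: float


def escalation_reasons(m: CouncilMetrics) -> list[str]:
    """Return the triggers that fired; an empty list means no escalation."""
    checks = [
        (m.approval_pct < 80, "approval below 80%"),
        (m.rejection_pct > 20, "rejection above 20%"),
        (m.light_dark_split_pct > 30, "IF.sam Light/Dark split above 30%"),
        (m.contradictions > 10, "more than 10 contradictions"),
        (m.unverified_pct > 10, "unverified claims above 10%"),
    ]
    return [reason for fired, reason in checks if fired]
```

Returning the list of reasons, instead of a single boolean, lets the escalation message state which conditions were met.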

---

## Success Criteria

**Minimum Viable Consensus (90%):**
- 18/20 guardians approve
- Average empirical score ≥7.0
- Average logical score ≥7.0
- Average practical score ≥7.0
- IF.sam Light/Dark split <30%

**Stretch Goal (100% Consensus):**
- 20/20 guardians approve
- All 3 dimensions score ≥8.0
- IF.sam Light + Dark aligned
- Zero unverified claims
- Zero contradictions
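
As a sketch of how the minimum viable consensus criteria might be checked together, the function below takes the approval count, the council-wide average for each dimension, and the Light/Dark split. The name `meets_minimum_consensus` and its parameters are hypothetical, and the sketch assumes "18/20" generalizes to "at least 90% of guardians approve".

```python
def meets_minimum_consensus(approvals: int, total_guardians: int,
                            avg_empirical: float, avg_logical: float,
                            avg_practical: float,
                            light_dark_split_pct: float) -> bool:
    """Check the Minimum Viable Consensus criteria listed above."""
    # Integer comparison avoids floating-point rounding at exactly 90%.
    enough_approvals = approvals * 100 >= 90 * total_guardians
    dimensions_ok = all(
        avg >= 7.0 for avg in (avg_empirical, avg_logical, avg_practical)
    )
    split_ok = light_dark_split_pct < 30.0
    return enough_approvals and dimensions_ok and split_ok
```

For example, 18 of 20 approvals with dimension averages of 7.0, 7.2, and 7.5 and a 10% split passes, while the same council with an empirical average of 6.9 does not.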

---

**Document Signature:**
```
if://doc/session-5/guardian-evaluation-criteria-2025-11-13
Version: 1.0
Status: READY for Guardian Council
```