This document explains how the Idea Stock Exchange calculates criterion scores using the ReasonRank algorithm.
The algorithm answers: "How good is this criterion for measuring this topic?"
Scores range from 0-100%, where:
- 80-100%: Excellent measure (e.g., Glacier Mass Balance for climate change)
- 60-80%: Good measure
- 40-60%: Moderate measure
- 0-40%: Weak measure (e.g., Twitter Sentiment for climate change)
Criterion Overall Score
↓
Four Dimension Scores (Validity, Reliability, Independence, Linkage)
↓
Supporting vs. Opposing Arguments (each weighted by quality)
Each argument has three quality metrics (0-100%):
- Evidence Quality: How well-supported is this argument?
- Logical Validity: How logically sound is this argument?
- Importance: How important is this consideration?
The argument's weight is calculated using the geometric mean:
def calculate_argument_weight(evidence_quality, logical_validity, importance):
# Normalize to 0-1
eq = evidence_quality / 100.0
lv = logical_validity / 100.0
imp = importance / 100.0
# Geometric mean ensures all three factors matter
weight = (eq * lv * imp) ** (1/3)
# Convert back to 0-100
return weight * 100.0Why geometric mean?
The geometric mean ensures that if ANY quality metric is very low, the overall weight drops significantly. This prevents poorly-supported arguments from having undue influence.
Example:
- Argument A: Evidence=90%, Logic=90%, Importance=90% → Weight=90%
- Argument B: Evidence=90%, Logic=90%, Importance=30% → Weight=66%
- Argument C: Evidence=30%, Logic=30%, Importance=30% → Weight=30%
Each dimension (Validity, Reliability, Independence, Linkage) is scored based on the balance of supporting vs. opposing arguments.
def calculate_dimension_score(supporting_args, opposing_args):
# Sum weights
total_support = sum(arg.weight for arg in supporting_args)
total_oppose = sum(arg.weight for arg in opposing_args)
# Calculate balance
balance = total_support - total_oppose
# Map to 0-100 using sigmoid function
scale = 100.0
sigmoid = 1 / (1 + exp(-balance / scale))
score = sigmoid * 100.0
return scoreKey properties:
- Equal support/oppose → 50% score
- More support → Higher score (approaches 100%)
- More opposition → Lower score (approaches 0%)
- Smooth and continuous → Small changes don't cause jumps
Example:
Validity dimension for "Glacier Mass Balance":
- Supporting arguments: Total weight = 300
- Opposing arguments: Total weight = 100
- Balance = 300 - 100 = +200
- Score ≈ 88%
The overall score is the weighted average of the four dimension scores.
def calculate_overall_score(validity, reliability, independence, linkage):
# Default: equal weights
return (validity + reliability + independence + linkage) / 4Default weights: All dimensions equally important (25% each)
Custom weights: Can emphasize certain dimensions for specific contexts
Example:
For "Glacier Mass Balance":
- Validity: 95%
- Reliability: 92%
- Independence: 90%
- Linkage: 92%
- Overall: (95+92+90+92)/4 = 92%
Validity: 95%
- ✓ Supporting (weight=90): "Ice melts only when heat is added - integrates temperature naturally"
- ✗ Opposing (weight=70): "Some glaciers affected by local precipitation"
- Balance: +20 → High score
Reliability: 92%
- ✓ Supporting (weight=98): "Satellite imagery provides objective measurements"
- ✗ Opposing (weight=65): "Historical data less precise"
- Balance: +33 → High score
Independence: 90%
- ✓ Supporting (weight=92): "Independent verification across different systems"
- Balance: +92 → High score
Linkage: 92%
- ✓ Supporting (weight=90): "Correlates strongly with atmospheric CO2"
- Balance: +90 → High score
Overall Score: 92% ← Excellent criterion
Validity: 8%
- ✗ Opposing (weight=95): "Measures perception, not reality"
- Balance: -95 → Very low score
Reliability: 10%
- ✗ Opposing (weight=90): "Heavily influenced by bots and viral events"
- Balance: -90 → Very low score
Independence: 12%
- ✗ Opposing (weight=92): "Subject to manipulation and echo chambers"
- Balance: -92 → Very low score
Linkage: 5%
- ✗ Opposing (weight=95): "No correlation with actual temperature"
- Balance: -95 → Very low score
Overall Score: 9% ← Very weak criterion
Validity: 40%
- ✗ Opposing (weight=88): "Measures profit, not wellbeing"
- ✓ Supporting (weight=70): "Predicts future employment"
- Balance: -18 → Below-average score
Reliability: 85%
- ✓ Supporting (weight=98): "Precise real-time data"
- Balance: +98 → High score
Independence: 50%
- (No strong arguments either way)
- Balance: 0 → Neutral score
Linkage: 30%
- ✗ Opposing (weight=90): "Weak correlation with median quality of life"
- Balance: -90 → Low score
Overall Score: 51% ← Moderate criterion
Arguments with strong evidence carry more weight than speculation. This filters noise automatically.
Every score can be traced back to specific arguments. No black boxes.
As the community adds better arguments, scores converge toward accuracy.
Gaming requires successfully arguing that a biased measure is unbiased, using verifiable evidence, against informed opposition. Much harder than gaming unstructured debates.
- Criteria quality (what we're measuring)
- Evidence (what the measurements show)
- Values (which criteria we prioritize)
Mixing these layers causes endless confusion. Separating them enables progress.
Once criteria are scored, they're used to weigh evidence:
def calculate_truth_score(claim, evidence_items):
weighted_evidence = 0
total_weight = 0
for evidence in evidence_items:
# Use criterion score as evidence weight
criterion_score = evidence.criterion.overall_score
weighted_evidence += evidence.supports * criterion_score
total_weight += criterion_score
if total_weight == 0:
return 50 # Neutral if no evidence
return (weighted_evidence / total_weight) * 100Result: Claims measured by excellent criteria get higher confidence than claims measured by poor criteria.
- Default: All dimensions start at 50% (neutral)
- Overall: 50% (neutral)
- Interpretation: "We don't know if this is good or bad yet"
- Balance: Positive
- Score: High (70-100%)
- Interpretation: "Community thinks this is a good measure"
- Balance: Negative
- Score: Low (0-30%)
- Interpretation: "Community thinks this is a poor measure"
- Balance: Near zero
- Score: Around 50%
- Interpretation: "Legitimate disagreement about quality"
For domain-specific applications, you can adjust dimension importance:
# Emphasize validity and linkage over reliability
weights = {
'validity': 0.35,
'reliability': 0.15,
'independence': 0.15,
'linkage': 0.35
}
overall_score = calculate_overall_score(
validity, reliability, independence, linkage, weights
)You can add new quality dimensions (e.g., "Timeliness", "Cost"):
- Add to DimensionType enum in models.py
- Add score field to Criterion model
- Update scoring algorithm to include new dimension
- Update UI to display new dimension
Arguments can themselves have sub-arguments, creating nested reasoning:
Criterion: "GDP Growth"
↓
Dimension: Validity
↓
Argument: "GDP doesn't measure distribution"
↓
Counter-Argument: "But it correlates with employment"
↓
Counter-Counter-Argument: "Only for certain sectors"
This is implemented through the recursive argument scoring system.
Scores are cached in the database. Recalculation only happens when:
- New argument is added
- Existing argument is updated
- Manual recalculation is triggered
When many arguments change, batch the recalculation:
# Don't recalculate after each argument
for arg in new_arguments:
db.add(arg)
db.commit()
# Recalculate once at the end
recalculate_criterion_scores(db, criterion_id)Adding a supporting argument can only increase (or maintain) the score. Adding an opposing argument can only decrease (or maintain) the score.
Scores are always in [0, 100]. No overflow or underflow.
Small changes in argument quality → small changes in score. No discontinuities.
As more arguments are added, scores converge toward "true" quality (assuming good-faith participants).
- Fisher, R., & Ury, W. (1981). Getting to Yes - Separating criteria from conclusions
- Pearl, J. (2009). Causality - Causal inference and measurement validity
- Silver, N. (2012). The Signal and the Noise - Bayesian updating and evidence weighting
See:
/backend/algorithms/scoring.py- Core algorithm implementation/backend/models.py- Data model/backend/main.py- API endpoints/frontend/src/components/DimensionBreakdown.tsx- Score visualization