Add stacked terminology share chart to dashboard#16
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| label: context => { | ||
| const value = typeof context.parsed.y === 'number' ? context.parsed.y : 0; | ||
| return context.dataset.label + ': ' + (value * 100).toFixed(2) + '%'; | ||
| } |
There was a problem hiding this comment.
Correct stacked tooltip to use dataset share
The tooltip for the terminology distribution chart builds its label from context.parsed.y. On a stacked axis context.parsed.y is the cumulative stack total, so datasets other than the first display the sum of their share and all previous ones (e.g., a 30% series will show 70% if 40% is stacked below it). This misreports the proportions the chart is meant to convey. Use context.raw (the dataset’s own value) or compute the difference from the stack to show the actual share for each terminology.
Useful? React with 👍 / 👎.
Code ReviewOverviewThis PR adds a new stacked area chart to visualize the proportional distribution of racial terminology mentions over time. The implementation is generally solid, but I've identified several areas for improvement. ✅ Strengths
🐛 Potential Issues1. Division by Zero Handling Missing (extractor/generate_dashboard.py:356-363)The code only creates proportion entries when Issue: If a year has articles but no terminology usage, it won't be in Recommendation: Always append proportion data, using zeros when if terminology_total:
term_proportion_by_year.append({
'year': row['year'],
'caucasian_prop': caucasian_count / terminology_total,
'white_prop': white_count / terminology_total,
'european_prop': european_count / terminology_total,
'other_prop': other_count / terminology_total,
})
else:
# Include years with no terminology to maintain timeline continuity
term_proportion_by_year.append({
'year': row['year'],
'caucasian_prop': 0,
'white_prop': 0,
'european_prop': 0,
'other_prop': 0,
})2. Terminology Counting Logic Issue (extractor/generate_dashboard.py:349-354)The proportion calculation sums individual counts, but articles can have multiple terminology flags set. This means the "total" might not represent unique articles, but rather total mentions/flags. Question: Is this intentional? The SQL query at line 308 uses Impact: If an article uses both "Caucasian" AND "White", it's counted twice in the denominator, making proportions potentially misleading. For example, if 100 articles use "Caucasian" and the same 100 use "White", the chart would show 50%/50%, when actually 100% of articles use both. Recommendation: Consider whether you want:
Add a comment explaining the calculation methodology. 3. Chart Type Mismatch (extractor/generate_dashboard.py:879)type: 'line',The chart uses Recommendation: Consider using 4. Accessibility ConcernColor is the only differentiator between terminology types. Consider adding patterns or using the Chart.js pattern plugin for better accessibility. 🎯 Code QualityMissing Error Handling
Suggestion: Add defensive checks: if (termDistributionCanvas && Array.isArray(termProportionData) && termProportionData.length > 0) {
try {
// ... chart creation
} catch (error) {
console.error('Failed to create terminology distribution chart:', error);
}
}Code DuplicationThe 📊 Performance ConsiderationsMinor: The chart data is embedded directly in HTML. For large datasets (hundreds of years), consider:
🧪 TestingCritical Issue: The PR description states "Testing: not run" Required before merge:
📝 DocumentationMissing: No comments explaining the proportion calculation methodology or the purpose of the new chart. Recommendation: Add docstring-style comments: # Calculate proportions of each terminology type relative to total terminology usage
# Note: Articles may use multiple terms, so proportions represent share of term occurrences,
# not share of articles. Sum across terms may exceed 100% for any given year.
terminology_total = (...)🔒 SecurityNo security concerns identified. The data flow is: PostgreSQL → Python → embedded JSON → Chart.js, all server-side generated. SummaryRecommendation: The code is close to merge-ready but needs:
The visualization concept is excellent and will provide valuable insights into terminology trends. Once these issues are addressed, this will be a great addition to the dashboard! Suggested Next Steps
Let me know if you'd like me to clarify any of these points! |
Summary
Testing
https://chatgpt.com/codex/tasks/task_e_690a86b255a083258ac4a7d3ce80ff83