Add stacked terminology share chart to dashboard #16

Merged: solresol merged 2 commits into main from codex/add-stacked-area-chart-for-racial-terminology on Nov 5, 2025

@solresol solresol commented Nov 5, 2025

Collaborator

Summary

  • compute yearly racial terminology totals and derive a proportion dataset for charting
  • render a stacked area chart on the dashboard to show the share of each terminology among yearly racial references

Testing

  • not run

https://chatgpt.com/codex/tasks/task_e_690a86b255a083258ac4a7d3ce80ff83

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +892 to +895
label: context => {
    const value = typeof context.parsed.y === 'number' ? context.parsed.y : 0;
    return context.dataset.label + ': ' + (value * 100).toFixed(2) + '%';
}

P1: Correct stacked tooltip to use dataset share

The tooltip for the terminology distribution chart builds its label from context.parsed.y. On a stacked axis context.parsed.y is the cumulative stack total, so datasets other than the first display the sum of their share and all previous ones (e.g., a 30% series will show 70% if 40% is stacked below it). This misreports the proportions the chart is meant to convey. Use context.raw (the dataset’s own value) or compute the difference from the stack to show the actual share for each terminology.
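To make the distinction in this comment concrete, here is a small sketch of the arithmetic, using the 40% and 30% figures from the example above. It only illustrates the difference between a series' own share and the top of its stack. Whether context.parsed.y actually carries the cumulative value depends on the Chart.js version in use and is worth checking against that version's tooltip documentation before changing the callback. The function name `stack_positions` is made up for illustration.

```python
from itertools import accumulate


def stack_positions(shares):
    """Return (own_share, top_of_stack) pairs for series stacked bottom to top."""
    tops = list(accumulate(shares))
    return list(zip(shares, tops))


# Shares taken from the example in the comment: 40% stacked below 30%.
for own, top in stack_positions([0.40, 0.30]):
    print(f"own share {own * 100:.2f}%, top of stack {top * 100:.2f}%")
```

The tooltip should report the first number in each pair; the second is only where the area boundary is drawn.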

claude Bot commented Nov 5, 2025

Code Review

Overview

This PR adds a new stacked area chart to visualize the proportional distribution of racial terminology mentions over time. The implementation is generally solid, but I've identified several areas for improvement.


✅ Strengths

  1. Clear purpose: The chart effectively shows how the share of different terminology types changes over time
  2. Consistent styling: Follows the existing Chart.js patterns used throughout the dashboard
  3. Proper data validation: Uses typeof checks and fallback to 0 for numeric values
  4. Reuses color scheme: Maintains visual consistency with existing charts

🐛 Potential Issues

1. Years With Zero Terminology Usage Are Dropped (extractor/generate_dashboard.py:356-363)

The code only creates proportion entries when terminology_total > 0, which correctly avoids a division by zero. However, it also means that years with zero terminology usage won't appear in the chart, creating visual gaps.

Issue: If a year has articles but no terminology usage, it won't be in termProportionData, but will be in other charts, causing inconsistent x-axis alignment.

Recommendation: Always append proportion data, using zeros when terminology_total == 0:

if terminology_total:
    term_proportion_by_year.append({
        'year': row['year'],
        'caucasian_prop': caucasian_count / terminology_total,
        'white_prop': white_count / terminology_total,
        'european_prop': european_count / terminology_total,
        'other_prop': other_count / terminology_total,
    })
else:
    # Include years with no terminology to maintain timeline continuity
    term_proportion_by_year.append({
        'year': row['year'],
        'caucasian_prop': 0,
        'white_prop': 0,
        'european_prop': 0,
        'other_prop': 0,
    })

2. Terminology Counting Logic Issue (extractor/generate_dashboard.py:349-354)

The proportion calculation sums individual counts, but articles can have multiple terminology flags set. This means the "total" might not represent unique articles, but rather total mentions/flags.

Question: Is this intentional? The SQL query at line 308 uses COUNT(*) FILTER, which counts articles, not term occurrences within articles.

Impact: If an article uses both "Caucasian" AND "White", it's counted twice in the denominator, making proportions potentially misleading. For example, if 100 articles use "Caucasian" and the same 100 use "White", the chart would show 50%/50%, when actually 100% of articles use both.

Recommendation: Consider whether you want:

  • Option A: Each term's share of all term flags (current approach, but potentially confusing when articles use several terms)
  • Option B: Proportion among mutually exclusive categories (would require re-categorizing articles)
  • Option C: Raw counts instead of proportions (clearer but loses normalization benefit)

Add a comment explaining the calculation methodology.
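As a rough illustration of why the choice of denominator matters, the sketch below computes both readings for the 100-article example above. The function names and the `article_total` parameter are hypothetical and do not come from generate_dashboard.py.

```python
def share_of_flags(counts):
    """Each term's share of all term flags; shares sum to 1 when any flag is set."""
    total = sum(counts.values())
    if total == 0:
        return {term: 0.0 for term in counts}
    return {term: n / total for term, n in counts.items()}


def share_of_articles(counts, article_total):
    """Each term's share of distinct articles; these shares need not sum to 1."""
    if article_total == 0:
        return {term: 0.0 for term in counts}
    return {term: n / article_total for term, n in counts.items()}


# Example from the review: the same 100 articles use both terms.
counts = {"caucasian": 100, "white": 100}
print(share_of_flags(counts))          # {'caucasian': 0.5, 'white': 0.5}
print(share_of_articles(counts, 100))  # {'caucasian': 1.0, 'white': 1.0}
```

Only the first reading stacks cleanly to 100%, which is probably why it suits a stacked area chart, but the chart caption should then say that it shows shares of term flags rather than shares of articles.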

3. Chart Type Mismatch (extractor/generate_dashboard.py:879)

type: 'line',

The chart uses type: 'line' but aims to create a stacked area chart. In Chart.js an area chart is a line chart with the fill option set, drawn on a stacked scale, so this works, but the intent is not obvious from the type alone.

Recommendation: Consider using type: 'bar' with stacked: true for a traditional stacked bar chart, or add a comment documenting why line plus fill was chosen.

4. Accessibility Concern

Color is the only differentiator between terminology types. Consider adding pattern fills (for example with the Patternomaly library mentioned in the Chart.js documentation) for better accessibility.


🎯 Code Quality

Missing Error Handling

  • Line 856: No try-catch around chart creation
  • Line 871: No validation that termProportionData has the expected structure

Suggestion: Add defensive checks:

if (termDistributionCanvas && Array.isArray(termProportionData) && termProportionData.length > 0) {
    try {
        // ... chart creation
    } catch (error) {
        console.error('Failed to create terminology distribution chart:', error);
    }
}
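The check above runs in the browser. A complementary option, sketched here as an assumption rather than something the PR does, is to validate the rows in Python before they are serialized into the page. The key names come from the recommendation in issue 1; `validate_proportion_rows` and `EXPECTED_KEYS` are hypothetical names.

```python
import math

# Keys taken from the proportion rows shown in issue 1.
EXPECTED_KEYS = ("year", "caucasian_prop", "white_prop", "european_prop", "other_prop")


def validate_proportion_rows(rows):
    """Raise ValueError if a row lacks a key or has a share outside [0, 1]."""
    for i, row in enumerate(rows):
        missing = [key for key in EXPECTED_KEYS if key not in row]
        if missing:
            raise ValueError(f"row {i} is missing keys: {missing}")
        for key in EXPECTED_KEYS[1:]:
            value = row[key]
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if not is_number or not math.isfinite(value):
                raise ValueError(f"row {i} has a non-numeric {key}: {value!r}")
            if not 0 <= value <= 1:
                raise ValueError(f"row {i} has {key} outside [0, 1]: {value!r}")
    return rows
```

Failing at generation time gives a clearer error than a chart that silently renders nothing.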

Code Duplication

The distributionConfigs array (lines 860-865) duplicates color definitions that likely exist elsewhere. Consider centralizing color constants.


📊 Performance Considerations

Minor: The chart data is embedded directly in HTML. The current approach is fine for typical academic publication timespans (50-100 years). For large datasets (hundreds of years), consider:

  • Limiting the date range displayed by default
  • Implementing chart filtering/zooming

🧪 Testing

Critical Issue: The PR description states "Testing: not run"

Required before merge:

  1. Generate the dashboard with real data: ./generate_dashboard.py --output-dir test-dashboard
  2. Open test-dashboard/index.html in a browser
  3. Verify the chart renders correctly
  4. Test edge cases:
    • Years with zero terminology usage
    • Years with only one terminology type
    • First/last years in dataset
  5. Check browser console for JavaScript errors
  6. Test in multiple browsers (Chrome, Firefox, Safari)
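The data-side edge cases in step 4 can also be checked without a browser. The sketch below assumes the proportion logic is factored into a helper; `proportions_for_year`, `TERMS`, and the shape of the `counts` argument are hypothetical and are not taken from generate_dashboard.py.

```python
TERMS = ("caucasian", "white", "european", "other")


def proportions_for_year(year, counts):
    """Build one chart row; every share is 0 when no term was counted that year."""
    total = sum(counts.get(term, 0) for term in TERMS)
    row = {"year": year}
    for term in TERMS:
        row[f"{term}_prop"] = counts.get(term, 0) / total if total else 0
    return row


# Edge case: a year with zero terminology usage still yields a row of zeros.
empty = proportions_for_year(1990, {})
assert all(empty[f"{term}_prop"] == 0 for term in TERMS)

# Edge case: a year with only one terminology type gives that term the full share.
single = proportions_for_year(1991, {"white": 7})
assert single["white_prop"] == 1 and single["caucasian_prop"] == 0
```

Checks like these cover only the Python side; rendering still needs the manual browser steps listed above.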

📝 Documentation

Missing: No comments explaining the proportion calculation methodology or the purpose of the new chart.

Recommendation: Add docstring-style comments:

# Calculate each terminology type's share of the total terminology flags for the year.
# Note: An article that uses several terms is counted once per term in the denominator,
# so the shares describe term flags, not articles. They sum to 100% in any year with usage.
terminology_total = (...)

🔒 Security

No security concerns identified. The data flow is: PostgreSQL → Python → embedded JSON → Chart.js, with the data itself generated entirely server-side.


Summary

Recommendation: ⚠️ Request Changes

The code is close to merge-ready but needs:

  1. Critical: Handle years with zero terminology usage to prevent timeline gaps
  2. Critical: Actually test the implementation before merge
  3. Important: Clarify the proportion calculation methodology (overlapping categories)
  4. Nice-to-have: Add error handling and documentation

The visualization concept is excellent and will provide valuable insights into terminology trends. Once these issues are addressed, this will be a great addition to the dashboard!


Suggested Next Steps

  1. Address the zero-total handling issue
  2. Run the dashboard generation and verify chart renders correctly
  3. Add a comment explaining that articles using several terms are counted once per term in the denominator, so the shares describe term flags rather than articles
  4. Consider adding a note on the dashboard itself explaining this chart's interpretation
  5. Update PR description with test results

Let me know if you'd like me to clarify any of these points!

@solresol solresol merged commit 8095ade into main Nov 5, 2025
1 check passed
@solresol solresol deleted the codex/add-stacked-area-chart-for-racial-terminology branch November 5, 2025 20:45