feat(stats): reviewer-proof correlation helpers + prisma correlate #2

Open
javiercuervo wants to merge 1 commit into main from feat/stats-correlations
@javiercuervo

Contributor

What

Adds a new prisma.stats subpackage and a prisma correlate CLI command: correlation and robustness statistics designed so every estimate ships with what a sceptical reviewer asks for.

  • fisher_ci(r, n) — Fisher z-transform 95% CI for a Pearson r.
  • correlation_table(df, x, y, group=None) — Pearson r + Fisher-z CI and Spearman ρ, with n, overall and per stratum (the Pearson/Spearman gap flags outlier-driven associations).
  • partial_correlation(df, x, y, covar) — first-order partial r(x,y|covar) (controls for confounders such as general ability).
  • bootstrap_ci(df, x, y) — percentile bootstrap CI for small/fragile cells.
  • missingness_compare(df, indicator, by) — selection-bias check (Mann-Whitney) between present vs missing rows.
  • mixed_model_icc(df, outcome, predictor, group) — random-intercept model + ICC for clustered data (students nested in cohorts/courses); optional statsmodels.
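
For reviewers who want to sanity-check the numbers by hand, the standard formulas behind three of the helpers above can be sketched in plain Python. This is a minimal illustration, not the code in this PR: the `*_sketch` names, the `pearson_r` helper, the list-based inputs, and the defaults (`n_boot=2000`, `seed=0`) are assumptions made for the example, and the real functions take a DataFrame.

```python
import math
import random

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile


def pearson_r(xs, ys):
    """Plain Pearson correlation for two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci_sketch(r, n, z_crit=Z_95):
    """Fisher z-transform CI: atanh(r) +/- z_crit / sqrt(n - 3), mapped back with tanh."""
    if n <= 3:
        raise ValueError("Fisher z interval needs n > 3")
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def partial_r_sketch(r_xy, r_xz, r_yz):
    """First-order partial correlation r(x, y | z) from the three pairwise r values."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def bootstrap_ci_sketch(xs, ys, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for Pearson r, resampling (x, y) pairs with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        # Skip degenerate resamples where one variable has no variance.
        if len(set(bx)) < 2 or len(set(by)) < 2:
            continue
        stats.append(pearson_r(bx, by))
    if not stats:
        raise ValueError("no usable bootstrap resamples")
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[max(int((1 - alpha / 2) * len(stats)) - 1, 0)]
    return lo, hi
```

With r = 0.5 and n = 103, `fisher_ci_sketch` gives an interval of roughly 0.34 to 0.63, which shows how wide the uncertainty stays even at a moderate sample size.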

Why

Extracted while hardening the statistical rigour of the 20-60-20 AI-permitted-assessment study (Cuervo, 2026, Universidade de Aveiro) against an adversarial three-reviewer panel: within-stratum reporting, rank robustness, partial correlation for mechanical/ability overlap, bootstrap for small cells, missingness and clustering. Reusable for any education/SLR dataset.

Notes

  • Adds scipy as a core dependency; statsmodels is an optional [stats] extra (only mixed_model_icc needs it).
  • New CLI: prisma correlate --in data.csv --x A --y B [--group G] [--partial Z] [--bootstrap] [--out table.csv].
  • Tests: tests/test_stats.py (5 cases); full suite passes (16/16). CHANGELOG updated.
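
The flag set of the new command can be mirrored with a small `argparse` sketch, which makes the required and optional arguments explicit. This is an assumption-laden illustration only: the PR does not say which CLI framework prisma uses, and `build_parser` and the `in_path` destination name are hypothetical.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented `prisma correlate` flags."""
    parser = argparse.ArgumentParser(prog="prisma")
    sub = parser.add_subparsers(dest="command", required=True)
    corr = sub.add_parser("correlate", help="correlation table for two columns")
    # `in` is a Python keyword, so the value is stored under a different name.
    corr.add_argument("--in", dest="in_path", required=True, help="input CSV")
    corr.add_argument("--x", required=True, help="first column")
    corr.add_argument("--y", required=True, help="second column")
    corr.add_argument("--group", default=None, help="optional stratum column")
    corr.add_argument("--partial", default=None, help="optional covariate column")
    corr.add_argument("--bootstrap", action="store_true", help="add a bootstrap CI")
    corr.add_argument("--out", default=None, help="optional output CSV")
    return parser


# Parse the example invocation from the note above, with one optional flag set.
args = build_parser().parse_args(
    ["correlate", "--in", "data.csv", "--x", "A", "--y", "B", "--bootstrap"]
)
```

Here `--in`, `--x`, and `--y` are treated as required because the usage line shows them without brackets; the bracketed flags default to `None` or `False`.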

🤖 Generated with Claude Code

Add prisma.stats: fisher_ci, correlation_table (Pearson + Fisher-z CI +
Spearman, overall and per stratum), partial_correlation, bootstrap_ci,
missingness_compare (Mann-Whitney selection check), and mixed_model_icc
(random-intercept clustering + ICC). New 'prisma correlate' CLI command.
scipy becomes a core dependency; statsmodels is an optional [stats] extra.
Extracted while hardening the statistical rigour of the 20-60-20
AI-permitted-assessment study (Cuervo, 2026, Universidade de Aveiro).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>