This project analyzes structural brain connectomes from the SWU4 dataset using graph-theoretic features and machine learning. It began as an assignment for JHU's EN.585.781 Frontiers in Neuroengineering and was refactored into a compact, standalone project.
Can structural connectome features support prediction of subject sex?
This is an exploratory analysis, not a clinical, diagnostic, or causal claim.
Each file represents one subject's structural brain network:
- nodes: Desikan atlas brain regions
- edges: estimated structural connections
- weights: connection strength
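Since each file holds nodes, edges, and weights, one subject can be loaded into a weighted undirected NetworkX graph. The sketch below assumes a whitespace-separated edge list with one "region region weight" triple per line; the real file format is not stated here, and if the files are GraphML, nx.read_graphml would replace the parser. The function name build_connectome is hypothetical.

```python
import networkx as nx


def build_connectome(lines):
    """Parse 'u v weight' lines into a weighted undirected graph (assumed format)."""
    G = nx.parse_edgelist(
        lines,
        nodetype=int,                 # assumption: regions are integer labels
        data=[("weight", float)],     # third column becomes the 'weight' attribute
        create_using=nx.Graph,        # undirected: a reversed duplicate overwrites
    )
    # Self-connections carry no region-to-region information, so drop them.
    G.remove_edges_from(list(nx.selfloop_edges(G)))
    return G


G = build_connectome(["1 2 5.0", "2 3 1.5", "1 3 12.0", "3 3 4.0"])
print(G.number_of_edges(), G.degree(1, weight="weight"))  # 3 17.0
```

Because the graph is undirected, a file that lists both directions of an edge keeps only the last weight it reads; averaging the two directions would be an alternative if the source matrices are not symmetric.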
The local matched subset contains 207 subjects with age and sex metadata.
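Matching each subject to its sex label is what makes the mean and sex-conditional mean connectomes computable: both are element-wise averages over the subject axis of a stacked adjacency array. This is a minimal sketch assuming the graphs have already been converted to same-sized adjacency matrices in a shared region order; mean_connectomes is a hypothetical helper, and the tiny arrays are illustrative only.

```python
import numpy as np


def mean_connectomes(adjacency, sex):
    """Return the overall mean matrix and one mean matrix per sex label."""
    adjacency = np.asarray(adjacency, dtype=float)   # shape: (subjects, regions, regions)
    sex = np.asarray(sex)
    overall = adjacency.mean(axis=0)
    # Boolean mask selects the subjects with each label before averaging.
    by_sex = {label: adjacency[sex == label].mean(axis=0) for label in np.unique(sex)}
    return overall, by_sex


adj = np.array([[[0, 2], [2, 0]], [[0, 4], [4, 0]], [[0, 9], [9, 0]]], dtype=float)
sex = np.array(["F", "M", "F"])
overall, by_sex = mean_connectomes(adj, sex)
print(overall[0, 1], by_sex["F"][0, 1], by_sex["M"][0, 1])  # 5.0 5.5 4.0
```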
The local files came from the course-provided wrgr/graph-explorer repo and are excluded from git by default. Source and license links are listed in References.
- Build weighted undirected graphs with NetworkX.
- Visualize mean and sex-conditional mean connectomes.
- Extract graph features: efficiency, modularity, path length, node strength, clustering, centrality, participation coefficient, and log-scaled edge weights.
- Benchmark feature sets with 5-fold cross-validation.
- Train a random forest classifier and evaluate it on a stratified train/test split.
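The graph features listed above can be sketched with NetworkX plus a small amount of hand-written code. Two assumptions matter here: shortest-path measures need lengths rather than strengths, so each weight w is mapped to a distance 1/w, and modules come from greedy modularity maximization (the project's actual community method is not stated). NetworkX's built-in global_efficiency ignores weights, so weighted efficiency is computed from Dijkstra distances, and the participation coefficient P_i = 1 - sum over modules s of (k_is / k_i)^2, as defined in Rubinov & Sporns (2010), is written out by hand. All function names are hypothetical.

```python
import networkx as nx
from networkx.algorithms import community as nx_comm


def add_distance(G):
    # Assumption: distance = 1 / weight, so stronger connections are shorter.
    for _, _, data in G.edges(data=True):
        data["distance"] = 1.0 / data["weight"]


def efficiency_and_path_length(G):
    # Weighted global efficiency: mean inverse distance over ordered node pairs.
    # Characteristic path length: mean distance over reachable pairs.
    lengths = dict(nx.all_pairs_dijkstra_path_length(G, weight="distance"))
    n = G.number_of_nodes()
    inv_sum, dist_sum, pairs = 0.0, 0.0, 0
    for u, targets in lengths.items():
        for v, d in targets.items():
            if u != v:
                inv_sum += 1.0 / d
                dist_sum += d
                pairs += 1
    return inv_sum / (n * (n - 1)), dist_sum / pairs


def participation_coefficient(G, communities):
    # P_i = 1 - sum_s (k_is / k_i)^2, with k_is the strength from node i into module s.
    module_of = {node: i for i, c in enumerate(communities) for node in c}
    pc = {}
    for node in G:
        strength = G.degree(node, weight="weight")
        if strength == 0:
            pc[node] = 0.0
            continue
        per_module = {}
        for nbr, data in G[node].items():
            m = module_of[nbr]
            per_module[m] = per_module.get(m, 0.0) + data["weight"]
        pc[node] = 1.0 - sum((k / strength) ** 2 for k in per_module.values())
    return pc


def graph_features(G):
    add_distance(G)  # note: adds a 'distance' attribute to G in place
    communities = nx_comm.greedy_modularity_communities(G, weight="weight")
    efficiency, path_length = efficiency_and_path_length(G)
    return {
        "global_efficiency": efficiency,
        "modularity": nx_comm.modularity(G, communities, weight="weight"),
        "char_path_length": path_length,
        "strength": dict(G.degree(weight="weight")),
        "clustering": nx.clustering(G, weight="weight"),
        "betweenness": nx.betweenness_centrality(G, weight="distance"),
        "participation": participation_coefficient(G, communities),
        "n_modules": len(communities),
    }


# Toy graph: two triangles joined by one weak bridge (2-3).
G = nx.Graph()
G.add_weighted_edges_from([
    (0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
    (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
    (2, 3, 0.1),
])
feats = graph_features(G)
print(feats["n_modules"], feats["participation"][0])  # 2 0.0
```

Node-level dictionaries (strength, clustering, betweenness, participation) become one feature per region when flattened in a fixed region order, while efficiency, modularity, and path length are one value per subject.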
```bash
pip install -r requirements.txt
python scripts/run_analysis.py
python scripts/benchmark_features.py
```

The scripts expect the local SWU4 files under data/. Data files are excluded from git by default.
Outputs are saved to results/figures/ and results/metrics/.
Best feature set: graph summaries + node topology + log-scaled edge weights.
- Features per subject: 2,781
- Holdout balanced accuracy: 0.846
- 5-fold CV balanced accuracy: 0.815 +/- 0.038
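The two reported metrics come from different protocols: one stratified holdout split and a 5-fold cross-validation. A minimal scikit-learn sketch of both is below. It runs on a synthetic stand-in for the 207-subject feature matrix, and the test size, tree count, and seed are assumptions; whether the original CV used all subjects or only the training split is not stated, and this sketch uses all of them.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split


def holdout_and_cv(X, y, seed=0, n_trees=200):
    # Stratified holdout: class proportions are preserved in both splits.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    model = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    model.fit(X_tr, y_tr)
    holdout = balanced_accuracy_score(y_te, model.predict(X_te))

    # 5-fold stratified CV; cross_val_score fits a fresh clone on each fold.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    cv_scores = cross_val_score(
        RandomForestClassifier(n_estimators=n_trees, random_state=seed),
        X, y, cv=cv, scoring="balanced_accuracy",
    )
    return holdout, cv_scores


# Synthetic data only; not the SWU4 features.
X, y = make_classification(n_samples=207, n_features=50, n_informative=10, random_state=0)
holdout, cv_scores = holdout_and_cv(X, y)
print(f"holdout {holdout:.3f}, CV {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
```

Balanced accuracy averages recall over the two classes, so it is not inflated when one sex is more common in the sample.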
| Feature Set | Features | CV Balanced Accuracy |
|---|---|---|
| graph + topology + log edge weights | 2,781 | 0.815 +/- 0.038 |
| graph + log edge weights | 2,493 | 0.801 +/- 0.036 |
| graph features only | 78 | 0.758 +/- 0.032 |
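The feature counts in the table are consistent with one log-scaled feature per region pair: 2,493 - 78 = 2,415 = 70 * 69 / 2, the size of the strict upper triangle of a 70 x 70 matrix. The region count of 70 is an inference from that arithmetic, not something stated above. The sketch below vectorizes a symmetric adjacency matrix that way; log1p is an assumption chosen so that absent edges (weight 0) map to 0, and log_edge_features is a hypothetical name.

```python
import numpy as np


def log_edge_features(adjacency):
    """Flatten the strict upper triangle of a symmetric matrix, log1p-scaled."""
    A = np.asarray(adjacency, dtype=float)
    rows, cols = np.triu_indices(A.shape[0], k=1)  # k=1 skips the diagonal
    return np.log1p(A[rows, cols])


n_regions = 70  # assumption inferred from the feature counts
print(log_edge_features(np.zeros((n_regions, n_regions))).size)  # 2415
print(log_edge_features([[0, 3], [3, 0]]))  # one feature: log(4)
```

Log scaling compresses the heavy right tail typical of streamline-count weights, so a few very strong connections do not dominate the feature distribution.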
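Direct region-to-region edge weights were the strongest signal in this benchmark: adding them to the graph summaries raised CV balanced accuracy from 0.758 to 0.801, and adding node topology features raised it further to 0.815. The result should be treated as exploratory because the final feature set was selected after benchmarking alternatives, and the gain from topology features (0.014) is smaller than the fold-to-fold spread. A stronger study would use nested validation, independent test data, richer covariates, and acquisition/site controls.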
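Nested validation addresses the selection issue by choosing the feature set inside each outer training fold, so the outer test folds never influence that choice. The sketch below assumes each feature set can be expressed as a column index array; the set names, column ranges, fold counts, and tree count are hypothetical, and the data are synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import StratifiedKFold, cross_val_score


def nested_feature_set_selection(X, y, feature_sets, seed=0, n_trees=50):
    outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    outer_scores, chosen = [], []
    for train_idx, test_idx in outer.split(X, y):
        X_tr, y_tr = X[train_idx], y[train_idx]
        # Inner CV on the outer training fold only picks the feature set.
        inner = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed)
        inner_means = {
            name: cross_val_score(
                RandomForestClassifier(n_estimators=n_trees, random_state=seed),
                X_tr[:, cols], y_tr, cv=inner, scoring="balanced_accuracy",
            ).mean()
            for name, cols in feature_sets.items()
        }
        best = max(inner_means, key=inner_means.get)
        # Refit on the full outer training fold, then score the untouched test fold.
        cols = feature_sets[best]
        model = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
        model.fit(X_tr[:, cols], y_tr)
        pred = model.predict(X[test_idx][:, cols])
        outer_scores.append(balanced_accuracy_score(y[test_idx], pred))
        chosen.append(best)
    return np.array(outer_scores), chosen


X, y = make_classification(n_samples=207, n_features=50, n_informative=10, random_state=0)
feature_sets = {  # hypothetical column groupings
    "graph": np.arange(0, 10),
    "graph+edges": np.arange(0, 30),
    "all": np.arange(0, 50),
}
scores, chosen = nested_feature_set_selection(X, y, feature_sets)
print(f"nested CV {scores.mean():.3f} +/- {scores.std():.3f}; chosen: {chosen}")
```

The outer-fold mean estimates the performance of the whole procedure, including feature-set selection, rather than of one set picked after seeing all results.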
- NeuroData. m2g usage documentation. m2g docs
- NeuroData. m2g license: PolyForm Noncommercial License 1.0.0. m2g license
- wrgr/graph-explorer. Course-provided SWU4 connectome files. GitHub
- Rubinov, M., & Sporns, O. (2010). Complex network measures of brain connectivity. PubMed
- Ingalhalikar, M. et al. (2014). Sex differences in the structural connectome. PubMed
- Breiman, L. (2001). Random forests. DOI