Ancestry bias in GWAS and polygenic risk scores, examined through the 2024 All of Us controversy, with implications for the Gulf.
An eight-minute critical talk on health-data equity. It argues that bias in genomics has two layers, not one: who gets sampled, and how their data is shown.
Decades of European-dominated genome-wide association studies (GWAS) produced polygenic risk scores (PRS) that work less well for everyone else. The 2024 All of Us paper in Nature set out to fix the sampling problem, and largely did. Its headline figure then reintroduced bias at the level of visualisation. A UMAP projection coloured by self-described race made a genetic continuum look like discrete groups. The same paper also published an admixture plot that told the honest, continuous story, yet the UMAP became the press image.
The conclusion is simple. Fixing who is sampled is necessary, but it is not sufficient. For the Gulf, where Middle Eastern ancestry is barely represented even in All of Us, regional cohorts are a precondition for safe genomic medicine rather than an optional extra.
- About 86% of pre-2024 GWAS participants were of European ancestry. PRS are several-fold less accurate outside that group.
- All of Us (Bick et al., Nature 2024) released about 245,000 genome sequences, about 46% from under-represented racial and ethnic minority groups, and reported more than 275 million previously unreported variants.
- The UMAP-by-race figure drew public criticism from geneticists for implying discrete biological races.
- "Diverse" still meant almost no Middle Eastern representation, so the Emirati and Qatar genome programmes matter for the region.
- Five recommendations follow, covering visualisation standards, bias-and-equity review, and Gulf clinical practice.
- presentation.pdf: the slide deck (ten content slides plus references).
- summary.md: a short written walkthrough of the argument, with the full reference list.
- Bick et al. (2024). Genomic data in the All of Us Research Program. Nature, 627, 340.
- Martin et al. (2019). Clinical use of current polygenic risk scores may exacerbate health disparities. Nature Genetics, 51, 584.
- Sirugo et al. (2019). The missing diversity in human genetic studies. Cell, 177, 26.
The full reference list is in summary.md.
Original text, analysis, and author-made figures are licensed under
CC BY 4.0. Third-party figures and quotations reproduced in
presentation.pdf for commentary remain under their original copyright and are not relicensed (see
LICENSE).
AI tools were used for language refinement and slide organization. The analysis, argument, and final conclusions are the author's own.
Author: Riya Shet. Analysis completed as part of MSc Health Data Science coursework. This repository is a cleaned version of an individual project.