riyashet-hds/genomic-data-bias

Bias in Genomic Data: When "Diverse" Is Not Enough

Ancestry bias in GWAS and polygenic risk scores, examined through the 2024 All of Us controversy, with implications for the Gulf.

An eight-minute critical talk on health-data equity. It argues that bias in genomics has two layers, not one: who gets sampled, and how their data is shown.

The argument

Decades of European-dominated genome-wide association studies (GWAS) produced polygenic risk scores (PRS) that work less well for everyone else. The 2024 All of Us paper in Nature set out to fix the sampling problem, and largely did. Its headline figure then reintroduced bias at the level of visualisation. A UMAP projection coloured by self-described race made a genetic continuum look like discrete groups. The same paper also published an admixture plot that told the honest, continuous story, yet the UMAP became the press image.
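The visualisation point can be illustrated with a toy sketch. It is not a reproduction of the paper's figure and does not use UMAP. It assumes ancestry can be reduced to one continuous admixture fraction, and the two-way labelling rule is hypothetical. It only shows that colouring by a categorical label produces two colours even when the underlying values have no gap between them.

```python
import random

def simulate_fractions(n, seed=0):
    """Draw n hypothetical admixture fractions uniformly from [0, 1], sorted."""
    rng = random.Random(seed)
    return sorted(rng.random() for _ in range(n))

def label(fraction):
    """Hypothetical two-way category, standing in for a discrete colour key."""
    return "group A" if fraction < 0.5 else "group B"

def largest_gap(sorted_values):
    """Largest distance between neighbouring values.

    A genuine break between clusters would show up as a large gap.
    """
    return max(b - a for a, b in zip(sorted_values, sorted_values[1:]))

fractions = simulate_fractions(1000)
labels = {label(f) for f in fractions}

print(len(labels), "colours in the legend")
print(f"largest gap between neighbours: {largest_gap(fractions):.3f}")
```

The legend shows two categories, yet no neighbouring pair of individuals is far apart. The discreteness comes from the colour key, not from the simulated data.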

The conclusion is simple. Fixing who is sampled is necessary, but it is not sufficient. For the Gulf, where Middle Eastern ancestry is barely represented even in All of Us, regional cohorts are a precondition for safe genomic medicine rather than an optional extra.

Key points

  • About 86% of pre-2024 GWAS participants were of European ancestry. PRS are several-fold less accurate outside that group.
  • All of Us (Bick et al., Nature 2024) sequenced about 245,000 genomes, about 46% non-European, and identified more than 275 million previously unreported variants.
  • The UMAP-by-race figure drew public criticism from geneticists for implying discrete biological races.
  • "Diverse" still meant almost no Middle Eastern representation, so the Emirati and Qatar genome programmes matter for the region.
  • Five recommendations follow, covering visualisation standards, bias-and-equity review, and Gulf clinical practice.
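For readers new to polygenic risk scores, here is a minimal sketch assuming the common additive form: the score is the sum, over variants, of a GWAS effect size multiplied by the number of risk alleles a person carries. The variant IDs, weights, and genotype below are made up for illustration. Because the weights are estimated in the GWAS cohort, a cohort that is about 86% European yields weights tuned to that group.

```python
def polygenic_score(weights, dosages):
    """Additive score: sum of effect size x risk-allele count (0, 1 or 2).

    Variants missing from the person's genotype contribute nothing.
    """
    return sum(beta * dosages.get(variant, 0) for variant, beta in weights.items())

# Hypothetical per-variant effect sizes, as a GWAS would estimate them.
gwas_weights = {"rsA": 0.12, "rsB": -0.05, "rsC": 0.30}

# Hypothetical genotype: number of risk alleles carried at each variant.
person = {"rsA": 2, "rsB": 1, "rsC": 0}

score = polygenic_score(gwas_weights, person)
print(round(score, 2))
```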

In this repository

  • presentation.pdf: the slide deck (ten content slides plus references).
  • summary.md: a short written walkthrough of the argument, with the full reference list.

Selected sources

  • Bick et al. (2024). Genomic data in the All of Us Research Program. Nature, 627, 340.
  • Martin et al. (2019). Clinical use of current polygenic risk scores may exacerbate health disparities. Nature Genetics, 51, 584.
  • Sirugo et al. (2019). The missing diversity in human genetic studies. Cell, 177, 26.

The full reference list is in summary.md.

Licensing

Original text, analysis, and author-made figures are licensed under CC BY 4.0. Third-party figures and quotations reproduced in presentation.pdf for commentary remain under their original copyright and are not relicensed (see LICENSE).

AI assistance

AI tools were used for language refinement and slide organisation. The analysis, argument, and final conclusions are the author's own.


Author: Riya Shet. Analysis completed as part of MSc Health Data Science coursework. This repository is a cleaned version of an individual project.
