Author(s)

  • Ina Devos | COMPOST Collective | Rodestraat 14, 2000, Antwerp, Belgium
  • Daan Kenis | COMPOST Collective | Rodestraat 14, 2000, Antwerp, Belgium
  • Frédérique Vilenne | Data Science Institute | Agoralaan Gebouw D, 3590, Diepenbeek, Belgium
  • Kurt Boonen | Cell Death Signaling | Universiteitsplein 1, 2610, Wilrijk, Belgium
  • Gökhan Ertaylan | VITO Health | Industrial park Vlasmeer 5, 2400, Mol, Belgium
  • Kristien Hens | COMPOST Collective | Rodestraat 14, 2000, Antwerp, Belgium
  • Dirk Valkenborg | Data Science Institute | Agoralaan Gebouw D, 3590, Diepenbeek, Belgium
  • Charlotte Adams (Presenting Author) | Adrem Data Lab | Middelheimlaan 1, 2020, Antwerp, Belgium

Abstract

Health equity requires that individuals from diverse populations are considered equally in clinical proteomics research. One aspect of this is ensuring cohort diversity. However, because human reference proteomes are generally not representative across populations, the choice of reference database may introduce additional bias. Such reference bias is both a scientific and an ethical issue: it lowers accuracy for all populations and may unfairly disadvantage underrepresented populations. Currently, little is known about the extent of these issues. Here, in a collaboration between scientific and ethics researchers, we investigate how database search performance differs across populations when using UniProt and alternative reference proteomes.

We reprocessed a dataset of primary tumor samples from patients assigned to five ethnic groups: Alaskan or Native American, Black or African American, Asian, Latin or Hispanic, and White. To enable a more effective comparison, we searched each sample against five geography-based population reference proteomes and one ‘panproteome’ from ProHap, all based on the 1000 Genomes Project.

We show that the default use of the UniProt reference proteome may need to be questioned. By outlining and evaluating alternative options, both ethically and scientifically, we suggest that database choice depends on context and purpose. As such, we hope to inspire discussion on the responsible and effective use of reference proteomes.