Ian Engels (Presenting Author) | ProGenTomics, Laboratory of Pharmaceutical Biotechnology | Ottergemsesteenweg 460, 9000, Gent, Belgium
Abstract
LC-MS/MS extends the MALDI-TOF ZooMS ‘mass fingerprinting’ approach to species identification by providing fragmentation spectra for each peptide. However, ancient bone samples generate sparse data containing only a few collagen proteins, which renders target-decoy strategies unusable and increases uncertainty in peptide annotation. To address this, we present a ZooMS/MS data pipeline that builds on a manually curated collagen database and comprises two novel algorithms: isoBLAST and ClassiCOL. isoBLAST first extends peptide ambiguity by generating all ‘potential peptide candidates’ isobaric to the annotated precursor. ClassiCOL then uses this exhaustive set of candidates to retain or reject potential paths at each taxonomic branching point, from superkingdom down to species, until the greatest possible specificity is reached. Uniquely, ClassiCOL allows the identification of taxonomic mixtures, including contaminated samples, and can suggest taxonomies not represented in sequence databases, including extinct taxa. All ambiguity considered is then represented graphically, with clear prioritization of the potential taxa in the sample. Using public as well as in-house data acquired on different instruments, we demonstrate the performance of this universal postprocessing pipeline and explore the identification of both genetic and sample mixtures. Diet reconstruction from 40,000-year-old cave hyena coprolites illustrates the potential of this approach.
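
To make the notion of peptide candidates isobaric to an annotated precursor concrete, the following is a minimal Python sketch. It is not the isoBLAST implementation: the function names, the small substitution table, and the mass tolerance are illustrative assumptions, and only a few well-known isobaric residue swaps are covered (L/I, N versus GG, Q versus AG or GA).

```python
from itertools import product

# Monoisotopic residue masses in Da for a handful of amino acids.
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "P": 97.05276,
    "L": 113.08406,
    "I": 113.08406,
    "N": 114.04293,
    "Q": 128.05858,
    "R": 156.10111,
}
WATER = 18.01056  # mass of H2O added to the residue sum of a peptide

# Hypothetical substitution table (an assumption for this sketch):
# each residue maps to the residue strings that share its mass.
ISOBARIC = {
    "L": ["L", "I"],
    "I": ["I", "L"],
    "N": ["N", "GG"],
    "Q": ["Q", "AG", "GA"],
}


def peptide_mass(seq):
    """Return the monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[r] for r in seq) + WATER


def isobaric_candidates(seq, tol=0.001):
    """List sequences whose mass matches that of seq within tol Da.

    Only the per-residue swaps in ISOBARIC are tried; a real tool
    would also need to consider modifications and larger rearrangements.
    """
    target = peptide_mass(seq)
    options = [ISOBARIC.get(r, [r]) for r in seq]
    candidates = {"".join(parts) for parts in product(*options)}
    return sorted(c for c in candidates if abs(peptide_mass(c) - target) <= tol)


if __name__ == "__main__":
    for candidate in isobaric_candidates("GLN"):
        print(candidate, round(peptide_mass(candidate), 4))
```

Even this toy table turns a three-residue peptide into four candidates, which hints at why an exhaustive candidate set grows quickly and why a downstream step is needed to weigh the candidates against the taxonomy.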