Back to Basics: Spectrum and Peptide Sequence are Sufficient for Top-tier Mass Spectrometry Proteomics Identification

Author(s)

  • Maximilien Burq (Presenting Author) | Tesorai | Rue des Noyers, 41, 2000, Neuchâtel, Switzerland
  • Dejan Stepec | Tesorai | Rue des Noyers, 41, 2000, Neuchâtel, Switzerland
  • Peter Cimermancic | Tesorai | Rue des Noyers, 41, 2000, Neuchâtel, Switzerland

Abstract

Traditional mass spectrometry search engines use simple barcode matching for peptide identification. Newer tools improve scoring with dozens of features, such as predicted fragment ion intensities and retention times, combined by machine learning classifiers trained on the fly. This on-the-fly training yields inconsistent and sometimes inaccurate results across runs.
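To make the "barcode matching" idea concrete, here is a minimal Python sketch, not the method of any particular search engine. It assumes singly charged b and y fragment ions, unmodified residues, monoisotopic masses, and a fixed absolute m/z tolerance. The names `fragment_mzs`, `matched_fraction` and `tol` are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left

# Monoisotopic residue masses in daltons (standard values, unmodified residues).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def fragment_mzs(peptide):
    """Return sorted m/z values of singly charged b and y ions (assumed ion types)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions = []
    prefix = 0.0
    for m in masses[:-1]:            # b ions: N-terminal prefixes
        prefix += m
        ions.append(prefix + PROTON)
    suffix = 0.0
    for m in reversed(masses[1:]):   # y ions: C-terminal suffixes plus water
        suffix += m
        ions.append(suffix + WATER + PROTON)
    return sorted(ions)


def matched_fraction(peptide, observed_mzs, tol=0.02):
    """Fraction of theoretical fragments with an observed peak within +/- tol."""
    peaks = sorted(observed_mzs)
    theoretical = fragment_mzs(peptide)
    if not theoretical:
        return 0.0
    hits = 0
    for mz in theoretical:
        i = bisect_left(peaks, mz - tol)   # first peak at or above the lower bound
        if i < len(peaks) and peaks[i] <= mz + tol:
            hits += 1
    return hits / len(theoretical)
```

A score of this kind uses only peak positions and ignores intensities, which is one reason later tools added further features on top of it.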

We propose that the limitations of traditional methods arise from reliance on scoring functions that fail to capture complex, non-linear associations between spectra and peptide sequences. We introduce an end-to-end deep-learning model that ingests the complete spectrum and peptide sequence and outputs a precise match likelihood. We avoid decoy-based classifier training, demonstrating that the full spectrum and peptide sequence are sufficient for accurate peptide identification.

The resulting search engine is free for academics and fast (250 immunopeptidomics searches in 45 minutes). Averaged across a wide array of use cases, it identifies significantly more peptides than FragPipe, PEAKS or Proteome Discoverer (12%, 9% and 21% more, respectively). Importantly, it works even on sample types never seen during training (TMT, PTMs or data from TOF instruments).