Abstract
The Burrows-Wheeler Transform (BWT) [1] is widely used in genomic data analysis. It underlies efficient algorithms for contig-to-reference mapping that tolerate some mismatches, which in turn enables variant discovery. Implemented in many state-of-the-art tools, it has become a gold standard in next-generation sequencing (NGS) analysis pipelines [2,3]. Building on this idea from genomics, we investigated how such an algorithm performs on proteomic datasets. We exploit the same strength of BWT to map both identified and de novo sequenced peptides to the human proteome, including canonical sequences and isoforms. We use public datasets from the PRIDE database as well as our in-house data. To this end, we have implemented our own BWT-based tool that works with amino acid sequences. First results suggest that the number of identifications increases significantly. While many of these identifications are well known and match the positions of variants described in the literature, many others are new and await further, deeper analysis.

1. Burrows M, Wheeler DJ. A block-sorting lossless data compression algorithm. SRC Research Report. 1994;124.
2. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25(14):1754-1760.
3. Li H, Durbin R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics. 2010;26(5):589-595.
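The following minimal sketch illustrates the kind of BWT-based peptide-to-proteome mapping described in the abstract. It is not the authors' tool. It builds an FM-index (the BWT plus occurrence counts) over concatenated protein sequences and uses backward search to find exact peptide occurrences. The class name `ProteomeIndex`, its `find` method, the `#` and `$` separator characters, and the toy sequences are hypothetical illustrations. The suffix array is built naively, so the sketch suits only small inputs. Unlike BWA-style aligners, it performs exact matching only and tolerates no mismatches.

```python
from bisect import bisect_right


class ProteomeIndex:
    """Hypothetical minimal FM-index over protein sequences (exact matching only)."""

    def __init__(self, proteins):
        # proteins: dict mapping a protein or isoform ID to its amino-acid sequence
        self.ids = list(proteins)
        self.starts, parts, offset = [], [], 0
        for pid in self.ids:
            self.starts.append(offset)
            parts.append(proteins[pid])
            offset += len(proteins[pid]) + 1  # +1 for the '#' separator
        # '#' separates entries so no hit spans two proteins; '$' ends the text
        self.text = "#".join(parts) + "$"
        n = len(self.text)
        # Naive suffix array: fine for a sketch, far too slow for a whole proteome
        self.sa = sorted(range(n), key=lambda i: self.text[i:])
        # BWT: the character preceding each sorted suffix (text[-1] == '$' for i == 0)
        self.bwt = "".join(self.text[i - 1] for i in self.sa)
        # C[c]: number of characters in the text that sort before c
        self.C, total = {}, 0
        for c in sorted(set(self.text)):
            self.C[c] = total
            total += self.text.count(c)
        # occ[c][k]: number of occurrences of c in bwt[:k]
        self.occ = {c: [0] * (n + 1) for c in self.C}
        for k, ch in enumerate(self.bwt):
            for c in self.occ:
                self.occ[c][k + 1] = self.occ[c][k] + (ch == c)

    def find(self, peptide):
        """Return sorted (protein_id, 0-based offset) pairs of exact occurrences."""
        if not peptide:
            return []
        lo, hi = 0, len(self.sa)
        # Backward search: extend the match one residue at a time, right to left
        for c in reversed(peptide):
            if c not in self.C:
                return []
            lo = self.C[c] + self.occ[c][lo]
            hi = self.C[c] + self.occ[c][hi]
            if lo >= hi:
                return []
        hits = []
        for pos in self.sa[lo:hi]:
            # Map a position in the concatenated text back to its source entry
            k = bisect_right(self.starts, pos) - 1
            hits.append((self.ids[k], pos - self.starts[k]))
        return sorted(hits)


if __name__ == "__main__":
    # Toy sequences for illustration only; not real UniProt entries
    idx = ProteomeIndex({"P1": "MKTAYIAKQR", "P1-2": "MKTAYIAKQRQISFVK", "P2": "GAVLIPKTAY"})
    print(idx.find("KTAY"))  # [('P1', 1), ('P1-2', 1), ('P2', 6)]
```

Mapping each suffix-array hit back to its entry reports every protein or isoform that shares the peptide, which matters when canonical sequences and isoforms are indexed together. A production index would replace the naive construction with a linear-time suffix-array algorithm and sampled occurrence tables. To handle variant-carrying peptides, it would also extend the backward search to allow a bounded number of mismatches, as BWA does.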

