Tagger: Enhance Database Search Tools with De Novo Sequencing Tags

by Qi Tang

Institution: University of Waterloo
Year: 2017
Keywords: de novo sequencing; database search; fast speed; protein identification
Posted: 02/01/2018
Record ID: 2165188
Full text PDF: http://hdl.handle.net/10012/11147


Tandem mass spectrometry (MS/MS) is widely used in proteomics to identify peptides and proteins from a sequence database. In the classic MS/MS protein identification procedure, proteins are first digested into short peptides by enzymes. A tandem mass spectrometer then measures the tandem mass spectra of the peptides. Finally, computer software interprets the spectra to identify the peptide and protein sequences. However, conventional methods become too slow when both the mass spectrometry data set and the sequence database are large. In this paper, we study the possibility of using de novo tag search to improve traditional database search methods and propose a novel software tool named "Tagger". As a tag-based method, it takes the de novo sequencing results from the Novor software as input and performs approximate sequence matching against the sequence database. According to the test results, indexing the de novo sequence tags significantly increases search speed, and search sensitivity improves as well.
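
The general idea of indexed, approximate tag matching can be illustrated with a minimal Python sketch. This is not Tagger's actual algorithm, which the abstract does not describe in detail. It assumes a simple k-mer index over the protein database, a seed-and-verify lookup, and substitution-only mismatches. The function names (`build_kmer_index`, `find_candidates`), the parameters `k` and `max_mismatches`, and the toy sequences are all hypothetical.

```python
from collections import defaultdict


def build_kmer_index(proteins, k=3):
    """Map every length-k substring to the (protein id, offset) pairs where it occurs."""
    index = defaultdict(list)
    for pid, seq in proteins.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((pid, i))
    return index


def find_candidates(peptide, proteins, index, k=3, max_mismatches=1):
    """Return sorted (protein id, start, mismatches) hits for a de novo peptide.

    Each k-mer of the peptide seeds a candidate alignment. The alignment is
    kept if the whole peptide matches the protein at that position with at
    most max_mismatches substitutions (an assumed, simplified error model).
    """
    hits = set()
    for j in range(len(peptide) - k + 1):
        for pid, pos in index.get(peptide[j:j + k], ()):
            start = pos - j  # where the peptide would begin in the protein
            seq = proteins[pid]
            if start < 0 or start + len(peptide) > len(seq):
                continue
            window = seq[start:start + len(peptide)]
            mismatches = sum(a != b for a, b in zip(peptide, window))
            if mismatches <= max_mismatches:
                hits.add((pid, start, mismatches))
    return sorted(hits)


# Toy database; the sequences are made up for illustration.
proteins = {"P1": "MKTAYIAKQRQISFVK", "P2": "GSHMLEDPVAGK"}
index = build_kmer_index(proteins, k=3)

# "AYLAKQR" differs from P1's "AYIAKQR" by one residue (L instead of I).
print(find_candidates("AYLAKQR", proteins, index))
```

The index is built once and reused for every spectrum, so each lookup touches only the protein positions that share a k-mer with the de novo peptide instead of scanning the whole database. The example query swaps leucine for isoleucine, two residues with identical mass that mass spectrometry cannot tell apart, which is one reason exact string matching is not sufficient for de novo results.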