Integrated approach for manual evaluation of peptides identified by searching protein sequence databases with tandem mass spectra

Yue Chen, Sung Won Kwon, Sung Chan Kim, Yingming Zhao

Research output: Contribution to journal, Article

142 Citations (Scopus)

Abstract

Quantitative proteomics relies on accurate protein identification, which is often carried out by automated searching of a sequence database with tandem mass spectra of peptides. When these spectra contain limited information, automated searches may lead to incorrect peptide identifications. It is therefore necessary to validate the identifications by careful manual inspection of the mass spectra. Not only is this task time-consuming, but the reliability of the validation varies with the experience of the analyst. Here, we report a systematic approach to evaluating peptide identifications made by automated search algorithms. The method is based on two principles: the candidate peptide sequence should adequately explain the observed fragment ions, and the mass errors of neighboring fragments should be similar. To evaluate our method, we studied tandem mass spectra obtained from tryptic digests of E. coli and HeLa cells. Candidate peptides were identified with the automated search engine Mascot and subjected to the manual validation method. The method found correct peptide identifications that were given low Mascot scores (e.g., 20-25) and incorrect peptide identifications that were given high Mascot scores (e.g., 40-50). The method comprehensively detected false results from searches designed to produce incorrect identifications. Comparison of the tandem mass spectra of synthetic candidate peptides to the spectra obtained from the complex peptide mixtures confirmed the accuracy of the evaluation method. Thus, the evaluation approach described here could help boost the accuracy of protein identification, increase the number of peptides identified, and provide a step toward developing a more accurate next-generation algorithm for protein identification.
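
The two principles in the abstract (the candidate sequence should explain the observed fragment ions, and neighboring fragments should show similar mass errors) can be illustrated with a small sketch. This is not the authors' procedure, which is a manual inspection workflow described in the full article; it is a minimal illustration under stated assumptions: unmodified peptides, singly charged b and y ions only, a fixed absolute tolerance in Da, and "neighboring" taken to mean adjacent in m/z. The function names (`fragment_mz`, `match_fragments`, `neighbor_error_jumps`, `explained_fraction`) are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # monoisotopic mass of H2O (Da)


def fragment_mz(peptide):
    """Singly charged b- and y-ion m/z values for an unmodified peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    ions = {}
    for i in range(1, n):
        ions[f"b{i}"] = sum(masses[:i]) + PROTON
        ions[f"y{i}"] = sum(masses[n - i:]) + WATER + PROTON
    return ions


def match_fragments(ions, observed_mz, tol=0.5):
    """Pair each theoretical ion with the closest observed peak within tol (Da).

    Returns (label, theoretical, observed, error) tuples sorted by m/z.
    The default tolerance is an illustrative assumption, not a recommendation.
    """
    if not observed_mz:
        return []
    matches = []
    for label, theo in ions.items():
        closest = min(observed_mz, key=lambda mz: abs(mz - theo))
        error = closest - theo
        if abs(error) <= tol:
            matches.append((label, theo, closest, error))
    return sorted(matches, key=lambda m: m[1])


def neighbor_error_jumps(matches):
    """Absolute change in mass error between m/z-adjacent matched fragments."""
    errors = [m[3] for m in matches]
    return [abs(b - a) for a, b in zip(errors, errors[1:])]


def explained_fraction(observed_mz, matches):
    """Fraction of observed peaks assigned to at least one theoretical ion."""
    if not observed_mz:
        return 0.0
    return len({m[2] for m in matches}) / len(observed_mz)


if __name__ == "__main__":
    # Toy spectrum: every theoretical ion of "GAS" shifted by a constant +0.01 Da.
    ions = fragment_mz("GAS")
    peaks = [mz + 0.01 for mz in ions.values()]
    matches = match_fragments(ions, peaks, tol=0.05)
    print(explained_fraction(peaks, matches), neighbor_error_jumps(matches))
```

In this sketch, a candidate would look suspicious if `explained_fraction` is low (many peaks left unexplained) or if `neighbor_error_jumps` contains large values (errors that scatter rather than drift smoothly). Any thresholds for either quantity would have to come from the instrument and the full article, not from this example.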

Original language: English (US)
Pages (from-to): 998-1005
Number of pages: 8
Journal: Journal of Proteome Research
Volume: 4
Issue number: 3
DOI: 10.1021/pr049754t
State: Published - May 2005

Fingerprint

Protein Databases
Peptides
Proteins
Search Engine
Complex Mixtures
HeLa Cells
Proteomics
Escherichia coli
Inspection
Databases
Ions

Keywords

  • Automated database search
  • Manual evaluation
  • Protein identification

ASJC Scopus subject areas

  • Genetics
  • Biotechnology
  • Biochemistry

Cite this

Integrated approach for manual evaluation of peptides identified by searching protein sequence databases with tandem mass spectra. / Chen, Yue; Kwon, Sung Won; Kim, Sung Chan; Zhao, Yingming.

In: Journal of Proteome Research, Vol. 4, No. 3, 05.2005, p. 998-1005.

@article{87c3f58e4b194f4fa616cb8b8938ab88,
title = "Integrated approach for manual evaluation of peptides identified by searching protein sequence databases with tandem mass spectra",
keywords = "Automated database search, Manual evaluation, Protein identification",
author = "Yue Chen and Kwon, {Sung Won} and Kim, {Sung Chan} and Yingming Zhao",
year = "2005",
month = "5",
doi = "10.1021/pr049754t",
language = "English (US)",
volume = "4",
pages = "998--1005",
journal = "Journal of Proteome Research",
issn = "1535-3893",
publisher = "American Chemical Society",
number = "3",

}

TY - JOUR

T1 - Integrated approach for manual evaluation of peptides identified by searching protein sequence databases with tandem mass spectra

AU - Chen, Yue

AU - Kwon, Sung Won

AU - Kim, Sung Chan

AU - Zhao, Yingming

PY - 2005/5

Y1 - 2005/5

KW - Automated database search

KW - Manual evaluation

KW - Protein identification

UR - http://www.scopus.com/inward/record.url?scp=20844457100&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=20844457100&partnerID=8YFLogxK

U2 - 10.1021/pr049754t

DO - 10.1021/pr049754t

M3 - Article

C2 - 15952748

AN - SCOPUS:20844457100

VL - 4

SP - 998

EP - 1005

JO - Journal of Proteome Research

JF - Journal of Proteome Research

SN - 1535-3893

IS - 3

ER -