TY - JOUR
T1 - Sensitive sequence comparison as protein function predictor.
AU - Pawłowski, K.
AU - Jaroszewski, L.
AU - Rychlewski, L.
AU - Godzik, A.
N1 - Copyright:
This record is sourced from MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine
PY - 2000
Y1 - 2000
N2 - Protein function assignments based on postulated homology, as recognized by high sequence similarity, are used routinely in genome analysis. Sequence comparison algorithms have become sensitive enough that proteins with previously undetectable sequence similarity, for instance only 10-15% identical residues, can sometimes be classified as similar. What is the relation between such proteins? Is it possible that they are homologous? What is the practical significance of detecting such similarities? A simplified analysis of the relation between sequence similarity and function similarity is presented here for the well-characterized proteins from the E. coli genome. Using a simple measure of functional similarity based on the EC classification of enzymes, it is shown that this measure correlates well with sequence similarity, measured by the statistical significance of the alignment score. Proteins that are similar by this standard, even at low sequence identity, are much more likely to have similar functions than randomly chosen protein pairs. Interesting exceptions to these rules are discussed.
UR - http://www.scopus.com/inward/record.url?scp=0033657259&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0033657259&partnerID=8YFLogxK
M3 - Article
C2 - 10902155
AN - SCOPUS:0033657259
SP - 42
EP - 53
JO - Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing
JF - Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing
SN - 2335-6936
ER -