Using homology relations within a database markedly boosts protein sequence similarity search

Jing Tong; Ruslan I. Sadreyev; Jimin Pei; Lisa N. Kinch; Nick V. Grishin

doi:10.1073/pnas.1424324112

Research output: Contribution to journal › Article › peer-review

7 Scopus citations

Abstract

Inference of homology from protein sequences provides an essential tool for analyzing protein structure, function, and evolution. Current sequence-based homology search methods are still unable to detect many similarities evident from protein spatial structures. In computer science, a search engine can be improved by considering networks of known relationships within the search database. Here, we apply this idea to protein-sequence-based homology search and show that it dramatically enhances the search accuracy. Our new method, COMPADRE (COmparison of Multiple Protein sequence Alignments using Database RElationships), assesses the relationship between the query sequence and a hit in the database by considering the similarity between the query and the hit's known homologs. This approach increases detection quality, boosting the precision rate from 18% to 83% at half-coverage of all database homologs. The increased precision rate allows detection of a large fraction of protein structural relationships, thus providing structure and function predictions for previously uncharacterized proteins. Our results suggest that this general approach is applicable to a wide variety of methods for detection of biological similarities. The web server is available at prodata.swmed.edu/compadre.
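
The general idea in the abstract, scoring a hit by also looking at how the query scores against that hit's known homologs, can be illustrated with a small toy sketch. This is not the COMPADRE scoring scheme, which the abstract does not specify. The blending rule, the `weight` parameter, the function names, and all scores and relations below are assumptions made up for illustration. The second function shows one plausible reading of "precision at half-coverage": the fraction of true homologs among the top-ranked hits at the point where half of all true homologs have been recovered.

```python
import math
from statistics import mean


def network_rescore(direct, homologs, weight=0.5):
    """Blend each hit's direct score with the mean direct score of its
    known database homologs (a hypothetical rule, not COMPADRE's)."""
    rescored = {}
    for hit, score in direct.items():
        neighbor_scores = [direct[n] for n in homologs.get(hit, ()) if n in direct]
        if neighbor_scores:
            rescored[hit] = (1 - weight) * score + weight * mean(neighbor_scores)
        else:
            # No known relatives in the database: keep the direct score.
            rescored[hit] = score
    return rescored


def precision_at_coverage(scores, true_homologs, coverage=0.5):
    """Precision of the ranked hit list at the rank where `coverage`
    of the true homologs has been recovered."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    needed = max(1, math.ceil(coverage * len(true_homologs)))
    found = 0
    for rank, hit in enumerate(ranked, start=1):
        if hit in true_homologs:
            found += 1
            if found >= needed:
                return found / rank
    return 0.0


# Toy direct query-vs-database scores (invented numbers).
direct = {"A": 9.0, "X": 6.0, "Y": 5.0, "B": 4.0,
          "C": 3.0, "D": 1.0, "Z": 0.5, "W": 0.2}

# Toy known homology relations within the database (invented).
homologs = {
    "A": ["B", "C", "D"], "B": ["A", "C", "D"],
    "C": ["A", "B", "D"], "D": ["A", "B", "C"],
    "X": ["Z"], "Z": ["X"], "Y": ["W"], "W": ["Y"],
}

# Entries that are truly homologous to the query in this toy setup.
truth = {"A", "B", "C", "D"}

rescored = network_rescore(direct, homologs)
print(precision_at_coverage(direct, truth))    # 0.5
print(precision_at_coverage(rescored, truth))  # 1.0
```

In this toy data, the spurious hits X and Y score well on their own but their database relatives score poorly against the query, so they drop in the ranking once neighbor scores are blended in, while the weak true hit B is pulled up by its strong relative A.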

Original language	English (US)
Pages (from-to)	7003-7008
Number of pages	6
Journal	Proceedings of the National Academy of Sciences of the United States of America
Volume	112
Issue number	22
DOIs	https://doi.org/10.1073/pnas.1424324112
State	Published - Jun 2 2015

Keywords

Homology detection
Homology network
Protein modeling
Remote sequence similarity search
Similarity score

ASJC Scopus subject areas

General

Cite this

@article{1f9f0393e5594c6bb71c6c5dcc25fef7,
  title = "Using homology relations within a database markedly boosts protein sequence similarity search",
  abstract = "Inference of homology from protein sequences provides an essential tool for analyzing protein structure, function, and evolution. Current sequence-based homology search methods are still unable to detect many similarities evident from protein spatial structures. In computer science, a search engine can be improved by considering networks of known relationships within the search database. Here, we apply this idea to protein-sequence-based homology search and show that it dramatically enhances the search accuracy. Our new method, COMPADRE (COmparison of Multiple Protein sequence Alignments using Database RElationships), assesses the relationship between the query sequence and a hit in the database by considering the similarity between the query and the hit's known homologs. This approach increases detection quality, boosting the precision rate from 18\% to 83\% at half-coverage of all database homologs. The increased precision rate allows detection of a large fraction of protein structural relationships, thus providing structure and function predictions for previously uncharacterized proteins. Our results suggest that this general approach is applicable to a wide variety of methods for detection of biological similarities. The web server is available at prodata.swmed.edu/compadre.",
  keywords = "Homology detection, Homology network, Protein modeling, Remote sequence similarity search, Similarity score",
  author = "Tong, Jing and Sadreyev, Ruslan I. and Pei, Jimin and Kinch, Lisa N. and Grishin, Nick V.",
  year = "2015",
  month = jun,
  day = "2",
  doi = "10.1073/pnas.1424324112",
  language = "English (US)",
  volume = "112",
  pages = "7003--7008",
  journal = "Proceedings of the National Academy of Sciences of the United States of America",
  issn = "0027-8424",
  publisher = "National Academy of Sciences",
  number = "22",
}

TY  - JOUR
T1  - Using homology relations within a database markedly boosts protein sequence similarity search
AU  - Tong, Jing
AU  - Sadreyev, Ruslan I.
AU  - Pei, Jimin
AU  - Kinch, Lisa N.
AU  - Grishin, Nick V.
PY  - 2015/6/2
Y1  - 2015/6/2
AB  - Inference of homology from protein sequences provides an essential tool for analyzing protein structure, function, and evolution. Current sequence-based homology search methods are still unable to detect many similarities evident from protein spatial structures. In computer science, a search engine can be improved by considering networks of known relationships within the search database. Here, we apply this idea to protein-sequence-based homology search and show that it dramatically enhances the search accuracy. Our new method, COMPADRE (COmparison of Multiple Protein sequence Alignments using Database RElationships), assesses the relationship between the query sequence and a hit in the database by considering the similarity between the query and the hit's known homologs. This approach increases detection quality, boosting the precision rate from 18% to 83% at half-coverage of all database homologs. The increased precision rate allows detection of a large fraction of protein structural relationships, thus providing structure and function predictions for previously uncharacterized proteins. Our results suggest that this general approach is applicable to a wide variety of methods for detection of biological similarities. The web server is available at prodata.swmed.edu/compadre.
KW  - Homology detection
KW  - Homology network
KW  - Protein modeling
KW  - Remote sequence similarity search
KW  - Similarity score
UR  - http://www.scopus.com/inward/record.url?scp=84930959685&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84930959685&partnerID=8YFLogxK
U2  - 10.1073/pnas.1424324112
DO  - 10.1073/pnas.1424324112
M3  - Article
C2  - 26038555
AN  - SCOPUS:84930959685
SN  - 0027-8424
VL  - 112
SP  - 7003
EP  - 7008
JO  - Proceedings of the National Academy of Sciences of the United States of America
JF  - Proceedings of the National Academy of Sciences of the United States of America
IS  - 22
ER  -
