Quality of alignment comparison by COMPASS improves with inclusion of diverse confident homologs

Ruslan I. Sadreyev, Nick V. Grishin

Research output: Contribution to journal › Article

24 Citations (Scopus)

Abstract

Motivation: Adding more distant homologs to a multiple alignment and thus increasing its diversity may eventually deteriorate the numerical profile constructed from this alignment. Here, we addressed the question whether such a diversity limit can be reached in the alignments of confident homologs found by PSI-BLAST, and we analyzed the dependence of the quality of the profile-profile comparison made by COMPASS on the sequence diversity within these alignments.

Results: Protein families that have a greater number of diverse confident homologs in the current sequence databases provide an increased quality of similarity detection in profile databases, but produce on average less accurate profile-profile alignments with their remote relatives. This lower alignment accuracy cannot be improved when the most distant members of these families are excluded from their profiles. On the contrary, the presence of more diverse members results in more accurate alignments. For families with a high diversity of confident homologs, the lower quality of profile alignments with their remote relatives seems to be an attribute of these families or their alignments, rather than to be caused by the large number of diverse sequences itself. Our results suggest that at any level of profile diversity, one should include in the multiple alignment as many confident sequence homologs as possible in order to produce the most accurate results.

Original language: English (US)
Pages (from-to): 818-828
Number of pages: 11
Journal: Bioinformatics
Volume: 20
Issue number: 6
DOI: 10.1093/bioinformatics/btg485
State: Published - Apr 12 2004

Fingerprint

Alignment
Inclusion
Databases
Sequence Homology
Proteins
Profile
Attribute
Protein
Family

ASJC Scopus subject areas

  • Clinical Biochemistry
  • Computer Science Applications
  • Computational Theory and Mathematics

Cite this

Quality of alignment comparison by COMPASS improves with inclusion of diverse confident homologs. / Sadreyev, Ruslan I.; Grishin, Nick V.

In: Bioinformatics, Vol. 20, No. 6, 12.04.2004, p. 818-828.

@article{7a9e6125f0fd46749be1fd88bf82f555,
title = "Quality of alignment comparison by COMPASS improves with inclusion of diverse confident homologs",
abstract = "Motivation: Adding more distant homologs to a multiple alignment and thus increasing its diversity may eventually deteriorate the numerical profile constructed from this alignment. Here, we addressed the question whether such a diversity limit can be reached in the alignments of confident homologs found by PSI-BLAST, and we analyzed the dependence of the quality of the profile-profile comparison made by COMPASS on the sequence diversity within these alignments. Results: Protein families that have a greater number of diverse confident homologs in the current sequence databases provide an increased quality of similarity detection in profile databases, but produce on average less accurate profile-profile alignments with their remote relatives. This lower alignment accuracy cannot be improved when the most distant members of these families are excluded from their profiles. On the contrary, the presence of more diverse members results in more accurate alignments. For families with a high diversity of confident homologs, the lower quality of profile alignments with their remote relatives seems to be an attribute of these families or their alignments, rather than to be caused by the large number of diverse sequences itself. Our results suggest that at any level of profile diversity, one should include in the multiple alignment as many confident sequence homologs as possible in order to produce the most accurate results.",
author = "Sadreyev, {Ruslan I.} and Grishin, {Nick V.}",
year = "2004",
month = "4",
day = "12",
doi = "10.1093/bioinformatics/btg485",
language = "English (US)",
volume = "20",
pages = "818--828",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "6",
}

TY - JOUR

T1 - Quality of alignment comparison by COMPASS improves with inclusion of diverse confident homologs

AU - Sadreyev, Ruslan I.

AU - Grishin, Nick V.

PY - 2004/4/12

Y1 - 2004/4/12

AB - Motivation: Adding more distant homologs to a multiple alignment and thus increasing its diversity may eventually deteriorate the numerical profile constructed from this alignment. Here, we addressed the question whether such a diversity limit can be reached in the alignments of confident homologs found by PSI-BLAST, and we analyzed the dependence of the quality of the profile-profile comparison made by COMPASS on the sequence diversity within these alignments. Results: Protein families that have a greater number of diverse confident homologs in the current sequence databases provide an increased quality of similarity detection in profile databases, but produce on average less accurate profile-profile alignments with their remote relatives. This lower alignment accuracy cannot be improved when the most distant members of these families are excluded from their profiles. On the contrary, the presence of more diverse members results in more accurate alignments. For families with a high diversity of confident homologs, the lower quality of profile alignments with their remote relatives seems to be an attribute of these families or their alignments, rather than to be caused by the large number of diverse sequences itself. Our results suggest that at any level of profile diversity, one should include in the multiple alignment as many confident sequence homologs as possible in order to produce the most accurate results.

UR - http://www.scopus.com/inward/record.url?scp=2342423130&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=2342423130&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btg485

DO - 10.1093/bioinformatics/btg485

M3 - Article

C2 - 14751996

AN - SCOPUS:2342423130

VL - 20

SP - 818

EP - 828

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 6

ER -