Probabilistic scoring measures for profile-profile comparison yield more accurate short seed alignments

David Mittelman, Ruslan Sadreyev, Nick Grishin

Research output: Contribution to journal (Article)

64 Citations (Scopus)

Abstract

Motivation: The development of powerful automatic methods for the comparison of protein sequences has become increasingly important. Profile-to-profile comparisons allow for the use of broader information about protein families, resulting in more sensitive and accurate comparisons of distantly related sequences. A key part in the comparison of two profiles is the method for the calculation of scores for the position matches. A number of methods based on various theoretical considerations have been proposed. We implemented several previously reported scoring functions as well as our own functions, and compared them on the basis of their ability to produce accurate short ungapped alignments of a given length.

Results: Our results suggest that the family of the probabilistic methods (log-odds based methods and prof_sim) may be the more appropriate choice for the generation of initial 'seeds' as the first step to produce local profile-profile alignments. The most effective scoring systems were the closely related modifications of functions previously implemented in the COMPASS and Picasso methods.
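
As an illustration of the task described in the abstract (scoring position matches between two profiles and picking the best short ungapped alignment of a given length), the following is a minimal sketch only. It is not any of the scoring functions evaluated in the paper, nor the COMPASS, Picasso, or prof_sim implementations. The names `column_score` and `best_ungapped_seed`, the symmetrized log-odds-style formula, the pseudocount `eps`, and the two-letter toy alphabet are all assumptions made for this example.

```python
import math


def column_score(p, q, background, eps=1e-9):
    """Hypothetical symmetrized log-odds-style score for two profile columns.

    p, q: residue frequency vectors for one position in each profile.
    background: background residue frequencies (same length, all > 0).
    eps: small constant (an assumption) to avoid log(0).
    """
    score = 0.0
    for pi, qi, bi in zip(p, q, background):
        score += pi * math.log((qi + eps) / bi)
        score += qi * math.log((pi + eps) / bi)
    return score


def best_ungapped_seed(profile_a, profile_b, background, length):
    """Exhaustively find the best ungapped window of a fixed length.

    Returns (score, start_a, start_b), or None if either profile is
    shorter than the requested length.
    """
    best = None
    for i in range(len(profile_a) - length + 1):
        for j in range(len(profile_b) - length + 1):
            total = sum(
                column_score(profile_a[i + k], profile_b[j + k], background)
                for k in range(length)
            )
            if best is None or total > best[0]:
                best = (total, i, j)
    return best


# Toy two-letter alphabet; real protein profiles would have 20 frequencies
# per column.
background = [0.5, 0.5]
profile_a = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
profile_b = [[0.5, 0.5], [0.9, 0.1], [0.1, 0.9]]

seed = best_ungapped_seed(profile_a, profile_b, background, length=2)
print(seed)
```

In this toy input the first two columns of `profile_a` match the last two columns of `profile_b`, so the seed starts at position 0 in the first profile and position 1 in the second. The exhaustive double loop is chosen for clarity rather than speed.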

Original language: English (US)
Pages (from-to): 1531-1539
Number of pages: 9
Journal: Bioinformatics
Volume: 19
Issue number: 12
DOI: 10.1093/bioinformatics/btg185
State: Published - Aug 12 2003

Fingerprint

Scoring
Seed
Seeds
Alignment
Proteins
Odds
Probabilistic Methods
Protein Sequence
Profile
Protein
Family

ASJC Scopus subject areas

  • Clinical Biochemistry
  • Computer Science Applications
  • Computational Theory and Mathematics

Cite this

Probabilistic scoring measures for profile-profile comparison yield more accurate short seed alignments. / Mittelman, David; Sadreyev, Ruslan; Grishin, Nick.

In: Bioinformatics, Vol. 19, No. 12, 12.08.2003, p. 1531-1539.

@article{c1e9663fd6bb4d66ae74c0cf7bf5919b,
title = "Probabilistic scoring measures for profile-profile comparison yield more accurate short seed alignments",
abstract = "Motivation: The development of powerful automatic methods for the comparison of protein sequences has become increasingly important. Profile-to-profile comparisons allow for the use of broader information about protein families, resulting in more sensitive and accurate comparisons of distantly related sequences. A key part in the comparison of two profiles is the method for the calculation of scores for the position matches. A number of methods based on various theoretical considerations have been proposed. We implemented several previously reported scoring functions as well as our own functions, and compared them on the basis of their ability to produce accurate short ungapped alignments of a given length. Results: Our results suggest that the family of the probabilistic methods (log-odds based methods and prof_sim) may be the more appropriate choice for the generation of initial 'seeds' as the first step to produce local profile-profile alignments. The most effective scoring systems were the closely related modifications of functions previously implemented in the COMPASS and Picasso methods.",
author = "David Mittelman and Ruslan Sadreyev and Nick Grishin",
year = "2003",
month = "8",
day = "12",
doi = "10.1093/bioinformatics/btg185",
language = "English (US)",
volume = "19",
pages = "1531--1539",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "12",

}

TY - JOUR

T1 - Probabilistic scoring measures for profile-profile comparison yield more accurate short seed alignments

AU - Mittelman, David

AU - Sadreyev, Ruslan

AU - Grishin, Nick

PY - 2003/8/12

Y1 - 2003/8/12

N2 - Motivation: The development of powerful automatic methods for the comparison of protein sequences has become increasingly important. Profile-to-profile comparisons allow for the use of broader information about protein families, resulting in more sensitive and accurate comparisons of distantly related sequences. A key part in the comparison of two profiles is the method for the calculation of scores for the position matches. A number of methods based on various theoretical considerations have been proposed. We implemented several previously reported scoring functions as well as our own functions, and compared them on the basis of their ability to produce accurate short ungapped alignments of a given length. Results: Our results suggest that the family of the probabilistic methods (log-odds based methods and prof_sim) may be the more appropriate choice for the generation of initial 'seeds' as the first step to produce local profile-profile alignments. The most effective scoring systems were the closely related modifications of functions previously implemented in the COMPASS and Picasso methods.

UR - http://www.scopus.com/inward/record.url?scp=0041886960&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0041886960&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btg185

DO - 10.1093/bioinformatics/btg185

M3 - Article

C2 - 12912834

AN - SCOPUS:0041886960

VL - 19

SP - 1531

EP - 1539

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 12

ER -