Estimates of statistical significance for comparison of individual positions in multiple sequence alignments

Ruslan I. Sadreyev, Nick V. Grishin

Research output: Contribution to journal › Article

9 Citations (Scopus)

Abstract

Background: Profile-based analysis of multiple sequence alignments (MSA) allows for accurate comparison of protein families. Here, we address the problems of detecting statistically confident dissimilarities between (1) MSA position and a set of predicted residue frequencies, and (2) between two MSA positions. These problems are important for (i) evaluation and optimization of methods predicting residue occurrence at protein positions; (ii) detection of potentially misaligned regions in automatically produced alignments and their further refinement; and (iii) detection of sites that determine functional or structural specificity in two related families.

Results: For problems (1) and (2), we propose analytical estimates of P-value and apply them to the detection of significant positional dissimilarities in various experimental situations. (a) We compare structure-based predictions of residue propensities at a protein position to the actual residue frequencies in the MSA of homologs. (b) We evaluate our method by the ability to detect erroneous position matches produced by an automatic sequence aligner. (c) We compare MSA positions that correspond to residues aligned by automatic structure aligners. (d) We compare MSA positions that are aligned by high-quality manual superposition of structures. Detected dissimilarities reveal shortcomings of the automatic methods for residue frequency prediction and alignment construction. For the high-quality structural alignments, the dissimilarities suggest sites of potential functional or structural importance.

Conclusion: The proposed computational method is of significant potential value for the analysis of protein families.
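As a rough illustration of problem (1), the sketch below tests whether the residues observed at one MSA position are compatible with a set of predicted residue frequencies. It is not the analytical P-value estimate proposed in the article, whose formulas are not given in this record; it substitutes a generic Pearson chi-square statistic with a Monte Carlo P-value. The function names, the three-letter toy alphabet, and the example columns are all hypothetical, and the sketch assumes unweighted, independent sequences.

```python
import random
from collections import Counter


def chi_square_stat(counts, freqs):
    """Pearson chi-square statistic: observed residue counts vs. predicted frequencies."""
    n = sum(counts.values())
    return sum(
        (counts.get(res, 0) - n * f) ** 2 / (n * f)
        for res, f in freqs.items()
        if f > 0
    )


def monte_carlo_p_value(column, freqs, n_sim=2000, seed=0):
    """Estimate P(statistic >= observed) when residues are drawn from `freqs`.

    Assumes every residue in `column` has a nonzero predicted frequency.
    """
    if any(freqs.get(res, 0) <= 0 for res in column):
        raise ValueError("column contains a residue with zero predicted frequency")
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    residues = list(freqs)
    weights = [freqs[res] for res in residues]
    observed = chi_square_stat(Counter(column), freqs)
    hits = 0
    for _ in range(n_sim):
        sample = rng.choices(residues, weights=weights, k=len(column))
        if chi_square_stat(Counter(sample), freqs) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_sim + 1)


# Hypothetical predicted frequencies over a toy three-residue alphabet.
predicted = {"A": 0.5, "L": 0.3, "G": 0.2}
p_match = monte_carlo_p_value("AAAAALLLGG", predicted)      # column agrees with prediction
p_mismatch = monte_carlo_p_value("GGGGGGGGGG", predicted)   # column disagrees strongly
```

A small P-value for a column flags it as significantly dissimilar from the prediction. Real alignments contain related sequences, so a practical version would need sequence weighting or an effective sequence count, which this sketch omits.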

Original language: English (US)
Article number: 106
Journal: BMC Bioinformatics
Volume: 5
DOI: 10.1186/1471-2105-5-106
State: Published - Aug 5 2004

Fingerprint

Multiple Sequence Alignment
Sequence Alignment
Statistical Significance
Dissimilarity
Estimate
Protein
Alignment
Proteins
Automatic Sequences
Sequence Homology
Prediction
Computational Methods
Superposition
Specificity
Refinement
Optimization
Evaluate
Evaluation
Family

ASJC Scopus subject areas

  • Medicine (all)
  • Structural Biology
  • Applied Mathematics

Cite this

Estimates of statistical significance for comparison of individual positions in multiple sequence alignments. / Sadreyev, Ruslan I.; Grishin, Nick V.

In: BMC Bioinformatics, Vol. 5, 106, 05.08.2004.

@article{d47de1efd2f14792a46b2ab62f3ffa13,
title = "Estimates of statistical significance for comparison of individual positions in multiple sequence alignments",
abstract = "Background: Profile-based analysis of multiple sequence alignments (MSA) allows for accurate comparison of protein families. Here, we address the problems of detecting statistically confident dissimilarities between (1) MSA position and a set of predicted residue frequencies, and (2) between two MSA positions. These problems are important for (i) evaluation and optimization of methods predicting residue occurrence at protein positions; (ii) detection of potentially misaligned regions in automatically produced alignments and their further refinement; and (iii) detection of sites that determine functional or structural specificity in two related families. Results: For problems (1) and (2), we propose analytical estimates of P-value and apply them to the detection of significant positional dissimilarities in various experimental situations. (a) We compare structure-based predictions of residue propensities at a protein position to the actual residue frequencies in the MSA of homologs. (b) We evaluate our method by the ability to detect erroneous position matches produced by an automatic sequence aligner. (c) We compare MSA positions that correspond to residues aligned by automatic structure aligners. (d) We compare MSA positions that are aligned by high-quality manual superposition of structures. Detected dissimilarities reveal shortcomings of the automatic methods for residue frequency prediction and alignment construction. For the high-quality structural alignments, the dissimilarities suggest sites of potential functional or structural importance. Conclusion: The proposed computational method is of significant potential value for the analysis of protein families.",
author = "Sadreyev, {Ruslan I.} and Grishin, {Nick V.}",
year = "2004",
month = "8",
day = "5",
doi = "10.1186/1471-2105-5-106",
language = "English (US)",
volume = "5",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",

}

TY - JOUR

T1 - Estimates of statistical significance for comparison of individual positions in multiple sequence alignments

AU - Sadreyev, Ruslan I.

AU - Grishin, Nick V.

PY - 2004/8/5

Y1 - 2004/8/5

AB - Background: Profile-based analysis of multiple sequence alignments (MSA) allows for accurate comparison of protein families. Here, we address the problems of detecting statistically confident dissimilarities between (1) MSA position and a set of predicted residue frequencies, and (2) between two MSA positions. These problems are important for (i) evaluation and optimization of methods predicting residue occurrence at protein positions; (ii) detection of potentially misaligned regions in automatically produced alignments and their further refinement; and (iii) detection of sites that determine functional or structural specificity in two related families. Results: For problems (1) and (2), we propose analytical estimates of P-value and apply them to the detection of significant positional dissimilarities in various experimental situations. (a) We compare structure-based predictions of residue propensities at a protein position to the actual residue frequencies in the MSA of homologs. (b) We evaluate our method by the ability to detect erroneous position matches produced by an automatic sequence aligner. (c) We compare MSA positions that correspond to residues aligned by automatic structure aligners. (d) We compare MSA positions that are aligned by high-quality manual superposition of structures. Detected dissimilarities reveal shortcomings of the automatic methods for residue frequency prediction and alignment construction. For the high-quality structural alignments, the dissimilarities suggest sites of potential functional or structural importance. Conclusion: The proposed computational method is of significant potential value for the analysis of protein families.

UR - http://www.scopus.com/inward/record.url?scp=13244259259&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=13244259259&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-5-106

DO - 10.1186/1471-2105-5-106

M3 - Article

C2 - 15296518

AN - SCOPUS:13244259259

VL - 5

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

M1 - 106

ER -