AL2CO

Calculation of positional conservation in a protein sequence alignment

Jimin Pei, Nick V. Grishin

Research output: Contribution to journalArticle

269 Citations (Scopus)

Abstract

Motivation: Amino acid sequence alignments are widely used in the analysis of protein structure, function and evolutionary relationships. Proteins within a superfamily usually share the same fold and possess related functions. These structural and functional constraints are reflected in the alignment conservation patterns. Positions of functional and/or structural importance tend to be more conserved. Conserved positions are usually clustered in distinct motifs surrounded by sequence segments of low conservation. Poorly conserved regions might also arise from the imperfections in multiple alignment algorithms and thus indicate possible alignment errors. Quantification of conservation by attributing a conservation index to each aligned position makes motif detection more convenient. Mapping these conservation indices onto a protein spatial structure helps to visualize spatial conservation features of the molecule and to predict functionally and/or structurally important sites. Analysis of conservation indices could be a useful tool in detection of potentially misaligned regions and will aid in improvement of multiple alignments. Results: We developed a program to calculate a conservation index at each position in a multiple sequence alignment using several methods. Namely, amino acid frequencies at each position are estimated and the conservation index is calculated from these frequencies. We utilize both unweighted frequencies and frequencies weighted using two different strategies. Three conceptually different approaches (entropy-based, variance-based and matrix score-based) are implemented in the algorithm to define the conservation index. Calculating conservation indices for 35 522 positions in 284 alignments from SMART database we demonstrate that different methods result in highly correlated (correlation coefficient more than 0.85) conservation indices. Conservation indices show statistically significant correlation between sequentially adjacent positions i and i + j, where j < 13, and averaging of the indices over the window of three positions is optimal for motif detection. Positions with gaps display substantially lower conservation properties. We compare conservation properties of the SMART alignments or FSSP structural alignments to those of the ClustalW alignments. The results suggest that conservation indices should be a valuable tool of alignment quality assessment and might be used as an objective function for refinement of multiple alignments.

Original languageEnglish (US)
Pages (from-to)700-712
Number of pages13
JournalBioinformatics
Volume17
Issue number8
StatePublished - 2001

Fingerprint

Sequence Alignment
Protein Sequence
Conservation
Proteins
Alignment
Entropy
Amino Acid Sequence
Databases
Amino Acids
Protein Structure
Amino acids
Multiple Sequence Alignment
Quality Assessment
Spatial Structure
Imperfections
Structure-function
Correlation coefficient
Quantification

ASJC Scopus subject areas

  • Clinical Biochemistry
  • Computer Science Applications
  • Computational Theory and Mathematics

Cite this

AL2CO : Calculation of positional conservation in a protein sequence alignment. / Pei, Jimin; Grishin, Nick V.

In: Bioinformatics, Vol. 17, No. 8, 2001, p. 700-712.

Research output: Contribution to journalArticle

@article{b09f091b24654f39aab525ddcaebb425,
title = "AL2CO: Calculation of positional conservation in a protein sequence alignment",
abstract = "Motivation: Amino acid sequence alignments are widely used in the analysis of protein structure, function and evolutionary relationships. Proteins within a superfamily usually share the same fold and possess related functions. These structural and functional constraints are reflected in the alignment conservation patterns. Positions of functional and/or structural importance tend to be more conserved. Conserved positions are usually clustered in distinct motifs surrounded by sequence segments of low conservation. Poorly conserved regions might also arise from the imperfections in multiple alignment algorithms and thus indicate possible alignment errors. Quantification of conservation by attributing a conservation index to each aligned position makes motif detection more convenient. Mapping these conservation indices onto a protein spatial structure helps to visualize spatial conservation features of the molecule and to predict functionally and/or structurally important sites. Analysis of conservation indices could be a useful tool in detection of potentially misaligned regions and will aid in improvement of multiple alignments. Results: We developed a program to calculate a conservation index at each position in a multiple sequence alignment using several methods. Namely, amino acid frequencies at each position are estimated and the conservation index is calculated from these frequencies. We utilize both unweighted frequencies and frequencies weighted using two different strategies. Three conceptually different approaches (entropy-based, variance-based and matrix score-based) are implemented in the algorithm to define the conservation index. Calculating conservation indices for 35 522 positions in 284 alignments from SMART database we demonstrate that different methods result in highly correlated (correlation coefficient more than 0.85) conservation indices. Conservation indices show statistically significant correlation between sequentially adjacent positions i and i + j, where j < 13, and averaging of the indices over the window of three positions is optimal for motif detection. Positions with gaps display substantially lower conservation properties. We compare conservation properties of the SMART alignments or FSSP structural alignments to those of the ClustalW alignments. The results suggest that conservation indices should be a valuable tool of alignment quality assessment and might be used as an objective function for refinement of multiple alignments.",
author = "Jimin Pei and Grishin, {Nick V.}",
year = "2001",
language = "English (US)",
volume = "17",
pages = "700--712",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "8",

}

TY - JOUR

T1 - AL2CO

T2 - Calculation of positional conservation in a protein sequence alignment

AU - Pei, Jimin

AU - Grishin, Nick V.

PY - 2001

Y1 - 2001

N2 - Motivation: Amino acid sequence alignments are widely used in the analysis of protein structure, function and evolutionary relationships. Proteins within a superfamily usually share the same fold and possess related functions. These structural and functional constraints are reflected in the alignment conservation patterns. Positions of functional and/or structural importance tend to be more conserved. Conserved positions are usually clustered in distinct motifs surrounded by sequence segments of low conservation. Poorly conserved regions might also arise from the imperfections in multiple alignment algorithms and thus indicate possible alignment errors. Quantification of conservation by attributing a conservation index to each aligned position makes motif detection more convenient. Mapping these conservation indices onto a protein spatial structure helps to visualize spatial conservation features of the molecule and to predict functionally and/or structurally important sites. Analysis of conservation indices could be a useful tool in detection of potentially misaligned regions and will aid in improvement of multiple alignments. Results: We developed a program to calculate a conservation index at each position in a multiple sequence alignment using several methods. Namely, amino acid frequencies at each position are estimated and the conservation index is calculated from these frequencies. We utilize both unweighted frequencies and frequencies weighted using two different strategies. Three conceptually different approaches (entropy-based, variance-based and matrix score-based) are implemented in the algorithm to define the conservation index. Calculating conservation indices for 35 522 positions in 284 alignments from SMART database we demonstrate that different methods result in highly correlated (correlation coefficient more than 0.85) conservation indices. Conservation indices show statistically significant correlation between sequentially adjacent positions i and i + j, where j < 13, and averaging of the indices over the window of three positions is optimal for motif detection. Positions with gaps display substantially lower conservation properties. We compare conservation properties of the SMART alignments or FSSP structural alignments to those of the ClustalW alignments. The results suggest that conservation indices should be a valuable tool of alignment quality assessment and might be used as an objective function for refinement of multiple alignments.

AB - Motivation: Amino acid sequence alignments are widely used in the analysis of protein structure, function and evolutionary relationships. Proteins within a superfamily usually share the same fold and possess related functions. These structural and functional constraints are reflected in the alignment conservation patterns. Positions of functional and/or structural importance tend to be more conserved. Conserved positions are usually clustered in distinct motifs surrounded by sequence segments of low conservation. Poorly conserved regions might also arise from the imperfections in multiple alignment algorithms and thus indicate possible alignment errors. Quantification of conservation by attributing a conservation index to each aligned position makes motif detection more convenient. Mapping these conservation indices onto a protein spatial structure helps to visualize spatial conservation features of the molecule and to predict functionally and/or structurally important sites. Analysis of conservation indices could be a useful tool in detection of potentially misaligned regions and will aid in improvement of multiple alignments. Results: We developed a program to calculate a conservation index at each position in a multiple sequence alignment using several methods. Namely, amino acid frequencies at each position are estimated and the conservation index is calculated from these frequencies. We utilize both unweighted frequencies and frequencies weighted using two different strategies. Three conceptually different approaches (entropy-based, variance-based and matrix score-based) are implemented in the algorithm to define the conservation index. Calculating conservation indices for 35 522 positions in 284 alignments from SMART database we demonstrate that different methods result in highly correlated (correlation coefficient more than 0.85) conservation indices. Conservation indices show statistically significant correlation between sequentially adjacent positions i and i + j, where j < 13, and averaging of the indices over the window of three positions is optimal for motif detection. Positions with gaps display substantially lower conservation properties. We compare conservation properties of the SMART alignments or FSSP structural alignments to those of the ClustalW alignments. The results suggest that conservation indices should be a valuable tool of alignment quality assessment and might be used as an objective function for refinement of multiple alignments.

UR - http://www.scopus.com/inward/record.url?scp=0034843597&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0034843597&partnerID=8YFLogxK

M3 - Article

VL - 17

SP - 700

EP - 712

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 8

ER -