Grouping of amino acid types and extraction of amino acid properties from multiple sequence alignments using variance maximization

James O. Wrabl, Nick V. Grishin

Research output: Contribution to journal (Article)

9 Citations (Scopus)

Abstract

Understanding of amino acid type co-occurrence in trusted multiple sequence alignments is a prerequisite for improved sequence alignment and remote homology detection algorithms. Two objective approaches were used to investigate co-occurrence, both based on variance maximization of the weighted residue frequencies in columns taken from a large alignment database. The first approach discretely grouped amino acid types, and the second approach extracted orthogonal properties of amino acids using principal components analysis. The grouping results corresponded to amino acid physical properties such as side chain hydrophobicity, size, or backbone flexibility, and an optimal arrangement of approximately eight groups was observed. However, interpretation of the orthogonal properties was more complex. Although the principal components accounting for the largest variances exhibited modest correlations with hydrophobicity and conservation of glycine, in general principal components did not correspond to physical properties of amino acids. Although not intuitive, these amino acid mathematical properties were demonstrated to be robust and to improve local pairwise alignment accuracy, relative to 20 amino acid frequencies alone, for a simple test case.
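As a rough illustration of the grouping idea in the abstract, the sketch below scores a candidate grouping by pooling residue frequencies within each group and summing the across-column variances of those pooled frequencies. It is not the authors' objective function or code: the function names, the toy columns, and the use of unweighted frequencies (the abstract specifies weighted frequencies) are all assumptions made for illustration. The intuition rests on Var(a + b) = Var(a) + Var(b) + 2 Cov(a, b), so pooling types that co-occur in the same columns raises the total variance.

```python
from collections import Counter
from statistics import pvariance

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(column):
    """Unweighted residue frequencies for one alignment column (gaps ignored).

    Assumption: the paper uses weighted frequencies; plain counts are used
    here only to keep the sketch short.
    """
    residues = [r for r in column if r in AMINO_ACIDS]
    if not residues:
        raise ValueError("column contains no amino acid residues")
    counts = Counter(residues)
    return {aa: counts[aa] / len(residues) for aa in AMINO_ACIDS}


def grouped_variance(columns, groups):
    """Sum over groups of the across-column variance of the pooled frequency.

    Hypothetical scoring function; amino acid types that appear in no group
    are simply ignored.
    """
    freqs = [column_frequencies(col) for col in columns]
    total = 0.0
    for group in groups:
        pooled = [sum(f[aa] for aa in group) for f in freqs]
        total += pvariance(pooled)
    return total


# Made-up toy columns: two hydrophobic, two acidic, two basic.
columns = ["IIVL", "LVIV", "DEDE", "EDEE", "KRKR", "RKKR"]

by_property = ["ILV", "DE", "KR"]  # groups follow side chain chemistry
mixed = ["IDK", "LER", "V"]        # groups cut across side chain chemistry

score_by_property = grouped_variance(columns, by_property)
score_mixed = grouped_variance(columns, mixed)
```

On this toy input the chemistry-based grouping scores higher than the mixed one, which is the behavior a variance-maximizing search over groupings would exploit. A real analysis would need sequence weighting and a large set of trusted alignment columns.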

Original language: English (US)
Pages (from-to): 523-534
Number of pages: 12
Journal: Proteins: Structure, Function and Bioinformatics
Volume: 61
Issue number: 3
DOI: 10.1002/prot.20648
State: Published - Nov 15 2005

Fingerprint

Sequence Alignment
Amino Acids
Hydrophobicity
Hydrophobic and Hydrophilic Interactions
Physical properties
Principal Component Analysis
Glycine
Conservation
Databases

Keywords

  • Amino acid physical properties
  • Amino acid similarity
  • Principal components analysis

ASJC Scopus subject areas

  • Genetics
  • Structural Biology
  • Biochemistry

Cite this

@article{34e09420194b4697ba7f93a7858f650d,
title = "Grouping of amino acid types and extraction of amino acid properties from multiple sequence alignments using variance maximization",
keywords = "Amino acid physical properties, Amino acid similarity, Principal components analysis",
author = "Wrabl, {James O.} and Grishin, {Nick V.}",
year = "2005",
month = "11",
day = "15",
doi = "10.1002/prot.20648",
language = "English (US)",
volume = "61",
pages = "523--534",
journal = "Proteins: Structure, Function and Bioinformatics",
issn = "0887-3585",
publisher = "Wiley-Liss Inc.",
number = "3"
}

TY - JOUR

T1 - Grouping of amino acid types and extraction of amino acid properties from multiple sequence alignments using variance maximization

AU - Wrabl, James O.

AU - Grishin, Nick V.

PY - 2005/11/15

Y1 - 2005/11/15

AB - Understanding of amino acid type co-occurrence in trusted multiple sequence alignments is a prerequisite for improved sequence alignment and remote homology detection algorithms. Two objective approaches were used to investigate co-occurrence, both based on variance maximization of the weighted residue frequencies in columns taken from a large alignment database. The first approach discretely grouped amino acid types, and the second approach extracted orthogonal properties of amino acids using principal components analysis. The grouping results corresponded to amino acid physical properties such as side chain hydrophobicity, size, or backbone flexibility, and an optimal arrangement of approximately eight groups was observed. However, interpretation of the orthogonal properties was more complex. Although the principal components accounting for the largest variances exhibited modest correlations with hydrophobicity and conservation of glycine, in general principal components did not correspond to physical properties of amino acids. Although not intuitive, these amino acid mathematical properties were demonstrated to be robust and to improve local pairwise alignment accuracy, relative to 20 amino acid frequencies alone, for a simple test case.

KW - Amino acid physical properties

KW - Amino acid similarity

KW - Principal components analysis

UR - http://www.scopus.com/inward/record.url?scp=27544433454&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=27544433454&partnerID=8YFLogxK

U2 - 10.1002/prot.20648

DO - 10.1002/prot.20648

M3 - Article

C2 - 16184599

AN - SCOPUS:27544433454

VL - 61

SP - 523

EP - 534

JO - Proteins: Structure, Function and Bioinformatics

JF - Proteins: Structure, Function and Bioinformatics

SN - 0887-3585

IS - 3

ER -