MUMMALS: Multiple sequence alignment improved by using hidden Markov models with local structural information

Jimin Pei, Nick V. Grishin

Research output: Contribution to journal › Article

94 Citations (Scopus)

Abstract

We have developed MUMMALS, a program to construct multiple protein sequence alignment using probabilistic consistency. MUMMALS improves alignment quality by using pairwise alignment hidden Markov models (HMMs) with multiple match states that describe local structural information without exploiting explicit structure predictions. Parameters for such models have been estimated from a large library of structure-based alignments. We show that (i) on remote homologs, MUMMALS achieves statistically best accuracy among several leading aligners, such as ProbCons, MAFFT and MUSCLE, albeit the average improvement is small, in the order of several percent; (ii) a large collection (>10 000) of automatically computed pairwise structure alignments of divergent protein domains is superior to smaller but carefully curated datasets for estimation of alignment parameters and performance tests; (iii) reference-independent evaluation of alignment quality using sequence alignment-dependent structure superpositions correlates well with reference-dependent evaluation that compares sequence-based alignments to structure-based reference alignments.
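The probabilistic consistency idea named in the abstract can be illustrated with a minimal sketch. It shows the generic consistency transformation used by consistency-based aligners, in which the posterior match probability for a pair of sequences is re-estimated by averaging over paths through every sequence in the set. This is not the MUMMALS implementation: the function names, the dictionary layout and the toy posterior matrices are all assumptions made for illustration, and in a real aligner the posteriors would come from the pairwise HMMs.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def identity(n):
    """Identity matrix: a sequence matches itself position by position."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def consistency_transform(post, lengths):
    """Apply one round of the consistency transformation (illustrative only).

    post maps each pair (x, y) with x < y to a matrix of posterior match
    probabilities with lengths[x] rows and lengths[y] columns.
    """
    n = len(lengths)

    def get(x, y):
        # Look up the posterior matrix in either orientation.
        if x == y:
            return identity(lengths[x])
        return post[(x, y)] if x < y else transpose(post[(y, x)])

    relaxed = {}
    for x in range(n):
        for y in range(x + 1, n):
            total = [[0.0] * lengths[y] for _ in range(lengths[x])]
            for z in range(n):
                # Evidence for x_i ~ y_j routed through sequence z.
                via_z = matmul(get(x, z), get(z, y))
                for i in range(lengths[x]):
                    for j in range(lengths[y]):
                        total[i][j] += via_z[i][j]
            relaxed[(x, y)] = [[v / n for v in row] for row in total]
    return relaxed


# Hypothetical toy input: three sequences of length 2. Sequences 0 and 1
# are only weakly aligned to each other, but both align well to sequence 2.
toy_post = {
    (0, 1): [[0.5, 0.0], [0.0, 0.5]],
    (0, 2): [[1.0, 0.0], [0.0, 1.0]],
    (1, 2): [[1.0, 0.0], [0.0, 1.0]],
}
relaxed = consistency_transform(toy_post, [2, 2, 2])
```

In this toy case the diagonal match probability between sequences 0 and 1 rises from 0.5 to 2/3, because the third sequence supports the same pairing.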

Original language: English (US)
Pages (from-to): 4364-4374
Number of pages: 11
Journal: Nucleic Acids Research
Volume: 34
Issue number: 16
DOI: 10.1093/nar/gkl514
State: Published - Sep 2006

Fingerprint

Sequence Alignment
Libraries
Proteins

ASJC Scopus subject areas

  • Genetics

Cite this

MUMMALS: Multiple sequence alignment improved by using hidden Markov models with local structural information. / Pei, Jimin; Grishin, Nick V.

In: Nucleic Acids Research, Vol. 34, No. 16, 09.2006, p. 4364-4374.

@article{0bccf12d546342d399e17ddcb89a17b2,
title = "MUMMALS: Multiple sequence alignment improved by using hidden Markov models with local structural information",
abstract = "We have developed MUMMALS, a program to construct multiple protein sequence alignment using probabilistic consistency. MUMMALS improves alignment quality by using pairwise alignment hidden Markov models (HMMs) with multiple match states that describe local structural information without exploiting explicit structure predictions. Parameters for such models have been estimated from a large library of structure-based alignments. We show that (i) on remote homologs, MUMMALS achieves statistically best accuracy among several leading aligners, such as ProbCons, MAFFT and MUSCLE, albeit the average improvement is small, in the order of several percent; (ii) a large collection (>10 000) of automatically computed pairwise structure alignments of divergent protein domains is superior to smaller but carefully curated datasets for estimation of alignment parameters and performance tests; (iii) reference-independent evaluation of alignment quality using sequence alignment-dependent structure superpositions correlates well with reference-dependent evaluation that compares sequence-based alignments to structure-based reference alignments.",
author = "Pei, Jimin and Grishin, {Nick V.}",
year = "2006",
month = "9",
doi = "10.1093/nar/gkl514",
language = "English (US)",
volume = "34",
pages = "4364--4374",
journal = "Nucleic Acids Research",
issn = "0305-1048",
publisher = "Oxford University Press",
number = "16",
}

TY - JOUR

T1 - MUMMALS: Multiple sequence alignment improved by using hidden Markov models with local structural information

AU - Pei, Jimin

AU - Grishin, Nick V.

PY - 2006/9

Y1 - 2006/9

N2 - We have developed MUMMALS, a program to construct multiple protein sequence alignment using probabilistic consistency. MUMMALS improves alignment quality by using pairwise alignment hidden Markov models (HMMs) with multiple match states that describe local structural information without exploiting explicit structure predictions. Parameters for such models have been estimated from a large library of structure-based alignments. We show that (i) on remote homologs, MUMMALS achieves statistically best accuracy among several leading aligners, such as ProbCons, MAFFT and MUSCLE, albeit the average improvement is small, in the order of several percent; (ii) a large collection (>10 000) of automatically computed pairwise structure alignments of divergent protein domains is superior to smaller but carefully curated datasets for estimation of alignment parameters and performance tests; (iii) reference-independent evaluation of alignment quality using sequence alignment-dependent structure superpositions correlates well with reference-dependent evaluation that compares sequence-based alignments to structure-based reference alignments.

UR - http://www.scopus.com/inward/record.url?scp=33750001271&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33750001271&partnerID=8YFLogxK

U2 - 10.1093/nar/gkl514

DO - 10.1093/nar/gkl514

M3 - Article

C2 - 16936316

AN - SCOPUS:33750001271

VL - 34

SP - 4364

EP - 4374

JO - Nucleic Acids Research

JF - Nucleic Acids Research

SN - 0305-1048

IS - 16

ER -