PROMALS

Towards accurate multiple sequence alignments of distantly related proteins

Jimin Pei, Nick V. Grishin

Research output: Contribution to journal › Article

207 Citations (Scopus)

Abstract

Motivation: Accurate multiple sequence alignments are essential in protein structure modeling, functional prediction and efficient planning of experiments. Although the alignment problem has attracted considerable attention, preparation of high-quality alignments for distantly related sequences remains a difficult task.

Results: We developed PROMALS, a multiple alignment method that shows promising results for protein homologs with sequence identity below 10%, aligning close to half of the amino acid residues correctly on average. This is about three times more accurate than traditional pairwise sequence alignment methods. PROMALS algorithm derives its strength from several sources: (i) sequence database searches to retrieve additional homologs; (ii) accurate secondary structure prediction; (iii) a hidden Markov model that uses a novel combined scoring of amino acids and secondary structures; (iv) probabilistic consistency-based scoring applied to progressive alignment of profiles. Compared to the best alignment methods that do not use secondary structure prediction and database searches (e.g. MUMMALS, ProbCons and MAFFT), PROMALS is up to 30% more accurate, with improvement being most prominent for highly divergent homologs. Compared to SPEM and HHalign, which also employ database searches and secondary structure prediction, PROMALS shows an accuracy improvement of several percent.
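
Point (iv) of the abstract names probabilistic consistency-based scoring without giving details. As a rough illustration only, the Python sketch below implements the generic consistency transformation in the style of ProbCons on plain pairwise posterior matrices: the match probability of two residues is re-estimated by averaging over paths through every sequence in the set. It is not the PROMALS implementation, which applies consistency to profiles; the function names, the dictionary layout and the toy numbers are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two dense matrices stored as lists of rows."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def posterior(post: Dict[Tuple[str, str], Matrix], lengths: Dict[str, int],
              x: str, y: str) -> Matrix:
    """Return the len(x) by len(y) match-probability matrix for a pair.

    A sequence aligned with itself is the identity matrix; a pair stored
    in the opposite order is transposed.
    """
    if x == y:
        return identity(lengths[x])
    if (x, y) in post:
        return post[(x, y)]
    return transpose(post[(y, x)])


def consistency_transform(post: Dict[Tuple[str, str], Matrix],
                          lengths: Dict[str, int]) -> Dict[Tuple[str, str], Matrix]:
    """One round of relaxation: P'(x, y) = (1 / N) * sum over z of P(x, z) P(z, y)."""
    names = sorted(lengths)
    relaxed = {}
    for (x, y) in post:
        total = [[0.0] * lengths[y] for _ in range(lengths[x])]
        for z in names:
            through_z = matmul(posterior(post, lengths, x, z),
                               posterior(post, lengths, z, y))
            for i, row in enumerate(through_z):
                for j, value in enumerate(row):
                    total[i][j] += value
        relaxed[(x, y)] = [[v / len(names) for v in row] for row in total]
    return relaxed


# Toy input (hypothetical): three one-residue sequences. The direct a-b
# match is weak, but both a and b match c strongly.
lengths = {"a": 1, "b": 1, "c": 1}
posteriors = {("a", "b"): [[0.2]], ("a", "c"): [[0.9]], ("b", "c"): [[0.9]]}
relaxed = consistency_transform(posteriors, lengths)
print(round(relaxed[("a", "b")][0][0], 3))  # 0.403
```

In the toy input the a-b match probability rises from 0.2 to about 0.40 because the third sequence supports it, which is the intuition behind using intermediate homologs to align highly divergent pairs.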

Original language: English (US)
Pages (from-to): 802-808
Number of pages: 7
Journal: Bioinformatics
Volume: 23
Issue number: 7
DOI: 10.1093/bioinformatics/btm017
State: Published - Apr 2007

Fingerprint

Multiple Sequence Alignment
Sequence Alignment
Secondary Structure
Alignment
Structure Prediction
Databases
Proteins
Protein
Imino Acids
Amino Acid Sequence Homology
Scoring
Amino Acids
Protein Structure
Markov Model
Percent
Pairwise
Preparation
Hidden Markov models
Planning

ASJC Scopus subject areas

  • Clinical Biochemistry
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Medicine (all)

Cite this

PROMALS: Towards accurate multiple sequence alignments of distantly related proteins. / Pei, Jimin; Grishin, Nick V.

In: Bioinformatics, Vol. 23, No. 7, 04.2007, p. 802-808.

@article{8097de81e9b14777b184ca8a3d471104,
title = "PROMALS: Towards accurate multiple sequence alignments of distantly related proteins",
abstract = "Motivation: Accurate multiple sequence alignments are essential in protein structure modeling, functional prediction and efficient planning of experiments. Although the alignment problem has attracted considerable attention, preparation of high-quality alignments for distantly related sequences remains a difficult task. Results: We developed PROMALS, a multiple alignment method that shows promising results for protein homologs with sequence identity below 10{\%}, aligning close to half of the amino acid residues correctly on average. This is about three times more accurate than traditional pairwise sequence alignment methods. PROMALS algorithm derives its strength from several sources: (i) sequence database searches to retrieve additional homologs; (ii) accurate secondary structure prediction; (iii) a hidden Markov model that uses a novel combined scoring of amino acids and secondary structures; (iv) probabilistic consistency-based scoring applied to progressive alignment of profiles. Compared to the best alignment methods that do not use secondary structure prediction and database searches (e.g. MUMMALS, ProbCons and MAFFT), PROMALS is up to 30{\%} more accurate, with improvement being most prominent for highly divergent homologs. Compared to SPEM and HHalign, which also employ database searches and secondary structure prediction, PROMALS shows an accuracy improvement of several percent.",
author = "Pei, Jimin and Grishin, Nick V.",
year = "2007",
month = "4",
doi = "10.1093/bioinformatics/btm017",
language = "English (US)",
volume = "23",
pages = "802--808",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "7",

}

TY  - JOUR
T1  - PROMALS
T2  - Towards accurate multiple sequence alignments of distantly related proteins
AU  - Pei, Jimin
AU  - Grishin, Nick V.
PY  - 2007/4
Y1  - 2007/4
AB  - Motivation: Accurate multiple sequence alignments are essential in protein structure modeling, functional prediction and efficient planning of experiments. Although the alignment problem has attracted considerable attention, preparation of high-quality alignments for distantly related sequences remains a difficult task. Results: We developed PROMALS, a multiple alignment method that shows promising results for protein homologs with sequence identity below 10%, aligning close to half of the amino acid residues correctly on average. This is about three times more accurate than traditional pairwise sequence alignment methods. PROMALS algorithm derives its strength from several sources: (i) sequence database searches to retrieve additional homologs; (ii) accurate secondary structure prediction; (iii) a hidden Markov model that uses a novel combined scoring of amino acids and secondary structures; (iv) probabilistic consistency-based scoring applied to progressive alignment of profiles. Compared to the best alignment methods that do not use secondary structure prediction and database searches (e.g. MUMMALS, ProbCons and MAFFT), PROMALS is up to 30% more accurate, with improvement being most prominent for highly divergent homologs. Compared to SPEM and HHalign, which also employ database searches and secondary structure prediction, PROMALS shows an accuracy improvement of several percent.
UR  - http://www.scopus.com/inward/record.url?scp=34248532415&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=34248532415&partnerID=8YFLogxK
U2  - 10.1093/bioinformatics/btm017
DO  - 10.1093/bioinformatics/btm017
M3  - Article
VL  - 23
SP  - 802
EP  - 808
JO  - Bioinformatics
JF  - Bioinformatics
SN  - 1367-4803
IS  - 7
ER  -