PROMALS: Towards accurate multiple sequence alignments of distantly related proteins

Jimin Pei; Nick V. Grishin

doi:10.1093/bioinformatics/btm017


Research output: Contribution to journal › Article › peer-review

266 Scopus citations

Abstract

Motivation: Accurate multiple sequence alignments are essential in protein structure modeling, functional prediction and efficient planning of experiments. Although the alignment problem has attracted considerable attention, preparation of high-quality alignments for distantly related sequences remains a difficult task. Results: We developed PROMALS, a multiple alignment method that shows promising results for protein homologs with sequence identity below 10%, aligning close to half of the amino acid residues correctly on average. This is about three times more accurate than traditional pairwise sequence alignment methods. PROMALS algorithm derives its strength from several sources: (i) sequence database searches to retrieve additional homologs; (ii) accurate secondary structure prediction; (iii) a hidden Markov model that uses a novel combined scoring of amino acids and secondary structures; (iv) probabilistic consistency-based scoring applied to progressive alignment of profiles. Compared to the best alignment methods that do not use secondary structure prediction and database searches (e.g. MUMMALS, ProbCons and MAFFT), PROMALS is up to 30% more accurate, with improvement being most prominent for highly divergent homologs. Compared to SPEM and HHalign, which also employ database searches and secondary structure prediction, PROMALS shows an accuracy improvement of several percent.
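Component (iv), probabilistic consistency-based scoring, can be illustrated with a ProbCons-style consistency transformation. In this scheme, the posterior probability that residue x_i aligns to residue y_j is re-estimated by passing through every sequence z in the set: P'(x, y) = (1/N) Σ_z P(x, z) · P(z, y), where P(x, x) is taken as the identity matrix. The sketch below is a minimal illustration under stated assumptions. It assumes the pairwise posterior matrices have already been computed (in PROMALS these would come from its HMM). It performs a single unweighted round, and its function names are hypothetical. It is not PROMALS's actual implementation.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense matrix product a @ b for small list-of-lists matrices."""
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def consistency_transform(
    lengths: List[int], post: Dict[Tuple[int, int], Matrix]
) -> Dict[Tuple[int, int], Matrix]:
    """One round of ProbCons-style probabilistic consistency (hypothetical helper).

    post[(x, y)] holds P(x_i ~ y_j) as a len(x) by len(y) matrix, stored for x < y.
    Returns P'(x, y) = 1/N * sum over all z (including x and y) of P(x, z) @ P(z, y).
    """
    n = len(lengths)

    def get(x: int, y: int) -> Matrix:
        # P(x, x) is the identity; P(y, x) is the transpose of P(x, y).
        if x == y:
            return identity(lengths[x])
        if (x, y) in post:
            return post[(x, y)]
        return transpose(post[(y, x)])

    new_post: Dict[Tuple[int, int], Matrix] = {}
    for (x, y) in post:
        acc = [[0.0] * lengths[y] for _ in range(lengths[x])]
        for z in range(n):
            prod = matmul(get(x, z), get(z, y))
            for i in range(lengths[x]):
                for j in range(lengths[y]):
                    acc[i][j] += prod[i][j]
        # Average over all N intermediate sequences.
        new_post[(x, y)] = [[v / n for v in row] for row in acc]
    return new_post
```

For example, with three length-2 sequences where sequences 1 and 2 have no direct match signal (P(1, 2) all zeros) but both align well to sequence 0 (diagonal posteriors of 0.9 and 0.8), the transformed P'(1, 2) gains a diagonal of about 0.24 through the intermediate sequence. This is the kind of transitive evidence that helps when two homologs are too divergent to align directly.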

Original language	English (US)
Pages (from-to)	802-808
Number of pages	7
Journal	Bioinformatics
Volume	23
Issue number	7
DOIs	https://doi.org/10.1093/bioinformatics/btm017
State	Published - Apr 2007

ASJC Scopus subject areas

Statistics and Probability
Biochemistry
Molecular Biology
Computer Science Applications
Computational Theory and Mathematics
Computational Mathematics


Cite this

@article{8097de81e9b14777b184ca8a3d471104,

title = "PROMALS: Towards accurate multiple sequence alignments of distantly related proteins",

abstract = "Motivation: Accurate multiple sequence alignments are essential in protein structure modeling, functional prediction and efficient planning of experiments. Although the alignment problem has attracted considerable attention, preparation of high-quality alignments for distantly related sequences remains a difficult task. Results: We developed PROMALS, a multiple alignment method that shows promising results for protein homologs with sequence identity below 10%, aligning close to half of the amino acid residues correctly on average. This is about three times more accurate than traditional pairwise sequence alignment methods. PROMALS algorithm derives its strength from several sources: (i) sequence database searches to retrieve additional homologs; (ii) accurate secondary structure prediction; (iii) a hidden Markov model that uses a novel combined scoring of amino acids and secondary structures; (iv) probabilistic consistency-based scoring applied to progressive alignment of profiles. Compared to the best alignment methods that do not use secondary structure prediction and database searches (e.g. MUMMALS, ProbCons and MAFFT), PROMALS is up to 30% more accurate, with improvement being most prominent for highly divergent homologs. Compared to SPEM and HHalign, which also employ database searches and secondary structure prediction, PROMALS shows an accuracy improvement of several percent.",

author = "Pei, Jimin and Grishin, {Nick V.}",

note = "Funding Information: This work was supported in part by NIH grant GM67165 to NVG.",

year = "2007",

month = apr,

doi = "10.1093/bioinformatics/btm017",

language = "English (US)",

volume = "23",

pages = "802--808",

journal = "Bioinformatics",

issn = "1367-4803",

publisher = "Oxford University Press",

number = "7",

}

TY  - JOUR
T1  - PROMALS: Towards accurate multiple sequence alignments of distantly related proteins
AU  - Pei, Jimin
AU  - Grishin, Nick V.
N1  - Funding Information: This work was supported in part by NIH grant GM67165 to NVG.
PY  - 2007/4
Y1  - 2007/4
N2  - Motivation: Accurate multiple sequence alignments are essential in protein structure modeling, functional prediction and efficient planning of experiments. Although the alignment problem has attracted considerable attention, preparation of high-quality alignments for distantly related sequences remains a difficult task. Results: We developed PROMALS, a multiple alignment method that shows promising results for protein homologs with sequence identity below 10%, aligning close to half of the amino acid residues correctly on average. This is about three times more accurate than traditional pairwise sequence alignment methods. PROMALS algorithm derives its strength from several sources: (i) sequence database searches to retrieve additional homologs; (ii) accurate secondary structure prediction; (iii) a hidden Markov model that uses a novel combined scoring of amino acids and secondary structures; (iv) probabilistic consistency-based scoring applied to progressive alignment of profiles. Compared to the best alignment methods that do not use secondary structure prediction and database searches (e.g. MUMMALS, ProbCons and MAFFT), PROMALS is up to 30% more accurate, with improvement being most prominent for highly divergent homologs. Compared to SPEM and HHalign, which also employ database searches and secondary structure prediction, PROMALS shows an accuracy improvement of several percent.
AB  - Motivation: Accurate multiple sequence alignments are essential in protein structure modeling, functional prediction and efficient planning of experiments. Although the alignment problem has attracted considerable attention, preparation of high-quality alignments for distantly related sequences remains a difficult task. Results: We developed PROMALS, a multiple alignment method that shows promising results for protein homologs with sequence identity below 10%, aligning close to half of the amino acid residues correctly on average. This is about three times more accurate than traditional pairwise sequence alignment methods. PROMALS algorithm derives its strength from several sources: (i) sequence database searches to retrieve additional homologs; (ii) accurate secondary structure prediction; (iii) a hidden Markov model that uses a novel combined scoring of amino acids and secondary structures; (iv) probabilistic consistency-based scoring applied to progressive alignment of profiles. Compared to the best alignment methods that do not use secondary structure prediction and database searches (e.g. MUMMALS, ProbCons and MAFFT), PROMALS is up to 30% more accurate, with improvement being most prominent for highly divergent homologs. Compared to SPEM and HHalign, which also employ database searches and secondary structure prediction, PROMALS shows an accuracy improvement of several percent.
UR  - http://www.scopus.com/inward/record.url?scp=34248532415&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=34248532415&partnerID=8YFLogxK
U2  - 10.1093/bioinformatics/btm017
DO  - 10.1093/bioinformatics/btm017
M3  - Article
C2  - 17267437
AN  - SCOPUS:34248532415
SN  - 1367-4803
VL  - 23
SP  - 802
EP  - 808
JO  - Bioinformatics
JF  - Bioinformatics
IS  - 7
ER  - 
