Generation of a consensus protein domain dictionary

R. Dustin Schaeffer, Amanda L. Jonsson, Andrew M. Simms, Valerie Daggett

Research output: Contribution to journal › Article

24 Citations (Scopus)

Abstract

Motivation: The discovery of new protein folds is a relatively rare occurrence even as the rate of protein structure determination increases. This rarity reinforces the concept of folds as reusable units of structure and function shared by diverse proteins. If the folding mechanism of proteins is largely determined by their topology, then the folding pathways of members of existing folds could encompass the full set used by globular protein domains. Results: We have used recent versions of three common protein domain dictionaries (SCOP, CATH and Dali) to generate a consensus domain dictionary (CDD). Surprisingly, 40% of the metafolds in the CDD are not composed of autonomous structural domains, i.e. they are not plausible independent folding units. This finding has serious ramifications for bioinformatics studies mining these domain dictionaries for globular protein properties. However, our main purpose in deriving this CDD was to generate an updated CDD to choose targets for MD simulation as part of our dynameomics effort, which aims to simulate the native and unfolding pathways of representatives of all globular protein consensus folds (metafolds). Consequently, we also compiled a list of representative protein targets of each metafold in the CDD.

Original language: English (US)
Article number: btq625
Pages (from-to): 46-54
Number of pages: 9
Journal: Bioinformatics
Volume: 27
Issue number: 1
DOI: https://doi.org/10.1093/bioinformatics/btq625
State: Published - Jan 1 2011

Fingerprint

Glossaries
Proteins
Protein
Fold
Folding
Pathway
Protein Folding
MD Simulation
Dictionary
Protein Domains
Computational Biology
Target
Unit
Ramification
Protein Structure
Bioinformatics
Unfolding
Mining
Choose
Topology

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Computational Theory and Mathematics
  • Computer Science Applications
  • Computational Mathematics
  • Statistics and Probability
  • Medicine(all)

Cite this

Generation of a consensus protein domain dictionary. / Schaeffer, R. Dustin; Jonsson, Amanda L.; Simms, Andrew M.; Daggett, Valerie.

In: Bioinformatics, Vol. 27, No. 1, btq625, 01.01.2011, p. 46-54.

Schaeffer, RD, Jonsson, AL, Simms, AM & Daggett, V 2011, 'Generation of a consensus protein domain dictionary', Bioinformatics, vol. 27, no. 1, btq625, pp. 46-54. https://doi.org/10.1093/bioinformatics/btq625
Schaeffer, R. Dustin ; Jonsson, Amanda L. ; Simms, Andrew M. ; Daggett, Valerie. / Generation of a consensus protein domain dictionary. In: Bioinformatics. 2011 ; Vol. 27, No. 1. pp. 46-54.
@article{a2f21214a80f407da69917b3777cbfd4,
title = "Generation of a consensus protein domain dictionary",
abstract = "Motivation: The discovery of new protein folds is a relatively rare occurrence even as the rate of protein structure determination increases. This rarity reinforces the concept of folds as reusable units of structure and function shared by diverse proteins. If the folding mechanism of proteins is largely determined by their topology, then the folding pathways of members of existing folds could encompass the full set used by globular protein domains. Results: We have used recent versions of three common protein domain dictionaries (SCOP, CATH and Dali) to generate a consensus domain dictionary (CDD). Surprisingly, 40{\%} of the metafolds in the CDD are not composed of autonomous structural domains, i.e. they are not plausible independent folding units. This finding has serious ramifications for bioinformatics studies mining these domain dictionaries for globular protein properties. However, our main purpose in deriving this CDD was to generate an updated CDD to choose targets for MD simulation as part of our dynameomics effort, which aims to simulate the native and unfolding pathways of representatives of all globular protein consensus folds (metafolds). Consequently, we also compiled a list of representative protein targets of each metafold in the CDD.",
author = "Schaeffer, {R. Dustin} and Jonsson, {Amanda L.} and Simms, {Andrew M.} and Daggett, Valerie",
year = "2011",
month = "1",
day = "1",
doi = "10.1093/bioinformatics/btq625",
language = "English (US)",
volume = "27",
pages = "46--54",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "1",
}

TY  - JOUR
T1  - Generation of a consensus protein domain dictionary
AU  - Schaeffer, R. Dustin
AU  - Jonsson, Amanda L.
AU  - Simms, Andrew M.
AU  - Daggett, Valerie
PY  - 2011/1/1
Y1  - 2011/1/1
N2  - Motivation: The discovery of new protein folds is a relatively rare occurrence even as the rate of protein structure determination increases. This rarity reinforces the concept of folds as reusable units of structure and function shared by diverse proteins. If the folding mechanism of proteins is largely determined by their topology, then the folding pathways of members of existing folds could encompass the full set used by globular protein domains. Results: We have used recent versions of three common protein domain dictionaries (SCOP, CATH and Dali) to generate a consensus domain dictionary (CDD). Surprisingly, 40% of the metafolds in the CDD are not composed of autonomous structural domains, i.e. they are not plausible independent folding units. This finding has serious ramifications for bioinformatics studies mining these domain dictionaries for globular protein properties. However, our main purpose in deriving this CDD was to generate an updated CDD to choose targets for MD simulation as part of our dynameomics effort, which aims to simulate the native and unfolding pathways of representatives of all globular protein consensus folds (metafolds). Consequently, we also compiled a list of representative protein targets of each metafold in the CDD.
UR  - http://www.scopus.com/inward/record.url?scp=78650524534&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=78650524534&partnerID=8YFLogxK
U2  - 10.1093/bioinformatics/btq625
DO  - 10.1093/bioinformatics/btq625
M3  - Article
C2  - 21068000
AN  - SCOPUS:78650524534
VL  - 27
SP  - 46
EP  - 54
JO  - Bioinformatics
JF  - Bioinformatics
SN  - 1367-4803
IS  - 1
M1  - btq625
ER  -