ChSeq: A database of chameleon sequences

Wenlin Li, Lisa N. Kinch, P. Andrew Karplus, Nick V. Grishin

Research output: Contribution to journal (Article)

22 Citations (Scopus)

Abstract

Chameleon sequences (ChSeqs) are identical amino acid sequence strings that adopt different conformations in different protein structures. Researchers have detected and studied ChSeqs to understand the interplay between local and global interactions in protein structure formation. The different secondary structures adopted by a single ChSeq challenge sequence-based secondary structure predictors. Taking advantage of the growing number of available Protein Data Bank structures, we identify here a large set of ChSeqs ranging from 6 to 10 residues in length. The homologous ChSeqs discovered highlight the structural plasticity involved in biological function. Compared with previous studies, the set of unrelated ChSeqs found represents an approximately 20-fold increase in the number of detected sequences, as well as an increase in the longest ChSeq length from 8 to 10 residues. We applied secondary structure predictors to our ChSeqs and found that methods based on a sequence profile outperformed methods based on a single sequence. For the unrelated ChSeqs, the evolutionary information provided by the sequence profile typically allows successful prediction of the prevailing secondary structure adopted in each protein family. Our dataset will facilitate future studies of ChSeqs, as well as interpretations of the interplay between local and nonlocal interactions. A user-friendly web interface for this ChSeq database is available at prodata.swmed.edu/chseq.
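
The definition in the abstract (an identical 6 to 10 residue string observed in different conformations) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name find_chameleon_candidates, the toy chains, and the simplified criterion (a fragment counts only when its per-residue secondary-structure string is uniform, and it is flagged when two or more distinct uniform states are seen) are all assumptions made for illustration. The H/E/C letters stand for helix, strand, and coil.

```python
from collections import defaultdict


def find_chameleon_candidates(chains, min_len=6, max_len=10):
    """Map each fragment to the set of uniform secondary-structure states
    it is observed in, keeping only fragments seen in two or more states.

    `chains` is an iterable of (sequence, secondary_structure) string pairs
    of equal length. Hypothetical helper, not the method used by ChSeq.
    """
    states = defaultdict(set)
    for seq, ss in chains:
        if len(seq) != len(ss):
            raise ValueError("sequence and secondary structure must align")
        for k in range(min_len, max_len + 1):
            for i in range(len(seq) - k + 1):
                frag_ss = ss[i:i + k]
                # Simplifying assumption: only count fragments whose
                # secondary structure is the same letter at every position.
                if len(set(frag_ss)) == 1:
                    states[seq[i:i + k]].add(frag_ss[0])
    return {frag: s for frag, s in states.items() if len(s) > 1}


# Made-up toy chains (not real PDB entries): both contain "AVLKEL",
# once as helix (H) and once as strand (E).
toy_chains = [
    ("MKTAVLKELGAA", "CCHHHHHHHHCC"),
    ("GSAVLKELTTPQ", "CCEEEEEECCCC"),
]

candidates = find_chameleon_candidates(toy_chains)
for fragment, seen_states in candidates.items():
    print(fragment, sorted(seen_states))
```

A real analysis would also need to separate homologous from unrelated chains, as the abstract distinguishes the two sets; this sketch makes no attempt at that.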

Original language: English (US)
Pages (from-to): 1075-1086
Number of pages: 12
Journal: Protein Science
Volume: 24
Issue number: 7
DOI: https://doi.org/10.1002/pro.2689
State: Published - Jul 1 2015

Keywords

  • biological function
  • chameleon sequence
  • ChSeq
  • conformational change
  • secondary structure
  • secondary structure prediction
  • sequence profile
  • structural plasticity

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology

Cite this

ChSeq: A database of chameleon sequences. / Li, Wenlin; Kinch, Lisa N.; Karplus, P. Andrew; Grishin, Nick V.

In: Protein Science, Vol. 24, No. 7, 01.07.2015, p. 1075-1086.

Li, W, Kinch, LN, Karplus, PA & Grishin, NV 2015, 'ChSeq: A database of chameleon sequences', Protein Science, vol. 24, no. 7, pp. 1075-1086. https://doi.org/10.1002/pro.2689
Li, Wenlin; Kinch, Lisa N.; Karplus, P. Andrew; Grishin, Nick V. / ChSeq: A database of chameleon sequences. In: Protein Science. 2015; Vol. 24, No. 7. pp. 1075-1086.
@article{a216e108b6674ec2b36de340e7d78d33,
title = "ChSeq: A database of chameleon sequences",
abstract = "Chameleon sequences (ChSeqs) refer to sequence strings of identical amino acids that can adopt different conformations in protein structures. Researchers have detected and studied ChSeqs to understand the interplay between local and global interactions in protein structure formation. The different secondary structures adopted by one ChSeq challenge sequence-based secondary structure predictors. With increasing numbers of available Protein Data Bank structures, we here identify a large set of ChSeqs ranging from 6 to 10 residues in length. The homologous ChSeqs discovered highlight the structural plasticity involved in biological function. When compared with previous studies, the set of unrelated ChSeqs found represents an about 20-fold increase in the number of detected sequences, as well as an increase in the longest ChSeq length from 8 to 10 residues. We applied secondary structure predictors on our ChSeqs and found that methods based on a sequence profile outperformed methods based on a single sequence. For the unrelated ChSeqs, the evolutionary information provided by the sequence profile typically allows successful prediction of the prevailing secondary structure adopted in each protein family. Our dataset will facilitate future studies of ChSeqs, as well as interpretations of the interplay between local and nonlocal interactions. A user-friendly web interface for this ChSeq database is available at prodata.swmed.edu/chseq.",
keywords = "biological function, chameleon sequence, ChSeq, conformational change, secondary structure, secondary structure prediction, sequence profile, structural plasticity",
author = "Li, Wenlin and Kinch, {Lisa N.} and Karplus, {P. Andrew} and Grishin, {Nick V.}",
year = "2015",
month = "7",
day = "1",
doi = "10.1002/pro.2689",
language = "English (US)",
volume = "24",
pages = "1075--1086",
journal = "Protein Science",
issn = "0961-8368",
publisher = "Wiley",
number = "7"
}

TY - JOUR

T1 - ChSeq: A database of chameleon sequences

AU - Li, Wenlin

AU - Kinch, Lisa N.

AU - Karplus, P. Andrew

AU - Grishin, Nick V.

PY - 2015/7/1

Y1 - 2015/7/1

N2 - Chameleon sequences (ChSeqs) refer to sequence strings of identical amino acids that can adopt different conformations in protein structures. Researchers have detected and studied ChSeqs to understand the interplay between local and global interactions in protein structure formation. The different secondary structures adopted by one ChSeq challenge sequence-based secondary structure predictors. With increasing numbers of available Protein Data Bank structures, we here identify a large set of ChSeqs ranging from 6 to 10 residues in length. The homologous ChSeqs discovered highlight the structural plasticity involved in biological function. When compared with previous studies, the set of unrelated ChSeqs found represents an about 20-fold increase in the number of detected sequences, as well as an increase in the longest ChSeq length from 8 to 10 residues. We applied secondary structure predictors on our ChSeqs and found that methods based on a sequence profile outperformed methods based on a single sequence. For the unrelated ChSeqs, the evolutionary information provided by the sequence profile typically allows successful prediction of the prevailing secondary structure adopted in each protein family. Our dataset will facilitate future studies of ChSeqs, as well as interpretations of the interplay between local and nonlocal interactions. A user-friendly web interface for this ChSeq database is available at prodata.swmed.edu/chseq.

KW - biological function

KW - chameleon sequence

KW - ChSeq

KW - conformational change

KW - secondary structure

KW - secondary structure prediction

KW - sequence profile

KW - structural plasticity

UR - http://www.scopus.com/inward/record.url?scp=84946145560&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84946145560&partnerID=8YFLogxK

U2 - 10.1002/pro.2689

DO - 10.1002/pro.2689

M3 - Article

C2 - 25970262

AN - SCOPUS:84946145560

VL - 24

SP - 1075

EP - 1086

JO - Protein Science

JF - Protein Science

SN - 0961-8368

IS - 7

ER -