ChSeq: A database of chameleon sequences

Wenlin Li; Lisa N. Kinch; P. Andrew Karplus; Nick V. Grishin

doi:10.1002/pro.2689

Research output: Contribution to journal (Article, peer-reviewed)

44 Scopus citations

Abstract

Chameleon sequences (ChSeqs) refer to sequence strings of identical amino acids that can adopt different conformations in protein structures. Researchers have detected and studied ChSeqs to understand the interplay between local and global interactions in protein structure formation. The different secondary structures adopted by one ChSeq challenge sequence-based secondary structure predictors. With increasing numbers of available Protein Data Bank structures, we here identify a large set of ChSeqs ranging from 6 to 10 residues in length. The homologous ChSeqs discovered highlight the structural plasticity involved in biological function. When compared with previous studies, the set of unrelated ChSeqs found represents an about 20-fold increase in the number of detected sequences, as well as an increase in the longest ChSeq length from 8 to 10 residues. We applied secondary structure predictors on our ChSeqs and found that methods based on a sequence profile outperformed methods based on a single sequence. For the unrelated ChSeqs, the evolutionary information provided by the sequence profile typically allows successful prediction of the prevailing secondary structure adopted in each protein family. Our dataset will facilitate future studies of ChSeqs, as well as interpretations of the interplay between local and nonlocal interactions. A user-friendly web interface for this ChSeq database is available at prodata.swmed.edu/chseq.
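The ChSeq definition in the abstract can be illustrated with a short sketch. This is not the authors' detection pipeline. It assumes each structure has already been reduced to an amino-acid string plus a per-residue three-state secondary structure string (H = helix, E = strand, C = anything else), and it treats an identical 6 to 10 residue string as a chameleon when it is entirely helical in one occurrence and entirely strand in another. That helix-versus-strand criterion, the function names find_chameleons and maximal_chameleons, and the toy sequences are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (structure id, amino-acid sequence, 3-state secondary structure string).
# ASSUMPTION: H = helix, E = strand, C = other; not necessarily the paper's scheme.
Entry = Tuple[str, str, str]


def find_chameleons(entries: List[Entry], k_min: int = 6,
                    k_max: int = 10) -> Dict[str, Set[str]]:
    """Map each identical k-mer (k_min..k_max residues) that is all-helix in at
    least one occurrence and all-strand in another to the structure ids involved.

    ASSUMPTION: the helix-vs-strand criterion is illustrative only.
    """
    # k-mer -> every (structure id, secondary structure segment) where it occurs
    occurrences: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for struct_id, seq, ss in entries:
        if len(seq) != len(ss):
            raise ValueError(f"{struct_id}: sequence and SS string lengths differ")
        for k in range(k_min, k_max + 1):
            for i in range(len(seq) - k + 1):
                occurrences[seq[i:i + k]].append((struct_id, ss[i:i + k]))

    chameleons: Dict[str, Set[str]] = {}
    for kmer, occ in occurrences.items():
        helix_ids = {sid for sid, seg in occ if seg == "H" * len(kmer)}
        strand_ids = {sid for sid, seg in occ if seg == "E" * len(kmer)}
        if helix_ids and strand_ids:  # same string, two different conformations
            chameleons[kmer] = helix_ids | strand_ids
    return chameleons


def maximal_chameleons(chameleons: Dict[str, Set[str]]) -> Set[str]:
    """Keep only chameleons that are not substrings of a longer chameleon."""
    return {k for k in chameleons
            if not any(k != other and k in other for other in chameleons)}


# Hypothetical toy input, not real PDB data: positions 1-8 share the string
# KVLAAGLE, assigned as helix in toyA and as strand in toyB.
toy_entries: List[Entry] = [
    ("toyA", "MKVLAAGLEEKT", "CHHHHHHHHCCC"),
    ("toyB", "PKVLAAGLEDRS", "CEEEEEEEECCC"),
]
result = find_chameleons(toy_entries)
print(sorted(maximal_chameleons(result)))  # ['KVLAAGLE']
```

Because every 6- and 7-residue substring of an 8-residue chameleon is also reported, maximal_chameleons keeps only the longest non-nested strings. A realistic search would additionally need to separate homologous from unrelated occurrences, a distinction the abstract draws explicitly.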

Original language	English (US)
Pages (from-to)	1075-1086
Number of pages	12
Journal	Protein Science
Volume	24
Issue number	7
DOIs	https://doi.org/10.1002/pro.2689
State	Published - Jul 1 2015

Keywords

ChSeq
biological function
chameleon sequence
conformational change
secondary structure
secondary structure prediction
sequence profile
structural plasticity

ASJC Scopus subject areas

Biochemistry
Molecular Biology

Cite this

@article{a216e108b6674ec2b36de340e7d78d33,

title = "{ChSeq}: A database of chameleon sequences",

abstract = "Chameleon sequences (ChSeqs) refer to sequence strings of identical amino acids that can adopt different conformations in protein structures. Researchers have detected and studied ChSeqs to understand the interplay between local and global interactions in protein structure formation. The different secondary structures adopted by one ChSeq challenge sequence-based secondary structure predictors. With increasing numbers of available Protein Data Bank structures, we here identify a large set of ChSeqs ranging from 6 to 10 residues in length. The homologous ChSeqs discovered highlight the structural plasticity involved in biological function. When compared with previous studies, the set of unrelated ChSeqs found represents an about 20-fold increase in the number of detected sequences, as well as an increase in the longest ChSeq length from 8 to 10 residues. We applied secondary structure predictors on our ChSeqs and found that methods based on a sequence profile outperformed methods based on a single sequence. For the unrelated ChSeqs, the evolutionary information provided by the sequence profile typically allows successful prediction of the prevailing secondary structure adopted in each protein family. Our dataset will facilitate future studies of ChSeqs, as well as interpretations of the interplay between local and nonlocal interactions. A user-friendly web interface for this ChSeq database is available at prodata.swmed.edu/chseq.",

keywords = "ChSeq, biological function, chameleon sequence, conformational change, secondary structure, secondary structure prediction, sequence profile, structural plasticity",

author = "Li, Wenlin and Kinch, {Lisa N.} and Karplus, {P. Andrew} and Grishin, {Nick V.}",

note = "Publisher Copyright: {\textcopyright} 2015 The Protein Society.",

year = "2015",

month = jul,

day = "1",

doi = "10.1002/pro.2689",

language = "English (US)",

volume = "24",

pages = "1075--1086",

journal = "Protein Science",

issn = "0961-8368",

publisher = "Wiley-Blackwell",

number = "7",

}

TY - JOUR

T1 - ChSeq: A database of chameleon sequences

AU - Li, Wenlin

AU - Kinch, Lisa N.

AU - Karplus, P. Andrew

AU - Grishin, Nick V.

PY - 2015

Y1 - 2015/07/01/

AB - Chameleon sequences (ChSeqs) refer to sequence strings of identical amino acids that can adopt different conformations in protein structures. Researchers have detected and studied ChSeqs to understand the interplay between local and global interactions in protein structure formation. The different secondary structures adopted by one ChSeq challenge sequence-based secondary structure predictors. With increasing numbers of available Protein Data Bank structures, we here identify a large set of ChSeqs ranging from 6 to 10 residues in length. The homologous ChSeqs discovered highlight the structural plasticity involved in biological function. When compared with previous studies, the set of unrelated ChSeqs found represents an about 20-fold increase in the number of detected sequences, as well as an increase in the longest ChSeq length from 8 to 10 residues. We applied secondary structure predictors on our ChSeqs and found that methods based on a sequence profile outperformed methods based on a single sequence. For the unrelated ChSeqs, the evolutionary information provided by the sequence profile typically allows successful prediction of the prevailing secondary structure adopted in each protein family. Our dataset will facilitate future studies of ChSeqs, as well as interpretations of the interplay between local and nonlocal interactions. A user-friendly web interface for this ChSeq database is available at prodata.swmed.edu/chseq.

KW - ChSeq

KW - biological function

KW - chameleon sequence

KW - conformational change

KW - secondary structure

KW - secondary structure prediction

KW - sequence profile

KW - structural plasticity

UR - http://www.scopus.com/inward/record.url?scp=84946145560&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84946145560&partnerID=8YFLogxK

U2 - 10.1002/pro.2689

DO - 10.1002/pro.2689

M3 - Article

C2 - 25970262

AN - SCOPUS:84946145560

SN - 0961-8368

VL - 24

SP - 1075

EP - 1086

JO - Protein Science

JF - Protein Science

IS - 7

ER -
