A sequence family database built on ECOD structural domains

Yuxing Liao, R. Dustin Schaeffer, Jimin Pei, Nick V. Grishin

Research output: Contribution to journal › Article

Abstract

Motivation: The ECOD database classifies protein domains based on their evolutionary relationships, considering both remote and close homology. The family group in ECOD classifies domains that are closely related to each other based on sequence similarity. Because of differing perspectives on domain definition, directly applying existing sequence domain databases, such as Pfam, to ECOD has several shortcomings.

Results: We created multiple sequence alignments and profiles from ECOD domains, using structural information in alignment building and boundary delineation. We validated alignment quality by scoring structure superposition, demonstrating that the alignments are comparable to curated seed alignments in Pfam. Comparison to Pfam and CDD reveals that 27% and 16% of ECOD families, respectively, are new, but these are dominated by small families, likely because of sampling bias in the PDB database. The boundaries of 35% and 48% of families are modified compared to their counterparts in Pfam and CDD, respectively.

Original language: English (US)
Pages (from-to): 2997-3003
Number of pages: 7
Journal: Bioinformatics
Volume: 34
Issue number: 17
DOIs
State: Published - Sep 1 2018

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics
