VDJML: A file format with tools for capturing the results of inferring immune receptor rearrangements

Inimary T. Toby, Mikhail K. Levin, Edward A. Salinas, Scott Christley, Sanchita Bhattacharya, Felix Breden, Adam Buntzman, Brian Corrie, John Fonner, Namita T. Gupta, Uri Hershberg, Nishanth Marthandan, Aaron Rosenfeld, William Rounds, Florian Rubelt, Walter Scarborough, Jamie K. Scott, Mohamed Uduman, Jason A. Vander Heiden, Richard H. Scheuermann, Nancy Monson, Steven H. Kleinstein, Lindsay G. Cowell

Research output: Contribution to journal › Article

6 Citations (Scopus)

Abstract

Background: The genes for antibodies and for the immune receptors expressed on lymphocytes are not germline encoded; rather, they are somatically generated in each developing lymphocyte by a process called V(D)J recombination, which assembles specific, independent gene segments into mature composite genes. The full set of composite genes in an individual at a single point in time is called the immune repertoire. V(D)J recombination is the distinguishing feature of adaptive immunity and enables effective immune responses against an essentially infinite array of antigens. Characterization of immune repertoires is critical in both basic research and clinical contexts. Advances in repertoire profiling by high-throughput sequencing have driven rapid growth in research activity in the field, accompanied by a proliferation of software tools for analyzing repertoire sequencing data. Despite the widespread use of immune repertoire profiling and analysis software, there is currently no standardized format for output files from V(D)J analysis. Researchers use software such as IgBLAST and IMGT/HighV-QUEST to perform V(D)J analysis and infer the structure of germline rearrangements. However, each of these tools produces results in its own file format and may annotate the same result with different labels. These differences make it difficult for users to perform further downstream analyses.
Results: To help address this problem, we propose a standardized file format for representing V(D)J analysis results. The proposed format, VDJML, provides a common format for the output of different V(D)J analysis applications, so that results can be processed downstream in an application-agnostic manner. The VDJML file format specification is accompanied by a support library, written in C++ and Python, for reading and writing VDJML files.
Conclusions: The VDJML suite will allow users to streamline their V(D)J analysis and facilitate the sharing of scientific knowledge within the community. The VDJML suite and documentation are available from https://vdjserver.org/vdjml/. We welcome participation from the community in developing the file format standard, as well as code contributions.
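The core argument of the abstract is that tool-specific file formats and labels get in the way of downstream analysis, and that a shared format with read/write support removes that obstacle. The Python sketch below illustrates that idea using only the standard library. It is NOT the VDJML schema or the API of the VDJML support library: the tool names (`tool_a`, `tool_b`), their column labels, the record type `ReadResult`, and the XML element names (`results`, `read`, `segment`) are all hypothetical placeholders. Two differently labeled inputs for the same read are mapped onto one common record, written to XML, read back, and then analyzed by a function that never needs to know which tool produced the data.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List

# HYPOTHETICAL label maps: tool names and column labels are placeholders,
# not the real output headers of IgBLAST, IMGT/HighV-QUEST, or any other tool.
TOOL_LABELS: Dict[str, Dict[str, str]] = {
    "tool_a": {"v_call": "V", "d_call": "D", "j_call": "J"},
    "tool_b": {"V-GENE": "V", "D-GENE": "D", "J-GENE": "J"},
}


@dataclass
class ReadResult:
    """Common, tool-independent record for one read (hypothetical, not VDJML)."""
    read_id: str
    segments: Dict[str, str] = field(default_factory=dict)  # "V"/"D"/"J" -> gene name


def normalize(tool: str, read_id: str, row: Dict[str, str]) -> ReadResult:
    """Map one tool-specific row onto the common record, dropping empty calls."""
    labels = TOOL_LABELS[tool]
    segments = {labels[key]: value for key, value in row.items() if key in labels and value}
    return ReadResult(read_id, segments)


def to_xml(results: List[ReadResult]) -> str:
    """Serialize records to XML; element and attribute names are placeholders."""
    root = ET.Element("results")
    for result in results:
        read = ET.SubElement(root, "read", id=result.read_id)
        for seg_type in ("V", "D", "J"):
            if seg_type in result.segments:
                ET.SubElement(read, "segment", type=seg_type, gene=result.segments[seg_type])
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> List[ReadResult]:
    """Parse XML produced by to_xml back into common records."""
    root = ET.fromstring(text)
    return [
        ReadResult(read.get("id"), {seg.get("type"): seg.get("gene") for seg in read.findall("segment")})
        for read in root.findall("read")
    ]


def v_gene_usage(results: List[ReadResult]) -> Counter:
    """Downstream analysis that works on the common record only."""
    return Counter(r.segments["V"] for r in results if "V" in r.segments)


# The same assignment, labeled differently by two hypothetical tools.
a = normalize("tool_a", "read1", {"v_call": "IGHV1-2*02", "d_call": "", "j_call": "IGHJ4*02"})
b = normalize("tool_b", "read1", {"V-GENE": "IGHV1-2*02", "J-GENE": "IGHJ4*02"})
assert a == b                          # same result despite different labels
assert from_xml(to_xml([a])) == [a]    # XML round trip preserves the record
print(v_gene_usage([a, b]))            # Counter({'IGHV1-2*02': 2})
```

Keeping the label mapping at the input boundary means downstream code such as `v_gene_usage` sees a single vocabulary regardless of the source tool. A real pipeline would target the published VDJML specification and its support library rather than these placeholder names.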

Original language: English (US)
Article number: 333
Journal: BMC Bioinformatics
Volume: 17
DOI: https://doi.org/10.1186/s12859-016-1214-3
State: Published - Oct 6 2016

Keywords

  • Antigen receptor repertoire
  • C++
  • Data sharing
  • Data standards
  • Immune repertoire
  • Python
  • Repertoire profiling
  • XML

ASJC Scopus subject areas

  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics

Cite this

VDJML: A file format with tools for capturing the results of inferring immune receptor rearrangements. / Toby, Inimary T.; Levin, Mikhail K.; Salinas, Edward A.; Christley, Scott; Bhattacharya, Sanchita; Breden, Felix; Buntzman, Adam; Corrie, Brian; Fonner, John; Gupta, Namita T.; Hershberg, Uri; Marthandan, Nishanth; Rosenfeld, Aaron; Rounds, William; Rubelt, Florian; Scarborough, Walter; Scott, Jamie K.; Uduman, Mohamed; Vander Heiden, Jason A.; Scheuermann, Richard H.; Monson, Nancy; Kleinstein, Steven H.; Cowell, Lindsay G.

In: BMC Bioinformatics, Vol. 17, 333, 06.10.2016.

Toby, IT, Levin, MK, Salinas, EA, Christley, S, Bhattacharya, S, Breden, F, Buntzman, A, Corrie, B, Fonner, J, Gupta, NT, Hershberg, U, Marthandan, N, Rosenfeld, A, Rounds, W, Rubelt, F, Scarborough, W, Scott, JK, Uduman, M, Vander Heiden, JA, Scheuermann, RH, Monson, N, Kleinstein, SH & Cowell, LG 2016, 'VDJML: A file format with tools for capturing the results of inferring immune receptor rearrangements', BMC Bioinformatics, vol. 17, 333. https://doi.org/10.1186/s12859-016-1214-3
Toby, Inimary T. ; Levin, Mikhail K. ; Salinas, Edward A. ; Christley, Scott ; Bhattacharya, Sanchita ; Breden, Felix ; Buntzman, Adam ; Corrie, Brian ; Fonner, John ; Gupta, Namita T. ; Hershberg, Uri ; Marthandan, Nishanth ; Rosenfeld, Aaron ; Rounds, William ; Rubelt, Florian ; Scarborough, Walter ; Scott, Jamie K. ; Uduman, Mohamed ; Vander Heiden, Jason A. ; Scheuermann, Richard H. ; Monson, Nancy ; Kleinstein, Steven H. ; Cowell, Lindsay G. / VDJML: A file format with tools for capturing the results of inferring immune receptor rearrangements. In: BMC Bioinformatics. 2016 ; Vol. 17.
@article{5c85fb843d3d4b1b99061d03eb368dbf,
title = "{VDJML}: A file format with tools for capturing the results of inferring immune receptor rearrangements",
keywords = "Antigen receptor repertoire, C++, Data sharing, Data standards, Immune repertoire, Python, Repertoire profiling, XML",
author = "Toby, {Inimary T.} and Levin, {Mikhail K.} and Salinas, {Edward A.} and Scott Christley and Sanchita Bhattacharya and Felix Breden and Adam Buntzman and Brian Corrie and John Fonner and Gupta, {Namita T.} and Uri Hershberg and Nishanth Marthandan and Aaron Rosenfeld and William Rounds and Florian Rubelt and Walter Scarborough and Scott, {Jamie K.} and Mohamed Uduman and {Vander Heiden}, {Jason A.} and Scheuermann, {Richard H.} and Nancy Monson and Kleinstein, {Steven H.} and Cowell, {Lindsay G.}",
year = "2016",
month = "10",
day = "6",
doi = "10.1186/s12859-016-1214-3",
language = "English (US)",
volume = "17",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",

}

TY - JOUR

T1 - VDJML: A file format with tools for capturing the results of inferring immune receptor rearrangements

AU - Toby, Inimary T.

AU - Levin, Mikhail K.

AU - Salinas, Edward A.

AU - Christley, Scott

AU - Bhattacharya, Sanchita

AU - Breden, Felix

AU - Buntzman, Adam

AU - Corrie, Brian

AU - Fonner, John

AU - Gupta, Namita T.

AU - Hershberg, Uri

AU - Marthandan, Nishanth

AU - Rosenfeld, Aaron

AU - Rounds, William

AU - Rubelt, Florian

AU - Scarborough, Walter

AU - Scott, Jamie K.

AU - Uduman, Mohamed

AU - Vander Heiden, Jason A.

AU - Scheuermann, Richard H.

AU - Monson, Nancy

AU - Kleinstein, Steven H.

AU - Cowell, Lindsay G.

PY - 2016/10/6

Y1 - 2016/10/6

KW - Antigen receptor repertoire

KW - C++

KW - Data sharing

KW - Data standards

KW - Immune repertoire

KW - Python

KW - Repertoire profiling

KW - XML

UR - http://www.scopus.com/inward/record.url?scp=84990817455&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84990817455&partnerID=8YFLogxK

U2 - 10.1186/s12859-016-1214-3

DO - 10.1186/s12859-016-1214-3

M3 - Article

C2 - 27766961

AN - SCOPUS:84990817455

VL - 17

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

M1 - 333

ER -