Immune Repertoire Analysis on High-Performance Computing Using VDJServer V1: A Method by the AIRR Community

Scott Christley, Ulrik Stervbo, Lindsay G. Cowell, on behalf of the AIRR Community

doi:10.1007/978-1-0716-2115-8_22

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

AIRR-seq data sets are usually large and require specialized analysis methods and software tools. A typical Illumina MiSeq sequencing run generates 20–30 million 2 × 300 bp paired-end sequence reads, which roughly corresponds to 15 GB of sequence data to be processed. Other platforms like NextSeq, which is useful in projects where the full V gene is not needed, create about 400 million 2 × 150 bp paired-end reads. Because of the size of the data sets, the analysis can be computationally expensive, particularly the early analysis steps like preprocessing and gene annotation that process the majority of the sequence data. A standard desktop PC may take 3–5 days of constant processing for a single MiSeq run, so dedicated high-performance computational resources may be required. VDJServer provides free access to high-performance computing (HPC) at the Texas Advanced Computing Center (TACC) through a graphical user interface (Christley et al. Front Immunol 9:976, 2018). VDJServer is a cloud-based analysis portal for immune repertoire sequence data that provides access to a suite of tools for a complete analysis workflow, including modules for preprocessing and quality control of sequence reads, V(D)J gene assignment, repertoire characterization, and repertoire comparison. Furthermore, VDJServer has parallelized execution for tools such as IgBLAST, so more compute resources are utilized as the size of the input data grows. Analysis that takes days on a desktop PC might take only a few hours on VDJServer. VDJServer is a free, publicly available, and open-source licensed resource. Here, we describe the workflow for performing immune repertoire analysis on VDJServer’s high-performance computing.
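The data volumes quoted in the abstract can be checked with simple arithmetic. The sketch below is illustrative only and is not part of the chapter. It assumes the quoted read counts refer to read pairs and that each sequenced base takes roughly one byte, ignoring FASTQ headers, quality strings, and compression. The helper names `raw_bases` and `gigabases` are hypothetical. Under those assumptions a MiSeq run comes to 12–18 gigabases, which brackets the quoted 15 GB, and a NextSeq run to about 120 gigabases.

```python
def raw_bases(read_pairs: int, read_length: int) -> int:
    """Total sequenced bases for a paired-end run (two reads per pair)."""
    return read_pairs * 2 * read_length


def gigabases(bases: int) -> float:
    """Convert a base count to gigabases (10^9 bases)."""
    return bases / 1e9


# Read counts and read lengths are the figures quoted in the abstract.
miseq_low = raw_bases(20_000_000, 300)   # lower end of a MiSeq 2 x 300 bp run
miseq_high = raw_bases(30_000_000, 300)  # upper end of a MiSeq 2 x 300 bp run
nextseq = raw_bases(400_000_000, 150)    # NextSeq 2 x 150 bp run

print(f"MiSeq:   {gigabases(miseq_low):.0f}-{gigabases(miseq_high):.0f} Gbases")
print(f"NextSeq: {gigabases(nextseq):.0f} Gbases")
```

An uncompressed FASTQ file also stores one quality character per base plus a header for each read, so actual file sizes on disk can differ considerably from these base counts.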

Original language	English (US)
Title of host publication	Methods in Molecular Biology
Publisher	Humana Press Inc.
Pages	439-446
Number of pages	8
DOIs	https://doi.org/10.1007/978-1-0716-2115-8_22
State	Published - 2022

Publication series

Name	Methods in Molecular Biology
Volume	2453
ISSN (Print)	1064-3745
ISSN (Electronic)	1940-6029

Keywords

AIRR-Seq
B-cell receptor
Cloud computing
High-performance computing
T-cell receptor

ASJC Scopus subject areas

Molecular Biology
Genetics

Cite this

@inbook{62036de8d3b44a6d81aa448697e2afeb,
title = "Immune Repertoire Analysis on High-Performance Computing Using VDJServer V1: A Method by the AIRR Community",

abstract = "AIRR-seq data sets are usually large and require specialized analysis methods and software tools. A typical Illumina MiSeq sequencing run generates 20–30 million 2 × 300 bp paired-end sequence reads, which roughly corresponds to 15 GB of sequence data to be processed. Other platforms like NextSeq, which is useful in projects where the full V gene is not needed, create about 400 million 2 × 150 bp paired-end reads. Because of the size of the data sets, the analysis can be computationally expensive, particularly the early analysis steps like preprocessing and gene annotation that process the majority of the sequence data. A standard desktop PC may take 3–5 days of constant processing for a single MiSeq run, so dedicated high-performance computational resources may be required. VDJServer provides free access to high-performance computing (HPC) at the Texas Advanced Computing Center (TACC) through a graphical user interface (Christley et al. Front Immunol 9:976, 2018). VDJServer is a cloud-based analysis portal for immune repertoire sequence data that provides access to a suite of tools for a complete analysis workflow, including modules for preprocessing and quality control of sequence reads, V(D)J gene assignment, repertoire characterization, and repertoire comparison. Furthermore, VDJServer has parallelized execution for tools such as IgBLAST, so more compute resources are utilized as the size of the input data grows. Analysis that takes days on a desktop PC might take only a few hours on VDJServer. VDJServer is a free, publicly available, and open-source licensed resource. Here, we describe the workflow for performing immune repertoire analysis on VDJServer{\textquoteright}s high-performance computing.",

keywords = "AIRR-Seq, B-cell receptor, Cloud computing, High-performance computing, T-cell receptor",
author = "{on behalf of the AIRR Community} and Scott Christley and Ulrik Stervbo and Cowell, {Lindsay G.}",
note = "Publisher Copyright: {\textcopyright} 2022, The Author(s).",
year = "2022",
doi = "10.1007/978-1-0716-2115-8_22",
language = "English (US)",
series = "Methods in Molecular Biology",
publisher = "Humana Press Inc.",
pages = "439--446",
booktitle = "Methods in Molecular Biology",
}

TY - CHAP
T1 - Immune Repertoire Analysis on High-Performance Computing Using VDJServer V1
T2 - A Method by the AIRR Community
AU - on behalf of the AIRR Community
AU - Christley, Scott
AU - Stervbo, Ulrik
AU - Cowell, Lindsay G.
PY - 2022
Y1 - 2022

AB - AIRR-seq data sets are usually large and require specialized analysis methods and software tools. A typical Illumina MiSeq sequencing run generates 20–30 million 2 × 300 bp paired-end sequence reads, which roughly corresponds to 15 GB of sequence data to be processed. Other platforms like NextSeq, which is useful in projects where the full V gene is not needed, create about 400 million 2 × 150 bp paired-end reads. Because of the size of the data sets, the analysis can be computationally expensive, particularly the early analysis steps like preprocessing and gene annotation that process the majority of the sequence data. A standard desktop PC may take 3–5 days of constant processing for a single MiSeq run, so dedicated high-performance computational resources may be required. VDJServer provides free access to high-performance computing (HPC) at the Texas Advanced Computing Center (TACC) through a graphical user interface (Christley et al. Front Immunol 9:976, 2018). VDJServer is a cloud-based analysis portal for immune repertoire sequence data that provides access to a suite of tools for a complete analysis workflow, including modules for preprocessing and quality control of sequence reads, V(D)J gene assignment, repertoire characterization, and repertoire comparison. Furthermore, VDJServer has parallelized execution for tools such as IgBLAST, so more compute resources are utilized as the size of the input data grows. Analysis that takes days on a desktop PC might take only a few hours on VDJServer. VDJServer is a free, publicly available, and open-source licensed resource. Here, we describe the workflow for performing immune repertoire analysis on VDJServer’s high-performance computing.

KW - AIRR-Seq
KW - B-cell receptor
KW - Cloud computing
KW - High-performance computing
KW - T-cell receptor
UR - http://www.scopus.com/inward/record.url?scp=85131108245&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85131108245&partnerID=8YFLogxK
U2 - 10.1007/978-1-0716-2115-8_22
DO - 10.1007/978-1-0716-2115-8_22
M3 - Chapter
C2 - 35622338
AN - SCOPUS:85131108245
T3 - Methods in Molecular Biology
SP - 439
EP - 446
BT - Methods in Molecular Biology
PB - Humana Press Inc.
ER -
