VDJPipe: A pipelined tool for pre-processing immune repertoire sequencing data

Scott Christley; Mikhail K. Levin; Inimary T. Toby; John M. Fonner; Nancy L. Monson; William H. Rounds; Florian Rubelt; Walter Scarborough; Richard H. Scheuermann; Lindsay G. Cowell

doi:10.1186/s12859-017-1853-z

Research output: Contribution to journal › Article › peer-review

12 Scopus citations

Abstract

Background: Pre-processing of high-throughput sequencing data for immune repertoire profiling is essential to ensure high-quality input for downstream analysis. VDJPipe is a flexible, high-performance tool that can perform multiple pre-processing tasks with just a single pass over the data files.

Results: Processing tasks provided by VDJPipe include base composition statistics calculation, read quality statistics calculation, quality filtering, homopolymer filtering, length and nucleotide filtering, paired-read merging, barcode demultiplexing, 5' and 3' PCR primer matching, and duplicate read collapsing. VDJPipe uses a pipeline approach whereby multiple processing steps are performed in a sequential workflow, with the output of each step passed automatically as input to the next. The workflow is flexible enough to handle the complex barcoding schemes used in many immunosequencing experiments. Because VDJPipe is designed for computational efficiency, we evaluated its performance by comparing execution times with those of pRESTO, a widely used pre-processing tool for immune repertoire sequencing data. We found that VDJPipe requires <10% of the run time required by pRESTO.

Conclusions: VDJPipe is a high-performance tool that is optimized for pre-processing large immune repertoire sequencing data sets.
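The single-pass pipeline idea described in the abstract can be illustrated with a minimal Python sketch. This is not VDJPipe's implementation or configuration format: the `Read` type, the step functions (`min_length`, `min_mean_quality`, `max_homopolymer`), `run_pipeline`, and `collapse_duplicates` are all hypothetical names chosen for illustration, and the thresholds are arbitrary. The sketch only shows how each read can flow through a sequence of steps, with the output of one step passed to the next, so that the input is traversed once.

```python
from collections import Counter
from typing import Callable, Iterable, Iterator, List, NamedTuple, Optional, Tuple


class Read(NamedTuple):
    """Hypothetical in-memory read: name, bases, and per-base Phred scores."""
    name: str
    seq: str
    quals: Tuple[int, ...]


# A step receives a read and returns it (possibly modified), or None to drop it.
Step = Callable[[Read], Optional[Read]]


def min_length(n: int) -> Step:
    """Length filter: keep reads with at least n bases."""
    return lambda r: r if len(r.seq) >= n else None


def min_mean_quality(q: float) -> Step:
    """Quality filter: keep reads whose mean Phred score is at least q."""
    return lambda r: r if r.quals and sum(r.quals) / len(r.quals) >= q else None


def max_homopolymer(limit: int) -> Step:
    """Homopolymer filter: drop reads containing a single-base run longer than limit."""
    def step(r: Read) -> Optional[Read]:
        run, longest, prev = 0, 0, ""
        for base in r.seq:
            run = run + 1 if base == prev else 1
            longest = max(longest, run)
            prev = base
        return r if longest <= limit else None
    return step


def run_pipeline(reads: Iterable[Read], steps: List[Step]) -> Iterator[Read]:
    """Apply every step to each read in order, in one pass over the input."""
    for read in reads:
        current: Optional[Read] = read
        for step in steps:
            current = step(current)
            if current is None:  # a filter rejected the read; skip remaining steps
                break
        if current is not None:
            yield current


def collapse_duplicates(reads: Iterable[Read]) -> Counter:
    """Collapse identical sequences, counting how many reads carried each one."""
    return Counter(r.seq for r in reads)
```

Because `run_pipeline` is a generator, reads are streamed through the steps one at a time, and `collapse_duplicates(run_pipeline(reads, steps))` consumes them without building an intermediate list per step.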

Original language	English (US)
Article number	448
Journal	BMC Bioinformatics
Volume	18
Issue number	1
DOIs	https://doi.org/10.1186/s12859-017-1853-z
State	Published - Oct 11 2017

Keywords

Bioinformatics
Immune repertoire analysis
Rep-seq

ASJC Scopus subject areas

Structural Biology
Biochemistry
Molecular Biology
Computer Science Applications
Applied Mathematics

Cite this

@article{d0649977711d403582359337a95d20ed,
  title = "VDJPipe: A pipelined tool for pre-processing immune repertoire sequencing data",
  abstract = "Background: Pre-processing of high-throughput sequencing data for immune repertoire profiling is essential to ensure high-quality input for downstream analysis. VDJPipe is a flexible, high-performance tool that can perform multiple pre-processing tasks with just a single pass over the data files. Results: Processing tasks provided by VDJPipe include base composition statistics calculation, read quality statistics calculation, quality filtering, homopolymer filtering, length and nucleotide filtering, paired-read merging, barcode demultiplexing, 5' and 3' PCR primer matching, and duplicate read collapsing. VDJPipe uses a pipeline approach whereby multiple processing steps are performed in a sequential workflow, with the output of each step passed automatically as input to the next. The workflow is flexible enough to handle the complex barcoding schemes used in many immunosequencing experiments. Because VDJPipe is designed for computational efficiency, we evaluated its performance by comparing execution times with those of pRESTO, a widely used pre-processing tool for immune repertoire sequencing data. We found that VDJPipe requires <10% of the run time required by pRESTO. Conclusions: VDJPipe is a high-performance tool that is optimized for pre-processing large immune repertoire sequencing data sets.",
  keywords = "Bioinformatics, Immune repertoire analysis, Rep-seq",
  author = "Scott Christley and Levin, {Mikhail K.} and Toby, {Inimary T.} and Fonner, {John M.} and Monson, {Nancy L.} and Rounds, {William H.} and Florian Rubelt and Walter Scarborough and Scheuermann, {Richard H.} and Cowell, {Lindsay G.}",
  note = "Publisher Copyright: {\textcopyright} 2017 The Author(s).",
  year = "2017",
  month = oct,
  day = "11",
  doi = "10.1186/s12859-017-1853-z",
  language = "English (US)",
  volume = "18",
  journal = "BMC Bioinformatics",
  issn = "1471-2105",
  publisher = "BioMed Central",
  number = "1",
}

TY - JOUR

T1 - VDJPipe

T2 - A pipelined tool for pre-processing immune repertoire sequencing data

AU - Christley, Scott

AU - Levin, Mikhail K.

AU - Toby, Inimary T.

AU - Fonner, John M.

AU - Monson, Nancy L.

AU - Rounds, William H.

AU - Rubelt, Florian

AU - Scarborough, Walter

AU - Scheuermann, Richard H.

AU - Cowell, Lindsay G.

PY - 2017/10/11

Y1 - 2017/10/11

N2 - Background: Pre-processing of high-throughput sequencing data for immune repertoire profiling is essential to ensure high-quality input for downstream analysis. VDJPipe is a flexible, high-performance tool that can perform multiple pre-processing tasks with just a single pass over the data files. Results: Processing tasks provided by VDJPipe include base composition statistics calculation, read quality statistics calculation, quality filtering, homopolymer filtering, length and nucleotide filtering, paired-read merging, barcode demultiplexing, 5' and 3' PCR primer matching, and duplicate read collapsing. VDJPipe uses a pipeline approach whereby multiple processing steps are performed in a sequential workflow, with the output of each step passed automatically as input to the next. The workflow is flexible enough to handle the complex barcoding schemes used in many immunosequencing experiments. Because VDJPipe is designed for computational efficiency, we evaluated its performance by comparing execution times with those of pRESTO, a widely used pre-processing tool for immune repertoire sequencing data. We found that VDJPipe requires <10% of the run time required by pRESTO. Conclusions: VDJPipe is a high-performance tool that is optimized for pre-processing large immune repertoire sequencing data sets.

AB - Background: Pre-processing of high-throughput sequencing data for immune repertoire profiling is essential to ensure high-quality input for downstream analysis. VDJPipe is a flexible, high-performance tool that can perform multiple pre-processing tasks with just a single pass over the data files. Results: Processing tasks provided by VDJPipe include base composition statistics calculation, read quality statistics calculation, quality filtering, homopolymer filtering, length and nucleotide filtering, paired-read merging, barcode demultiplexing, 5' and 3' PCR primer matching, and duplicate read collapsing. VDJPipe uses a pipeline approach whereby multiple processing steps are performed in a sequential workflow, with the output of each step passed automatically as input to the next. The workflow is flexible enough to handle the complex barcoding schemes used in many immunosequencing experiments. Because VDJPipe is designed for computational efficiency, we evaluated its performance by comparing execution times with those of pRESTO, a widely used pre-processing tool for immune repertoire sequencing data. We found that VDJPipe requires <10% of the run time required by pRESTO. Conclusions: VDJPipe is a high-performance tool that is optimized for pre-processing large immune repertoire sequencing data sets.

KW - Bioinformatics

KW - Immune repertoire analysis

KW - Rep-seq

UR - http://www.scopus.com/inward/record.url?scp=85030840900&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85030840900&partnerID=8YFLogxK

U2 - 10.1186/s12859-017-1853-z

DO - 10.1186/s12859-017-1853-z

M3 - Article

C2 - 29020925

AN - SCOPUS:85030840900

SN - 1471-2105

VL - 18

JO - BMC Bioinformatics

JF - BMC Bioinformatics

IS - 1

M1 - 448

ER -
