Centrifuge: Rapid and sensitive classification of metagenomic sequences

Daehwan Kim, Li Song, Florian P. Breitwieser, Steven L. Salzberg

Research output: Contribution to journal (Article)

95 Citations (Scopus)

Abstract

Centrifuge is a novel microbial classification engine that enables rapid, accurate, and sensitive labeling of reads and quantification of species on desktop computers. The system uses an indexing scheme based on the Burrows-Wheeler transform (BWT) and the Ferragina-Manzini (FM) index, optimized specifically for the metagenomic classification problem. Centrifuge requires a relatively small index (4.2 GB for 4078 bacterial and 200 archaeal genomes) and classifies sequences at very high speed, allowing it to process the millions of reads from a typical high-throughput DNA sequencing run within a few minutes. Together, these advances enable timely and accurate analysis of large metagenomics data sets on conventional desktop computers. Because of its space-optimized indexing schemes, Centrifuge also makes it possible to index the entire NCBI nonredundant nucleotide sequence database (a total of 109 billion bases) with an index size of 69 GB, in contrast to k-mer-based indexing schemes, which require far more extensive space.
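
As a rough illustration of the kind of index the abstract refers to, the following Python sketch builds a Burrows-Wheeler transform and a toy FM index, then counts exact occurrences of a pattern by backward search. It is not Centrifuge's implementation: the function names, the naive suffix sorting, and the uncompressed occurrence table are all assumptions made for brevity, and none of the space optimizations described in the abstract are shown.

```python
from collections import Counter


def build_bwt(text):
    """Return (bwt, suffix_array) for text, using naive suffix sorting."""
    text += "$"  # sentinel; sorts before A, C, G, T in ASCII
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # The BWT character is the one preceding each sorted suffix
    # (text[-1] is the sentinel when the suffix starts at position 0).
    bwt = "".join(text[i - 1] for i in sa)
    return bwt, sa


def build_fm_index(bwt):
    """Return (first, occ) tables for backward search (hypothetical layout)."""
    counts = Counter(bwt)
    # first[c]: number of characters in the text strictly smaller than c.
    first, total = {}, 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]
    # occ[c][i]: occurrences of c in bwt[:i]; stored in full here, whereas
    # real FM indexes sample this table to save space.
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return first, occ


def count_occurrences(text, pattern):
    """Count exact matches of pattern in text via FM-index backward search."""
    bwt, _ = build_bwt(text)
    first, occ = build_fm_index(bwt)
    lo, hi = 0, len(bwt)  # current half-open range of suffix-array rows
    for ch in reversed(pattern):
        if ch not in first:
            return 0
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo
```

For example, `count_occurrences("ACGTACGTTACG", "ACG")` finds three exact matches. The search cost depends on the pattern length rather than the text length, which is one reason this family of indexes suits read classification against large reference collections.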

Original language: English (US)
Pages (from-to): 1721-1729
Number of pages: 9
Journal: Genome Research
Volume: 26
Issue number: 12
DOI: https://doi.org/10.1101/gr.210641.116
State: Published - Dec 1 2016

Fingerprint

Metagenomics
Archaeal Genome
High-Throughput Nucleotide Sequencing
Databases
Datasets

ASJC Scopus subject areas

  • Genetics
  • Genetics (clinical)

Cite this

Centrifuge: Rapid and sensitive classification of metagenomic sequences. / Kim, Daehwan; Song, Li; Breitwieser, Florian P.; Salzberg, Steven L.

In: Genome Research, Vol. 26, No. 12, 01.12.2016, p. 1721-1729.

Kim, D, Song, L, Breitwieser, FP & Salzberg, SL 2016, 'Centrifuge: Rapid and sensitive classification of metagenomic sequences', Genome Research, vol. 26, no. 12, pp. 1721-1729. https://doi.org/10.1101/gr.210641.116
Kim, Daehwan; Song, Li; Breitwieser, Florian P.; Salzberg, Steven L. / Centrifuge: Rapid and sensitive classification of metagenomic sequences. In: Genome Research. 2016; Vol. 26, No. 12. pp. 1721-1729.
@article{0c0d252a22c146da89d8bff9b5234649,
title = "Centrifuge: Rapid and sensitive classification of metagenomic sequences",
author = "Daehwan Kim and Li Song and Breitwieser, {Florian P.} and Salzberg, {Steven L.}",
year = "2016",
month = "12",
day = "1",
doi = "10.1101/gr.210641.116",
language = "English (US)",
volume = "26",
pages = "1721--1729",
journal = "Genome Research",
issn = "1088-9051",
publisher = "Cold Spring Harbor Laboratory Press",
number = "12",

}

TY - JOUR

T1 - Centrifuge

T2 - Rapid and sensitive classification of metagenomic sequences

AU - Kim, Daehwan

AU - Song, Li

AU - Breitwieser, Florian P.

AU - Salzberg, Steven L.

PY - 2016/12/1

Y1 - 2016/12/1

UR - http://www.scopus.com/inward/record.url?scp=85002575175&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85002575175&partnerID=8YFLogxK

U2 - 10.1101/gr.210641.116

DO - 10.1101/gr.210641.116

M3 - Article

VL - 26

SP - 1721

EP - 1729

JO - Genome Research

JF - Genome Research

SN - 1088-9051

IS - 12

ER -