Phylogeny Reconstruction with Alignment-Free Method That Corrects for Horizontal Gene Transfer

Research output: Contribution to journal › Article

13 Citations (Scopus)

Abstract

Advances in sequencing have generated a large number of complete genomes. Traditionally, phylogenetic analysis relies on alignments of orthologs, but defining orthologs and separating them from paralogs is a complex task that may not always be suited to the large datasets of the future. Whole-genome, alignment-free methods are an alternative to traditional, alignment-based approaches. These methods are scalable and require minimal manual intervention. We developed SlopeTree, a new alignment-free method that estimates evolutionary distances by measuring the decay of exact substring matches as a function of match length. SlopeTree corrects for horizontal gene transfer, for composition variation and low-complexity sequences, and for branch-length nonlinearity caused by multiple mutations at the same site. We tested SlopeTree on 495 bacteria, 73 archaea, and 72 strains of Escherichia coli and Shigella. We compared our trees to the NCBI taxonomy, to trees based on concatenated alignments, and to trees produced by other alignment-free methods. The results were consistent with current knowledge about prokaryotic evolution. We assessed differences in tree topology across different methods and settings and found that the majority of bacteria and archaea have a core set of proteins that evolves by descent. In trees built from complete genomes rather than sets of core genes, we observed some grouping by phenotype rather than phylogeny; for instance, a cluster of sulfur-reducing thermophilic bacteria grouped together irrespective of their phyla. The source code for SlopeTree is available at http://prodata.swmed.edu/download/pub/slopetree_v1/slopetree.tar.gz.
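The core idea in the abstract, that the number of exact substring matches between two sequences decays with match length and that the rate of decay reflects divergence, can be illustrated with a toy sketch. This is not the SlopeTree implementation: the function names `shared_kmer_counts` and `log_decay_slope` are hypothetical, the example sequences are invented, and none of the corrections described above (horizontal gene transfer, composition, low complexity, multiple substitutions) are modeled. It assumes only that a steeper decline of ln(match count) against match length indicates a larger distance.

```python
import math

def shared_kmer_counts(a, b, k_min, k_max):
    """For each k, count the distinct length-k substrings present in both sequences."""
    counts = {}
    for k in range(k_min, k_max + 1):
        kmers_a = {a[i:i + k] for i in range(len(a) - k + 1)}
        kmers_b = {b[i:i + k] for i in range(len(b) - k + 1)}
        counts[k] = len(kmers_a & kmers_b)
    return counts

def log_decay_slope(counts):
    """Least-squares slope of ln(count) against k, using only k with a nonzero count."""
    points = [(k, math.log(c)) for k, c in counts.items() if c > 0]
    if len(points) < 2:
        raise ValueError("need at least two match lengths with nonzero counts")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx

# Invented toy sequences: `mutated` differs from `reference` at one position.
reference = "ACDEFGHIKLMNPQRSTVWY"
mutated = "ACDEFGHIKLANPQRSTVWY"

slope_same = log_decay_slope(shared_kmer_counts(reference, reference, 2, 5))
slope_mut = log_decay_slope(shared_kmer_counts(reference, mutated, 2, 5))
```

In this toy case a single substitution removes every shared substring that spans the changed position, so the match count falls faster with length for the mutated pair and `slope_mut` is more negative than `slope_same`.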

Original language: English (US)
Article number: e1004985
Journal: PLoS Computational Biology
Volume: 12
Issue number: 6
DOI: 10.1371/journal.pcbi.1004985
State: Published - Jun 1 2016

Fingerprint

Gene transfer
Horizontal Gene Transfer
Phylogeny
Alignment
Horizontal
Gene
Genes
Bacteria
Archaea
Genome
Sulfur-Reducing Bacteria
Tars
thermophilic bacteria
methodology
Shigella
thermophilic bacterium

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Modeling and Simulation
  • Ecology, Evolution, Behavior and Systematics
  • Genetics
  • Molecular Biology
  • Ecology
  • Cellular and Molecular Neuroscience

Cite this

Phylogeny Reconstruction with Alignment-Free Method That Corrects for Horizontal Gene Transfer. / Bromberg, Raquel; Grishin, Nick V.; Otwinowski, Zbyszek.

In: PLoS Computational Biology, Vol. 12, No. 6, e1004985, 01.06.2016.

@article{c5a0f0c7576c4299b14b19622e991a8b,
title = "Phylogeny Reconstruction with Alignment-Free Method That Corrects for Horizontal Gene Transfer",
abstract = "Advances in sequencing have generated a large number of complete genomes. Traditionally, phylogenetic analysis relies on alignments of orthologs, but defining orthologs and separating them from paralogs is a complex task that may not always be suited to the large datasets of the future. Whole-genome, alignment-free methods are an alternative to traditional, alignment-based approaches. These methods are scalable and require minimal manual intervention. We developed SlopeTree, a new alignment-free method that estimates evolutionary distances by measuring the decay of exact substring matches as a function of match length. SlopeTree corrects for horizontal gene transfer, for composition variation and low-complexity sequences, and for branch-length nonlinearity caused by multiple mutations at the same site. We tested SlopeTree on 495 bacteria, 73 archaea, and 72 strains of Escherichia coli and Shigella. We compared our trees to the NCBI taxonomy, to trees based on concatenated alignments, and to trees produced by other alignment-free methods. The results were consistent with current knowledge about prokaryotic evolution. We assessed differences in tree topology across different methods and settings and found that the majority of bacteria and archaea have a core set of proteins that evolves by descent. In trees built from complete genomes rather than sets of core genes, we observed some grouping by phenotype rather than phylogeny; for instance, a cluster of sulfur-reducing thermophilic bacteria grouped together irrespective of their phyla. The source code for SlopeTree is available at http://prodata.swmed.edu/download/pub/slopetree_v1/slopetree.tar.gz.",
author = "Bromberg, Raquel and Grishin, Nick V. and Otwinowski, Zbyszek",
year = "2016",
month = "6",
day = "1",
doi = "10.1371/journal.pcbi.1004985",
language = "English (US)",
volume = "12",
journal = "PLoS Computational Biology",
issn = "1553-734X",
publisher = "Public Library of Science",
number = "6",
pages = "e1004985",
}

TY - JOUR

T1 - Phylogeny Reconstruction with Alignment-Free Method That Corrects for Horizontal Gene Transfer

AU - Bromberg, Raquel

AU - Grishin, Nick V.

AU - Otwinowski, Zbyszek

PY - 2016/6/1

Y1 - 2016/6/1

AB - Advances in sequencing have generated a large number of complete genomes. Traditionally, phylogenetic analysis relies on alignments of orthologs, but defining orthologs and separating them from paralogs is a complex task that may not always be suited to the large datasets of the future. Whole-genome, alignment-free methods are an alternative to traditional, alignment-based approaches. These methods are scalable and require minimal manual intervention. We developed SlopeTree, a new alignment-free method that estimates evolutionary distances by measuring the decay of exact substring matches as a function of match length. SlopeTree corrects for horizontal gene transfer, for composition variation and low-complexity sequences, and for branch-length nonlinearity caused by multiple mutations at the same site. We tested SlopeTree on 495 bacteria, 73 archaea, and 72 strains of Escherichia coli and Shigella. We compared our trees to the NCBI taxonomy, to trees based on concatenated alignments, and to trees produced by other alignment-free methods. The results were consistent with current knowledge about prokaryotic evolution. We assessed differences in tree topology across different methods and settings and found that the majority of bacteria and archaea have a core set of proteins that evolves by descent. In trees built from complete genomes rather than sets of core genes, we observed some grouping by phenotype rather than phylogeny; for instance, a cluster of sulfur-reducing thermophilic bacteria grouped together irrespective of their phyla. The source code for SlopeTree is available at http://prodata.swmed.edu/download/pub/slopetree_v1/slopetree.tar.gz.

UR - http://www.scopus.com/inward/record.url?scp=84978857996&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84978857996&partnerID=8YFLogxK

U2 - 10.1371/journal.pcbi.1004985

DO - 10.1371/journal.pcbi.1004985

M3 - Article

VL - 12

JO - PLoS Computational Biology

JF - PLoS Computational Biology

SN - 1553-734X

IS - 6

M1 - e1004985

ER -