Random forest classification of etiologies for an orphan disease

Jaime Lynn Speiser, Valerie L. Durkalski, William M. Lee

Research output: Contribution to journal › Article

8 Citations (Scopus)

Abstract

Classification of objects into pre-defined groups based on known information is a fundamental problem in the field of statistics. Although approaches for solving this problem exist, finding an accurate classification method can be challenging in an orphan disease setting, where data are minimal and often not normally distributed. The purpose of this paper is to illustrate the application of the random forest (RF) classification procedure in a real clinical setting and discuss typical questions that arise in the general classification framework as well as offer interpretations of RF results. This paper includes methods for assessing predictive performance, importance of predictor variables, and observation-specific information.
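The three kinds of output named above (predictive performance, predictor importance, and observation-specific information) can be illustrated with a small random forest. The sketch below is a from-scratch Breiman-style forest, not the authors' code or any library's API. All function names, the toy two-predictor data set, and the settings (`n_trees=50`, `max_depth=5`, `mtry` defaulting to the square root of the number of predictors) are hypothetical choices made only for illustration. Each tree is fit to a bootstrap sample. The out-of-bag (OOB) rows of each tree are then used for three purposes: an OOB error estimate, a permutation (mean decrease in accuracy) importance score for each predictor, and per-observation OOB class vote fractions.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(X, y, idx, mtry, depth, max_depth, rng):
    """Grow a classification tree on rows idx, trying mtry random predictors per split."""
    labels = [y[i] for i in idx]
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority  # leaf: predicted class
    best = None
    for f in rng.sample(range(len(X[0])), mtry):
        values = sorted({X[i][f] for i in idx})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # split midway between adjacent observed values
            left = [i for i in idx if X[i][f] <= t]
            right = [i for i in idx if X[i][f] > t]
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(idx)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:
        return majority  # every sampled predictor was constant on these rows
    _, f, t, left, right = best
    return (f, t,
            build_tree(X, y, left, mtry, depth + 1, max_depth, rng),
            build_tree(X, y, right, mtry, depth + 1, max_depth, rng))

def predict_tree(node, x):
    """Follow splits until a leaf (a class label) is reached."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def fit_forest(X, y, n_trees=50, mtry=None, max_depth=5, seed=0):
    """Bagged trees; each tree is stored with its out-of-bag (OOB) row indices."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    mtry = mtry or max(1, int(p ** 0.5))  # common default: sqrt(number of predictors)
    forest = []
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]
        oob = sorted(set(range(n)) - set(boot))
        forest.append((build_tree(X, y, boot, mtry, 0, max_depth, rng), oob))
    return forest

def oob_votes(forest, X):
    """Observation-specific output: class vote fractions from trees that did not see each row."""
    counts = [Counter() for _ in X]
    for tree, oob in forest:
        for i in oob:
            counts[i][predict_tree(tree, X[i])] += 1
    return [{c: k / sum(v.values()) for c, k in v.items()} for v in counts]

def oob_error(forest, X, y):
    """Predictive performance: misclassification rate of the OOB majority vote."""
    votes = oob_votes(forest, X)
    scored = [i for i, v in enumerate(votes) if v]
    wrong = sum(max(votes[i], key=votes[i].get) != y[i] for i in scored)
    return wrong / len(scored)

def permutation_importance(forest, X, y, seed=0):
    """Mean increase in per-tree OOB error after shuffling one predictor among OOB rows."""
    rng = random.Random(seed)
    p = len(X[0])
    imp, used = [0.0] * p, 0
    for tree, oob in forest:
        if not oob:
            continue
        used += 1
        base = sum(predict_tree(tree, X[i]) != y[i] for i in oob)
        for f in range(p):
            shuffled = [X[i][f] for i in oob]
            rng.shuffle(shuffled)
            err = 0
            for i, v in zip(oob, shuffled):
                x = list(X[i])
                x[f] = v
                err += predict_tree(tree, x) != y[i]
            imp[f] += (err - base) / len(oob)
    return [v / used for v in imp]

# Hypothetical toy data: predictor 0 separates the two classes, predictor 1 is noise.
data_rng = random.Random(1)
X_demo, y_demo = [], []
for _ in range(30):
    X_demo.append([data_rng.uniform(0, 1), data_rng.uniform(0, 1)])
    y_demo.append("A")
    X_demo.append([data_rng.uniform(1.5, 2.5), data_rng.uniform(0, 1)])
    y_demo.append("B")

demo_forest = fit_forest(X_demo, y_demo, n_trees=50, seed=0)
demo_error = oob_error(demo_forest, X_demo, y_demo)
demo_importance = permutation_importance(demo_forest, X_demo, y_demo)
demo_votes = oob_votes(demo_forest, X_demo)
print("OOB error:", demo_error)
print("Permutation importance:", demo_importance)
print("OOB vote fractions, first observation:", demo_votes[0])
```

On this toy data the informative predictor should receive a much larger permutation importance than the noise predictor. The OOB vote fractions give each observation its own estimated class probabilities, which is one simple kind of observation-specific output. In real analyses a maintained implementation would normally be used rather than this sketch.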

Original language: English (US)
Pages (from-to): 887-899
Number of pages: 13
Journal: Statistics in Medicine
Volume: 34
Issue number: 5
DOIs: 10.1002/sim.6351
State: Published - Feb 28 2014

Fingerprint

Random Forest
Rare Diseases
Predictors
Observation
Statistics

Keywords

  • Acute liver failure
  • Etiology
  • Random forest
  • Statistical classification

ASJC Scopus subject areas

  • Epidemiology
  • Statistics and Probability

Cite this

Random forest classification of etiologies for an orphan disease. / Speiser, Jaime Lynn; Durkalski, Valerie L.; Lee, William M.

In: Statistics in Medicine, Vol. 34, No. 5, 28.02.2014, p. 887-899.


Speiser, Jaime Lynn ; Durkalski, Valerie L. ; Lee, William M. / Random forest classification of etiologies for an orphan disease. In: Statistics in Medicine. 2014 ; Vol. 34, No. 5. pp. 887-899.
@article{6c1022e58b8f4974888f8055075d6b75,
title = "Random forest classification of etiologies for an orphan disease",
abstract = "Classification of objects into pre-defined groups based on known information is a fundamental problem in the field of statistics. Although approaches for solving this problem exist, finding an accurate classification method can be challenging in an orphan disease setting, where data are minimal and often not normally distributed. The purpose of this paper is to illustrate the application of the random forest (RF) classification procedure in a real clinical setting and discuss typical questions that arise in the general classification framework as well as offer interpretations of RF results. This paper includes methods for assessing predictive performance, importance of predictor variables, and observation-specific information.",
keywords = "Acute liver failure, Etiology, Random forest, Statistical classification",
author = "Speiser, {Jaime Lynn} and Durkalski, {Valerie L.} and Lee, {William M.}",
year = "2014",
month = "2",
day = "28",
doi = "10.1002/sim.6351",
language = "English (US)",
volume = "34",
pages = "887--899",
journal = "Statistics in Medicine",
issn = "0277-6715",
publisher = "John Wiley and Sons Ltd",
number = "5",

}

TY  - JOUR
T1  - Random forest classification of etiologies for an orphan disease
AU  - Speiser, Jaime Lynn
AU  - Durkalski, Valerie L.
AU  - Lee, William M.
PY  - 2014/2/28
Y1  - 2014/2/28
N2  - Classification of objects into pre-defined groups based on known information is a fundamental problem in the field of statistics. Although approaches for solving this problem exist, finding an accurate classification method can be challenging in an orphan disease setting, where data are minimal and often not normally distributed. The purpose of this paper is to illustrate the application of the random forest (RF) classification procedure in a real clinical setting and discuss typical questions that arise in the general classification framework as well as offer interpretations of RF results. This paper includes methods for assessing predictive performance, importance of predictor variables, and observation-specific information.
AB  - Classification of objects into pre-defined groups based on known information is a fundamental problem in the field of statistics. Although approaches for solving this problem exist, finding an accurate classification method can be challenging in an orphan disease setting, where data are minimal and often not normally distributed. The purpose of this paper is to illustrate the application of the random forest (RF) classification procedure in a real clinical setting and discuss typical questions that arise in the general classification framework as well as offer interpretations of RF results. This paper includes methods for assessing predictive performance, importance of predictor variables, and observation-specific information.
KW  - Acute liver failure
KW  - Etiology
KW  - Random forest
KW  - Statistical classification
UR  - http://www.scopus.com/inward/record.url?scp=84921478427&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84921478427&partnerID=8YFLogxK
U2  - 10.1002/sim.6351
DO  - 10.1002/sim.6351
M3  - Article
VL  - 34
SP  - 887
EP  - 899
JO  - Statistics in Medicine
JF  - Statistics in Medicine
SN  - 0277-6715
IS  - 5
ER  - 