Machine learning provides evidence that stroke risk is not linear: The non-linear Framingham stroke risk score

Agni Orfanoudaki; Emma Chesley; Christian Cadisch; Barry Stein; Amre Nouh; Mark J. Alberts; Dimitris Bertsimas

doi:10.1371/journal.pone.0232414

Research output: Contribution to journal › Article › peer-review

24 Scopus citations

Abstract

Current stroke risk assessment tools presume the impact of risk factors is linear and cumulative. However, both novel risk factors and their interplay influencing stroke incidence are difficult to reveal using traditional additive models. The goal of this study was to improve upon the established Revised Framingham Stroke Risk Score and design an interactive Non-Linear Stroke Risk Score. Leveraging machine learning algorithms, our work aimed at increasing the accuracy of event prediction and uncovering new relationships in an interpretable fashion. A two-phase approach was used to create our stroke risk prediction score. First, clinical examinations of the Framingham offspring cohort were utilized as the training dataset for the predictive model. Optimal Classification Trees were used to develop a tree-based model to predict 10-year risk of stroke. Unlike classical methods, this algorithm adaptively changes the splits on the independent variables, introducing non-linear interactions among them. Second, the model was validated with a multi-ethnicity cohort from the Boston Medical Center. Our stroke risk score suggests a key dichotomy between patients with a history of cardiovascular disease and the rest of the population. While it agrees with known findings, it also identified 23 unique stroke risk profiles and highlighted new non-linear relationships, such as the role of T-wave abnormality on electrocardiography and hematocrit levels in a patient’s risk profile. Our results suggested that the non-linear approach significantly improves upon the baseline in the c-statistic (training 87.43% (CI 0.85–0.90) vs. 73.74% (CI 0.70–0.76); validation 75.29% (CI 0.74–0.76) vs. 65.93% (CI 0.64–0.67)), even in multi-ethnicity populations. The clinical implications of the new risk score include prioritization of risk factor modification and personalized care at the patient level with improved targeting of interventions for stroke prevention.
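The abstract compares models by their c-statistic. As a rough illustration of what that number measures, the sketch below computes it for a binary outcome as the share of (event, non-event) patient pairs in which the patient who had the event was assigned the higher predicted risk, with ties counted as half. The abstract does not say how the authors computed theirs, so the function name `c_statistic` and the toy data are assumptions made for illustration only, and the sketch ignores censoring, which a time-to-event analysis would need to handle.

```python
from itertools import product


def c_statistic(risks, outcomes):
    """Share of (event, non-event) pairs ranked correctly; ties count half.

    risks    -- predicted risk per patient (higher means more likely to have the event)
    outcomes -- 1 if the patient had the event, 0 otherwise
    """
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for event_risk, non_event_risk in product(events, non_events):
        if event_risk > non_event_risk:
            concordant += 1.0   # pair ranked correctly
        elif event_risk == non_event_risk:
            concordant += 0.5   # tie counts as half

    return concordant / (len(events) * len(non_events))


# Hypothetical toy data, not taken from the study.
toy_risks = [0.05, 0.10, 0.20, 0.40, 0.40, 0.80]
toy_outcomes = [0, 0, 1, 0, 1, 1]
toy_c = c_statistic(toy_risks, toy_outcomes)
print(f"c-statistic on toy data: {toy_c:.4f}")
```

A value of 0.5 corresponds to ranking no better than chance and 1.0 to perfect ranking, which is the scale on which the reported 87.43% vs. 73.74% (training) and 75.29% vs. 65.93% (validation) should be read.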

Original language	English (US)
Article number	e0232414
Journal	PLOS ONE
Volume	15
Issue number	5
DOIs	https://doi.org/10.1371/journal.pone.0232414
State	Published - May 2020
Externally published	Yes

ASJC Scopus subject areas

General

Cite this

@article{4661a8c9d8e24a4b857411dc9b66146a,

title = "Machine learning provides evidence that stroke risk is not linear: The non-linear Framingham stroke risk score",

abstract = "Current stroke risk assessment tools presume the impact of risk factors is linear and cumulative. However, both novel risk factors and their interplay influencing stroke incidence are difficult to reveal using traditional additive models. The goal of this study was to improve upon the established Revised Framingham Stroke Risk Score and design an interactive Non-Linear Stroke Risk Score. Leveraging machine learning algorithms, our work aimed at increasing the accuracy of event prediction and uncovering new relationships in an interpretable fashion. A two-phase approach was used to create our stroke risk prediction score. First, clinical examinations of the Framingham offspring cohort were utilized as the training dataset for the predictive model. Optimal Classification Trees were used to develop a tree-based model to predict 10-year risk of stroke. Unlike classical methods, this algorithm adaptively changes the splits on the independent variables, introducing non-linear interactions among them. Second, the model was validated with a multi-ethnicity cohort from the Boston Medical Center. Our stroke risk score suggests a key dichotomy between patients with a history of cardiovascular disease and the rest of the population. While it agrees with known findings, it also identified 23 unique stroke risk profiles and highlighted new non-linear relationships, such as the role of T-wave abnormality on electrocardiography and hematocrit levels in a patient{\textquoteright}s risk profile. Our results suggested that the non-linear approach significantly improves upon the baseline in the c-statistic (training 87.43% (CI 0.85–0.90) vs. 73.74% (CI 0.70–0.76); validation 75.29% (CI 0.74–0.76) vs. 65.93% (CI 0.64–0.67)), even in multi-ethnicity populations. The clinical implications of the new risk score include prioritization of risk factor modification and personalized care at the patient level with improved targeting of interventions for stroke prevention.",

author = "Agni Orfanoudaki and Emma Chesley and Christian Cadisch and Barry Stein and Amre Nouh and Alberts, {Mark J.} and Dimitris Bertsimas",

note = "Funding Information: Study was funded by a grant to the Massachusetts Institute of Technology from Hartford HealthCare. The resources partially covered the research stipends of the students who were engaged in the study. There was no additional external funding received for this study. Publisher Copyright: {\textcopyright} 2020 Orfanoudaki et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.",

year = "2020",

month = may,

doi = "10.1371/journal.pone.0232414",

language = "English (US)",

volume = "15",

journal = "PLOS ONE",

issn = "1932-6203",

publisher = "Public Library of Science",

number = "5",

}

TY - JOUR

T1 - Machine learning provides evidence that stroke risk is not linear

T2 - The non-linear Framingham stroke risk score

AU - Orfanoudaki, Agni

AU - Chesley, Emma

AU - Cadisch, Christian

AU - Stein, Barry

AU - Nouh, Amre

AU - Alberts, Mark J.

AU - Bertsimas, Dimitris

N1 - Funding Information: Study was funded by a grant to the Massachusetts Institute of Technology from Hartford HealthCare. The resources partially covered the research stipends of the students who were engaged in the study. There was no additional external funding received for this study. Publisher Copyright: © 2020 Orfanoudaki et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

PY - 2020/5

Y1 - 2020/5

N2 - Current stroke risk assessment tools presume the impact of risk factors is linear and cumulative. However, both novel risk factors and their interplay influencing stroke incidence are difficult to reveal using traditional additive models. The goal of this study was to improve upon the established Revised Framingham Stroke Risk Score and design an interactive Non-Linear Stroke Risk Score. Leveraging machine learning algorithms, our work aimed at increasing the accuracy of event prediction and uncovering new relationships in an interpretable fashion. A two-phase approach was used to create our stroke risk prediction score. First, clinical examinations of the Framingham offspring cohort were utilized as the training dataset for the predictive model. Optimal Classification Trees were used to develop a tree-based model to predict 10-year risk of stroke. Unlike classical methods, this algorithm adaptively changes the splits on the independent variables, introducing non-linear interactions among them. Second, the model was validated with a multi-ethnicity cohort from the Boston Medical Center. Our stroke risk score suggests a key dichotomy between patients with a history of cardiovascular disease and the rest of the population. While it agrees with known findings, it also identified 23 unique stroke risk profiles and highlighted new non-linear relationships, such as the role of T-wave abnormality on electrocardiography and hematocrit levels in a patient’s risk profile. Our results suggested that the non-linear approach significantly improves upon the baseline in the c-statistic (training 87.43% (CI 0.85–0.90) vs. 73.74% (CI 0.70–0.76); validation 75.29% (CI 0.74–0.76) vs. 65.93% (CI 0.64–0.67)), even in multi-ethnicity populations. The clinical implications of the new risk score include prioritization of risk factor modification and personalized care at the patient level with improved targeting of interventions for stroke prevention.

UR - http://www.scopus.com/inward/record.url?scp=85085155938&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85085155938&partnerID=8YFLogxK

U2 - 10.1371/journal.pone.0232414

DO - 10.1371/journal.pone.0232414

M3 - Article

C2 - 32437368

AN - SCOPUS:85085155938

SN - 1932-6203

VL - 15

JO - PLOS ONE

JF - PLOS ONE

IS - 5

M1 - e0232414

ER -
