TY - JOUR
T1 - Gaussian Process Based Heteroscedastic Noise Modeling for Tumor Mutation Burden Prediction from Whole Slide Images
AU - Park, Sunho
AU - Xu, Hongming
AU - Hwang, Tae Hyun
N1 - Publisher Copyright:
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
PY - 2019/2/18
Y1 - 2019/2/18
N2 - Tumor mutation burden (TMB) is a quantitative measurement of how many mutations are present in tumor cells from a patient's tumor, as assessed by next-generation sequencing (NGS) technology. High TMB is used as a predictive biomarker to select patients who are likely to respond to immunotherapy in many cancer types; thus, it is critical to measure TMB accurately for guiding patients to immunotherapy treatments. Recent studies showed that image features from histopathology whole slide images could be used to predict genetic features (e.g., mutation status) or clinical outcomes of cancer patients. In this study, we develop a computational algorithm to predict the TMB level from cancer patients’ histopathology whole slide images. We formulate the TMB prediction problem based on whole slide images as a multiple instance learning (MIL) problem. A whole slide image (a bag) is divided into multiple small image blocks/patches (instances), but a single label (e.g., TMB level) is available only for the entire whole slide image, not for each image block. In particular, we propose a novel heteroscedastic noise model for MIL based on the Gaussian process (GP) framework, where the noise variance is assumed to be a latent function of image-level features. This noise variance can encode the confidence in predicting the TMB level from each training image and lets the method put different levels of effort into classifying images according to how difficult each image is to classify correctly. The proposed method tries to fit an easier image well, while it does not put much effort into correctly classifying a harder (ambiguous) image for TMB prediction. Expectation propagation (EP) is employed to infer our model efficiently and to find the optimal hyper-parameters.
In experiments using whole slide images from synthetic and real-world data sets from The Cancer Genome Atlas (TCGA), we demonstrate that our method outperforms baseline methods for TMB prediction, including a special case of our method that does not include the heteroscedastic noise modeling and multiple instance ordinal regression (MIOR), a method that solves ordinal regression in the MIL setting.
AB - Tumor mutation burden (TMB) is a quantitative measurement of how many mutations are present in tumor cells from a patient's tumor, as assessed by next-generation sequencing (NGS) technology. High TMB is used as a predictive biomarker to select patients who are likely to respond to immunotherapy in many cancer types; thus, it is critical to measure TMB accurately for guiding patients to immunotherapy treatments. Recent studies showed that image features from histopathology whole slide images could be used to predict genetic features (e.g., mutation status) or clinical outcomes of cancer patients. In this study, we develop a computational algorithm to predict the TMB level from cancer patients’ histopathology whole slide images. We formulate the TMB prediction problem based on whole slide images as a multiple instance learning (MIL) problem. A whole slide image (a bag) is divided into multiple small image blocks/patches (instances), but a single label (e.g., TMB level) is available only for the entire whole slide image, not for each image block. In particular, we propose a novel heteroscedastic noise model for MIL based on the Gaussian process (GP) framework, where the noise variance is assumed to be a latent function of image-level features. This noise variance can encode the confidence in predicting the TMB level from each training image and lets the method put different levels of effort into classifying images according to how difficult each image is to classify correctly. The proposed method tries to fit an easier image well, while it does not put much effort into correctly classifying a harder (ambiguous) image for TMB prediction. Expectation propagation (EP) is employed to infer our model efficiently and to find the optimal hyper-parameters.
In experiments using whole slide images from synthetic and real-world data sets from The Cancer Genome Atlas (TCGA), we demonstrate that our method outperforms baseline methods for TMB prediction, including a special case of our method that does not include the heteroscedastic noise modeling and multiple instance ordinal regression (MIOR), a method that solves ordinal regression in the MIL setting.
UR - http://www.scopus.com/inward/record.url?scp=85095513137&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85095513137&partnerID=8YFLogxK
U2 - 10.1101/554261
DO - 10.1101/554261
M3 - Article
AN - SCOPUS:85095513137
JO - Seminars in Fetal and Neonatal Medicine
JF - Seminars in Fetal and Neonatal Medicine
SN - 1744-165X
ER -