Application of BERT to Enable Gene Classification Based on Clinical Evidence

Yuhan Su; Hongxin Xiang; Haotian Xie; Yong Yu; Shiyan Dong; Zhaogang Yang; Na Zhao

doi:10.1155/2020/5491963

Research output: Contribution to journal › Article › peer-review

11 Scopus citations

Abstract

The identification of profiled cancer-related genes plays an essential role in cancer diagnosis and treatment. According to the literature, genetic mutations are still classified manually, a process that is pathologist-dependent, subjective, and time-consuming. To improve the accuracy of clinical interpretation, scientists have proposed computational approaches for the automatic analysis of mutations since the advent of next-generation sequencing technologies. Nevertheless, challenges such as multiple classifications, the complexity of texts, redundant descriptions, and inconsistent interpretation have limited the development of algorithms. To overcome these difficulties, we adapted a deep learning method, Bidirectional Encoder Representations from Transformers (BERT), to classify genetic mutations based on text evidence from an annotated database. During training, three challenging features of the data were addressed: the extreme length of texts, biased data presentation, and high repeatability. BERT+abstract achieves satisfactory results, with a logarithmic loss of 0.80, a recall of 0.6837, and an F-measure of 0.705. These results indicate that it is feasible for BERT to classify genomic mutation text within literature-based datasets. Consequently, BERT is a practical tool for facilitating and significantly speeding up cancer research on tumor progression, diagnosis, and the design of more precise and effective treatments.
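The abstract reports three evaluation metrics: logarithmic loss, recall, and F-measure. As a minimal sketch of how such metrics are commonly computed for a multi-class text classifier, the Python below uses only the standard library. The function names `log_loss` and `macro_recall_f1` are hypothetical, and macro averaging over classes is an assumption, since the abstract does not state which averaging scheme was used.

```python
import math
from typing import Sequence, Tuple


def log_loss(y_true: Sequence[int],
             probs: Sequence[Sequence[float]],
             eps: float = 1e-15) -> float:
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for label, row in zip(y_true, probs):
        # Clip to avoid log(0) when a class receives zero probability.
        p = min(max(row[label], eps), 1.0 - eps)
        total -= math.log(p)
    return total / len(y_true)


def macro_recall_f1(y_true: Sequence[int],
                    y_pred: Sequence[int],
                    n_classes: int) -> Tuple[float, float]:
    """Recall and F1 averaged over classes (macro averaging is an assumption)."""
    recalls, f1s = [], []
    for c in range(n_classes):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        recalls.append(recall)
        f1s.append(f1)
    return sum(recalls) / n_classes, sum(f1s) / n_classes
```

Logarithmic loss penalizes confident wrong predictions heavily, so it complements recall and F-measure, which depend only on the predicted label and not on the predicted probability.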

Original language	English (US)
Article number	5491963
Journal	BioMed Research International
Volume	2020
DOIs	https://doi.org/10.1155/2020/5491963
State	Published - 2020

ASJC Scopus subject areas

General Biochemistry, Genetics and Molecular Biology
General Immunology and Microbiology

Cite this

@article{65f67573c26b476092533ec660869b94,

title = "Application of BERT to Enable Gene Classification Based on Clinical Evidence",

abstract = "The identification of profiled cancer-related genes plays an essential role in cancer diagnosis and treatment. Based on literature research, the classification of genetic mutations continues to be done manually nowadays. Manual classification of genetic mutations is pathologist-dependent, subjective, and time-consuming. To improve the accuracy of clinical interpretation, scientists have proposed computational-based approaches for automatic analysis of mutations with the advent of next-generation sequencing technologies. Nevertheless, some challenges, such as multiple classifications, the complexity of texts, redundant descriptions, and inconsistent interpretation, have limited the development of algorithms. To overcome these difficulties, we have adapted a deep learning method named Bidirectional Encoder Representations from Transformers (BERT) to classify genetic mutations based on text evidence from an annotated database. During the training, three challenging features such as the extreme length of texts, biased data presentation, and high repeatability were addressed. Finally, the BERT+abstract demonstrates satisfactory results with 0.80 logarithmic loss, 0.6837 recall, and 0.705 F-measure. It is feasible for BERT to classify the genomic mutation text within literature-based datasets. Consequently, BERT is a practical tool for facilitating and significantly speeding up cancer research towards tumor progression, diagnosis, and the design of more precise and effective treatments.",

author = "Yuhan Su and Hongxin Xiang and Haotian Xie and Yong Yu and Shiyan Dong and Zhaogang Yang and Na Zhao",

note = "Publisher Copyright: {\textcopyright} 2020 Yuhan Su et al.",

year = "2020",

doi = "10.1155/2020/5491963",

language = "English (US)",

volume = "2020",

journal = "BioMed Research International",

issn = "2314-6133",

publisher = "Hindawi Publishing Corporation",

}

TY - JOUR

T1 - Application of BERT to Enable Gene Classification Based on Clinical Evidence

AU - Su, Yuhan

AU - Xiang, Hongxin

AU - Xie, Haotian

AU - Yu, Yong

AU - Dong, Shiyan

AU - Yang, Zhaogang

AU - Zhao, Na

PY - 2020

Y1 - 2020

N2 - The identification of profiled cancer-related genes plays an essential role in cancer diagnosis and treatment. Based on literature research, the classification of genetic mutations continues to be done manually nowadays. Manual classification of genetic mutations is pathologist-dependent, subjective, and time-consuming. To improve the accuracy of clinical interpretation, scientists have proposed computational-based approaches for automatic analysis of mutations with the advent of next-generation sequencing technologies. Nevertheless, some challenges, such as multiple classifications, the complexity of texts, redundant descriptions, and inconsistent interpretation, have limited the development of algorithms. To overcome these difficulties, we have adapted a deep learning method named Bidirectional Encoder Representations from Transformers (BERT) to classify genetic mutations based on text evidence from an annotated database. During the training, three challenging features such as the extreme length of texts, biased data presentation, and high repeatability were addressed. Finally, the BERT+abstract demonstrates satisfactory results with 0.80 logarithmic loss, 0.6837 recall, and 0.705 F-measure. It is feasible for BERT to classify the genomic mutation text within literature-based datasets. Consequently, BERT is a practical tool for facilitating and significantly speeding up cancer research towards tumor progression, diagnosis, and the design of more precise and effective treatments.

AB - The identification of profiled cancer-related genes plays an essential role in cancer diagnosis and treatment. Based on literature research, the classification of genetic mutations continues to be done manually nowadays. Manual classification of genetic mutations is pathologist-dependent, subjective, and time-consuming. To improve the accuracy of clinical interpretation, scientists have proposed computational-based approaches for automatic analysis of mutations with the advent of next-generation sequencing technologies. Nevertheless, some challenges, such as multiple classifications, the complexity of texts, redundant descriptions, and inconsistent interpretation, have limited the development of algorithms. To overcome these difficulties, we have adapted a deep learning method named Bidirectional Encoder Representations from Transformers (BERT) to classify genetic mutations based on text evidence from an annotated database. During the training, three challenging features such as the extreme length of texts, biased data presentation, and high repeatability were addressed. Finally, the BERT+abstract demonstrates satisfactory results with 0.80 logarithmic loss, 0.6837 recall, and 0.705 F-measure. It is feasible for BERT to classify the genomic mutation text within literature-based datasets. Consequently, BERT is a practical tool for facilitating and significantly speeding up cancer research towards tumor progression, diagnosis, and the design of more precise and effective treatments.

UR - http://www.scopus.com/inward/record.url?scp=85094220538&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85094220538&partnerID=8YFLogxK

U2 - 10.1155/2020/5491963

DO - 10.1155/2020/5491963

M3 - Article

C2 - 33083472

AN - SCOPUS:85094220538

SN - 2314-6133

VL - 2020

JO - BioMed Research International

JF - BioMed Research International

M1 - 5491963

ER -
