Performance of Multiple Pretrained BERT Models to Automate and Accelerate Data Annotation for Large Datasets

Ali S. Tejani; Yee S. Ng; Yin Xi; Julia R. Fielding; Travis G. Browning; Jesse C. Rayan

doi:10.1148/ryai.220007

Research output: Contribution to journal › Article › peer-review

11 Scopus citations

Abstract

Purpose: To develop and evaluate domain-specific and pretrained bidirectional encoder representations from transformers (BERT) models in a transfer learning task on varying training dataset sizes to annotate a larger overall dataset. Materials and Methods: The authors retrospectively reviewed 69 095 anonymized adult chest radiograph reports (reports dated April 2020–March 2021). From the overall cohort, 1004 reports were randomly selected and labeled for the presence or absence of each of the following devices: endotracheal tube (ETT), enterogastric tube (NGT, or Dobhoff tube), central venous catheter (CVC), and Swan-Ganz catheter (SGC). Pretrained transformer models (BERT, PubMedBERT, DistilBERT, RoBERTa, and DeBERTa) were trained, validated, and tested on 60%, 20%, and 20%, respectively, of these reports through fivefold cross-validation. Additional training involved varying dataset sizes with 5%, 10%, 15%, 20%, and 40% of the 1004 reports. The best-performing epochs were used to assess area under the receiver operating characteristic curve (AUC) and determine run time on the overall dataset. Results: The highest average AUCs from fivefold cross-validation were 0.996 for ETT (RoBERTa), 0.994 for NGT (RoBERTa), 0.991 for CVC (PubMedBERT), and 0.98 for SGC (PubMedBERT). DeBERTa demonstrated the highest AUC for each support device trained on 5% of the training set. PubMedBERT showed a higher AUC with a decreasing training set size compared with BERT. Training and validation time was shortest for DistilBERT at 3 minutes 39 seconds on the annotated cohort. Conclusion: Pretrained and domain-specific transformer models required small training datasets and short training times to create a highly accurate final model that expedites autonomous annotation of large datasets.
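
As a rough illustration of the evaluation protocol described in the abstract, the sketch below rotates five blocks of report indices so that each fold trains on three blocks (about 60%), validates on one, and tests on one. It also draws a reduced training subset sized as a fraction of all 1004 labeled reports and computes AUC from pairwise score comparisons. The function names, the block-rotation scheme, the seed, and the subsampling rule are assumptions made for illustration; the abstract does not say how the authors built their folds or which tooling they used.

```python
import random


def fivefold_splits(n_items, seed=0):
    """Yield (train, val, test) index lists using three, one, and one of five blocks."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Five roughly equal blocks; each block serves as the test set exactly once.
    blocks = [indices[i::5] for i in range(5)]
    for k in range(5):
        val_block = (k + 1) % 5
        test = blocks[k]
        val = blocks[val_block]
        train = [i for j in range(5) if j not in (k, val_block) for i in blocks[j]]
        yield train, val, test


def subsample(train, fraction_of_total, n_items, seed=0):
    """Draw a smaller training set sized as a fraction of all labeled reports (assumed rule)."""
    size = round(fraction_of_total * n_items)
    return random.Random(seed).sample(train, size)


def roc_auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    n_reports = 1004  # size of the labeled cohort reported in the abstract
    for fold, (train, val, test) in enumerate(fivefold_splits(n_reports)):
        small = subsample(train, 0.05, n_reports)
        print(fold, len(train), len(val), len(test), len(small))
```

In a real pipeline, each fold's model scores on the held-out test block would be passed to `roc_auc` once per device label (ETT, NGT, CVC, SGC), and the five results averaged.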

Original language	English (US)
Article number	e220007
Journal	Radiology: Artificial Intelligence
Volume	4
Issue number	4
DOIs	https://doi.org/10.1148/ryai.220007
State	Published - Jul 2022

ASJC Scopus subject areas

Radiological and Ultrasound Technology
Radiology, Nuclear Medicine and Imaging
Artificial Intelligence

Cite this

@article{2bf64e879b944b79b9151bc91e2d76cd,

title = "Performance of Multiple Pretrained BERT Models to Automate and Accelerate Data Annotation for Large Datasets",

abstract = "Purpose: To develop and evaluate domain-specific and pretrained bidirectional encoder representations from transformers (BERT) models in a transfer learning task on varying training dataset sizes to annotate a larger overall dataset. Materials and Methods: The authors retrospectively reviewed 69 095 anonymized adult chest radiograph reports (reports dated April 2020–March 2021). From the overall cohort, 1004 reports were randomly selected and labeled for the presence or absence of each of the following devices: endotracheal tube (ETT), enterogastric tube (NGT, or Dobhoff tube), central venous catheter (CVC), and Swan-Ganz catheter (SGC). Pretrained transformer models (BERT, PubMedBERT, DistilBERT, RoBERTa, and DeBERTa) were trained, validated, and tested on 60%, 20%, and 20%, respectively, of these reports through fivefold cross-validation. Additional training involved varying dataset sizes with 5%, 10%, 15%, 20%, and 40% of the 1004 reports. The best-performing epochs were used to assess area under the receiver operating characteristic curve (AUC) and determine run time on the overall dataset. Results: The highest average AUCs from fivefold cross-validation were 0.996 for ETT (RoBERTa), 0.994 for NGT (RoBERTa), 0.991 for CVC (PubMedBERT), and 0.98 for SGC (PubMedBERT). DeBERTa demonstrated the highest AUC for each support device trained on 5% of the training set. PubMedBERT showed a higher AUC with a decreasing training set size compared with BERT. Training and validation time was shortest for DistilBERT at 3 minutes 39 seconds on the annotated cohort. Conclusion: Pretrained and domain-specific transformer models required small training datasets and short training times to create a highly accurate final model that expedites autonomous annotation of large datasets.",

author = "Tejani, {Ali S.} and Ng, {Yee S.} and Xi, Yin and Fielding, {Julia R.} and Browning, {Travis G.} and Rayan, {Jesse C.}",

note = "Publisher Copyright: {\textcopyright} RSNA, 2022.",

year = "2022",

month = jul,

doi = "10.1148/ryai.220007",

language = "English (US)",

volume = "4",

journal = "Radiology: Artificial Intelligence",

issn = "2638-6100",

publisher = "Radiological Society of North America Inc.",

number = "4",

}

TY - JOUR

T1 - Performance of Multiple Pretrained BERT Models to Automate and Accelerate Data Annotation for Large Datasets

AU - Tejani, Ali S.

AU - Ng, Yee S.

AU - Xi, Yin

AU - Fielding, Julia R.

AU - Browning, Travis G.

AU - Rayan, Jesse C.

N1 - Publisher Copyright: © RSNA, 2022.

PY - 2022/7

Y1 - 2022/7

AB - Purpose: To develop and evaluate domain-specific and pretrained bidirectional encoder representations from transformers (BERT) models in a transfer learning task on varying training dataset sizes to annotate a larger overall dataset. Materials and Methods: The authors retrospectively reviewed 69 095 anonymized adult chest radiograph reports (reports dated April 2020–March 2021). From the overall cohort, 1004 reports were randomly selected and labeled for the presence or absence of each of the following devices: endotracheal tube (ETT), enterogastric tube (NGT, or Dobhoff tube), central venous catheter (CVC), and Swan-Ganz catheter (SGC). Pretrained transformer models (BERT, PubMedBERT, DistilBERT, RoBERTa, and DeBERTa) were trained, validated, and tested on 60%, 20%, and 20%, respectively, of these reports through fivefold cross-validation. Additional training involved varying dataset sizes with 5%, 10%, 15%, 20%, and 40% of the 1004 reports. The best-performing epochs were used to assess area under the receiver operating characteristic curve (AUC) and determine run time on the overall dataset. Results: The highest average AUCs from fivefold cross-validation were 0.996 for ETT (RoBERTa), 0.994 for NGT (RoBERTa), 0.991 for CVC (PubMedBERT), and 0.98 for SGC (PubMedBERT). DeBERTa demonstrated the highest AUC for each support device trained on 5% of the training set. PubMedBERT showed a higher AUC with a decreasing training set size compared with BERT. Training and validation time was shortest for DistilBERT at 3 minutes 39 seconds on the annotated cohort. Conclusion: Pretrained and domain-specific transformer models required small training datasets and short training times to create a highly accurate final model that expedites autonomous annotation of large datasets.

UR - http://www.scopus.com/inward/record.url?scp=85134995842&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85134995842&partnerID=8YFLogxK

U2 - 10.1148/ryai.220007

DO - 10.1148/ryai.220007

M3 - Article

C2 - 35923377

AN - SCOPUS:85134995842

SN - 2638-6100

VL - 4

JO - Radiology: Artificial Intelligence

JF - Radiology: Artificial Intelligence

IS - 4

M1 - e220007

ER -
