Computer-aided detection. The effect of training databases on detection of subtle breast masses

Bin Zheng, Xingwei Wang, Dror Lederman, Jun Tan, David Gur

Research output: Contribution to journal › Article

23 Citations (Scopus)

Abstract

Rationale and Objectives: Lesion conspicuity is typically highly correlated with visual difficulty for lesion detection, and computer-aided detection (CAD) has been widely used as a "second reader" in mammography. Hence, increasing CAD sensitivity in detecting subtle cancers without increasing false-positive rates is important. The aim of this study was to investigate the effect of training database case selection on CAD performance in detecting low-conspicuity breast masses.

Materials and Methods: A full-field digital mammographic image database that included 525 cases depicting malignant masses was randomly partitioned into three subsets. A CAD scheme was applied to detect all initially suspected mass regions and compute region conspicuity. Training samples were iteratively selected from two of the subsets. Four types of training data sets were assembled: (1) one including all available true-positive mass regions in the two subsets ("all"), (2) one including 350 randomly selected mass regions ("diverse"), (3) one including 350 high-conspicuity mass regions ("easy"), and (4) one including 350 low-conspicuity mass regions ("difficult"). Each training data set also included the same number of randomly selected false-positive regions as true-positive regions. Two classifiers, an artificial neural network (ANN) and a k-nearest neighbor (KNN) algorithm, were trained using each of the four training data sets and tested on all suspected regions in the remaining subset. Using a threefold cross-validation method, the performance changes of the CAD schemes trained using each of the four training data sets were computed and compared.

Results: CAD initially detected 1025 true-positive mass regions depicted on 507 cases (97% case-based sensitivity) and 9569 false-positive regions (3.5 per image) in the entire database. Using the "all" training data set, CAD achieved the highest overall performance on the entire testing database. However, CAD detected the highest number of low-conspicuity masses when the "difficult" training data set was used for training. Results were consistent for both ANN-based and KNN-based classifiers in all tests. Compared to the use of the "all" training data set, the sensitivity of the schemes trained using the "difficult" data set decreased by 8.6% and 8.4% for the ANN and KNN algorithms, respectively, on the entire database, but the detection of low-conspicuity masses increased by 7.1% and 15.1% for the ANN and KNN algorithms, respectively, at a false-positive rate of 0.3 per image.

Conclusions: CAD performance depends on the size, diversity, and difficulty level of the training database. To increase CAD sensitivity in detecting subtle cancers, one should increase the fraction of difficult cases in the training database rather than simply increasing the training data set size.
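
The training-set assembly described in Materials and Methods can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `(region_id, conspicuity)` tuple layout, and the convention that a higher score means a more conspicuous (easier) mass are all hypothetical, and the abstract does not say how conspicuity is computed or how the 350 regions were ranked.

```python
import random


def build_training_sets(tp_regions, fp_regions, n_select=350, seed=0):
    """Assemble the four training sets named in the abstract.

    tp_regions: list of (region_id, conspicuity) tuples for true-positive masses.
    fp_regions: list of false-positive region ids.
    Assumes len(tp_regions) >= n_select and len(fp_regions) >= len(tp_regions).
    """
    rng = random.Random(seed)
    # Assumption: higher conspicuity score = easier to see.
    ranked = sorted(tp_regions, key=lambda region: region[1])
    positives = {
        "all": list(tp_regions),
        "diverse": rng.sample(tp_regions, n_select),
        "easy": ranked[-n_select:],      # most conspicuous regions
        "difficult": ranked[:n_select],  # least conspicuous regions
    }
    # Pair each set with an equal number of randomly drawn false positives.
    return {
        name: (tps, rng.sample(fp_regions, len(tps)))
        for name, tps in positives.items()
    }


def threefold_rotations(subsets):
    """Yield (training_pool, test_subset), holding out each subset once."""
    for held_out in range(len(subsets)):
        pool = [r for i, s in enumerate(subsets) if i != held_out for r in s]
        yield pool, subsets[held_out]


# Toy data only; these are not values from the study.
toy_tp = [(i, i / 10) for i in range(10)]
toy_fp = list(range(100, 130))
toy_sets = build_training_sets(toy_tp, toy_fp, n_select=4)
toy_rotations = list(threefold_rotations([[1, 2], [3], [4, 5]]))
```

In a full pipeline, each rotation's training pool would be passed to `build_training_sets`, and each of the four resulting sets would train a classifier that is then scored on the held-out subset.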

Original language: English (US)
Pages (from-to): 1401-1408
Number of pages: 8
Journal: Academic Radiology
Volume: 17
Issue number: 11
DOI: 10.1016/j.acra.2010.06.009
State: Published - Nov 1, 2010

Fingerprint

Breast
Databases
Datasets
Mammography
Neoplasms

Keywords

  • CAD
  • Computer-aided detection
  • FFDM
  • Full-field digital mammography
  • Image databases
  • Performance assessment

ASJC Scopus subject areas

  • Radiology Nuclear Medicine and imaging

Cite this

Computer-aided detection. The effect of training databases on detection of subtle breast masses. / Zheng, Bin; Wang, Xingwei; Lederman, Dror; Tan, Jun; Gur, David.

In: Academic Radiology, Vol. 17, No. 11, 01.11.2010, p. 1401-1408.

Zheng, Bin ; Wang, Xingwei ; Lederman, Dror ; Tan, Jun ; Gur, David. / Computer-aided detection. The effect of training databases on detection of subtle breast masses. In: Academic Radiology. 2010 ; Vol. 17, No. 11. pp. 1401-1408.
@article{808e1c1d18194d5eafc3fd804740f441,
title = "Computer-aided detection. The effect of training databases on detection of subtle breast masses",
abstract = "Rationale and Objectives: Lesion conspicuity is typically highly correlated with visual difficulty for lesion detection, and computer-aided detection (CAD) has been widely used as a {"}second reader{"} in mammography. Hence, increasing CAD sensitivity in detecting subtle cancers without increasing false-positive rates is important. The aim of this study was to investigate the effect of training database case selection on CAD performance in detecting low-conspicuity breast masses. Materials and Methods: A full-field digital mammographic image database that included 525 cases depicting malignant masses was randomly partitioned into three subsets. A CAD scheme was applied to detect all initially suspected mass regions and compute region conspicuity. Training samples were iteratively selected from two of the subsets. Four types of training data sets were assembled: (1) one including all available true-positive mass regions in the two subsets ({"}all{"}), (2) one including 350 randomly selected mass regions ({"}diverse{"}), (3) one including 350 high-conspicuity mass regions ({"}easy{"}), and (4) one including 350 low-conspicuity mass regions ({"}difficult{"}). Each training data set also included the same number of randomly selected false-positive regions as true-positive regions. Two classifiers, an artificial neural network (ANN) and a k-nearest neighbor (KNN) algorithm, were trained using each of the four training data sets and tested on all suspected regions in the remaining subset. Using a threefold cross-validation method, the performance changes of the CAD schemes trained using each of the four training data sets were computed and compared. Results: CAD initially detected 1025 true-positive mass regions depicted on 507 cases (97{\%} case-based sensitivity) and 9569 false-positive regions (3.5 per image) in the entire database. Using the {"}all{"} training data set, CAD achieved the highest overall performance on the entire testing database. However, CAD detected the highest number of low-conspicuity masses when the {"}difficult{"} training data set was used for training. Results were consistent for both ANN-based and KNN-based classifiers in all tests. Compared to the use of the {"}all{"} training data set, the sensitivity of the schemes trained using the {"}difficult{"} data set decreased by 8.6{\%} and 8.4{\%} for the ANN and KNN algorithms, respectively, on the entire database, but the detection of low-conspicuity masses increased by 7.1{\%} and 15.1{\%} for the ANN and KNN algorithms, respectively, at a false-positive rate of 0.3 per image. Conclusions: CAD performance depends on the size, diversity, and difficulty level of the training database. To increase CAD sensitivity in detecting subtle cancers, one should increase the fraction of difficult cases in the training database rather than simply increasing the training data set size.",
keywords = "CAD, Computer-aided detection, FFDM, Full-field digital mammography, Image databases, Performance assessment",
author = "Bin Zheng and Xingwei Wang and Dror Lederman and Jun Tan and David Gur",
year = "2010",
month = "11",
day = "1",
doi = "10.1016/j.acra.2010.06.009",
language = "English (US)",
volume = "17",
pages = "1401--1408",
journal = "Academic Radiology",
issn = "1076-6332",
publisher = "Elsevier USA",
number = "11",

}

TY - JOUR

T1 - Computer-aided detection. The effect of training databases on detection of subtle breast masses

AU - Zheng, Bin

AU - Wang, Xingwei

AU - Lederman, Dror

AU - Tan, Jun

AU - Gur, David

PY - 2010/11/1

Y1 - 2010/11/1

AB - Rationale and Objectives: Lesion conspicuity is typically highly correlated with visual difficulty for lesion detection, and computer-aided detection (CAD) has been widely used as a "second reader" in mammography. Hence, increasing CAD sensitivity in detecting subtle cancers without increasing false-positive rates is important. The aim of this study was to investigate the effect of training database case selection on CAD performance in detecting low-conspicuity breast masses. Materials and Methods: A full-field digital mammographic image database that included 525 cases depicting malignant masses was randomly partitioned into three subsets. A CAD scheme was applied to detect all initially suspected mass regions and compute region conspicuity. Training samples were iteratively selected from two of the subsets. Four types of training data sets were assembled: (1) one including all available true-positive mass regions in the two subsets ("all"), (2) one including 350 randomly selected mass regions ("diverse"), (3) one including 350 high-conspicuity mass regions ("easy"), and (4) one including 350 low-conspicuity mass regions ("difficult"). Each training data set also included the same number of randomly selected false-positive regions as true-positive regions. Two classifiers, an artificial neural network (ANN) and a k-nearest neighbor (KNN) algorithm, were trained using each of the four training data sets and tested on all suspected regions in the remaining subset. Using a threefold cross-validation method, the performance changes of the CAD schemes trained using each of the four training data sets were computed and compared. Results: CAD initially detected 1025 true-positive mass regions depicted on 507 cases (97% case-based sensitivity) and 9569 false-positive regions (3.5 per image) in the entire database. Using the "all" training data set, CAD achieved the highest overall performance on the entire testing database. However, CAD detected the highest number of low-conspicuity masses when the "difficult" training data set was used for training. Results were consistent for both ANN-based and KNN-based classifiers in all tests. Compared to the use of the "all" training data set, the sensitivity of the schemes trained using the "difficult" data set decreased by 8.6% and 8.4% for the ANN and KNN algorithms, respectively, on the entire database, but the detection of low-conspicuity masses increased by 7.1% and 15.1% for the ANN and KNN algorithms, respectively, at a false-positive rate of 0.3 per image. Conclusions: CAD performance depends on the size, diversity, and difficulty level of the training database. To increase CAD sensitivity in detecting subtle cancers, one should increase the fraction of difficult cases in the training database rather than simply increasing the training data set size.

KW - CAD

KW - Computer-aided detection

KW - FFDM

KW - Full-field digital mammography

KW - Image databases

KW - Performance assessment

UR - http://www.scopus.com/inward/record.url?scp=77957658513&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77957658513&partnerID=8YFLogxK

U2 - 10.1016/j.acra.2010.06.009

DO - 10.1016/j.acra.2010.06.009

M3 - Article

C2 - 20650667

AN - SCOPUS:77957658513

VL - 17

SP - 1401

EP - 1408

JO - Academic Radiology

JF - Academic Radiology

SN - 1076-6332

IS - 11

ER -