Journal of Medical Imaging, Vol. 9, Issue 03, 034502, (June 2022) https://doi.org/10.1117/1.JMI.9.3.034502
TOPICS: Breast, Magnetic resonance imaging, Artificial intelligence, Diagnostics, Drug discovery, Data modeling, Performance modeling, Breast cancer, Tumor growth modeling, Feature selection
Purpose: We demonstrate continuous learning and assess its impact on the performance of artificial intelligence (AI) applied to breast dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) in the task of distinguishing malignant from benign lesions on an independent clinical test dataset.
Approach: The study included 1979 patients with 1990 lesions who underwent breast MR imaging during 2015, 2016, and 2017, retrospectively collected under an IRB-approved protocol; based on histopathology, 1494 lesions were malignant and 496 were benign. AI classifiers were trained to distinguish malignant from benign lesions, and independent testing was performed to assess the effect of increasing the number of training cases. Five training sets mimicking clinical implementation of continuous AI learning included cases from (1) the first quarter of 2015, (2) the first half of 2015, (3) all of 2015, (4) all of 2015 and the first half of 2016, and (5) all of 2015 and 2016. All classifiers were evaluated on the independent 2017 test set. The area under the receiver operating characteristic (ROC) curve (AUC) served as the performance metric and was calculated over all lesions in the test set, as well as over mass lesions only and non-mass enhancements only. The Mann–Kendall test was used to determine whether continuous learning resulted in a positive trend in classification performance. P < 0.05 was considered statistically significant.
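The cumulative training design described above can be sketched as a date-based split. This is a minimal illustration, not the study's code: the function name build_splits, the (case_id, exam_date) record layout, and the exact cutoff dates for each quarter and half-year are assumptions chosen to match the five conditions listed.

```python
from datetime import date

# Assumed cutoffs: each training set holds every case from 2015-01-01
# up to and including the cutoff, so later sets contain earlier ones.
TRAIN_START = date(2015, 1, 1)
TRAIN_CUTOFFS = [
    ("2015 Q1", date(2015, 3, 31)),
    ("2015 H1", date(2015, 6, 30)),
    ("2015", date(2015, 12, 31)),
    ("2015 + 2016 H1", date(2016, 6, 30)),
    ("2015 + 2016", date(2016, 12, 31)),
]


def build_splits(cases):
    """Split (case_id, exam_date) records into five cumulative
    training sets and one held-out 2017 test set (hypothetical helper)."""
    train_sets = {
        label: [cid for cid, d in cases if TRAIN_START <= d <= cutoff]
        for label, cutoff in TRAIN_CUTOFFS
    }
    test_set = [cid for cid, d in cases if d.year == 2017]
    return train_sets, test_set


# Toy records, invented purely to exercise the split.
example_cases = [
    ("a", date(2015, 2, 10)),
    ("b", date(2015, 5, 1)),
    ("c", date(2015, 11, 20)),
    ("d", date(2016, 4, 3)),
    ("e", date(2016, 9, 15)),
    ("f", date(2017, 1, 8)),
]
example_train, example_test = build_splits(example_cases)
```

Because every training set ends before 2017, the test set stays independent of all five training conditions, and each condition only adds cases to the previous one.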
Results: Over the continuous training period, the selected feature subsets tended to become more similar and stable. On the independent test dataset, the five training conditions yielded AUCs of 0.86 (95% CI: [0.83,0.90]), 0.87 (95% CI: [0.83,0.90]), 0.88 (95% CI: [0.84,0.91]), 0.89 (95% CI: [0.85,0.92]), and 0.89 (95% CI: [0.86,0.92]), respectively. The Mann–Kendall test indicated a statistically significant positive trend (P = 0.0167) in classification performance with continuous learning.
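As an illustration of the trend test, the Mann-Kendall statistic S sums the signs of all pairwise differences in time order, and for a series this short an exact P value can be obtained by enumerating all orderings. The sketch below is an assumption-laden illustration, not the authors' implementation: the abstract does not say whether the test was exact or one- or two-sided, the function names are hypothetical, and the unrounded AUCs are not given (the rounded values contain a tie). For five strictly increasing values, the exact two-sided P is 2/120 ≈ 0.0167, which is consistent with the reported value.

```python
from itertools import combinations, permutations


def mann_kendall_s(values):
    """Mann-Kendall S: sum of sign(x_j - x_i) over all pairs i < j."""
    s = 0
    for i, j in combinations(range(len(values)), 2):
        diff = values[j] - values[i]
        s += (diff > 0) - (diff < 0)
    return s


def exact_two_sided_p(values):
    """Exact two-sided P: fraction of all orderings whose |S| is at
    least as large as the observed |S| (feasible only for short series)."""
    s_obs = abs(mann_kendall_s(values))
    orderings = list(permutations(values))
    extreme = sum(1 for p in orderings if abs(mann_kendall_s(p)) >= s_obs)
    return extreme / len(orderings)


# Hypothetical strictly increasing series standing in for five
# unrounded AUCs; only the ordering matters to the test.
increasing = [1, 2, 3, 4, 5]
s_increasing = mann_kendall_s(increasing)     # all 10 pairs concordant
p_increasing = exact_two_sided_p(increasing)  # 2 of 120 orderings

# AUCs as rounded in the abstract; the last two values tie.
rounded_aucs = [0.86, 0.87, 0.88, 0.89, 0.89]
s_rounded = mann_kendall_s(rounded_aucs)
```

With only five time points, a strictly monotonic increase is the smallest sample at which an exact two-sided test can fall below 0.05, so the conclusion depends on every added training increment improving the AUC.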
Conclusions: With continuous learning of the AI, diagnostic performance on an independent clinical test dataset improved over time.