Authors: Bogdana Schmidt, Hriday P. Bhambhvani, Richard E. Fan, Christian Kunder, Chia Sui Kao, John P. Higgins, Mirabela Rusu, Geoffrey A. Sonn
Introduction: Prostate cancer treatment relies on accurate Gleason grading. Gleason scoring has high inter- and intra-rater variability, even among high-volume uropathologists. Deep learning on digital pathology has the potential to address these problems. We aimed to externally validate a deep learning algorithm’s performance on prostate cancer identification and Gleason grading.
Methods: DeepDx Prostate (DeepBio, Seoul, South Korea) is an automated Gleason scoring system that was trained using 1133 prostate core needle biopsy images and validated on 700. We performed external validation using 150 whole mount prostatectomy specimens from which 500 (1mm2) tiles were created and evaluated by 2 uropathologists and the DeepDx algorithm to establish Gleason grade, amount of cancer, and percentage of Gleason pattern 4 and 5 in the tiles. The reference standard was established by consensus of two experienced uropathologists with a third expert to evaluate discordant cases. We defined the main metric as the agreement with the reference standard, measured using quadratic Cohen’s kappa (?).
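For readers unfamiliar with the metric, the quadratic-weighted kappa can be sketched as follows. This is an illustrative implementation, not the authors' analysis code. It assumes each tile's grade is encoded as an integer (a hypothetical scheme: 0 = benign, 1 to 5 = GG1 to GG5) and penalizes disagreements by the squared distance between categories, so a GG1 vs GG5 disagreement costs far more than GG1 vs GG2.

```python
from typing import Sequence


def quadratic_weighted_kappa(a: Sequence[int], b: Sequence[int], n_classes: int) -> float:
    """Cohen's kappa with quadratic disagreement weights for ordinal labels 0..n_classes-1.

    Hypothetical helper for illustration; labels are assumed to be integers
    (e.g. 0 = benign, 1..5 = Grade Groups 1..5).
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("rating lists must be non-empty and of equal length")
    n = len(a)

    # Observed confusion matrix: counts of (rater A label, rater B label) pairs.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for x, y in zip(a, b):
        observed[x][y] += 1

    # Marginal totals for each rater, used to build the chance-expected matrix.
    row = [sum(r) for r in observed]
    col = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            # Quadratic weight: 0 on the diagonal, 1 for the most distant categories.
            w = (i - j) ** 2 / (n_classes - 1) ** 2
            expected = row[i] * col[j] / n
            weighted_observed += w * observed[i][j]
            weighted_expected += w * expected

    if weighted_expected == 0:
        raise ValueError("kappa is undefined when chance disagreement is zero")
    # kappa = 1 - (weighted observed disagreement / weighted chance disagreement)
    return 1.0 - weighted_observed / weighted_expected
```

For example, `quadratic_weighted_kappa(reference, algorithm, n_classes=6)` would compare reference and algorithm grades over the six assumed categories; a value of 1 indicates perfect agreement and 0 indicates chance-level agreement.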
Results: The DeepDx algorithm achieved overall high agreement with the reference standard (? 0.79, 95% CI 0.75 - 0.82). It performed well at clinical decision thresholds benign vs malignant, (? 0.927), and clinically low risk (benign, GG1, or GG2) versus high risk (GG 3-5) disease (? 0.858). In evaluating benign and GG1 vs GG 2-5, the algorithm had less agreement (? 0.771). Of 83 tiles classified as GG1 by uropathologists, the algorithm upgraded 53 (64%) to GG2, but median pattern 4 area was 3.9% (IQR 0.0009, 28.703) in those cases.
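The threshold analyses can be understood as collapsing each grade into a binary label before computing agreement. The sketch below is illustrative only and assumes the same hypothetical encoding (0 = benign, 1 to 5 = GG1 to GG5); the function names are not from the study. With two categories, quadratic weighting reduces to ordinary Cohen's kappa, so the unweighted form is used here.

```python
from typing import List, Sequence


def dichotomize(grades: Sequence[int], threshold: int) -> List[int]:
    """Map each grade to 1 if it is at or above `threshold`, else 0 (hypothetical helper)."""
    return [1 if g >= threshold else 0 for g in grades]


def cohen_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Unweighted Cohen's kappa: (p_observed - p_chance) / (1 - p_chance)."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("rating lists must be non-empty and of equal length")
    n = len(a)
    categories = set(a) | set(b)
    # Proportion of tiles on which both raters give the same label.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance from each rater's marginal label frequencies.
    p_e = sum((list(a).count(c) / n) * (list(b).count(c) / n) for c in categories)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Thresholds matching the comparisons in the abstract, under the assumed encoding:
#   1 -> benign vs malignant
#   2 -> benign/GG1 vs GG2-5
#   3 -> benign/GG1/GG2 vs GG3-5
THRESHOLDS = {"benign_vs_malignant": 1, "gg1_vs_gg2plus": 2, "low_vs_high_risk": 3}
```

A binary agreement would then be computed as `cohen_kappa(dichotomize(reference, t), dichotomize(algorithm, t))` for each threshold `t`, where `reference` and `algorithm` are placeholder lists of per-tile grades.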
Conclusions: In this external validation, we found that the DeepDx algorithm had high agreement with expert uropathologists in cancer identification and grading, despite being trained with a different patient population and using biopsy cores instead of prostatectomy specimens. The speed and accuracy of deep learning-based systems has broad applications from allowing clinicians to better counsel patients to facilitating research requiring detailed annotation of datasets not feasible by human pathologists.
Source of Funding: None