Deep Learning Models for Skin Cancer Detection Show Significant Performance Drop on Clinical Data
Researchers compared four deep learning architectures for classifying skin lesions from dermoscopic images and found that a two-stage cascade with adjustable sensitivity thresholds outperforms single-stage classification. ROC-AUC, which exceeded 0.95 on internal test sets, fell to 0.80-0.89 on independent clinical datasets from Russian medical centers, revealing a significant generalization gap. The findings highlight the importance of external clinical validation and recalibration before such systems are deployed in real-world medical practice.
A study published on arXiv compared four neural network architectures (ViT-B/16, Swin-S, ConvNeXt-S, EfficientNetV2-S) for classifying skin neoplasms from dermoscopic images under three classification schemes: binary (malignant/benign), single-stage four-class, and a two-stage cascade. All models started from ImageNet-pretrained weights and were trained on publicly available ISIC Archive data. The binary classification stage achieved ROC-AUC scores of 0.952-0.966 on internal test data, but performance degraded substantially on independent clinical datasets from Melanoscope AI and Sechenov University: AUC dropped to 0.797-0.893 and sensitivity fell to 0.53-0.67. The cascade approach with tunable sensitivity thresholds recovered malignant lesions that single-stage classification had misclassified as benign, particularly improving results for the Vision Transformer model. The authors emphasize that the persistent generalization gap between laboratory and clinical settings necessitates external validation and recalibration before real-world deployment.
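To make the idea of a cascade with a tunable sensitivity threshold concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes stage one outputs a malignancy probability and stage two picks a subclass within the chosen branch. The class names (`malignant_a`, `benign_a`, and so on), the function names, and all numbers are hypothetical placeholders, since the summary does not list the four classes or the thresholds used.

```python
import math
from typing import Dict, Sequence

# Hypothetical class names: the summary does not name the four classes.
MALIGNANT_CLASSES = ("malignant_a", "malignant_b")
BENIGN_CLASSES = ("benign_a", "benign_b")


def threshold_for_sensitivity(
    p_malignant: Sequence[float],
    labels: Sequence[int],
    target_sensitivity: float,
) -> float:
    """Pick the highest threshold that still flags at least the target
    fraction of malignant cases (label 1) in a validation set."""
    positives = sorted(
        (p for p, y in zip(p_malignant, labels) if y == 1), reverse=True
    )
    if not positives:
        raise ValueError("validation set contains no malignant cases")
    # Number of malignant cases that must score at or above the threshold.
    needed = max(1, math.ceil(target_sensitivity * len(positives)))
    return positives[needed - 1]


def single_stage_predict(class_probs: Dict[str, float]) -> str:
    """Baseline: plain argmax over all four classes."""
    return max(class_probs, key=class_probs.get)


def cascade_predict(
    p_malignant: float, class_probs: Dict[str, float], threshold: float
) -> str:
    """Stage 1 chooses the malignant or benign branch using the tunable
    threshold; stage 2 picks the most likely class within that branch."""
    branch = MALIGNANT_CLASSES if p_malignant >= threshold else BENIGN_CLASSES
    return max(branch, key=lambda name: class_probs[name])


if __name__ == "__main__":
    # Toy validation scores (made up for illustration only).
    val_scores = [0.9, 0.6, 0.35, 0.2, 0.8, 0.3, 0.1, 0.05]
    val_labels = [1, 1, 1, 1, 0, 0, 0, 0]
    thr = threshold_for_sensitivity(val_scores, val_labels, 0.75)

    # A borderline lesion: the four-class argmax says benign, but the
    # stage-1 malignancy score clears the lowered threshold.
    probs = {"malignant_a": 0.25, "malignant_b": 0.15,
             "benign_a": 0.45, "benign_b": 0.15}
    print(thr, single_stage_predict(probs), cascade_predict(0.4, probs, thr))
```

In this toy case, lowering the stage-one threshold routes the borderline lesion to the malignant branch, which mirrors the kind of recovery the summary describes. The cost is more benign lesions flagged as malignant. A threshold tuned on internal data may also not yield the same sensitivity on clinical data, which is why recalibration on external data matters.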
What's missing
The study does not discuss potential reasons for the substantial generalization gap between internal and clinical datasets (e.g., differences in image acquisition protocols, patient demographics, lesion prevalence, or imaging device characteristics). Additionally, the paper does not provide information about the size of the clinical validation datasets or statistical power analysis for the clinical evaluations.
What different sources said
- arXiv cs.AI
Cascade Classification of Dermoscopic Images of Skin Neoplasms with Controllable Sensitivity and External Clinical Validation