
AI and Tech in Dermatology

GENERAL MEDICINE APPLICATIONS

Dedicated AI Expert System vs Generative AI With Large Language Model for Clinical Diagnoses. JAMA Netw Open. 2025. PMID: 40440012.

Definitions

  • AI Expert System / DDSS (Diagnostic Decision Support System): Represented by DXplain, a rule-based expert system developed at Massachusetts General Hospital (MGH). It uses a structured knowledge base of disease profiles and clinical findings, requires manual input of coded findings from a controlled vocabulary, and outputs a ranked list of possible diagnoses along with an explanation of its reasoning.

  • Large Language Models (LLMs): Represented by ChatGPT-4 (LLM1) and Gemini 1.5 (LLM2). Trained on large datasets to generate human-like text.  Accept narrative case descriptions and generate differential diagnoses, but typically function as black boxes without explanations.

Study Design & Methods

  • Design: Diagnostic accuracy comparison study (Oct 2023 – Nov 2024).

  • Setting: Three academic medical centers; 36 unpublished, diagnostically challenging general medicine cases.

  • Inputs: The LLMs received narrative case text, with and without lab results. The DDSS received structured input of either all clinical findings or only those judged "relevant", each with and without lab data.

  • Outcome Measured: Whether the correct diagnosis was listed in the top 25 differential diagnoses, and its ranking (using quintile scoring).

  • Case Evaluation: Each case was reviewed by 3 blinded physicians to identify relevant findings. Separate blinded reviewers entered data into each system. All systems were “naïve” to the cases.
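
To make the outcome measure concrete, here is a minimal Python sketch of one way a top-25 list could be scored by quintile. The specific point values (5 for ranks 1–5 down to 1 for ranks 21–25, and 0 when the diagnosis is not listed) are an assumption for illustration; the summary above does not state the exact scale the authors used, and the function name is hypothetical.

```python
def quintile_score(rank, list_size=25, bands=5):
    """Score the position of the correct diagnosis in a ranked differential.

    ASSUMPTION: the point scale (5 = top quintile ... 1 = bottom quintile,
    0 = not listed) is illustrative, not taken from the paper.
    """
    # A missing or out-of-range rank means the diagnosis was not in the list.
    if rank is None or rank < 1 or rank > list_size:
        return 0
    band_width = list_size // bands          # 5 ranks per quintile
    band_index = (rank - 1) // band_width    # 0 for ranks 1-5, 4 for ranks 21-25
    return bands - band_index


if __name__ == "__main__":
    # Hypothetical ranks for three cases; None means "not listed".
    for rank in (3, 12, None):
        print(rank, quintile_score(rank))
```

Scoring by band rather than exact rank means a system is not penalized for listing the correct diagnosis 4th rather than 2nd, while still rewarding lists that put it near the top.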

Results Summary

  • Without Lab Results: The DDSS (all findings entered) listed the correct diagnosis in 56% of cases, compared with 42% for LLM1 and 39% for LLM2. The DDSS also tended to rank the correct diagnosis higher, but the differences were not statistically significant (P ≈ 0.08–0.09). The LLMs identified some correct diagnoses that the DDSS missed, suggesting complementary strengths.

  • With Lab Results: The correct diagnosis was listed in 72% of cases by the DDSS, 64% by LLM1, and 58% by LLM2.

SKIN CANCER SCREENING

Single Lesion Assessment

Deep Ensemble for Recognition of Malignancy in the UK

  • DERM (Deep Ensemble for Recognition of Malignancy) is an AI-based skin lesion analysis tool developed by Skin Analytics in the UK, designed to triage lesions referred under the National Health Service (NHS) "urgent suspected skin cancer pathway". It uses dermoscopic images captured via smartphone with a lens attachment and applies a fixed algorithm to classify lesions as benign, pre-cancerous, or malignant. In May 2025, the UK’s National Institute for Health and Care Excellence (NICE) issued early guidance allowing DERM to be used within NHS teledermatology services during a 3-year evidence-generation phase, provided safety protocols are followed, particularly for patients with darker skin, where data are sparse. The goal is to reduce dermatology service burden by safely discharging benign cases and prioritizing malignancies. Preliminary data suggest high sensitivity (~95–97%) for cancer detection but lower and variable specificity (42–73%). NICE emphasized the need for further validation, especially regarding performance across diverse skin types and the tool's impact on dermatology service capacity.
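
The practical meaning of that specificity range is easier to see with counts. The Python sketch below converts sensitivity and specificity into expected numbers of lesions for a hypothetical referral cohort. The cohort size (1,000) and cancer prevalence (5%) are assumptions chosen only for illustration, not figures from NICE or Skin Analytics, and the three-way benign / pre-cancerous / malignant output is simplified here to cancer versus not cancer.

```python
def triage_counts(n_lesions, prevalence, sensitivity, specificity):
    """Expected confusion-matrix counts for a binary triage tool.

    ASSUMPTION: n_lesions and prevalence are hypothetical inputs; only the
    sensitivity and specificity ranges come from the text above.
    """
    malignant = n_lesions * prevalence
    benign = n_lesions - malignant
    true_pos = malignant * sensitivity      # cancers correctly flagged
    false_neg = malignant - true_pos        # cancers missed
    true_neg = benign * specificity         # benign lesions that could be discharged
    false_pos = benign - true_neg           # benign lesions still sent to dermatology
    return {
        "true_pos": true_pos,
        "false_neg": false_neg,
        "true_neg": true_neg,
        "false_pos": false_pos,
    }


if __name__ == "__main__":
    # Compare the low and high ends of the reported specificity range.
    for spec in (0.42, 0.73):
        counts = triage_counts(1000, 0.05, 0.95, spec)
        print(spec, {k: round(v, 1) for k, v in counts.items()})
```

Under these assumed inputs, moving from the bottom to the top of the specificity range changes the number of benign lesions that could be discharged from roughly 400 to roughly 690 per 1,000 referrals, which is why the impact on service capacity remains an open question.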

High-Risk Melanoma Surveillance

Standard Dermatoscope Images vs an Autonomous Total Body Photography and Dermoscopic Imaging Device. JAMA Dermatol. 2025. PMID: 40202727.

  • Goal: The investigators wanted to know whether a fully autonomous “imaging booth” could deliver total-body photography (TBP) plus dermoscopy that is non-inferior to the current 2-step manual workflow (wide-field photos + hand-held contact dermoscopy) for high-risk melanoma surveillance. The primary endpoint was image acceptability; secondary endpoints were image-quality score, diagnostic agreement, and time to acquire all images.

  • Design: Prospective, two-center cohort (Barcelona & Figueres, Spain, March–Oct 2023). Every one of the 316 adults with atypical-mole syndrome was imaged both ways at the same visit, so each lesion served as its own control. A non-inferiority margin of 20 percentage points for acceptable images was prespecified and tested with the two one-sided tests (TOST) method.

  • Results: The autonomous TBP and dermoscopic device produced dermoscopic images with a mean (SD) quality score of 9.84 (0.72), compared with 9.44 (0.85) for manual digital dermoscopy, with no significant differences by body site or lesion type. Diagnostic classification agreement between the 2 methods was 91.60%, with most discrepancies related to small benign lesions. The mean (SD) imaging time for the autonomous device was 570 (169) seconds, compared with 606 (286) seconds for the manual method.
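
For readers unfamiliar with TOST, the Python sketch below shows the general shape of the test for a paired design with a 20-percentage-point margin. It uses a simple Wald standard error for the difference in paired proportions, which is an assumption; the summary above does not say which variance estimator the authors used. The function name and all counts in the example are hypothetical, since the acceptability counts are not reported here.

```python
from math import sqrt
from statistics import NormalDist


def paired_tost(n, only_auto_ok, only_manual_ok, margin=0.20, alpha=0.05):
    """Two one-sided tests on a paired difference in acceptability rates.

    n              : number of paired items (each imaged both ways)
    only_auto_ok   : pairs acceptable with the autonomous device only
    only_manual_ok : pairs acceptable with the manual method only
    ASSUMPTION: Wald standard error; the inputs are illustrative.
    """
    d = only_auto_ok - only_manual_ok
    diff = d / n  # autonomous minus manual acceptability
    se = sqrt(only_auto_ok + only_manual_ok - d * d / n) / n
    if se == 0:
        # Degenerate case: no sampling variability in the difference.
        p_lower = p_upper = 0.0 if abs(diff) < margin else 1.0
    else:
        norm = NormalDist()
        # H0: diff <= -margin (autonomous is worse by the margin or more)
        p_lower = 1 - norm.cdf((diff + margin) / se)
        # H0: diff >= +margin (autonomous is better by the margin or more)
        p_upper = norm.cdf((diff - margin) / se)
    return {
        "diff": diff,
        "p_lower": p_lower,
        "p_upper": p_upper,
        "non_inferior": p_lower < alpha,
        "equivalent": max(p_lower, p_upper) < alpha,
    }


if __name__ == "__main__":
    # Hypothetical discordant counts among 316 paired items.
    print(paired_tost(n=316, only_auto_ok=10, only_manual_ok=6))
```

Only the lower one-sided test bears on non-inferiority; rejecting both one-sided nulls supports equivalence within the margin.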

PHOTOGRAPHY

Digital Photography Guide for Dermatologists With Special Considerations for Diverse Populations. JAMA Dermatol. 2025. PMID: 40238107.


External Medicine

 Conceived 2016

DISCLAIMER: This website is a collection of primary literature and the opinions of the website creators on that literature.  It is not intended to be used for the practice of medicine or the delivery of medical care in the absence of other appropriate credentials (like a medical degree).  Discuss any information with your doctor before pursuing treatments mentioned on this site.  
