Artificial intelligence in dermatology: advancements and challenges in skin of color
Conflict of interest: None.
Funding source: None.
Abstract
Artificial intelligence (AI) uses algorithms and computational models, including large language models, to simulate human-like problem-solving and decision-making. AI programs have recently gained widespread popularity in dermatology through online tools for the assessment, diagnosis, and treatment of skin conditions. A literature review was conducted using PubMed and Google Scholar, analyzing recent literature (from the last 10 years through October 2023) to evaluate current AI programs in use for dermatologic purposes, identify challenges in this technology when applied to skin of color (SOC), and propose future steps to enhance the role of AI in dermatologic practice. Challenges surrounding AI and its application to SOC stem from the underrepresentation of SOC in datasets and from issues with image quality and standardization. Given these issues, current AI programs perform worse at identifying lesions in SOC. Additionally, only 30% of the programs identified in this review had data reported on their use in dermatology, specifically in SOC. Substantial development of these applications, including accurate representation of darker skin tones in training datasets, is required. More research is warranted to better understand the efficacy of AI in aiding diagnosis and treatment options for SOC patients.
Introduction
Artificial intelligence (AI) refers to the development of computer systems, including large language models, that perform tasks requiring human intelligence.1 AI uses algorithms and models not only to learn from data but also to recognize patterns and, potentially, to make decisions.2 AI has emerged as a powerful tool in dermatology because of its potential to efficiently recognize patterns and features of dermatologic conditions, such as aiding in the early detection of skin cancer.3 AI has also been used to personalize treatment recommendations based on patient data, including medical history, symptoms, and treatment outcomes.3 Given these advancements, AI is a valuable tool that can assist dermatologists by enhancing diagnostic accuracy and clinical decision-making, ultimately improving patient outcomes.
Despite these advancements, AI in skin of color (SOC) is limited by the data it is trained on. AI programs frequently use the Fitzpatrick Skin Phototype (FST I-VI) to classify skin tones, relying on images labeled with FST I-VI to represent various skin types.4 However, the FST is inherently flawed for classification in SOC, as the scale was initially created to determine the risk of burning during phototherapy. Another, less common tool used to classify skin tones is the Monk scale, which accounts for skin color and ethnic background to classify individuals into 10 skin type categories, allowing for more inclusive categorization.5, 6 Skin type is an important factor because some skin tones carry greater risk for certain dermatologic disorders; for example, darker skin tones are associated with an increased risk of hyperpigmentation and keloid formation.7 In conjunction with these dataset drawbacks, the underrepresentation of individuals with SOC in dermatologic literature and research is a significant concern that can contribute to gaps in AI algorithms.7, 8 There has been a historical lack of research focused on populations with SOC, resulting in limited data and studies addressing their unique clinical presentations.8 There has also been limited emphasis on dermatologic conditions specific to these populations in medical education, which limits awareness and the evidence base needed to guide future dermatologic education.9-13 Addressing the challenges of AI when applied to SOC is crucial for achieving equitable and inclusive outcomes in dermatologic care. Failure to account for diverse skin tones can lead to biased algorithms and inaccurate results, disproportionately affecting individuals with darker skin. Recognizing and rectifying these challenges is essential to ensure fair representation, eliminate discrimination, and foster the development of AI systems that work effectively for people of all skin colors.
Currently, there is a lack of representation of SOC in the datasets used by AI algorithms.14, 15 When used appropriately, AI can be particularly powerful in addressing the specific challenges of individuals with SOC. These tools should be trained on diverse datasets that ensure representation of these populations, thereby improving diagnostic accuracy.2 AI can also help overcome biases by providing objective insights from diverse patient data, which can enhance the detection and management of conditions that present uniquely in SOC.2 The objective of this review is to identify and address gaps in current AI usage in dermatology, specifically regarding individuals with SOC, and to propose future steps to enhance its application in dermatologic practice.
Methods
A literature review was conducted using the PubMed and Google Scholar databases to evaluate the current uses of AI in dermatologic conditions among patients with SOC. Searches were limited to publications from February 2002 through June 2023. Primary search terms included artificial intelligence and AI, with additional search terms including dermatology, skin of color, pigmentation, representation of skin, racial inequities, public health, dermatologic screening, skin cancer, and melanoma. Inclusion criteria were articles written in English and various study types, including systematic reviews, clinical trials, retrospective single-center studies, and case reports. Exclusion criteria consisted of studies not in English. All resulting studies were assessed for relevance.
Results
Discussion
Challenges of AI in Dermatology for SOC
Table 1 AI programs and technologies with reported data on SOC image analysis

| Author | Year | Apps and AI technology | SOC representation and reported performance |
|---|---|---|---|
| Chen et al.16 | 2016 | Lubax | Lesion image database: Black or African American: 4.3% |
| Kamulegeya et al.17 | 2019 | SIS from First Derm | Overall diagnostic accuracy of 17% in FST VI |
| MacLellan et al.18 | 2019 | FotoFinder, MelaFind, Verisante Aura | Sensitivity, specificity: FotoFinder 88.1%, 78.8%; dermatologist 96.6%, 32.2%; MelaFind 82.5%, 52.4%; teledermatology 84.5%, 82.6%; Verisante Aura 21.4%, 86.2%. Individuals with FST > III were excluded due to limitations in patients with higher phototypes |
| Phillips et al.19 | 2019 | Clinicians and the Deep Ensemble for Recognition of Malignancy | Participant characteristics: FST IV 12.6%, FST V 2.0%, FST VI 0.8% of participants |
| Liu et al.20 | 2020 | Derm Assist from Google Health | Development set: FST IV 31.3%; FST V 3.2%; FST VI 0.3% |
| Patil et al.21 | 2020 | Tibot | Prediction accuracy: alopecia 100%; benign tumors 71.4%; fungal infections 95.6%; infestation 59.3%; eczema 91.7%; viral infection 26.7%. No skin types were recorded, although the researchers report skin types between FST IV and VI |
| Alvarado et al.22 | 2021 | DermExpert from VisualDx | 28.5% of images were FST IV, V, or VI |
| Daneshjou et al.15 | 2021 | ModelDerm, DeepDerm, HAM 10000 | ROC-AUC was significantly lower in FST V-VI than in FST I-II: ModelDerm 0.55 vs. 0.64; DeepDerm 0.50 vs. 0.61; HAM 10000 0.57 vs. 0.72. ROC-AUC was lower on the DDI dataset than on the original dataset for all three models |
| Guido et al.23 | 2021 | SkinIO | FST IV: 16% of participants; FST V: 0%; FST VI: 0% |
| DeGrave et al.24 | 2023 | Scanoma, Smart Skin Cancer Detection | Neither model achieved satisfactory performance on a diverse dataset; the authors emphasized further inclusion of darker skin tones in dermatologic AI algorithms |
- This is a nonexhaustive list and is subject to change. DDI Dataset, Diverse Dermatology Images Dataset; FST, Fitzpatrick Skin Type; ROC-AUC, area under the receiver operating characteristic curve; SIS, Skin Image Search; SOC, skin of color.
| Table 2 AI programs with gaps in SOC image analysis | | | | |
|---|---|---|---|---|
| AIDiagnose | Derma analytics – analyze for skin cancer | DermIA | Mole Detective | Piel |
| ApreSkin | Derma pic | EczemaLess | MoleAgnose | Rash ID |
| Aysa | Deep learning for melanoma | Firstcheck skin | Molemapper | Skin-Check: Dermatology App |
| CompariSkin | Dermadetect | Health AI | MoleScope | SkinScan |
| Curifai | DermAI | Helfie | MoleScreener | SkinVision |
| Cutis.ai | DermEngine | Medgic | MoleWatcher | Skinly |
| Deep learning for melanoma | DermatologistOnCall – online Dermatology care | MelApp | Molexplore | TroveSkin |
- This is a nonexhaustive list and is subject to change.
Monk vs. Fitzpatrick Skin Scale
There are currently numerous challenges that hinder the effectiveness of AI in its application to SOC. One of the most prominent is the widespread use of the FST for data annotation and measurement of skin color. Research suggests that the FST may be problematic because it is an unreliable indicator of skin pigmentation.25 The FST was originally developed to determine how skin reacts to ultraviolet (UV) phototherapy among White patients being treated for psoriasis.26, 27 Since then, the FST has been widely used to classify different skin colors, although it was not designed to capture a diverse representation of skin tones. Studies have shown that the FST fails to encompass a significant portion of the Black community, resulting in data that overstate the prevalence of type IV skin among Black individuals and may lead to inadequate cancer risk assessment.28, 29 Further exemplifying the limitations of the FST in SOC, Goon et al. determined the FST to be neither correct nor sufficient for SOC, suggested that it may provide a false sense of confidence to both patients and physicians regarding risk, and even proposed photoanalysis and AI as a feasible mechanism to visualize and categorize SOC, albeit with properly trained models.30
More recently, the Monk scale was designed by Dr. Ellis Monk, a Harvard professor of sociology, to better capture the variation in human skin pigmentation.6 In contrast to the FST, the Monk scale categorizes skin tones across a broader spectrum, with the goal of accommodating diverse populations. Google has partnered with Dr. Monk to use his skin tone scale to enhance the representation of diverse images across Google products, including Google image searches.31 Additionally, the developers of the Casual Conversations dataset from Facebook/Meta have suggested that the Monk scale be used in conjunction with the FST for skin tone annotation in AI algorithms. The Casual Conversations dataset was created with the aim of reinforcing the fairness and diversity of AI systems while ensuring they remain unbiased.32
Utilizing more inclusive scales, such as the Monk scale, can contribute to better categorization of skin tones across a broader spectrum. Additionally, collaborations across industries to incorporate diverse skin scales into their products represent a step toward more comprehensive representation.
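To make the dual-annotation idea concrete, the following is a minimal sketch, assuming a simple per-image record that stores both an FST value (I-VI, recorded here as 1-6) and a Monk scale value (1-10); the field names and structure are illustrative and not drawn from any published annotation tool.

```python
from dataclasses import dataclass


@dataclass
class SkinToneAnnotation:
    """Hypothetical per-image annotation carrying both skin tone scales."""
    image_id: str
    fitzpatrick: int  # FST phototype, 1 (I) through 6 (VI)
    monk: int         # Monk scale value, 1 (lightest) through 10 (darkest)

    def __post_init__(self) -> None:
        # Validate that each label falls within its scale's defined range.
        if not 1 <= self.fitzpatrick <= 6:
            raise ValueError(f"FST must be 1-6, got {self.fitzpatrick}")
        if not 1 <= self.monk <= 10:
            raise ValueError(f"Monk value must be 1-10, got {self.monk}")


# Example: one image annotated with both scales, as suggested for AI datasets.
example = SkinToneAnnotation(image_id="img_0001", fitzpatrick=5, monk=8)
```

Storing both labels side by side would allow datasets annotated under the older FST convention to be gradually re-annotated with the broader Monk categories without discarding existing labels.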
Underrepresentation in Datasets
Currently, SOC images are largely underrepresented in AI training datasets, with programs historically trained on datasets dominated by lighter skin tones.14, 15 Teaching AI programs to recognize skin lesions across various skin tones is crucial because dermatologic conditions may present differently in patients with different skin tones; algorithms trained without this diversity may therefore be biased. In a recent article by Diao et al.,33 responses to the New England Journal of Medicine (NEJM) image challenge were analyzed; there were significantly fewer questions on dark skin (P < 0.001), suggesting lower image availability. Some researchers have proposed training AI models on "darkened" images in datasets to simulate SOC patients. The Fast Contrastive Unpaired Translation (FastCUT) technique was employed to "darken" the skin color in images without altering the dermatologic lesion. The study showed that training an AI model on artificially "darkened" images, compared with training on "lighter-skinned" images, yielded significantly better differentiation between basal cell carcinoma and melanoma in patients with brown skin tones.34 It is important to note, however, that "darkening" skin images may not be the most effective way to teach AI technologies to recognize SOC lesions; it remains crucial to incorporate more genuine images of SOC lesions into the datasets themselves.
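One practical way to surface this kind of underrepresentation before model training is a simple dataset audit. The sketch below, which assumes a hypothetical labels CSV with an `fst` column holding values I-VI, tallies the share of images per Fitzpatrick group and flags groups below an arbitrary illustrative threshold.

```python
import csv
from collections import Counter


def fst_distribution(labels_csv: str) -> dict:
    """Return the fraction of images per Fitzpatrick group in a labels file.

    Assumes a CSV with one row per image and an 'fst' column (I-VI).
    """
    counts = Counter()
    with open(labels_csv, newline="") as f:
        for row in csv.DictReader(f):
            counts[row["fst"].strip().upper()] += 1
    total = sum(counts.values())
    return {fst: n / total for fst, n in sorted(counts.items())}


if __name__ == "__main__":
    # Hypothetical file name; flag groups below a chosen 10% representation cutoff.
    shares = fst_distribution("labels.csv")
    for fst, share in shares.items():
        flag = "  <-- underrepresented" if share < 0.10 else ""
        print(f"FST {fst}: {share:.1%}{flag}")
```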
Image Quality
As an image-oriented field, dermatology relies on precise photographic techniques to best capture lesions of the hair, skin, and nails.35 Medical photography is an accurate tool for documenting physical examination findings in patient records and AI databases. One study found that publicly available skin image datasets that rely on medical photography have limitations in their applicability to real-life clinical settings and in their representation of diverse populations, particularly those with darker skin tones.36 These limitations stem from factors such as lighting, exposure, focus, framing, background, aperture, and shutter speed. Consistent lighting is essential, and the flash on many smartphones and tablets may not provide it reliably. In contrast, the flash on digital single-lens reflex (DSLR) cameras can dominate the ambient light in a room because of the precisely timed camera shutter, effectively compensating for poor lighting conditions in office settings.37, 38 As smartphones and tablets have become more widespread, dermatology practices are increasingly using these devices to photograph patients' dermatologic lesions. To our knowledge, no current research compares smartphones or tablets with professional cameras in terms of image quality. It is clear, however, that the key factor in achieving high-quality images is consistency of image capture. Establishing guidelines for consistent image capture, regardless of the device used, is essential to improve the applicability of AI systems in real-life clinical settings and to ensure accurate representation of dermatologic conditions in SOC.
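As an illustration of what a capture-consistency guideline could automate, the sketch below flags images that are markedly under- or overexposed or out of focus before they enter a dataset. It assumes OpenCV and NumPy are available, and the thresholds are arbitrary illustrative values rather than validated standards.

```python
import cv2
import numpy as np


def quality_flags(path: str,
                  dark_thresh: float = 40.0,
                  bright_thresh: float = 215.0,
                  blur_thresh: float = 100.0) -> list:
    """Return a list of quality problems for an image (empty list = passes)."""
    image = cv2.imread(path)
    if image is None:
        return ["unreadable file"]
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    flags = []
    mean_brightness = float(np.mean(gray))
    if mean_brightness < dark_thresh:
        flags.append("underexposed")
    elif mean_brightness > bright_thresh:
        flags.append("overexposed")

    # Variance of the Laplacian is a common focus measure: low variance = blurry.
    if cv2.Laplacian(gray, cv2.CV_64F).var() < blur_thresh:
        flags.append("possibly out of focus")
    return flags
```

In practice, such checks would run at the point of capture so that a clinician could retake the photograph immediately rather than discover the problem after upload.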
Image Standardization
AI technology currently lacks the ability to accurately discern artifacts and distortions in images.39 Studies have found that image rotation, blurry photographs, the patient's body position, the type of camera used, and brightness manipulation greatly affect the quality of the images from which AI algorithms learn.40, 41 The performance of AI in recognizing images degrades when different types of cameras are used.42 This quality decreases further when images of skin lesions are taken and submitted by patients themselves. Capturing images of darker skin tones also differs from photographing lighter skin tones, as SOC is prone to overexposure and unwanted flash reflections.43, 44 While there are techniques to avoid overexposure,22 to our knowledge there are no studies assessing dermatologists' proficiency in photographing SOC using these techniques. This is important to assess, as practitioners may be inexperienced in photographing patients with SOC, resulting in improperly exposed images.
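Along the same lines, a rough proxy for flash reflections is the fraction of near-white ("blown-out") pixels in an image. The sketch below computes that fraction; the saturation level and the 2% cutoff are assumed illustrative values, not validated standards.

```python
import cv2
import numpy as np


def saturated_fraction(path: str, level: int = 250) -> float:
    """Fraction of pixels at or above `level` in every channel (near-white)."""
    image = cv2.imread(path)
    if image is None:
        raise ValueError(f"Could not read {path}")
    saturated = np.all(image >= level, axis=2)  # True where B, G, and R are all near max
    return float(saturated.mean())


# Example: flag likely flash reflection before accepting a (hypothetical) image.
if saturated_fraction("lesion.jpg") > 0.02:
    print("Warning: possible flash reflection or overexposure; consider retaking the photo.")
```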
Additionally, other extraneous markings on images, such as rulers, ink margins, dark corners, gel bubbles, and hairs, hinder AI's ability to interpret images correctly.45 One study noted that images of melanoma frequently included rulers to measure lesion size; as a result, the AI algorithm unintentionally developed a higher tendency to classify images containing rulers as cancerous.46 Another study found that surgical skin markings interfered with the algorithm's ability to distinguish benign lesions from melanoma.47 Tattoos and permanent makeup may also affect AI's ability to properly identify skin lesions.48 Efforts should be directed toward educating practitioners on the nuances of photographing skin for medical purposes, especially darker skin tones, to mitigate issues such as unwanted flash reflections.
Image Metadata Accuracy
After an image is uploaded to an AI database, it is processed along with its description and diagnosis. A study found that dermatologists often label SOC images incorrectly compared with images of skin lesions in lighter skin tones.15 The accuracy of skin type classification varied significantly between FST I-II and FST V-VI, although using this classification as a proxy for skin tone was never the true intent of the FST. To aid the preprocessing of images for AI databases, researchers developed a standardized checklist, CLEAR Derm, that may be used to mitigate biases in image processing.49 The checklist comprehensively assesses crucial aspects of image preprocessing in four main categories: Data, Technique, Technical Assessment, and Application. It evaluates the presence or absence of details such as image characteristics, preprocessing procedures, patient metadata, skin tone information, labeling methods, and algorithm development descriptions. CLEAR Derm ensures transparency and completeness in documenting key elements involved in the development and assessment of AI algorithms, because unclear, inaccurate, or insufficient descriptions of images can lead to incorrect diagnoses.50 One study found that a significant number of AI imaging studies in populations with SOC did not adequately address multiple elements outlined in the CLEAR checklist; the researchers noted that even skin color information was lacking in the preprocessing of several studies, and anatomical lesion locations were specified in fewer than 50% of cases.
Lesion location information is crucial, as some conditions have characteristic appearances on certain areas of the skin; for example, SOC populations are more likely to develop acral lentiginous melanoma, which commonly arises on the palms, soles, subungual region, and mucous membranes.51 Other key clinical information, such as age, gender, degree of sun damage, and personal and family history, is also often missing during preprocessing.39 Implementing standardized checklists such as CLEAR Derm during the preprocessing of images for AI databases can help mitigate the biases and inaccuracies introduced by variation in dermatologists' labeling, ensuring a more accurate and comprehensive representation of SOC lesions.
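To illustrate how such a checklist could be operationalized during preprocessing, the sketch below audits image records for a small, hypothetical subset of CLEAR Derm-aligned metadata fields and reports how often each is missing; the field names are assumptions and do not reproduce the full checklist.

```python
from collections import Counter

# Hypothetical subset of checklist-aligned metadata fields; not the full CLEAR Derm checklist.
REQUIRED_FIELDS = ["skin_tone_fst", "skin_tone_monk", "lesion_location",
                   "age", "sex", "personal_history", "family_history"]


def missing_field_report(records: list) -> dict:
    """Return, for each required field, the fraction of records missing it."""
    missing = Counter()
    for record in records:
        for field in REQUIRED_FIELDS:
            if not record.get(field):
                missing[field] += 1
    n = len(records) or 1
    return {field: missing[field] / n for field in REQUIRED_FIELDS}


# Example: two records, the second lacking lesion location and family history.
records = [
    {"skin_tone_fst": "V", "skin_tone_monk": 8, "lesion_location": "plantar foot",
     "age": 56, "sex": "F", "personal_history": "none", "family_history": "melanoma"},
    {"skin_tone_fst": "VI", "skin_tone_monk": 10, "age": 61, "sex": "M",
     "personal_history": "keloids", "family_history": ""},
]
for field, frac in missing_field_report(records).items():
    print(f"{field}: missing in {frac:.0%} of records")
```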
Current AI Programs in Use
AI has been available in dermatology to both clinicians and laypersons. These programs make it feasible to use large datasets to analyze dermatologic conditions, and they can serve as a supportive tool that helps clinicians make medical decisions while giving patients a better understanding of their condition. Substantial work has explored potential applications of AI in dermatology, and many aspects of dermatologic care already use AI as a tool. Recent advances include the application of AI to skin malignancy diagnosis, identification of inflammatory dermatologic conditions such as atopic dermatitis and psoriasis, assessment of ulcers, and dermatopathology evaluation.52, 53 Given the problems of image quality, standardization, and lack of representation of SOC in datasets, these programs have a clear need for further development and growth. A nonexhaustive list of AI programs that included SOC images is presented in Table 1.
Our results identified 10 studies and 15 AI technologies that investigated AI's effectiveness in analyzing SOC images. As shown in Table 1, most of these studies demonstrated that SOC images are underrepresented in datasets, while other studies applied AI technologies to patient samples that either omitted or minimally included SOC. The data show both a lack of representation of SOC in datasets and a lack of accuracy of AI technologies in SOC. VisualDx offers DermExpert, an AI program aimed at improving diagnostic accuracy and patient outcomes. VisualDx encompasses a wide range of dermatologic conditions across various skin types, with a focus on providing medical context and therapeutic suggestions for each case. However, a study revealed that while VisualDx is a valuable tool, its diagnostic accuracy did not significantly improve in SOC patients, underscoring the need for further development of AI programs tailored to diverse skin tones.22 DermAssist, a Google Health AI program, offers a similar service, but its effectiveness has yet to be fully evaluated, and concerns exist regarding the diversity of its dataset.54
Other AI programs, such as Proscia DermAI, SkinVision, and Skin Image Search (SIS), also aim to assist in dermatologic diagnosis, but they face challenges in achieving accurate results, especially in SOC. Deep learning algorithms such as ModelDerm, DeepDerm, and HAM 10000 show decreased accuracy when applied to diverse skin tones, highlighting the need for improved AI development. Additionally, mobile apps such as Scanoma and Smart Skin Cancer Detection struggled to perform well on diverse datasets, further emphasizing the necessity of more inclusive training data. While some AI algorithms, such as the Deep Ensemble for Recognition of Malignancy, show promise in melanoma detection, their studies suffer from a lack of SOC representation. Lubax, SkinIO, and Tibot exhibit varying degrees of diagnostic success but still require further validation in SOC. FotoFinder, MelaFind, and Verisante Aura perform reasonably well in skin cancer detection but face limitations in diagnosing skin cancer in individuals with higher phototypes, reflecting the ongoing challenges AI faces in serving diverse populations. Overall, the study of AI in dermatology emphasizes the urgent need for more diverse datasets and continued development to ensure equitable and accurate diagnosis and treatment for all skin types.
Table 2 shows that 70% of the programs that were identified in this review had no data reported on their use in dermatology, specifically in SOC. Clinicians and the general public should be cautious about the use of these AI technologies until research-based conclusions are determined.
Future Applications
The integration of AI in dermatology brings immense potential for improving patient care, especially for those facing barriers to accessing specialized services. AI may contribute to more accurate diagnoses and streamlined referrals in settings without direct dermatologic expertise, such as primary care offices and urgent care centers.55 However, without the inclusion of skin lesions from a variety of diverse skin tones, inaccurate diagnoses may occur, adversely impacting patients.
Benefits include more rapid assessments, limiting unnecessary biopsies of benign lesions, and optimized resource allocation.56, 57 However, challenges persist, such as the need for rigorous validation of AI algorithms to prevent biases and ensure accuracy across diverse demographics. Additionally, healthcare professionals require specialized training to effectively integrate AI tools into patient care while navigating ethical considerations surrounding patient privacy and data security. Ensuring that image acquisition from SOC populations and AI implementation is ethically sound and unbiased is crucial to maintaining trust and delivering equitable care to all patients. While AI holds promise for enhancing dermatological care, addressing validation, training needs, and ethical concerns is crucial for its responsible and effective integration.
Conclusion
It is essential to customize AI approaches to address the challenges associated with evaluating skin manifestations in individuals with SOC. Limiting underlying biases in the datasets used by AI can promote a more uniform representation of SOC populations, and bridging the gap between AI and SOC populations with more diverse datasets may result in improved outcomes and fewer disparities in the provision of dermatologic care. Training dermatologists to capture images of lesions on SOC will also help produce high-quality images for training datasets. Although the AI programs discussed here do not constitute a comprehensive list, many show promise in supporting clinical decision-making; however, they require modification to become reliable diagnostic aids for SOC patients. AI companies should disclose information regarding the representation of SOC in their training sets, which is critical both for diagnostic accuracy and for clinicians' appraisal of AI tools. Further research is needed to evaluate the efficacy of AI-generated diagnoses and treatment algorithms for dermatologic conditions in individuals with SOC.