Facial redness is a common concern in dermatology, affecting patients with conditions such as rosacea, post-inflammatory erythema, and other vascular irregularities. Despite its prevalence, existing tools for quantifying facial redness are limited in their clinical utility and ease of use.

Aims

To develop a high-performing redness scale.

Methods

This study introduces the Kesty Redness Scale (KRS), outlines its development and validation process, and discusses its potential clinical applications.

Results

The investigators rated the scale as useful and easy to use, and the majority stated they would use it in clinical practice to document patient characteristics. Evaluation with Gwet's AC2, Kendall's W, Spearman's ρ, weighted Cohen's kappa, and Bland–Altman analysis showed strong ordinal agreement, robust rank concordance, and negligible bias, indicating that this new rating system is both reliable and valid for measuring facial redness on a 0–4 scale.

Conclusions

1 Introduction

Facial redness is a common concern in dermatology, affecting patients with conditions such as rosacea, post-inflammatory erythema, and other vascular irregularities [1, 2]. In cosmetic dermatology, facial redness due to sun exposure, acne, rosacea, and other skin conditions is a common presenting complaint during a cosmetic consult [3-5]. Lasers and combinations of lasers treat redness, and a scale to quantify the before and after change in redness would be useful for cosmetic dermatology and lasers. Despite its prevalence, existing tools for quantifying facial redness are limited in their clinical utility and ease of use. Most scales for redness are tailored to disease states such as rosacea, and thus are limited in the cosmetic dermatology and laser application [6-8]. To address this gap, we developed the Kesty Redness Scale (KRS), a five-point ordinal scale for assessing facial redness severity (Table 1). This study aimed to validate the KRS using expert evaluations of clinical images and to explore its clinical utility.

TABLE 1. Kesty redness scale (KRS).

Grade	Description	Examples
0	Clear
1	Almost clear. Some mild but almost imperceptible redness covering less than 10% of facial surface area
2	Mild, somewhat noticeable redness covering 10%–25% of facial surface area
3	Moderate, noticeable redness covering 25%–50% of facial surface area
4	Severe redness that distracts from facial features and covers > 50% of facial surface area

2 Methods

This prospective observational study was conducted to evaluate the inter-rater reliability and clinical applicability of the KRS. Ten esthetic professionals, including board-certified dermatologists, board-certified plastic surgeons, and others in esthetics, participated as evaluators. More than 100 photographs of the faces of volunteer subjects were compiled. All were taken with the subject facing forward or turned, with both eyes visible. The median age of the participants was 48 (range 31–65). The images represented a range of redness severities associated with different dermatological conditions and cosmetic treatment outcomes. A study team comprising a physician selected five photographs to represent the five severity grades of the KRS. A written description was prepared for each grade (Table 1). Photographs were chosen for how well each represented its grade and for equal spacing between adjacent grades (e.g., the difference between grades 1 and 2 is the same as between grades 2 and 3). This set of reference photographs and written descriptions served as the guide for rating the study photographs. The reference photographs and the written scale were sent to the participants in the study. A set of 20 unlabeled photographs was also sent, and participants were asked to rate each photograph according to the KRS. The participants were also asked “Is the scale easy to use?” (Yes/No), and “Would you use this scale as part of your clinical practice?” (Yes/No).

3 Statistical Methods

This study focused on evaluating a new method for rating facial skin redness (Rater: Kesty) and comparing its performance to that of a group of expert raters (Raters: B–L) using a 0–4 ordinal scale. A variety of statistical tools were employed to assess both overall agreement across all raters and pairwise alignment between the novel approach and the experts. The findings demonstrated that the novel method performs at a high level of accuracy and consistency, aligning closely with industry expectations.

4 Overall Measures of Agreement

4.1 Gwet's AC2

To measure inter-rater consistency, we applied Gwet's AC2, a statistical measure specifically designed for ordinal data that accounts for chance agreement. The category set was $\left\{1,2,\dots, R\right\}$ and the quadratic weights were defined as:

W\left({c}_i,{c}_j\right)=1-{\left(\frac{\left|{c}_i-{c}_j\right|}{R-1}\right)}^2

The observed agreement $P_o$ for $N$ items, each with $K$ raters, is computed by considering category frequencies $f_c$ per item and forming pairwise proportions $p_{c_i,c_j}$. Expected agreement $P_e$ is derived from the marginal category probabilities $p_c$. Gwet's AC2 is given by:

\mathrm{AC}2=\frac{P_o-{P}_e}{1-{P}_e}

Gwet's AC2 was calculated to be approximately 0.9207, highlighting an excellent degree of agreement among raters.
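The computation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: the function names are hypothetical, grades are assumed to be integers coded 0 to R-1 (the weights depend only on differences between categories), every rater is assumed to have rated every item, and the expected-agreement term follows the standard published form of Gwet's AC2, which the text above does not spell out.

```python
def quadratic_weights(num_categories):
    """Quadratic agreement weights W(c_i, c_j) = 1 - ((c_i - c_j) / (R - 1))**2."""
    span = num_categories - 1
    return [[1.0 - ((i - j) / span) ** 2 for j in range(num_categories)]
            for i in range(num_categories)]


def gwet_ac2(ratings, num_categories):
    """Gwet's AC2 for a complete items-by-raters table of grades 0..R-1.

    Assumption: no missing ratings, at least two raters per item.
    """
    w = quadratic_weights(num_categories)
    n_items = len(ratings)
    n_raters = len(ratings[0])
    observed = 0.0
    prevalence = [0.0] * num_categories
    for row in ratings:
        # f_c: number of raters who placed this item in each category
        counts = [row.count(c) for c in range(num_categories)]
        for k in range(num_categories):
            # Weighted count of raters in agreement (full or partial) with category k
            weighted = sum(w[k][j] * counts[j] for j in range(num_categories))
            observed += counts[k] * (weighted - 1.0) / (n_raters * (n_raters - 1))
            prevalence[k] += counts[k] / n_raters
    p_o = observed / n_items
    p_c = [p / n_items for p in prevalence]  # marginal category probabilities
    total_weight = sum(sum(row) for row in w)
    p_e = (total_weight / (num_categories * (num_categories - 1))
           * sum(p * (1.0 - p) for p in p_c))
    return (p_o - p_e) / (1.0 - p_e)


# Made-up ratings (2 items, 2 raters, 5 grades), for illustration only.
near_miss = gwet_ac2([[0, 1], [2, 2]], 5)   # raters differ by one grade
far_miss = gwet_ac2([[0, 4], [2, 2]], 5)    # raters differ by four grades
```

With quadratic weights, a one-grade disagreement is penalized far less than a four-grade disagreement, so `near_miss` comes out higher than `far_miss`.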

4.2 Kendall's W

Rank agreement was further explored using Kendall's W, which assesses how consistently raters ranked items. A value nearing 1 indicated that the raters displayed strong alignment in their rankings, reflecting excellent consistency in evaluating facial redness.

For $n$ items and $m$ raters, let $R_{ij}$ be the rank of the $i$-th item by the $j$-th rater.

Define $R_i=\sum \limits_{j=1}^m R_{ij}$ and $\overline{R}=\frac{1}{n}\sum \limits_{i=1}^n R_i$.

Kendall's W is given by:

W=\frac{12\sum \limits_{i=1}^n{\left({R}_i-\overline{R}\right)}^2}{m^2\left({n}^3-n\right)}

A value of $W=0.8839$ suggests a high degree of consistency in the ranking of items across raters.
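As a rough sketch of the formula above, the following Python converts each rater's grades to ranks and applies the expression for W directly. The function names are hypothetical, tied grades are assumed to receive the mean of the ranks they span, and no tie correction is applied because the formula above does not include one. With heavily tied ordinal grades, this uncorrected form tends to understate W.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def kendalls_w(ratings):
    """Kendall's W for an items-by-raters table (assumption: no tie correction)."""
    n = len(ratings)        # items
    m = len(ratings[0])     # raters
    # Rank the n items separately within each rater's column: R_ij
    per_rater = [average_ranks([row[j] for row in ratings]) for j in range(m)]
    # R_i: sum of the ranks given to item i across raters
    rank_sums = [sum(per_rater[j][i] for j in range(m)) for i in range(n)]
    mean_sum = sum(rank_sums) / n
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    return 12.0 * s / (m ** 2 * (n ** 3 - n))
```

Calling `kendalls_w` on a table in which every rater orders the items identically (and without ties) returns 1, while two raters with exactly opposite orderings give 0.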

5 Pairwise Measures of Association and Agreement

5.1 Spearman's Rank Correlation

The monotonic relationships between the novel method and individual expert raters were analyzed using Spearman's rank correlation (ρ). For $n$ items, let $d_i=R_{i1}-R_{i2}$ be the difference in the ranks assigned by the two raters to item $i$. Spearman's ρ is given by:

\rho =1-\frac{6\sum \limits_{i=1}^n{d}_i^2}{n\left({n}^2-1\right)}

The coefficients ranged from 0.8522 to 0.9522, with 8 of the 11 rater pairs at or above 0.90, indicating a very strong monotonic relationship throughout (Table 2).
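The rank-difference formula can be illustrated with a short Python sketch. The function names are hypothetical and the input grades are made up. Tied grades are assumed to receive average ranks; the closed-form expression used here is exact only when there are no ties, so results on heavily tied ordinal grades are an approximation.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(rater_1, rater_2):
    """Spearman's rho via 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    n = len(rater_1)
    ranks_1 = average_ranks(rater_1)
    ranks_2 = average_ranks(rater_2)
    # d_i = R_i1 - R_i2, squared and summed over items
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_1, ranks_2))
    return 1.0 - 6.0 * sum_d2 / (n * (n ** 2 - 1))


# Made-up grades for five photographs: the second rater swaps the first two.
rho = spearman_rho([0, 1, 2, 3, 4], [1, 0, 2, 3, 4])
```

In this example two rank differences equal ±1 and the rest are 0, so the sum of squared differences is 2 and ρ = 1 − 12/120 = 0.9.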

TABLE 2. Spearman's rank correlation (ρ).

Rater pair	Spearman's ρ
Kesty—B	0.9498
Kesty—C	0.9472
Kesty—D	0.9522
Kesty—E	0.8975
Kesty—F	0.9481
Kesty—G	0.8978
Kesty—H	0.8522
Kesty—I	0.9074
Kesty—J	0.9235
Kesty—K	0.9330
Kesty—L	0.9353

Bland–Altman bias and limits of agreement for Kesty versus each expert rater.

Rater	Bias	Lower limit	Upper limit
B	−0.1579	−1.1408	0.8250
C	0.2105	−0.8387	1.2597
D	0.1053	−1.0063	1.2168
E	0.3158	−0.9994	1.6310
F	−0.3158	−1.2518	0.6202
G	0.1053	−1.1841	1.3946
H	0.2632	−1.5664	2.0927
I	0.1053	−1.3402	1.5507
J	−0.0526	−1.2703	1.1650
K	0.0000	−1.1316	1.1316
L	0.0526	−1.1650	1.2703

5.2 Weighted Cohen's Kappa

For direct comparisons on categorical agreement, we used weighted Cohen's kappa. Let $O_{ij}$ be the observed proportion of assignments where Rater A and another rater choose categories $i$ and $j$, and let $E_{ij}$ be the expected proportion under independence. Using the same quadratic weights $w_{ij}=W\left({c}_i,{c}_j\right)$ defined above, weighted kappa is:

{\kappa}_w=\frac{\sum \limits_{i,j}{w}_{ij}{O}_{ij}-\sum \limits_{i,j}{w}_{ij}{E}_{ij}}{1-\sum \limits_{i,j}{w}_{ij}{E}_{ij}}

Weighted kappas ranged from 0.8046 to 0.9474, with 10 of the 11 rater pairs at or above 0.90, indicating that the novel method's categorical assignments closely align with those of the experts (Table 3).

TABLE 3. Weighted Cohen's kappa.

Rater pair	Weighted Cohen's kappa
Kesty—B	0.9474
Kesty—C	0.9434
Kesty—D	0.9412
Kesty—E	0.9000
Kesty—F	0.9412
Kesty—G	0.9184
Kesty—H	0.8046
Kesty—I	0.9020
Kesty—J	0.9293
Kesty—K	0.9388
Kesty—L	0.9278
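The weighted kappa formula in this section can be sketched as follows. This is an illustrative implementation under stated assumptions rather than the study's code: the function name is hypothetical, grades are assumed to be integers coded 0 to R-1, and the expected proportions $E_{ij}$ are taken as the product of the two raters' marginal proportions.

```python
def weighted_kappa(rater_a, rater_b, num_categories):
    """Quadratically weighted Cohen's kappa for two raters' grades in 0..R-1."""
    n = len(rater_a)
    span = num_categories - 1
    cats = range(num_categories)
    # Joint counts: how often rater A chose i while rater B chose j
    counts = [[0] * num_categories for _ in cats]
    for a, b in zip(rater_a, rater_b):
        counts[a][b] += 1
    row = [sum(counts[i]) / n for i in cats]                       # rater A marginals
    col = [sum(counts[i][j] for i in cats) / n for j in cats]      # rater B marginals
    p_o = 0.0
    p_e = 0.0
    for i in cats:
        for j in cats:
            w_ij = 1.0 - ((i - j) / span) ** 2
            p_o += w_ij * counts[i][j] / n      # sum of w_ij * O_ij
            p_e += w_ij * row[i] * col[j]       # sum of w_ij * E_ij
    return (p_o - p_e) / (1.0 - p_e)


# Made-up grades: identical ratings, then ratings that agree only at chance level.
perfect = weighted_kappa([0, 1, 2, 3, 4], [0, 1, 2, 3, 4], 5)
chance = weighted_kappa([0, 0, 1, 1], [0, 1, 0, 1], 2)
```

Identical ratings give a kappa of 1, and the second example, where observed and expected agreement are both 0.5, gives 0.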

5.3 Bias and Limits of Agreement

Finally, a Bland–Altman analysis was conducted to explore potential systematic bias. For two raters, define the difference $D_i=R_{i1}-R_{i2}$ and the average $A_i=\left(R_{i1}+R_{i2}\right)/2$. The mean difference (bias) and the standard deviation (SD) of differences provide limits of agreement (LoA):

\mathrm{Bias}=\frac{1}{n}\sum \limits_{i=1}^n{D}_i,\quad \mathrm{LoA}=\mathrm{Bias}\pm 1.96\cdot \mathrm{SD}

Minimal bias and narrow LoA were apparent after our analysis, suggesting no substantial systematic deviation of the novel method's scores from those of industry experts. Although Bland–Altman is more commonly applied to continuous data, it still provides a useful check for consistent over- or underestimation, which was not evident here.
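The bias and limits of agreement defined above reduce to a few lines of Python. The sketch below uses a hypothetical function name and made-up grades, and it assumes the sample standard deviation (n − 1 denominator), which the text does not specify.

```python
import math


def bland_altman(rater_1, rater_2):
    """Return (bias, lower LoA, upper LoA) for paired grades from two raters."""
    diffs = [a - b for a, b in zip(rater_1, rater_2)]   # D_i = R_i1 - R_i2
    n = len(diffs)
    bias = sum(diffs) / n
    # Assumption: sample SD with an n - 1 denominator
    sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (n - 1))
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Made-up grades: the raters disagree by one grade, alternately high and low.
bias, lower, upper = bland_altman([2, 2, 2, 2], [1, 3, 1, 3])
```

Here the over- and underestimates cancel, so the bias is 0 while the limits of agreement remain wide; a bias near zero alone does not imply that individual ratings agree closely.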

6 Results

To summarize, the evaluation utilizing Gwet's AC2, Kendall's W, Spearman's ρ, weighted Cohen's kappa, and Bland–Altman analysis offers a well-rounded assessment of the novel method's performance. The results show strong ordinal agreement, robust rank concordance, and negligible bias, indicating that this new rating system is both reliable and valid for measuring facial redness on a 0–4 scale. Furthermore, these findings establish the method as a credible and industry-aligned tool. Notably, 100% of participants found the scale easy to use, and all expressed willingness to incorporate it into their clinical workflows.

7 Discussion

The number of cosmetic procedures in the United States increases every year. One of the fastest-growing procedures within cosmetics is lasers for rejuvenation. Although patient satisfaction is important in lasers and cosmetics, a clinically useful scale to objectively evaluate the results of laser treatments can help the doctor–patient interaction. Having various scales for redness can help the evaluator choose the appropriate one for the patient, as some are more appropriate for medical conditions and others may be tailored to cosmetic concerns, such as the KRS [6, 9-13]. The KRS provides a standardized framework for evaluating facial redness in cosmetic dermatology, addressing a significant unmet need in both cosmetic and clinical dermatology. Its simplicity and reliability make it an ideal tool for use in diverse clinical settings. Potential applications include cosmetic procedures, where the KRS can objectively quantify redness before and after treatments such as laser therapy, enabling clinicians to document treatment outcomes and improve patient satisfaction.

Statistical validation of scales is the current gold standard for approval and acceptance of clinical scales. Despite this, the reliance on a human's visual assessment can introduce subjectivity that may be influenced by the evaluator's training, experience, and personal biases. An ideal solution would be an unbiased artificial intelligence model, such as a large language model, that “learns” the scale, is rigorously tested, and then rates patient photographs itself. An artificial intelligence model can help eliminate human subjectivity and would be extremely helpful in both clinical and research applications. Further studies with a larger sample of evaluators and more patient images can strengthen the statistics that support the scale's reliability and ease of use, and can support the use of the KRS across a wide demographic of the population. Additional patient photographs, including patients who have facial erythema from a wide variety of dermatologic conditions, such as acne, rosacea, post-inflammatory erythema, lupus, sun damage, and others, can also be included in further trials on the KRS to support the scale's applicability to all causes of facial redness.

Further use of the KRS can include implementing the scale in clinical trials. The scale can serve as an endpoint for studies investigating lasers and other treatments for conditions like poikiloderma, post-acne erythema, and photodamage. This scale can also improve doctor–patient communication by providing an intuitive structure for discussing the severity of a patient's condition and the expected outcomes of treatment.

8 Conclusion

The KRS is a reliable, easy-to-use tool that enhances the assessment of facial redness in dermatology. Its validation through expert evaluation and statistical analysis underscores its potential to improve clinical practice and research. Future studies may focus on the application of scales in artificial intelligence models to minimize dependence on human evaluators.

Author Contributions

K.R.K. and C.E.K. conceived the study, wrote and revised the manuscript, and funded the study. All authors have reviewed and approved the article for submission.

Acknowledgments

The authors would like to thank John Smith for his contribution to statistical analysis.

Conflicts of Interest

The authors declare no conflicts of interest.

Open Research

Data Availability Statement

The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.

References

1. Y. Cai, Y. Zhu, Y. Wang, and W. Xiang, “Intense Pulsed Light Treatment for Inflammatory Skin Diseases: A Review,” Lasers in Medical Science 37, no. 8 (2022): 3085–3105, https://doi.org/10.1007/s10103-022-03620-1.
2. C. Ohanenye, S. Taliaferro, and V. D. Callender, “Diagnosing Disorders of Facial Erythema,” Dermatologic Clinics 41, no. 3 (2023): 377–392, https://doi.org/10.1016/j.det.2023.02.004.
3. R. Amiri, M. Khalili, S. Mohammadi, B. Iranmanesh, and M. Aflatoonian, “Treatment Protocols and Efficacy of Light and Laser Treatments in Post-Acne Erythema,” Journal of Cosmetic Dermatology 21, no. 2 (2022): 648–656, https://doi.org/10.1111/jocd.14729.
4. A. Sharma, G. Kroumpouzos, M. Kassir, et al., “Rosacea Management: A Comprehensive Review,” Journal of Cosmetic Dermatology 21, no. 5 (2022): 1895–1904, https://doi.org/10.1111/jocd.14816.
5. S. Park, H. Jang, S. H. Seong, et al., “The Effects of Long-Pulsed Alexandrite Laser Therapy on Facial Redness and Skin Microbiota Compositions in Rosacea: A Prospective, Multicentre, Single-Arm Clinical Trial,” Photodermatology, Photoimmunology & Photomedicine 40, no. 1 (2024): e12921, https://doi.org/10.1111/phpp.12921.
6. J. Tan, H. Liu, J. J. Leyden, and M. J. Leoni, “Reliability of Clinician Erythema Assessment Grading Scale,” Journal of the American Academy of Dermatology 71, no. 4 (2014): 760–763, https://doi.org/10.1016/j.jaad.2014.05.044.
7. J. W. Choi, S. H. Kwon, J. I. Youn, and S. W. Youn, “Objective Measurements of Erythema, Elasticity and Scale Could Overcome the Inter- and Intra-Observer Variations of Subjective Evaluations for Psoriasis Severity,” European Journal of Dermatology 23, no. 2 (2013): 224–229, https://doi.org/10.1684/ejd.2013.1931.
8. Y. Aoki, S. L. Wehage, and P. Talalay, “Quantification of Skin Erythema Response to Topical Alcohol in Alcohol-Intolerant East Asians,” Skin Research and Technology 23, no. 4 (2017): 593–596, https://doi.org/10.1111/srt.12376.
9. J. G. M. Logger, E. M. G. J. de Jong, R. J. B. Driessen, and P. E. J. van Erp, “Evaluation of a Simple Image-Based Tool to Quantify Facial Erythema in Rosacea During Treatment,” Skin Research and Technology 26, no. 6 (2020): 804–812, https://doi.org/10.1111/srt.12878.
10. L. F. Eichenfield, J. Q. Del Rosso, J. K. L. Tan, et al., “Use of an Alternative Method to Evaluate Erythema Severity in a Clinical Trial: Difference in Vehicle Response With Evaluation of Baseline and Postdose Photographs for Effect of Oxymetazoline Cream 1·0% for Persistent Erythema of Rosacea in a Phase IV Study,” British Journal of Dermatology 180, no. 5 (2019): 1050–1057, https://doi.org/10.1111/bjd.17462.
11. D. Hopkinson, S. Moradi Tuchayi, H. Alinia, and S. R. Feldman, “Assessment of Rosacea Severity: A Review of Evaluation Methods Used in Clinical Trials,” Journal of the American Academy of Dermatology 73, no. 1 (2015): 138–143, https://doi.org/10.1016/j.jaad.2015.02.1121.
12. J. T. Bamford, C. E. Gessert, and C. M. Renier, “Measurement of the Severity of Rosacea,” Journal of the American Academy of Dermatology 51 (2004): 697–703, https://doi.org/10.1016/j.jaad.2004.04.013.
13. J. G. M. Logger, F. M. C. de Vries, P. E. J. van Erp, et al., “Noninvasive Objective Skin Measurement Methods for Rosacea Assessment: A Systematic Review,” British Journal of Dermatology 182 (2020): 55–66, https://doi.org/10.1111/bjd.18151.

Volume 24, Issue 4

April 2025

e70039

The Kesty Redness Scale: A Pilot Validation Study for a Novel Tool for Evaluating Facial Redness in Cosmetic and Clinical Dermatology