Volume 2025, Issue 1 8845913
Review Article
Open Access

Assessment Tools for Health-Related Quality of Life in Patients With Nasopharyngeal Carcinoma: A Systematic Review of Psychometric Properties

Jianxia Lyu

Department of Head and Neck Radiation Oncology, Sichuan Clinical Research Center for Cancer, Sichuan Cancer Hospital & Institute, Sichuan Cancer Center, University of Electronic Science and Technology of China, Chengdu, Sichuan Province, China

Department of Nursing, The First Affiliated Hospital of China Medical University, No. 155 Nanjing North Street, Heping District, Shenyang, Liaoning Province, China

Li Yin

Department of Head and Neck Radiation Oncology, Sichuan Clinical Research Center for Cancer, Sichuan Cancer Hospital & Institute, Sichuan Cancer Center, University of Electronic Science and Technology of China, Chengdu, Sichuan Province, China

Hao Zhang

Department of Nursing, Sichuan Clinical Research Center for Cancer, Sichuan Cancer Hospital & Institute, Sichuan Cancer Center, Affiliated Cancer Hospital of University of Electronic Science and Technology of China, Chengdu, Sichuan Province, China

Shichuan Zhang

Department of Head and Neck Radiation Oncology, Sichuan Clinical Research Center for Cancer, Sichuan Cancer Hospital & Institute, Sichuan Cancer Center, University of Electronic Science and Technology of China, Chengdu, Sichuan Province, China

Yunhua Jing

Department of Nursing, Sichuan Clinical Research Center for Cancer, Sichuan Cancer Hospital & Institute, Sichuan Cancer Center, Affiliated Cancer Hospital of University of Electronic Science and Technology of China, Chengdu, Sichuan Province, China

Qing Yang

Corresponding Author

Department of Nursing, Sichuan Clinical Research Center for Cancer, Sichuan Cancer Hospital & Institute, Sichuan Cancer Center, Affiliated Cancer Hospital of University of Electronic Science and Technology of China, Chengdu, Sichuan Province, China

Aiping Wang

Corresponding Author

Department of Nursing, The First Affiliated Hospital of China Medical University, No. 155 Nanjing North Street, Heping District, Shenyang, Liaoning Province, China

First published: 03 July 2025
Academic Editor: Brian D. Adams

Abstract

Objectives: Self-reported health-related quality of life (HRQoL) is a critical metric for evaluating clinical outcomes. Although the HRQoL of patients with nasopharyngeal carcinoma (NPC) has been widely studied, the performance of the instruments used in clinical practice remains uncertain, and the quality of these scales has not been systematically evaluated. This review aimed to systematically evaluate self-reported HRQoL scales for patients with NPC and thereby provide guidance for the informed selection of assessment tools.

Design: A systematic review based on the Consensus-based Standards for the Selection of Health Measurement Instruments (COSMIN) methodology and following the PRISMA guidelines.

Methods: PubMed, Web of Science, Embase, CINAHL, PsycINFO, CNKI, SinoMed, and WanFang were systematically searched from their inception to August 2024. Eligible studies had to report an assessment of the measurement properties of HRQoL scales in patients with NPC. Two authors independently screened the literature, extracted data, and evaluated methodological and psychometric quality. Measurement properties were evaluated according to the COSMIN systematic review guidelines, and the quality of evidence was graded using the GRADE approach.

Results: Among 17 instruments across 19 studies, all demonstrated adequate content validity, construct validity, and internal consistency. However, information on cross-cultural validity, criterion validity, reliability, hypothesis testing, and responsiveness was limited. High-quality evidence on psychometric properties was found for the Quality of Life Instruments for Cancer Patients-Nasopharyngeal Cancer (QLICP-NA), the Functional Assessment of Cancer Therapy-Nasopharyngeal (FACT-NP), and the Quality of Life Scale for Nasopharyngeal Carcinoma Patients Version 2 (QoL-NPC V2).

Conclusion: The measurement properties of the QLICP-NA, FACT-NP, and QoL-NPC V2 were comprehensively assessed and showed good methodological quality, strong measurement properties, and robust supporting evidence. These scales are therefore recommended for evaluating the quality of life of patients with NPC, whereas the remaining assessment tools require further validation.

Relevance to Clinical Practice: Our findings will help healthcare professionals select suitable instruments for patients with NPC.

1. Introduction

Nasopharyngeal carcinoma (NPC), a malignant tumor originating in the mucosal epithelium of the nasopharynx, is the 18th most common cancer in men and the 22nd most common cancer in women globally [1]. It is more common in Asian populations, including those in southern China and Southeast Asia, as well as in Eskimo and North African populations, with an annual incidence rate of approximately 50 cases per 100,000 people [2]. Because NPC arises in a relatively concealed site, early symptoms are often subtle and easily overlooked, so most patients are diagnosed at an advanced stage [3]. Moreover, because of the complex biological and anatomical characteristics of NPC, surgery risks damaging the surrounding nerves and tissues, which makes radiotherapy and chemotherapy the primary treatment options [4, 5]. Although chemoradiotherapy has improved the prognosis of NPC, it causes numerous adverse effects that lead to a range of physical and psychosocial challenges, including radiation-induced xerostomia, hearing loss, dysphagia, odynophagia, vision impairment, altered taste, emotional distress, and fear of cancer recurrence [6–8]. These symptoms significantly affect patients' quality of life at diagnosis, throughout treatment, and during subsequent survivorship.

Health-related quality of life (HRQoL) refers to the degree to which individuals from different cultures and value systems are satisfied with a combination of living conditions related to their goals, expectations, standards, and concerns [9]. It is a comprehensive measure of health that encompasses physical functioning, social functioning and roles, mental health, and general health [10]. This multidimensional approach provides a holistic understanding of both the subjective and objective aspects of a patient’s well-being. Compared with the general concept of quality of life (QoL), HRQoL focuses specifically on health and is an important component of health status evaluation systems, particularly patient-reported outcomes (PROs) [11]. HRQoL emphasizes existing physical and mental symptoms and functional limitations, reflecting an individual’s overall health status [12]. Patient-reported outcome measures (PROMs) are commonly used to assess HRQoL. These tools capture the specific effects of disease on patients’ lives, including physical discomfort, psychological distress, and the effects of disease and treatment on daily living and social activities, thereby providing crucial support for treatment and care decisions [13, 14].

Currently, several instruments are used to assess HRQoL in patients with NPC, including the Medical Outcomes Study 36-Item Short Form Health Survey (SF-36) [15, 16], the European Organization for Research and Treatment of Cancer Quality of Life Questionnaire-Nasopharyngeal Cancer (EORTC QLQ-NPC) module [17], and the Functional Assessment of Cancer Therapy-Nasopharyngeal (FACT-NP) scale [18]. The complexity of these questionnaires poses significant challenges for the healthcare professionals who assess HRQoL in patients with NPC, and because administration and interpretation require considerable time and expertise, the accuracy of the results may also suffer [19–21]. Selecting appropriate assessment tools is therefore a critical component of HRQoL research in patients with NPC. However, given the wide variety of questionnaires currently in use, most of which are self-reported, there are no standardized criteria for evaluating their measurement performance in clinical applications. A systematic evaluation of the psychometric properties of these HRQoL instruments is therefore necessary.

The Consensus-based Standards for the Selection of Health Measurement Instruments (COSMIN) Steering Committee, composed of psychometric experts from research institutions in several countries, including the Netherlands, the United States, and Spain, has developed the COSMIN methodology [22, 23]. It aims to give researchers a scientific basis for selecting appropriate measurement tools by evaluating the methodological quality and psychometric properties of PRO instruments. The COSMIN systematic review guidelines [24] provide a comprehensive framework for systematically evaluating the measurement properties of assessment tools and guide the research process.

In this study, the methodological quality and measurement properties of HRQoL scales were evaluated according to the COSMIN systematic review guidelines, and the quality of evidence was graded using the GRADE approach. The objective was to assess the applicability of these scales to NPC populations and thereby provide evidence for their use in clinical practice and research.

2. Methods

This systematic review was conducted and reported in accordance with COSMIN guidelines [24, 25] and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [26]. Ethical approval was not required for this review because it involved only secondary data analysis of publicly available content and did not include human subjects.

2.1. Design

This systematic review was conducted according to the COSMIN guidelines and reported according to the PRISMA statement. The study protocol was registered in PROSPERO (registration number CRD42024567365).

2.2. Literature Search Strategy

This review aimed to identify published studies assessing the methodological quality and measurement properties of HRQoL scales for patients with NPC. PubMed, Web of Science, Embase, CINAHL, PsycINFO, CNKI, WanFang, and SinoMed were systematically searched from inception to August 2024 using a combination of controlled subject headings and free-text keywords. The search followed a three-step approach to ensure comprehensive coverage of relevant evidence. The search terms included “Quality of Life,” “Life Quality,” “Nasopharyngeal Neoplasm,” “Nasopharyngeal cancer,” and Scale OR Instrument OR Tool.
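As a rough illustration of how the listed terms could be combined, the sketch below ORs the synonyms within each concept and ANDs the concepts together. The three-block grouping and the helper names (`CONCEPT_BLOCKS`, `build_query`) are assumptions for illustration only; the review does not report its exact Boolean string or any database-specific field tags.

```python
# Hypothetical grouping of the reported search terms into three concepts.
CONCEPT_BLOCKS = {
    "quality_of_life": ["Quality of Life", "Life Quality"],
    "population": ["Nasopharyngeal Neoplasm", "Nasopharyngeal cancer"],
    "instrument": ["Scale", "Instrument", "Tool"],
}


def quote(term: str) -> str:
    """Wrap multi-word terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def build_query(blocks: dict[str, list[str]]) -> str:
    """OR the synonyms inside each concept block, then AND the blocks together."""
    groups = ["(" + " OR ".join(quote(t) for t in terms) + ")" for terms in blocks.values()]
    return " AND ".join(groups)


print(build_query(CONCEPT_BLOCKS))
```

Each database would still need its own syntax for subject headings (for example, MeSH in PubMed), which this sketch does not attempt.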

2.3. Literature Inclusion and Exclusion Criteria

We included original studies published in peer-reviewed journals that reported the measurement properties of HRQoL scales in patients with NPC. The inclusion criteria were as follows: (1) adult patients with pathologically confirmed NPC, without restriction on histological type or disease stage; (2) studies involving the development, validation, or cross-cultural adaptation of HRQoL measurement tools; and (3) studies reporting results for at least one measurement property. The exclusion criteria were as follows: (1) patients with NPC made up < 50% of the study sample; (2) the HRQoL tool was used solely as an outcome measure without evaluation of its psychometric properties; (3) the tool measured PROs other than HRQoL; (4) reviews, systematic reviews, other secondary research, or conference papers; (5) duplicate publications or articles without full text or complete data; and (6) literature published in languages other than Chinese or English.
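The eligibility rules can be expressed as a single screening predicate. This is a minimal sketch: the `StudyRecord` fields are hypothetical summaries of what a reviewer judges from each article, and duplicate detection is assumed to happen in a separate step.

```python
from dataclasses import dataclass


@dataclass
class StudyRecord:
    """Hypothetical per-article summary recorded during screening."""
    adult_pathology_confirmed: bool     # adults with pathologically confirmed NPC
    npc_proportion: float               # share of the sample with NPC, 0.0 to 1.0
    reports_measurement_property: bool  # develops/validates/adapts a tool and reports >= 1 property
    measures_hrqol: bool                # instrument targets HRQoL, not another PRO
    primary_peer_reviewed: bool         # not a review, secondary study, or conference paper
    full_text_available: bool           # full text and complete data available
    language: str                       # ISO 639-1 code, e.g. "en" or "zh"


def is_eligible(study: StudyRecord) -> bool:
    """True only if every inclusion criterion holds and no exclusion criterion applies."""
    return (
        study.adult_pathology_confirmed
        and study.npc_proportion >= 0.5  # exclusion (1): fewer than 50% NPC patients
        and study.reports_measurement_property
        and study.measures_hrqol
        and study.primary_peer_reviewed
        and study.full_text_available
        and study.language in {"en", "zh"}
    )
```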

2.4. Literature Screening and Data Extraction

Based on the inclusion and exclusion criteria, two researchers independently screened the literature and extracted data, then cross-checked their results. Discrepancies were resolved through discussion with a third researcher until consensus was reached. In the initial screening phase, titles and abstracts were reviewed to exclude irrelevant studies; the full texts of potentially relevant articles were then reviewed to determine eligibility. The extracted data comprised: (1) basic information on the included studies: author names, publication year, country, study objectives, research design, participants, sample size, study setting, and data collection and analysis methods; (2) dimensions and characteristics of the HRQoL scales for patients with NPC: dimensions, number of items, and assessment methods; and (3) measurement properties of the scales: content validity, construct validity, internal consistency, cross-cultural validity, reliability, and measurement error.
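The dual-screening workflow can be sketched as follows: decisions on which the two reviewers agree stand, and disagreements are passed to a third reviewer. The `third_reviewer` callable is a hypothetical stand-in for the consensus discussion described above, not a tool used in the review.

```python
from typing import Callable


def find_discrepancies(a: dict[str, bool], b: dict[str, bool]) -> list[str]:
    """IDs of records screened by both reviewers whose include/exclude decisions differ."""
    return sorted(rid for rid in a.keys() & b.keys() if a[rid] != b[rid])


def resolve(a: dict[str, bool], b: dict[str, bool],
            third_reviewer: Callable[[str], bool]) -> dict[str, bool]:
    """Keep agreed decisions; defer each disagreement to the third reviewer."""
    final = {rid: a[rid] for rid in a.keys() & b.keys() if a[rid] == b[rid]}
    for rid in find_discrepancies(a, b):
        final[rid] = third_reviewer(rid)
    return final


reviewer_a = {"rec1": True, "rec2": False, "rec3": True}
reviewer_b = {"rec1": True, "rec2": True, "rec3": True}
print(find_discrepancies(reviewer_a, reviewer_b))  # ['rec2']
```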

2.5. Quality Assessment

2.5.1. Assessment Procedure

Two experienced researchers independently evaluated the methodological quality and measurement properties of HRQoL measurement tools for patients with NPC according to the COSMIN guidelines and then cross-checked the results. Disagreements were resolved through discussion with a third researcher. First, the methodological quality of the included studies was evaluated using the COSMIN risk of bias checklist [24, 27]. Next, the measurement properties were evaluated against the COSMIN quality criteria. Finally, the quality of evidence for each measurement property was graded using an adapted version of the GRADE approach, which informed the recommendations and the strength of evidence [28].

2.5.2. Methodological Quality Assessment

Methodological quality was evaluated using the COSMIN Risk of Bias checklist, which comprises 10 modules [23]: PROM development, content validity, structural validity, internal consistency, cross-cultural validity, stability, measurement error, criterion validity, hypothesis testing, and responsiveness. The checklist contains 116 items divided into three parts. Each item was rated on a four-point scale: “Very Good (V),” “Adequate (A),” “Doubtful (D),” or “Inadequate (I).” The overall methodological quality of a study was the lowest rating among its items (“worst score counts”). If a study assessed multiple measurement properties, each property was rated separately.
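The “worst score counts” rule can be written in a few lines. The sketch below assumes ratings are stored as the four labels used above; it is illustrative and not part of the COSMIN materials.

```python
# Ordered from worst to best, matching the four-point COSMIN rating scale.
RATING_ORDER = ["Inadequate", "Doubtful", "Adequate", "Very Good"]


def box_rating(item_ratings: list[str]) -> str:
    """Methodological quality of one measurement property in one study:
    the lowest rating among its checklist items ("worst score counts")."""
    if not item_ratings:
        raise ValueError("at least one item rating is required")
    return min(item_ratings, key=RATING_ORDER.index)


# Each measurement property is rated separately within a study.
study = {
    "internal consistency": ["Very Good", "Very Good"],
    "stability": ["Very Good", "Doubtful", "Adequate"],
}
print({prop: box_rating(items) for prop, items in study.items()})
# {'internal consistency': 'Very Good', 'stability': 'Doubtful'}
```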

2.5.3. Quality Assessment of Measurement Properties

The quality of each measurement property was evaluated against the criteria provided on the COSMIN website [29, 30]. The criteria cover three domains (validity, reliability, and responsiveness) and include content validity, structural validity, internal consistency, cross-cultural validity or measurement invariance, stability, measurement error, criterion validity, hypothesis testing, and responsiveness. Each property was rated on three levels: sufficient (+), insufficient (−), or indeterminate (?). If all studies rated a measurement property consistently as sufficient (+), insufficient (−), or indeterminate (?), the overall rating took the same value. If the ratings were inconsistent across studies and the inconsistency could not be explained, the overall rating was inconsistent (±).
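The pooling rule described above can be sketched as follows. Only the two cases stated in the text are modelled; how an explained inconsistency is handled (for example, by rating subgroups separately) is left to the reviewer and is not implemented here. ASCII `-` stands in for the “insufficient” (−) symbol.

```python
def overall_rating(study_ratings: list[str]) -> str:
    """Pool per-study ratings ('+', '-', '?') for one measurement property.

    Consistent ratings carry over unchanged; any unexplained disagreement
    across studies yields '±' (inconsistent).
    """
    allowed = {"+", "-", "?"}
    if not study_ratings or not set(study_ratings) <= allowed:
        raise ValueError("expected a non-empty list of '+', '-', or '?'")
    distinct = set(study_ratings)
    return distinct.pop() if len(distinct) == 1 else "±"


print(overall_rating(["+", "+"]), overall_rating(["+", "-"]))  # + ±
```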

2.5.4. Grading the Quality of Evidence

Because this systematic review covered various measurement properties, a modified version of the GRADE approach was used to grade the quality of evidence and determine the strength of recommendations [31]. The evidence for each measurement property was initially assumed to be of “high quality” and was then downgraded for four factors: risk of bias, inconsistency, imprecision, and indirectness. The final quality of evidence was categorized into four levels: “high,” “moderate,” “low,” and “very low.” Scales with “sufficient” content validity and at least “low”-quality evidence for internal consistency were recommended for use (Level A). Scales with potential for application that require further evaluation of their measurement properties in additional studies were assigned Level B. Scales with “high-quality” evidence of “insufficient” content validity were not recommended for use (Level C).
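A minimal sketch of the grading and recommendation logic follows. It assumes each downgrade is expressed as a number of levels per GRADE factor and, following the COSMIN guideline's wording (“sufficient internal consistency”), that Level A also requires the internal consistency rating itself to be sufficient (“+”); that extra condition and the function names are assumptions, not taken from this review.

```python
LEVELS = ["very low", "low", "moderate", "high"]  # worst -> best
FACTORS = {"risk_of_bias", "inconsistency", "imprecision", "indirectness"}


def grade_evidence(downgrades: dict[str, int]) -> str:
    """Start at 'high' and drop one level per downgrade, never below 'very low'."""
    unknown = set(downgrades) - FACTORS
    if unknown:
        raise ValueError(f"unknown GRADE factor(s): {sorted(unknown)}")
    index = LEVELS.index("high") - sum(downgrades.values())
    return LEVELS[max(index, 0)]


def recommendation(cv_rating: str, cv_quality: str,
                   ic_rating: str, ic_quality: str) -> str:
    """Recommendation category (A, B, or C) for one instrument."""
    if cv_rating == "+" and ic_rating == "+" and LEVELS.index(ic_quality) >= LEVELS.index("low"):
        return "A"  # recommended for use
    if cv_rating == "-" and cv_quality == "high":
        return "C"  # not recommended
    return "B"      # potential for use, needs further validation


print(grade_evidence({"risk_of_bias": 1}), recommendation("+", "high", "+", "moderate"))
```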

3. Results

3.1. Literature Search Results

The preliminary search identified 2402 records. Of these, 2383 were removed as duplicates or as ineligible after title/abstract screening and full-text review. The 19 included studies [15–18, 32–46] reported on 17 different instruments (Figure 1).

Figure 1. PRISMA flowchart of the search strategy.

3.2. Basic Characteristics of Included Assessment Tools

Of the 19 studies covering 17 tools, 7 [18, 33, 34, 39, 41, 43, 44] focused on scale development, and 12 [15–17, 32, 35–38, 40, 42, 45, 46] assessed cultural adaptation and validation. The studies were conducted primarily in mainland China, Taiwan, and Hong Kong and were published between 1996 and 2023. Table 1 summarizes the basic characteristics of the included scales.
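As a quick consistency check, the study counts reported in Sections 3.1 and 3.2 agree with each other:

```latex
\begin{aligned}
2402 - 2383 &= 19 && \text{(records identified minus records removed)}\\
7 + 12 &= 19 && \text{(development studies plus adaptation/validation studies)}
\end{aligned}
```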

Table 1. Basic characteristics of the included assessment tools.
Instrument | Author, year | Country and region | Application population | Sample size | Age (years) | Dimensions/items | Dimensions | Scale completion time
SF-36 | Gu, 2007 [15] | China, Guangzhou | NPC | 527 | 15–87 | 8/36 | Physical functioning (PF), role physical (RP), bodily pain (BP), general health (GH), vitality (VT), social functioning (SF), role emotional (RE), mental health (MH) | NR
SF-36 | Wu, 2010 [16] | China, Guangzhou | Long-term NPC survivors | 85 | 23–74 | 8/36 | Physical functioning (PF), role physical (RP), bodily pain (BP), general health (GH), vitality (VT), social functioning (SF), role emotional (RE), mental health (MH) | NR
QoL-NPC | Guo, 2008 [17] | China, Guangzhou | NPC | 40 | 29–66 | 6/30 | Body functions (BO), social activities (SO), spirit and psychology (SP), classification of clinical symptoms from the perspective of traditional Chinese medicine (TCM), symptoms and adverse reactions (SA), quality of life (QoL) | NR
SQoL-NPC | Chen, 1996 [32] | China, Guangzhou | NPC | 101 | NR | 4/20 | Body functions, spirit and psychology, social activities, symptoms and adverse reactions | NR
QLICP-HN (V2.0T) | Li, 2017 [33] | China, Yunnan | Head and neck cancer | 100 | 21–71 | 5/55 | Physical domain, psychological domain, social domain, common symptoms and side effects, esophageal cancer special domain | 20–30 min
QLICP-NA V2.0 | Wu, 2015 [34] | China, Guangzhou | NPC | 121 | 18–79 | 5/61 | Physical domain, psychological domain, social domain, common symptoms and side effects domain, specific module | 20–30 min
FACT-H&N | Xiao, 2010 [35] | China, Guangzhou | NPC | 444 | 18–80 | 5/39 | Physical well-being, social/family well-being, emotional well-being, functional well-being, head & neck cancer subscale | NR
QoL-RTI/H&N | Gu, 2012 [36] | China, Guangzhou | NPC | 416 | 21.2–79.6 | 5/39 | Function, emotion, family/socio-economics (family), general (overall) QoL | 9.3 ± 4.1 min
QLQ H&N35 | Gu, 2012 [37] | China, Guangzhou | NPC | 391 | 18–80 | 8/35 | Pain, swallowing, senses (taste/smell), speech, social eating, social contact, sexuality, teeth, opening mouth, dry mouth, sticky saliva, coughing, feeling ill | 8.7 ± 3.3 min
QLQ C30 | Gu, 2012 [38] | China, Guangzhou | NPC | 391 | 18–80 | 15/30 | Physical functioning (PF), role functioning (RF), emotional functioning (EF), cognitive functioning (CF), social functioning (SF), global health (QL), fatigue (FA), nausea and vomiting (NV), pain (PA), dyspnea (DY), sleep, appetite loss, constipation, diarrhea, financial difficulties | 8.3 ± 3.0 min
QoL-NPC | Gu, 2009 [39] | China, Guangzhou | NPC | 433 | 15–87 | 4/30 | Physical domain, psychological domain, social domain, side effects domain | NR
QoL-NPC | Chen, 2010 [40] | China, Guangzhou | NPC | 433 | 15–87 | 4/30 | Physical domain, psychological domain, social domain, side effects domain | NR
QoL-NPC13 | Chen, 2016 [41] | China, Guangzhou | NPC | 206 | 16–80 | 4/13 | Physical domain, psychological domain, social domain, side effects domain | NR
SNOT-22 | Wu, 2023 [42] | China, Taiwan | NPC | 275 | 27–78 | 22 items | NR | NR
QLICP-NA V2.0 | Wu, 2016 [43] | China, Guangzhou | NPC | 121 | 18–78 | 4/32 | Physical domain, psychological domain, social domain, common symptoms and side effects domain | NR
FACT-NP | Michael, 2009 [18] | China, Hong Kong (Prince of Wales Hospital, Queen Elizabeth Hospital, Pamela Youde Nethersole Eastern Hospital) | NPC | 357 | 26–86 | 4/43 | Physical well-being, social/family well-being, emotional well-being, functional well-being | NR
QoL-NPC V2.0 | Su, 2016 [44] | China, Guangzhou | NPC | 487 | 16–81 | 4/26 | Physical domain, psychological domain, social domain, side effects domain | 8.4 ± 4.6 min
Taiwan Chinese version QLQ-C30 and QLQ-H&N35 | Chie, 2003 [45] | China, Taiwan | NPC | 50 | 40–49 | QLQ-C30: 15/30; QLQ-H&N35: 8/35 | QLQ-C30: physical functioning (PF), role functioning (RF), emotional functioning (EF), cognitive functioning (CF), social functioning (SF), global health (QL), fatigue (FA), nausea and vomiting (NV), pain (PA), dyspnea (DY), sleep, appetite loss, constipation, diarrhea, financial difficulties; QLQ-H&N35: pain, swallowing, senses (taste/smell), speech, social eating, social contact, sexuality, teeth, opening mouth, dry mouth, sticky saliva, coughing, feeling ill | NR
QoL-RTI/H&N | Chen, 2014 [46] | China, Guangzhou | NPC and HNC | 238 | 22–80 | 11/39 | Function, emotion, family, general, pain, swallow, saliva, appearance, speech, taste, cough | NR

Existing instruments for assessing QoL in patients with NPC include generic QoL scales, head and neck modules of cancer-specific QoL scales, and NPC-specific modules. They comprise the MOS 36-Item Short Form Health Survey (SF-36), the European Organization for Research and Treatment of Cancer Quality of Life Questionnaire-Core 30 (EORTC QLQ-C30), the European Organization for Research and Treatment of Cancer Quality of Life Questionnaire-Head & Neck Cancer (EORTC QLQ-H&N35), the Quality of Life Instruments for Cancer Patients-Head and Neck Cancer Version 2.0 (QLICP-HN V2.0), the Quality of Life Instruments for Cancer Patients-Nasopharyngeal Cancer (QLICP-NA), the Functional Assessment of Cancer Therapy-Head & Neck Cancer (FACT-H&N), the Functional Assessment of Cancer Therapy-Nasopharyngeal (FACT-NP), the Quality of Life Radiation Therapy Instrument Head and Neck Module (QoL-RTI/H&N), the Quality of Life Scale for Nasopharyngeal Carcinoma Version 2 (QoL-NPC V2), the Quality of Life Scale for Nasopharyngeal Carcinoma (SQoL-NP), the QoL-NPC, the Short Quality of Life Scale for Nasopharyngeal Carcinoma (QoL-NPC13), and the Sino-Nasal Outcome Test (SNOT-22).

Although all of these tools have been validated in patients with NPC, they differ in scope. The SF-36 [15, 16] is a generic instrument designed for HRQoL measurement in the general population, and the EORTC QLQ-C30 [38, 45] serves as the core QoL module for all patients with cancer. Head and neck cancer–specific modules, including the EORTC QLQ-H&N35 [37], QLICP-HN [33], FACT-H&N [35], and QoL-RTI/H&N [36, 46], are also used to measure QoL in patients with NPC. Specialized QoL scales for patients with NPC include the QLICP-NA [34, 43], FACT-NP [18], QoL-NPC [39, 40], SQoL-NP [32], QoL-NPC V2 [44], QoL-NPC13 [41], and SNOT-22 [42].

3.3. Methodological Quality and Measurement Attribute Quality Evaluation of the Assessment Tools

All included studies examined content validity, construct validity, internal consistency, and stability, but none investigated the instrument’s measurement error or cross-cultural validity. A comprehensive breakdown of the methodological quality and measurement attribute evaluations for the assessment tools is presented in Table 2.

Table 2. Evaluation of methodological quality and measurement attribute quality of included assessment tools.
Instrument | Author, year | Content validity: relevance | Content validity: comprehensiveness | Content validity: comprehensibility | Construct validity: indicators | Construct validity: rating | Internal consistency: Cronbach’s α | Internal consistency: rating | Stability: ICC | Stability: rating | Cross-cultural validity: indicators | Cross-cultural validity: rating | Criterion validity: indicators | Criterion validity: rating | Hypotheses testing: indicators | Hypotheses testing: rating
SF-36 | Gu, 2007 [15] | D/− | NR | NR | EFA: 2 factors | A/? | 0.91 | V/− | NR | NR | NR | NR | NR | NR | NR | D/+
SF-36 | Wu, 2010 [16] | D/− | NR | NR | EFA: 2 factors | A/? | > 0.7 | V/− | NR | NR | NR | NR | NR | NR | NR | D/+
QoL-NPC | Guo, 2008 [17] | Dab/? | Dab/+ | Dab/? | NR | NR | 0.69 | D/− | 0.766 | D/+ | NR | NR | 0.939 | D/+ | NR | A/+
SQoL-NPC | Chen, 1996 [32] | Db/? | NR | NR | EFA: 4 factors | A/? | 0.828 | D/+ | 0.838 | V/+ | NR | NR | 0.504 | D/? | NR | NR
QLICP-HN (V2.0T) | Li, 2017 [33] | Da/? | Da/? | Dab/? | EFA: 4 factors | A/? | 0.646 | V/− | 0.990 | D/+ | NR | NR | 0.756 | D/+ | NR | NR
QLICP-NA V2.0 | Wu, 2015 [34] | Vab/+ | Vab/+ | Vab/+ | EFA: 6 factors | A/? | 0.927 | V/+ | 0.978 | V/+ | NR | NR | 0.706, 0.851 | D/+ | NR | D/+
FACT-H&N | Xiao, 2010 [35] | Dab/? | Dab/? | Dab/? | EFA: 5 factors | A/? | 0.899 | A/+ | 0.395 | V/− | NR | NR | NR | NR | NR | NR
QoL-RTI/H&N | Gu, 2012 [36] | Dab/? | Dab/? | Dab/? | RMSEA = 0, CFI = 1 | V/+ | 0.62–0.79 | A/− | 0.64–0.80 | V/− | NR | NR | NR | NR | NR | NR
QLQ H&N35 | Gu, 2012 [37] | Dab/? | Dab/? | Dab/? | EFA: 6 factors | A/? | 0.67–0.94 | A/+ | 0.7–0.84 | V/+ | NR | NR | NR | NR | NR | D/−
QLQ C30 | Gu, 2012 [38] | Dab/? | Dab/? | Dab/? | EFA: 5 factors | A/? | 0.62–0.89 | A/+ | 0.71–0.87 | V/+ | NR | NR | NR | NR | NR | D/−
QoL-NPC | Gu, 2009 [39] | Aab/+ | Aab/+ | Aab/+ | EFA: 7 factors | A/? | 0.82 | V/+ | NR | I/? | NR | NR | NR | NR | NR | NR
QoL-NPC | Chen, 2010 [40] | Dab/? | Dab/? | Dab/? | RMSEA = 0.071, CFI = 0.95 | V/− | 0.917 | V/+ | NR | I/? | NR | NR | NR | NR | NR | NR
QoL-NPC13 | Chen, 2016 [41] | Aab/+ | Aab/+ | Aab/+ | RMSEA = 0.08, CFI = 0.95 | V/− | 0.832 | V/+ | 0.9–0.94 | V/+ | NR | NR | NR | NR | NR | NR
SNOT-22 | Wu, 2023 [42] | NR | NR | NR | RMSEA < 0.06, CFI > 0.95 | V/+ | ≥ 0.8 | I/− | 0.85–0.97 | V/+ | NR | NR | NR | NR | NR | NR
QLICP-NA V2.0 | Wu, 2016 [43] | Va/+ | Va/+ | Va/? | EFA: 3 factors | D/? | > 0.70 | V/+ | 0.973 | V/+ | NR | NR | NR | NR | NR | NR
FACT-NP | Michael, 2009 [18] | Vab/+ | Vab/+ | Vab/+ | NA | NR/? | 0.87–0.90 | V/+ | 0.73–0.88 | V/+ | NR | NR | 0.47–0.81 | D/− | NR | NR
QoL-NPC V2.0 | Su, 2016 [44] | Vab/+ | Vab/+ | Vab/+ | RMSEA = 0.097, CFI = 0.90 | V/− | 0.77–0.84 | V/+ | 0.82–0.88 | V/+ | NR | NR | NR | NR | NR | NR
Taiwan Chinese version QLQ-C30 and QLQ-H&N35 | Chie, 2003 [45] | NR | NR | NR | NR | NR | QLQ-C30: 0.51–0.90; QLQ-H&N35: > 0.70 | QLQ-C30: V/?; QLQ-H&N35: V/+ | QLQ-C30: 0.33–0.77; QLQ-H&N35: 0.58–0.83 | A/− | NR | NR | QLQ-C30: 0.47–0.74; QLQ-H&N35: 0.40–0.49 | D/− | NR | NR
QoL-RTI/H&N | Chen, 2014 [46] | Vab/+ | Vab/+ | Vab/+ | RMSEA = 0.01, CFI = 1 | V/+ | 0.41–0.77 | V/? | 0.80–0.94 | V/+ | NR | NR | 0.28–0.45 | D/− | NR | NR
  • Note: Methodological quality: “V” = very good, “A” = adequate, “D” = doubtful, “I” = inadequate; quality of measurement properties: “+” = sufficient, “−” = insufficient, “?” = indeterminate.
  • Abbreviations: CFA, confirmatory factor analysis; CFI, comparative fit index; EFA, exploratory factor analysis; ICC, intraclass correlation coefficient; NR, not reported; RMSEA, root mean square error of approximation.
  • a: Patient opinion was sought (superscript on content validity ratings, e.g., “Da”).
  • b: Expert opinion was sought (superscript on content validity ratings, e.g., “Db”).

3.3.1. Content Validity

Most included studies investigated relevance, comprehensiveness, and comprehensibility. The studies on the QLICP-NA [34], FACT-NP [18], QoL-NPC V2 [44], and QoL-RTI/H&N [46] conducted qualitative analyses that systematically evaluated item relevance, comprehensiveness, and comprehensibility through expert consultation and patient feedback. Because these studies [18, 34, 44, 46] described their research process and statistical methods comprehensively, they received a “Very Good” methodological quality rating and a “sufficient” rating for the measurement property.

The relevance, comprehensiveness, and comprehensibility of the items of the QoL-NPC13 [41] and QoL-NPC [39, 40] were evaluated through expert consultation and patient feedback, with adjustments made to fit the Chinese cultural context of patients with NPC. These studies [39–41] described their research process and statistical methods and received an “Adequate” methodological quality rating and a “sufficient” rating for the measurement property.

In contrast, the SF-36 studies [15, 16] addressed content validity but provided no information on comprehensiveness or comprehensibility. Content validity was not assessed for the SNOT-22 [42] or the Taiwanese Chinese versions of the EORTC QLQ-C30 and EORTC QLQ-H&N35 [45]. The remaining studies did not adequately describe the methods used to assess content validity. Consequently, these studies received a “Doubtful” methodological quality rating, and the measurement property was rated “indeterminate.”

3.3.2. Construct Validity

According to the COSMIN guidelines, construct (structural) validity is evaluated using confirmatory factor analysis (CFA) [47]. The construct validity of the scales was assessed in nine studies [15, 16, 32–42, 44, 46], eight of which [15, 16, 33–35, 37–39] used exploratory factor analysis (EFA); the methodological quality of these EFA-based studies was rated “Adequate.” Seven studies [3, 12, 36, 40–42, 44, 46] reported the root mean square error of approximation (RMSEA) and were rated as having “Very Good” methodological quality. However, four studies [15, 18, 33, 45] did not report construct validity, so their methodological quality could not be determined.
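The CFA-based construct validity ratings in Table 2 are consistent with the COSMIN criterion for structural validity, under which a factor model is rated sufficient if CFI > 0.95 or RMSEA < 0.06 (the guideline also accepts TLI > 0.95 or SRMR < 0.08, omitted here for brevity), while EFA results alone are rated indeterminate. The sketch below encodes that rule; the function name and signature are illustrative.

```python
from typing import Optional


def structural_validity_rating(cfi: Optional[float] = None,
                               rmsea: Optional[float] = None) -> str:
    """Rate CFA model fit: '+' sufficient, '-' insufficient, '?' indeterminate."""
    if cfi is None and rmsea is None:
        return "?"  # e.g. only EFA reported, or fit indices missing
    if (cfi is not None and cfi > 0.95) or (rmsea is not None and rmsea < 0.06):
        return "+"
    return "-"


# Fit indices as reported in Table 2 (CFI, RMSEA).
for label, cfi, rmsea in [("QoL-NPC [40]", 0.95, 0.071),
                          ("QoL-NPC V2.0 [44]", 0.90, 0.097),
                          ("QoL-RTI/H&N [46]", 1.00, 0.01)]:
    print(label, structural_validity_rating(cfi, rmsea))
```

Running it reproduces the “−”, “−”, and “+” ratings shown for these three studies in Table 2.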

3.3.3. Internal Consistency

All studies reported the internal consistency of the scales, and most received a methodological quality rating of “Very Good” or “Adequate.” However, three studies [36–38] provided no information on the internal consistency of individual subscales or dimensions, which made their methodological quality difficult to evaluate.
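For reference, Cronbach’s α for one unidimensional (sub)scale can be computed from a respondents × items score matrix as α = k/(k − 1) · (1 − Σ item variances / total-score variance). The sketch below uses sample variances and the COSMIN threshold of α ≥ 0.70 for a sufficient rating. COSMIN additionally requires at least low-quality evidence of sufficient structural validity before internal consistency can be rated sufficient; this helper does not check that, and the toy data are invented.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha; rows are respondents, columns are items of ONE (sub)scale."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [variance(column) for column in zip(*scores)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def internal_consistency_rating(alpha: float) -> str:
    """'+' if alpha >= 0.70, else '-' (structural validity is not checked here)."""
    return "+" if alpha >= 0.70 else "-"


toy = [[2, 3], [4, 5], [6, 4]]  # three respondents, two items (invented data)
print(round(cronbach_alpha(toy), 3))  # 0.571
```

Because α is computed per (sub)scale, a multidimensional instrument needs one α per dimension, which is exactly the information missing from the three studies noted above.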

3.3.4. Stability

Stability was assessed using the intraclass correlation coefficient (ICC) according to the COSMIN guidelines, which require reporting whether patients’ conditions remained stable between measurements, whether the interval between measurements was appropriate, and whether measurement conditions were similar on both occasions [48]. Four studies [15, 16, 39, 40] provided no information on test–retest reliability, and in two studies [17, 32] the appropriateness of the measurement interval was questionable, so their methodological quality was uncertain. In addition, none of the studies reported measurement error or cross-cultural validity.
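The review does not state which ICC model the included studies used. Purely as an illustration, the sketch below computes a two-way, absolute-agreement, single-measure ICC (often written ICC(A,1) or ICC(2,1)) from test and retest scores and applies the COSMIN threshold of ICC ≥ 0.70 for sufficient reliability.

```python
def icc_agreement(data: list[list[float]]) -> float:
    """Two-way random-effects, absolute-agreement, single-measure ICC.

    data: one row per patient, one column per occasion (e.g. test, retest).
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(col) / n for col in zip(*data)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between patients
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between occasions
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def stability_rating(icc: float) -> str:
    """'+' if ICC >= 0.70, else '-'."""
    return "+" if icc >= 0.70 else "-"


# A uniform one-point shift at retest lowers absolute agreement below 1.
print(round(icc_agreement([[1, 2], [2, 3], [3, 4]]), 3))  # 0.667
```

The absolute-agreement form is shown because it penalizes systematic shifts between occasions, which a consistency-type ICC would ignore.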

3.3.5. Criterion Validity

Criterion validity was reported in seven studies [17, 18, 32–34, 45, 46]; however, the comparators treated as the “gold standard” were commonly used QoL measures for patients with cancer, which have methodological limitations in that role. Consequently, the methodological quality of these studies was rated “Doubtful.” Criterion validity was not reported in the remaining 12 studies [15, 16, 35–44].
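Criterion validity is usually summarized as the correlation between the instrument and the comparator treated as the gold standard; under the COSMIN criteria, a correlation of at least 0.70 is rated sufficient. The sketch below computes Pearson’s r directly. Which correlation coefficient each included study used is not reported here, so the choice of Pearson is an assumption.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between instrument scores x and gold-standard scores y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def criterion_rating(r: float) -> str:
    """'+' if the correlation with the gold standard is >= 0.70, else '-'."""
    return "+" if r >= 0.70 else "-"


print(criterion_rating(0.756), criterion_rating(0.47))  # + -
```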

3.3.6. Hypothesis Testing

Hypothesis testing was evaluated according to the COSMIN guidelines, covering convergent validity (expected relationships with other well-established instruments) and discriminative or known-groups validity (expected differences between relevant groups or subgroups) [49]. However, only one study [16] reported hypothesis testing results.
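COSMIN rates hypothesis testing as sufficient when at least 75% of the results agree with hypotheses formulated in advance (for example, an expected correlation with another instrument or an expected difference between known groups). A minimal sketch, assuming each result has already been judged as confirming or not confirming its hypothesis:

```python
def hypotheses_rating(confirmed: list[bool]) -> str:
    """'+' if >= 75% of a-priori hypotheses are confirmed, '-' otherwise,
    '?' if no hypotheses were defined."""
    if not confirmed:
        return "?"
    return "+" if sum(confirmed) / len(confirmed) >= 0.75 else "-"


print(hypotheses_rating([True, True, True, False]))  # +
```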

3.4. Evidence Quality Grading and Recommendations for the Evaluation Results of the Included Assessment Tools

Risk of bias was not reported for the SNOT-22 [42] or the Taiwanese Chinese versions of the EORTC QLQ-C30 and EORTC QLQ-H&N35 [45]; therefore, these were not downgraded. The content validity of the SF-36 [15, 16] was evaluated only on the basis of correlations, leading to a one-level downgrade. The methodological quality of content validity for the QL-NPC [17], SQoL-NPC [32], QLICP-HN [33], FACT-H&N [35], QoL-RTI/H&N [36], QLQ-H&N35 [37], QLQ-C30 [38], and QoL-NPC [39, 40] was rated “Doubtful,” resulting in a one-level downgrade. In contrast, the methodological quality of content validity for the QLICP-NA [34], QoL-NPC13 [41], QLICP-NA V2.0 [43], FACT-NP [18], QoL-NPC V2 [44], and QoL-RTI/H&N [46] was rated “Very Good,” and no downgrade was required (Table 3).

Table 3. Overall ratings, quality of evidence, and level of recommendation for the included assessment tools.
Measurement tools Content validity Construct validity Internal consistency Stability Cross-cultural validity Criterion validity Hypotheses testing Level of recommendation
Overall evaluation Quality of evidence Overall evaluation Quality of evidence Overall evaluation Quality of evidence Overall evaluation Quality of evidence Overall evaluation Quality of evidence Overall evaluation Quality of evidence Overall evaluation Quality of evidence
SF-36 Moderate ? Moderate Moderate NR NR NR NR NR NR + Moderate C
QL-NPC ? Moderate NR NR ? Moderate + Moderate NR NR + Moderate + Moderate B
SQoL-NP ? Moderate ? Moderate + Moderate + High NR NR ? Moderate NR NR B
QLICP-HN (V2.0T) ? Moderate ? Moderate + Moderate + Moderate NR NR + Moderate NR NR B
QLICP-NA + High ? Moderate + High + High NR NR + Moderate + Moderate A
FACT-H&N ? Moderate ? Moderate + High High NR NR NR NR NR NR B
QoL-RTI/H&N ? Moderate + High ? Moderate High NR NR NR NR NR NR B
QLQ H&N35 ? Moderate ? Moderate + High + High NR NR NR NR NR NR B
QLQ C30 ? Moderate ? Moderate + High + High NR NR NR NR NR D/− B
QoL-NPC + Moderate ? Moderate + High ? Low High NR NR NR NR NR B
QoL-NPC13 + High High + High + High NR NR NR NR NR NR B
SNOT-22 NR NR + High Low + High NR NR NR NR NR NR B
QLICP-NA V2.0 + High ? Moderate + High + High NR NR Moderate NR NR B
QoL-NPC V2.0 + High High + High + High NR NR NR NR NR NR A
FACT-NP + High ? Moderate + High + High NR NR Moderate NR NR A
QoL-RTI/H&N + High + High ? High + High NR NR Moderate NR NR C
  • Note: Methodological quality: “V” = very good, “A” = adequate, “D” = doubtful, “I” = inadequate; quality of measured attributes: “+” = sufficient, “−” = insufficient, and “?” = indeterminate.
  • Abbreviations: CFA, confirmatory factor analysis; CFI, comparative fit index; EFA, exploratory factor analysis; ICC, intraclass correlation coefficient; NR, not reported.

Regarding inconsistency, only the QL-NPC [17] reported hypothesis testing for construct validity; hypothesis testing was not reported for the remaining scales, so they were not downgraded for inconsistency. Regarding imprecision, the sample size for the Taiwanese Chinese versions of the EORTC QLQ-C30 and EORTC QLQ-H&N35 [45] was 50, leading to a one-level downgrade, and the sample size for the QL-NPC [17] was 40, leading to a two-level downgrade. Regarding indirectness, the QLICP-HN [33] was developed primarily in patients with head and neck cancer, which may introduce some indirectness, so it was downgraded by one level. The overall downgrading was based on these factors.

Among the 15 scales, QLICP-NA [34], FACT-NP [18], and QoL-NPC V2 [44] are recommended as level A because of their sufficient content validity and internal consistency. The SF-36 [15, 16], QL-NPC [17], SQoL-NP [32], QLICP-HN (V2.0T) [33], FACT-H&N [35], QoL-RTI/H&N [36], QLQ-H&N35 [37], QLQ-C30 [38], QoL-NPC [39], SNOT-22 [40], and the Taiwanese Chinese versions of the EORTC QLQ-C30 and EORTC QLQ-H&N35 [45] are recommended as level B. QoL-NPC13 [41], QLICP-NA V2.0 [43], and QoL-RTI/H&N [46] are recommended as level C.
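The downgrading logic described above can be sketched as a small function following the modified GRADE approach used with COSMIN: evidence starts at "High" and drops one level per downgrade, with imprecision worth two levels when the total sample is below 50 and one level when it is 50–100. This is a simplified sketch, assuming the other factors are supplied as integer downgrade counts; the function names are hypothetical.

```python
LEVELS = ["High", "Moderate", "Low", "Very low"]


def imprecision_downgrade(total_n):
    """Levels to downgrade for imprecision under the modified GRADE approach:
    2 if the total sample is below 50, 1 if it is 50-100, otherwise 0."""
    if total_n < 50:
        return 2
    if total_n <= 100:
        return 1
    return 0


def grade_evidence(total_n, risk_of_bias=0, inconsistency=0, indirectness=0):
    """Start at 'High' and move down one level per downgrade, floor 'Very low'."""
    steps = risk_of_bias + inconsistency + indirectness + imprecision_downgrade(total_n)
    return LEVELS[min(steps, len(LEVELS) - 1)]


# Sample sizes reported in the text, considering imprecision only.
print(grade_evidence(50))   # Moderate  (Taiwanese Chinese EORTC versions, n = 50)
print(grade_evidence(40))   # Low       (QL-NPC, n = 40)
```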

4. Discussion

4.1. Improving the Methodological and Psychometric Quality of Quality of Life Instruments for NPC Patients

To our knowledge, this is the first systematic review to critically analyze and compare the methodological and psychometric qualities of instruments designed to assess HRQoL in patients with NPC using the COSMIN methodology.

A total of 15 different instruments for measuring HRQoL in patients with NPC were identified in this review. According to the COSMIN methodology, an instrument can be recommended for use (level A) if there is evidence of sufficient content validity (at any level of evidence) and at least low-quality evidence of sufficient internal consistency [22]. All instruments showed limitations in reliability and validity. Five tools received an A-level recommendation, eight a B-level recommendation, and two a C-level recommendation. The process of establishing content validity was often overlooked or inadequately described. Content validity, which covers the relevance, comprehensiveness, and comprehensibility of an instrument, is one of the most important psychometric properties [23]. The included studies reported content validity incompletely and in a non-standard way. Comprehensibility for patients with NPC was not reported in the 17 included studies. Some studies invited psychology professors as experts but did not meet the guideline requirements for the number of participants in the qualitative evaluation.
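The recommendation rule can be written out as a short decision procedure. The sketch below reflects our reading of the COSMIN categories [22]: A requires sufficient content validity (any level) plus at least low-quality evidence of sufficient internal consistency; C applies when high-quality evidence shows an insufficient measurement property; everything else is B. Checking C before A is an assumption made here so that a tool with high-quality evidence of an insufficient property is never recommended. The data structure and function name are hypothetical.

```python
QUALITY = {"Very low": 0, "Low": 1, "Moderate": 2, "High": 3}


def recommend(properties):
    """Assign a COSMIN-style recommendation category.

    `properties` maps a measurement property name to (rating, quality), where
    rating is '+', '-' or '?' and quality is a GRADE level.
    """
    # C: high-quality evidence that some measurement property is insufficient.
    if any(r == "-" and q == "High" for r, q in properties.values()):
        return "C"
    cv = properties.get("content validity")
    ic = properties.get("internal consistency")
    # A: sufficient content validity (any level) and at least low-quality
    # evidence of sufficient internal consistency.
    if (cv is not None and cv[0] == "+"
            and ic is not None and ic[0] == "+"
            and QUALITY[ic[1]] >= QUALITY["Low"]):
        return "A"
    return "B"   # everything else


print(recommend({"content validity": ("+", "High"),
                 "internal consistency": ("+", "High")}))      # A
print(recommend({"content validity": ("?", "Moderate"),
                 "internal consistency": ("+", "High")}))      # B
```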

Assessment of scale stability was limited, and the retest designs were not rigorous. Stability encompasses three aspects: test–retest reliability, intra-rater reliability, and inter-rater reliability [50]. Among the 15 tools included in this study, most studies evaluated only test–retest reliability, which may introduce bias. Although the COSMIN guidelines recommend an interval of approximately 2 weeks between test and retest [22], eight studies [17, 32–35, 37, 38, 43] did not meet this requirement and gave no rationale for their chosen interval, and two studies [17, 32] repeated the measurements within 24 h. Future studies should be designed so that the construct being measured remains stable and the measurement conditions on both occasions are similar. Time intervals should be chosen according to the characteristics of the construct and the target population, and the appropriate reliability statistic should be calculated and reported for each data type (e.g., the ICC for continuous scores and weighted kappa for ordinal scores).

Assessment of criterion validity was lacking, and the criteria chosen were inappropriate [51]. Criterion validity refers to the extent to which the scores of an instrument adequately reflect a "gold standard." For QoL in patients with cancer, COSMIN recognizes a gold standard in only one situation: the original long version of a scale can serve as the gold standard for its shortened version. However, most of the included studies [15, 16, 35–43] did not report a gold standard. Future research should improve the assessment of criterion validity and select an appropriate criterion. Reporting of measurement properties was partly missing or incomplete, and none of the included studies evaluated measurement error, which comprises both systematic and random error. The COSMIN guidelines should be used as a reference to improve the evaluation of measurement properties.
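Because none of the included studies evaluated measurement error, a brief sketch of the usual quantities may help future work: the standard error of measurement (SEM), approximated here as SD × √(1 − ICC), and the individual-level smallest detectable change, SDC = 1.96 × √2 × SEM. COSMIN rates measurement error as sufficient when the SDC (or the limits of agreement) is smaller than the minimal important change (MIC). Deriving the SEM from ANOVA variance components is a common alternative to the approximation used here; the SD, ICC, and MIC values below are hypothetical.

```python
from math import sqrt


def sem_from_icc(sd, icc):
    """Standard error of measurement approximated as SD * sqrt(1 - ICC)."""
    return sd * sqrt(1 - icc)


def smallest_detectable_change(sem, z=1.96):
    """Individual-level SDC = z * sqrt(2) * SEM (95% confidence by default)."""
    return z * sqrt(2) * sem


def rate_measurement_error(sdc, mic):
    """COSMIN-style rating: '+' if SDC is smaller than the MIC, otherwise '-'."""
    return "+" if sdc < mic else "-"


# Hypothetical values: SD = 10 points, test-retest ICC = 0.84, MIC = 10 points.
sem = sem_from_icc(10, 0.84)
sdc = smallest_detectable_change(sem)
print(round(sem, 2), round(sdc, 2), rate_measurement_error(sdc, mic=10))
# 4.0 11.09 -
```

In this hypothetical example the SDC (about 11 points) exceeds the MIC, so a change that patients would regard as important could not be distinguished from measurement error in an individual patient.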

4.2. Research on the Measurement Properties of HRQoL Assessment Tools for Patients With NPC Needs Further Improvement

In addition to measurement properties, the development context and target population of each tool should be considered. In this review, the SF-36 was developed for the general population, the QLQ-C30 for patients with cancer in general, and the HN scale for patients with head and neck cancer; these were later adapted for use in patients with NPC. Although these scales have good reliability and validity, they lack specificity for NPC, so the specific characteristics of the target population should be considered when selecting tools in the future. There are also shortcomings in the localization process: 12 studies [15–17, 32, 35–38, 40, 42, 45, 46] used Chinese versions of the scales but did not describe the localization process in detail. According to the COSMIN guidelines, cognitive interviews can be added to better understand patients' perspectives and attitudes, improving the alignment of the tool with the construct being measured and the appropriateness and comprehensibility of each item [52, 53]. Cross-cultural validity assessment was also lacking; the primary methods for evaluating it are testing measurement invariance and assessing differential item functioning (DIF) [54]. Cross-cultural reliability and validity were not reported in these 12 studies [15–17, 32, 35–38, 40, 42, 45, 46]. When scales are developed or introduced from other countries in the future, cross-cultural validity testing and DIF assessment of each item are essential, and all variables other than the grouping variable should be as similar and balanced as possible [54]. Furthermore, sample sizes should be sufficient according to the COSMIN guidelines, and scale validation should be conducted in a standardized manner.

4.3. QLICP-NA, FACT-NP, and QoL-NPC V2 Are Level-A Recommendations and Can Be Prioritized for Use

This systematic review identified 16 tools, three of which are different versions of other included tools. Three received A-level recommendations [18, 33, 44], 12 received B-level recommendations [15–17, 32, 33, 35–40, 45], and three received C-level recommendations [41, 43, 46]. Our results show that QLICP-NA [34], FACT-NP [18], and QoL-NPC V2 [44] were evaluated relatively completely, with high-quality evidence supporting their content validity, structural validity, and internal consistency. Moreover, these three scales were developed specifically for patients with NPC. Because patients with NPC have distinctive characteristics compared with patients with other cancers, including other head and neck cancers, these scales cover NPC-specific components with appropriate items and high sensitivity. They have good relevance and clinical applicability and can serve as effective adjunct tools for assessing the QoL of patients with NPC in clinical practice. However, their measurement properties require further evaluation.
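Internal consistency, one of the two properties behind an A-level recommendation, is usually summarized with Cronbach's alpha, α = k/(k − 1) × (1 − Σσ²ᵢ / σ²ₜₒₜₐₗ), where k is the number of items. COSMIN rates it as sufficient when α ≥ 0.70 for each unidimensional scale or subscale, provided there is at least low-quality evidence of sufficient structural validity. The sketch below computes alpha for a single subscale; the responses and function names are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha for one unidimensional (sub)scale.

    `responses` holds one row per respondent with that respondent's item scores.
    """
    k = len(responses[0])   # number of items
    item_vars = [pvariance([row[j] for row in responses]) for j in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def rate_internal_consistency(alpha, threshold=0.70):
    """COSMIN-style rating: '+' if alpha >= 0.70, otherwise '-'."""
    return "+" if alpha >= threshold else "-"


# Hypothetical 3-item subscale answered by five patients (1-4 Likert scores).
subscale = [[1, 2, 1], [2, 2, 3], [3, 3, 3], [4, 3, 4], [2, 1, 2]]
alpha = cronbach_alpha(subscale)
print(round(alpha, 2), rate_internal_consistency(alpha))  # 0.9 +
```

Population variances are used throughout; because the same divisor appears in the numerator and denominator, sample variances would give the same alpha.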

4.4. Strengths and Limitations

The strength of this study is its rigorous evaluation of the methodological and psychometric properties of HRQoL assessment tools for patients with NPC following the COSMIN guidelines. Recommendations based on these properties and levels of evidence could help researchers and decision-makers select the most appropriate tools for practical use. Additionally, common issues with these scales were identified, providing valuable insights for the future development or introduction of HRQoL assessment tools for patients with NPC and ensuring that they are more rigorous and standardized.

However, this study has several limitations. First, it was limited to studies published in Chinese and English, so tools and studies in other languages were excluded. Second, most of the specific scales included in this study were developed for patients with head and neck cancer and were then applied to patients with NPC. It is crucial to recognize the unique symptoms that affect the QoL of patients with NPC, and future research should address this aspect comprehensively.

5. Conclusion

This systematic review compared assessment tools for HRQoL in patients with NPC according to the COSMIN guidelines. QLICP-NA [34], FACT-NP [18], and QoL-NPC V2 [44] were recommended as level A assessment tools, whereas the remaining scales showed various methodological problems. Future development or introduction of HRQoL assessment tools for patients with NPC should avoid the limitations identified in this study, adhere strictly to the COSMIN guidelines, prioritize methodological and psychometric quality, and report results comprehensively, so that the tools are rigorous and standardized. Whichever tool researchers or decision-makers choose, further validation in different settings and populations is recommended to ensure optimal performance across contexts.

Ethics Statement

The study protocol has been registered on the PROSPERO website (registration number CRD42024567365). This study is a systematic review with no patient or public participation.

Disclosure

This study was not commissioned and was externally peer reviewed.

Conflicts of Interest

The authors declare no conflicts of interest.

Author Contributions

Jianxia Lyu and Li Yin proposed the research question and designed the theoretical framework of the systematic review; Jianxia Lyu, Hao Zhang, and Yunhua Jing performed the systematic literature search, quality assessment, and data extraction; Jianxia Lyu, Hao Zhang, and Li Yin wrote the manuscript; Shichuan Zhang, Qing Yang, and Aiping Wang critically revised the manuscript for important intellectual content. All authors read and approved the final manuscript and consented to participate. Jianxia Lyu, Li Yin, and Hao Zhang contributed equally to this manuscript.

Funding

This work was supported by the Chengdu Science and Technology Project (2024-FY05-01307-SN), the Sichuan Nursing Society (H22023), and the Sichuan Medical Association (S23058).

Acknowledgments

We thank Home for Researchers editorial team (https://www.home-for-researchers.com) for language editing service. The authors declare that no AI tools were used in the design, execution, or analysis of this study.

Data Availability Statement

The data that support the findings of this study are available from the corresponding authors upon reasonable request.
