Development of a local nomogram-based scoring system for predicting overall survival in idiopathic pulmonary fibrosis: A rural Appalachian experience
Abstract
Background
Accurate staging systems are essential for assessing the severity of idiopathic pulmonary fibrosis (IPF) and guiding clinical management. This study aimed to evaluate the prognostic value of pulmonary comorbidities and body mass index (BMI) in IPF, develop a nomogram predicting overall survival (OS), and create a nomogram-based survival prediction model.
Methods
Patients with IPF were identified from the electronic medical records of the West Virginia University Medicine hospital system. Least Absolute Shrinkage and Selection Operator (LASSO) Cox regression analysis was used for variable selection, and a nomogram was constructed. Risk groups were defined based on the nomogram's probability tertiles. The performance of the nomogram-based model was evaluated using Harrell's concordance index (C-index) and the Hosmer–Lemeshow test.
Results
The study included 152 patients with IPF. The majority of the patients were elderly, male, and had a BMI above 24 kg/m². The median survival duration was 7.6 years. The survival rates were 91% at 1 year, 78% at 3 years, and 68% at 5 years. LASSO regression selected carbon monoxide lung diffusion capacity percentage predicted (DLco%), BMI, pulmonary hypertension, pulmonary embolism, and sleep apnea as independent predictive variables. The nomogram demonstrated good discrimination (C-index = 0.71) and calibration.
Conclusions
Pulmonary comorbidities and BMI have significant prognostic value in IPF, emphasizing the necessity for consistent screening, assessment, and management of these factors in IPF care. Furthermore, the nomogram-based staging system showed promising performance in predicting OS and represents an actionable staging system that could potentially improve clinical management in IPF. Further validation of the nomogram is warranted to confirm its utility in clinical practice.
Abbreviations
- BMI: body mass index
- C-index: Harrell's concordance index
- DLco%: carbon monoxide lung diffusion capacity percentage predicted
- FVC%: forced vital capacity percentage predicted
- GAP index: the gender-age-physiological index
- HR: hazard ratio
- ICD: International Classification of Diseases
- IPF: idiopathic pulmonary fibrosis
- LASSO: Least Absolute Shrinkage and Selection Operator
- OS: overall survival
- PFTs: pulmonary function tests
- Ref: reference group
1 INTRODUCTION
The progression of idiopathic pulmonary fibrosis (IPF) is influenced by disease severity at presentation. Therefore, accurate assessment of IPF disease severity using valid staging systems is integral to the clinical management of patients with IPF. Unfortunately, there is no widely recognized staging system for IPF [1]. Currently, the most commonly used prognostic index for IPF is the Gender-Age-Physiological (GAP) index [2]. The GAP index classifies patients into three categories based on sex, age, and two physiological variables: forced vital capacity percentage predicted (FVC%) and carbon monoxide lung diffusion capacity percentage predicted (DLco%). However, age and sex are already incorporated in the percentage predicted values of lung function estimates [1]. Moreover, knowledge of IPF prognosis has evolved over the last decade, and it is now recognized that the prognosis of patients with IPF also depends on their concomitant comorbidities.
As IPF occurs mainly in elderly patients, it is often associated with comorbidities that can significantly affect their health outcomes [3]. Thus, managing comorbid conditions is crucial in the comprehensive care of patients with IPF [4]. Pulmonary comorbidities, such as pulmonary hypertension, pulmonary embolism, sleep apnea, emphysema, lung cancer, and malnutrition, have been consistently associated with shorter survival in IPF [4]. However, research on the value of comorbidities for improving survival prediction in IPF remains scarce. Torrisi et al. found that comorbidities (gastroesophageal reflux, pulmonary hypertension, atrial arrhythmias, lung cancer, and valvular heart disease) significantly improved the prediction of risk of death when added to GAP [5]. In addition, body mass index (BMI) is one of the most widely accepted measures for assessing malnutrition, which is common among patients with IPF, and previous work found that adding BMI improved the performance of GAP in patients with IPF [6, 7]. Thus, including comorbidities in the staging system could improve the real-life clinical management of IPF.
Tools that enable clinicians to accurately evaluate a patient's situation, also known as decision aids, are crucial for making final decisions on clinical management [8]. Among the available decision aids, nomograms are considered the most accurate tools for predicting outcomes, and their superior performance compared with risk grouping has been well documented [9]. Risk stratification is a standard procedure in clinical practice to categorize patients into risk groups. However, when continuous variables are used to stratify patients, group assignment becomes less certain for patients with values close to the cutoff points. In this situation, a nomogram is preferable because it allows continuous variables to be integrated into risk prediction [10]. Nomograms provide a reliable visualization of multivariable prognostic regression models that quantify the risk of a clinical event. In nomogram construction, the regression coefficients, which represent the weights of the different variables, determine the length and scale of the corresponding variable axes [10]. In other words, nomograms take the weights of parameters into account, whereas conventional staging systems do not. Given these considerations, this study aimed to (1) investigate the prognostic value of pulmonary comorbidities and BMI (as a proxy for malnutrition) along with pulmonary function tests (PFTs), (2) establish a nomogram predicting the 5-year survival probability in IPF, and (3) develop a nomogram-based survival prediction model.
2 METHODS
2.1 Patient participants
Patients with IPF were identified using International Classification of Diseases (ICD) codes (ICD-9-CM: 515–516.9; ICD-10-CM: J84–J84.9) in the electronic medical records of the West Virginia University Medicine hospital system from January 1, 2015, to December 31, 2019. The West Virginia University Medicine system includes both tertiary centers and peripheral hospitals. The study inclusion criteria were based on the latest American Thoracic Society/European Respiratory Society/Japanese Respiratory Society/Latin American Thoracic Association (ATS/ERS/JRS/ALAT) diagnostic guidelines [11]. Details regarding study patient selection have been published elsewhere [12, 13].
2.2 Data collection
Demographic and clinical data were extracted manually from electronic medical records at the time of the first available PFTs closest to the recorded ICD code. Extracted data included age (≤60/61–65/>65 years), gender (male/female), BMI (>24/≤24 kg/m²), lung transplant (yes/no), pulmonary comorbidities, including pulmonary hypertension (absent/present), lung cancer (absent/present), emphysema (absent/present), pulmonary embolism (absent/present), and sleep apnea (absent/present), PFTs, including FVC% (>75%/50%–75%/<50%/missing/unknown) and DLco% (>55%/36%–55%/≤35%/missing/unknown), and anti-fibrotic use (yes/no). Smoking status was deliberately not included as a variable in the analysis, because respiratory research has suggested a potential bias known as the “healthy smoker effect,” whereby current smokers may have a better prognosis than never or former smokers, even after adjusting for individual and socioeconomic variables [14, 15]. Emphysema diagnosis was confirmed by reviewing computed tomography scans. The cutoff points for BMI and PFTs parallel those used in the GAP and GAP-Plus-BMI staging systems [2, 7]. Except for PFTs, there were no missing demographic or clinical data. To assess the pattern of missing PFT data, we compared overall survival (OS) between patients with available PFT data and those with missing PFT values using Kaplan–Meier curves (Figure S1). Patients with missing PFT values had significantly worse OS than those with available PFT data, strongly suggesting that PFT data were not missing at random.
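The Kaplan–Meier comparison of patients with and without PFT data relies on the product-limit estimator. A minimal Python sketch follows; the function name `kaplan_meier` and the toy follow-up times are hypothetical and are not study data, and the authors' actual analysis was run in STATA and R rather than with this code.

```python
from itertools import groupby


def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) estimate of the survival function.

    times  -- follow-up time for each patient (e.g., years)
    events -- 1 if death was observed at that time, 0 if censored
    Returns (time, survival) pairs at each distinct time with >= 1 death.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    for t, group in groupby(data, key=lambda pair: pair[0]):
        group = list(group)
        deaths = sum(event for _, event in group)
        if deaths:
            # Conditional probability of surviving past t, given at risk at t
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Patients who died or were censored at t leave the risk set afterwards
        n_at_risk -= len(group)
    return curve


# Hypothetical toy data (not study data): one curve per group, as in Figure S1
available = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
missing = kaplan_meier([1, 1, 2, 3], [1, 1, 1, 0])
```

Ties follow the usual convention: a patient censored at a death time is still counted as at risk at that time, which is why the risk set is reduced only after the deaths at `t` have been counted.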
Because the missing at random (MAR) assumption required for multiple imputation or complete case analysis was not satisfied, we adopted a coding approach in which missing PFT values were designated as “unknown” [16-18]. This approach aligns with the GAP index methodology and enhances the nomogram's practicality and applicability to real-world clinical scenarios, where PFT data are frequently missing because of challenges in adhering to guidelines and patients' deteriorating ability to perform these tests [18-20]. This coding strategy allowed us to explicitly acknowledge and account for missing data in our analysis. Data were aggregated using REDCap, a HIPAA-compliant data aggregation tool [21].
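The “unknown” coding can be sketched as a categorization step followed by indicator (dummy) coding, with the first level as the reference group. This Python sketch assumes the GAP-style cutoffs listed above and treats `None` or NaN as missing; the function and constant names are hypothetical, and because the text does not say how values between 35% and 36% were handled, the boundary used here is an assumption.

```python
import math
from typing import List, Optional

# Reference level listed first; the order mirrors Table 2
DLCO_LEVELS = [">55%", "36%-55%", "<=35%", "unknown"]
FVC_LEVELS = [">75%", "50%-75%", "<50%", "unknown"]


def _is_missing(value: Optional[float]) -> bool:
    return value is None or math.isnan(value)


def dlco_category(dlco_pct: Optional[float]) -> str:
    """Categorize DLco% with GAP-style cutoffs; missing values become 'unknown'."""
    if _is_missing(dlco_pct):
        return "unknown"
    if dlco_pct > 55:
        return ">55%"
    if dlco_pct > 35:  # assumption: values in (35, 55] form the 36%-55% band
        return "36%-55%"
    return "<=35%"


def fvc_category(fvc_pct: Optional[float]) -> str:
    """Categorize FVC% with GAP-style cutoffs; missing values become 'unknown'."""
    if _is_missing(fvc_pct):
        return "unknown"
    if fvc_pct > 75:
        return ">75%"
    if fvc_pct >= 50:
        return "50%-75%"
    return "<50%"


def dummy_code(category: str, levels: List[str]) -> List[int]:
    """Indicator columns for every non-reference level (levels[0] is the reference)."""
    return [1 if category == level else 0 for level in levels[1:]]
```

Keeping “unknown” as its own level, rather than dropping those rows, lets the Cox model estimate a separate hazard ratio for patients without PFT data, which is how Table 2 reports it.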
2.3 Follow-up and outcome assessment
The primary endpoint was OS. The cohort was followed up until June 2022.
2.4 Statistical analysis
This study aimed to assess the association between independent predictors and OS. Variables were described using numbers and percentages. For data dimension reduction and variable selection, Least Absolute Shrinkage and Selection Operator (LASSO) Cox regression with cross-validation was used to identify the best subset of predictive variables for the regression models [22]. The regularization parameter λ (lambda), which controls the strength of the penalty applied to the model, was selected by tenfold cross-validation and determined the best subset of variables. A smaller λ applies less shrinkage, allowing coefficients to take larger values. A larger λ increases the amount of regularization, setting more coefficients to exactly zero and thus producing a sparser model. The significance of each variable in the best subset was evaluated by Cox regression analysis. The proportional hazards assumption was confirmed by checking the scaled Schoenfeld residuals [23]. All selected variables were entered into the multivariable Cox proportional hazards regression model, on which the nomogram was based. Values of each model covariate were mapped onto a scale ranging from 0 to 100 points, and the total points summed across covariates were mapped to the 5-year survival probability associated with that combination of covariate values. In addition, calibration curves were plotted to assess the predictive accuracy of the nomogram. For clinical use of the model, a risk classification system was established based on each patient's predicted survival probability.
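Why a larger λ produces a sparser model can be seen from the soft-thresholding operator at the core of LASSO solvers. In the special case of least squares with orthonormal predictors (objective ½‖y − Xβ‖² + λ‖β‖₁), the LASSO estimate is exactly the soft-thresholded unpenalized estimate; coordinate-descent solvers such as the one in glmnet apply the same operator inside each coordinate update, including for the Cox partial likelihood. The Python sketch below uses that simplified case with hypothetical coefficients; it is a conceptual illustration, not the study's LASSO Cox fit.

```python
def soft_threshold(z: float, lam: float) -> float:
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_orthonormal(raw_coefs: dict, lam: float) -> dict:
    """L1-penalized estimates for orthonormal predictors in least squares.

    In this special case the lasso solution is the soft-thresholded
    unpenalized estimate. Coordinate-descent solvers apply the same
    operator one coefficient at a time for general designs.
    """
    return {name: soft_threshold(b, lam) for name, b in raw_coefs.items()}


def n_selected(coefs: dict) -> int:
    """Number of predictors whose coefficient survived the penalty."""
    return sum(1 for b in coefs.values() if b != 0.0)


# Hypothetical unpenalized coefficients (illustration only, not study estimates)
raw = {"x1": 0.90, "x2": -0.40, "x3": 0.05}
for lam in (0.0, 0.1, 0.5, 1.0):
    print(lam, n_selected(lasso_orthonormal(raw, lam)))
```

Running it prints the number of retained predictors for each λ: 3, 2, 1, and 0, respectively. Small effects are zeroed first, which is the sparsity behavior that cross-validation trades off against predictive performance when choosing λ.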
Patients were categorized into three prognostic groups according to the following nomogram probability tertiles: >66.6% 5-year survival probability (low risk), 33.3%–66.6% (moderate risk), and <33.3% (high risk). Because of the retrospective design, entry times differed among patients and censoring times therefore varied, which may introduce potential bias. However, Kaplan–Meier and Cox regression analyses, which are likelihood-based, are commonly used to handle time-to-event outcomes and provide unbiased estimates when censoring is random (non-informative) [24]. Kaplan–Meier curves were constructed to analyze differences in OS between risk groups, and hazard ratios were estimated using Cox proportional hazards models. Harrell's concordance index (C-index) was used to examine the discrimination of the fitted model; a C-index of 0.5 indicates discrimination no better than chance, and 1.0 indicates perfect discrimination [25]. The Hosmer–Lemeshow test was performed, with p > 0.05 indicating good fit (a well-calibrated model). Statistical analyses were performed using STATA (version 16) and R (version 3.4.4). The “glmnet,” “rms,” and “survival” packages were used for variable selection, survival analysis, and building the nomogram. The R code used for variable selection and nomogram development is available in Table S1. All tests were two-sided, and a p-value <0.05 was considered statistically significant. The reporting of this prognostic model study followed the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement [26].
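Harrell's C-index counts, among all usable patient pairs, how often the patient who died first had the higher predicted risk. The Python sketch below is a plain O(n²) version under a stated simplification (pairs with tied follow-up times are skipped); the function name is hypothetical, and the values reported in this study were obtained with R/STATA, whose tie handling may differ. When the model output is a predicted survival probability, 1 minus that probability (or its negative) should be passed as the risk.

```python
from itertools import combinations


def harrell_c(times, events, risk):
    """Harrell's concordance index for right-censored survival data.

    times  -- follow-up times
    events -- 1 = death observed, 0 = censored
    risk   -- predicted risk (higher = worse prognosis), e.g. a linear predictor
    A pair is usable when the patient with the shorter time had the event;
    it is concordant when that patient also has the higher predicted risk.
    Ties in risk count 1/2. Pairs with tied times are skipped (a simplification).
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier patient censored: true ordering is unknown
        usable += 1
        if risk[first] > risk[second]:
            concordant += 1.0
        elif risk[first] == risk[second]:
            concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable
```

A model that assigns every patient the same risk scores exactly 0.5, which matches the "random chance" reference point described above.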
3 RESULTS
3.1 Characteristics of patients
A total of 152 patients with IPF were included in the study. The majority of patients were elderly (median age 77 years), male (59%), had a BMI above 24 kg/m² (78%), and were on supplemental oxygen (78%). Around one-third of the study population received anti-fibrotics (37%). The most common pulmonary comorbidities were emphysema (40%), pulmonary hypertension (36%), and sleep apnea (35%). Systemic hypertension (78%), hyperlipidemia (73%), and gastroesophageal reflux disease (70%) were the most common non-pulmonary comorbidities. Seven patients (4.6%) underwent lung transplantation. The patient characteristics are shown in Table 1.
| Characteristic | Total (N = 152) | Alive (N = 87) | Deceased (N = 65) |
|---|---|---|---|
| Gender | | | |
| Female | 63 (41.4%) | 36 (41.4%) | 27 (41.5%) |
| Male | 89 (58.6%) | 51 (58.6%) | 38 (58.5%) |
| Age (years) | | | |
| Mean (SD) | 75.7 (9.97) | 75.1 (9.53) | 76.5 (10.5) |
| Median [min, max] | 77.0 [49.0, 99.0] | 76.0 [51.0, 95.0] | 78.0 [49.0, 99.0] |
| BMI (kg/m²) | | | |
| ≤24 | 34 (22.4%) | 15 (17.2%) | 19 (29.2%) |
| >24 | 118 (77.6%) | 72 (82.8%) | 46 (70.8%) |
| FVC% | | | |
| >75% | 48 (31.6%) | 32 (36.8%) | 16 (24.6%) |
| 50%–75% | 53 (34.9%) | 30 (34.5%) | 23 (35.4%) |
| <50% | 12 (7.9%) | 4 (4.6%) | 8 (12.3%) |
| Unknown/missing | 39 (25.7%) | 21 (24.1%) | 18 (27.7%) |
| DLco% | | | |
| >55% | 30 (19.7%) | 21 (24.1%) | 9 (13.8%) |
| 36%–55% | 46 (30.3%) | 30 (34.5%) | 16 (24.6%) |
| ≤35% | 28 (18.4%) | 12 (13.8%) | 16 (24.6%) |
| Unknown/missing | 48 (31.6%) | 24 (27.6%) | 24 (36.9%) |
| Pulmonary comorbidities | | | |
| Pulmonary hypertension | 54 (35.5%) | 20 (23.0%) | 34 (52.3%) |
| Pulmonary embolism | 12 (7.9%) | 3 (3.4%) | 9 (13.8%) |
| Lung cancer | 7 (4.6%) | 4 (4.6%) | 3 (4.6%) |
| Sleep apnea | 53 (34.9%) | 27 (31.0%) | 26 (40.0%) |
| Emphysema | 60 (39.5%) | 32 (36.8%) | 28 (43.1%) |
| Non-pulmonary comorbidities | | | |
| GERD | 107 (70.4%) | 61 (70.1%) | 46 (70.8%) |
| Diabetes | 52 (34.2%) | 30 (34.5%) | 22 (33.8%) |
| Hyperlipidemia | 111 (73.0%) | 63 (72.4%) | 48 (73.8%) |
| CAD | 82 (53.9%) | 44 (50.6%) | 38 (58.5%) |
| Systemic hypertension | 118 (77.6%) | 64 (73.6%) | 54 (83.1%) |
| CKD | 37 (24.3%) | 11 (12.6%) | 26 (40.0%) |
| Smoking status | | | |
| Never | 46 (30.3%) | 27 (31.0%) | 19 (29.2%) |
| Former | 94 (61.8%) | 50 (57.5%) | 44 (67.7%) |
| Current | 12 (7.9%) | 10 (11.5%) | 2 (3.1%) |
| Antifibrotic use | 57 (37.5%) | 34 (39.1%) | 23 (35.4%) |
| Lung transplant | 7 (4.6%) | 7 (8.0%) | 0 (0%) |
| Oxygen use | 119 (78.3%) | 62 (71.3%) | 57 (87.7%) |
Abbreviations: BMI, body mass index; CAD, coronary artery disease; CKD, chronic kidney disease; DLco%, carbon monoxide lung diffusion capacity percentage predicted; FVC%, forced vital capacity percentage predicted; GERD, gastroesophageal reflux disease.
3.2 Prognostic data
By the end of follow-up, 65 of the 152 patients (42.8%) had died. The median survival was 5.8 years. The survival rates were 91% at 1 year, 78% at 3 years, and 68% at 5 years. LASSO regression with 10-fold cross-validation was used for variable selection; the most parsimonious model with excellent performance was obtained at λ = 0.019 (Figure S2). The selected variables were DLco%, BMI, pulmonary hypertension, pulmonary embolism, and sleep apnea. A Cox regression model was then fitted using the parameters selected by LASSO regression (Table 2). The C-index was 0.71 (95% CI: 0.63–0.77) for OS.
| Variable | Unadjusted HR (95% CI) | p-value | Adjusted HR (95% CI) | p-value |
|---|---|---|---|---|
| DLco% | | | | |
| >55% | Ref | | Ref | |
| 36%–55% | 1.18 (0.46–2.99) | 0.73 | 1.23 (0.48–3.15) | 0.66 |
| ≤35% | 3.08 (1.25–7.58) | 0.014 | 2.95 (1.17–7.44) | 0.022 |
| Unknown | 3.25 (1.39–7.62) | 0.007 | 3.58 (1.48–8.66) | 0.005 |
| BMI (kg/m²) | | | | |
| >24 | Ref | | Ref | |
| ≤24 | 2.06 (1.19–3.56) | 0.01 | 2.29 (1.27–4.17) | 0.006 |
| Pulmonary hypertension | | | | |
| Absent | Ref | | Ref | |
| Present | 2.39 (1.43–4.02) | 0.001 | 2.08 (1.22–3.55) | 0.007 |
| Pulmonary embolism | | | | |
| Absent | Ref | | Ref | |
| Present | 2.46 (1.16–5.21) | 0.019 | 2.96 (1.32–6.59) | 0.008 |
| Sleep apnea | | | | |
| Absent | Ref | | Ref | |
| Present | 1.19 (0.71–2.03) | 0.49 | 1.92 (1.09–3.39) | 0.025 |
Abbreviations: BMI, body mass index; CI, confidence interval; DLco, carbon monoxide lung diffusion capacity; HR, hazard ratio; IPF, idiopathic pulmonary fibrosis; OS, overall survival.
3.3 Nomogram development
We constructed the nomogram based on five independent variables: DLco%, BMI, pulmonary hypertension, pulmonary embolism, and sleep apnea (Figure 1). Each variable was mapped onto a scale from 0 to 100 points, and total points ranged from 0 to 350. The total points are then read against the survival axis to obtain the 5-year survival probability. The calibration curve indicated that predicted survival rates were consistent with observed survival rates (Figure 2).
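How coefficients become nomogram points can be approximated from the adjusted hazard ratios in Table 2. In the Python sketch below, each level's points are proportional to its log hazard ratio, and the variable with the widest range of effects is scaled to span 0 to 100 points; this is similar in spirit to the default scaling of R's rms package, but it is an assumption and not the authors' code. Because it starts from rounded published hazard ratios, it approximates rather than reproduces Figure 1: it gives about 16 points for DLco% 36%–55% and 85 for pulmonary embolism (versus 17 and 87 in the figure), and a maximum total of roughly 359.

```python
import math

# Adjusted hazard ratios from Table 2; reference levels have HR = 1
ADJUSTED_HR = {
    "DLco%": {">55%": 1.0, "36%-55%": 1.23, "<=35%": 2.95, "unknown": 3.58},
    "BMI": {">24": 1.0, "<=24": 2.29},
    "pulmonary hypertension": {"absent": 1.0, "present": 2.08},
    "pulmonary embolism": {"absent": 1.0, "present": 2.96},
    "sleep apnea": {"absent": 1.0, "present": 1.92},
}


def points_table(hazard_ratios):
    """Convert log hazard ratios to nomogram points.

    Each level's points are proportional to its log-HR minus the smallest
    log-HR of that variable; the variable with the widest range of effects
    is scaled to span exactly 0-100 points.
    """
    log_hr = {
        var: {level: math.log(hr) for level, hr in levels.items()}
        for var, levels in hazard_ratios.items()
    }
    widest = max(max(e.values()) - min(e.values()) for e in log_hr.values())
    scale = 100.0 / widest
    return {
        var: {level: scale * (b - min(effects.values())) for level, b in effects.items()}
        for var, effects in log_hr.items()
    }


def total_points(table, patient):
    """Sum the points for one patient's covariate levels."""
    return sum(table[var][level] for var, level in patient.items())


table = points_table(ADJUSTED_HR)
# Hypothetical patient from the Figure 1 caption (DLco% 49%, BMI 26, pulmonary embolism)
example = {
    "DLco%": "36%-55%",
    "BMI": ">24",
    "pulmonary hypertension": "absent",
    "pulmonary embolism": "present",
    "sleep apnea": "absent",
}
print(round(total_points(table, example)))  # approximately 101 (Figure 1: 104)
```

Converting total points to a 5-year survival probability additionally requires the model's baseline survival at 5 years, which is not reported here, so the sketch stops at the points scale.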

Nomogram used to predict 5-year survival probability in patients with IPF. The usage of the nomogram is illustrated in a hypothetical patient with DLco% of 49%, BMI of 26, and pulmonary embolism. According to the nomogram, points for DLco%, BMI, and pulmonary embolism were 17, 0, and 87, respectively. The total points added up to 104 for this patient, representing approximately 74% 5-year survival probability. BMI, body mass index; DLco%, carbon monoxide lung diffusion capacity percentage predicted; IPF, idiopathic pulmonary fibrosis.

Calibration curve of overall survival at 5 years. The nomogram-predicted probability of survival is plotted on the x-axis, and the observed survival on the y-axis. The dashed 45-degree line through the origin represents perfect calibration, where predicted probabilities are identical to observed probabilities.
Risk groups were assigned based on the 5-year survival probability tertiles derived from the nomogram. The median follow-up durations for the low-, moderate-, and high-risk groups were 6.72, 6.11, and 5.56 years, respectively. Univariable Cox regression revealed significant differences in OS between risk groups (Figure 3). The C-index of the nomogram-based risk score was 0.71 (95% CI: 0.65–0.78) for OS, indicating stable and favorable discrimination. Moreover, the Hosmer–Lemeshow test showed good agreement between predicted and observed outcomes (p > 0.05).

Kaplan–Meier survival curves of a nomogram-based scoring system. One hundred and fifty-two patients were categorized into three prognostic groups according to the nomogram probability tertiles: patients with >66.6% 5-year survival probability as “low-risk group,” 33.3%–66.6% 5-year survival probability as “moderate risk group” and <33.3% 5-year survival probability as “high-risk group.” HR, hazard ratio; MST, median survival time; Ref, reference group.
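The grouping rule described above is a fixed partition of the predicted 5-year survival probability into thirds. A minimal Python sketch follows, with a hypothetical function name; the text does not specify how values lying exactly on a cutoff are handled, so that choice is an assumption. For example, the hypothetical patient in the Figure 1 caption (approximately 74% predicted 5-year survival) would be classified as low risk.

```python
def risk_group(surv_5y: float) -> str:
    """Map a nomogram-predicted 5-year survival probability (0-1) to a risk group.

    Cutoffs follow the text: >66.6% low, 33.3%-66.6% moderate, <33.3% high.
    Boundary values (exactly 0.666 or 0.333) are assigned to the moderate
    group, which is an assumption.
    """
    if not 0.0 <= surv_5y <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if surv_5y > 0.666:
        return "low"
    if surv_5y >= 0.333:
        return "moderate"
    return "high"
```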
4 DISCUSSION
Accurate risk estimation could make patient management more effective. Unfortunately, because IPF is remarkably heterogeneous with respect to prognosis, survival prediction using current staging systems is imprecise [27]. In this study, we developed a nomogram and derived a risk-staging system using local inpatient data from the West Virginia University Medicine hospital system. We identified DLco%, BMI (as a proxy for malnutrition), pulmonary hypertension, pulmonary embolism, and sleep apnea as independent prognostic factors for OS. These findings align with earlier research on risk factors for mortality among patients with IPF. Previous studies characterized the rural Appalachian IPF patient population, which is also uniquely challenged by one of the highest cigarette smoking rates nationally. While Sangani et al. identified BMI and pulmonary hypertension as significant prognostic predictors in this IPF cohort, the current study explored additional prognostic factors and examined their relative importance using a nomogram [12, 13].
Several studies have suggested that reduced DLco% is associated with higher death rates and worse outcomes in patients with IPF [28]. However, FVC% was not selected as an independent prognostic factor for inclusion in the nomogram. One plausible explanation is its high collinearity with DLco%: LASSO regression tends to select one representative predictor from a set of highly correlated variables while shrinking the coefficients of the others. Another explanation could be the high proportion of patients with emphysema in our sample (40%); emphysema may alter the physiological characteristics of IPF because it mitigates the impact of fibrosis on ventilatory physiology and delays lung function decline [29]. Thus, FVC% may have limited predictive ability in the presence of emphysema, consistent with results reported in the literature. For example, Kim et al. showed that low DLco%, but not FVC%, was independently associated with higher death rates in patients with combined pulmonary fibrosis and emphysema [30]. In a previous study, changes in DLco% over time were shown to be a better predictor of mortality than changes in FVC% [31]. Notably, to the best of the authors' knowledge, no clinical trials in IPF have used DLco% as a marker for monitoring pulmonary function. Therefore, our findings support the inclusion of DLco% as an endpoint in clinical trials, which could improve the precision of results.
The prognostic value of BMI in IPF has been investigated in several studies. Consistent with our findings, multiple studies have shown that lower BMI is associated with worse outcomes and increased mortality in IPF [13, 32, 33], a phenomenon often referred to as the “obesity paradox” [34]. Low BMI can indicate poor overall nutritional status [35]. Malnourished individuals have weaker immune systems, decreased functional capacity, and shorter survival [36]. Both IPF itself and the gastrointestinal adverse events associated with anti-fibrotic therapy can affect patients' nutritional status [37, 38]. Therefore, addressing malnutrition and optimizing nutritional status are important considerations in IPF management. Nutritional support is crucial in the comprehensive care of patients with lung cancer [39]. Given the similarities between the two conditions, dietary counseling and adequate nutritional support, which could help improve BMI, muscle mass, functional capacity, and quality of life, and potentially prolong survival, appear warranted in IPF as well.
Among pulmonary comorbidities, pulmonary hypertension, pulmonary embolism, and sleep apnea were identified as independent prognostic factors for mortality, whereas emphysema and lung cancer were not. These findings align with previous studies. The impact of pulmonary hypertension on survival in patients with IPF is well characterized. Pulmonary hypertension complicates IPF [40], is associated with a worse prognosis [41], and is an independent predictor of mortality [42]. In fact, high pulmonary arterial pressure at the initial evaluation of patients with IPF is an independent predictor of survival [43]. Pulmonary hypertension was also one of the comorbidities that improved the GAP index's prediction of all-cause mortality in IPF [5]. Unfortunately, until recently, patients with IPF did not routinely undergo screening for pulmonary hypertension, partly because there was no approved pulmonary hypertension-targeted therapy for this population [44]. However, since the approval of inhaled treprostinil for pulmonary hypertension associated with interstitial lung disease in 2021 [45], routine screening for this comorbidity has become a potential cornerstone of pulmonary fibrosis care.
Emphysema is one of the most common pulmonary comorbidities of IPF, present in 40% of our sample. Past analyses have shown conflicting results regarding the impact of emphysema on mortality. A previous single-center study demonstrated that emphysema in patients with IPF did not independently predict mortality once age, gender, smoking status, baseline severity (DLco%), and the extent of interstitial lung disease and emphysema had been considered [46]. Another registry-based study supported these findings [47]. However, some authors describe a negative impact on prognosis when IPF is associated with emphysema [48, 49]. In our analysis, we did not find a significant difference in OS between patients with IPF with and without emphysema after adjusting for other pulmonary comorbidities. Also consistent with our findings, previous studies have suggested that mortality in patients with concomitant lung cancer and IPF was no greater than in those with IPF without lung cancer [50]. This lack of a prognostic impact of lung cancer may reflect the already poor prognosis of IPF, which is worse than that of many forms of cancer [51]. However, more recent studies have argued that mortality among patients with coexisting IPF and lung cancer is higher than among patients with IPF alone [52-54]. Survival rates also vary depending on the stage and type of lung cancer, which we did not account for in our study. Moreover, there are currently no guidelines available for clinicians on how to manage lung cancer in IPF.
Our study has several strengths. First, we accounted for survivor bias by creating a distinct group of patients with missing PFTs. As IPF progresses, missed spirometry visits promote survivor bias by inflating mean PFT values, because missing values are associated with disease exacerbation or death [55]. Second, we used LASSO regression for variable selection. LASSO regression is particularly useful when there is a high degree of multicollinearity among predictors, such as DLco% and FVC%, as it tends to select one representative predictor from a set of highly correlated variables while shrinking the coefficients of the others. As a regularization technique, it also helps prevent overfitting, improves interpretability, and simplifies the model by eliminating irrelevant predictors. Third, the study had at least 10 events (deaths) per variable modeled, as commonly recommended in survival analysis to maintain statistical power and reliability; this helps ensure that the model does not overfit the data and that the estimated hazard ratios are more likely to reflect true associations [56]. Finally, we evaluated model performance by assessing discrimination and calibration, the two cardinal aspects of model fit [57].
We recognize the retrospective design and limited sample size as important limitations of our study. Also, our study focused on a local cohort of patients with IPF with distinct characteristics [12, 13], including older age, a high burden of comorbidities, diagnostic delay, limited access to specialized interstitial lung disease centers, and restricted utilization of anti-fibrotic therapy and lung transplantation [12, 13]. In addition, rural health disparities and socioeconomic challenges affect the eligibility of our patient population for advanced therapeutic options [12, 13]. These distinct characteristics may limit the generalizability of our findings to broader populations, but they hold significant value for understanding and addressing the specific needs of the Appalachian population. It is also important to highlight that the Appalachian population in our study has one of the highest smoking rates in the United States. Smoking is a well-known risk factor for both IPF and emphysema, and its high prevalence likely contributes to the observed distinct characteristics, further emphasizing the importance of addressing these health challenges in the Appalachian region [12, 13]. Furthermore, we did not precisely evaluate the extent and severity of the comorbidities or define the stage and type of lung cancer. We also used BMI as a proxy for malnutrition, which could have underestimated malnutrition among patients with IPF [58]. In addition, this nomogram has not been tested in an independent population, so its external validity remains to be established [59]. As a result, our findings should be considered suggestive rather than definitive. Finally, some important prognostic factors, including non-pulmonary comorbidities and D-dimer levels, were not included because of the study's limited sample size.
Further efforts on prospective data collection, wider geographic recruitment, and incorporation of the extent and severity of comorbidity are encouraged to improve this model. More reliable methods of systematically screening patients with IPF for malnutrition are also warranted.
Another potential limitation is that we did not directly compare the performance of the nomogram-based scoring system with an established staging system, such as the GAP index. However, compared with the GAP index, which is widely used for staging and prognosis in similar patient populations, our model incorporates additional clinical predictors, such as pulmonary comorbidities, and uses a different modeling approach. Additionally, whereas the GAP index was developed to predict mortality up to 3 years, our model predicts 5-year OS. Also, the discriminative ability of our model, as measured by the C-index, is comparable to values reported in the literature, indicating similar predictive performance [2, 60, 61]. It is also important to acknowledge that a substantial proportion of the FVC% and DLco% data (the latter a key predictor in our model) were missing. Although we included all available data by categorizing individuals with missing data as “unknown,” enabling the model to borrow information from the observed data in the multivariable analysis, a nomogram based on a complete dataset may provide a more definitive model. However, real-world data often show high levels of missingness in PFTs because of difficulties in adhering to guidelines and patients' declining ability to perform these tests as their condition worsens. Including patients with missing PFT data makes the nomogram more pragmatic and reflective of actual clinical scenarios. That notwithstanding, future research could consider more advanced methods for handling missing data, such as multiple imputation, provided that the data are missing at random. In our dataset, however, the data were missing not at random, which restricts the use of these approaches.
Such methods, if applicable, would allow for more refined handling of incomplete datasets, preserving sample size, and potentially improving the model's robustness and predictive accuracy.
Our findings have important implications for research and patient care, as physicians and patients could obtain an individualized survival prediction through this easy-to-use scoring method. Identifying subgroups of patients at different risks of poor survival might inform treatment or care options. We believe that the nomogram is a promising prognostic model because it is medically actionable: it includes manageable parameters that can be directly influenced or controlled through deliberate interventions or decisions to achieve desired outcomes.
5 CONCLUSIONS
The present study confirmed the prognostic value of BMI and pulmonary comorbidities in IPF. Thus, routine screening and assessment of these parameters are fundamental in IPF care. The promising performance of the nomogram signifies its potential value in clinical practice and offers a practical approach to enhance the clinical management of IPF. However, further validation of the nomogram is essential to confirm its effectiveness in real-world clinical practice. Furthermore, recognizing the complexities of rural healthcare structures and the distinct health characteristics of West Virginia's population, we propose the adoption of a nomogram-based risk-staging system leveraging local patient cohort data. This tailored approach may offer a promising solution to the inherent inaccuracies in survival prognostication for IPF.
AUTHOR CONTRIBUTIONS
Rowida Mohamed: Conceptualization (equal); data curation (equal); formal analysis (equal); methodology (equal); project administration (equal); visualization (equal); writing—original draft (lead). Rahul G. Sangani: Conceptualization (equal); data curation (lead); resources (lead); supervision (equal); validation (equal); writing—review & editing (equal). Khalid M. Kamal: Resources (equal); supervision (equal); validation (equal); writing—review & editing (equal). Traci J. LeMaster: Supervision (equal); validation (equal); writing—review & editing (equal). Toni Marie Rudisill: Validation (equal); writing—review & editing (equal). Virginia G. Scott: Validation (equal); writing—review & editing (equal). George A. Kelley: Supervision (equal); validation (equal); writing—review & editing (lead). Sijin Wen: Conceptualization (equal); formal analysis (lead); methodology (lead); supervision (lead); validation (lead); visualization (equal); writing—original draft (equal); writing—review & editing (lead).
ACKNOWLEDGMENTS
We would like to thank the reviewers for their thoughtful comments and efforts towards improving our manuscript.
CONFLICT OF INTEREST STATEMENT
The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
ETHICS STATEMENT
The West Virginia University institutional review board (ID #1904548975) reviewed and approved the study protocol.
INFORMED CONSENT
Not applicable.
Open Research
DATA AVAILABILITY STATEMENT
The data supporting the findings of this study are not publicly available due to restrictions on data sharing. For any questions or inquiries regarding the data, please contact the co-senior author Dr. Rahul G. Sangani.