Artificial neural network based prediction of the lung tissue involvement as an independent in-hospital mortality and mechanical ventilation risk factor in COVID-19
Abstract
Introduction
During the COVID-19 pandemic, artificial neural network (ANN) systems have been providing aid for clinical decisions. However, to achieve optimal results, these systems should integrate multiple clinical data points into simple models. This study aimed to model the in-hospital mortality and mechanical ventilation risk using a two-step approach combining clinical variables and ANN-analyzed lung inflammation data.
Methods
A data set of 4317 COVID-19 hospitalized patients, including 266 patients requiring mechanical ventilation, was analyzed. Demographic and clinical data (including the length of hospital stay and mortality) and chest computed tomography (CT) data were collected. Lung involvement was analyzed using a trained ANN. The combined data were then analyzed using unadjusted and multivariate Cox proportional hazards models.
Results
Overall in-hospital mortality was associated with the ANN-assigned percentage of lung involvement (hazard ratio [HR]: 5.72, 95% confidence interval [CI]: 4.4–7.43, p < 0.001 for the patients with >50% of lung tissue affected by COVID-19 pneumonia), age category (HR: 5.34, 95% CI: 3.32–8.59 for cases >80 years, p < 0.001), procalcitonin (HR: 2.1, 95% CI: 1.59–2.76, p < 0.001), C-reactive protein (CRP) level (HR: 2.11, 95% CI: 1.25–3.56, p = 0.004), estimated glomerular filtration rate (eGFR) (HR: 1.82, 95% CI: 1.37–2.42, p < 0.001) and troponin (HR: 2.14, 95% CI: 1.69–2.72, p < 0.001). Furthermore, the risk of mechanical ventilation was also associated with the ANN-based percentage of lung inflammation (HR: 13.2, 95% CI: 8.65–20.4, p < 0.001 for patients with >50% involvement), age, procalcitonin (HR: 1.91, 95% CI: 1.14–3.2, p = 0.14), eGFR (HR: 1.82, 95% CI: 1.2–2.74, p = 0.004) and clinical variables, including diabetes (HR: 2.5, 95% CI: 1.91–3.27, p < 0.001), cardiovascular and cerebrovascular disease (HR: 3.16, 95% CI: 2.38–4.2, p < 0.001) and chronic pulmonary disease (HR: 2.31, 95% CI: 1.44–3.7, p < 0.001).
Conclusions
ANN-based lung tissue involvement is the strongest predictor of unfavorable outcomes in COVID-19 and represents a valuable support tool for clinical decisions.
1 INTRODUCTION
Since the outbreak of the SARS-CoV-2 pandemic, factors associated with mortality have been extensively investigated, and older age and comorbidities such as chronic lung disease, diabetes, obesity, hypertension, kidney injury, and malignancy have been strongly associated with increased risk of death.1-3 Therapeutic interventions with antiviral agents reduce the risk of progression to severe disease to a certain extent. However, their efficacy in hospitalized patients is limited, often reducing the mortality risk only in selected subgroups or within 5–7 days of symptom onset. In the advanced stages of COVID-19, cytokine storm is one of the primary features associated with unfavorable survival outcomes.4-7 Pneumonia and respiratory failure remain key COVID-19 complications, regardless of the infecting variant.8, 9
Radiological imaging techniques, including computed tomography (CT), allow for a thorough investigation of the type and volume of lung tissue involved in COVID-19. The extent, characteristics, and location of inflammatory changes are associated with mortality, clinical severity, and the need for mechanical ventilation.10, 11 Chest CT in patients with COVID-19 most commonly reveals ground-glass opacification with or without consolidative anomalies. Even in asymptomatic patients, chest CT imaging anomalies were observed, with progression from focal unilateral to diffuse ground-glass opacities and consolidations.10
Evolving pandemic waves have exerted a significant burden on medical systems, which have struggled with high numbers of admissions, timely diagnostics, and accurate prediction of disease severity to optimize decision processes related to the need for hospital admission and mechanical ventilation.12 The analysis of CT images requires expert radiologic assessment and time, both of which may be of limited availability during a pandemic. Furthermore, the assessed extent of lung tissue involvement may vary because of human error, which reduces the accuracy and usefulness of CT radioimaging data for prediction models.13-15 Lastly, radiological data alone do not account for the remaining clinically important variables, including laboratory parameters, clinical patient data, or comorbidities.
Notably, considerable resources have been invested into COVID-19 research with the involvement of multidisciplinary clinical, virologic, radiologic, and bioinformatics teams.16 Multiple solutions use deep neural networks to classify data from CT scans of SARS-CoV-2-infected patients, leveraging the availability of large public data sets. Artificial intelligence (AI) systems allow the classification and segmentation of lung tissue lesions with high sensitivity and specificity, providing assistance for clinicians and aiding medical decisions.17, 18 So far, several different AI algorithms, including ANNs, support vector machines, decision trees, and random forest methods, have been proposed for the analysis of radiologic data.19, 20 Furthermore, several predictive systems and scoring solutions, including deep learning models, were developed to classify the risk of disease progression by integrating clinical, laboratory, and radiologic data.21-23 For example, a convolutional neural network (CNN) was trained to classify the risk of disease progression to respiratory failure and intensive care admission.24 The algorithm was trained with the inclusion of laboratory information and a gradient-boosting algorithm to make final predictions. Further models proposed end-to-end solutions allowing for integrated and robust real-time predictions over time, optimizing the risk prediction.25, 26
In this study, we created an ANN-based rapid CT assessment tool trained to map and label the type, localization, and volume of lung tissue involvement among patients with COVID-19, providing rapid assistance for the visualization of the type and extent of the inflammatory changes. Subsequently, we used the data from this tool to model the in-hospital mortality and mechanical ventilation risk using a large-scale (>4000 cases) clinical data set.
For maximum model simplicity, the inferred hazards model statistics used only the simplest parameter, namely, the percentage of lung tissue involvement, which reflects the sum of all inflammatory changes in the lung. We aimed to validate this ANN-based prediction model across an extensive data set of laboratory and clinical variables to establish the best-fitted model that would include key prognostic variables. The overall goal of the survival model was to determine the baseline factors at hospital admission predictive of both mortality and mechanical ventilation, but not a specific timeline, which is why Cox regression modeling was implemented in this study.
2 METHODS
2.1 Study group
For the purpose of this study, data of 4317 patients hospitalized due to COVID-19 from March 4, 2020 to January 23, 2022 (database closure) at the Regional Hospital in Szczecin, Poland, were analyzed. Patients presented with symptoms associated with COVID-19, including fever (>38°C), dyspnea, cough, or other symptoms suggestive of SARS-CoV-2 infection. Polymerase chain reaction (PCR) or antigen testing for severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) was performed using nasopharyngeal swabs to confirm viral infection. Patients were admitted to the hospital based on their clinical condition assessed by the physician on call at the hospital emergency department. All patients included in the study underwent chest computed tomography (C-CT), which was the basis for assessment using an AI tool. All patients were treated in accordance with the current knowledge and Polish COVID-19 treatment guidelines and received dexamethasone, antibiotics, antivirals (remdesivir), immunomodulating agents (tocilizumab), and prophylactic or therapeutic low-molecular-weight heparin.27
2.2 Ethical issues
The study protocol was approved by the Bioethics Committee of Pomeranian Medical University, Szczecin, Poland (approval number: KB-0012/92/2020). All the patients or their legal representatives provided informed consent for participation in the study and data collection. Data were collected and analyzed anonymously. This study was conducted in accordance with the principles of the Declaration of Helsinki. No additional procedures related to the study were performed on the included patients. Radiological and laboratory analyses and treatment protocols were based on local and national guidelines for the diagnosis and treatment of COVID-19.
2.3 Sampling and data collection
Data were collected from medical records, including age; sex; treatment history; duration of in-hospital stay; duration and timeline of treatment in the intensive care unit (ICU); survival statistics; chest CT scan results; and selected baseline (on the day of hospital admission) laboratory parameters (white blood cell, neutrophil, lymphocyte, and platelet counts; hemoglobin and hematocrit levels; procalcitonin, C-reactive protein, interleukin-6 (IL-6), lactate dehydrogenase, d-dimer activity; creatinine, aspartate, and alanine aminotransferase activity; and troponin I levels). The glomerular filtration rate (GFR) was estimated based on the modification of diet in renal disease (MDRD) equation. All patients underwent CT of the lungs, which was performed and analyzed within the first 12 h of hospital care. All laboratory and CT variables were analyzed based on a standardized data collection protocol implemented in the hospital during the COVID-19 pandemic; therefore, the issue of missing data was minor and data completeness for above variables exceeded 90%.
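The text does not state which form of the MDRD equation was applied. As a minimal sketch, assuming the four-variable, IDMS-traceable form (coefficient 175) with serum creatinine expressed in mg/dL, the estimate could be computed as shown below. The function name, argument names, and example values are illustrative and do not come from the study.

```python
def egfr_mdrd(creatinine_mg_dl: float, age_years: float, female: bool,
              black: bool = False) -> float:
    """Estimate GFR in mL/min/1.73 m^2 with the 4-variable MDRD equation.

    Assumption: IDMS-traceable form (coefficient 175); the original
    non-standardized form uses 186 instead.
    """
    if creatinine_mg_dl <= 0 or age_years <= 0:
        raise ValueError("creatinine and age must be positive")
    egfr = 175.0 * creatinine_mg_dl ** -1.154 * age_years ** -0.203
    if female:
        egfr *= 0.742  # sex correction factor
    if black:
        egfr *= 1.212  # race correction factor of the original equation
    return egfr


# Hypothetical patient: 60-year-old man with creatinine of 1.0 mg/dL
example_egfr = egfr_mdrd(1.0, 60, female=False)
print(round(example_egfr, 1))
```

The resulting value can then be assigned to the eGFR categories used in the statistical analysis.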
Comorbidities were coded based on the International Statistical Classification of Diseases and Related Health Problems 10th Revision (ICD-10) and further classified into five key clinical disease categories: hypertension, diabetes, cardiovascular and cerebrovascular disease, chronic pulmonary disease, and cancer (including both hematological and any other type of cancer).
2.4 Artificial deep neural network (DNN)-based assessment of lung involvement
- Semantic segmentation of lungs divided into left and right on the highly downsampled images. The purpose of this network is to localize each lung and allow other models to process only the necessary parts of the image. The training data set contained 288 CT scans manually annotated by radiologists using Exhibeon3, a DICOM viewer with a built-in, certified segmentation tool (Pixel Technology).
- Semantic segmentation of the lobes. Using the same data set, we trained a network that segmented the lungs into five lobes based on high-resolution CT scans. Because this task required more precision in places where the segmentations of two lobes meet, we implemented the loss function described by Chen et al.,29 which, by design, forces the DNN to outline contours in greater detail.
- Semantic segmentation of inflammatory lung lesions. The training data set consisted of 115 CT scans with annotations for two lesion categories: ground glass and consolidation. Radiology specialists created the annotation set using the SmartBrush tool to annotate structures with similar characteristics. As manual preparation of the data set is a demanding task for radiologists, ground truth segmentations are noisy and difficult to replicate precisely. The radiologist annotation was commonly ragged, and the modeling tool smoothed out the irregular borders. This influenced the Dice coefficient for the annotation.
Training and test scores: ANN1: training Dice score: 0.975, test Dice score: 0.979; ANN2: training Dice score: 0.831, test Dice score: 0.828; ANN3: training Dice score: 0.692, test Dice score: 0.671.
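For readers less familiar with the metric, the Dice score measures the overlap between a predicted and a reference mask as twice the intersection divided by the sum of both mask sizes. The sketch below illustrates the definition on a flattened binary mask; how the scores above were averaged across classes or scans is not stated in the text, and the toy masks are invented for illustration.

```python
def dice_score(pred, truth):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two flat binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Two empty masks agree perfectly by convention
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy example: 4 predicted voxels, 4 reference voxels, 3 in common
pred = [1, 1, 1, 0, 0, 0, 1, 0]
truth = [1, 1, 0, 0, 0, 1, 1, 0]
print(dice_score(pred, truth))  # 0.75
```

A ragged manual outline and a smooth predicted outline differ mainly along the border, which lowers this overlap even when the lesion is located correctly.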
The training setup was as follows: we used U-Net-like CNNs, in which each block consisted of multiple bottleneck layers.30 We used the Mish activation function and group normalization. The learning rate followed a one-cycle schedule with a peak at 3e−4.31 The training parameters were as follows: weight decay, 1e−4; number of epochs, 300; loss, categorical cross entropy + 0.1 × Dice loss. For better regularization of lung and lobe segmentation, we additionally added active contour loss.32 The data sets used for lesion and anatomical segmentation had the same PACS source.
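The combined loss described above (categorical cross entropy plus 0.1 times the Dice loss) can be illustrated on a handful of pixels. This is a framework-free sketch, not the training code: the soft Dice formulation, the averaging over classes, and the function names are assumptions, and the active contour term is omitted.

```python
import math


def cross_entropy(probs, labels, eps=1e-7):
    """Mean categorical cross entropy over pixels."""
    return -sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels)) / len(labels)


def soft_dice_loss(probs, labels, n_classes, eps=1e-7):
    """1 - soft Dice, averaged over classes (assumed formulation)."""
    loss = 0.0
    for c in range(n_classes):
        inter = sum(p[c] for p, y in zip(probs, labels) if y == c)
        denom = sum(p[c] for p in probs) + sum(1 for y in labels if y == c)
        loss += 1.0 - (2.0 * inter + eps) / (denom + eps)
    return loss / n_classes


def combined_loss(probs, labels, n_classes, dice_weight=0.1):
    """Cross entropy + dice_weight * Dice loss, with the 0.1 weight from the text."""
    return cross_entropy(probs, labels) + dice_weight * soft_dice_loss(probs, labels, n_classes)


# Three pixels, two classes: per-pixel class probabilities and true labels
probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
labels = [0, 1, 0]
print(combined_loss(probs, labels, n_classes=2))
```

The small Dice weight keeps cross entropy as the dominant term while still rewarding region overlap, which matters when lesions occupy few voxels.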
An anonymised data set is available upon reasonable request from [email protected]. The ANN code was developed by Pixel Technology with intellectual property rights transferred to the Regional Hospital in Szczecin and may be available following a specific agreement with the Hospital Directorate. Requests to be sent to Małgorzata Szelagiewicz ([email protected]).
The results of semantic segmentation of lung lobes and inflammatory lesions were used to calculate the overall lung involvement, automatically assigned for every patient and further referred to as the ANN-based percentage of inflammatory lung involvement (Supporting Information: Figure 2). We used this simple parameter in all statistical models.
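The exact formula for the percentage is not given in the text. One plausible reading, sketched below, is the number of voxels labeled as any inflammatory lesion inside the lung divided by the total number of lung voxels. The flat-list mask representation and the function name are assumptions made for illustration.

```python
def lung_involvement_percent(lung_mask, lesion_mask):
    """Percentage of lung voxels that are also labeled as lesion.

    Assumption: ground glass and consolidation are pooled into one lesion
    mask, and lesion voxels outside the lung mask are ignored.
    """
    if len(lung_mask) != len(lesion_mask):
        raise ValueError("masks must have the same number of voxels")
    lung_voxels = sum(1 for v in lung_mask if v)
    if lung_voxels == 0:
        raise ValueError("lung mask is empty")
    involved = sum(1 for lung, lesion in zip(lung_mask, lesion_mask) if lung and lesion)
    return 100.0 * involved / lung_voxels


# Toy volume: 8 lung voxels, 2 of them lesion; one lesion voxel lies outside the lung
lung = [0, 1, 1, 1, 1, 0, 1, 1, 1, 1]
lesion = [0, 1, 0, 0, 1, 1, 0, 0, 0, 0]
print(lung_involvement_percent(lung, lesion))  # 25.0
```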
2.5 Statistical analyses
Initially, statistics were calculated to analyze in-hospital mortality, the requirement for mechanical ventilation in the intensive care unit, and survival if mechanically ventilated. All models considered baseline radiologic, clinical and laboratory values as described above to calculate the survival prediction model based on patient data at the hospital admittance.
Statistical comparisons were performed using χ2 tests for nominal variables, while continuous data were analyzed with the Mann–Whitney U test for nonparametric statistics. CIs and interquartile ranges (IQRs) are indicated where appropriate. For key continuous variables, categorical analyses were also performed to optimally inform the survival models and best fit the data set. Therefore, the ANN-based percentage of inflammatory lung involvement was divided into six categories (<10%, 10%–20%, 21%–30%, 31%–40%, 41%–50%, >50%), age into five categories (<50, 51–60, 61–70, 71–80, >80 years), and IL-6 into four categories (<5, 5–50, 51–100, >100 pg/mL). Procalcitonin (<0.5, 0.5–2.0, >2.0 ng/mL), C-reactive protein (<10, 10–100, >100 mg/L), d-dimer (µg/L; normal range, up to twofold the upper limit of normal, more than twofold the upper limit of normal), and eGFR (>60, 60–30, <30 mL/min) were each divided into three categories, while troponin and lactate dehydrogenase were divided into two categories (normal range and increased).
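The binning of the two strongest predictors can be sketched as follows. The category labels are those listed above; the handling of values falling between the printed boundaries (for example 20.5% or exactly 50 years) is not specified in the text, so upper-inclusive edges are assumed here.

```python
from bisect import bisect_left

LUNG_LABELS = ["<10%", "10–20%", "21–30%", "31–40%", "41–50%", ">50%"]
AGE_LABELS = ["<50", "51–60", "61–70", "71–80", ">80"]


def lung_category(percent: float) -> str:
    """Map lung involvement (%) to one of the six categories."""
    if percent < 10:
        return LUNG_LABELS[0]
    # Assumed upper-inclusive edges: (.., 20], (20, 30], (30, 40], (40, 50], (50, ..)
    return LUNG_LABELS[1 + bisect_left([20, 30, 40, 50], percent)]


def age_category(age: float) -> str:
    """Map age (years) to one of the five categories.

    Assumption: age 50 itself is placed in the first group.
    """
    return AGE_LABELS[bisect_left([50, 60, 70, 80], age)]


# Median values reported for the entire group in Table 1
print(lung_category(13.58), age_category(67))  # 10–20% 61–70
```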
Subsequently, the data were fitted to unadjusted and multivariate Cox proportional hazards models to assess the effect of the analyzed parameters on the risk of death, mechanical ventilation, and survival if mechanically ventilated by calculating the hazard ratios (HR) and 95% confidence intervals (95% CI).
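In a Cox model the hazard is h(t | x) = h0(t) · exp(βx), so the hazard ratio for a category is exp(β) and a Wald-type 95% CI is exp(β ± 1.96 · SE). The sketch below converts between the two representations, assuming the reported intervals are Wald intervals (the text does not say so explicitly); the function names are illustrative.

```python
import math


def hazard_ratio(beta: float, se: float, z: float = 1.96):
    """Return (HR, lower, upper) from a Cox coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def coefficient_from_hr(hr: float, ci_low: float, ci_high: float, z: float = 1.96):
    """Recover (beta, SE) from a reported HR and its CI (Wald interval assumed)."""
    beta = math.log(hr)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z)
    return beta, se


# Reported value for >50% lung involvement and mortality: HR 5.72 (4.4-7.43)
beta, se = coefficient_from_hr(5.72, 4.4, 7.43)
hr, low, high = hazard_ratio(beta, se)
print(round(hr, 2), round(low, 2), round(high, 2))
```

The round trip reproduces the reported interval closely, which is the behavior expected when the interval is symmetric on the log scale.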
In the initial unadjusted Cox models, all categorical variables were significantly associated with mortality and the risk of mechanical ventilation. Variables with significant univariate effects were fitted into the multivariate model, with stepwise removal of the variables with no significant effect (p ≥ 0.05) until the final model was obtained. Final multivariate models were calculated only for overall survival and risk of mechanical ventilation, as the model for survival if mechanically ventilated did not reach statistical significance in the univariate analyses owing to the small sample size.
As 3014 (69.81%) patients were discharged from the hospital or died within the first 14 days of hospitalization for overall in-hospital mortality, we also calculated the Cox survival models for the first 15 days and separately for patients who were hospitalized beyond 15 days, with the 15th day of hospitalization as the starting point for the analysis. This was done to reduce the informative censoring related to the fact that the mortality risk in discharged patients was lower than that in patients under hospital care; the same variables were used for both analyses. In addition, we aimed to reduce bias related to the high number of discharged patients beyond 15 days of observation.
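The split at day 15 can be sketched as a simple landmark-style transformation of the follow-up data. The text does not specify whether patients still hospitalized at day 15 were kept in the early-period model as censored observations; the sketch assumes they were. The record format (days in hospital, death indicator) and the function name are hypothetical.

```python
def split_at_landmark(records, landmark=15):
    """Split (days_in_hospital, died) records into early and late data sets.

    Early set: follow-up truncated at the landmark; patients still in
    hospital at that point are censored there (assumption).
    Late set: only patients hospitalized beyond the landmark, with the
    clock restarted at the landmark day.
    """
    early, late = [], []
    for days, died in records:
        if days <= landmark:
            early.append((days, died))
        else:
            early.append((landmark, False))
            late.append((days - landmark, died))
    return early, late


# Hypothetical patients: (length of stay in days, died in hospital)
records = [(5, True), (12, False), (20, True), (40, False)]
early, late = split_at_landmark(records)
print(early)  # [(5, True), (12, False), (15, False), (15, False)]
print(late)   # [(5, True), (25, False)]
```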
Trends in the percentage of ANN-analyzed lung involvement and mortality, mechanical ventilation, and ICU mortality were examined using logistic regression on the R statistical platform (v. 3.1.0), with coding available on request, and calculation of the regression trend line with 95% CI.
For all analyses, p-values < 0.05 were considered significant. Commercial software (Statistica 13.0 PL; Statsoft) was used for statistical calculations.
3 RESULTS
3.1 General patient characteristics
Overall, we reviewed and analyzed the data of 4317 patients hospitalized due to COVID-19 for a total of 63811 person-days. Within the observation group, 266 (6.16%) patients required mechanical ventilation and were admitted to the ICU, accounting for 1409 person-days of ICU observation. The median hospitalization time was 12 days (IQR: 8–17 days; 90th percentile of observation time: 26 days; 95th percentile: 34 days), ranging from 1 to 173 days. For patients requiring mechanical ventilation, the median time to ICU admission was 4 days (IQR: 1–8 days), ranging from 0 to 52 days.
The overall mortality rate was 16.93% (n = 731), with 73.31% (n = 195/266) in the ICU. As expected, compared with the surviving group, patients who died were notably older (median: 74 (IQR: 67–84) years vs. 65 (IQR: 53–73) years, p < 0.001), had a higher percentage of ANN-based inflammatory lung tissue involvement (median: 31.95% (IQR: 15.59–50.45) vs. 11.41% (IQR: 4.42–22.78), p < 0.001), and more frequently had comorbidities such as diabetes, cardiovascular/cerebrovascular disease (CVD), and cancer (Table 1). No sex-associated differences in the overall mortality were observed.
| | Entire analyzed group | Hospitalized and discharged | Hospitalized and died | p | Not requiring mechanical ventilation | Requiring mechanical ventilation | p | Required mechanical ventilation (ICU) and survived | Required mechanical ventilation (ICU) and died | p |
|---|---|---|---|---|---|---|---|---|---|---|
| Age, years, median (IQR) | 67 (55–75) | 65 (53–73) | 74 (67–84) | 0.00 | 67 (55–76) | 65 (59–71) | 0.045 | 61.5 (53–69) | 66 (60–72) | 0.0048 |
| Age range, n (%) | | | | | | | | | | |
| <50 years | 778 (18.02) | 755 (97.04) | 23 (2.96) | 0.00 | 749 (96.27) | 29 (3.73) | 0.00 | 15 (51.72) | 14 (48.28) | 0.0063 |
| 51–60 years | 677 (15.68) | 625 (92.32) | 52 (7.68) | | 625 (92.32) | 52 (7.68) | | 17 (32.69) | 35 (67.31) | |
| 61–70 years | 1223 (28.33) | 1030 (84.22) | 193 (15.78) | | 1109 (90.68) | 114 (9.32) | | 26 (22.81) | 88 (77.19) | |
| 71–80 years | 925 (21.43) | 718 (77.62) | 207 (22.38) | | 864 (93.41) | 61 (6.59) | | 11 (18.03) | 50 (81.97) | |
| >80 years | 671 (15.54) | 417 (62.15) | 254 (37.85) | | 663 (98.81) | 8 (1.19) | | 1 (12.5) | 7 (87.5) | |
| Gender, n (%) | | | | | | | | | | |
| Male | 2425 (56.17) | 2013 (83) | 412 (17) | 0.91 | 2261 (93.24) | 164 (6.76) | 0.063 | 42 (25.61) | 122 (74.39) | 0.613 |
| Female | 1892 (43.83) | 1573 (83) | 319 (17) | | 1790 (94.61) | 102 (5.39) | | 29 (28.43) | 73 (71.57) | |
| ANN-based lung involvement, %, median (IQR) | 13.58 (5.16–26.98) | 11.41 (4.42–22.78) | 31.95 (15.59–50.45) | 0.00 | 12.58 (4.8–25.21) | 42.9 (25–55.53) | 0.00 | 32.48 (19.28–51.57) | 45.84 (30.52–58.16) | 0.0021 |
| ANN-based lung involvement, range, n (%) | | | | | | | | | | |
| <10% | 1769 (40.98) | 1636 (92.48) | 133 (7.52) | 0.00 | 1740 (98.36) | 29 (1.64) | 0.00 | 10 (34.48) | 19 (65.52) | 0.019 |
| 10–20% | 975 (22.59) | 876 (89.85) | 99 (10.15) | | 955 (97.95) | 20 (2.05) | | 10 (50) | 10 (50) | |
| 21–30% | 660 (15.29) | 547 (82.88) | 113 (17.12) | | 631 (95.61) | 29 (4.39) | | 12 (41.38) | 17 (58.62) | |
| 31–40% | 372 (8.62) | 272 (73.12) | 100 (26.88) | | 332 (89.25) | 40 (10.75) | | 10 (25) | 30 (75) | |
| 41–50% | 230 (5.33) | 133 (57.83) | 97 (42.17) | | 174 (75.65) | 56 (24.35) | | 11 (19.64) | 45 (80.36) | |
| >50% | 311 (7.2) | 122 (39.23) | 189 (60.77) | | 219 (70.42) | 92 (29.58) | | 18 (19.57) | 74 (80.43) | |
| Comorbidities, n (%) | | | | | | | | | | |
| Hypertension | | | | | | | | | | |
| Yes | 966 (22.38) | 783 (81.06) | 183 (18.94) | 0.059 | 815 (84.37) | 151 (15.63) | 0.00 | 38 (25.17) | 113 (74.83) | 0.52 |
| No | 3351 (77.62) | 2803 (83.65) | 548 (16.35) | | 3236 (96.57) | 115 (3.43) | | 33 (28.7) | 82 (71.3) | |
| Diabetes | | | | | | | | | | |
| Yes | 466 (10.79) | 345 (74.03) | 121 (25.97) | 0.000 | 375 (80.47) | 91 (19.53) | 0.00 | 21 (23.08) | 70 (76.92) | 0.337 |
| No | 3851 (89.21) | 3241 (84.16) | 610 (15.84) | | 3676 (95.46) | 175 (4.54) | | 50 (28.57) | 125 (71.43) | |
| Cardiovascular/cerebrovascular disease | | | | | | | | | | |
| Yes | 1428 (33.08) | 1093 (76.54) | 335 (23.46) | 0.000 | 1240 (86.83) | 188 (13.17) | 0.00 | 45 (23.94) | 143 (76.06) | 0.115 |
| No | 2889 (66.92) | 2493 (86.3) | 396 (13.7) | | 2811 (97.3) | 78 (2.7) | | 26 (33.33) | 52 (66.67) | |
| Chronic lung disease | | | | | | | | | | |
| Yes | 98 (2.27) | 76 (77.55) | 22 (22.45) | 0.14 | 78 (79.59) | 20 (20.41) | 0.00 | 6 (30) | 14 (70) | 0.73 |
| No | 4219 (97.73) | 3510 (83.2) | 709 (16.8) | | 3973 (94.17) | 246 (5.83) | | 65 (26.42) | 181 (73.58) | |
| Cancer (any) | | | | | | | | | | |
| Yes | 195 (4.52) | 141 (72.31) | 54 (27.69) | <0.001 | 175 (89.74) | 20 (10.26) | 0.015 | 7 (35) | 13 (65) | 0.382 |
| No | 4122 (95.48) | 3445 (83.58) | 677 (16.42) | | 3876 (94.03) | 246 (5.97) | | 64 (26.02) | 182 (73.98) | |
Of note, the median age of patients on mechanical ventilation admitted to the ICU was slightly lower than that of the remaining hospitalized group, which is reflected by the notably lower ICU admission rates among patients aged >70 years (6.59% for patients aged 71–80 years and 1.19% for those aged >80 years). Requirements for mechanical ventilation were more frequent among patients with hypertension (15.63%), diabetes (19.53%), CVD (13.17%), chronic lung disease (20.41%), or a history of cancer (10.26%) than in comorbidity-free cases. The median lung involvement among patients requiring mechanical ventilation was 42.9% (IQR: 25–55.53), and it was notably lower among patients who survived intensive care (median: 32.48%, IQR: 19.28–51.57) than among patients who died despite mechanical ventilation support (median: 45.84%, IQR: 30.52–58.16).
Typical laboratory abnormalities with higher neutrophil and lower lymphocyte and platelet counts, and more pronounced increases in inflammatory parameters, including procalcitonin, C-reactive protein, IL-6, lactate dehydrogenase, and d-dimer levels, were found in patients who required mechanical ventilation or died (Supporting Information: Table 1). For categorized variables (age, ANN assigned percentage of lung involvement, history of diabetes, cardiovascular/cerebrovascular disease, procalcitonin, CRP, IL-6, lactate dehydrogenase, d-dimer, eGFR, and troponin).
3.2 Survival and risk of mechanical ventilation models
a. In-hospital survival models.
In the final multivariate Cox proportional hazards model (Figure 1), overall in-hospital mortality was associated with the ANN-assigned percentage of lung involvement (max. HR: 5.72, 95% CI: 4.4–7.43, p < 0.001 for the patients with >50% of lung tissue affected by COVID-19 pneumonia), followed by age category (max. HR: 5.34, 95% CI: 3.32–8.59, p < 0.001 for cases >80 years), procalcitonin >2 ng/mL (HR: 2.10, 95% CI: 1.59–2.76, p < 0.001), C-reactive protein level category (max. HR: 2.11, 95% CI: 1.25–3.56, p = 0.004 for CRP > 100 mg/L), eGFR (max. HR: 1.82, 95% CI: 1.37–2.42, p < 0.001 for eGFR < 30 mL/min), and troponin I increase above the upper limit of normal (HR: 2.14, 95% CI: 1.69–2.72, p < 0.001).
To reduce bias related to informative censoring, separate univariate (Supporting Information: Table 4), initial multivariate (Supporting Information: Table 5), and final multivariate (Supporting Information: Figures 3 and 4) models were created. Hazard ratios for the variables included in the model were higher for analyses of mortality risk in the period of 0–15 days of hospitalisation compared to the overall mortality model (Supporting Information: Figure 3). However, significant associations were also observed among patients with in-hospital treatment prolonged beyond 15 days (Supporting Information: Figure 4), especially for the ANN-assigned percentage of lung involvement (max. HR: 3.32, 95% CI: 2.12–5.2, p < 0.001 for the patients with >50% of lung tissue affected by COVID-19 pneumonia), age > 80 years (max. HR: 2.01, 95% CI: 1.04–3.87, p = 0.038), and eGFR < 30 mL/min (HR: 2.21, 95% CI: 1.47–3.32, p < 0.001).
b. Mechanical ventilation risk model.

A multivariate Cox proportional hazards model was also constructed for the risk of mechanical ventilation and intensive care admission (Figure 2), in which the ANN-assigned percentage of lung involvement was the strongest predictor (max. HR: 13.2, 95% CI: 8.65–20.4, p < 0.001 for patients with >50% of lung tissue involvement). Notably, age of 71–80 years and >80 years was associated with a decreased likelihood of mechanical ventilation (HR: 0.53, 95% CI: 0.33–0.84, p = 0.007 and HR: 0.12, 95% CI: 0.05–0.26, p < 0.001, respectively), which reflects higher ICU disqualification rates for these age groups. Procalcitonin >2 ng/mL (HR: 1.91, 95% CI: 1.14–3.2, p = 0.14) and eGFR (max. HR: 1.82, 95% CI: 1.2–2.74, p = 0.004 for eGFR < 30 mL/min) remained significant predictors of mechanical ventilation risk. Interestingly, the mechanical ventilation hazards model included clinical variables, such as diabetes (HR: 2.5, 95% CI: 1.91–3.27, p < 0.001), cardiovascular and cerebrovascular disease (HR: 3.16, 95% CI: 2.38–4.2, p < 0.001), and chronic pulmonary disease (HR: 2.31, 95% CI: 1.44–3.7, p < 0.001), which were not significant in the model for the overall in-hospital mortality.

3.3 ANN-based percentage of lung involvement in the context of mortality and mechanical ventilation
Logistic regression models were used to reflect the risk of mortality and mechanical ventilation based on the percentage of lung involvement and baseline clinical variables. The overall probability of in-hospital death increased from 5.47% (95% CI: 4.73–6.32) for individuals without lung inflammatory changes to 81.23% (95% CI: 76.94–84.88) for cases with 80% lung involvement (p < 0.001; Figure 3A). The risk of mechanical ventilation and intensive care unit admission increased from 1.25% (95% CI: 0.96–1.62) for cases without pneumonia to 59.98% (95% CI: 52.47–67.04) for 80% lung tissue involvement (p < 0.001; Figure 3B), with a steep increase in the probability of death and ICU admission from approximately 30% lung involvement. Lastly, the probability of death in the ICU increased from 59.07% (95% CI: 45.37–71.49) to 83.01% (95% CI: 73.13–89.76) for 0% and 80% lung tissue involvement, respectively (p < 0.05); this curve was much flatter, reflecting a weaker association between ICU mortality and the percentage of lung tissue involvement (Figure 3C).
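To give a feel for the steepness of the mortality curve, the two reported end points (5.47% at 0% and 81.23% at 80% lung involvement) can be joined by a straight line on the logit scale. This is a back-of-the-envelope illustration only: it assumes a single predictor with a linear logit, whereas the fitted models may have included further variables, and the derived slope and odds ratio are not results reported by the study.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def two_point_logistic(x0, p0, x1, p1):
    """Intercept and slope of the logit line through two (x, probability) points."""
    slope = (logit(p1) - logit(p0)) / (x1 - x0)
    intercept = logit(p0) - slope * x0
    return intercept, slope


# End points of the in-hospital mortality curve reported in the text
intercept, slope = two_point_logistic(0, 0.0547, 80, 0.8123)
odds_ratio_per_10 = math.exp(10 * slope)       # implied odds ratio per 10 percentage points
p_at_30 = inv_logit(intercept + 30 * slope)    # implied probability at 30% involvement
print(round(odds_ratio_per_10, 2), round(p_at_30, 3))
```

Under these assumptions the odds of death rise by roughly 70% for every additional 10 percentage points of involved lung tissue.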

4 DISCUSSION
Based on a large data set of clinical, ANN-derived radiological, and laboratory data, we validated a model identifying key clinical variables associated with COVID-19 in-hospital mortality and the risk of mechanical ventilation. For this purpose, we used a stepwise approach, first analyzing clinical, ANN-based lung involvement, and laboratory data in the context of overall in-hospital mortality, the need for mechanical ventilation and intensive care admission, and death rates among patients undergoing ICU care. Continuous data were fitted into categorical variables and proportional hazard models, allowing examination of the significance of every variable to construct the final model, which was also reanalyzed for the subgroups observed until and beyond 15 days of in-hospital treatment, as almost 70% of the group either died or was discharged within 2 weeks from admission.
AI-based radiological support tools have been rapidly developed as a part of the COVID-19 pandemic response.33 Several such tools have been developed and validated and are being applied in clinical practice,34 achieving high accuracy, sensitivity, and specificity.35 Our tool was implemented across emergency and other clinical departments of a large regional hospital involved in the primary COVID-19 response in the West Pomeranian region of Poland. An ANN-based analytical interface allowed rapid and automatic analysis of the extent and type of lung involvement based on rapid chest CT, which shortened the decision time on the need for in-hospital treatment and supported decision-making on the requirement of ICU-based mechanical ventilation.
COVID-19 risk stratification models based on a reduced number of variables (age, sex, blood oxygenation, well-aerated lung parenchyma, and cardiothoracic vascular calcium) and automatic CT analyses have been previously developed.36 In this study, we expanded this concept by linking the ANN-based lung involvement parameter with a limited number of variables within the adjusted model to reflect both overall in-hospital mortality and ICU admission risk. The risk factors for overall survival included in the final models were percentage of lung involvement, age, procalcitonin, CRP, troponin levels, and eGFR, but not clinical conditions. Furthermore, ICU admission was also associated with a history of diabetes, CVD, or chronic pulmonary disease. Our study combines automatic analysis of lung involvement with clinical parameters for optimal risk assessment, and a similar approach for dichotomous outcomes (death or hospital discharge) was implemented previously.37
Overall, in-hospital mortality was associated with well-established factors previously related to the increased risk of death.2, 22, 38 Mortality exceeded 20% among patients older than 70 years and among those with ANN-assessed lung tissue COVID-19 involvement beyond 30%, and was higher among individuals admitted with diabetes, cardio- and cerebrovascular disease, chronic pulmonary disease, or malignancy. In addition, an increase in inflammatory marker activity, such as procalcitonin, beyond the upper limit of normal or C-reactive protein/IL-6 levels, reflecting either bacterial superinfection, sepsis, or cytokine storm, was significantly associated with higher ICU admission risk and increased death rates.22, 39 Most of the observed variables associated with the overall in-hospital mortality risk were fully in line with the factors associated with the requirement for mechanical ventilation and ICU survival, with the notable exception of age >70 years, which was associated with fewer ICU admissions. This association, however, most likely reflects patient preselection for mechanical ventilation based on comorbidities and general clinical condition, with disqualification from ICU admission when an unfavorable outcome was expected.
Multivariate risk modeling unrestricted for hospitalisation time allowed the identification of ANN-based percentage of lung involvement >50% as the strongest mortality risk factor, even when considered alongside age >80 years (Figure 1). Notably, in this analysis, survival associations with comorbidities proved insignificant and were excluded from the final model, while increased procalcitonin, CRP, and troponin levels were each associated with a more than twofold increase in mortality risk. When the analysis was restricted by time, with the patients hospitalized until and beyond 15 days analyzed as separate data sets, ANN-based lung involvement was slightly less predictive than age for the patients observed until 15 days and more predictive for longer hospitalisation times (>15 days). Furthermore, ANN-based lung involvement was by far the strongest predictive factor associated with the requirement for ICU admission and mechanical ventilation, reaching an HR of 13.2 (95% CI: 8.65–20.4) for >50% tissue inflammation. Our analysis indicates the strong predictive value of the ANN-based lung tissue inflammation assessment algorithms for both mechanical ventilation and mortality risk.
4.1 Study limitations
The training of the ANN model for predicting lung involvement was based on a manually annotated data set, which limits the use of the model beyond COVID-19 pneumonia. For more detailed differentiation of solid lesions, pneumothorax, or tuberculous changes, retraining of the algorithm is required. In addition, we recognize statistical limitations: proportional hazards models assume that loss to follow-up is noninformative; however, in the analyzed group, loss to follow-up corresponded to hospital discharge due to recovery, which is, in fact, informative censoring. To limit this bias, we divided the data set into two groups of patients hospitalized for <15 and >15 days, which allowed us to analyze approximately 70% of the cases within the first timeframe, and the key factors retained their prognostic value. It should also be noted that COVID-19 mortality is strongly associated with multimorbidity, coagulation disorders, multiorgan failure, and septic complications; hence, lung involvement is only one of the parameters to consider, especially for mortality during prolonged in-hospital treatment.40, 41
In conclusion, in this analysis, we demonstrated the utility of ANN-based lung tissue involvement in the prediction of unfavorable outcomes in COVID-19, identifying the automatically assessed percentage of lung involvement as the strongest independent factor associated with mortality. ANN assistants, which are already becoming a valuable tool for aiding clinical decisions and identifying patients at increased risk not only in COVID-19 but also in other diseases, will have to be further developed and validated. The main goal of this study was to analyze the risk factors for adverse outcomes during the progression of COVID-19. With the availability of vast amounts of data, we combined the clinical, laboratory, and ANN-evaluated CT data into a reliable model. The variables were selected based on practicality and availability, with the inclusion of widely used laboratory analyses. Further applications and development of ANN-based clinical decision support tools may prove highly valuable, especially in times of high strain on medical resources.
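The time-restricted analysis described above can be sketched as a simple partition of the cohort by length of stay. This is a minimal illustration, not the study's code: the `Stay` record, its field names, and the toy values are hypothetical, and placing a stay of exactly 15 days in the early group is an assumption, because the text does not specify how the boundary was handled.

```python
from dataclasses import dataclass

@dataclass
class Stay:
    days: int   # length of hospital stay in days (hypothetical field)
    died: bool  # True for in-hospital death, False for discharge (censored)

def split_by_stay(stays, cutoff=15):
    """Partition stays into early (<= cutoff) and late (> cutoff) groups.

    Assigning day `cutoff` itself to the early group is an assumption.
    """
    early = [s for s in stays if s.days <= cutoff]
    late = [s for s in stays if s.days > cutoff]
    return early, late

# Toy records for illustration only; these are not study data
cohort = [Stay(4, False), Stay(9, True), Stay(15, False), Stay(22, True), Stay(31, False)]
early, late = split_by_stay(cohort)
print(len(early), "early stays;", len(late), "late stays")
```

Each group would then be modeled separately, so that discharge due to recovery within the first timeframe does not act as censoring for the later period.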
AUTHOR CONTRIBUTIONS
Miłosz Parczewski designed the study and data collection, performed clinical interpretations, analyzed the data sets, calculated statistics, and wrote and reviewed the manuscript; Jakub Kufel, Tomasz Puzio, Sebastian Białkowski, Jacek Wydra, Milena Grobelna, Kosma Dunikowski, Marek Podyma, and Jakub Musiałek designed the ANN, provided model testing and interface development, and wrote and reviewed the manuscript; Bogusz Aksak-Wąs, Daniel Chober, Laura Lesiewska, Milena Rafalska-Kosior, Krystian Awgul, and Adam Majchrzak collected clinical data, performed clinical interpretations, analyzed the data sets, and wrote and reviewed the manuscript; Krzysztof Jurczyk collected radiological data, performed initial radiologic interpretations and review of the ANN interpretations, and wrote and reviewed the manuscript; Karol Serwin and Joanna Piwnik analyzed the data sets, calculated statistics, and wrote and reviewed the manuscript.
ACKNOWLEDGMENTS
We would like to acknowledge all the medical personnel involved in the care of COVID-19 patients. This study was funded by the EUCARE project under the framework of Horizon Europe and the National Center for Research and Development, Agreement No. SZPITALE-JEDNOIMIENNE/27/2020, November 20, 2020 for implementation and financing of a noncompetitive project (PREVENTION AND TREATMENT: COVID-19) titled: “Development of modern laboratory technologies, IT and bioinformatics dedicated to the diagnosis and prevention of SARS CoV-2 infections” implemented as part of the recruitment “Support for homonymous hospitals in combating the spread of SARS-CoV-2 infection and treating COVID-19.”
CONFLICTS OF INTEREST STATEMENT
The authors declare no conflict of interest.
ETHICS STATEMENT
The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the Bioethical Committee of Pomeranian Medical University, Szczecin, Poland (approval number: KB-0012/92/2020).
Open Research
DATA AVAILABILITY STATEMENT
The original anonymized data set is available from the corresponding author upon request.