Volume 5, Issue 11 e726

ORIGINAL ARTICLE

Open Access

Point-of-care breath sample analysis by semiconductor-based E-Nose technology discriminates non-infected subjects from SARS-CoV-2 pneumonia patients: a multi-analyst experiment

Tobias Woehrle
Department of Anesthesiology, LMU University Hospital, Ludwig Maximilian University, Munich, Germany
Tobias Woehrle and Florian Pfeiffer contributed equally to this study.

Florian Pfeiffer
Department of Anesthesiology, LMU University Hospital, Ludwig Maximilian University, Munich, Germany

Maximilian M. Mandl
Institute for Medical Information Processing, Biometry and Epidemiology, Faculty of Medicine, Ludwig Maximilian University, Munich, Germany
Munich Center for Machine Learning, Munich, Germany

Wolfgang Sobtzick
LANZ GmbH, Bergisch Gladbach, Germany

Jörg Heitzer
Airbus Defence and Space GmbH, Claude-Dornier-Straße, Immenstaad, Germany

Alisa Krstova
Airbus Defence and Space GmbH, Claude-Dornier-Straße, Immenstaad, Germany

Luzie Kamm
Department of Anesthesiology, LMU University Hospital, Ludwig Maximilian University, Munich, Germany

Matthias Feuerecker
Department of Anesthesiology, LMU University Hospital, Ludwig Maximilian University, Munich, Germany

Dominique Moser
Department of Anesthesiology, LMU University Hospital, Ludwig Maximilian University, Munich, Germany

Matthias Klein
Emergency Department, LMU University Hospital, Ludwig Maximilian University, Munich, Germany

Benedikt Aulinger
Department of Medicine II, LMU University Hospital, Ludwig Maximilian University, Munich, Germany

Michael Dolch
Department of Anesthesiology, LMU University Hospital, Ludwig Maximilian University, Munich, Germany
Department of Anesthesiology, Inn Klinikum, Altötting, Germany

Anne-Laure Boulesteix
Institute for Medical Information Processing, Biometry and Epidemiology, Faculty of Medicine, Ludwig Maximilian University, Munich, Germany
Munich Center for Machine Learning, Munich, Germany

Daniel Lanz
LANZ GmbH, Bergisch Gladbach, Germany

Alexander Choukér (Corresponding Author)
Department of Anesthesiology, LMU University Hospital, Ludwig Maximilian University, Munich, Germany

Correspondence
Alexander Choukér, Department of Anesthesiology, LMU University Hospital, Ludwig Maximilian University, Marchioninistrasse 15, Munich 81337, Germany.
Email: [email protected]

First published: 24 October 2024

https://doi.org/10.1002/mco2.726

Citations: 1

Abstract

Metal oxide sensor-based electronic nose (E-Nose) technology provides an easy-to-use method for breath analysis by detecting volatile organic compound (VOC)-induced changes of electrical conductivity. The resulting signal patterns are then analyzed by machine learning (ML) algorithms. This study aimed to establish breath analysis by E-Nose technology as a diagnostic tool for severe acute respiratory syndrome coronavirus type 2 (SARS-CoV-2) pneumonia within a multi-analyst experiment. Breath samples of 126 subjects with (n = 63) or without SARS-CoV-2 pneumonia (n = 63) were collected using the ReCIVA® Breath Sampler, enriched and stored on Tenax sorption tubes, and analyzed using an E-Nose unit with 10 sensors. ML approaches were applied by three independent data analyst teams and included a wide range of classifiers, hyperparameters, training modes, and subsets of training data. Within the multi-analyst experiment, all teams successfully classified individuals as infected or uninfected with an averaged area under the curve (AUC) larger than 90% and a misclassification error lower than 19%, and identified the same sensor as most relevant to classification success. This new method using VOC enrichment and E-Nose analysis combined with ML can yield results similar to polymerase chain reaction (PCR) detection and superior to point-of-care (POC) antigen testing. Reducing the sensor set to the most relevant sensor may prove interesting for developing targeted POC testing.

1 INTRODUCTION

Volatile organic and non-organic compounds (VOC) such as alcohols, aldehydes, and ketones in exhaled breath provide an insight into the metabolic processes taking place in the human body. VOC are emitted by all cells, and differ in their specific composition depending on the type of cell and its current metabolism.¹ Thus, different physiological states can affect VOC composition, reflecting food intake, lifestyle, or exercise.^{2, 3} VOC analysis is a growing field with many possible applications, including diagnostic approaches in medicine.⁴ For example, VOC analysis allowed for differentiation between bacteria species in vitro⁵ and in vivo,⁶ and for detection of influenza virus in swine.⁷ Even different influenza virus subtypes could be discriminated in an infected cell line based on the emitted VOC signature.⁸ In addition, VOC have been discussed as a potential means of identifying patients with metabolic diseases such as diabetes,⁹ with malignant diseases such as neck, bladder, colon, or lung cancer,^{9-12} and with lung diseases such as chronic obstructive pulmonary disease or asthma.^{13, 14} With regard to the severe acute respiratory syndrome coronavirus type 2 (SARS-CoV-2) pandemic, primary studies show that SARS-CoV-2 pneumonia patients could be distinguished from non-infected subjects using VOC-based breath analysis.^{15-17}

Thus far, mass spectrometry is the method of choice for VOC analysis.¹⁶ However, high acquisition and running costs as well as the need for laboratory support limit its use. In contrast, metal oxide sensor-based electronic nose (E-Nose) technology provides a cheaper, portable, and easy-to-use method for breath analysis, especially in point-of-care (POC) environments. Unlike mass spectrometry, E-Nose technology does not identify single compounds, but detects VOC-induced changes of electrical conductivity in a sensor set of metal oxide semiconductors. These changes of conductivity generate a signal pattern for each sample, which is then analyzed by machine learning (ML) algorithms. Various setups of electronic noses with different types of sensor sets have been applied for breath research, for example, sensor arrays of conducting polymers, nanomaterial-based or quartz microbalance-based sensors, colorimetric sensors, and metal oxide sensors. Electrical properties of the metal oxide semiconductors can be optimized for the detection of different VOC by adjusting sensitivity through altering sensor film thickness or loading of the metal oxide surface with noble metal.¹⁸ The E-Nose technology applied in this study represents a technology transfer from space to health. Previous E-Nose experiments have been conducted to successfully determine bacterial surface contamination on the International Space Station,¹⁹ and the technique has now been adapted and optimized for its use in a medical setting.

With SARS-CoV-2 being the predominant cause of pneumonia during the pandemic, this study aimed to establish breath analysis by E-Nose technology as a diagnostic tool in SARS-CoV-2 infection. The particularity of the present study is that it was conducted as a so-called multi-analyst experiment.^{20-23} More specifically, after an optimized pre-analytical sampling method was established, the resulting data was further analyzed by three independent teams of data analysts who applied different ML approaches, without consulting each other. This still uncommon multi-analyst approach was inspired by Wagenmakers et al., who state that “one statistical analysis must not rule them all” and “a single analysis hides an iceberg of uncertainty” while “multi-team analysis can reveal it”.²³ It is established that the high multiplicity of possible analysis strategies resulting from these uncertainties, if combined with selective reporting, can substantially contribute to the so-called replication crisis in science,²⁴ including in the context of ML. The rapidly growing field of artificial intelligence is unfortunately not immune to irreproducibility issues.²⁵ Multi-analyst approaches are known to strengthen the robustness of results and conclusions obtained from analysis of datasets²⁶ and to show that analytical flexibility can have substantial effects on scientific conclusions.²⁰ Thus, results obtained by the three teams, featuring a number of common aspects but also differences, allow us to formulate more reliable results than a single analysis would. At the same time, the experiment illustrates the huge uncertainty related to analytical choices and its impact on the results, a still largely unexplored issue in the field of artificial intelligence application in medicine.

2 RESULTS

2.1 Patients

From May 2020 until November 2021, 63 patients with PCR-confirmed SARS-CoV-2 infection and 63 subjects without infection were enrolled in this study. Patients and uninfected subjects did not differ with regard to their age, sex, body mass index, or smoking habit (Table 1). Diagnosed comorbidities showed no statistical difference between patients and controls (Table S1).

TABLE 1. Characterization of hospitalized patients with severe acute respiratory syndrome coronavirus type 2 (SARS-CoV-2) infection and of non-infected controls.

	COV (n = 63)	CTRL (n = 63)	p-Value
Age (years)	53 (48‒65)	56 (46‒63)	0.705^a
Female sex (%)	19 (30.16)	19 (30.16)	1.0^b
BMI (kg/m²)	28.53 (25.21‒31.41)	26.81 (24.38‒30.44)	0.369^a
Smoker (%)	3 (4.76)	8 (12.70)	0.117^b
Days
In hospital	7 (5‒12)
In ICU	9.5 (6.25‒14.5)
Patients in ICU (%)	12 (19)
Day of admission to ICU	3.5 (3‒5)
Respiratory rate	20 (18‒23.5)	14 (10‒16)	<0.001^a
SpO₂	95 (93‒96)
Hemoglobin (g/dL)	13.8 (12.6‒14.6)	11.5‒17.5
Erythrocytes (T/L)	4.69 (4.31‒5.04)	3.96‒5.77
Thrombocytes (G/L)	197 (142‒258)	146‒391
Leukocytes (G/L)	5.32 (4.34‒6.31)	3.90‒10.40
Lymphocytes (G/L)	1.075 (0.76‒1.39)	1.05‒3.56
(%)	20 (14‒25.25)	18‒48
Monocytes (G/L)	0.42 (0.26‒0.63)	0.25‒0.87
(%)	7 (5‒10)	4‒15
Neutrophils (G/L)	3.69 (2.925‒4.71)	1.78‒7.37
(%)	70 (61.5‒78.5)	40‒71
C-reactive protein	5.55 (2.5‒8.15)	<0.5
IL-6	15.47 (9.41‒26.99)	<5.9

Note: Italic font indicates reference range. Values represent median (interquartile range).
Abbreviations: BMI, body mass index; COV, coronavirus disease-2019 patients; CTRL, controls; ICU, intensive care unit; IL, interleukin.
^a For statistical analyses, Mann–Whitney U-test was applied.
^b For statistical analyses, Fisher's exact test was applied.

Upon admission, patients presented with a median peripheral oxygen saturation (SpO₂) of 95% on room air. Blood analyses revealed slightly elevated levels of C-reactive protein, while median leukocyte counts were within reference range. Median hospitalization time on the normal ward was 7 days. Out of 63 patients, 12 patients subsequently required intensive care unit (ICU) treatment, with a median ICU stay of 9.5 days (Table 1). Two fatal outcomes were observed during the evaluated period.

Breath samples were measured as described in Section 4. Raw data of signal patterns was subjected to analyses by three independent parties, designated teams A, B, and C. The total set of seven experiments is numbered consecutively from I to VII.

2.2 Data analyses by three independent parties

The three teams differed in their general approaches, from data pre-processing to the interpretation of the results. The main goal of teams B and C was to minimize the classification error and optimize other performance measures using a variety of sophisticated methods such as gradient boosting (team B) or multi-layer perceptron (team C). Team A aimed to enhance general predictability while conserving interpretability by applying intrinsic interpretation methods instead of model-agnostic approaches. The chosen learners (team A: random forest [RF], team B: gradient boosting, team C: extra tree [ET], decision tree [DT], RF) differed considerably in the construction of their underlying prediction function, a phenomenon called the Rashomon effect.^{27, 28}

In the following, results of the experiments conducted by the three teams are described in detail.

2.3 Team A

2.3.1 Experiment I: time-independent classifier exploration and variable importance

Experiment I by team A does not explicitly make use of the time structure of the data, resulting in a typical p ≫ n (“high-p, low-n”) problem, meaning that the model receives more predictors per subject than there are subjects. Figure 1A,B depicts the averaged receiver operating characteristic (ROC) curves for the RF and Glmnet classifier, respectively. Note that the displayed naive confidence intervals might underestimate the true underlying uncertainty due to correlation of the cross-validation iterations, and thus only reflect the order of magnitude of the variability. The RF classifier has an average misclassification error (ME) of 0.13 (AUC: 0.94). The Glmnet classifier performs slightly worse with an average ME of 0.15 (AUC: 0.90). Figure 1C shows the variable importance based on the mean decrease in impurity for the RF classifier. The features with the highest predictive power are mainly features from the upper respiratory tract within the first 10 s of the measurement. This is plausible, as the highest variability of the raw measurements can be seen within this period.

FIGURE 1. Team A, experiment I: ROC curves and feature importance. ROC curves depicting the sensitivity (y-axis) and 1 − specificity (x-axis) for the random forest (RF) (A, AUC: 0.94) and Glmnet classifier (B, AUC: 0.90). The bold line shows the micro-averaged ROC curve based on the resampling results with its pointwise confidence interval. Panel (C) depicts variable importance (mean decrease in impurity) for the top 40 features of the RF classifier and panel (D) shows regression coefficients for the top 40 features of the Glmnet classifier.

Figure 1D depicts the regression coefficients of the Glmnet classifier. The results are not as clear-cut as for classifier RF. Features with the highest impact on the prediction results do not only stem from different sensors but also different time points of measurements. The Glmnet classifier focuses more on later time points of each sensor.
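The two performance measures reported for experiment I, ME and AUC, can be illustrated with a minimal sketch. It is not the teams' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions chosen only for illustration, and the cross-validation averaging used in the study is omitted.

```python
def misclassification_error(y_true, y_pred):
    """Fraction of samples whose predicted label differs from the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a random infected sample receives a higher
    score than a random uninfected sample (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = infected, 0 = uninfected (not study data).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

print("ME :", misclassification_error(y_true, y_pred))
print("AUC:", roc_auc(y_true, scores))
```

ME depends on the chosen threshold, whereas AUC summarizes the ranking over all thresholds, which is why the two measures need not order classifiers identically.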

2.3.2 Experiment II: time-dependent feature extraction and variable importance

The second experiment by team A makes use of feature extraction techniques, also termed “dimensionality reduction.” Figure 2A,B depicts the averaged ROC curves for the RF and Glmnet classifier, respectively. The RF classifier has an average misclassification rate of 0.19 (AUC: 0.91). The Glmnet classifier performs better with an average misclassification rate of 0.13 (AUC: 0.92). Figure 2C illustrates the variable importance based on the mean decrease in impurity for the RF classifier. Features with the highest predictive power are features from both upper and lower respiratory tracts. Sensor 9 in particular seems to have a high influence on the predictive performance of the model. Regression coefficients of the Glmnet classifier are depicted in Figure 2D, where results again differ from the RF classifier. The Glmnet classifier focuses on the standard deviations of both the upper and lower respiratory tracts for most sensors.
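The kind of feature extraction described here can be sketched as follows: each sensor's time series is collapsed into a few summary statistics, so that the number of predictors no longer grows with the length of the measurement. The choice of standard deviation, minimum, and maximum follows the measures named in the text; the function name, the dictionary layout, and the example readings are hypothetical and do not reproduce team A's pipeline.

```python
from statistics import stdev


def extract_features(recording):
    """Collapse each sensor's time series into three summary features.

    `recording` maps a sensor name to its list of conductivity readings
    (at least two readings per sensor, as required by `stdev`).
    """
    features = {}
    for sensor, series in recording.items():
        features[f"{sensor}_sd"] = stdev(series)   # sample standard deviation
        features[f"{sensor}_min"] = min(series)
        features[f"{sensor}_max"] = max(series)
    return features


# Hypothetical readings for two sensors of one breath sample (not study data).
recording = {
    "sensor_1": [1.0, 2.0, 3.0],
    "sensor_9": [4.0, 4.0, 10.0],
}
features = extract_features(recording)
for name, value in features.items():
    print(name, value)
```

With 10 sensors and two sampling sites, such a scheme would yield a few dozen predictors per subject instead of one predictor per time point and sensor.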

2.4 Team B

2.4.1 Experiments III and IV: feature extraction and hyperparameter tuning on all sensors

These experiments used LightGBM's gradient boosting model and the approach described above. Without hyperparameter optimization, the learner achieved an overall performance of ME = 0.143 (Figure 3A). Combined with Optuna's hyperparameter optimization, the learner improved to ME = 0.079 (Figure 3B). F1 results and confusion matrices are provided in Figure S1. Sensor 9 showed the highest importance for classification (Figure 3C,D).

2.4.2 Experiments V and VI: feature extraction and hyperparameter tuning on sensor 9

The importance of sensor 9 led to an additional analysis of this specific sensor alone, where only data generated by sensor 9 was used. LightGBM's gradient boosting model achieved an ME of 0.104 (Figure 4A). Figure 4C depicts the time-dependent importance of sensor 9 measurements, showing that the first seconds have the highest impact on classification. Thus, subsequent hyperparameter optimization with Optuna was performed on filtered raw data, where only the first 10 s of sensor 9 measurements were used. The train-test split was set to 9:1. This resulted in a performance of ME = 0.065, outperforming all previous models (Figure 4B,D). Figure S2 shows F1 results and confusion matrices.
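The two data-handling steps named above, restricting a recording to its first 10 s and splitting the subjects 9:1, can be sketched in a few lines. The sampling rate is not given in this section, so the 1 Hz used below is purely an assumption, as are the function names, the fixed random seed, and the unstratified shuffling; the model fitting with LightGBM and Optuna is deliberately left out.

```python
import random


def first_seconds(series, sampling_rate_hz, seconds=10):
    """Keep only the readings recorded within the first `seconds` seconds."""
    n_keep = int(sampling_rate_hz * seconds)
    return series[:n_keep]


def train_test_split(samples, test_fraction=0.1, seed=0):
    """Shuffle sample indices and hold out `test_fraction` of them (9:1 by default)."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    n_test = max(1, round(len(samples) * test_fraction))
    test_idx = set(indices[:n_test])
    train = [s for i, s in enumerate(samples) if i not in test_idx]
    test = [s for i, s in enumerate(samples) if i in test_idx]
    return train, test


# Hypothetical sensor 9 recording: 60 readings at an assumed 1 Hz.
sensor_9 = [float(t) for t in range(60)]
window = first_seconds(sensor_9, sampling_rate_hz=1.0)

# 126 subjects, represented here only by their index.
subjects = list(range(126))
train, test = train_test_split(subjects)
print(len(window), len(train), len(test))
```

With only about 13 held-out subjects, a single 9:1 split gives a coarse performance estimate, which is one reason resampling-based evaluation is commonly preferred for datasets of this size.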

2.5 Team C

During preliminary testing for approach C, similar results were obtained for three out of four classifiers (ET, DT, RF): for both upper and lower respiratory tract, sensor 9 showed the best classification quality (Figure S3). Variance of classification with regard to hyperparameters was also smallest for this sensor. Classifier RF showed the best results of all classifiers, and sensor 9 measurements from lower respiratory tract samples provided the best results (Figure S3). Thus, it was decided to continue the investigation with classifier RF.

Establishing the ground truth for classifier RF showed not all subjects were identified correctly, although the score of false positives in both cohorts was much lower than during the real leave-one-out test (Table S2 and Figure S3). Sensor 9 again showed superior performance regarding hyperparameter sensitivity (Figure S4). Thus, the following steps were performed with classifier RF and sensor 9 only.

2.5.1 Experiment VII: influence of prevalence and training data on classifier RF and sensor 9

Using an RF classifier with hyperparameter tuning on sensor 9 measurements from lower respiratory tract samples resulted in an overall performance of ME = 0.062. A superior hyperparameter combination could be identified for classifier RF and sensor 9. Table 2 shows the best values for sensitivity and specificity achieved by the classifier.

TABLE 2. Experiment VII: sensitivity P(+|C) and specificity P(‒|nC) for the best hyperparameter combination with classifier random forest (RF) for upper (A) and lower (B) respiratory tract.

Sensitivity, upper respiratory tract (A), P(+|C)	Sensitivity, lower respiratory tract (B), P(+|C)	Specificity, upper respiratory tract (A), P(‒|nC)	Specificity, lower respiratory tract (B), P(‒|nC)
0.94915	0.96610	0.89552	0.91045

Note: Values for the best hyperparameter combination for classifier RF and sensor 9.

Precision and negative predictive value depend on the probability of encountering SARS-CoV-2 in a test population (prevalence, P(C)). Sensitivity and specificity of the best classifier were determined for values of P(C) = 50%, as there were 63 subjects for each label. P(C) is necessarily variable in the real world. Figure 5A shows how precision and negative predictive value depend on P(C) in the group to be examined.
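The dependence of precision and negative predictive value on prevalence follows from Bayes' rule and can be sketched directly. The sensitivity and specificity below are the lower respiratory tract values from Table 2; the function name and the chosen prevalence grid are assumptions for illustration, and the sketch does not reproduce Figure 5A.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (precision, negative predictive value) for a given prevalence P(C)."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    tp = sensitivity * prevalence                 # P(+ and C)
    fn = (1.0 - sensitivity) * prevalence         # P(- and C)
    tn = specificity * (1.0 - prevalence)         # P(- and nC)
    fp = (1.0 - specificity) * (1.0 - prevalence) # P(+ and nC)
    precision = tp / (tp + fp)
    npv = tn / (tn + fn)
    return precision, npv


# Table 2, lower respiratory tract (B), classifier RF, sensor 9.
SENSITIVITY = 0.96610
SPECIFICITY = 0.91045

for prevalence in (0.01, 0.05, 0.10, 0.25, 0.50):
    precision, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"P(C) = {prevalence:.2f}  precision = {precision:.3f}  NPV = {npv:.3f}")
```

At low prevalence most positive results come from the much larger uninfected group, so precision drops even though sensitivity and specificity are unchanged.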

Table 3 depicts the true/false positive/negative rates as a function of the amount of training data used. At 25%, only 10 out of 63 non-infected subjects were correctly identified as non-infected, all others, including all SARS-CoV-2 subjects, were classified SARS-CoV-2. This poor classification leads to a constant value of “1” for the negative predictive value because all subjects that were identified as uninfected are uninfected, but only 10 out of 63 were correctly classified as uninfected. For larger amounts of training data, precision and negative predictive values increased with increasing amount of training data (Figure 5B,C).

TABLE 3. Experiment VII: true/false positive/negative rates versus amount of training data.

Amount of training data (%)	P(+|C)	P(+|nC)	P(‒|C)	P(‒|nC)
100	0.9047619	0.03174603	0.0952381	0.96825397
75	0.7460317	0.03174603	0.25396825	0.96825397
50	0.5714286	0.03174603	0.42857143	0.96825397
25	1	0.84126984	0	0.15873016

Note: True positive P(+|C), false positive P(+|nC), false negative P(‒|C), and true negative P(‒|nC) rates are depicted as a function of the amount of training data used. At 25%, only 10 out of 63 non-infected subjects were correctly identified as non-infected; all others, including all severe acute respiratory syndrome coronavirus type 2 (SARS-CoV-2) subjects, were classified as SARS-CoV-2.
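The arithmetic behind the 25% row can be checked with a short sketch that converts the rates of Table 3 back into subject counts for the two cohorts of 63. It reads P(+|C) as the true positive rate and P(+|nC) as the false positive rate, consistent with the 10 correctly identified non-infected subjects described in the text; the function names are hypothetical.

```python
def confusion_counts(tpr, fpr, n_infected=63, n_uninfected=63):
    """Convert rates into counts (TP, FP, FN, TN) for the given cohort sizes."""
    tp = round(tpr * n_infected)
    fn = n_infected - tp
    fp = round(fpr * n_uninfected)
    tn = n_uninfected - fp
    return tp, fp, fn, tn


def precision_and_npv(tp, fp, fn, tn):
    """Precision = TP / (TP + FP); negative predictive value = TN / (TN + FN)."""
    return tp / (tp + fp), tn / (tn + fn)


# Rows of Table 3: amount of training data (%) -> (P(+|C), P(+|nC)).
table_3 = {
    100: (0.9047619, 0.03174603),
    75: (0.7460317, 0.03174603),
    50: (0.5714286, 0.03174603),
    25: (1.0, 0.84126984),
}

for amount, (tpr, fpr) in table_3.items():
    counts = confusion_counts(tpr, fpr)
    precision, npv = precision_and_npv(*counts)
    print(amount, counts, round(precision, 3), round(npv, 3))
```

For the 25% row the counts are 63 true positives, 53 false positives, 0 false negatives, and 10 true negatives, so the negative predictive value is 10/10 = 1 although only 10 of 63 uninfected subjects were recognized.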

2.5.2 Experimentation with deep learning models

In the multilayer perceptron model, sensors 2 and 9 performed best although the score was worse than with the previous classifiers and the variation of hyperparameters was larger (Figure S5A,B). Therefore, this process was repeated with slightly adapted hyperparameter ranges and 50 variations for sensors 2 and 9 only (Figure S5C,D). In comparison to the more classical ML approach, the multilayer perceptron could not outperform the RF classifier on sensor 9 measurements from lower respiratory tract samples.

3 DISCUSSION

In this study, a rapid, portable, and cheap breath gas analysis protocol was established. Exhaled VOC were measured by an array of ten metal oxide semiconductor sensors and changes in conductivity were recorded.

Raw data were subjected to three independent ML approaches, including RF and gradient boosting tree learners. The multi-analyst experiment shows that E-Nose technology in combination with the presented novel sampling method allows for successful discrimination between SARS-CoV-2-positive patients and uninfected volunteers with MEs ranging from 0.06 to 0.19. Sensitivity and specificity reached values of about 90% or higher for all three independent teams, underlining the robustness of the data generated.

The teams’ approaches already differed in their data pre-processing strategies. While team A analyzed the data both without taking into account time dependencies between features and by taking them into account through a feature extraction procedure, teams B and C did not use feature extraction but different approaches to reduce the number of features manually before training. It is interesting that even if simple variance and min/max measures are extracted from the original time series (team A) or if raw data are reduced drastically to the first 10 s of the measurement and/or only one sensor (team B), the differences in performance are small. Overall, it can be concluded that despite these major differences between the three analytical approaches and their rationales, all results were in a decent range in terms of ME (0.06–0.19). This indicates that this study was successful in demonstrating predictability of the underlying system.

The most substantial difference between the three teams’ results lies in the interpretation of the contributions of the features/sensors to the prediction models’ performance, which is not as clear-cut. All three teams identify sensor 9 and the first seconds of the time series measurement as most informative. However, other model interpretations are conflicting and conclusions cannot be easily drawn without more domain knowledge on the respective sensors. This can be observed in particular if we compare the results of the model-agnostic interpretation methods to the natively interpretable regression coefficients obtained by team A, which show a completely different picture of the importance of the different features.

Highly accurate testing methods for SARS-CoV-2 have been the focus of several studies. Currently available and widely used antigen tests (Ag-RDT) can reach a sensitivity of up to 81.9% and a specificity of up to 98.9%.^{29, 30} More expensive and less widely available, PCR testing has been shown to be superior to antigen testing with a sensitivity and specificity of up to 92.8% and 97.6%, respectively.³⁰ However, the accuracy of PCR results varies. Up to 54% of SARS-CoV-2-infected patients may initially obtain a false-negative RT-PCR result,^{31-34} and over 60% of errors occur in the preanalytical phase of any diagnostic process.³² These reports indicate that breath gas analysis by E-Nose technology as demonstrated here may be superior to commercially available antigen tests and comparable to PCR testing.

Many studies focused on the identification of specific VOC possibly serving as biomarkers for SARS-CoV-2 infection.^{15, 35} In contrast, the approach chosen here aimed at allocating patterns of VOC-induced changes in conductivity of a sensor array, generating a breath gas fingerprint of each subject. Since several VOC may bind to one or more sensors, and in the absence of a control cohort suffering from infection caused by viruses other than SARS-CoV-2 or by bacteria, it may be argued that the results are limited in their significance, and that biomarker identification would be the preferred option. However, considering the vast differences in the interindividual metabolism which is dependent on many factors such as body weight, nutrition, and other lifestyle factors such as stress, physical activity, or smoking habits, the quest to identify one specific molecule to reliably distinguish between infected and healthy subjects may be achievable only with extremely high subject numbers, if at all.

Instead, the breath gas analysis conducted here successfully and very rapidly provides information on the infectious state of the individual and could be comparable in its diagnostic relevance to other parameters obtained from blood analyses, for example, C-reactive protein elevation, or clinical parameters such as elevated body temperature. Thus, the non-invasive, inexpensive, and rapid result of E-Nose technology-based breath gas analysis may be used to warrant further, more invasive diagnostic measures to determine the exact underlying pathology.

ML technologies have been critically discussed as being unstable, highly dependent on the training data and chosen analysis techniques, and potentially subject to cherry-picking and a resulting over-optimism of the reported results.³⁶ In an effort to counteract these mechanisms and to assess the robustness of results, pseudonymized E-Nose data of this study was made available to three independent teams of data analysts as part of a so-called multi-analyst experiment.

Experiments performed by team A indicated that breath analysis by E-Nose technology, as demonstrated here, allows for reliable discrimination between patients with SARS-CoV-2 infection and uninfected subjects. MEs were comparable to those of commercially available rapid antigen tests using oral or nasal swabs.^{32, 37} Analyzing feature importance revealed sensor 9 to be the most important for classification. Applying nested cross-validation techniques ensured an unbiased estimation of the classification performance that would be obtained with independent data, while yielding results that are less impressive at first view.

Experiments of team B yielded an overview of the importance of features and time samples based on decision tree models. Again, sensor 9 had the highest impact on classification, and measurements from the first 10 s were the most informative. Experiments performed by team C again showed that models trained on data from sensor 9 were most effective at discriminating SARS-CoV-2 patients from uninfected subjects. Furthermore, exhaled air from the lower respiratory tract yielded better results than air obtained from the upper respiratory tract. A deep learning model could not outperform the more classical ML algorithms.

Adopting this multi-analyst approach offers the advantage of combining a diverse range of skills, perspectives, and data analysis techniques that surpass what any individual analyst or research team could achieve. Moreover, the setting with several independent teams reduces the pressure on each team and thus also reduces the incentives to generate overly optimistic results through so-called “fishing expeditions”, a term that commonly denotes performing a large number of analyses until one of them yields a favorable result, which is then often the product of advantageous random variation. Finally, a multi-analyst experiment makes it possible both to assess the analytical uncertainty associated with data analyses and, in many cases, including the study presented here, to achieve a consensus regarding the main results, which then can be considered robust.

In summary, the three teams used a variety of learners (decision tree, RF, gradient boosting, deep learning, elastic net), data preprocessing steps, and evaluation procedures, illustrating the flexibility of analytical procedures in ML. They obtained different estimates of performance, all of which, however, were in the range [0.06; 0.19]. Additionally, they all identified sensor 9 as the most informative sensor. This could provide a target for developing commercially attractive diagnostic tools. While previous studies showed the potential feasibility of applying E-Nose technology as a diagnostic tool,¹⁷ the present study generated an E-Nose protocol for quick, inexpensive, and statistically reliable detection of SARS-CoV-2 infection, with results similar to PCR tests and superior to POC antigen testing.^{33, 34, 37, 38}

3.1 Strengths and limitations

This study was realized under challenging clinical conditions during the SARS-CoV-2 pandemic. The patients’ individual lifestyles, eating habits, or potential undiagnosed diseases could not be entirely assessed and may be confounding factors. The limited sample size of n = 126 observations is another obvious limitation of this pilot study. Moreover, analytical choices such as the choice of the learner, parameter settings, preprocessing steps, or the evaluation procedure can considerably influence the final results, as illustrated by the differences observed between the three teams. Other teams may have obtained different results with yet other approaches. However, despite their differences, the results of the three teams also shared common components, such as the performance range and the importance of sensor 9. These findings can be considered robust and reliable. Finally, since each E-Nose contains a different, specific sensor set, different devices may produce different readings for the same breath sample. It may therefore be necessary to train the machine-learning model separately for each specific E-Nose setup.

4 METHODS

4.1 Study design and study subjects

To assess the diagnostic potential of the E-Nose technology, we conducted a non-interventional, non-randomized, open, prospective observational study. The aim was to discriminate between patients with viral SARS-CoV-2 infection upon admission to the hospital and non-infected subjects based on analysis of exhaled breath with an adapted E-Nose technology approach. Patients who presented to the Emergency Department of the LMU Hospital, Munich, Germany, with signs of respiratory infection and who tested positive for SARS-CoV-2 by PCR were asked to participate in the study. Exclusion criteria were malignant diseases, treatment with immunosuppressants, pregnancy, age below 18 years, or inability to consent. Non-infected subjects were recruited as controls and were selected to match the gender, age, height, weight, and smoking habits of the patient cohort (Table 1); the cohorts did not differ regarding lung disease or other metabolic comorbidities (Table S1).

Ethical approval was obtained from the LMU ethics committee (project no. 19-778), written informed consent was obtained from each subject, and the study was carried out in accordance with the Declaration of Helsinki.³⁹

4.2 Breath and blood sampling

Breath samples were obtained from all patients on day one of their hospital stay and from all non-infected subjects using the non-invasive ReCIVA Breath Biopsy mask (Owlstone Medical) according to the manufacturer's instructions, at least 1 h after the last meal or beverage. For sample collection, stainless steel Tenax TA sorption tubes (Markes International) were inserted into each pump's socket. A coated, single-use, sterile silicone mask (Owlstone Medical) was used for each subject. Medical grade oxygen with a flow rate of 35 to 40 L/min was administered via the ReCIVA device to reduce resistance and to enable normal breathing. Breath cycle patterns were detected by built-in CO₂ and pressure sensors and analysed in real time by the Breath Sampler Controller software v3.4 (Owlstone Medical) run on a tablet computer (Microsoft Surface Pro 7, Microsoft Corporation). Different pumps with separate Tenax TA tubes for breath collection were activated accordingly during the initial or later phase of expiration to obtain samples from the upper or lower respiratory tract, respectively. For each sample, 500 mL of exhaled breath was collected at a flow rate of 250 mL/min. Tubes were then capped and stored for a maximum of 2 h until analysis.^{40, 41} The equipment was disinfected with 70% ethanol to prevent microbial contamination while avoiding contamination with VOC emissions from commercially available disinfectants.

4.3 VOC analysis by metal oxide E-Nose technology

Tubes were decapped and inserted into the Enrichment and Desorption Unit (EDU, AIRSENSE Analytics GmbH), and the measurement cycle was started using the WinMuster software v1.6.2.25 (AIRSENSE Analytics GmbH). This pre-analytic step was chosen because concentrations of some VOC in human breath may be too low to be detected by a regular E-Nose setup. Application of an EDU can achieve enrichment factors of up to 1000 and additionally depletes the sample of water vapor, allowing for VOC measurement at a lower concentration range. Tubes were heated to 220°C to desorb VOC from the adsorption resin. After 160 s, the air sample was flushed through the transfer line into the Portable Electronic Nose (PEN3.5, AIRSENSE Analytics GmbH), allowing VOC to bind to an array of 10 metal oxide semiconductors, which temporarily alters sensor conductivity. Specifics of these sensors are provided in Table S3. Changes in conductivity were measured for 60 s and displayed color-coded for each sensor. After analysis, sorption tubes were removed from the EDU, conditioned with a thermal desorber (TC-20, Markes International) at 280°C for 20 min according to the manufacturer's instructions, capped, and stored for further use.

Routine blood analyses upon admission were performed for blood cell count and inflammation markers. EDTA-anticoagulated samples were drawn by peripheral venous puncture and analyses were carried out by the Institute of Laboratory Medicine, LMU University Hospital, Munich, Germany.

4.4 Data collection and format

Raw data consisted of one time series for each metal oxide sensor of the E-Nose, that is, 10 sensors resulting in 10 time series for 60 s for each individual. Measurements were available for both upper and lower respiratory tract samples, resulting in a total of 20 time series per individual over 60 s. A binary label indicated if the individual was SARS-CoV-2 negative or positive. Figure 6 shows a graphical depiction of raw E-Nose data obtained from one representative measurement.
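The data layout described above can be sketched as a four-dimensional array. The sampling rate is an assumption (one reading per second, giving 60 samples per series), as are the axis order, the tract indices, the variable names, and the random placeholder values; the transfer files themselves may be organised differently.

```python
import numpy as np

N_OBS = 126      # number of observations reported for this study
N_TRACTS = 2     # index 0 = upper, 1 = lower respiratory tract (assumed order)
N_SENSORS = 10   # metal oxide sensors of the E-Nose
N_SAMPLES = 60   # assumes one reading per second over the 60 s window

rng = np.random.default_rng(0)
# Placeholder values standing in for measured conductivity changes.
raw = rng.normal(size=(N_OBS, N_TRACTS, N_SENSORS, N_SAMPLES))
# Binary label per observation: 0 = SARS-CoV-2 negative, 1 = positive.
labels = rng.integers(0, 2, size=N_OBS)

# 2 tracts x 10 sensors = 20 time series per individual.
series_per_individual = raw.shape[1] * raw.shape[2]

# Flatten each observation into one feature vector for a generic learner.
features = raw.reshape(N_OBS, -1)

# Example subset: sensor 9 (index 8), lower tract, first 10 samples.
sensor9_lower_early = raw[:, 1, 8, :10]

print(series_per_individual, features.shape, sensor9_lower_early.shape)
```

Keeping the tract, sensor, and time axes separate until the last step makes it straightforward to restrict an analysis to a single sensor or to an early time window.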

4.5 Data analyses

Raw conductivity data were extracted into transfer files, which were then subjected to analysis by three independent parties. In an effort to provide unbiased results, these three teams did not consult each other, thus allowing different impartial choices of analytic approaches. Team A was from the LMU Institute for Medical Information Processing, Biometry and Epidemiology (IBE) with vast experience in medical data processing. Team B was the Artificial Intelligence Company LANZ GmbH, and team C consisted of space physics experts at Airbus Defence and Space GmbH familiar with E-Nose raw data processing (Figure 7). Experiments were numbered consecutively from I to VII. Details for the analytical approaches of the three teams are provided as Supporting Information. Results were reported with varying figures of merit, which were transformed to MEs for better comparability. The lead team refrained from conducting an analysis to ensure non-biased reporting of independent results. All of the results are reported and none were excluded.
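How the figures of merit were converted is not detailed here, so the following is only a sketch of standard identities that such a transformation could rely on. ME is the share of misclassified observations, which equals one minus accuracy and can also be recovered from sensitivity and specificity when the class sizes are known. The function names and the example numbers are hypothetical.

```python
import math


def me_from_accuracy(accuracy):
    """ME is the complement of accuracy."""
    return 1.0 - accuracy


def me_from_confusion(tp, fp, tn, fn):
    """ME from the four cells of a confusion matrix."""
    return (fp + fn) / (tp + fp + tn + fn)


def me_from_sens_spec(sensitivity, specificity, n_pos, n_neg):
    """ME from sensitivity and specificity, given the class sizes."""
    false_neg = (1.0 - sensitivity) * n_pos
    false_pos = (1.0 - specificity) * n_neg
    return (false_neg + false_pos) / (n_pos + n_neg)


# Hypothetical example: 50 positives and 50 negatives.
me_a = me_from_accuracy(0.85)
me_b = me_from_confusion(tp=45, fp=10, tn=40, fn=5)
me_c = me_from_sens_spec(0.9, 0.8, n_pos=50, n_neg=50)
print(round(me_a, 3), round(me_b, 3), round(me_c, 3))
```

The three calls describe the same hypothetical classifier, so they agree up to floating-point rounding. Note that sensitivity and specificity alone do not determine ME; the class sizes are needed as well.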

AUTHOR CONTRIBUTIONS

Conceptualisation, data curation, investigation, methodology, supervision, validation, visualisation, writing—original draft, and writing—review and editing: T.W. and A.C. Data curation, formal analysis, investigation, methodology, visualisation, writing—original draft, and writing—review and editing: F.P. Formal analysis, investigation, methodology, supervision, validation, visualisation, writing—original draft, and writing—review and editing: M.M.M. Investigation, methodology, supervision, writing—original draft, and writing—review and editing: A.L.B. Formal analysis, investigation, methodology, validation, visualisation, writing—original draft, and writing—review and editing: W.S., J.H., A.K., and D.L. Conceptualisation, methodology, writing—review and editing: M.F., M.D., M.K., and B.A. Investigation, methodology, and writing—review and editing: L.K. and D.M. All authors read and approved the final manuscript.

ACKNOWLEDGMENTS

The authors thank all participants and health care workers who helped to realize this clinical study during the challenging pandemic period. We gratefully acknowledge the input and support from several parties who shared their experience with the E-Nose technology and its technical details or provided strategic advice: C. Siggelkow and K. Briese (AIRSENSE Analytics GmbH), F. Zeitler and J. Lenic (German Aerospace Center), P. Roth, V. Fetter, and O. Schoele-Scholz (Airbus Defence and Space). We thank Savanna Ratky for language editing. Assistance with the implementation of the E-Nose study by M. Hörl, K. Biere, and B. Han (Laboratory of Translational Research, LMU Hospital) is also gratefully acknowledged. This study was supported by the German Aerospace Center (DLR) and the then Federal Ministry of Economic Affairs and Technology (#50RP1920 to AC). A.L.B. and M.M.M. were partially supported by individual grants from the German Research Foundation (DFG) (BO3139/7 and BO3139/9).

CONFLICT OF INTEREST STATEMENT

A.K. and J.H. are employees of Airbus Defence and Space GmbH, and W.S. and D.L. are part of LANZ GmbH; they have no relevant financial or non-financial interests to disclose. The other authors have no conflicts of interest to declare.

ETHICS STATEMENT

Ethical approval was obtained from the LMU ethics committee (project no. 19-778), written informed consent was obtained from each subject, and the study was carried out in accordance with the Declaration of Helsinki.

Open Research

DATA AVAILABILITY STATEMENT

The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.

REFERENCES

1 Dolch ME, Chouker A, Hornuss C, et al. Quantification of propionaldehyde in breath of patients after lung transplantation. Free Radic Biol Med. 2015; 85: 157-164. doi:10.1016/j.freeradbiomed.2015.04.003
2 Aleksic M, Simeon A, Vujic D, Giannoukos S, Brkic B. Food and lifestyle impact on breath VOCs using portable mass spectrometer-pilot study across European countries. J Breath Res. 2023; 17.
3 Chou H, Arthur K, Shaw E, et al. Metabolic insights at the finish line: deciphering physiological changes in ultramarathon runners through breath VOC analysis. J Breath Res. 2024; 18. doi:10.1088/1752-7163/ad23f5
4 Amann A, de Lacy Costello B, Miekisch W, et al. The human volatilome: volatile organic compounds (VOCs) in exhaled breath, skin emanations, urine, feces and saliva. J Breath Res. 2014; 8:034001. doi:10.1088/1752-7155/8/3/034001
5 Dolch ME, Hornuss C, Klocke C, et al. Volatile compound profiling for the identification of Gram-negative bacteria by ion-molecule reaction-mass spectrometry. J Appl Microbiol. 2012; 113: 1097-1105. doi:10.1111/j.1365-2672.2012.05414.x
6 Filipiak W, Beer R, Sponring A, et al. Breath analysis for in vivo detection of pathogens related to ventilator-associated pneumonia in intensive care patients: a prospective pilot study. J Breath Res. 2015; 9:016004. doi:10.1088/1752-7155/9/1/016004
7 Traxler S, Bischoff AC, Sass R, et al. VOC breath profile in spontaneously breathing awake swine during Influenza A infection. Sci Rep. 2018; 8:14857. doi:10.1038/s41598-018-33061-2
8 Aksenov AA, Sandrock CE, Zhao W, et al. Cellular scent of influenza virus infection. Chembiochem. 2014; 15: 1040-1048. doi:10.1002/cbic.201300695
9 Behera B, Joshi R, Anil Vishnu GK, Bhalerao S, Pandya HJ. Electronic nose: a non-invasive technology for breath analysis of diabetes and lung cancer patients. J Breath Res. 2019; 13:024001. doi:10.1088/1752-7163/aafc77
10 Keogh RJ, Riches JC. The use of breath analysis in the management of lung cancer: is it ready for primetime? Curr Oncol. 2022; 29: 7355-7378. doi:10.3390/curroncol29100578
11 Taverna G, Grizzi F, Tidu L, et al. Accuracy of a new electronic nose for prostate cancer diagnosis in urine samples. Int J Urol. 2022: 890-896. doi:10.1111/iju.14912
12 van de Goor RM, Leunis N, van Hooren MR, et al. Feasibility of electronic nose technology for discriminating between head and neck, bladder, and colon carcinomas. Eur Arch Otorhinolaryngol. 2017; 274: 1053-1060. doi:10.1007/s00405-016-4320-y
13 Dragonieri S, Pennazza G, Carratu P, Resta O. Electronic nose technology in respiratory diseases. Lung. 2017; 195: 157-165. doi:10.1007/s00408-017-9987-3
14 Licht JC, Grasemann H. Potential of the electronic nose for the detection of respiratory diseases with and without infection. Int J Mol Sci. 2020; 21:9416.
15 Ruszkiewicz DM, Sanders D, O'Brien R, et al. Diagnosis of COVID-19 by analysis of breath with gas chromatography-ion mobility spectrometry—a feasibility study. EClinicalMedicine. 2020; 29:100609. doi:10.1016/j.eclinm.2020.100609
16 Shlomo IB, Frankenthal H, Laor A, Greenhut AK. Detection of SARS-CoV-2 infection by exhaled breath spectral analysis: introducing a ready-to-use point-of-care mass screening method. EClinicalMedicine. 2022; 45:101308. doi:10.1016/j.eclinm.2022.101308
17 Snitz K, Andelman-Gur M, Pinchover L, et al. Proof of concept for real-time detection of SARS CoV-2 infection with an electronic nose. PLoS One. 2021; 16:e0252121. doi:10.1371/journal.pone.0252121
18 Bikov A, Lazar Z, Horvath I. Established methodological issues in electronic nose research: how far are we from using these instruments in clinical settings of breath analysis? J Breath Res. 2015; 9:034001. doi:10.1088/1752-7155/9/3/034001
19 Reidt U, Helwig A, Müller G, et al. Detection of microorganisms onboard the international space station using an electronic nose. Gravitational Space Res. 2017; 5: 89-111. doi:10.2478/gsr-2017-0013
20 Botvinik-Nezer R, Holzmeister F, Camerer CF, et al. Variability in the analysis of a single neuroimaging dataset by many teams. Nature. 2020; 582: 84-88. doi:10.1038/s41586-020-2314-9
21 Silberzahn R, Uhlmann EL. Crowdsourced research: many hands make tight work. Nature. 2015; 526: 189-191. doi:10.1038/526189a
22 Silberzahn R, Uhlmann EL, Martin DP, et al. Many analysts, one data set: making transparent how variations in analytic choices affect results. Adv Methods Practices Psychol Sci. 2018; 1: 337-356. doi:10.1177/2515245917747646
23 Wagenmakers EJ, Sarafoglou A, Aczel B. One statistical analysis must not rule them all. Nature. 2022; 605: 423-425. doi:10.1038/d41586-022-01332-8
24 Hoffmann S, Schonbrodt F, Elsas R, Wilson R, Strasser U, Boulesteix AL. The multiplicity of analysis strategies jeopardizes replicability: lessons learned across disciplines. R Soc Open Sci. 2021; 8:201925. doi:10.1098/rsos.201925
25 Hutson M. Artificial intelligence faces reproducibility crisis. Science. 2018; 359: 725-726. doi:10.1126/science.359.6377.725
26 Aczel B, Szaszi B, Nilsonne G, et al. Consensus-based guidance for conducting and reporting multi-analyst studies. Elife. 2021; 10. doi:10.7554/eLife.72185
27 Breiman L. Statistical modeling: the two cultures (with comments and a rejoinder by the author). Statist Sci. 2001; 16(3): 199-231. doi:10.1214/ss/1009213726
28 Dong J, Rudin C. Exploring the cloud of variable importance for the set of all good models. Nat Mach Intell. 2020; 2: 810-824. doi:10.1038/s42256-020-00264-0
29 Brummer LE, Katzenschlager S, McGrath S, et al. Accuracy of rapid point-of-care antigen-based diagnostics for SARS-CoV-2: an updated systematic review and meta-analysis with meta-regression analyzing influencing factors. PLoS Med. 2022; 19:e1004011. doi:10.1371/journal.pmed.1004011
30 Fragkou PC, Moschopoulos CD, Dimopoulou D, et al. Performance of point-of care molecular and antigen-based tests for SARS-CoV-2: a living systematic review and meta-analysis. Clin Microbiol Infect. 2022; 29(3): 291-301. doi:10.1016/j.cmi.2022.10.028
31 Arevalo-Rodriguez I, Buitrago-Garcia D, Simancas-Racines D, et al. False-negative results of initial RT-PCR assays for COVID-19: a systematic review. PLoS One. 2020; 15:e0242958. doi:10.1371/journal.pone.0242958
32 Payne D, Newton D, Evans P, Osman H, Baretto R. Preanalytical issues affecting the diagnosis of COVID-19. J Clin Pathol. 2021; 74: 207-208. doi:10.1136/jclinpath-2020-206751
33 Wang W, Xu Y, Gao R, et al. Detection of SARS-CoV-2 in different types of clinical specimens. JAMA. 2020; 323: 1843-1844.
34 Woloshin S, Patel N, Kesselheim AS. False negative tests for SARS-CoV-2 infection—challenges and implications. N Engl J Med. 2020; 383:e38. doi:10.1056/NEJMp2015897
35 Myers R, Ruszkiewicz DM, Meister A, et al. Breath testing for SARS-CoV-2 infection. EBioMedicine. 2023; 92:104584. doi:10.1016/j.ebiom.2023.104584
36 Ntoutsi E, Fafalios P, Gadiraju U, et al. Bias in data-driven artificial intelligence systems—an introductory survey. WIREs Data Mining Knowledge Discov. 2020; 10:e1356. doi:10.1002/widm.1356
37 Soni A, Herbert C, Lin H, et al. Performance of rapid antigen tests to detect symptomatic and asymptomatic SARS-CoV-2 infection: a prospective cohort study. Ann Intern Med. 2023; 176: 975-982. doi:10.7326/M23-0385
38 He JL, Luo L, Luo ZD, Lyu JX, Ng MY, Shen XP, et al. Diagnostic performance between CT and initial real-time RT-PCR for clinically suspected 2019 coronavirus disease (COVID-19) patients outside Wuhan, China. Respir Med. 2020; 168:105980. doi:10.1016/j.rmed.2020.105980
39 World Medical Association. World Medical Association Declaration of Helsinki: ethical principles for medical research involving human subjects. JAMA. 2013; 310: 2191-2194. doi:10.1001/jama.2013.281053
40 Harshman SW, Mani N, Geier BA, et al. Storage stability of exhaled breath on Tenax TA. J Breath Res. 2016; 10:046008. doi:10.1088/1752-7155/10/4/046008
41 Lomonaco T, Salvo P, Ghimenti S, et al. Stability of volatile organic compounds in sorbent tubes following SARS-CoV-2 inactivation procedures. J Breath Res. 2021; 15.

Volume 5, Issue 11, November 2024, e726
