Prospective comparison of diagnostic tests for bile acid diarrhoea
The Handling Editor for this article was Professor Alexander Ford, and it was accepted for publication after full peer-review.
Summary
Background
Bile acid diarrhoea is often missed because gold standard nuclear medicine tauroselcholic [75-Se] acid (SeHCAT) testing has limited availability. Empirical treatment effect has unknown diagnostic performance, whereas plasma 7α-hydroxy-4-cholesten-3-one (C4) is inexpensive but lacks sensitivity.
Aims
To determine diagnostic characteristics of empirical treatment and explore improvements in diagnostics with potential better availability than SeHCAT.
Methods
This diagnostic accuracy study was part of a randomised, placebo-controlled trial of colesevelam. Consecutive patients with chronic diarrhoea attending SeHCAT had blood and stool sampled. Key thresholds were C4 > 46 ng/mL and SeHCAT retention ≤10%. A questionnaire recorded patient-reported empirical treatment effect. We analysed receiver operating characteristics and explored machine learning-based logistic regression and decision tree modelling with internal validation.
Results
Ninety-six (38%) of 251 patients had SeHCAT retention ≤10%. The effect of empirical treatment, assessed while bile acid test results remained blinded, had 63% (95% confidence interval 44%–79%) sensitivity and 65% (47%–80%) specificity; C4 > 46 ng/mL had 47% (37%–57%) and 92% (87%–96%), respectively. A decision tree combining C4 ≥ 31 ng/mL with ≥1.1 daily watery stools (Bristol type 6 and 7) had 70% (51%–85%) sensitivity and 95% (83%–99%) specificity. The logistic regression model, including C4, the sum of measured stool bile acids and daily watery stools, had 77% (58%–90%) sensitivity and 93% (80%–98%) specificity.
Conclusions
Diagnosis of bile acid diarrhoea using empirical treatment was inadequate. Exploration suggested considerable improvements in the sensitivity of C4-based testing, offering potential widely available diagnostics. Further validation is warranted. ClinicalTrials.gov: NCT03876717.
1 INTRODUCTION
Bile acid diarrhoea is a common cause of chronic watery diarrhoea, affecting 1% of the general population.1, 2 Increased amounts of bile escaping small bowel reabsorption cause colonic watery secretion and peristalsis, leading to watery diarrhoea with urgency.2-5 Bile acid diarrhoea may be secondary to cholecystectomy, small bowel resection, or inflammation.6 However, in primary bile acid diarrhoea, the causes are genetic and physiological and consequently not detected by colonoscopy with biopsies or radiological imaging.2, 7, 8 The diagnostic yield of testing for bile acid diarrhoea in patients with diarrhoea-type irritable bowel syndrome is 32%, and therefore, guidelines recommend testing in this population.9-11 Unfortunately, the two gold standard tests are cumbersome and not widely available: the tauroselcholic [75Se] acid (SeHCAT) one-week retention testing or the measurement of 48-hour total stool bile acid excretion on a diet with 100 g fat per day.12 Therefore, bile acid diarrhoea is often overlooked or mistaken for irritable bowel syndrome, which is detrimental to patient quality of life and increases healthcare costs.13, 14 In the absence of better options, diagnostic assessment of empirical treatment effect is common practice, although with unknown diagnostic performance.15, 16 Furthermore, with several treatment options available for bile acid diarrhoea, proper diagnosis is fundamental.15, 17-19 Plasma 7α-hydroxy-4-cholesten-3-one (C4) is a surrogate marker of hepatic bile acid synthesis and a cheaper diagnostic alternative that potentially could be widely available; however, compared with SeHCAT testing, C4 has a low sensitivity of about 50%.20-22 Also, the diagnostic value of fibroblast growth factor 19 (FGF19)20, 23, 24 and bile acids in spot stool samples has been assessed.8, 25, 26 However, due to all these test options, meta-analyses on the prevalence and diagnosis of bile acid diarrhoea were limited by significant heterogeneity.9, 27
Therefore, we aimed to determine the diagnostic characteristics of empirical colesevelam treatment and explore improvements in diagnostic tests for bile acid diarrhoea with potential wide availability with reference to gold standard SeHCAT testing.
2 METHODS
2.1 Oversight
This diagnostic accuracy study was part of an investigator-initiated double-blinded placebo-controlled trial of colesevelam in bile acid diarrhoea (SINBAD). The treatment effects are reported separately.19 The study was approved by the Danish Region Zealand ethics committee (SJ-641), the Danish Data Protection Agency, the Danish Medicines Agency (EudraCT 2016-001452-22) and registered on ClinicalTrials.gov (NCT03876717), where the study protocol is available. The study was conducted following the Helsinki Declaration. All patients gave written informed consent before participation. All authors had access to the study data, reviewed and approved the final manuscript.
2.2 Patients
Consecutive patients referred for SeHCAT testing due to suspected bile acid diarrhoea and aged 18–79 years were eligible. A history of cholecystectomy or prior small or large intestine surgery was not a reason for exclusion. Patients with inflammatory bowel disease and microscopic colitis were excluded; the complete list of participation criteria has been published.19
2.3 Study procedures
Charts of patients referred for SeHCAT testing were prescreened for eligibility before sending out invitations. Recruitment was in conjunction with the first of two visits required for SeHCAT testing. After informed consent, a blood sample was immediately taken. We logged the time of sampling, fasting state, statin or fibrate medication, and any consumption of alcohol in the previous 24 hours, because these are putative factors affecting FGF19 and C4 levels.28-32 Any statin or fibrate prescription was paused for seven days, i.e. until the second visit for SeHCAT testing. The questionnaires Short Health Scale, Short Form 36 version 2, the gastrointestinal symptom rating scale and a Rome version III questionnaire on functional gastrointestinal disorders were answered.33, 34 During the six days between the two hospital visits for SeHCAT testing, the patients recorded stool habits using a structured diary with Bristol stool scale pictograms. The patients could collect a voluntary random stool sample at home to be immediately frozen below −18 degrees Celsius. These stool samples were brought to the hospital for the second SeHCAT visit to be kept at −80 degrees Celsius. At the second SeHCAT visit, a fasting morning blood sample was collected no later than 10:00 AM, with no alcohol consumption allowed 24 hours prior. The study diary was tallied, and patients fulfilling the criteria for diarrhoea: ≥ 3.0 bowel movements or ≥1.0 watery bowel movements (Bristol stool scale type 6 and 7) per day as an average over the six-day diary were eligible for randomisation to 12 days placebo-controlled treatment with colesevelam, as detailed and reported.19 The patients answered a questionnaire at the end of treatment, while both treatment allocation and diagnostic results were blinded: Was your diarrhoea cured – yes or no? Six months after study completion, we reviewed patient files for final diagnoses.
2.4 Diagnostic procedures
SeHCAT testing methodology has been reviewed.35 We defined one-week SeHCAT retention ≤10% as diagnostic for bile acid diarrhoea. High-performance liquid chromatography–tandem mass spectrometry (HPLC-MS/MS) was used to analyse plasma C417 and bile acid profiles in plasma and stool.36, 37 Stool samples were lyophilised to determine the levels of bile acid species per gram of freeze-dried faecal matter.37 The key C4 threshold for bile acid diarrhoea was >46 ng/mL.20, 22 The SeHCAT test result was disclosed to the patients by their physician after the 12 days of treatment but was kept blinded for the study investigators. C4 plasma samples were analysed en bloc after the study ended. All bile acids were summed for the total measured bile acids in absolute values. Cholic acid and chenodeoxycholic acid, whether unconjugated, sulphated or conjugated to glycine or taurine, were summed as primary bile acids and given as absolute values and percentages of the total measured bile acids. Likewise, amounts of conjugated and unconjugated ursodeoxycholic acid, deoxycholic acid, and lithocholic acid were summed to yield the secondary bile acids in absolute amounts and percentage of the total measured bile acids. Results are also given as sums according to sulphation or conjugation. We assessed the threshold >10% of primary bile acids in spot stool samples as used by others.8, 25 No threshold was prespecified for the total amount of measured bile acids in spot stool samples. FGF19 was analysed by enzyme-linked immunosorbent assay.31 We assessed the FGF19 thresholds <60 pg/mL and >204 pg/mL for diagnosis and screening, respectively.20
2.5 Statistical methods
Statistical analyses were done in R version 4.1.3, 2022. Missing data were handled by complete case analysis. The normality of the data distribution was assessed using quantile–quantile plots. Baseline descriptive statistics were grouped by SeHCAT- and C4-defined bile acid diarrhoea (R package gtsummary38). We did receiver operating characteristic (ROC) analysis and diagnostic 2 × 2 cross-tabulation. We used multivariate linear regression to assess the adjusted effect of sample time, fasting, alcohol, and statin or fibrate medication on the delta values of FGF19 and C4 collected at the second and first SeHCAT visits. Logistic regression was used to explore predictors of a positive SeHCAT test.
Combined machine learning diagnostic models were trained in a 60% randomly sampled data partition controlling the rate of SeHCAT-positive tests, while the diagnostic performance was validated in the remaining 40% (R caret package39). The machine learning approach applied logistic regression using fivefold cross-validation with 10 repetitions. Prespecified model covariates were C4, FGF19, the mean number of watery stools, and the percentage of primary bile acids. Stool total measured bile acids were highly significant in univariate testing and added post hoc, while FGF19 was removed. Using C4, the mean number of watery stools and stool total measured bile acids (all three covariates log-transformed) gave the best predictive model. We assessed the assumptions and fit of the models above by residual diagnostics. We used simulated scaled residuals (R package DHARMa40) and Hosmer–Lemeshow's goodness of fit for the logistic regression models. Collinearity was assessed by the variance inflation factor. Additionally, a decision tree was modelled by machine learning using repeated cross-validation as specified above in the same training and testing datasets (rpart function in the caret package39).
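The partitioning scheme described above can be illustrated with a minimal sketch. It is written in Python with the standard library only, not the R caret code used in the study, and it mimics only the idea of a 60/40 split that preserves the rate of SeHCAT-positive tests, followed by fivefold cross-validation repeated 10 times. The function names are hypothetical, the folds are not stratified (a simplification), and the labels use the full-cohort counts (96 positive, 155 negative) purely for illustration, not the complete-case modelling sample.

```python
import random

def stratified_split(labels, train_fraction=0.6, seed=0):
    """Split indices into train/test while keeping the class rate in each part."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        n_train = round(train_fraction * len(members))
        train_idx.extend(members[:n_train])
        test_idx.extend(members[n_train:])
    return sorted(train_idx), sorted(test_idx)

def repeated_kfold(indices, k=5, repeats=10, seed=0):
    """Yield (fit, validation) index lists for k-fold CV repeated several times."""
    rng = random.Random(seed)
    for _ in range(repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        folds = [shuffled[i::k] for i in range(k)]
        for held_out in range(k):
            validation = folds[held_out]
            fit = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
            yield fit, validation

# Illustrative labels only: 1 = SeHCAT retention <=10%, 0 = retention >10%.
labels = [1] * 96 + [0] * 155
train_idx, test_idx = stratified_split(labels)
splits = list(repeated_kfold(train_idx))
```

In this sketch the model would be fitted and tuned on the 50 resampled splits within the training partition, and the held-out 40% would be used once for the reported validation figures.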
We considered p values <0.05 significant in general, and positive findings should be considered hypothesis generating with an inherent risk of statistical type 1 error. We adjusted the p values in the multiple comparisons done regarding the stool and plasma bile acid profiles by Holm's method of controlling the family-wise error rate.
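Holm's step-down adjustment mentioned above can be sketched as follows. This is a generic Python illustration of the method, not the study's R code, and the input p values are hypothetical: each p value is multiplied by the number of hypotheses not yet rejected, working from the smallest upward, and the adjusted values are forced to be non-decreasing and capped at 1.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p values in the original order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

# Hypothetical p values, not taken from the study.
example_p = [0.01, 0.04, 0.03]
example_adjusted = holm_adjust(example_p)
```

With the family-wise error rate controlled at 0.05, only comparisons whose adjusted p value stays below 0.05 would be called significant.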
3 RESULTS
Recruitment started on October 25, 2018, and the last clinical visit was on July 1, 2021. The last patient chart follow-up was on February 13, 2022. We prescreened 1124, invited 621 and enrolled 255 patients with chronic diarrhoea. Four patients did not complete SeHCAT testing and were withdrawn. SeHCAT retention ≤10%, diagnostic of bile acid diarrhoea, was found in 96 (38%) of 251 patients. C4 data were available in 233 patients; 54 (23%) had C4 > 46 ng/mL consistent with bile acid diarrhoea (Figure 1). We classified patients with a normal test result as SeHCAT- or C4-defined miscellaneous diarrhoea.

3.1 Patient characteristics
Patients with bile acid diarrhoea defined using C4 or SeHCAT testing had more frequent watery bowel movements than patients with miscellaneous diarrhoea (Table 1 and Table S1), and they reported a higher degree of diarrhoea symptoms on the gastrointestinal symptom rating scale (Table S2). The patients with bile acid diarrhoea tended to be older, have a higher body mass index and have higher levels of plasma triglycerides, glucose and liver enzymes. Furthermore, a history of cholecystectomy was more common in patients with bile acid diarrhoea. For SeHCAT-defined diagnosis, a history of small intestine surgery was more common in patients with bile acid diarrhoea by a factor of 2, and a history of large intestine resection by a factor of 3; however, there were few cases and these differences did not reach statistical significance (Table 1). More details on the comorbidities and classification of patients with SeHCAT-defined bile acid diarrhoea are given in Table S3, and the final diagnoses of patients with a normal SeHCAT result are given in Table S4. The profiles of bile acid species in spot stool samples differed fundamentally, with absolute levels of bile acids higher by a factor of 2–3 in patients with bile acid diarrhoea. The SeHCAT-defined groups showed higher percentages of primary bile acids and lower percentages of secondary bile acids in patients with SeHCAT ≤10% than in patients with retention >10%; these findings were robust after controlling the family-wise error rate (Table S5). In contrast, the differences in plasma bile acid profiles were less marked and became non-significant after adjusting the p values (Table S6).
Characteristic | C4 > 46 ng/mL: bile acid diarrhoea (N = 54) | C4 ≤ 46 ng/mL: misc. diarrhoea (N = 179) | p | SeHCAT retention ≤ 10%: bile acid diarrhoea (N = 96) | SeHCAT retention > 10%: misc. diarrhoea (N = 155) | p |
---|---|---|---|---|---|---|
Patient history and demographic characteristics | ||||||
Age (years) | 55 (39–64) | 47 (32–58) | 0.02 | 51 (37–62) | 46 (31–58) | 0.036 |
Body mass index (kg/m2) | 31 (26–35) | 27 (24–31) | <0.001 | 29 (26–34) | 26 (24–31) | 0.002 |
Female sex, n (%) | 35 (65) | 123 (69) | 0.59 | 59 (61) | 109 (70) | 0.15 |
Cholecystectomy, n (%) | 18 (33) | 25 (14) | 0.001 | 23 (24) | 20 (13) | 0.02 |
Small bowel resection, n (%) | 2 (3.7) | 2 (1.1) | 0.23 | 2 (2.1) | 2 (1.3) | 0.64 |
Right hemicolectomy, n (%) | 3 (5.6) | 5 (2.8) | 0.39 | 6 (6.2) | 3 (1.9) | 0.09 |
Rome 3 IBS-D, n (%) | 23 (56) | 72 (54) | 0.79 | 48 (65) | 52 (48) | 0.022 |
Rome 3 IBS-M, n (%) | 5 (12) | 17 (13) | 0.93 | 5 (6.8) | 18 (17) | 0.051 |
Rome 3 functional diarrhoea, n (%) | 7 (17) | 32 (24) | 0.37 | 14 (19) | 25 (23) | 0.54 |
Bowel habits | ||||||
Total stools per day | 3.3 (2.3–4.8) | 2.8 (2.0–4.2) | 0.14 | 3.7 (2.5–5.2) | 2.7 (1.8–3.9) | <0.001 |
Bristol type 6 and 7 stools per day | 2.2 (1.2–4.0) | 1.5 (0.7–2.7) | 0.01 | 2.7 (1.5–4.3) | 1.2 (0.5–2.4) | <0.001 |
Bile acid diarrhoea biomarkers | ||||||
SeHCAT one-week retention (%) | 4 (2–10) | 18 (10–29) | <0.001 | 4 (2–7) | 23 (16–31) | <0.001 |
SeHCAT ≤10%, n (%) | 41 (76) | 47 (26) | <0.001 | – | – | |
C4 (ng/mL) | 65 (52–91) | 17 (10–27) | <0.001 | 43 (27–68) | 16 (9–26) | <0.001 |
C4 > 46 ng/mL, n (%) | – | – | | 41 (47) | 13 (9.0) | <0.001 |
FGF19 (pg/mL) | 47 (32–71) | 95 (63–154) | <0.001 | 74 (46–133) | 87 (58–149) | 0.11 |
Spot stool sample total measured bile acidsa | 16.4 (12.6–33.4) | 7.0 (4.3–13.6) | <0.001 | 17.2 (12.3–27.2) | 6.1 (3.6–9.2) | <0.001 |
Spot stool sample percentage primary bile acidsa | 15.9 (4.9–52.1) | 10.6 (3.9–32.3) | 0.12 | 29.2 (8.4–61.8) | 7.4 (3.4–18.4) | <0.001 |
Biochemistry | ||||||
Triglycerides (mmol/L) | 1.7 (1.2–2.2) | 1.1 (0.8–1.6) | <0.001 | 1.7 (1.1–2.3) | 1.1 (0.8–1.4) | <0.001 |
Total cholesterol (mmol/L) | 4.8 (4.2–5.7) | 4.7 (4.1–5.5) | 0.4 | 4.8 (4.2–5.7) | 4.8 (4.0–5.5) | 0.41 |
HDL cholesterol (mmol/L) | 1.2 (1.1–1.7) | 1.4 (1.2–1.7) | 0.09 | 1.2 (1.1–1.6) | 1.5 (1.2–1.7) | 0.003 |
LDL cholesterol (mmol/L) | 2.7 (2.0–3.4) | 2.7 (2.2–3.3) | 0.95 | 2.8 (2.0–3.3) | 2.7 (2.3–3.3) | 0.90 |
Glucose; fasting (mmol/L) | 5.8 (5.3–6.2) | 5.3 (4.9–5.7) | 0.001 | 5.7 (5.2–6.2) | 5.3 (4.9–5.7) | <0.001 |
Bilirubin (micromol/L) | 8 (6–10) | 9 (6–12) | 0.16 | 8 (6–10) | 9 (6–12) | 0.086 |
Alanine aminotransferase (U/L) | 30 (21–49) | 24 (17–32) | <0.001 | 31 (23–47) | 23 (17–30) | <0.001 |
Alkaline phosphatase (U/L) | 83 (67–97) | 70 (57–87) | 0.002 | 78 (61–97) | 70 (57–84) | 0.003 |
- Note: Patient characteristics by diagnosis of bile acid diarrhoea using C4 > 46 ng/mL or SeHCAT one-week retention ≤10%.
- Abbreviations: FGF19, fibroblast growth factor 19; IBS, irritable bowel syndrome; IBS-D, diarrhoea-predominant irritable bowel syndrome; IBS-M, mixed-type irritable bowel syndrome; SF36v2, short form 36 version 2; SHS, short health scale.
- a n = 44 and 134 (C4 columns); 75 and 104 (SeHCAT columns). Continuous data as medians (interquartile range); count data as total (percentage).
3.2 Predictors of bile acid diarrhoea in the patient history
Cholecystectomy was a weak predictor of C4- and SeHCAT-defined bile acid diarrhoea with adjusted odds ratios of 2.6 (95% confidence interval (CI) 1.1–5.9; p = 0.02) and 2.4 (1.1–5.6; p = 0.04), respectively. An increased mean number of watery stools predicted SeHCAT-defined bile acid diarrhoea with an adjusted odds ratio of 3.3 (1.5–7.5; p = 0.003) (Table 2).
Predictor | C4 > 46 ng/mL: bile acid diarrhoea (N = 54) | C4 ≤ 46 ng/mL: misc. diarrhoea (N = 179) | Adjusted odds ratio | p | SeHCAT ≤ 10%: bile acid diarrhoea (N = 96) | SeHCAT > 10%: misc. diarrhoea (N = 155) | Adjusted odds ratio | p |
---|---|---|---|---|---|---|---|---|
Body mass index >30 kg/m2 | 30 (57) | 51 (29) | 2.6 (1.3–5.4) | 0.01 | 42 (45) | 43 (28) | 1.8 (0.9–3.4) | 0.09 |
Age >60 years | 17 (31) | 34 (19) | 2.3 (1.0–5.1) | 0.05 | 27 (28) | 28 (18) | 1.6 (0.8–3.4) | 0.21 |
Watery stools per day >1.0 | 41 (77) | 108 (61) | 1.5 (0.6–3.8) | 0.37 | 76 (84) | 76 (53) | 3.3 (1.5–7.5) | 0.003 |
Total stools per day >3.0 | 31 (58) | 82 (46) | 1.7 (0.8–3.8) | 0.20 | 58 (64) | 58 (40) | 1.6 (0.8–3.2) | 0.19 |
Severely affected by urgent bowel movements | 25 (48) | 64 (37) | 1.2 (0.6–2.4) | 0.62 | 49 (54) | 44 (31) | 1.9 (1.0–3.6) | 0.04 |
Cholecystectomy | 18 (33) | 25 (14) | 2.6 (1.1–5.9) | 0.02 | 23 (25) | 19 (12) | 2.4 (1.1–5.6) | 0.04 |
Right hemicolectomy | 3 (5.6) | 5 (2.8) | 1.1 (0.3–8.7) | 0.46 | 6 (6.0) | 3 (1.9) | 5.3 (1.0–43) | 0.07 |
- Note: Predictors in patient history and physical examination of C4- and SeHCAT-defined bile acid diarrhoea; number of observations (% of total N). Odds ratios adjusted by the predictors given in the table; mean (95% confidence interval). Watery stools were defined as Bristol stool scale type 6 and 7. Urgent bowel movements were reported using the gastrointestinal symptom rating scale questionnaire and ‘severely affected’ was defined as ‘rather severely affected’ or worse.
3.3 Effect of blood sampling conditions on C4 and FGF19
In patients on statins, mean C4 was 18 (95% CI 5–31) ng/mL lower compared with samples taken after a week's pause of the prescription (p = 0.007). Violation of fasting, sampling time and recent alcohol consumption did not significantly affect C4. Violation of fasting affected FGF19 (Table S7).
3.4 Diagnostic value of empirical treatment response
Of the 251 patients, 168 received randomised treatment in the linked trial, 84 were allocated to placebo and 84 to colesevelam.19 Of the 84 patients on colesevelam, 66 patients answered the questionnaire regarding subjective treatment effect, while diagnostics and treatment allocation were double-blinded. The patient-reported effect of empirical colesevelam treatment had 63% (95% CI 44%–79%) sensitivity and 65% (47%–80%) specificity (Table 3).
Questionnaire, subjective effect | SeHCAT ≤10% | SeHCAT >10% | Predictive value | Error rate |
---|---|---|---|---|
“Diarrhoea cured”, n | 20 | 12 | PPV 63% (40–74) | False positive rate 35% (20–54) |
“Diarrhoea not cured”, n | 12 | 22 | NPV 65% (52–75) | False negative rate 38% (21–56) |
 | Sensitivity 63% (44–79) | Specificity 65% (47–80) | | |
- Note: Of 84 patients on colesevelam in the linked placebo-controlled trial, 66 answered the questionnaire item “Was your diarrhoea cured; yes or no” while diagnostics and treatment allocation remained double-blinded. Mean values with 95% confidence intervals.
- Abbreviations: NPV, negative predictive value; PPV, positive predictive value.
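The point estimates in Table 3 follow directly from the four cell counts. The short Python sketch below recomputes them; the function name is hypothetical and the confidence intervals are deliberately left out, because the interval method used in the study is not stated here.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Point estimates for a 2 x 2 diagnostic table against a reference test."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "false_negative_rate": 1 - sensitivity,
        "false_positive_rate": 1 - specificity,
    }

# Counts from Table 3: "cured" is a positive test, SeHCAT <=10% is the reference.
table3 = diagnostic_metrics(tp=20, fp=12, fn=12, tn=22)
```

Because the numbers of test-positive and reference-positive patients both happen to be 32, sensitivity and positive predictive value coincide at 20/32, and specificity and negative predictive value coincide at 22/34.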
3.5 Bivariate distributions of biomarkers and clinical symptoms
SeHCAT retention correlated with the number of watery stools (Spearman's rho (rs) = −0.37, p < 10⁻⁸), while C4 correlated to a lesser degree (rs = 0.20, p = 0.003), Figure S1. SeHCAT retention correlated closely with the levels of total measured bile acids in spot stool samples (rs = −0.72, p < 10⁻¹⁵) and to a lesser degree with the percentage of primary bile acids in spot stool samples (rs = −0.34, p < 10⁻⁸), Figure S2. Both the total measured amount of bile acids (rs = 0.36, p < 10⁻⁶) and the percentage of primary bile acids in spot stool samples (rs = 0.39, p < 10⁻⁷) correlated with the number of watery bowel movements, Figure S3.
3.6 Receiver operating characteristics analyses
With SeHCAT retention ≤10% as the diagnostic gold standard, the prespecified C4 threshold >46 ng/mL had 47% (95% CI 37%–57%) sensitivity and 92% (87%–96%) specificity. Exploring other cut-off values, Youden's index optimal threshold was 33 ng/mL with 71% (61%–80%) sensitivity and 84% (78%–90%) specificity. The C4 thresholds <20 and >60 ng/mL had 88% negative and positive predictive values, respectively (Table 4). The threshold of total measured bile acids in spot stool samples <7 μmol/g had a 93% negative predictive value; >16 μmol/g had an 88% positive predictive value. The ROC curves are plotted in Figure 2.
ROC analysis versus SeHCAT ≤10% | ROC-AUC | Threshold | Sensitivity (%) | Specificity (%) | PPV (%) | NPV (%) | Positive likelihood ratio | Negative likelihood ratio |
---|---|---|---|---|---|---|---|---|
Watery stools per day, n = 234 | 0.72 (0.65–0.78) | ≥1.0 | 87 (78–92) | 40 (32–48) | 47 (43–51) | 83 (73–89) | 1.4 (1.2–1.7) | 0.3 (0.2–0.6) |
 | | ≥2.0 | 67 (56–76) | 66 (58–74) | 55 (48–62) | 76 (70–81) | 2.0 (1.5–2.6) | 0.5 (0.4–0.7) |
 | | ≥3.0 | 39 (29–50) | 83 (75–88) | 58 (47–69) | 68 (64–72) | 2.2 (1.4–3.5) | 0.7 (0.6–0.9) |
 | | ≥5.0 | 20 (12–30) | 95 (90–98) | 72 (53–86) | 66 (63–68) | 4.1 (1.8–9.5) | 0.8 (0.8–0.9) |
C4 (ng/mL), n = 230 | 0.83 (0.78–0.89) | >15 | 93 (86–97) | 48 (39–56) | 53 (48–57) | 92 (84–96) | 1.8 (1.5–2.1) | 0.1 (0.1–0.3) |
 | | >20 | 87 (80–93) | 64 (56–72) | 60 (54–66) | 88 (83–94) | 2.4 (1.9–3.0) | 0.2 (0.1–0.4) |
 | | >33 | 71 (61–80) | 84 (78–90) | 73 (66–81) | 82 (78–87) | 4.4 (3.0–6.6) | 0.4 (0.3–0.5) |
 | | >46 | 47 (37–57) | 92 (87–96) | 78 (67–87) | 74 (70–78) | 5.7 (3.2–10.2) | 0.6 (0.5–0.7) |
 | | >60 | 32 (22–43) | 97 (94–99) | 88 (76–97) | 70 (67–73) | 11.7 (4.2–32) | 0.7 (0.6–0.8) |
FGF19 (pg/mL), n = 230 | 0.56 (0.49–0.64) | <60 | 40 (30–51) | 74 (66–81) | 49 (40–58) | 67 (62–71) | 1.5 (1.1–2.2) | 0.8 (0.7–1.0) |
 | | <204 | 89 (80–94) | 12 (7–18) | 38 (36–40) | 63 (45–78) | 1.0 (0.9–1.1) | 1.0 (0.5–2.0) |
Stool total measured bile acids (μmol/g), n = 179 | 0.89 (0.84–0.93) | >7 | 93 (87–99) | 63 (53–71) | 64 (59–70) | 93 (87–98) | 2.5 (1.3–3.2) | 0.1 (0.05–0.3) |
 | | >11 | 83 (73–91) | 84 (76–90) | 79 (71–86) | 87 (82–92) | 5.1 (3.2–7.9) | 0.2 (0.1–0.3) |
 | | >16 | 55 (42–65) | 94 (89–98) | 88 (79–95) | 74 (69–79) | 9.5 (4.2–21) | 0.5 (0.4–0.6) |
Stool primary bile acids (%), n = 179 | 0.71 (0.63–0.79) | >10 | 75 (63–84) | 59 (49–68) | 59 (50–63) | 76 (68–83) | 1.8 (1.4–2.3) | 0.4 (0.3–0.7) |
 | | >25 | 52 (40–64) | 80 (71–87) | 65 (55–74) | 70 (64–75) | 2.6 (1.7–4.0) | 0.6 (0.5–0.8) |
 | | >50 | 37 (26–49) | 91 (84–96) | 76 (61–86) | 67 (63–71) | 4.3 (2.2–8.6) | 0.7 (0.6–0.8) |
Combined modela | 0.95 (0.91–0.99) | Index >0.30 | 87 (69–96) | 85 (71–94) | 81 (64–93) | 90 (76–97) | 5.9 (2.8–13) | 0.2 (0.16–0.4) |
 | | Index >0.50 | 77 (58–90) | 93 (80–98) | 88 (70–98) | 84 (71–94) | 10.5 (3.5–32) | 0.3 (0.1–0.5) |
- Note: Mean diagnostic performance characteristics (95% confidence intervals) compared with SeHCAT retention ≤10% as the gold standard. The key C4 threshold was >46 ng/mL; other thresholds are exploratory.
- Abbreviations: FGF19, fibroblast growth factor 19; NPV, negative predictive value; PPV, positive predictive value.
- a The model combined the logarithm to C4, total measured bile acids in stools, and baseline number of watery bowel movements (see Supplementary for model equation); diagnostic performance on internal validation.
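Youden's index, used above to pick the optimal C4 cut-off, is sensitivity plus specificity minus one. The Python sketch below applies it to the C4 thresholds tabulated in Table 4 only. The study derived the optimum from the full ROC curve, so this is an illustration of the criterion on rounded published values, not a re-analysis; the variable names are hypothetical.

```python
# C4 threshold (ng/mL) -> (sensitivity, specificity), rounded values from Table 4.
c4_thresholds = {
    15: (0.93, 0.48),
    20: (0.87, 0.64),
    33: (0.71, 0.84),
    46: (0.47, 0.92),
    60: (0.32, 0.97),
}

def youden_index(sensitivity, specificity):
    """Youden's J statistic: 0 for an uninformative test, 1 for a perfect test."""
    return sensitivity + specificity - 1

youden_by_threshold = {
    threshold: youden_index(sens, spec)
    for threshold, (sens, spec) in c4_thresholds.items()
}

# Tabulated threshold with the highest J.
best_threshold = max(youden_by_threshold, key=youden_by_threshold.get)
```

Among the tabulated cut-offs, the index peaks at the 33 ng/mL threshold, in line with the optimum reported in the text.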

3.7 Explorative diagnostic modelling
Using C4, total measured bile acids in spot stool samples and the daily mean number of watery stools (all log-transformed) in machine learning regression modelling gave a model kappa value of 0.71. The validated area under the ROC curve in the internal validation dataset was 0.95 (95% CI 0.91–0.99), giving 77% (58%–90%) sensitivity and 93% (80%–98%) specificity (Table 4). Adjusted diagnostic odds ratios of the model covariates are found in Table 5, and the mathematical equation of the model is found in the Supporting Information material, page 16. Assessing the robustness of the modelling to the random split of the database into 60% training and 40% testing partitions returned areas under the ROC curve ranging from 0.93 to 0.95. Adding the percentage of primary bile acids to the model resulted in negligible improvement in a few iterations. The decision tree analysis, including C4 and the mean number of watery stools, suggested C4 ≥ 31 ng/mL in conjunction with ≥1.1 watery stools as diagnostic for bile acid diarrhoea with 70% (51%–85%) sensitivity and 95% (83%–99%) specificity on internal validation (Table 6) (Figure S4). Validation of the diagnostic performance of the decision tree was possible in data from a previous cohort, which was recruited similarly to the current study.20, 22 However, plasma samples in these 71 patients were collected with no regard to ongoing statin therapy or recent alcohol consumption, and the samples had been kept at −80°C for 5 years. In this previous cohort, the decision tree model had a sensitivity of 62% (41%–80%) and a specificity of 91% (79%–98%).
Predictor | Change | Adjusted odds ratio mean (95% confidence interval) | p |
---|---|---|---|
Daily mean number of watery stools | +50% | 2.2 (1.5–3.7) | <0.001 |
C4 | +50% | 1.7 (1.2–2.5) | <0.01 |
Spot stool sample total measured bile acids | +50% | 1.8 (1.2–2.9) | <0.01 |
- Note: Exploratory combined logistic regression model with SeHCAT ≤10% as the diagnostic gold standard. The model was trained in 60% of the database using 5-fold repeated cross validation with 10 repetitions, and validated in the remaining 40% of the database. Model kappa value: 0.71.
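Table 5 reports odds ratios per 50% increase because the covariates were log-transformed. As a hedged illustration, assuming a natural-log transform and using the rounded published odds ratio, the Python sketch below converts such an odds ratio to the implied coefficient and re-expresses it for another relative change. The function names are hypothetical, and the actual model equation is in the Supporting Information, not reproduced here.

```python
import math

def coefficient_from_or(odds_ratio, relative_change=0.5):
    """Coefficient on the natural-log covariate implied by an OR per relative change."""
    return math.log(odds_ratio) / math.log(1 + relative_change)

def or_for_change(beta, relative_change):
    """Odds ratio for multiplying the covariate by (1 + relative_change)."""
    return math.exp(beta * math.log(1 + relative_change))

# Rounded value from Table 5: OR 2.2 per +50% in daily mean watery stools.
beta_watery_stools = coefficient_from_or(2.2)

# Illustrative re-expression: OR for a doubling (+100%) of watery stools.
or_per_doubling = or_for_change(beta_watery_stools, 1.0)
```

The re-expressed odds ratio inherits the rounding of the published value and says nothing about its confidence interval.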
Decision tree: C4 ≥ 31 ng/mL AND daily mean no. watery stools ≥1.1 | SeHCAT ≤10% | SeHCAT >10% | Predictive value | Likelihood ratio |
---|---|---|---|---|
Yes: bile acid diarrhoea, n | 21 | 2 | PPV 91% (73–98) | LR+ 14 (4–57) |
No: not bile acid diarrhoea, n | 9 | 39 | NPV 81% (71–88) | LR− 0.3 (0.2–0.6) |
 | Sensitivity 70% (51–85) | Specificity 95% (83–99) | | |
- Note: Machine learning developed decision tree defining bile acid diarrhoea as C4 ≥ 31 ng/mL in conjunction with mean daily number of Bristol type 6 and 7 stools ≥1.1. Internally validated diagnostic performance; mean (95% confidence interval).
- Abbreviations: LR−, negative diagnostic likelihood ratio; LR+, positive diagnostic likelihood ratio; NPV, negative predictive value; PPV, positive predictive value.
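The decision rule and the likelihood ratios in Table 6 can be written out as a small Python sketch. The function names are hypothetical; the thresholds and cell counts are those given in the table, and confidence intervals are omitted.

```python
def decision_tree_positive(c4_ng_ml, watery_stools_per_day):
    """Rule from Table 6: both criteria must be met for a positive classification."""
    return c4_ng_ml >= 31 and watery_stools_per_day >= 1.1

def likelihood_ratios(tp, fp, fn, tn):
    """Positive and negative diagnostic likelihood ratios from a 2 x 2 table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative

# Internal validation counts from Table 6.
lr_pos, lr_neg = likelihood_ratios(tp=21, fp=2, fn=9, tn=39)

# Hypothetical patients, for illustration only.
example_positive = decision_tree_positive(c4_ng_ml=35, watery_stools_per_day=2.0)
example_negative = decision_tree_positive(c4_ng_ml=35, watery_stools_per_day=0.5)
```

The second hypothetical patient shows how the stool criterion filters out raised C4 values without watery diarrhoea, which is where the rule gains specificity over C4 alone.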
All regression models had good fits in model diagnostics.
4 DISCUSSION
The limited availability of SeHCAT for the diagnosis of bile acid diarrhoea and the emergence of new treatment options emphasise the need for more readily available and valid diagnostic tests.14, 17-19 This diagnostic accuracy study included a prospective cohort of consecutive patients with chronic diarrhoea attending the reference SeHCAT test. The patient's subjective response to empirical colesevelam treatment was diagnostically inadequate, while we confirmed that the key C4 threshold of 46 ng/mL was specific but lacked sensitivity. Explorations suggested the sensitivity of C4 could be improved by lowering the C4 threshold to 31 ng/mL in conjunction with a diary-confirmed number of watery stools ≥1.1 or by using a regression model including C4, the mean diary-registered number of watery stools and spot stool sample total measured bile acids.
The optimal SeHCAT test threshold is not settled, and some consider values of 10%–15% or even 15%–20% a diagnostic grey zone.40, 41 Although not powered to distinguish SeHCAT thresholds, our data suggest worse diarrhoea symptoms in patients with SeHCAT retention of ≤10% and that the bowel habits of patients with SeHCAT values in the 10%–15% and 15%–20% ranges are more similar to patients with SeHCAT retention >20% (Table S1). This notion is supported by data from the linked trial on the treatment difference between colesevelam and placebo in ranges of SeHCAT retention, suggesting benefit of colesevelam over placebo only in the patients with SeHCAT retention of 10% or less and no apparent benefit in the SeHCAT ranges of 10.1%–15% and 15.1%–20%.19 We, therefore, find the threshold of 10% SeHCAT retention reasonable as diagnostic for bile acid diarrhoea.
The diagnostic performance of C4 reported here confirms our previous findings that the 46 ng/mL threshold had 92% (87%–96%) specificity but a low sensitivity of 47% (37%–57%), so about half of the patients with SeHCAT-defined bile acid diarrhoea would be overlooked using this C4 threshold.20 However, since C4 is inexpensive compared to SeHCAT (about 5% of the cost), its best use merits scrutiny. The C4 threshold of 33 ng/mL increased the sensitivity considerably to 71% (61%–80%) with only a slight decline in specificity to 84% (78%–90%). C4 values less than 20 ng/mL had an 88% (95% CI 83%–94%) negative predictive value to rule out bile acid diarrhoea, whereas values above 60 ng/mL had an 88% (76%–97%) positive predictive value, indicating the 20–60 ng/mL range to be a diagnostic grey zone.20, 22 However, our placebo-controlled treatment data from this patient cohort suggested similar remission rates in patients with C4 > 46 ng/mL as in patients with SeHCAT retention ≤10%.19 Moreover, the explorative decision tree analysis suggested that a C4 threshold of 31 ng/mL combined with a diary-recorded mean number of watery stools ≥1.1 may increase the sensitivity to 70% (51%–85%) while keeping 95% (83%–99%) specificity. Compared with the diagnostic performance of C4 alone, it appears that the sensitivity of the nearby C4 threshold of 33 ng/mL is retained and that adding the criterion of watery bowel movements ≥1.1 in the decision tree increased specificity to an acceptable level. The reliability of the proposed decision tree model was strengthened by replication of the diagnostic characteristics in our previous cohort.20 The slightly lower sensitivity of 62% of the decision tree in the previous dataset reflects a lower sensitivity of C4 overall that could be explained by differences in blood sampling conditions. The logistic regression model, which also included the covariate of spot stool sample total measured bile acids, increased the sensitivity to 77% (58%–90%).
A recent study reported similar diagnostic performance with a mean sensitivity of 78% (64%–89%) and specificity of 93% (83%–98%); however, this was a case–control study with controls matched on sex, age, and body mass index; not on diarrhoea symptoms or bowel transit.42 We deem our study cohort consisting of prospective consecutive patients attending SeHCAT testing to have better external validity. Both our approaches could potentially limit the size of the grey zone of C4-based diagnostics compared with SeHCAT testing. The regression model does require complex HPLC-MS/MS analysis of bile acids in a random stool sample, whereas the decision tree model would readily be available where C4 is implemented.
Where available, SeHCAT testing could be reserved only for patients with grey zone C4 values. If gold standard testing is unavailable, prescribing a sequestrant, typically colestyramine, to evaluate the empirical treatment effect may be the only approach, which recent guidelines deemed reasonable in settings without better diagnostic options.10 However, we report here a sensitivity of 63% (44%–79%) and specificity of 65% (47%–80%) for the patient-reported effect of colesevelam; figures that are insufficient for a diagnostic test and accordingly support guidelines that advise against the diagnostic use of empirical treatment effect.6 Our reported test performance was based on 20 (63%) of 32 patients with bile acid diarrhoea diagnosed by SeHCAT ≤10% compared with 12 (35%) of 34 with SeHCAT >10% deeming their diarrhoea cured on colesevelam after 12 days' treatment. This brief treatment duration is a limitation; however, similar figures were reported in a retrospective study with a follow-up time of 2–24 months, in which 71% of patients with bile acid diarrhoea improved on colesevelam, but 37% of patients without bile acid diarrhoea also improved on colesevelam.43 With several treatment options available for bile acid diarrhoea, using the empirical effect of just one option, typically colestyramine, as diagnostic is, in our opinion, outdated.15, 17, 18, 44 This notion is underlined by the substantial rates of false negative and false positive tests of 38% and 35%, respectively, that we report here (Table 3). C4-based biochemical diagnosis of bile acid diarrhoea is fairly inexpensive with potential widespread availability, and could, therefore, optimise the use of the expensive SeHCAT testing. Where gold standard testing is unavailable, C4-based testing seems a considerable improvement over empirical treatment effect that could scale to the diagnostic needs. Further validation is warranted.
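The sensitivity and specificity of empirical treatment follow directly from the counts given above (20 of 32 SeHCAT-positive and 12 of 34 SeHCAT-negative patients reporting their diarrhoea cured). The sketch below recomputes the point estimates and attaches a Wilson score interval. The Wilson method is an assumption chosen for illustration, as this section does not state which interval method was used, so the bounds may differ slightly from the published ones.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for roughly 95%)."""
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half_width, centre + half_width


# Counts taken from the text: patient-reported cure on colesevelam versus SeHCAT.
true_pos, false_neg = 20, 32 - 20    # SeHCAT retention <=10%
false_pos, true_neg = 12, 34 - 12    # SeHCAT retention >10%

sensitivity = true_pos / (true_pos + false_neg)
specificity = true_neg / (true_neg + false_pos)
sens_ci = wilson_interval(true_pos, true_pos + false_neg)
spec_ci = wilson_interval(true_neg, true_neg + false_pos)

if __name__ == "__main__":
    print(f"sensitivity {sensitivity:.3f}, interval {sens_ci[0]:.2f} to {sens_ci[1]:.2f}")
    print(f"specificity {specificity:.3f}, interval {spec_ci[0]:.2f} to {spec_ci[1]:.2f}")
```

The false negative and false positive rates quoted in the text are the complements of these two proportions.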
Our analysis of bile acid profiles in spot stool samples demonstrated a close relationship between the total measured amount of bile acids and SeHCAT testing with an area under the ROC curve of 0.89 (0.84–0.93), markedly superior to the 0.71 (0.63–0.79) of the percentage of primary bile acids (Table 4, Figure 2). This finding is surprising, as recent diagnostic studies proposed the percentage of primary bile acids in spot stool samples as a preferred measure of bile acid diarrhoea.25, 26 In contrast to our results, a study including 113 patients reported an area under the ROC curve of 0.69 against SeHCAT ≤10% for spot stool sample total bile acids measured by an enzymatic kit.45 However, the reported levels of faecal bile acids of median 9.9 (IQR 4.8–15.4) μmol/g, even in severe bile acid diarrhoea with SeHCAT retention <5%, were lower than the median 17.2 (IQR 12.3–27.2) μmol/g we found in patients with SeHCAT retention ≤10%. Differences in sample preparation could explain this discrepancy, as we lyophilised the stool samples to measure bile acids in dried faecal matter. The bile acid profiles in spot stool samples of patients with bile acid diarrhoea and miscellaneous diarrhoea were markedly different even when controlling the family-wise error rate. This was not true of the bile acid profiles in plasma, where apparent differences became non-significant when adjusting for multiple statistical comparisons. Unlike measurements of bile acids in stool, plasma levels are fundamentally affected by the hydrophilicity of each bile acid and by the hepatic first-pass effect.46 Although serum lipidomics have been proposed as diagnostic for bile acid diarrhoea, our data suggest that stool samples have better diagnostic applicability.42
Strengths of this study include the consecutive recruitment of prospective patients referred for SeHCAT testing, limiting the effect of selection bias on external validity. The patient groups in this study were similar in demographic characteristics to those of a large observational study and in biochemical profile to those reported in a recent cohort study.13, 47 The study included all current diagnostic modalities except the 48-hour stool collection, and the study size allowed data partitioning with internal validation of the explorative modelling. Although explorative, the diagnostic performance characteristics of the decision tree model including C4 and the mean number of watery stools were validated both in an internal dataset and in a previous cohort. Combining the measurement of C4 with a diary registration would be readily available where C4 is implemented. The performance characteristics of spot stool sample total measured bile acids were appealing; however, the complexity of the HPLC-MS/MS assay to measure numerous bile acid species is a limitation. This study being part of a trial imposed some limitations, as the criteria for inclusion and exclusion were tailored for the trial; notably, patients with inflammatory bowel disease and microscopic colitis were excluded, and the evaluation of clinical response was done after 12 days' treatment. Specific studies addressing these populations are needed. Logistic regression modelling handles collinearity poorly; therefore, we used few known putative covariates. Comprehensive machine learning might identify combinations of bile acid molecular species that are more specific; nevertheless, the kappa statistic of 0.71 of our diagnostic model indicates a substantial improvement in diagnostic classification. Finally, the exploratory findings of the study will need subsequent validation.
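For readers unfamiliar with the kappa statistic mentioned above, the following minimal sketch shows how Cohen's kappa is computed from a 2×2 table of model classification against the reference test. The counts are hypothetical and chosen only for easy arithmetic; they are not the study data, and the function name is an illustrative assumption.

```python
def cohen_kappa(true_pos: int, false_neg: int, false_pos: int, true_neg: int) -> float:
    """Cohen's kappa: agreement between test and reference beyond chance."""
    n = true_pos + false_neg + false_pos + true_neg
    observed = (true_pos + true_neg) / n
    test_pos = (true_pos + false_pos) / n    # proportion classified positive by the test
    ref_pos = (true_pos + false_neg) / n     # proportion positive by the reference
    expected = test_pos * ref_pos + (1 - test_pos) * (1 - ref_pos)
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical counts, not study data: 80% observed agreement, 50% expected by chance.
    print(round(cohen_kappa(true_pos=40, false_neg=10, false_pos=10, true_neg=40), 2))
```

On the commonly used Landis and Koch scale, kappa values between 0.61 and 0.80 are labelled substantial agreement.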
In conclusion, the effect of empirical colesevelam treatment was diagnostically inadequate. We confirm that the predefined threshold of the plasma biomarker C4 > 46 ng/mL is specific compared with SeHCAT for the diagnosis of bile acid diarrhoea but lacks sensitivity. Exploration suggested that lowering the C4 threshold to 31 ng/mL in conjunction with an increased number of watery stools could considerably improve the sensitivity, as could modelling combining C4 and spot stool sample total measured bile acids.
AUTHOR CONTRIBUTIONS
Christian Borup: Conceptualization (equal); data curation (lead); formal analysis (lead); funding acquisition (lead); investigation (equal); methodology (equal); project administration (lead); software (lead); visualization (lead); writing – original draft (lead); writing – review and editing (equal). Lars Vinter-Jensen: Investigation (equal); project administration (equal); writing – review and editing (equal). Søren Peter German Jørgensen: Investigation (equal); project administration (equal); writing – review and editing (equal). Signe Wildt: Conceptualization (equal); methodology (equal); supervision (equal); writing – review and editing (equal). Jesper Graff: Conceptualization (equal); investigation (equal); project administration (equal); supervision (equal); writing – review and editing (equal). Tine Gregersen: Investigation (equal); project administration (supporting); writing – review and editing (equal). Anna Zaremba: Investigation (equal); project administration (supporting); writing – review and editing (equal). Trine Borup Andersen: Investigation (equal); project administration (supporting); writing – review and editing (equal). Camilla Nøjgaard: Investigation (supporting); writing – review and editing (equal). Hans Bording Timm: Investigation (supporting); writing – review and editing (equal). Antonin Lamazière: Formal analysis (equal); methodology (equal); resources (supporting); validation (supporting); writing – review and editing (equal). Dominique Rainteau: Formal analysis (equal); methodology (equal); resources (supporting); validation (equal); writing – review and editing (equal). Svend Høime Hansen: Formal analysis (equal); methodology (equal); resources (supporting); writing – review and editing (equal). Jüri Johannes Rumessen: Conceptualization (equal); methodology (equal); supervision (equal); writing – review and editing (equal). 
Lars Kristian Munck: Conceptualization (lead); funding acquisition (lead); investigation (supporting); methodology (lead); project administration (lead); supervision (lead); writing – review and editing (lead).
ACKNOWLEDGMENTS
We are grateful to the patients for participation and to our funders and collaborators making this work possible. We thank research Majbritt Frost Nilsson (M.H.S.) for her crucial role in the project management of the Aalborg study site.
CONFLICT OF INTEREST STATEMENT
SW was on an advisory board for Bristol-Myers Squibb and was supported by Takeda Pharmaceuticals to attend the ECCO congress. All other authors declare no conflicts of interest.
FUNDING INFORMATION
This study was funded entirely by independent research grants, predominantly by a donation from the Fabrikant Vilhelm Pedersen og hustrus mindelegat by recommendation from the Novo Nordisk Foundation (NNF19OC0055844). Smaller donations were granted by the Region Zealand Health-Scientific Fund (R17A48B39, RSSF2017000645), the Axel Muusfeldts Fond (2017-771), Overlæge Johan Boserup og Lise Boserups Legat (20795-24), Aase og Ejnar Danielsens Fond (10-002035), Civilingeniør H.C. Bechgaard og hustru Ella Mary Bechgaards Fond (2017-1064/93), Prosektor Axel Emil Søeborg Ohlsen og ægtefælles Mindelegat (6386 MT/IV), and the Foundation for Advancement of Medical Science under the A.P. Møller and Chastine Mc-Kinney Møller Foundation (18-L-0394).
AUTHORSHIP
Guarantor of the article: Lars Kristian Munck.