A prediction tool was developed and internally validated to aid the diagnosis of Cushing's syndrome in dogs attending UK primary-care practices. External validation is an important part of model validation to assess model performance when used in different populations.

Objectives

To assess the original prediction model's transportability, applicability, and diagnostic performance in a secondary-care practice in the Netherlands.

Animals

Two hundred thirty client-owned dogs.

Methods

Retrospective observational study. Medical records of dogs under investigation for Cushing's syndrome between 2011 and 2020 were reviewed. Dogs diagnosed with Cushing's syndrome by the attending internists and fulfilling ALIVE criteria were defined as cases, others as non-cases. All dogs were scored using the aforementioned prediction tool. Dog characteristics and predictor-outcome effects in development and validation data sets were compared to assess model transportability. Calibration and discrimination were examined to assess model performance.

Results

Eighty of 230 dogs were defined as cases. Significant differences in dog characteristics were found between UK primary-care and Dutch secondary-care populations. Not all predictors from the original model were confirmed to be significant predictors in the validation sample. The model systematically overestimated the probability of having Cushing's syndrome (a = −1.10, P < .001). Calibration slope was 1.35 and discrimination proved excellent (area under the receiver operating characteristic curve = 0.83).

Conclusions and Clinical Importance

The prediction model had moderate transportability and excellent discriminatory ability, but overall it overestimated the probability of having Cushing's syndrome. This study confirms its utility but emphasizes that ongoing validation of disease prediction tools is worthwhile.

Abbreviations

ACTH: adrenocorticotropic hormone
ADH: adrenal-dependent hypercortisolism
ALP: alkaline phosphatase
AUROC: area under the receiver operating characteristic curve
LDDST: low dose dexamethasone suppression test
NPV: negative predictive value
o-HDDST: oral high dose dexamethasone suppression test
PDH: pituitary-dependent hypercortisolism
PPV: positive predictive value
SE: standard error
UCCR: corticoid-to-creatinine ratio
USG: urine specific gravity

1 INTRODUCTION

Cushing's syndrome is an umbrella term for a range of clinical syndromes caused by a chronic excess of glucocorticoid activity, which can result from a range of endogenous or exogenous steroid hormones.¹ It is a common endocrine disorder in dogs. Overall prevalence is estimated at 0.17%-0.28%.^2-4 Spontaneous Cushing's syndrome is caused by an excessive production of glucocorticoids, often leading to a typical case presentation. Common clinical signs include polydipsia, polyuria, polyphagia, panting, abdominal enlargement, hepatomegaly, dermatological changes, and muscle atrophy. Frequently observed clinicopathological abnormalities include a stress leukogram, increased serum alkaline phosphatase activity (ALP), and hypercholesterolemia.^{5, 6}

Multiple adrenal function tests and differentiating tests are described for diagnostic purposes, including urine corticoid-to-creatinine ratio (UCCR) with or without suppression using oral dexamethasone, low dose dexamethasone suppression test (LDDST) on the basis of blood cortisol, and adrenocorticotropic hormone (ACTH) stimulation test also on the basis of blood cortisol measurement. However, none of these tests is perfect: they can be time-consuming and costly, and both false-positive and false-negative results are common.^6-12 Recently, a prediction tool was developed and internally validated to aid the diagnosis of spontaneous Cushing's syndrome in dogs.^{13, 14} This model demonstrated good predictive performance in dogs attending UK primary-care practices, using neuter status, age, breed, polydipsia, vomiting, potbelly/hepatomegaly, alopecia, pruritus, urine specific gravity (USG), and serum ALP as predictor variables.

A prediction model can be validated internally and externally. With internal validation, the model is tested in patients who belong to the original population and indicates how the model would likely perform in a very similar population. With external validation, the model is tested in patients belonging to another population. External validation is an important part of model validation, as different population characteristics could have a major influence on the performance of a prediction model.^{15, 16} These variations in population characteristics have been categorized as temporal, geographic, and domain differences.^{17, 18} With temporal validation, the same study is performed in a similar population from a later time period. With geographic validation, the model performance is tested by varying the location of the population (eg, primary-care practices in the UK vs primary-care practices in the Netherlands). Domain validation can be performed when one suspects differences in patient groups (eg, primary-care vs secondary-care practice). Disease prevalence and characteristics can vary markedly between primary-care and secondary-care or tertiary-care practices in both human and veterinary medicine.^19-21 The aim of this study was to assess the aforementioned prediction model's transportability, applicability, and diagnostic performance in a secondary-care practice in the Netherlands.

2 MATERIALS AND METHODS

2.1 Case selection

All electronic medical records of the Internal Medicine service of small animal referral clinic “Amsterdam Medisch Centrum voor Dieren” containing results of UCCRs in combination with an oral high dose dexamethasone suppression test (most commonly used in the Netherlands and at the investigational hospital) between January 2011 and December 2020 were reviewed. The use of UCCR and oral high dose dexamethasone suppression test (o-HDDST) has previously been validated^{9, 22, 23} and involved the collection of a morning urine sample by the owner on 3 consecutive days. After collection of the second urine sample, the owner administered 3 oral doses of dexamethasone (0.1 mg/kg/dose) at 8-hour intervals. UCCR was measured in all 3 morning urine samples. In an animal with appropriate signalment and suggestive clinical signs, hypercortisolism was suspected if the average of the first 2 UCCRs was ≥10 × 10⁻⁶; this cut-off was previously established by the University of Utrecht Veterinary Laboratory.⁹ If the third UCCR was <50% of the mean of the first 2 samples, this was considered suggestive of pituitary-dependent hypercortisolism (PDH). According to the described validation of the methodology, a decrease of <50% was considered to be suggestive of pituitary-dependent (but dexamethasone-resistant) hypercortisolism, ectopic ACTH excess, or ACTH independence (eg, adrenal-dependent hypercortisolism [ADH]). The same laboratory and assay were used for UCCR measurement as in the validation publications.
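The interpretation rule described above can be written as a short decision function. The following Python sketch is illustrative only: the function name, the return strings, and the choice to express UCCR values in units of 10⁻⁶ are assumptions made here, and the output is not a diagnosis, since the study combined the test with signalment, clinical signs, and imaging.

```python
from statistics import mean

# Cut-off quoted in the text, expressed in units of 10^-6 (assumption: all
# UCCR inputs to this sketch use the same units).
UCCR_CUTOFF = 10


def interpret_ohddst(uccr_day1, uccr_day2, uccr_day3):
    """Classify a UCCR + o-HDDST result following the rule in the text.

    uccr_day1, uccr_day2: baseline morning samples (before dexamethasone).
    uccr_day3: morning sample after 3 oral doses of dexamethasone.
    """
    baseline = mean([uccr_day1, uccr_day2])
    if baseline < UCCR_CUTOFF:
        return "baseline below cut-off: hypercortisolism not suspected"
    if uccr_day3 < 0.5 * baseline:
        return "suppressed: suggestive of PDH"
    # Third sample is at least 50% of baseline, ie, a decrease of <50%.
    return ("not suppressed: dexamethasone-resistant PDH, ectopic ACTH "
            "excess, or ACTH independence")


print(interpret_ohddst(40, 60, 10))  # baseline 50, third sample 10
print(interpret_ohddst(40, 60, 30))  # baseline 50, third sample 30
```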

Dogs were included as cases (ie, having Cushing's syndrome) if this was the final conclusion of the attending internist based on a combination of medical history, clinical signs, physical examination, routine laboratory investigations, endocrine tests (UCCR and o-HDDST), and diagnostic imaging (abdominal ultrasound ± computed tomography). Cases were excluded if a subsequent revision of the diagnosis was made in the medical record. Dogs were included as non-cases (ie, not having Cushing's syndrome) if the attending internist considered Cushing's syndrome and subsequently ruled out this diagnosis based on normal UCCRs and o-HDDST in combination with 1 or more of the following: medical history, clinical signs, physical examination, routine laboratory investigations, other endocrine tests, and diagnostic imaging. A definite alternative diagnosis was not required for non-cases.

All cases and non-cases were subsequently independently reviewed by the primary and last author and verified to be compliant (cases) or not (non-cases) with ALIVE criteria for diagnosis of Cushing's syndrome.^{24, 25} Those not fulfilling ALIVE criteria were excluded. Current ALIVE criteria for PDH and ADH are shown in Table 1.

TABLE 1. Current ALIVE criteria for diagnosis of pituitary-dependent hypercortisolism and adrenal-dependent hypercortisolism.

Pituitary-dependent hypercortisolism criteria	Accepted ways to fulfill criteria
Identification of a set of clinical features attributable to Cushing's syndrome including.	Supportive history, physical examination findings and clinicopathologic test results
Demonstration of an excess of cortisol through dynamic testing of pituitary-adrenal function	Dexamethasone suppression test based on blood OR Dexamethasone suppression test based on urine OR ACTH stimulation test
ACTH-dependence originating from the pituitary is proven through at least 1 clear differentiation test result	Characteristic suppression of a LDDST using blood OR Characteristic suppression of a HDDST using blood OR Characteristic suppression of a HDDST combined with UCCR measurement OR Absence of suppressed endogenous ACTH concentration OR Absence of an ultrasound examination characteristic of a glucocorticoid-secreting adrenal tumor using ALIVE methodology OR Characteristic changes of pituitary morphology on CT or MRI using ALIVE methodology.

Adrenal-dependent hypercortisolism criteria	Accepted ways to fulfill criteria
Identification of a set of clinical features attributable to Cushing's syndrome including.	Supportive history, physical examination findings and clinicopathologic test results
Demonstration of an excess of cortisol through dynamic testing of pituitary-adrenal function	Dexamethasone suppression test based on blood OR Dexamethasone suppression test based on urine OR ACTH stimulation test
ACTH-independence confirmed by at least 1 clear differentiation test result	Suppressed endogenous ACTH concentration OR Ultrasound examination characteristic of a glucocorticoid-secreting adrenal tumor using ALIVE criteria OR Characteristic changes of adrenal morphology on ultrasound, CT or MRI using ALIVE methodology.

Note: Fulfillment of all 3 criteria is required for diagnosis.

2.2 Prediction tool

The published Cushing's Prediction Tool (Table 2) was used as described in the aforementioned study.¹⁴ Information regarding clinical signs and laboratory results within 1 week before and 1 week after the point of first assessment at the internal medicine service was used for scoring purposes. If a specific clinical sign was not mentioned in the medical record, it was considered to be absent.

TABLE 2. Prediction tool to calculate the likelihood of a dog having Cushing's syndrome.

	Category	Points
Dog demography
Neuter status	Female-entire	0
	Female-neutered	−1
	Male-entire	−1
	Male-neutered	−1
Current age (years)	<7	0
Current age (years)	≥7	1
Breed	Bichon frise	2
	Border terrier	1
	Labrador retriever	−3
	Schnauzer	−2
	West Highland white terrier	−3
	Other breed or crossbreed	0
Presenting clinical signs
Polydipsia	Yes	2
Polydipsia	No	0
Vomiting	Yes	−2
Vomiting	No	0
Potbelly/hepatomegaly	Yes	3
Potbelly/hepatomegaly	No	0
Alopecia	Yes	2
Alopecia	No	0
Pruritus	Yes	−2
Pruritus	No	0
Laboratory factors
Urine specific gravity	Dilute (≤1.020)	0
	Not dilute (>1.020)	−2
	Not recorded	−1
Serum ALP	Elevated	0
	Not elevated	−3
	Not recorded	0

Note: To calculate the predicted likelihood of an individual dog having Cushing's syndrome, one has to add together the points that correspond to the category for each predictor and match the final score to the predicted likelihood as published by Schofield et al.¹⁴ This way, a likelihood between 0% (score −13) and 96% (score 10) can be predicted.
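The scoring in Table 2 amounts to a sum of lookups. The Python sketch below transcribes the point values from Table 2; the function name, argument names, and category keys are hypothetical choices made for this illustration. The conversion from total score to predicted likelihood is published by Schofield et al.¹⁴ and is not reproduced here.

```python
# Point values transcribed from Table 2; key names are illustrative.
NEUTER_POINTS = {"female-entire": 0, "female-neutered": -1,
                 "male-entire": -1, "male-neutered": -1}
BREED_POINTS = {"bichon frise": 2, "border terrier": 1,
                "labrador retriever": -3, "schnauzer": -2,
                "west highland white terrier": -3}  # any other breed: 0
SIGN_POINTS = {"polydipsia": 2, "vomiting": -2, "potbelly_hepatomegaly": 3,
               "alopecia": 2, "pruritus": -2}       # absent sign: 0
USG_POINTS = {"dilute": 0, "not dilute": -2, "not recorded": -1}
ALP_POINTS = {"elevated": 0, "not elevated": -3, "not recorded": 0}


def prediction_score(neuter, age_years, breed, signs, usg, alp):
    """Return the total score for one dog (range -13 to 10)."""
    score = NEUTER_POINTS[neuter]
    score += 1 if age_years >= 7 else 0
    score += BREED_POINTS.get(breed.lower(), 0)
    # Signs not listed for the dog are treated as absent (0 points).
    score += sum(points for sign, points in SIGN_POINTS.items()
                 if sign in signs)
    score += USG_POINTS[usg] + ALP_POINTS[alp]
    return score


highest = prediction_score("female-entire", 9, "Bichon frise",
                           {"polydipsia", "potbelly_hepatomegaly", "alopecia"},
                           "dilute", "elevated")
lowest = prediction_score("male-neutered", 5, "Labrador retriever",
                          {"vomiting", "pruritus"},
                          "not dilute", "not elevated")
print(highest, lowest)  # 10 -13
```

The two example dogs reproduce the extremes of the score range stated in the note to Table 2.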

2.3 Additional laboratory factors and comorbidities

In addition to the baseline characteristics used as predictors in the prediction tool, the presence or absence of lymphopenia, hypercholesterolemia, persistent proteinuria using urine protein-to-creatinine ratio, urinary tract infection using culture, uroliths using imaging, vacuolar hepatopathy using cytology or histopathology, diabetes mellitus, systemic hypertension, thromboembolic disease (considered absent if not reported), and gallbladder sludge or mucocele using ultrasound were noted for every case.

2.4 Statistical analysis

Statistical analysis was performed using R (R version 4.2.1, and RStudio v2022.02.3+492; packages mosaic, car, Hmisc, Epi, PredictABEL, and fmsb). Baseline characteristics were expressed as n and % for categorical variables. Subgroup differences in baseline characteristics were examined using Wilcoxon's rank sum test for non-normally distributed numeric variables and Fisher's exact test for categorical variables.

The first step in the external validation of the diagnostic prediction tool was to subjectively qualify the level of relatedness between the case mix of the development and validation sample (model transportability). This is an important step, as it helps to differentiate between reproducibility and transportability. Reproducibility refers to a model's capacity to produce accurate predictions in a new sample that is very similar to the development population, whereas transportability refers to the ability to produce accurate predictions in a different population. To assess model transportability, one would ideally see a low to moderate degree of relatedness. If the development and validation sample appear to be (almost) identical, the external validation study could actually reflect the model's reproducibility.^{26, 27} Using the summary measures percentages, median and range, the distribution of the dog characteristics was compared. This included the predictors in the validated model and outcome occurrence. Furthermore, the extent to which the development and validation samples share common predictor effects was evaluated by refitting the original logistic regression model in the validation sample. The estimated regression coefficients and corresponding SEs were compared to evaluate the heterogeneity in the predictor-outcome associations.

The second step of external validation involved examining the calibration and discrimination to assess the model's performance in the new validation sample. Calibration measures the agreement between observed and predicted outcomes. Calibration-in-the-large was given as the intercept term a from the recalibration model logit(y) = a + b × logit(ŷ), in which the logit(y) is the natural logarithm of the observed odds of being diagnosed with Cushing's syndrome and logit(ŷ) the natural logarithm of the predicted odds of being diagnosed with Cushing's syndrome. Ideally, the intercept term a should equal 0. If a < 0, this indicates the model overestimates the odds, whereas a > 0 indicates underestimation. The calibration slope was estimated as b from the same recalibration model. Ideally, the calibration slope b equals 1. If 0 < b < 1, this often indicates predictions vary too much (ie, too low with low predicted probability, too high with high predicted probability), whereas b > 1 implies the opposite. Discrimination was evaluated calculating the c-statistic with 95% confidence intervals (CI), corresponding to the area under the receiver operating characteristic curve (AUROC) for the outcome diagnosis Cushing's syndrome. This reflects whether individual dogs with Cushing's syndrome receive a higher predicted probability than those without. For c-statistic, 0.5-0.7 was interpreted as poor performance, 0.7-0.8 as acceptable performance, 0.8-0.9 as excellent performance, and >0.9 as outstanding performance.²⁸ Significance was set at P < .05 for all analyses.
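To make the two performance measures concrete, the following Python sketch fits the recalibration model logit(y) = a + b × logit(ŷ) by Newton-Raphson and computes the c-statistic by pairwise comparison. It is not the authors' code (the study used R packages); the function names and the toy data are assumptions. The toy predictions are deliberately shifted upward by 1 on the logit scale, so that the fit should recover a close to −1 (overestimation) and b close to 1.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def expit(z):
    return 1 / (1 + math.exp(-z))


def fit_recalibration(y, p_hat, iterations=25):
    """Maximum-likelihood fit of logit(P(y = 1)) = a + b * logit(p_hat)."""
    x = [logit(p) for p in p_hat]
    a, b = 0.0, 1.0
    for _ in range(iterations):
        mu = [expit(a + b * xi) for xi in x]
        w = [m * (1 - m) for m in mu]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - m for yi, m in zip(y, mu))
        g1 = sum((yi - m) * xi for yi, m, xi in zip(y, mu, x))
        # Information matrix entries.
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: inverse information matrix times score vector.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def c_statistic(y, p_hat):
    """Proportion of case/non-case pairs ranked correctly (ties count 0.5)."""
    cases = [p for yi, p in zip(y, p_hat) if yi == 1]
    noncases = [p for yi, p in zip(y, p_hat) if yi == 0]
    total = sum(1.0 if c > n else 0.5 if c == n else 0.0
                for c in cases for n in noncases)
    return total / (len(cases) * len(noncases))


# Toy data: 3 groups of 10 dogs with observed proportions 0.2, 0.5, and 0.8,
# scored by a model whose predicted log-odds are 1 unit too high.
y, p_hat = [], []
for observed, n_cases in [(0.2, 2), (0.5, 5), (0.8, 8)]:
    y += [1] * n_cases + [0] * (10 - n_cases)
    p_hat += [expit(logit(observed) + 1)] * 10

a, b = fit_recalibration(y, p_hat)
auc = c_statistic(y, p_hat)
print(f"a = {a:.2f}, b = {b:.2f}, c-statistic = {auc:.2f}")
```

In this toy example the discrimination is unaffected by the upward shift, which illustrates why a model can discriminate well and still be poorly calibrated.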

3 RESULTS

3.1 Descriptive statistics

In total, 242 medical records were identified. Twelve dogs were excluded because they could not be classified as case or non-case using the ALIVE criteria. Of 230 dogs, 26 (11.3%) were female-entire, 107 (46.5%) female-neutered, 38 (16.5%) male-entire, and 59 (25.7%) male-neutered. Median age was 10.2 years and there was no significant difference between cases and non-cases (10.5 years [range 5.0-15.7] and 10.1 years [range 2.1-16.0], respectively; P = .30). The most commonly observed breeds in the study sample included crossbreed (n = 44 [19.1%]), Jack Russell terrier (n = 15 [6.5%]), Beagle (n = 11 [4.8%]), Dachshund (n = 10 [4.3%]), Cairn terrier (n = 9 [3.9%]), Maltese (n = 9 [3.9%]), French bulldog (n = 8 [3.5%]), Shih Tzu (n = 7 [3.0%]), Labrador retriever (n = 6 [2.6%]), Poodle (n = 5 [2.2%]), and Yorkshire terrier (n = 5 [2.2%]).

Eighty dogs (34.8%) were defined as cases with Cushing's syndrome using ALIVE criteria. In 65/80 dogs (81.3%) the disease was considered pituitary-dependent, in 12/80 (15%) adrenal-dependent, and in 3/80 (3.8%) sub-type could not be specified. Final diagnosis was available for 125/150 non-cases (83.3%; Table 3). Baseline characteristics are presented in Table 4, stratified for cases and non-cases.

TABLE 3. Final diagnosis recorded in the medical records for non-cases (n = 150).

Disease category	Non-cases (%)	Final diagnosis
Cardiorespiratory	6 (4.0)	Brachycephalic obstructive syndrome (1), chronic bronchitis (1), dilated cardiomyopathy (1), myxomatous mitral valve disease (1), pulmonic stenosis (1), tracheal collapse (1)
Dermatological	2 (1.3)	Atopic dermatitis (2)
Endocrine	23 (15.3)	Central diabetes insipidus (6), diabetes mellitus (12), idiopathic hypercalcemia (1), pheochromocytoma (1), primary hyperparathyroidism (1), primary hypothyroidism (2)
Gastrointestinal	20 (13.3)	Acute gastroenteritis (1), chronic enteropathy (18), periodontal disease (1)
Hepatobiliary	4 (2.7)	Cholelithiasis (2), hepatic amyloidosis (1), reactive hepatitis (1)
Miscellaneous	14 (9.3)	Iatrogenic Cushing's (1), overfeeding (5), postprandial hyperlipemia (1), side effect phenobarbital (2), transient polydipsia (3), underfeeding (2)
Neoplastic	10 (6.7)	Apocrine gland anal sac adenocarcinoma (2), brain tumor (1), hepatic mass (1), mammary gland tumor (1), multicentric lymphoma (1), pulmonary mass (1), splenic haemangioma (1), splenic haemangiosarcoma (1), urinary bladder transitional cell carcinoma (1)
Neurological	12 (8.0)	Cerebrovascular event (1), psychogenic polydipsia (10), vestibular geriatric syndrome (1)
Ocular	5 (3.3)	Sudden acquired retinal degeneration syndrome (5)
Orthopedic	1 (0.7)	Osteoarthritis (1)
Renal	18 (12.0)	Chronic kidney disease (10), focal and segmental glomerulosclerosis (1), primary nephrogenic diabetes insipidus (1), protein losing nephropathy (1), pyelonephritis (3), renal glucosuria (2)
Urogenital	10 (6.7)	Urinary bladder polyp (1), silent heat (1), urethral sphincter mechanism incompetence (3), urinary tract infection (4), urolithiasis (1)
No final diagnosis	25 (16.7)	Adrenal mass (4), suspected alopecia X (1), suspected chronic enteropathy (6), suspected chronic lymphatic leukemia (1), suspected Cushing's, lost on follow-up (1), suspected intracranial disease (2), suspected dermatological disorder (1), psychogenic polydipsia vs primary nephrogenic diabetes insipidus (9)

TABLE 4. Baseline characteristics and Fisher's exact association stratified for cases and non-cases.

	Category	Cases (%)	Non-cases (%)	P value
Dog demography
Neuter status	Female-entire	9 (11.3)	17 (11.3)	.01*
	Female-neutered	29 (36.3)	78 (52.0)
	Male-entire	22 (27.5)	16 (10.7)
	Male-neutered	20 (25.0)	39 (26.0)
Current age (years)	<7	7 (8.8)	25 (16.7)	.21
	7-11	39 (48.8)	60 (40.0)
	≥11	34 (42.5)	65 (43.3)
Breed	Beagle	6 (7.5)	5 (3.3)	.54
	Bichon frise	1 (1.3)	0 (0.0)
	Border terrier	0 (0.0)	2 (1.3)
	Cairn terrier	3 (3.8)	6 (4.0)
	Crossbreed	16 (20.0)	28 (18.7)
	Dachshund	2 (2.5)	8 (5.3)
	French bulldog	6 (7.5)	2 (1.3)
	Jack Russell terrier	7 (8.8)	8 (5.3)
	Labrador retriever	1 (1.3)	5 (3.3)
	Maltese	1 (1.3)	8 (5.3)
	Other purebreed	31 (38.8)	63 (42.0)
	Poodle	2 (2.5)	3 (2.0)
	Schnauzer	0 (0.0)	1 (0.7)
	Shih Tzu	3 (3.8)	4 (2.7)
	West Highland white terrier	0 (0.0)	3 (2.0)
	Yorkshire terrier	1 (1.3)	4 (2.7)
Presenting clinical signs
Polydipsia	Yes	75 (93.8)	124 (82.7)	.03*
Polydipsia	No	5 (6.3)	26 (17.3)
Vomiting	Yes	3 (3.8)	20 (13.3)	.02*
Vomiting	No	77 (96.3)	130 (86.7)
Potbelly/hepatomegaly	Yes	57 (71.3)	64 (42.7)	<.001*
Potbelly/hepatomegaly	No	23 (28.8)	86 (57.3)
Alopecia	Yes	44 (55.0)	27 (18.0)	<.001*
Alopecia	No	36 (45.0)	123 (82.0)
Pruritus	Yes	2 (2.5)	9 (6.0)	.34
Pruritus	No	78 (97.5)	141 (94.0)
Laboratory factors
Urine specific gravity	Dilute (≤1.020)	37 (46.3)	54 (36.0)	.07
	Not dilute (>1.020)	4 (5.0)	21 (14.0)
	Not recorded	39 (48.8)	75 (50.0)
Serum ALP	Elevated	64 (80.0)	66 (44.0)	<.001*
	Not elevated	7 (8.8)	66 (44.0)
	Not recorded	9 (11.3)	18 (12.0)
Lymphocyte count	Decreased	13 (16.3)	14 (9.3)	.02*
	Not decreased	26 (32.5)	76 (50.7)
	Not recorded	41 (51.3)	60 (40.0)
Serum cholesterol	Elevated	8 (10.0)	7 (4.7)	.02*
	Not elevated	9 (11.3)	38 (25.3)
	Not recorded	63 (78.8)	105 (70.0)
Urine protein-to-creatinine ratio	Persistent proteinuria	6 (7.5)	11 (7.3)	.31
	No persistent proteinuria	5 (6.3)	17 (11.3)
	Not recorded	69 (86.3)	122 (81.3)
Urinary tract infection	Yes	7 (8.8)	11 (7.3)	.09
	No	11 (13.8)	45 (30.0)
	Not recorded	62 (77.5)	94 (62.7)
Urolithiasis	Yes	1 (1.3)	6 (4.0)	.23
	No	70 (87.5)	127 (84.7)
	Not recorded	9 (11.3)	17 (11.3)
Vacuolar hepatopathy	Yes	32 (40.0)	42 (28.0)	.14
	No	4 (5.0)	12 (8.0)
	Not recorded	44 (55.0)	96 (64.0)
Diabetes mellitus	Yes	0 (0.0)	12 (8.0)	.007*
	No	71 (88.8)	130 (86.7)
	Not recorded	9 (11.3)	8 (5.3)
Thromboembolism	Yes	2 (2.5)	1 (0.7)	.28
Thromboembolism	No	78 (97.5)	149 (99.3)
Gall bladder content	Abnormal	10 (12.5)	6 (4.0)	.02*
	Normal	60 (75.0)	122 (81.3)
	Not recorded	10 (12.5)	22 (14.7)

* P value <.05 implies a significant difference in baseline characteristics of cases and non-cases.

Sex distribution in cases was significantly different from that in non-cases (P = .01): 29/80 (36.3%) cases were female-neutered vs 78/150 (52.0%) non-cases, whereas 22/80 (27.5%) cases were entire males vs 16/150 (10.7%) non-cases. No statistically significant differences in age, breed, and pruritus were found between cases and non-cases. Polydipsia, potbelly, and alopecia were more commonly present in cases compared to non-cases (93.8% vs 82.7%, P = .03; 71.3% vs 42.7%, P < .001; 55% vs 18.0%, P < .001, respectively). Vomiting was reported less frequently in cases than non-cases (3.8% vs 13.3%, P = .02). Serum ALP was more commonly elevated in dogs diagnosed with Cushing's syndrome than in those without (80.0% vs 44.0%, P < .001).
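As an illustration of the Fisher's exact test used for these comparisons, the sketch below recomputes the two-sided P value for vomiting from the counts in Table 4 (3 of 80 cases vs 20 of 150 non-cases). The function name is hypothetical, and the two-sided definition used (summing all tables that are no more probable than the observed one) is an assumption about the convention applied; it is the convention used by R's `fisher.test`.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def table_probability(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    p_observed = table_probability(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(table_probability(x) for x in range(lowest, highest + 1)
               if table_probability(x) <= p_observed * (1 + 1e-9))


# Rows: cases, non-cases. Columns: vomiting yes, vomiting no (Table 4).
p = fisher_exact_two_sided(3, 77, 20, 130)
print(f"P = {p:.3f}")  # close to the .02 reported in Table 4
```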

Descriptive statistics of additional laboratory variables and comorbidities are presented in Table 4. Systemic blood pressure was excluded from analysis as it was recorded rarely (0/80 cases, 6/150 non-cases). Lymphopenia was noted more often in cases (13/39 [33.3%] vs 14/90 [15.6%], P = .02). Hypercholesterolemia was noted more frequently in cases than non-cases (8/17 [47.1%] vs 7/45 [15.6%], P = .02). Diabetes mellitus was observed less in cases than non-cases (0/71 [0%] vs 12/142 [8.5%], P = .007). None of the non-cases diagnosed with diabetes mellitus had evidence of insulin resistance. Finally, abnormal gall bladder content was reported more frequently in cases than non-cases (10/70 [14.3%] vs 6/128 [4.7%], P = .02). Abnormal gall bladder content was reported as sludge in 7 cases and 5 non-cases, and as mucocele in 3 cases and 1 non-case.

3.2 Model transportability

Several significant differences were found in the baseline characteristics of the development data set and the external validation data set (Table 5). Refitting the originally reported model in the current validation data set resulted in the regression coefficients and SEs presented in Table 6. The regression coefficients and standard errors of the original model are shown as a comparison. From the predictors of the original model, the regression coefficients of male-entire, vomiting, alopecia, USG not dilute, USG not recorded and serum ALP not elevated demonstrated a P < .05. The P value of the regression coefficient of polydipsia and potbelly proved <.10. All other regression coefficients demonstrated a P > .10.

TABLE 5. Comparison of baseline characteristics of development data set (primary-care practice; n = 939) and external validation set (secondary-care practice; n = 230) and Fisher's exact association.

	Category	Development	External validation	P value
Incidence Cushing's syndrome (%)		42.4	34.8	.04*
Dog demography
Neuter status (%)	Female-entire	10.3	11.3	.63
	Female-neutered	41.5	46.5	.18
	Male-entire	12.1	16.5	.08
	Male-neutered	36.0	25.7	.003*
Current age (%)	<7	13.2	13.9	.04*
	7-11	43.6	43.0	.94
	≥11	43.2	43.0	1.0
Breed (%)	Bichon frise	6.0	0.4	<.001*
	Border terrier	3.6	0.9	.03*
	Crossbreed	21.7	19.1	.42
	Jack Russell terrier	8.3	6.5	.42
	Labrador retriever	4.8	2.6	.21
	Other purebreed	36.0	66.1	<.001*
	Schnauzer	3.2	0.4	.02*
	Staffordshire bull terrier	5.9	0.4	<.001*
	West Highland white terrier	6.3	1.3	.002*
	Yorkshire terrier	4.3	2.2	.18
Presenting clinical signs
Polydipsia (%)	Yes	57.5	86.5	<.001*
Polydipsia (%)	No	42.5	13.5
Vomiting (%)	Yes	8.3	10.0	.43
Vomiting (%)	No	91.7	90.0
Potbelly/hepatomegaly (%)	Yes	33.3	52.6	<.001*
Potbelly/hepatomegaly (%)	No	66.7	47.4
Alopecia (%)	Yes	21.2	30.9	.002*
Alopecia (%)	No	78.8	69.1
Pruritus (%)	Yes	6.4	4.8	.44
Pruritus (%)	No	93.6	95.2
Laboratory factors
Urine specific gravity (%)	Dilute (≤1.020)	24.2	39.6	<.001*
	Not dilute (>1.020)	16.0	10.9	.06
	Not recorded	59.8	49.6	.006*
Serum ALP (%)	Elevated	50.5	56.5	.11
	Not elevated	7.3	31.7	<.001*
	Not recorded	42.2	11.7	<.001*

* P value <.05 implies a significant difference in baseline characteristics of development data set and external validation set.

TABLE 6. Estimated regression coefficients and corresponding standard errors (SE) for Cushing's syndrome prediction model in development data set (primary-care practice; cases, n = 398; non-cases, n = 541) and external validation data set (secondary-care practice; cases, n = 80; non-cases, n = 150).

Predictor	Category	r_dev	SE_dev	r_val	SE_val	P value
Constant (model intercept)		−0.49	0.38	−1.31	1.18	.27
Neuter status	Female-entire	Baseline		Baseline
	Female-neutered	−0.64	0.27	−0.20	0.61	.75
	Male-entire	−0.34	0.32	1.44	0.72	.04*
	Male-neutered	−0.60	0.27	0.04	0.65	.95
Current age (years)	<7	Baseline		Baseline
	7 to <11	0.64	0.26	0.47	0.69	.50
	≥11	0.58	0.27	0.05	0.71	.95
Breed	Crossbreed	Baseline		Baseline
	Bichon frise	0.68	0.34	15.81	3956.18	1.0
	Border terrier	0.61	0.44	−19.45	2315.47	.99
	Jack Russell terrier	0.11	0.30	0.81	0.93	.38
	Labrador retriever	−1.37	0.49	−0.97	1.34	.47
	Other purebred	−0.04	0.20	−0.12	0.50	.81
	Schnauzer	−1.03	0.53	−14.67	3956.18	1.0
	Staffordshire bull terrier	0.05	0.35	−11.97	3956.18	1.0
	West Highland white terrier	−1.18	0.37	−16.54	1835.12	.99
	Yorkshire terrier	0.09	0.40	−1.15	1.47	.43
Polydipsia	Yes	0.87	0.16	1.16	0.65	.07
Polydipsia	No	Baseline		Baseline
Vomiting	Yes	−0.76	0.31	−2.16	0.90	.02*
Vomiting	No	Baseline		Baseline
Potbelly/hepatomegaly	Yes	1.11	0.17	0.74	0.41	.07
Potbelly/hepatomegaly	No	Baseline		Baseline
Alopecia	Yes	0.94	0.20	2.49	0.48	<.001*
Alopecia	No	Baseline		Baseline
Pruritus	Yes	−0.88	0.35	−1.85	1.46	.20
Pruritus	No	Baseline		Baseline
Urine specific gravity	Dilute (≤1.020)	Baseline		Baseline
	Not dilute (>1.020)	−0.85	0.26	−2.02	0.78	.01*
	Not recorded	−0.43	0.20	−1.61	0.48	<.001*
Serum ALP	Elevated	Baseline		Baseline
	Not elevated	−1.46	0.35	−2.96	0.61	<.001*
	Not recorded	−0.16	0.16	−0.85	0.62	.17

* P value <.05 implies a significant predictor-outcome association after refitting the original logistic regression model in the external validation data set.
Abbreviations: r_dev, estimated regression coefficient in development data set; r_val, estimated regression coefficient in external validation data set; SE_dev, standard error in development data set; SE_val, SE in external validation data set.

3.3 Model performance

The clinical prediction model systematically overestimated the probability of having Cushing's syndrome (a = −1.10 [95% CI: −1.49 to −0.71], P < .001). The calibration slope was 1.35 (95% CI: 0.97 to 1.73, P < .001; Figure 1). The Hosmer-Lemeshow test was significant (P < .001), indicating that the observed probabilities (expressed as the natural logarithm of the odds) differed significantly from the predicted probabilities.

FIGURE 1. Calibration plot for the outcome diagnosis of Cushing's syndrome. The plot shows the mean observed proportions of dogs with a diagnosis of Cushing's compared to the mean predicted probabilities, by deciles of predictions. The 45° line denotes perfect calibration.

The discrimination of the model in the current dataset was excellent (c-statistic = 0.83; Figure 2). When setting the threshold of the prediction tool at 2 for the total prediction score (ie, dogs with prediction tool end score ≥2 are predicted as cases and <2 are predicted as non-cases), sensitivity was 91% and specificity 59%. This resulted in a negative predictive value (NPV) of 92% and positive predictive value (PPV) of 54% for diagnosis of Cushing's syndrome in this group of referred Dutch dogs. Decreasing the threshold for the total prediction score to ≥0 increased the NPV to 99% (sensitivity 99%, specificity 41%, PPV 47%).
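The link between sensitivity, specificity, and the predictive values can be checked with Bayes' rule. The sketch below uses the reported sensitivity and specificity and the observed proportion of cases in this sample (80/230); the function name is an assumption. Because predictive values depend on prevalence, the figures apply to this referral sample only.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at the given prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


prevalence = 80 / 230  # proportion of cases in the validation sample

ppv_2, npv_2 = predictive_values(0.91, 0.59, prevalence)  # threshold score >= 2
ppv_0, npv_0 = predictive_values(0.99, 0.41, prevalence)  # threshold score >= 0
print(f"score >= 2: PPV {ppv_2:.2f}, NPV {npv_2:.2f}")  # PPV 0.54, NPV 0.92
print(f"score >= 0: PPV {ppv_0:.2f}, NPV {npv_0:.2f}")  # PPV 0.47, NPV 0.99
```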

4 DISCUSSION

This study showed several significant differences in dog characteristics between the UK primary-care development sample and the Dutch secondary-care external validation sample. Moreover, comparison of the estimated regression coefficients and corresponding standard errors revealed substantial heterogeneity in the predictor-outcome associations. These findings imply moderate transportability of the model: the Dutch dogs differ from those in the original UK study, which supports the view that the current study assessed external validity rather than reproducibility.

There are several explanations for the moderate transportability of the model (ie, different case mix). Geographical differences could have resulted in the finding that breeds used as predictors in the prediction model were uncommon in our study cohort. One explanation for this could be a difference in breed-associated risk between the 2 countries, as breeding practices might lead to significant genetic differences and disease predisposition.^{29, 30} However, it might also be explained by different breed popularities in the United Kingdom and the Netherlands.

Calibration of the model detected a discrepancy between the predicted and observed odds of being diagnosed with Cushing's syndrome, with the model overestimating the probability of having Cushing's syndrome in the Dutch dogs. Some underfitting of the model was detected, implying less variation in the predictive chance of being diagnosed with Cushing's syndrome (high predictions were too low, whereas low predictions were too high). On the other hand, discriminating ability of the model (ie, can the model distinguish cases from non-cases) was found to be excellent given an AUROC of 0.83. This was fairly similar to the AUROC of the prediction tool in its development study sample. Overall, this implies that the model showed good performance in this group of dogs used for external validation.

That the current study tested the model's performance in dogs that were presented to a referral hospital instead of a primary-care practice could explain why the dogs diagnosed with Cushing's syndrome in the present study had higher frequency of clinical signs associated with the condition. This could indicate that dogs diagnosed with Cushing's syndrome in a secondary-care practice show a more prominent clinical picture compared to those diagnosed in primary-care practice. Another explanation could be that the attending internists were more likely to recognize or report these clinical signs. Additionally, the dogs with and without Cushing's syndrome were more similar to each other in the current study than those in the UK primary-care caseload. Dogs without an overly clear clinical picture of Cushing's syndrome but still with some of the suggestive clinical signs and laboratory variables might be referred more often. In addition, veterinarians specialized in internal medicine within secondary-care practice might be more familiar with and/or confident in assessing the clinical picture of Cushing's syndrome. Indeed, polydipsia, potbelly/hepatomegaly, and alopecia were noted more often in non-cases of the external validation group than non-cases of the development group. Such differences could explain why the model overestimates the probability of having Cushing's syndrome in the current cohort's non-cases.

In the current study, only a few of the original model's predictors were found to be significant in the external validation sample (male-entire, vomiting, alopecia, not diluted or not recorded USG, and not elevated serum ALP). This explains the substantial heterogeneity in the predictor-outcome associations. A possible explanation for this is that the other predictors were selected for the original prediction tool because of overfitting. This phenomenon seems to be greatest for the selected breeds. Another factor to consider is the definition of the cases and non-cases in this study vs the original study reporting the development and validation of the Cushing's Prediction Tool. In the current study, internationally agreed ALIVE criteria were used, as recommended by the 2 largest veterinary endocrinology societies, whereas the ALIVE criteria were not literally applied in the original study.

The current study shows that the original tool behaves differently when applied to a sample from a different population. This emphasizes the ongoing need to further externally validate the tool in different populations and settings. Further external validation of the prediction tool would be particularly beneficial within other primary-care populations, other geographical domains, and other clinic settings. In addition, several possibilities exist to improve the model's performance for use within secondary-care practice in the Netherlands without creating a completely new model. As a first possible step, the intercept could be updated for better calibration.²⁶,³¹ Second, to improve the model's discriminating ability, removing predictors from the original model could be considered, since several of the original predictors were not associated with the outcome in our study (eg, breeds). Adding new predictor variables to the original model could be a third step. In the Dutch dogs, lymphopenia, hypercholesterolemia, and abnormal gall bladder content (sludge, mucocele) proved more common in cases than non-cases. These variables were selected because of the evidence in the literature to indicate their discriminatory ability between dogs with and without Cushing's syndrome.³²⁻³⁴ These adjustments were not part of the scope of this study. A new prediction model was not constructed using the Dutch data set, as the number of cases and non-cases was relatively low. This could lead to overfitting, rendering the improved or new model not useful for future predictions.³⁵
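
The first updating step mentioned above, re-estimating only the intercept, can be sketched as follows. This is a hedged illustration and not an update that was performed in this study: the function names and toy data are hypothetical, and the original model's output is assumed to be available as a linear predictor on the log-odds scale. With the slope held fixed at 1, the maximum-likelihood intercept shift is the value that makes the mean predicted probability equal to the observed proportion of cases, so it can be found by simple bisection.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mean_predicted(lp, shift):
    """Mean predicted probability after adding `shift` to every linear predictor."""
    return sum(sigmoid(x + shift) for x in lp) / len(lp)


def recalibrate_intercept(lp, labels, lo=-20.0, hi=20.0, tol=1e-10):
    """Return the intercept shift that aligns mean prediction with observed prevalence.

    The slope is kept at 1 (the original coefficients are left untouched).
    A negative shift means the original model overestimated risk on average.
    """
    target = sum(labels) / len(labels)
    if not 0.0 < target < 1.0:
        raise ValueError("need at least one case and one non-case")
    # mean_predicted is strictly increasing in shift, so bisection applies.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if mean_predicted(lp, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical toy data (NOT study data) in which risk is overestimated.
toy_lp = [0.5, 1.0, 1.5, 2.0, -0.5, 0.0]
toy_y = [0, 0, 1, 1, 0, 0]

shift = recalibrate_intercept(toy_lp, toy_y)
print("intercept shift:", shift)
print("mean prediction after update:", mean_predicted(toy_lp, shift))
```

Because only one parameter is re-estimated, this kind of update needs far fewer cases than refitting the model, which is why it is commonly suggested as the first step when adapting a model to a new setting.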

The retrospective nature of the current study represents an obvious limitation. For instance, laboratory findings were not always available. Although the prediction model was developed to also be used when laboratory findings were not available, it might have shown better performance with more data available. Moreover, clinical signs not mentioned in the medical record were considered to be absent. However, it is possible that they were present but not recorded, leading to incorrect scores. The authors chose to use the internationally agreed ALIVE criteria for diagnosis of PDH and ADH. The veterinary endocrinology communities actively promote the use of ALIVE definitions since they foster uniformity and comparability of data.³⁶ The results of this study should therefore be seen in the light of these specific disease definitions. Results could have differed had other definitions been used, such as those in the original British study. The same applies to the use of UCCR in combination with o-HDDST for the diagnosis and differentiation of Cushing's syndrome. This test is popular in the Netherlands, yet not commonly used internationally.³⁷ A drawback of the prediction model itself is that it uses predictors that are also part of the reference standard for diagnosis (ie, the opinion of the attending veterinarian, based on a combination of medical history, clinical signs, physical examination, routine laboratory investigations, endocrine tests, and diagnostic imaging). This is known as incorporation bias, and it can lead to overestimation of the diagnostic accuracy.³⁸ Finally, an important limitation to note is the sample size. Ideally, a large-scale dataset is used for validation purposes to prevent imprecise predictive performance estimates. To minimize this effect, statistical methods were used to account for potential overestimation of model performance because of model fitting on the same dataset. This correction ensures a more accurate estimation of the model's predictive ability and helps mitigate any potential bias introduced by the relatively low numbers of cases.

In conclusion, this external validation study showed moderate transportability of a Cushing's syndrome prediction model developed in the UK when applied to a group of dogs presented to a secondary-care practice in the Netherlands. The model had excellent discriminatory ability. Overall, the tool overestimated the probability of having Cushing's syndrome. Despite its limitations, the tool could still prove useful in its current form. Using a total prediction score cut-off of 0 (ie, score < 0 predicts the dog does not have Cushing's syndrome), it demonstrated an NPV of 99%, and as such the model could be useful as a screening test early in the diagnostic pathway to rule out Cushing's syndrome as a likely explanation for the dog's clinical signs. Adrenal function tests and differentiating tests could then be pursued if Cushing's syndrome remains a probable diagnosis based on the tool's result, the dog's overall clinical picture, and routine laboratory results. The study emphasizes that ongoing validation of this and other disease prediction tools is worthwhile when considering their use in populations with differing characteristics, such as in different countries or practice types.
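
The rule-out use of the cut-off described above can be illustrated with a short sketch of how an NPV is obtained from total prediction scores and final diagnoses. The function name and the toy data are hypothetical and do not reproduce the study data or the reported NPV; only the rule that a score below the cut-off predicts a non-case is taken from the text.

```python
def npv_at_cutoff(scores, labels, cutoff=0):
    """Negative predictive value when score < cutoff predicts 'no Cushing's syndrome'.

    NPV = true negatives / (true negatives + false negatives).
    """
    true_neg = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
    false_neg = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 1)
    if true_neg + false_neg == 0:
        raise ValueError("no dogs scored below the cut-off")
    return true_neg / (true_neg + false_neg)


# Hypothetical toy data (NOT study data): total scores and final diagnoses.
toy_scores = [-3, -2, -1, -1, 0, 1, 2, 4]
toy_labels = [0, 0, 0, 1, 0, 1, 1, 1]

# Four dogs score below 0; three are non-cases and one is a case.
print("NPV at cut-off 0:", npv_at_cutoff(toy_scores, toy_labels))
```

An NPV depends on the proportion of cases in the population tested, so a value estimated in a referral population will not necessarily carry over to a setting with a different prevalence.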

ACKNOWLEDGMENT

No funding was received for this study.

CONFLICT OF INTEREST DECLARATION

Authors declare no conflict of interest.

OFF-LABEL ANTIMICROBIAL DECLARATION

Authors declare no off-label use of antimicrobials.

INSTITUTIONAL ANIMAL CARE AND USE COMMITTEE (IACUC) OR OTHER APPROVAL DECLARATION

Authors declare no IACUC or other approval was needed.

HUMAN ETHICS APPROVAL DECLARATION

Authors declare human ethics approval was not needed for this study.

REFERENCES

1 European Society of Veterinary Endocrinology. ALIVE Project Definition Cushing's syndrome. 2021.
2 Carotenuto G, Malerba E, Dolfini C, et al. Cushing's syndrome – an epidemiological study based on a canine population of 21,281 dogs. Open Vet J. 2019; 9: 27-32. doi:10.4314/ovj.v9i1.5
3 Schofield I, Brodbelt DC, Niessen SJM, Church DB, Geddes RF, O'Neill DG. Frequency and risk factors for naturally occurring Cushing's syndrome in dogs attending UK primary-care practices. J Small Anim Pract. 2022; 63: 265-274. doi:10.1111/jsap.13450
4 O'Neill DG, Scudder C, Faire JM, et al. Epidemiology of hyperadrenocorticism among 210,824 dogs attending primary-care veterinary practices in the UK from 2009 to 2014. J Small Anim Pract. 2016; 57: 365-373. doi:10.1111/jsap.12523
5 Bennaim M, Shiel RE, Mooney CT. Diagnosis of spontaneous hyperadrenocorticism in dogs. Part 1: pathophysiology, aetiology, clinical and clinicopathological features. Vet J. 2019; 252: 105342. doi:10.1016/j.tvjl.2019.105342
6 Behrend EN, Kooistra HS, Nelson R, Reusch CE, Scott-Moncrieff JC. Diagnosis of spontaneous canine hyperadrenocorticism: 2012 ACVIM Consensus Statement (small animal). J Vet Intern Med. 2013; 27: 1292-1304. doi:10.1111/jvim.12192
7 Bennaim M, Shiel RE, Mooney CT. Diagnosis of spontaneous hyperadrenocorticism in dogs. Part 2: adrenal function testing and differentiating tests. Vet J. 2019; 252: 105343. doi:10.1016/j.tvjl.2019.105343
8 Bennaim M, Shiel RE, Forde C, Mooney CT. Evaluation of individual low-dose dexamethasone suppression test patterns in naturally occurring hyperadrenocorticism in dogs. J Vet Intern Med. 2018; 32: 967-977. doi:10.1111/jvim.15079
9 Rijnberk A, van Wees A, Mol JA. Assessment of two tests for the diagnosis of canine hyperadrenocorticism. Vet Rec. 1988; 122: 178-180. doi:10.1136/vr.122.8.178
10 Kaplan AJ, Peterson ME, Kemppainen RJ. Effects of disease on the results of diagnostic tests for use in detecting hyperadrenocorticism in dogs. J Am Vet Med Assoc. 1995; 207: 445-451. doi:10.2460/javma.1995.207.04.0445
11 Smiley LE, Peterson ME. Evaluation of a urine cortisol: creatinine ratio as a screening test for hyperadrenocorticism in dogs. J Vet Intern Med. 1993; 7: 163-168. doi:10.1111/j.1939-1676.1993.tb03181.x
12 Monroe WE, Panciera DL, Zimmerman KL. Concentrations of noncortisol adrenal steroids in response to ACTH in dogs with adrenal-dependent hyperadrenocorticism, pituitary-dependent hyperadrenocorticism, and nonadrenal illness. J Vet Intern Med. 2012; 26: 945-952. doi:10.1111/j.1939-1676.2012.00959.x
13 Schofield I, Brodbelt DC, Kennedy N, et al. Machine-learning based prediction of Cushing's syndrome in dogs attending UK primary-care veterinary practice. Sci Rep. 2021; 11: 9035. doi:10.1038/s41598-021-88440-z
14 Schofield I, Brodbelt DC, Niessen SJM, et al. Development and internal validation of a prediction tool to aid the diagnosis of Cushing's syndrome in dogs attending primary-care practice. J Vet Intern Med. 2020; 34: 2306-2318. doi:10.1111/jvim.15851
15 Li Y, Sperrin M, Belmonte M, Pate A, Ashcroft DM, van Staa TP. Do population-level risk prediction models that use routinely collected health data reliably predict individual risks? Sci Rep. 2019; 9: 11222. doi:10.1038/s41598-019-47712-5
16 de Jong VMT, Moons KGM, Eijkemans MJC, et al. Developing more generalizable prediction models from pooled studies and large clustered data sets. Stat Med. 2021; 40: 3533-3559. doi:10.1002/sim.8981
17 Toll DB, Janssen KJM, Vergouwe Y, Moons KGM. Validation, updating and impact of clinical prediction rules: a review. J Clin Epidemiol. 2008; 61: 1085-1094. doi:10.1016/j.jclinepi.2008.04.008
18 Austin PC, van Klaveren D, Vergouwe Y, et al. Validation of prediction models: examining temporal and geographic stability of baseline risk and estimated covariate effects. Diagn Progn Res. 2017; 1: 1-8. doi:10.1186/s41512-017-0012-3
19 Salive ME. Referral bias in tertiary care: the utility of clinical epidemiology. Mayo Clin Proc. 1994; 69: 808-809. doi:10.1016/S0025-6196(12)61105-7
20 Bartlett PC, van Buren JW, Neterer M, Zhou C. Disease surveillance and referral bias in the veterinary medical database. Prev Vet Med. 2010; 94: 264-271. doi:10.1016/j.prevetmed.2010.01.007
21 Collonnaz M, Erpelding ML, Alla F, et al. Impact of referral bias on prognostic studies outcomes: insights from a population-based cohort study on infective endocarditis. Ann Epidemiol. 2021; 54: 29-37. doi:10.1016/j.annepidem.2020.09.008
22 Galac S, Kooistra HS, Teske E, Rijnberk A. Urinary corticoid/creatinine ratios in the differentiation between pituitary-dependent hyperadrenocorticism and hyperadrenocorticism due to adrenocortical tumour in the dog. Vet Q. 1997; 19: 17-20. doi:10.1080/01652176.1997.9694731
23 Stolp R, Rijnberk A, Meijer JC, Croughs RJM. Urinary corticoids in the diagnosis of canine hyperadrenocorticism. Res Vet Sci. 1983; 34: 141-144. doi:10.1016/S0034-5288(18)32248-3
24 European Society of Veterinary Endocrinology. ALIVE Project Definition Adrenal-dependent Hypercortisolism (ADH). 2021.
25 European Society of Veterinary Endocrinology. ALIVE Project Definition Pituitary-dependent Hypercortisolism (PDH). 2021.
26 Debray TPA, Vergouwe Y, Koffijberg H, Nieboer D, Steyerberg EW, Moons KGM. A new framework to enhance the interpretation of external validation studies of clinical prediction models. J Clin Epidemiol. 2015; 68: 279-289. doi:10.1016/j.jclinepi.2014.06.018
27 Justice AC, Covinsky KE, Berlin JA. Assessing the generalizability of prognostic information. Ann Intern Med. 1999; 130: 515-524. doi:10.7326/0003-4819-130-6-199903160-00016
28 Hosmer DW, Lemeshow S, Sturdivant RX. Applied Logistic Regression. 3rd ed. Hoboken, New Jersey: Wiley Series in Probability and Statistics; 1989: 528.
29 Comazzi S, Marelli S, Cozzi M, et al. Breed-associated risks for developing canine lymphoma differ among countries: an European canine lymphoma network study. BMC Vet Res. 2018; 14: 232. doi:10.1186/s12917-018-1557-2
30 Lampi S, Donner J, Anderson H, Pohjoismäki J. Variation in breeding practices and geographic isolation drive subpopulation differentiation, contributing to the loss of genetic diversity within dog breed lineages. Canine Med Genet. 2020; 7: 5. doi:10.1186/s40575-020-00085-9
31 Janssen KJM, Vergouwe Y, Kalkman CJ, Grobbee DE, Moons KGM. A simple method to adjust clinical prediction models to local circumstances. Can J Anaesth. 2009; 56: 194-201. doi:10.1007/s12630-009-9041-x
32 Reusch CE, Feldman EC. Canine hyperadrenocorticism due to adrenocortical neoplasia. Pretreatment evaluation of 41 dogs. J Vet Intern Med. 1991; 5: 3-10. doi:10.1111/j.1939-1676.1991.tb00922.x
33 Ling GV, Stabenfeldt GH, Comer KM, Gribble DH, Schechter RD. Canine hyperadrenocorticism: pretreatment clinical and laboratory evaluation of 117 cases. J Am Vet Med Assoc. 1979; 174: 1211-1215.
34 Hoffman JM, Lourenço BN, Promislow DEL, Creevy KE. Canine hyperadrenocorticism associations with signalment, selected comorbidities and mortality within North American veterinary teaching hospitals. J Small Anim Pract. 2018; 59(11): 681-690. doi:10.1111/jsap.12904
35 Martin GP, Riley RD, Collins GS, Sperrin M. Developing clinical prediction models when adhering to minimum sample size recommendations: the importance of quantifying bootstrap variability in tuning parameters and predictive performance. Stat Methods Med Res. 2021; 30: 2545-2561. doi:10.1177/09622802211046388
36 Niessen SJM, Bjornvad C, Church DB, et al. Agreeing Language in Veterinary Endocrinology (ALIVE): diabetes mellitus – a modified Delphi-method-based system to create consensus disease definitions. Vet J. 2022; 289: 105910. doi:10.1016/j.tvjl.2022.105910
37 Behrend EN, Kemppainen RJ, Clark TP, Salman MD, Peterson ME. Diagnosis of hyperadrenocorticism in dogs: a survey of internists and dermatologists. J Am Vet Med Assoc. 2002; 220: 1643-1649. doi:10.2460/javma.2002.220.1643
38 Worster A, Carpenter C. Incorporation bias in studies of diagnostic tests: how to avoid being biased about bias. Canadian J Emerg Med. 2008; 10: 174-175. doi:10.1017/S1481803500009891

Volume 37, Issue 6

November/December 2023

Pages 2052-2063

External validation of a United Kingdom primary-care Cushing's prediction tool in a population of referred Dutch dogs