Volume 98, Issue 7 pp. 1052-1057
RESEARCH ARTICLE

External validation of a novel electronic risk score for cancer-associated thrombosis in a comprehensive cancer center

Ang Li (Corresponding Author)
Section of Hematology-Oncology, Baylor College of Medicine, Houston, Texas, USA
Correspondence: Ang Li, Baylor College of Medicine, One Baylor Plaza, 011DF, Houston, TX 77030, USA.

Giordana De Las Pozas
Department of Tumor Registry, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA

Clark R. Andersen
Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA

Chijioke C. Nze
Hematology/Oncology Fellowship Program, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA

Katy M. Toale
Division of Pharmacy, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA

Emily M. Milner
School of Medicine, Baylor College of Medicine, Houston, Texas, USA

Nathanael R. Fillmore
Massachusetts Veterans Epidemiology Research and Information Center, VA Boston Healthcare System, Boston, Massachusetts, USA
Section of Hematology & Medical Oncology, Boston University School of Medicine, Boston, Massachusetts, USA
Department of Medicine, Harvard Medical School, Boston, Massachusetts, USA
Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts, USA

Elizabeth Yu Chiao
Department of Epidemiology, Division of Cancer Prevention and Population Sciences, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA

Cristhiam Rojas Hernandez
Section of Benign Hematology, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA

Michael H. Kroll
Section of Benign Hematology, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA

Kelly W. Merriman
Department of Tumor Registry, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA

Christopher R. Flowers
Department of Lymphoma-Myeloma, Division of Cancer Medicine, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA

First published: 17 April 2023

Kelly W Merriman and Christopher R Flowers contributed equally as senior authors.

Abstract

Venous thromboembolism (VTE) is a significant complication for cancer patients undergoing systemic therapy. We performed an independent external validation of a recently derived and validated novel electronic health record (EHR) VTE risk score in a comprehensive cancer center. Adult patients with incident cancer diagnoses were identified from the MD Anderson Cancer Center Tumor Registry between 1/2017 and 1/2021. Baseline covariates extracted at the time of first-line systemic therapy included demographics, cancer site/histology, stage, treatment, complete blood count, body mass index, recent prolonged hospitalization, and history of VTE or paralysis. VTE was ascertained using an institution-specific natural language processing radiology algorithm (positive predictive value of 94.8%). Among 21 142 cancer patients, the median follow-up was 8.1 months, and 1067 VTE events (5.7%) occurred within 6 months after the start of systemic therapy. The numbers of patients with novel scores of 0-, 1, 2, 3, 4, and 5+ were 5661, 3558, 3462, 3489, 2918, and 2054, with corresponding 6-month VTE incidences of 1.3%, 3.1%, 5.4%, 7.3%, 9.3%, and 13.8%, respectively (c statistic 0.71 [95% CI 0.69–0.72], with excellent calibration). In comparison, the Khorana score had a c statistic of 0.64 [95% CI 0.62–0.65]. The two risk scores had 80% concordance; the novel score reclassified 20% of patients relative to the Khorana score (3530 low-to-high with 9.0% VTE; 734 high-to-low with 3.4% VTE) and captured an absolute 25% more of all VTE events in the high-risk group. In conclusion, the novel score demonstrated consistent discrimination and calibration across cohorts with heterogeneous demographics. It could become a new standard for selecting high-risk populations for clinical trials and VTE monitoring.

1 INTRODUCTION

Venous thromboembolism (VTE) is a significant complication for cancer patients undergoing systemic therapy.1, 2 The American Society of Hematology and American Society of Clinical Oncology guidelines recommend pharmacologic thromboprophylaxis for selected ambulatory cancer patients at intermediate-to-high risk for VTE, identified with a validated risk assessment tool (i.e., the Khorana score) complemented by clinical judgment and experience.3, 4 To apply this approach automatically across all cancer types and modern therapies, we recently derived and validated a novel electronic health record (EHR) risk score using 9769 patients from the Harris Health System (HHS) and 79 517 patients from the Veterans Affairs healthcare system (VA).5 However, HHS is a safety-net system serving mostly underserved and uninsured patients, whereas the VA population consists predominantly of male veterans. Therefore, we sought to perform an independent external validation at a large National Cancer Institute (NCI)-designated comprehensive cancer center to assess generalizability and accuracy across populations.

2 METHODS

We performed a retrospective cohort study at MD Anderson Cancer Center (MDACC). Adult patients with incident cancer diagnoses from 1/2017 to 1/2021 were identified from the MDACC Tumor Registry and merged with encounter, medication, laboratory, and claims data from the MDACC data warehouse. Patients were excluded if they did not receive systemic anti-cancer therapy within 1 year of diagnosis, had isolated encounters without follow-up, had a diagnosis of acute VTE within the preceding 6 months, received anticoagulation within 1 month before therapy, or had missing body mass index (BMI) or complete blood count (CBC) values, including white blood cell count (WBC), hemoglobin (Hb), and platelet count (Plt). Patients were followed from initiation of first-line systemic therapy until the first outcome event, death, loss to follow-up (a >90-day gap without a clinical encounter), or administrative censoring in 12/2022.
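The follow-up rules above can be written as a small function that returns the end date and the reason follow-up ended for one patient. This is a minimal sketch, not the study's code: the helper name `end_of_follow_up`, its inputs, the choice to end follow-up at the last encounter before the first >90-day gap, and the exact cutoff day (31 December 2022, since the text gives only 12/2022) are all assumptions.

```python
from datetime import date, timedelta
from typing import Optional

GAP = timedelta(days=90)       # >90 days without an encounter = loss to follow-up
CUTOFF = date(2022, 12, 31)    # assumed day for the 12/2022 censoring date


def end_of_follow_up(index: date,
                     encounters: list[date],
                     vte: Optional[date] = None,
                     death: Optional[date] = None,
                     cutoff: date = CUTOFF) -> tuple[date, str]:
    """Return (end date, reason) for one patient; hypothetical helper."""
    # Observation points from the index date to the cutoff; the cutoff is
    # appended so that a long silence before it also counts as a gap.
    visits = sorted({index, *(e for e in encounters if index <= e <= cutoff)})
    visits.append(cutoff)
    lost = None
    for prev, nxt in zip(visits, visits[1:]):
        if nxt - prev > GAP:
            lost = prev        # censor at the last encounter before the gap
            break
    # Candidates in priority order; min() keeps the first of any tied dates.
    candidates = []
    if vte is not None:
        candidates.append((vte, "VTE"))
    if death is not None:
        candidates.append((death, "death"))
    if lost is not None:
        candidates.append((lost, "lost to follow-up"))
    candidates.append((cutoff, "censored at cutoff"))
    return min(candidates, key=lambda c: c[0])
```

Listing the candidates as VTE, death, loss, then cutoff means that a VTE recorded on the same day as another end point is still counted as an event.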

VTE was defined as radiologically confirmed subsegmental or greater pulmonary embolism (PE), proximal or distal lower extremity deep vein thrombosis (LE-DVT), or proximal or distal upper extremity DVT (UE-DVT).6 The 6-month VTE rate was chosen as the primary outcome to mirror recent clinical trials.7, 8 VTE events were ascertained from the MDACC VTE registry, which uses an institution-specific natural language processing (NLP) radiology algorithm. The positive predictive value (PPV) was 94.8% based on a selective chart review of 500 patients. The most common reason for misclassification was radiology reports stating that “a thrombus could not be excluded”.
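To make the PPV figure and the reported failure mode concrete, the toy example below flags reports first with a naive keyword rule and then with a rule that also screens for hedged or negated wording, and computes PPV (true positives divided by all flagged reports) against chart-review labels. The regular expressions, example reports, and labels are hypothetical; the institutional NLP algorithm is not described in this article and is not reproduced here.

```python
import re

# Hypothetical patterns for illustration only; not the MDACC algorithm.
VTE_TERMS = re.compile(r"\b(thrombus|thrombosis|embolus|embolism)\b", re.IGNORECASE)
HEDGED = re.compile(r"could not be excluded|cannot be excluded|no evidence of", re.IGNORECASE)


def naive_flag(report: str) -> bool:
    """Flag any report that mentions a VTE term."""
    return VTE_TERMS.search(report) is not None


def hedge_aware_flag(report: str) -> bool:
    """Flag VTE terms only when no hedged or negated phrase is present."""
    return naive_flag(report) and HEDGED.search(report) is None


def ppv(flags: list[bool], truth: list[bool]) -> float:
    """Positive predictive value: true positives / all flagged reports."""
    tp = sum(f and t for f, t in zip(flags, truth))
    fp = sum(f and not t for f, t in zip(flags, truth))
    return tp / (tp + fp) if tp + fp else float("nan")


reports = [
    "Acute pulmonary embolism in the right lower lobe.",
    "Occlusive thrombus in the left popliteal vein.",
    "Small filling defect; a thrombus could not be excluded.",
    "No evidence of deep vein thrombosis.",
]
truth = [True, True, False, False]  # hypothetical chart-review labels

print(ppv([naive_flag(r) for r in reports], truth))        # naive rule
print(ppv([hedge_aware_flag(r) for r in reports], truth))  # hedge-aware rule
```

On these four reports the naive rule has a PPV of 0.5 and the hedge-aware rule 1.0. A crude exclusion list like this trades one error for another (a report confirming one clot while hedging about a second would be missed), which is why validation against chart review matters.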

Baseline covariates derived from the MDACC Tumor Registry included age, sex, race, ethnicity, insurance, ICD-O-3 cancer type/histology, and cancer stage. Cancer type was recorded using the WHO 2008 cancer classifications for common solid tumors and the rare cancer classifications for sarcoma, neuroendocrine cancers, and hematologic malignancies.9, 10 Cancer stage (I, II, III, IV, not applicable or missing) was defined using the American Joint Committee on Cancer (AJCC) definitions for most cancers. For cancers without an appropriate AJCC classification (e.g., brain cancer, leukemia, myeloma), SEER summary stage was used, with “distant site” defined as stage IV. Baseline covariates derived from the EHR included first-line systemic therapy, WBC, Hb, Plt, BMI, recent hospitalization >3 days, and lifetime history of VTE or paralysis. Systemic therapy was classified as chemotherapy, immune checkpoint inhibitor (ICI), targeted therapy, or endocrine therapy (see Table S1 for a detailed breakdown by drug name). BMI, WBC, Hb, and Plt were defined using the closest values on or before the date of systemic therapy initiation within a 12-month lookback window. Recent prolonged hospitalization (>3 days) was defined as a qualifying event within a 3-month lookback window. History of VTE and paralysis was defined using all available ICD codes on or before the index date, using a lifetime lookback window (see Table S2). Complete case analysis was performed without imputation.
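The lookback rules can be sketched as two small helpers: one picks the most recent laboratory or BMI value within 12 months on or before the index date, the other checks for a hospital stay longer than 3 days in the preceding 3 months. The function names, the conversion of 12 and 3 months to 365 and 90 days, and the choice to anchor the hospitalization window on the discharge date are assumptions made for illustration.

```python
from datetime import date, timedelta
from typing import Optional


def closest_prior_value(index: date,
                        measurements: list[tuple[date, float]],
                        lookback_days: int = 365) -> Optional[float]:
    """Most recent value on or before `index` within the lookback window."""
    start = index - timedelta(days=lookback_days)
    eligible = [(d, v) for d, v in measurements if start <= d <= index]
    if not eligible:
        return None  # missing: patient drops out under complete case analysis
    return max(eligible, key=lambda dv: dv[0])[1]


def recent_prolonged_stay(index: date,
                          stays: list[tuple[date, date]],
                          lookback_days: int = 90,
                          min_days: int = 3) -> bool:
    """True if a stay longer than `min_days` ended within the lookback window."""
    start = index - timedelta(days=lookback_days)
    return any((discharge - admit).days > min_days and start <= discharge <= index
               for admit, discharge in stays)
```

Returning `None` rather than a default value keeps missingness explicit, which matches a complete case analysis without imputation.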

Data harmonization and variable derivation following the previous publication5 were performed independently by the current study coauthors. Differential points were assigned to cancer type, stage, therapy, WBC, Hb, Plt, BMI, recent hospitalization, VTE history, paralysis history, and race. The final additive score was consolidated into categories from 0- (0 or fewer points) to 5+ (5 or more points), with 3+ points defined as high risk. Model discrimination was assessed with the bootstrapped time-dependent c statistic, and model fit with calibration curves based on the cumulative incidence function at 6 months.11, 12 Predictive performance was compared with that of the Khorana score. The original integer score model was used without any model update or recalibration. Statistical analyses were performed using R 4.2.2 (Vienna, Austria) and Stata/SE 16.1 (College Station, TX). The study was approved by the Institutional Review Board at MDACC.
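A minimal sketch of the additive scoring and its consolidation into the reported categories is shown below. The point values are read from the Li et al. column of Table 1, but only an illustrative subset of cancer types is encoded; the function and argument names are hypothetical, and the laboratory units (WBC and platelets in ×10⁹/L, hemoglobin in g/dL, BMI in kg/m²) are assumed rather than stated in the text.

```python
# Illustrative subset of Li et al. point values as read from Table 1.
CANCER_POINTS = {"colorectal": 1, "lung": 2, "multiple_myeloma": 2,
                 "esophageal_gastric": 3, "pancreas": 3}   # unlisted sites: 0
STAGE_POINTS = {"III": 1, "IV": 1}
THERAPY_POINTS = {"targeted": -1, "endocrine": -1}          # chemotherapy/ICI: 0


def li_score(cancer: str, stage: str, therapy: str,
             wbc: float, hb: float, plt: float, bmi: float,
             api_aian: bool, hx_vte: bool, hx_paralysis: bool,
             recent_hosp: bool) -> int:
    """Sum the integer points for one patient (hypothetical helper)."""
    points = (CANCER_POINTS.get(cancer, 0) + STAGE_POINTS.get(stage, 0)
              + THERAPY_POINTS.get(therapy, 0))
    points += int(wbc > 11) + int(hb < 10) + int(plt >= 350) + int(bmi >= 35)
    points -= int(api_aian)  # Non-Hispanic Asian Pacific Islander / AI / AN: -1
    points += int(hx_vte) + int(hx_paralysis) + int(recent_hosp)
    return points


def score_category(points: int) -> str:
    """Collapse raw points into the reported 0-, 1, 2, 3, 4, 5+ categories."""
    if points <= 0:
        return "0-"
    if points >= 5:
        return "5+"
    return str(points)


def is_high_risk(points: int) -> bool:
    """High risk is defined as 3 or more points."""
    return points >= 3
```

Keeping the raw sum and collapsing it only for reporting shows why the lowest category is labeled 0-: the negative points for targeted or endocrine therapy and for race can push the sum below zero.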

3 RESULTS

From 2017 to 2021, 36 542 patients with incident cancer diagnoses receiving upfront treatment at MDACC (analytic cases) were identified in the Tumor Registry. Patients were excluded if they did not receive systemic anti-cancer therapy within 1 year after diagnosis (n = 12 818), had inadequate follow-up with isolated consultation encounters only (n = 467), received anticoagulation within 30 days before systemic therapy (n = 696), had a diagnosis of acute VTE within the past 6 months (n = 584), had missing BMI or CBC values (n = 534), or had an ineligible age (n = 301). The final analytic cohort comprised 21 142 patients with an incident cancer diagnosis who received systemic therapy within 1 year of diagnosis, with the index date set at initiation of initial therapy (Figure 1).
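The exclusion counts above can be checked by sequential subtraction. This short sketch assumes, as the totals imply, that each patient is counted under only one exclusion reason; the labels are paraphrased from the text.

```python
# Sequential exclusions from the tumor registry cohort (counts from the text).
identified = 36_542
exclusions = [
    ("no systemic therapy within 1 year", 12_818),
    ("isolated consultation encounters only", 467),
    ("anticoagulation within 30 days before therapy", 696),
    ("acute VTE within the past 6 months", 584),
    ("missing BMI or CBC", 534),
    ("ineligible age", 301),
]

remaining = identified
for reason, n in exclusions:
    remaining -= n
    print(f"{reason:<48} -{n:>6,} -> {remaining:>6,}")

print(f"final analytic cohort: {remaining:,}")
```

The final line prints 21,142, matching the analytic cohort in Figure 1.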

FIGURE 1. Patient cohort selection and outcome overview. This diagram provides an overview of the patient selection and exclusion for the current external validation study along with outcome ascertainment.

Among included patients, the median age was 60 years, 53% were female, and 88.9% had private or Medicare insurance. By race/ethnicity, 66.2% were Non-Hispanic White, 9.6% Non-Hispanic Black, 14.3% Hispanic, 6.0% Non-Hispanic Asian Pacific Islander, and 4.0% Other/Unknown. The most common cancer types were breast (23.7%), lung (8.5%), colorectal (7.2%), and prostate (6.3%). The majority of patients had stage III–IV disease at diagnosis (60.8%). The median continuous follow-up for VTE assessment was 8.1 months (IQR 4.1–14.9). The NLP-based MDACC VTE registry captured 2008 new VTE events (932 PE, 542 LE-DVT, 534 UE-DVT), of which 1731 occurred within continuous follow-up and 1067 (5.7%) occurred within the first 6 months after initial therapy.

The point assignments of the risk scores are shown in Table 1, and the stratified outcomes are shown in Table 2. The numbers of patients with novel scores of 0-, 1, 2, 3, 4, and 5+ were 5661 (26.8%), 3558 (16.8%), 3462 (16.4%), 3489 (16.5%), 2918 (13.8%), and 2054 (9.7%), and the corresponding 6-month VTE incidences were 1.3%, 3.1%, 5.4%, 7.3%, 9.3%, and 13.8%, respectively (c statistic 0.71 [95% CI 0.69–0.72], Figure 2A). Observed VTE incidence in the external validation cohort showed excellent calibration against the predicted incidence at each score level (Figure 2B). There was no systematic deviation in model discrimination across age, sex, and race/ethnicity subgroups (c statistics 0.68–0.72). When PE/LE-DVT was used as the outcome, the score performed similarly well, with a c statistic of 0.72 (95% CI 0.69–0.73).
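The 6-month incidences by score category come from cumulative incidence curves that account for competing risks (Figure 2A). A minimal Aalen-Johansen sketch of that estimator is shown below; the event coding (0 censored, 1 VTE, 2 death as the competing event) and the function name are assumptions, and this is a textbook implementation rather than the software used in the study.

```python
import numpy as np


def cumulative_incidence(time, event, t, cause=1):
    """Aalen-Johansen cumulative incidence of `cause` by time `t`.

    Assumed event codes: 0 = censored, 1 = VTE, 2 = competing event (death).
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    surv = 1.0  # probability of being free of any event just before each time
    cif = 0.0
    for u in np.unique(time[(event > 0) & (time <= t)]):  # sorted event times
        at_risk = np.sum(time >= u)  # censorings tied at u stay in the risk set
        d_cause = np.sum((time == u) & (event == cause))
        d_any = np.sum((time == u) & (event > 0))
        cif += surv * d_cause / at_risk
        surv *= 1.0 - d_any / at_risk
    return float(cif)


# Toy data in months: two VTEs, one death, two censored.
print(cumulative_incidence([1, 2, 3, 4, 5], [1, 2, 1, 0, 0], t=6))
```

Applied within each score category at t = 6 months, this yields stratified incidences of the kind reported above. Unlike a crude event count divided by group size, the estimator accounts for censoring and does not treat death as censoring.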

TABLE 1. Comparison of baseline predictor distribution in Khorana score versus novel risk score in 21 142 cancer patients receiving systemic therapy.

| Characteristics | Number (%) | Khorana et al.13 | Li et al.5 |
|---|---|---|---|
| Cancer type | | | |
| Other cancer | 10 452 (49.4%) | | |
| Colorectal & intestinal | 1526 (7.2%) | | +1 |
| Soft tissue sarcoma | 588 (2.8%) | | +2 |
| Brain | 513 (2.4%) | | +2 |
| Lung | 1797 (8.5%) | +1 | +2 |
| Kidney | 390 (1.8%) | +1 | +2 |
| Bladder | 362 (1.7%) | +1 | +2 |
| Testicular | 109 (0.5%) | +1 | +2 |
| Uterine | 349 (1.7%) | +1 | +2 |
| Ovarian | 319 (1.5%) | +1 | +2 |
| Cervical | 238 (1.1%) | +1 | |
| Other gynecologic | 143 (0.7%) | +1 | |
| Precursor B/T lymphoma | 54 (0.3%) | +1 | +2 |
| T & Natural Killer lymphoma | 113 (0.5%) | +1 | +2 |
| Diffuse large B cell lymphoma | 608 (2.9%) | +1 | +2 |
| Follicular lymphoma | 273 (1.3%) | +1 | |
| Other mature B cell lymphoma | 92 (0.4%) | +1 | |
| Hodgkin lymphoma | 286 (1.4%) | +1 | |
| Other lymphoma | 319 (1.5%) | +1 | |
| Multiple myeloma | 515 (2.4%) | | +2 |
| Acute lymphocytic leukemia | 186 (0.9%) | | +2 |
| Esophageal & gastric | 828 (3.9%) | +2 | +3 |
| Pancreas | 764 (3.6%) | +2 | +3 |
| Cholangiocarcinoma & gallbladder | 318 (1.5%) | | +3 |
| Cancer stage | | | |
| I | 4274 (20.2%) | | |
| II | 3501 (16.6%) | | |
| III | 4400 (20.8%) | | +1 |
| IV | 8453 (40.0%) | | +1 |
| Not applicable or missing | 514 (2.4%) | | |
| Systemic therapy | | | |
| Chemotherapy +/− others | 14 168 (67.0%) | | |
| Immune checkpoint inhibitor (ICI) +/− others | 1277 (6.0%) | | |
| Chemotherapy + ICI +/− others | 765 (3.6%) | | |
| Targeted +/− endocrine | 1819 (8.6%) | | −1 |
| Endocrine only | 3113 (14.7%) | | −1 |
| Complete blood count | | | |
| White blood cell >11 | 2890 (13.7%) | +1 | +1 |
| Hemoglobin <10 | 2204 (10.4%) | +1 | +1 |
| Platelet ≥350 | 3313 (15.7%) | +1 | +1 |
| Other predictors | | | |
| Body mass index (BMI) ≥35 | 3270 (15.5%) | +1 | +1 |
| Non-Hispanic Asian Pacific Islander, American Indian, Alaskan Native | 1262 (6.0%) | | −1 |
| History of VTE (lifetime) | 343 (1.6%) | | +1 |
| History of paralysis (lifetime) | 460 (2.2%) | | +1 |
| Hospitalization >3 days within last 90 days | 3555 (16.8%) | | +1 |

TABLE 2. Comparison of VTE incidence and performance of Khorana score versus novel risk score.

| Points | Number (%) | VTE at 6 months | Risk group | VTE at 6 months (group) | C statistic |
|---|---|---|---|---|---|
| Khorana et al.13 | | | | | |
| 0 | 8665 (41.0%) | 230 (3.1%) | Low (0–1) | 594 (4.4%) | 0.64 [95% CI 0.62–0.65] |
| 1 | 6812 (32.2%) | 364 (6.0%) | | | |
| 2 | 3935 (18.6%) | 291 (8.2%) | High (2+) | 473 (9.2%) | |
| 3+ | 1730 (8.2%) | 182 (11.4%) | | | |
| Li et al.5 | | | | | |
| 0- | 5661 (26.8%) | 59 (1.3%) | Low (2-) | 325 (2.6%) | 0.71 [95% CI 0.69–0.72] |
| 1 | 3558 (16.8%) | 99 (3.1%) | | | |
| 2 | 3462 (16.4%) | 167 (5.4%) | | | |
| 3 | 3489 (16.5%) | 232 (7.3%) | High (3+) | 742 (8.8%) | |
| 4 | 2918 (13.8%) | 250 (9.3%) | | | |
| 5+ | 2054 (9.7%) | 260 (13.8%) | | | |

Note: "0-" and "2-" denote 0 or fewer and 2 or fewer points; "2+", "3+", and "5+" denote that many points or more. The final point assignment for both the Khorana score and Li et al.5 was derived by adding or subtracting the individual points from Table 1. The cutoff thresholds for risk group assignment for the Khorana score (2+) and Li et al. (3+) were based on previously published studies. C statistics were estimated from time-dependent ROC analysis at 6 months, and 95% confidence intervals were estimated from 500 bootstrap resamples.
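The table note describes time-dependent ROC analysis at 6 months with 500 bootstrap resamples. The sketch below shows the simplest unweighted version of a cumulative/dynamic AUC at time t: cases are patients with VTE by t, controls are patients still event-free and under observation after t, and tied scores count as half-concordant. It ignores patients censored or dying before t, whereas established estimators (for example, the inverse-probability-of-censoring-weighted estimators in the R package timeROC) account for them, so it is an approximation and not the study's method. The function names and the percentile bootstrap are assumptions.

```python
import numpy as np


def auc_at(time, event, score, t):
    """Unweighted cumulative/dynamic AUC at time t (tied scores count 0.5)."""
    time, event, score = (np.asarray(a) for a in (time, event, score))
    cases = score[(time <= t) & (event == 1)]
    controls = np.sort(score[time > t])
    if cases.size == 0 or controls.size == 0:
        return float("nan")
    below = np.searchsorted(controls, cases, side="left")         # controls < case
    at_or_below = np.searchsorted(controls, cases, side="right")  # controls <= case
    concordant = below.sum() + 0.5 * (at_or_below - below).sum()
    return float(concordant / (cases.size * controls.size))


def bootstrap_ci(time, event, score, t, n_boot=500, seed=0):
    """Percentile bootstrap 95% CI for auc_at, resampling patients with replacement."""
    rng = np.random.default_rng(seed)
    time, event, score = (np.asarray(a) for a in (time, event, score))
    n = time.size
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        stats.append(auc_at(time[idx], event[idx], score[idx], t))
    lo, hi = np.nanpercentile(stats, [2.5, 97.5])
    return float(lo), float(hi)
```

With integer scores, many case-control pairs share the same score, which is why the half-credit rule for ties matters for a score with only six categories.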
FIGURE 2. Discrimination and calibration of the novel risk score in the external validation cohort. (A) Performance of the novel risk score from Li et al.5 using cumulative incidence competing risk curves. The observed VTE incidences at 6 months (vertical line) are 1.3%, 3.1%, 5.4%, 7.3%, 9.3%, and 13.8% for 0-, 1, 2, 3, 4, and 5+ points, respectively, with a time-dependent C statistic of 0.71 (95% CI 0.69–0.72). (B) Model calibration curve. The calibration curves compare the observed VTE incidence at 6 months from the current validation study with the predicted VTE incidence at 6 months from the initial derivation study. The vertical lines represent confidence intervals from 500 bootstrap resamples.

In comparison, the numbers of patients with Khorana scores of 0, 1, 2, and 3+ were 8665 (41.0%), 6812 (32.2%), 3935 (18.6%), and 1730 (8.2%), and the corresponding 6-month VTE incidences were 3.1%, 6.0%, 8.2%, and 11.4%, respectively (c statistic 0.64 [95% CI 0.62–0.65]). In a sensitivity analysis that excluded patients with brain cancer, myeloma, or leukemia to simulate the original cohort design of the Khorana score, the c statistic remained unchanged.
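For comparison, the Khorana score can be computed the same way. This sketch encodes the Khorana et al. column of Table 1, with simplified site labels that follow the table's grouping (all lymphoma subtypes collapsed into one label); the function names are hypothetical. The original Khorana definition also counts use of red cell growth factors toward the hemoglobin item; Table 1 does not list it, so it is omitted here.

```python
VERY_HIGH_RISK = {"esophageal_gastric", "pancreas"}                     # +2
HIGH_RISK = {"lung", "kidney", "bladder", "testicular", "uterine",
             "ovarian", "cervical", "other_gynecologic", "lymphoma"}    # +1


def khorana_score(site: str, wbc: float, hb: float, plt: float, bmi: float) -> int:
    """Khorana points as tabulated in Table 1 (hypothetical helper)."""
    if site in VERY_HIGH_RISK:
        points = 2
    elif site in HIGH_RISK:
        points = 1
    else:
        points = 0
    points += int(wbc > 11) + int(hb < 10) + int(plt >= 350) + int(bmi >= 35)
    return points


def khorana_group(points: int) -> str:
    """Binary grouping used in this study: 0-1 low, 2 or more high."""
    return "high" if points >= 2 else "low"
```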

When assessed as binary low- versus high-risk groups, the score from Li et al.5 (3+ cutoff) and the Khorana score (2+ cutoff) had 80% concordance (11 947 low-risk with 3.0% VTE; 4931 high-risk with 10.0% VTE). The novel score reclassified 20% of patients relative to the Khorana score (3530 low-to-high with 9.0% VTE; 734 high-to-low with 3.4% VTE). At 6 months, the high-risk group of the novel score captured 742 (69.5%) of the 1067 VTE events, whereas that of the Khorana score captured 473 (44.3%), an absolute increase of 25 percentage points.
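The concordance, reclassification, and capture figures above follow directly from the reported counts. The short check below (variable names are ours) also confirms that the cross-classified counts add up to the high-risk group sizes in Table 2, and it makes explicit that the 25% increment is an absolute difference in the share of all 6-month VTE events captured, not a relative increase.

```python
# 2 x 2 cross-classification of binary risk groups (counts from the text).
both_low, both_high = 11_947, 4_931
low_to_high = 3_530   # Khorana low, novel score high
high_to_low = 734     # Khorana high, novel score low
total = both_low + both_high + low_to_high + high_to_low

# Marginal totals should match the high-risk group sizes in Table 2.
assert both_high + low_to_high == 3_489 + 2_918 + 2_054   # novel score 3+
assert both_high + high_to_low == 3_935 + 1_730           # Khorana 2+

concordance = (both_low + both_high) / total
reclassified = (low_to_high + high_to_low) / total

total_vte = 1_067
novel_capture, khorana_capture = 742 / total_vte, 473 / total_vte

print(f"N = {total:,}; concordance {concordance:.1%}; reclassified {reclassified:.1%}")
print(f"share of VTE in high-risk group: novel {novel_capture:.1%}, "
      f"Khorana {khorana_capture:.1%}, difference {novel_capture - khorana_capture:.1%}")
```

It prints a concordance of 79.8%, 20.2% reclassified, and capture shares of 69.5% versus 44.3% (a 25.2 percentage point difference); the relative increase in captured events would instead be 742/473 − 1, about 57%.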

4 DISCUSSION

Using 21 142 cancer patients from an NCI-designated comprehensive cancer center, we successfully validated the recently derived novel risk score for cancer-associated thrombosis. This risk score has now performed consistently in a combined 110 428 patients from multiple cancer cohorts (c statistic ranging 0.68–0.71),5 with a modest improvement over the Khorana score (in this cohort, its high-risk group captured an absolute 25% more of all VTE events). More importantly, model calibration appears stable despite substantial differences in patient demographics across cohorts. This suggests that this simple score provides sufficient flexibility and granularity to estimate VTE incidence across a broad range of cancer patient populations.

This validation study has certain strengths. Although the study design was conceived by the original authors of the initial Li et al. publication,5 cohort construction, variable extraction, outcome capture, and statistical analysis were performed independently by coauthors from MDACC. This suggests that the risk score is not only discriminative but also implementable across institutions with access to a cancer registry and an EHR database. Furthermore, the VTE outcome was screened and validated independently using an institution-specific NLP radiology algorithm rather than administrative claims codes. However, reliance on institutional radiology reports may underestimate VTE incidence, because VTE diagnosed at outside hospitals would not be captured.

Despite the performance of the novel risk score, we recognize the Khorana score as a viable and important risk stratification tool, especially in healthcare systems without access to integrated EHR data. Although it was initially derived for patients with solid tumors and lymphomas receiving cytotoxic chemotherapy, it remains discriminative in modern cancer populations (c statistics ranging 0.60–0.65 in 3 cohorts).5 The current risk score is best viewed as a modified Khorana score that builds on the same core predictors and complements them with additional EHR-derived predictors reflecting clinical judgment and experience. These additional predictors make intuitive clinical sense and are supported by data from other publications.13-15 Furthermore, more data on bleeding outcomes are needed to support clinical implementation of VTE risk prediction scores in cancer patients.

In summary, we have successfully validated the novel VTE risk score in a comprehensive cancer center whose patient population differs inherently from those of the initial derivation and validation cohorts. We hope that an integrated EHR tool can become a new standard for selecting high-risk populations for clinical trials and VTE monitoring across institutions.

AUTHOR CONTRIBUTIONS

Conception and design: Ang Li, Christopher R Flowers. Collection and assembly of data: Giordana De Las Pozas, Kelly W Merriman, Katy M Toale, Emily M Milner, Elizabeth Yu Chiao, Cristhiam Rojas Hernandez, Michael H Kroll. Data analysis: Clark R Andersen, Chijioke C Nze. Data interpretation: Ang Li, Giordana De Las Pozas, Clark R Andersen, Chijioke C Nze, Katy M Toale, Emily M Milner, Nathanael R Fillmore, Elizabeth Yu Chiao, Cristhiam Rojas Hernandez, Michael H Kroll, Kelly W Merriman, Christopher R Flowers. Manuscript writing: Ang Li. Final approval of manuscript: Ang Li, Giordana De Las Pozas, Clark R Andersen, Chijioke C Nze, Katy M Toale, Emily M Milner, Nathanael R Fillmore, Elizabeth Yu Chiao, Cristhiam Rojas Hernandez, Michael H Kroll, Kelly W Merriman, Christopher R Flowers.

ACKNOWLEDGMENTS

AL, a CPRIT Scholar in Cancer Research, was supported by the Cancer Prevention and Research Institute of Texas (RR190104), the National Heart, Lung, and Blood Institute (K23 HL159271), and the National Institutes of Health AIM-AHEAD (1OT2-OD032581). CRF, a CPRIT Scholar in Cancer Research, was supported by the Cancer Prevention and Research Institute of Texas (RR190079). MHK was supported by the National Institutes of Health (P30 CA016672).

CONFLICT OF INTEREST STATEMENT

The authors declare no competing financial interests.

DATA AVAILABILITY STATEMENT

Data will not be shared online due to institutional policies pertaining to individual patient data. For data access requests, please contact the corresponding author to coordinate a collaborative data use agreement with MDACC.
