Validated algorithms for identifying timing of second event of oropharyngeal squamous cell carcinoma using real-world data
Shahreen Khair MA
Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
Joseph C. Dort MD, MSc
Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
Department of Surgery, Cumming School of Medicine, University of Calgary, North Tower, Foothills Medical Centre, Calgary, Alberta, Canada
May Lynn Quan MD, MSc
Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
Department of Surgery, Cumming School of Medicine, University of Calgary, North Tower, Foothills Medical Centre, Calgary, Alberta, Canada
Department of Oncology, Cumming School of Medicine, University of Calgary, Tom Baker, Cancer Centre, Calgary, Alberta, Canada
Winson Y. Cheung MD, MPH
Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
Department of Surgery, Cumming School of Medicine, University of Calgary, North Tower, Foothills Medical Centre, Calgary, Alberta, Canada
Khara M. Sauro PhD
Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
Department of Surgery, Cumming School of Medicine, University of Calgary, North Tower, Foothills Medical Centre, Calgary, Alberta, Canada
Department of Oncology, Cumming School of Medicine, University of Calgary, Tom Baker, Cancer Centre, Calgary, Alberta, Canada
Steven C. Nakoneshny BSc
The Ohlson Research Initiative, Arnie Charbonneau Cancer Institute, University of Calgary, Calgary, Alberta, Canada
Brittany Lynn Popowich BHSc
Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Teaching Research and Wellness (TRW), Calgary, Alberta, Canada
Ping Liu PhD
Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
Guosong Wu PhD
Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Teaching Research and Wellness (TRW), Calgary, Alberta, Canada
Corresponding Author
Yuan Xu MD PhD
Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
Department of Surgery, Cumming School of Medicine, University of Calgary, North Tower, Foothills Medical Centre, Calgary, Alberta, Canada
Department of Oncology, Cumming School of Medicine, University of Calgary, Tom Baker, Cancer Centre, Calgary, Alberta, Canada
Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Teaching Research and Wellness (TRW), Calgary, Alberta, Canada
Correspondence
Yuan Xu, Cumming School of Medicine, University of Calgary, 3280 Hospital Drive NW, Calgary, Alberta T2N4Z6, Canada.
Email: [email protected]
Abstract
Background
Understanding the occurrence and timing of second events (recurrence and second primary cancer) is essential for cancer-specific survival analysis. However, this information is not readily available in administrative data.
Methods
Data from the Alberta Cancer Registry, physician claims, and other administrative sources were used. The timing of the second event was estimated using an algorithm we developed. For validation, the difference in days between the algorithm-estimated and the chart-reviewed timing of the second event was calculated. Further, the results of Cox regression modeling of cancer-free survival based on the algorithm-estimated data were compared with those based on chart review data.
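The day-level comparison described above can be illustrated with a minimal sketch. It assumes the difference is taken as an absolute value and that agreement is summarized as the share of patients falling within a fixed window. The function names, the 60-day default, and the four example date pairs are hypothetical and are not taken from the study data.

```python
from datetime import date


def days_difference(algorithm_date, chart_date):
    """Absolute difference, in days, between the two second-event dates."""
    return abs((algorithm_date - chart_date).days)


def share_within_window(pairs, window_days=60):
    """Fraction of patients whose two dates differ by at most window_days."""
    diffs = [days_difference(a, c) for a, c in pairs]
    return sum(d <= window_days for d in diffs) / len(diffs)


# Hypothetical (algorithm-estimated, chart-reviewed) dates for four patients.
pairs = [
    (date(2015, 3, 1), date(2015, 3, 20)),     # 19 days apart
    (date(2016, 7, 10), date(2016, 7, 10)),    # 0 days apart
    (date(2014, 1, 5), date(2014, 5, 1)),      # 116 days apart
    (date(2017, 11, 30), date(2017, 10, 15)),  # 46 days apart
]

print(share_within_window(pairs))  # 0.75
```

Three of the four hypothetical patients fall inside the 60-day window, so the sketch reports 0.75. A signed difference could be used instead if the direction of the discrepancy matters.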
Results
For the majority of patients (74.3%), the difference between the chart-reviewed and algorithm-estimated timing of the second event fell within the 0–60 day window. Kaplan–Meier curves generated from the estimated data and the chart review data were comparable, with 5-year second-event-free survival rates of 75.4% versus 72.5%.
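For readers unfamiliar with how a second-event-free survival rate is read off a Kaplan–Meier curve, the following is a minimal sketch of the standard product-limit estimator in plain Python. It is not the authors' code. The function names, the time unit (years), and the six-patient example are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, survival) at each event time.

    times  -- follow-up time per patient (hypothetically, in years)
    events -- 1 if a second event was observed, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_leaving
    return curve


def survival_at(curve, t):
    """Step-function value of the curve at time t."""
    s = 1.0
    for time, surv in curve:
        if time > t:
            break
        s = surv
    return s


# Hypothetical follow-up for six patients.
times = [1, 2, 3, 4, 5, 6]
events = [1, 0, 1, 0, 0, 1]
curve = kaplan_meier(times, events)
print(round(survival_at(curve, 5), 3))  # 0.625
```

Running the same estimator once on algorithm-estimated event times and once on chart-reviewed event times, then reading both curves at 5 years, yields the kind of side-by-side comparison reported above.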
Conclusion
The algorithm provided an estimated timing of second event similar to that of the chart review.
CONFLICT OF INTEREST
The authors declare that there is no conflict of interest that could be perceived as prejudicing the impartiality of the research reported.
Open Research
DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.