RESEARCH ARTICLE

Open Access

Application of interpretable machine learning algorithms to predict distant metastasis in osteosarcoma

Bing-li Bai

Department of Orthopedics Surgery, The Second Affiliated Hospital and Yuying Children's Hospital of Wenzhou Medical University, Wenzhou, China

Zong-yi Wu

Department of Orthopedics Surgery, The Second Affiliated Hospital and Yuying Children's Hospital of Wenzhou Medical University, Wenzhou, China

She-ji Weng

Department of Orthopedics Surgery, The Second Affiliated Hospital and Yuying Children's Hospital of Wenzhou Medical University, Wenzhou, China

Qing Yang (Corresponding Author)

orcid.org/0000-0002-6706-3124

Department of Breast Surgery, The Second Affiliated Hospital and Yuying Children's Hospital of Wenzhou Medical University, Wenzhou, China

Correspondence

Qing Yang, Department of Breast Surgery, The Second Affiliated Hospital and Yuying Children's Hospital of Wenzhou Medical University, College West Road No. 109, Wenzhou, Zhejiang, 325027, China.

Email: [email protected]

First published: 09 September 2022

https://doi.org/10.1002/cam4.5225

Citations: 18

Abstract

Background

Osteosarcoma is well-established as the most common bone cancer in children and adolescents. Patients with localized disease have different prognoses and management than those with metastasis at the time of diagnosis. The purpose of this study was to explore potential risk factors for metastatic disease.

Methods

The Surveillance, Epidemiology, and End Results (SEER) Program database was used to identify patients diagnosed with osteosarcoma between 2004 and 2015. We developed prediction models for distant metastasis using six machine learning (ML) techniques: logistic regression (LR), support vector machine (SVM), Gaussian Naive Bayes (GaussianNB), Extreme Gradient Boosting (XGBoost), random forest (RF), and k-nearest neighbor (kNN). The adaptive synthetic (ADASYN) technique was used to deal with imbalanced data. Shapley Additive Explanation (SHAP) analysis generated visualized explanations for each patient. Finally, average precision (AP), sensitivity, specificity, accuracy, F1 score, precision-recall curves, calibration plots, and decision curve analysis (DCA) were used to evaluate the models' effectiveness.

Results

The six machine learning algorithms achieved AP of 0.661–0.781 for predicting distant metastasis. The RF model yielded the best performance with an accuracy of 71.8 percent and an AP of 0.781 and was highly dependent on tumor size, primary surgery, and age. SHAP analysis provided model-independent interpretation, highlighting significant clinical factors associated with the risk of metastasis in osteosarcoma patients.

Conclusions

An accurate machine learning-based prediction model was established for metastasis in osteosarcoma patients to help clinicians during clinical decision-making.

1 INTRODUCTION

Osteosarcoma is widely acknowledged as the most common type of primary malignant bone tumor that arises from mesenchymal tissue,¹ with an incidence rate of 4 to 5 cases per million people.² It has a bimodal age distribution pattern, occurring predominantly in children and young adults <20 years of age and those >50–60 years of age.^{3, 4} Metastatic disease is reportedly found in approximately 20% of osteosarcoma patients at diagnosis.^{3, 5} Patients with localized osteosarcoma have a significantly higher 5-year survival rate than patients with distant metastasis (70% vs. 20%).⁶ Complete surgical resection combined with adjuvant chemotherapy is the best treatment for osteosarcoma patients with distant metastases.⁷ Despite this, the prognosis for metastatic disease remains dismal, with relapse occurring in more than half of patients.⁸ Distant metastasis is widely considered one of the most important long-term prognostic indicators for osteosarcoma patients. These findings highlight the need to develop tools that can detect distant metastasis at diagnosis to allow implementation of personalized therapeutic approaches.

Herein, we used the SEER Program database, a commonly used tool for analyzing rare cancers.⁹ The SEER Program collects data from seventeen geographically diverse cancer registries, covering roughly 26% of the United States population. Modern data mining and machine learning approaches can help uncover hidden correlations between parameters by extracting implicit, relevant information from massive amounts of data.¹⁰ Over the years, machine learning has been successfully used to diagnose and predict many diseases.^{11, 12} Accordingly, we gathered a large population-based cohort of osteosarcoma patients to develop and validate a machine learning-based model for analyzing clinical characteristics associated with the risk of distant metastasis.

2 METHODS

2.1 Study population and variable selection

In the current study, we used data from the SEER database (approval number: 11875-Nov2020), a population-based cancer registry at the National Cancer Institute in the United States (SEER*Stat 8.3.6). The inclusion criteria were: (a) osteosarcoma as the “primary and only cancer diagnosis” from 2004 to 2015; (b) ICD-O-3 morphology codes “9180–9187, 9192–9195”; (c) bone as the primary tumor site (C400–C419); and (d) recorded metastasis status. The exclusion criteria were (a) multiple primary cancers and (b) missing or blank clinical information. Variables were classified based on their clinical relevance and previously reported thresholds. Patient characteristics of interest included age, gender, race, tumor size, anatomic location, grade, histology, surgery, radiotherapy, chemotherapy, lymphadenectomy, and stage. Patients were separated by age group (<24 years old, 24–59 years old, and >60 years old) and ethnicity (White, Black, and Other ethnic groups). Pathological tumor grade was based on the variable “ICD-O-3 grade” and classified into grades I, II, III, and IV. Tumor size was stratified as ≤5.0 cm, 5.0–10 cm, or >10 cm. The primary tumor location was categorized as axial skeleton, extremities, or other. Our primary outcome was a binary variable indicating the presence of metastatic disease at the time of diagnosis. Patients with “distant” disease were assigned to this category, whereas patients with “localized” or “regional” staging were considered free of metastasis. The non-parametric missForest method was used to impute missing data.¹³ A heatmap was used to visualize the results of Pearson's correlation test and to examine the relationships between variables.
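The pairwise Pearson correlations behind such a heatmap can be illustrated with a minimal, standard-library sketch. The variable names and the six coded rows below are hypothetical toy values, not SEER records, and the helper functions are illustrative rather than the code used in this study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Map each pair of column names to its Pearson r (the values a heatmap would colour)."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical ordinal codes for six illustrative patients (assumed, not real data).
toy = {
    "age_group": [0, 0, 1, 1, 2, 2],
    "size_group": [0, 1, 1, 2, 2, 2],
    "chemotherapy": [1, 1, 1, 0, 0, 1],
}
matrix = correlation_matrix(toy)
```

Each off-diagonal entry of `matrix` corresponds to one cell of the heatmap; values near 0 indicate a weak linear relationship between two coded variables.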

2.2 Imbalanced data processing

There were 553 (22.6%) patients with metastasis and 1891 (77.4%) patients without metastasis in the dataset. Such a large difference between the two classes could lower classifier prediction power, and balanced data generally yield better prediction performance. ADASYN is an oversampling method widely used in machine learning with imbalanced data.¹⁴ We implemented the ADASYN technique at three oversampling percentages: 100%, 150%, and 200%. As a result, the minority class increased from 22.6 percent in the raw dataset to 46.7 percent in the dataset generated with ADASYN at 200% (Table 1). This balanced dataset was then randomly split into training (70%) and test (30%) sets.
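To make the oversampling step concrete, the following is a simplified, standard-library sketch of the ADASYN idea: minority samples that have more majority-class neighbours receive more synthetic points, and each synthetic point is an interpolation between a minority sample and one of its minority neighbours. It is not the imbalanced-learn implementation used in this study. The function names, the `beta` parameter, and the toy data are assumptions made for illustration, and how the reported 100%, 150%, and 200% settings map onto `beta` is not stated in the text.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest(X, i, candidates, count):
    """Indices of the `count` candidates closest to row i (excluding i itself)."""
    others = [j for j in candidates if j != i]
    others.sort(key=lambda j: sq_dist(X[i], X[j]))
    return others[:count]


def adasyn_sketch(X, y, minority=1, k=5, beta=1.0, seed=0):
    """Return synthetic minority rows; beta=1.0 aims at a fully balanced dataset."""
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_min = len(min_idx)
    n_maj = len(y) - n_min
    n_new = int(round((n_maj - n_min) * beta))
    if n_new <= 0 or n_min < 2:
        return []
    # r_i: share of majority-class rows among the k nearest neighbours of x_i.
    ratios = []
    for i in min_idx:
        nbrs = nearest(X, i, range(len(y)), k)
        ratios.append(sum(1 for j in nbrs if y[j] != minority) / len(nbrs))
    total = sum(ratios)
    # Fall back to uniform weights if no minority row has majority neighbours.
    weights = [r / total for r in ratios] if total > 0 else [1.0 / n_min] * n_min
    synthetic = []
    for i, w in zip(min_idx, weights):
        min_nbrs = nearest(X, i, min_idx, k)
        for _ in range(int(round(w * n_new))):
            j = rng.choice(min_nbrs)
            lam = rng.random()
            # Interpolate between x_i and a randomly chosen minority neighbour.
            synthetic.append([a + lam * (b - a) for a, b in zip(X[i], X[j])])
    return synthetic


# Toy two-feature data (assumed): three minority rows and six majority rows.
X_toy = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [1.0, 1.0], [1.1, 0.9],
         [0.9, 1.2], [1.2, 1.1], [0.3, 0.4], [1.0, 0.8]]
y_toy = [1, 1, 1, 0, 0, 0, 0, 0, 0]
new_rows = adasyn_sketch(X_toy, y_toy, k=3)
```

Because the synthetic rows are interpolations, categorical codes such as those used here would become non-integer values; a practical pipeline would need to decide how to handle that, which this sketch does not address.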

TABLE 1. Number of instances increased by ADASYN technique

Percentage of ADASYN increase	Class “No” actual 1819 (76.7%)	Class “Yes” actual 553 (23.3%)
100%	1819	1116
150%	1819	1383
200%	1819	1659

2.3 Establishment and evaluation of the predictive model

In this study, six machine learning algorithms were chosen to predict distant metastasis in osteosarcoma patients: logistic regression (LR), support vector machine (SVM), Gaussian Naive Bayes (GaussianNB), Extreme Gradient Boosting (XGBoost), random forest (RF), and k-nearest neighbor (kNN). We employed the ADASYN strategy to improve classifier performance on the imbalanced dataset. We performed 10-fold cross-validation as a resampling method on the training set, and the hyperparameters were tuned by grid search. The validation set was used to adjust the model parameters, whereas the test set was used to evaluate model performance. The clinical value of the prediction model was evaluated with three measures of model quality: discrimination, calibration, and clinical usefulness. First, discrimination was quantified using precision-recall curve analysis. Second, calibration plots were used to assess how far the model predictions deviated from observed events. Third, clinical utility was evaluated using DCA, which calculates the net benefit across a range of threshold probabilities. In addition, AP and the confusion matrix metrics of accuracy, sensitivity, specificity, and F1 score were assessed for the six models.
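The evaluation metrics named above can be written out in a few lines. The sketch below is a plain-Python illustration under stated assumptions: labels are coded 1 for metastasis and 0 otherwise, predicted scores are distinct (ties are not handled), and the six labels and scores are toy values rather than study data.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded 1 = positive, 0 = negative."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, precision, and F1 from the confusion matrix."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": 2 * precision * sensitivity / denom if denom else 0.0,
    }


def average_precision(y_true, scores):
    """AP as the mean of the precision values at the rank of each positive example."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive example")
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            tp += 1
            total += tp / rank  # precision at this recall step
    return total / n_pos


# Toy example (assumed values).
y_demo = [1, 0, 1, 1, 0, 0]
s_demo = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
pred_demo = [1 if s >= 0.5 else 0 for s in s_demo]
metrics_demo = classification_metrics(y_demo, pred_demo)
ap_demo = average_precision(y_demo, s_demo)
```

Unlike accuracy, AP summarizes the whole precision-recall curve and does not depend on a single classification threshold, which is one reason it is often preferred for imbalanced outcomes.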

2.4 Model interpretation

It is well-established that the application of machine learning techniques is limited by the difficulty of interpreting the results. The SHAP method proposed by Lundberg et al. is a game-theoretic method for explaining the output of any machine learning model and is reliable, fast, and computationally cheap.¹⁵ Importantly, the SHAP approach ranks the importance of each predictor by its SHAP value. A positive SHAP value pushes the model output toward a higher predicted risk, whereas a negative SHAP value pushes it lower.
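The game-theoretic idea can be illustrated by computing exact Shapley values for a very small model. The sketch below averages each feature's marginal contribution over all feature orderings. The risk function, its coefficients, and the feature names are hypothetical and are not the fitted model from this study, and a single reference patient stands in for the background distribution that the SHAP library would normally use.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = instance[name]  # switch this feature to the patient's value
            updated = predict(current)
            totals[name] += updated - previous
            previous = updated
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_risk(f):
    """Hypothetical risk score with one interaction term (assumed coefficients)."""
    risk = 0.10
    risk += 0.30 * f["large_tumor"]
    risk += 0.25 * f["no_surgery"]
    risk += 0.10 * f["large_tumor"] * f["no_surgery"]
    risk += 0.05 * f["older_age"]
    return risk


baseline = {"large_tumor": 0, "no_surgery": 0, "older_age": 0}
patient = {"large_tumor": 1, "no_surgery": 1, "older_age": 0}
phi = shapley_values(toy_risk, patient, baseline)
```

The values in `phi` add up to the difference between the patient's prediction and the baseline prediction, which is the additivity property that lets a SHAP force plot decompose an individual prediction into a base value plus per-feature contributions. Exact enumeration grows factorially with the number of features, so it is only practical for toy cases like this one.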

2.5 Statistical analysis

R (version 3.6.8), Python (version 3.7), and SEER*Stat (https://seer.cancer.gov/seerstat/) were used in this study. The packages used are listed in Table 2.

TABLE 2. Detailed information about the packages used in the development of machine learning models

Package name	Version	Description
Numpy	1.19.5	Numpy is the fundamental package for array computing with python
Pandas	1.0.4	Powerful data structures for data analysis, time series, and statistics
Matplotlib	3.3.2	Python plotting package
Sklearn	0.22.1	A set of python modules for machine learning and data mining
XGBoost	1.2.1	XGBoost Python package for gradient boosting
Imblearn	0.0	Toolbox for imbalanced datasets in machine learning
PDPbox	0.2.1	Python partial dependence plot toolbox
SHAP	0.39.0	Toolbox for exploring how the input features contribute to the output of a complex machine learning model

3 RESULTS

3.1 Patient characteristics

This study enrolled a total of 2444 individuals with osteosarcoma, with 553 having distant metastases and 1891 not having metastasis. Table 3 shows the comprehensive demographic and clinical information.

TABLE 3. Demographic and clinicopathologic variables of the whole cohort grouped by metastasis status

Variables	All (n = 2444)	Distant metastasis (−) (n = 1891)	Distant metastasis (+) (n = 553)	p
Age				<0.001
0–24	1567 (64.116)	1214 (64.199)	353 (63.834)
25–59	662 (27.087)	543 (28.715)	119 (21.519)
>60	215 (8.797)	134 (7.086)	81 (14.647)
Size				<0.001
<50 mm	374 (15.303)	342 (18.086)	32 (5.787)
51-99 mm	1025 (41.939)	819 (43.310)	206 (37.251)
>100 mm	1045 (42.758)	730 (38.604)	315 (56.962)
Race				<0.001
White	1840 (75.286)	1429 (75.568)	411 (74.322)
Black	394 (16.121)	301 (15.918)	93 (16.817)
Other	210 (8.592)	161 (8.514)	49 (8.861)
Gender				<0.001
Male	1336 (54.664)	1005 (53.146)	331 (59.855)
Female	1108 (45.336)	886 (46.854)	222 (40.145)
Tumor site				<0.001
Axial	285 (11.661)	175 (9.254)	110 (19.892)
Extremity	1913 (78.273)	1503 (79.482)	410 (74.141)
Other	246 (10.065)	213 (11.264)	33 (5.967)
Grade				<0.001
Grade I	94 (3.846)	91 (4.812)	3 (0.542)
Grade II	172 (7.038)	154 (8.144)	18 (3.255)
Grade III	770 (31.506)	579 (30.619)	191 (34.539)
Grade IV	1408 (57.610)	1067 (56.425)	341 (61.664)
Histology				<0.001
9180	1596 (65.303)	1190 (62.930)	406 (73.418)
9181	361 (14.771)	282 (14.913)	79 (14.286)
9182	114 (4.664)	99 (5.235)	15 (2.712)
9183	84 (3.437)	69 (3.649)	15 (2.712)
9184	15 (0.614)	9 (0.476)	6 (1.085)
9185	23 (0.941)	15 (0.793)	8 (1.447)
9186	86 (3.519)	74 (3.913)	12 (2.170)
9187	6 (0.245)	6 (0.317)	0 (0.000)
9192	111 (4.542)	104 (5.500)	7 (1.266)
9193	34 (1.391)	33 (1.745)	1 (0.181)
9194	14 (0.573)	10 (0.529)	4 (0.723)
Primary tumor surgery				<0.001
No	310 (12.684)	131 (6.928)	179 (32.369)
Yes	2134 (87.316)	1760 (93.072)	374 (67.631)
Radiotherapy				<0.001
No	2234 (91.408)	1767 (93.443)	467 (84.448)
Yes	210 (8.592)	124 (6.557)	86 (15.552)
Chemotherapy				<0.001
No	433 (17.717)	363 (19.196)	70 (12.658)
Yes	2011 (82.283)	1528 (80.804)	483 (87.342)
Lymphadenectomy				<0.001
No	2177 (89.075)	1673 (88.472)	504 (91.139)
Yes	267 (10.925)	218 (11.528)	49 (8.861)

3.2 Feature analysis

Pearson correlation analysis was used to examine the relationship between each pair of variables. The correlation heatmap (Figure 1) revealed a weak relationship between age and chemotherapy. Table 4 shows the results of the machine learning classification algorithms applied to the balanced datasets generated with ADASYN. Importantly, the ADASYN approach substantially enhanced the AP values of the classification models. The RF classifier achieved the highest precision values in the validation set at the 100% increase (0.688), the 150% increase (0.732), and the 200% increase (0.781).

FIGURE 1. Correlation between factors.

TABLE 4. Evaluation of the performance of classification models on the imbalanced dataset using the ADASYN technique in the validation set

Model	ADASYN	Precision	Accuracy	Sensitivity	Specificity	F1-score
XGBoost	100%	0.655 (0.635–0.676)	0.693 (0.676–0.710)	0.699 (0.636–0.761)	0.706 (0.630–0.781)	0.619 (0.592–0.647)
	150%	0.694 (0.677–0.711)	0.687 (0.677–0.697)	0.696 (0.659–0.733)	0.707 (0.670–0.745)	0.659 (0.640–0.678)
	200%	0.757 (0.740–0.775)	0.711 (0.701–0.721)	0.675 (0.653–0.696)	0.763 (0.738–0.788)	0.681 (0.665–0.697)
LR	100%	0.553 (0.511–0.595)	0.614 (0.598–0.630)	0.666 (0.588–0.743)	0.618 (0.541–0.695)	0.550 (0.511–0.590)
	150%	0.547 (0.514–0.579)	0.616 (0.604–0.628)	0.572 (0.452–0.692)	0.661 (0.584–0.737)	0.547 (0.481–0.613)
	200%	0.663 (0.653–0.673)	0.613 (0.599–0.628)	0.687 (0.613–0.761)	0.571 (0.514–0.628)	0.626 (0.588–0.663)
RF	100%	0.688 (0.666–0.711)	0.703 (0.692–0.714)	0.623 (0.572–0.673)	0.800 (0.755–0.845)	0.595 (0.569–0.621)
	150%	0.732 (0.717–0.746)	0.708 (0.697–0.720)	0.690 (0.678–0.701)	0.746 (0.735–0.758)	0.671 (0.660–0.681)
	200%	0.781 (0.766–0.795)	0.718 (0.710–0.726)	0.687 (0.659–0.715)	0.781 (0.751–0.812)	0.688 (0.667–0.708)
GaussianNB	100%	0.575 (0.556–0.594)	0.609 (0.592–0.627)	0.453 (0.374–0.531)	0.819 (0.743–0.895)	0.458 (0.411–0.504)
	150%	0.626 (0.604–0.647)	0.631 (0.616–0.647)	0.635 (0.563–0.707)	0.641 (0.565–0.718)	0.614 (0.565–0.662)
	200%	0.661 (0.640–0.682)	0.619 (0.609–0.628)	0.553 (0.450–0.656)	0.710 (0.609–0.812)	0.576 (0.505–0.646)
kNN	100%	0.636 (0.620–0.651)	0.692 (0.681–0.703)	0.705 (0.629–0.781)	0.680 (0.606–0.753)	0.623 (0.588–0.658)
	150%	0.684 (0.670–0.698)	0.698 (0.687–0.709)	0.659 (0.591–0.727)	0.737 (0.672–0.802)	0.665 (0.629–0.701)
	200%	0.717 (0.707–0.727)	0.697 (0.693–0.702)	0.740 (0.693–0.787)	0.668 (0.623–0.714)	0.719 (0.695–0.743)
SVM	100%	0.631 (0.611–0.652)	0.646 (0.630–0.662)	0.664 (0.606–0.722)	0.655 (0.595–0.716)	0.575 (0.544–0.606)
	150%	0.644 (0.629–0.659)	0.643 (0.629–0.658)	0.652 (0.574–0.730)	0.658 (0.577–0.739)	0.608 (0.569–0.647)
	200%	0.678 (0.658–0.699)	0.637 (0.631–0.643)	0.661 (0.596–0.725)	0.624 (0.560–0.689)	0.628 (0.599–0.657)

3.3 Model development and evaluation

Six machine learning algorithms were employed to build prediction models in this study. The training set was used to create and train the models. After parameter tuning and algorithm comparison, the average precision values of all six machine learning models were greater than 0.64, indicating good predictive ability (Figure 2). As shown in Figure 2, the RF algorithm yielded the best precision-recall performance in both the training and validation sets. Precision-recall curves, calibration plots, and DCAs were generated for the validation set to evaluate the prediction models. The calibration plots for the validation set compared the predicted probabilities against the observed risk of distant metastasis (Figure 3A). DCA of the six models showed that all models except kNN achieved a net clinical benefit over the treat-all and treat-none strategies (Figure 3B). The RF model also performed best in the calibration plots and DCAs. Table 4 shows the confusion matrix evaluation measures and the average precision of all prediction models, and Table 5 lists their k-fold cross-validation accuracies (k = 10). The RF model yielded the highest k-fold cross-validation accuracy and, overall, the best predictive performance (Figure 4).
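The net benefit compared in a decision curve analysis follows a standard formula: at threshold probability p_t, net benefit = TP/n − (FP/n) × p_t/(1 − p_t), where treat-none has a net benefit of 0 and treat-all treats every patient as positive. The sketch below illustrates this calculation on ten toy patients; the labels and predicted probabilities are assumed values, not results from this study.

```python
def net_benefit(y_true, probs, threshold):
    """Net benefit of treating patients whose predicted risk is at or above the threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, probs) if p >= threshold and t == 1)
    fp = sum(1 for t, p in zip(y_true, probs) if p >= threshold and t == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the strategy that treats every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy labels and predicted probabilities (assumed values).
y_dca = [1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
p_dca = [0.9, 0.7, 0.6, 0.2, 0.1, 0.4, 0.3, 0.2, 0.1, 0.05]
nb_model = net_benefit(y_dca, p_dca, 0.5)
nb_all = net_benefit_treat_all(y_dca, 0.5)
```

A model is considered clinically useful at a given threshold when its net benefit exceeds both the treat-all value and zero (treat-none); a decision curve repeats this comparison across a range of thresholds.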

TABLE 5. The k-fold cross-validation accuracies (k = 10) of all six prediction models

Model	XGB	RF	Gaussian NB	kNN	SVM	LR
k-fold accuracy	0.711 (0.701–0.721)	0.718 (0.710–0.726)	0.619 (0.609–0.628)	0.697 (0.693–0.702)	0.637 (0.631–0.643)	0.613 (0.599–0.628)
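For readers unfamiliar with the procedure, k-fold cross-validation accuracy can be sketched as follows. This is a minimal standard-library illustration: the majority-class "model", the helper names, and the 20 toy labels are assumptions, and the way the confidence intervals in Table 5 were derived is not described in the text, so none is computed here.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs; each sample is used for validation exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def cross_val_accuracy(X, y, fit, predict, k=10, seed=0):
    """Return the validation accuracy of each fold."""
    scores = []
    for train, val in k_fold_indices(len(y), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = [predict(model, X[i]) for i in val]
        scores.append(sum(p == y[i] for p, i in zip(preds, val)) / len(val))
    return scores


# Hypothetical stand-in model: always predicts the most common training label.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]


def predict_majority(model, row):
    return model


X_cv = [[float(i)] for i in range(20)]
y_cv = [0] * 14 + [1] * 6
fold_scores = cross_val_accuracy(X_cv, y_cv, fit_majority, predict_majority, k=5)
mean_score = sum(fold_scores) / len(fold_scores)
```

Averaging the per-fold accuracies gives a single summary figure of the kind reported in Table 5, and the spread across folds indicates how sensitive the estimate is to the particular split.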

3.4 Model interpretation

Figure 5A depicts the predictive model's SHAP summary plot, which consists of 11 features sorted by their impact on metastatic status. The higher the SHAP value of a feature, the higher the risk of distant metastasis. Red denotes a high feature value, purple denotes a feature value close to the overall average, and blue denotes a low feature value. Figure 5B provides an example of predicting the risk of metastasis in an individual osteosarcoma patient. In this case, the RF model predicted a distant metastasis risk of 1.00 (base value: 0.35). The probability of distant metastasis was increased by tumor size (more than 10 cm), radiation therapy, and lack of surgical intervention, whereas the axial tumor location reduced the predicted risk of metastasis.

4 DISCUSSION

Patients with osteosarcoma who develop distant metastasis have a significantly lower overall survival rate, with a 5-year survival rate of only 20%.¹⁶ It has been established that patients with distant metastasis derive little benefit from surgery, chemotherapy, or novel immunotherapy, accounting for the poor prognosis of advanced osteosarcoma patients.¹⁷ As a result, it is critical to identify risk factors associated with distant metastasis in osteosarcoma patients to facilitate early detection, prevention, and prognosis assessment.

An ML-based model was trained and validated in this study to predict the risk of distant metastasis. This ML model was based on clinical data, including age, gender, race, tumor size, primary tumor site, grade, histology, surgery, radiotherapy, chemotherapy, lymphadenectomy, and stage. The ADASYN technique was then used to deal with imbalanced data, yielding a more balanced dataset. As a result, the RF model outperformed all other models in training and validation sets. To improve the interpretability of our machine learning model, we reported the SHAP values and a list of highly influential features.

We see at least two approaches to clinical practice implementation. To begin, individual doctors and patients could use the model to assess risk and make tailored decisions about osteosarcoma screening in the clinic. Second, health systems could use the model to identify high-risk patients for additional outreach at the population level.

In the present study, SHAP analysis demonstrated that the top three informative features in the model were tumor size, primary surgery, and age. Our results consistently showed that tumor size is an important risk factor. Moreover, it has been reported that a large tumor at the time of diagnosis is associated with a greater number of lung metastases.¹⁸ Tumor size measured on plain radiographs or magnetic resonance imaging has also been shown to be an independent predictor of overall survival and distant metastases in patients with osteosarcoma.⁹ In this regard, tumors treated without surgery, with higher pathological grades, and of larger size contributed to an increased risk of metastasis. Surgery, which involves removing the tumor or amputation, remains the only effective treatment for osteosarcoma and is not indicated in patients with distant metastasis.¹⁹ Although the efficacy of surgery varies with many factors, radical resection of the primary tumor inhibits tumor progression, including metastasis, to a certain extent.^{20, 21} Our findings indicate that the absence of surgery was associated with a higher risk of distant metastasis, which corroborates the above findings. Although age is a less intuitive risk factor for distant metastasis than primary surgery and tumor size, it has previously been found to be a risk factor for advanced disease in studies of other malignancies.^{22, 23} A large study of over 27,000 patients found that older age was an independent significant risk factor for distant metastases.²⁴ Similarly, a recent study found that patients who were older at diagnosis were at higher risk of developing pleural metastases.²⁵ In concordance with these studies, our results indicate a high risk of distant metastasis among patients older than 60 years. The reasons for this may be related to health access, among other socioeconomic features. In this study, we divided patients into age groups (<24 years old, 24–59 years old, and >60 years old) according to a recent publication.²⁶

Previous efforts to improve osteosarcoma prognostic prediction were based on parametric regression models, which are widely used in clinical studies because of their ease of use and interpretability. Regression analysis, on the other hand, can only be carried out in its entirety.²⁷ Importantly, regression modeling assumes linear and homogeneous relationships among input variables, whereas complex interactions exist between predictors.²⁸ Decision tree-based methods are a subset of machine learning algorithms that can uncover complex non-linear relationships between covariates.²⁹ In a random forest, the final class of an instance is the mode of the individual tree outputs, which yields robust and accurate classification and allows a large number of input variables to be handled.³⁰ Random forests are relatively resistant to overfitting and can handle datasets with highly asymmetric class distributions.³¹ In the training and validation sets, our RF-based model showed high prediction performance for the risk of distant metastasis in osteosarcoma patients (AP = 0.903 and 0.781, respectively). This study has several limitations. First, it was retrospective, and selection bias was inevitable. Second, we could not identify socioeconomic factors linked to patient survival or the occurrence of pathological fractures in osteosarcoma. Finally, “no” and “unknown” were combined into one group in the SEER data for chemotherapy and radiation. Chemotherapy and radiation were therefore significantly underreported, a limitation we could not overlook.
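The majority-vote rule for random forests described in the paragraph above can be illustrated with a short sketch. The three decision stumps, their cut-offs, and the example patient are hypothetical and chosen only to mirror the features discussed in this article; they are not trees from the fitted model, and library implementations may aggregate tree outputs differently (for example, by averaging class probabilities).

```python
from collections import Counter


def forest_predict(trees, row):
    """Hard-voting forest: return the most common class among the tree outputs."""
    votes = [tree(row) for tree in trees]
    return Counter(votes).most_common(1)[0][0]


def forest_vote_share(trees, row, positive=1):
    """Fraction of trees that vote for the positive class."""
    votes = [tree(row) for tree in trees]
    return sum(1 for v in votes if v == positive) / len(votes)


# Hypothetical one-split trees (assumed cut-offs, for illustration only).
def stump_size(row):
    return 1 if row["size_cm"] > 10 else 0


def stump_surgery(row):
    return 1 if not row["surgery"] else 0


def stump_age(row):
    return 1 if row["age"] > 60 else 0


toy_forest = [stump_size, stump_surgery, stump_age]
toy_patient = {"size_cm": 12.0, "surgery": False, "age": 30}
toy_class = forest_predict(toy_forest, toy_patient)
toy_share = forest_vote_share(toy_forest, toy_patient)
```

Using an odd number of trees avoids ties in a two-class vote. The vote share can also serve as a rough risk score when a graded output is more useful than a hard class label.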

In conclusion, we employed the random forest method to develop an artificial intelligence model for predicting the probability of distant metastasis in patients with osteosarcoma. According to precision-recall analysis, our prediction model yielded accurate results. DCA demonstrated that the model provided a net benefit. Our findings substantiate that this prediction model has huge prospects for application during clinical practice to assist physicians in making more informed treatment decisions for osteosarcoma patients.

AUTHOR CONTRIBUTIONS

Conceptualization, BL.B. and Q.Y.; Methodology, ZY.W. and SJ.W.; Software, SJ.W.; Investigation, BL.B.; Resources, BL.B. and Q.Y.; Data curation, Q.Y.; Formal analysis, ZY.W., SJ.W., and BL.B.; Validation, BL.B. and Q.Y.; Writing-original draft preparation, ZY.W., SJ.W., and Q.Y.; Writing-review and editing, BL.B. and Q.Y.; Visualization, BL.B.; Supervision, Q.Y.

ACKNOWLEDGMENTS

The authors would like to express their gratitude to the Surveillance, Epidemiology, and End Results (SEER) database for providing the data used in this study. We would like to thank Home for Researchers (www.home-for-researchers.com) for English language editing. This work was also supported by the Extreme Smart Analysis platform (https://www.xsmartanalysis.com/).

FUNDING INFORMATION

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

CONFLICT OF INTEREST

We have read and understood Cancer Medicine's policy on disclosing conflicts of interest and declare that we have none.

ETHICAL APPROVAL STATEMENT

We received permission to access the research data file in the SEER program from the National Cancer Institute, US. Approval was waived by the local ethics committee, as SEER data are publicly available and de-identified.

Open Research

DATA AVAILABILITY STATEMENT

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

REFERENCES

1. Gu J, Li J, Huang M, et al. Identification of osteosarcoma-related specific proteins in serum samples using surface-enhanced laser desorption/ionization-time-of-flight mass spectrometry. J Immunol Res. 2014; 2014: 1-5. doi:10.1155/2014/649075
2. Luo X, Liu X, Tao Q, et al. Enoxacin inhibits proliferation and invasion of human osteosarcoma cells and reduces bone tumour volume in a murine xenograft model. Oncol Lett. 2020; 20(2): 1400-1408. doi:10.3892/ol.2020.11656
3. Polednak AP. Primary bone cancer incidence in black and white residents of New York state. Cancer. 1985; 55(12): 2883-2888. doi:10.1002/1097-0142(19850615)55:12<2883::AID-CNCR2820551231>3.0.CO;2-Q
4. Mirabello L, Troisi RJ, Savage SA. Osteosarcoma incidence and survival rates from 1973 to 2004: data from the surveillance, epidemiology, and end results program. Cancer: Interdisciplinary Int J Am Cancer Soc. 2009; 115(7): 1531-1543.
5. Kager L, Cooperative German-Austrian-Swiss Osteosarcoma Study Group. Primary metastatic osteosarcoma: presentation and outcome of patients treated on neoadjuvant cooperative osteosarcoma study group protocols. J Clin Oncol. 2003; 21: 2011-2018. doi:10.1200/JCO.2003.08.132
6. Wu Z-l, Deng Y-j, Zhang G-z, Ren E-h, Yuan W-h, Xie Q-q. Development of a novel immune-related genes prognostic signature for osteosarcoma. Sci Rep. 2020; 10(1): 1-13.
7. Nataraj V, Rastogi S, Khan S, et al. Prognosticating metastatic osteosarcoma treated with uniform chemotherapy protocol without high dose methotrexate and delayed metastasectomy: a single center experience of 102 patients. Clin Transl Oncol. 2016; 18(9): 937-944. doi:10.1007/s12094-015-1467-8
8. Huang X, Zhao J, Bai J, et al. Risk and clinicopathological features of osteosarcoma metastasis to the lung: a population-based study. J Bone Oncol. 2019; 16:100230. doi:10.1016/j.jbo.2019.100230
9. Miller B, Feuer E, Hankey B. Surveillance, epidemiology, and end results program of the National Cancer Institute. CA Cancer J Clin. 1993; 43: 27-41. doi:10.3322/canjclin.43.1.27
10. Gore JC. Artificial intelligence in medical imaging. Elsevier; 2020.
11. Singal AG, Mukherjee A, Elmunzer BJ, et al. Machine learning algorithms outperform conventional regression models in predicting development of hepatocellular carcinoma. Am J Gastroenterol. 2013; 108(11): 1723-1730. doi:10.1038/ajg.2013.332
12. Wu Y, Rao K, Liu J, et al. Machine learning algorithms for the prediction of central lymph node metastasis in patients with papillary thyroid cancer. Front Endocrinol. 2020; 11: 816. doi:10.3389/fendo.2020.577537
13. Stekhoven DJ, Bühlmann P. MissForest: non-parametric missing value imputation for mixed-type data. Bioinformatics. 2012; 28(1): 112-118. doi:10.1093/bioinformatics/btr597
14. He H, Bai Y, Garcia EA, Li S. ADASYN: Adaptive Synthetic Sampling Approach for Imbalanced Learning. 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence). IEEE; 2008.
15. Lundberg SM, Lee S-I, eds. A unified approach to interpreting model predictions. Proceedings of the 31st international conference on neural information processing systems; 2017.
16. He J-P, Hao Y, Wang X-L, et al. Review of the molecular pathogenesis of osteosarcoma. Asian Pac J Cancer Prev. 2014; 15(15): 5967-5976. doi:10.7314/APJCP.2014.15.15.5967
17. Zhang Y, Yang J, Zhao N, et al. Progress in the chemotherapeutic treatment of osteosarcoma. Oncol Lett. 2018; 16(5): 6228-6237.
18. Xie L, Huang W, Wang H, Zheng C, Jiang J. Risk factors for lung metastasis at presentation with malignant primary osseous neoplasms: a population-based study. J Orthop Surg Res. 2020; 15(1): 1-8. doi:10.1186/s13018-020-1571-5
19. Marulanda GA, Henderson ER, Johnson DA, Letson GD, Cheong D. Orthopedic surgery options for the treatment of primary osteosarcoma. Cancer Control. 2008; 15(1): 13-20. doi:10.1177/107327480801500103
20. Andreou D, Bielack S, Carrle D, et al. The influence of tumor-and treatment-related factors on the development of local recurrence in osteosarcoma after adequate surgery. An analysis of 1355 patients treated on neoadjuvant cooperative osteosarcoma study group protocols. Ann Oncol. 2011; 22(5): 1228-1235. doi:10.1093/annonc/mdq589
21. Spinelli M, Ziranu A, Piccioli A, Maccauro G. Surgical treatment of acetabular metastasis. Eur Rev Med Pharmacol Sci. 2016; 20(14): 3005-3010.
22. Schwartz KL, Crossley-May H, Vigneau FD, Brown K, Banerjee M. Race, socioeconomic status and stage at diagnosis for five common malignancies. Cancer Causes Control. 2003; 14(8): 761-766. doi:10.1023/A:1026321923883
23. Ward E, Jemal A, Cokkinides V, et al. Cancer disparities by race/ethnicity and socioeconomic status. CA Cancer J Clin. 2004; 54(2): 78-93. doi:10.3322/canjclin.54.2.78
24. Kuperman DI, Auethavekiat V, Adkins DR, et al. Squamous cell cancer of the head and neck with distant metastasis at presentation. Head Neck. 2011; 33(5): 714-718. doi:10.1002/hed.21529
25. Shao Y-Y, Hong R-L. Pleural metastases as a unique entity with dismal outcome of head and neck squamous cell carcinoma. Oral Oncol. 2010; 46(9): 694-697. doi:10.1016/j.oraloncology.2010.06.014
26. Miller BJ, Cram P, Lynch CF, Buckwalter JA. Risk factors for metastatic disease at presentation with osteosarcoma: an analysis of the SEER database. J Bone Joint Surg Am. 2013; 95:e89. doi:10.2106/JBJS.L.01189
27. Sterne JA, White IR, Carlin JB, et al. Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ. 2009; 338:b2393. doi:10.1136/bmj.b2393
28. Loftus TJ, Tighe PJ, Filiberto AC, et al. Artificial intelligence and surgical decision-making. JAMA Surg. 2020; 155(2): 148-158. doi:10.1001/jamasurg.2019.4917
29. Bibault J-E, Chang DT, Xing L. Development and validation of a model to predict survival in colorectal cancer using a gradient-boosted machine. Gut. 2021; 70(5): 884-889. doi:10.1136/gutjnl-2020-321799
30. Karelson M, Dobchev D. Using artificial neural networks to predict cell-penetrating compounds. Expert Opin Drug Discovery. 2011; 6(8): 783-796. doi:10.1517/17460441.2011.586689
31. Roguet A, Eren AM, Newton RJ, McLellan SL. Fecal source identification using random forest. Microbiome. 2018; 6(1): 1-15. doi:10.1186/s40168-018-0568-3

Volume 12, Issue 4, February 2023, Pages 5025-5034