Volume 75, Issue 1 pp. 153-162
BIOMETRIC METHODOLOGY
Open Access

Personalized schedules for surveillance of low-risk prostate cancer patients

Anirudh Tomer

Corresponding Author

Department of Biostatistics, Erasmus University Medical Center, The Netherlands

email: [email protected]

Daan Nieboer

Department of Public Health, Erasmus University Medical Center, The Netherlands

Monique J. Roobol

Department of Urology, Erasmus University Medical Center, The Netherlands

Ewout W. Steyerberg

Department of Public Health, Erasmus University Medical Center, The Netherlands

Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, The Netherlands

Dimitris Rizopoulos

Department of Biostatistics, Erasmus University Medical Center, The Netherlands

First published: 23 July 2018

Summary

Low-risk prostate cancer patients enrolled in active surveillance (AS) programs commonly undergo frequent biopsies to examine cancer progression. AS programs employ a fixed schedule of biopsies for all patients. Such fixed and frequent schedules may lead to unnecessary biopsies. Since biopsies are burdensome, patients do not always comply with the schedule, which increases the risk of delayed detection of cancer progression. Motivated by the world's largest AS program, Prostate Cancer Research International Active Surveillance (PRIAS), we present personalized schedules for biopsies to counter these problems. Using joint models for time-to-event and longitudinal data, our methods combine information from a patient's historical prostate-specific antigen levels and repeat biopsy results to schedule the next biopsy. We also present methods to compare personalized schedules with existing biopsy schedules.

1 Introduction

Prostate cancer (PCa) is the second most frequently diagnosed cancer (14% of all cancers) in males worldwide (Torre et al., 2015). The increase in the diagnosis of low-grade PCa has been attributed to increased life expectancy and to the growing number of screening programs (Potosky et al., 1995). A known drawback of screening, also established for other types of cancer (e.g., breast cancer), is overdiagnosis. To avoid overtreatment, patients diagnosed with low-grade PCa are commonly advised to join active surveillance (AS) programs. AS aims to delay serious treatments such as surgery, chemotherapy, or radiotherapy; to this end, PCa progression is routinely examined via serum prostate-specific antigen (PSA) levels, digital rectal examination, medical imaging, and biopsies.

Biopsies are the most painful and the most complication-prone (Loeb et al., 2013), yet also the most reliable, of the techniques used in AS to examine PCa progression. When a patient's biopsy Gleason grade becomes larger than 6 (Gleason reclassification, or GR), he is advised to switch from AS to active treatment (Bokhorst et al., 2015). Hence, the timing of biopsies has significant medical implications. The world's largest AS program, Prostate Cancer Research International Active Surveillance (PRIAS), conducts biopsies at years 1, 4, 7, and 10 of follow-up, and every 5 years thereafter. However, it switches to a more frequent, annual biopsy schedule for faster-progressing patients, namely those with a PSA doubling time (PSA-DT) between 0 and 10 years, where PSA-DT is measured as the inverse of the slope of the regression line through the base-two logarithm of the PSA values. In contrast, many AS programs use an annual schedule for all patients (Tosoian et al., 2011; Welty et al., 2015). Consequently, many unnecessary biopsies are scheduled for slowly progressing PCa patients. Furthermore, patients may not always comply with such schedules (Bokhorst et al., 2015), which can lead to delayed detection of PCa progression and reduce the effectiveness of AS.
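
The PRIAS rule above can be made concrete with a short sketch. It computes PSA-DT as the inverse of the ordinary least-squares slope of $\log_2$ PSA on time and lists the biopsy years of the fixed PRIAS schedule. Assumptions (not from PRIAS documentation): times are in years, "between 0 and 10 years" is read as $0 < \text{PSA-DT} \le 10$, and the switch to annual biopsies is applied to the whole horizon at once, whereas in PRIAS it happens during follow-up. All function names are hypothetical.

```python
import math

def psa_doubling_time(times, psa):
    """PSA-DT in years: inverse of the OLS slope of log2(PSA) on time."""
    if len(times) < 2:
        raise ValueError("need at least two PSA measurements")
    y = [math.log2(p) for p in psa]
    t_bar = sum(times) / len(times)
    y_bar = sum(y) / len(y)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (v - y_bar) for t, v in zip(times, y))
    slope = sxy / sxx  # doublings of PSA per year
    return math.inf if slope == 0 else 1.0 / slope

def is_fast_progressing(psa_dt):
    # Assumption: "between 0 and 10 years" read as 0 < PSA-DT <= 10.
    return 0 < psa_dt <= 10

def prias_biopsy_years(horizon, fast_progressing):
    """Biopsy years up to `horizon`: annual if fast, else 1, 4, 7, 10, then every 5."""
    if fast_progressing:
        return list(range(1, horizon + 1))
    years = [y for y in (1, 4, 7, 10) if y <= horizon]
    years.extend(range(15, horizon + 1, 5))
    return years
```

For example, PSA values 4, 8, 16 at years 0, 1, 2 double once a year, so PSA-DT is 1 year and the patient would be moved to annual biopsies.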

This article is motivated by the need to reduce the medical burden of repeat biopsies while simultaneously avoiding late detection of PCa progression. To this end, we develop personalized schedules for biopsies using the historical PSA measurements and biopsy results of patients. Personalized screening schedules have received much interest in the literature, especially in the medical decision making context. For example, Markov decision process (MDP) models have been used to create personalized screening schedules for diabetic retinopathy (Bebu and Lachin, 2017), breast cancer (Ayer et al., 2012), cervical cancer (Akhavan-Tabatabaei et al., 2017), and colorectal cancer (Erenay et al., 2014). Joint models for time-to-event and longitudinal data (Tsiatis and Davidian, 2004; Rizopoulos, 2012) have also been used to create personalized schedules for the measurement of longitudinal biomarkers (Rizopoulos et al., 2016). In the context of PCa, Zhang et al. (2012) used partially observable MDP models to personalize the decision of (not) deferring a biopsy to the next check-up time during the screening process. This decision is based on the baseline characteristics as well as a discretized PSA level of the patient at the current check-up time.

In comparison to the work referenced above, the schedules we propose in this article account for latent between-patient heterogeneity. We achieve this by using joint models, which are inherently patient-specific because they utilize random effects. Second, joint models allow a continuous time scale and utilize the entire history of PSA levels. Lastly, instead of making a binary decision of (not) deferring a biopsy to the next pre-scheduled check-up time, we schedule biopsies at a per-patient optimal future time. To this end, we first use joint models to obtain a full specification of the joint distribution of PSA levels and time of GR. We then use it to define a patient-specific posterior predictive distribution of the time of GR, given the observed PSA measurements and repeat biopsies up to the current check-up time. Using the general framework of Bayesian decision theory, we propose a set of loss functions that are minimized to find the optimal time of conducting a biopsy. These loss functions yield two categories of personalized schedules: those based on the expected time of GR and those based on the risk of GR. In addition, we analyze an approach in which the two types of schedules are combined. We also present methods to evaluate and compare the various schedules for biopsies.

The rest of the article is organized as follows. Section 2 briefly covers the joint modeling framework. Section 3 details the personalized scheduling approaches proposed in this article. In Section 4, we discuss methods for the evaluation and selection of a schedule. In Section 5, we demonstrate the personalized schedules by employing them for patients from the PRIAS program. Lastly, in Section 6, we present the results of a simulation study comparing personalized schedules with the PRIAS and annual schedules.

2 Joint Model for Time-to-Event and Longitudinal Outcomes

We start with a short introduction of the joint modeling framework used in the following developments. Let $T_i^*$ denote the true GR time for the $i$-th patient and let $S$ be the schedule of his biopsies. Let the vector of biopsy times be denoted by $\boldsymbol{T}_i^S = \{T_{i0}^S, T_{i1}^S, \ldots, T_{iN_i^S}^S\}$, with $T_{ij}^S < T_{ik}^S$ for all $j < k$, where $N_i^S$ is the total number of biopsies conducted. Because biopsies are conducted periodically, $T_i^*$ cannot be observed directly; it is only known to fall in an interval $l_i < T_i^* \le r_i$, where $l_i = T_{i(N_i^S - 1)}^S$ and $r_i = T_{iN_i^S}^S$ if GR is observed, and $l_i = T_{iN_i^S}^S$ and $r_i = \infty$ if GR has not been observed yet. Further, let $\boldsymbol{y}_i$ denote the $n_i \times 1$ vector of PSA levels for the $i$-th patient. For a sample of $n$ patients, the observed data are denoted by $\mathcal{D}_n = \{l_i, r_i, \boldsymbol{y}_i;\ i = 1, \ldots, n\}$.
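
As a small illustration of the interval-censoring bookkeeping, the sketch below returns $(l_i, r_i)$ from a patient's biopsy times. It assumes the list contains every biopsy of the patient and that, when GR is observed, it was found at the latest biopsy; `gr_interval` is a hypothetical helper name.

```python
import math

def gr_interval(biopsy_times, gr_observed):
    """Return (l_i, r_i) with l_i < T_i* <= r_i from a patient's biopsy times."""
    times = sorted(biopsy_times)
    if gr_observed:
        if len(times) < 2:
            raise ValueError("need the biopsy before the one that detected GR")
        return times[-2], times[-1]  # GR happened after the previous biopsy
    return times[-1], math.inf  # right-censored at the latest negative biopsy
```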

The longitudinal outcome of interest, the PSA level, is continuous; the joint model therefore uses a linear mixed effects model (LMM) of the form:
$$y_i(t) = m_i(t) + \varepsilon_i(t) = \boldsymbol{x}_i^\top(t)\boldsymbol{\beta} + \boldsymbol{z}_i^\top(t)\boldsymbol{b}_i + \varepsilon_i(t),$$
where $\boldsymbol{x}_i^\top(t)$ and $\boldsymbol{z}_i^\top(t)$ denote the row vectors of the design matrices for the fixed and random effects, respectively. The fixed and random effects are denoted by $\boldsymbol{\beta}$ and $\boldsymbol{b}_i$, respectively. The random effects are assumed to be normally distributed with mean zero and $q \times q$ covariance matrix $\boldsymbol{D}$. The true, unobserved, error-free PSA level at time $t$ is denoted by $m_i(t)$. The error $\varepsilon_i(t)$ is assumed to be t-distributed with three degrees of freedom and scale $\sigma$ (see Web Appendix C.1), and is independent of the random effects $\boldsymbol{b}_i$.
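
To make the longitudinal sub-model tangible, the following sketch simulates one patient's PSA trajectory from the LMM with t-distributed errors (3 degrees of freedom). It assumes, purely for illustration, a random intercept and slope with $\boldsymbol{x}_i(t) = \boldsymbol{z}_i(t) = (1, t)^\top$ and a diagonal $\boldsymbol{D}$; the spline design used for PRIAS in Section 5.1 is richer. The t variate is generated as a standard normal divided by $\sqrt{\chi^2_3/3}$ using only the standard library, and the function names are hypothetical.

```python
import math
import random

def student_t3(rng, scale):
    """Draw scale * t_3, using t_3 = Z / sqrt(chi2_3 / 3)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3))
    return scale * z / math.sqrt(chi2 / 3.0)

def simulate_lmm(times, beta, d_diag, sigma, rng):
    """Return (m_i(t), y_i(t)) for y_i(t) = x_i(t)'beta + z_i(t)'b_i + eps_i(t).

    Illustrative assumptions: x_i(t) = z_i(t) = (1, t) and D = diag(d_diag).
    """
    b = [rng.gauss(0.0, math.sqrt(d)) for d in d_diag]  # b_i ~ N(0, D)
    m = [beta[0] + b[0] + (beta[1] + b[1]) * t for t in times]  # error-free PSA
    y = [mt + student_t3(rng, sigma) for mt in m]  # observed PSA
    return m, y
```

The heavy-tailed t errors make occasional PSA spikes more plausible than under normal errors, which is one common motivation for this choice.
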
To model the effect of PSA on the hazard of GR, joint models utilize a relative risk sub-model. The hazard of GR for patient $i$ at any time point $t$, denoted by $h_i(t)$, depends on a function of the subject-specific linear predictor $m_i(t)$ and/or the random effects:
$$h_i\{t \mid \mathcal{M}_i(t), \boldsymbol{w}_i\} = h_0(t)\exp\big[\boldsymbol{\gamma}^\top\boldsymbol{w}_i + f\{\mathcal{M}_i(t), \boldsymbol{b}_i, \boldsymbol{\alpha}\}\big], \quad t > 0,$$
where $\mathcal{M}_i(t) = \{m_i(v), 0 \le v \le t\}$ denotes the history of the underlying PSA levels up to time $t$. The vector of baseline covariates is denoted by $\boldsymbol{w}_i$, and $\boldsymbol{\gamma}$ are the corresponding parameters. The function $f(\cdot)$, parametrized by the vector $\boldsymbol{\alpha}$, specifies the functional form of PSA levels (Brown, 2009; Rizopoulos, 2012; Taylor et al., 2013; Rizopoulos et al., 2014) that is used in the linear predictor of the relative risk model. Some functional forms relevant to the problem at hand are the following:
$$\begin{aligned} f\{\mathcal{M}_i(t), \boldsymbol{b}_i, \boldsymbol{\alpha}\} &= \alpha\, m_i(t),\\ f\{\mathcal{M}_i(t), \boldsymbol{b}_i, \boldsymbol{\alpha}\} &= \alpha_1 m_i(t) + \alpha_2 m_i'(t), \qquad m_i'(t) = \frac{\mathrm{d}m_i(t)}{\mathrm{d}t}. \end{aligned}$$

These formulations of $f(\cdot)$ postulate that the hazard of GR at time $t$ may be associated with the underlying level $m_i(t)$ of PSA at $t$, or with both the level and the velocity $m_i'(t)$ of PSA at $t$. Lastly, $h_0(t)$ is the baseline hazard at time $t$, and is modeled flexibly using P-splines. The detailed specification of the baseline hazard and parameter estimation using the Bayesian approach are presented in Web Appendix A of the supplementary material.
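
The relative risk sub-model with the level-plus-velocity functional form can be sketched as follows. The hazard multiplies a baseline $h_0(t)$ by $\exp\{\boldsymbol{\gamma}^\top\boldsymbol{w}_i + \alpha_1 m_i(t) + \alpha_2 m_i'(t)\}$, and the survival function is computed as $\exp\{-\int_0^t h_i(v)\,\mathrm{d}v\}$ with a trapezoidal rule. Here $h_0$, $m_i$, and $m_i'$ are arbitrary user-supplied callables (the paper models $h_0$ with P-splines, which this sketch does not), and `gamma_w` stands for the precomputed scalar $\boldsymbol{\gamma}^\top\boldsymbol{w}_i$; all names are hypothetical.

```python
import math

def hazard(t, h0, m, dm, gamma_w, alpha1, alpha2):
    """h_i(t) = h0(t) * exp{gamma'w_i + alpha1 m_i(t) + alpha2 m_i'(t)}."""
    return h0(t) * math.exp(gamma_w + alpha1 * m(t) + alpha2 * dm(t))

def survival(t, haz, n=2000):
    """S_i(t) = exp{-integral_0^t h_i(v) dv}, trapezoidal rule with n panels."""
    if t <= 0:
        return 1.0
    step = t / n
    area = 0.5 * (haz(0.0) + haz(t)) + sum(haz(k * step) for k in range(1, n))
    return math.exp(-area * step)
```

With a constant hazard of 0.1 per year, `survival(5.0, h)` reproduces $e^{-0.5}$, a quick sanity check of the quadrature.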

3 Personalized Schedules for Repeat Biopsies

We intend to use the joint model fitted to $\mathcal{D}_n$ to create personalized schedules of biopsies. To this end, assume that a schedule is to be created for a new patient $j$ who is not present in $\mathcal{D}_n$. Let $t$ be the time of his latest biopsy, and let $\mathcal{Y}_j(s)$ denote his historical PSA measurements up to time $s$. The goal is to find the optimal time $u > t$ of the next biopsy.

3.1 Posterior Predictive Distribution for Time to GR

The information from $\mathcal{Y}_j(s)$ and the repeat biopsies is manifested by the posterior predictive distribution $g(T_j^*)$, given by (baseline covariates $\boldsymbol{w}_j$ are not shown for brevity hereafter):
$$\begin{aligned} g(T_j^*) &= p\{T_j^* \mid T_j^* > t, \mathcal{Y}_j(s), \mathcal{D}_n\}\\ &= \int\!\!\int p(T_j^* \mid T_j^* > t, \boldsymbol{b}_j, \boldsymbol{\theta})\, p\{\boldsymbol{b}_j \mid T_j^* > t, \mathcal{Y}_j(s), \boldsymbol{\theta}\}\, p(\boldsymbol{\theta} \mid \mathcal{D}_n)\, \mathrm{d}\boldsymbol{b}_j\, \mathrm{d}\boldsymbol{\theta}. \end{aligned}$$

The distribution $g(T_j^*)$ depends on $\mathcal{Y}_j(s)$ and $\mathcal{D}_n$ via the posterior distribution of the random effects $\boldsymbol{b}_j$ and the posterior distribution of the vector of all parameters $\boldsymbol{\theta}$, respectively.
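
In practice, the double integral over $\boldsymbol{b}_j$ and $\boldsymbol{\theta}$ is typically approximated by Monte Carlo. A minimal sketch, assuming posterior draws of $(\boldsymbol{\theta}, \boldsymbol{b}_j)$ given $T_j^* > t$ and $\mathcal{Y}_j(s)$ are already available (for example from an MCMC sampler) and have been turned into conditional survival functions $S_j(\cdot \mid \boldsymbol{b}_j, \boldsymbol{\theta})$: the dynamic survival probability $\pi_j(u \mid t, s) = \Pr\{T_j^* \ge u \mid T_j^* > t, \mathcal{Y}_j(s), \mathcal{D}_n\}$, which summarizes $g(T_j^*)$, is then the average of $S_j(u)/S_j(t)$ over the draws. This is a generic estimator, not necessarily the exact JMbayes implementation, and the names are hypothetical.

```python
import math

def dynamic_survival(u, t, conditional_survivals):
    """Monte Carlo estimate of pi_j(u | t, s) = Pr{T_j* >= u | T_j* > t, Y_j(s), D_n}.

    Each element is a callable S(.) = S_j(. | b_j, theta) for one posterior draw
    of (theta, b_j); obtaining these draws is assumed to happen elsewhere.
    """
    if u < t:
        raise ValueError("u must be >= t")
    ratios = [S(u) / S(t) for S in conditional_survivals]
    return sum(ratios) / len(ratios)

# Illustrative draws: two exponential conditional survival curves (hypothetical rates).
EXAMPLE_DRAWS = [lambda v, r=r: math.exp(-r * v) for r in (0.1, 0.2)]
```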

3.2 Loss Functions

To find the time $u$ of the next biopsy, we use principles from statistical decision theory in a Bayesian setting (Berger, 1985; Robert, 2007). More specifically, we propose to choose $u$ by minimizing the posterior expected loss $E_g\{L(T_j^*, u)\}$, where the expectation is taken with respect to $g(T_j^*)$. The posterior expected loss is given by:
$$E_g\{L(T_j^*, u)\} = \int_t^\infty L(T_j^*, u)\, p\{T_j^* \mid T_j^* > t, \mathcal{Y}_j(s), \mathcal{D}_n\}\, \mathrm{d}T_j^*.$$

Various loss functions $L(T_j^*, u)$ have been proposed in the literature (Robert, 2007). The ones we utilize, and the corresponding motivations, are presented next.

Given the burden of biopsies, ideally a single biopsy performed exactly at the time of GR would suffice. Hence, neither a time that overshoots the true GR time $T_j^*$ nor a time that undershoots it is preferred. In this regard, the squared loss function $L(T_j^*, u) = (T_j^* - u)^2$ and the absolute loss function $L(T_j^*, u) = |T_j^* - u|$ have the property that they penalize deviations on both sides of $T_j^*$ symmetrically. Second, both loss functions have well-known solutions. The posterior expected loss for the squared loss function is given by:
$$E_g\{(T_j^* - u)^2\} = E_g\{(T_j^*)^2\} + u^2 - 2u\,E_g(T_j^*). \tag{1}$$
The posterior expected loss in (1) attains its minimum at $u = E_g(T_j^*)$, that is, the expected time of GR. The posterior expected loss for the absolute loss function is given by:
$$E_g(|T_j^* - u|) = \int_u^\infty (T_j^* - u)\, g(T_j^*)\, \mathrm{d}T_j^* + \int_t^u (u - T_j^*)\, g(T_j^*)\, \mathrm{d}T_j^*. \tag{2}$$
The posterior expected loss in (2) attains its minimum at $u = \mathrm{median}_g(T_j^*)$, that is, the median time of GR. It can also be expressed as $\mathrm{median}_g(T_j^*) = \pi_j^{-1}(0.5 \mid t, s)$, where $\pi_j^{-1}(\cdot \mid t, s)$ is the inverse of the dynamic survival probability $\pi_j(u \mid t, s)$ of patient $j$ (Rizopoulos, 2011), given by:
$$\pi_j(u \mid t, s) = \Pr\{T_j^* \ge u \mid T_j^* > t, \mathcal{Y}_j(s), \mathcal{D}_n\}, \quad u \ge t.$$
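
Both the median time of GR and risk-based proposals require inverting $\pi_j(\cdot \mid t, s)$. Because a survival curve is non-increasing, bisection is a simple and robust way to do this. The sketch assumes $\pi_j$ is available as a callable (for instance from a Monte Carlo estimate) and caps the search at a hypothetical horizon `upper` of 50 years; `exponential_pi` is an illustrative curve, not a fitted one.

```python
import math

def inverse_dynamic_survival(pi, kappa, t, upper=50.0, tol=1e-8):
    """Solve pi(u) = kappa for u in [t, upper] by bisection.

    Assumes pi is non-increasing with pi(t) = 1; returns `upper` if the curve
    has not dropped to kappa by then (a hypothetical horizon cap).
    """
    if not 0.0 < kappa < 1.0:
        raise ValueError("kappa must lie in (0, 1)")
    lo, hi = t, upper
    if pi(hi) > kappa:
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pi(mid) > kappa:
            lo = mid  # still above kappa: the crossing is later
        else:
            hi = mid
    return 0.5 * (lo + hi)

def exponential_pi(rate, t):
    """Illustrative dynamic survival curve pi(u | t) = exp{-rate (u - t)}."""
    return lambda u: math.exp(-rate * (u - t))
```

The median time of GR corresponds to `kappa=0.5`; for the exponential curve with rate 0.2 and $t = 1$ it is $1 + \ln 2 / 0.2 \approx 4.47$ years.
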
Even though $E_g(T_j^*)$ or $\mathrm{median}_g(T_j^*)$ may be obvious choices from a statistical perspective, from the viewpoint of doctors or patients it could be more intuitive to decide on the next biopsy by placing a cutoff $1 - \kappa$, where $0 \le \kappa \le 1$, on the dynamic incidence/risk of GR, $1 - \pi_j(u \mid t, s)$. This approach would be successful if $1 - \pi_j(u \mid t, s)$ differentiates sufficiently well between patients who will obtain GR in a given period of time and those who will not. It is also useful when patients are apprehensive about delaying biopsies beyond a certain risk cutoff. Thus, a biopsy can be scheduled at the time point $u$ beyond which the dynamic risk of GR exceeds the threshold $1 - \kappa$. To this end, the posterior expected loss for the following multilinear loss function can be minimized to find the optimal $u$:
$$L_k(T_j^*, u) = \begin{cases} k_1 (T_j^* - u), & \text{if } T_j^* > u,\\ k_2 (u - T_j^*), & \text{otherwise,} \end{cases}$$
where $k_1 > 0$ and $k_2 > 0$ are constants parameterizing the loss function. Setting the derivative of $E_g\{L_k(T_j^*, u)\}$ with respect to $u$, namely $-k_1 \pi_j(u \mid t, s) + k_2\{1 - \pi_j(u \mid t, s)\}$, to zero shows that the posterior expected loss attains its minimum at $u = \pi_j^{-1}(\kappa \mid t, s)$ with $\kappa = k_2/(k_1 + k_2)$ (Robert, 2007). The choice of the two constants $k_1$ and $k_2$ is therefore equivalent to the choice of $\kappa$.
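
The minimizers stated for the three loss functions can be checked numerically. The sketch replaces $g(T_j^*)$ by a deterministic set of draws (midpoint quantiles of an exponential distribution, an arbitrary choice for illustration) and minimizes each empirical posterior expected loss by ternary search, which is valid because all three losses are convex in $u$. Writing the multilinear loss as $k_1(T_j^* - u)$ for $T_j^* > u$ and $k_2(u - T_j^*)$ otherwise, the minimizer should satisfy $\pi_j(u \mid t, s) = k_2/(k_1 + k_2)$; with $k_1 = 1$ and $k_2 = 19$, this is the time at which the dynamic risk of GR reaches 5%. Function names are hypothetical.

```python
import math

def expected_loss(u, draws, loss):
    """Monte Carlo estimate of E_g{L(T*, u)} from draws of g(T*)."""
    return sum(loss(T, u) for T in draws) / len(draws)

def argmin_convex(f, lo, hi, iters=100):
    """Ternary search for the minimizer of a convex function on [lo, hi]."""
    for _ in range(iters):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if f(a) < f(b):
            hi = b
        else:
            lo = a
    return 0.5 * (lo + hi)

def squared(T, u):
    return (T - u) ** 2

def absolute(T, u):
    return abs(T - u)

def multilinear(k1, k2):
    """k1 weighs biopsies before GR (T* > u), k2 biopsies after GR (T* <= u)."""
    return lambda T, u: k1 * (T - u) if T > u else k2 * (u - T)

def exponential_draws(rate, t=0.0, n=1000):
    """Deterministic stand-in for posterior draws: midpoint quantiles of t + Exp(rate)."""
    return [t - math.log(1 - (i + 0.5) / n) / rate for i in range(n)]
```

With rate 0.25 (mean 4 years), the squared, absolute, and multilinear ($k_1 = 1$, $k_2 = 19$) losses should be minimized close to 4, $4\ln 2 \approx 2.77$, and $-4\ln 0.95 \approx 0.21$ years, respectively.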

In practice, for some patients we may not have sufficient information to accurately estimate their PSA profile. The resulting high variance of $g(T_j^*)$ could lead to a mean (or median) time of GR that overshoots the true $T_j^*$ by a large margin. In such cases, the approach based on the dynamic risk of GR with smaller risk thresholds is more risk-averse and thus could be more robust to large overshooting margins. This consideration leads us to a hybrid approach, namely, to select $u$ using the dynamic risk of GR-based approach when the spread of $g(T_j^*)$ is large, and using $E_g(T_j^*)$ or $\mathrm{median}_g(T_j^*)$ when the spread of $g(T_j^*)$ is small. What constitutes a large spread is application-specific. In PRIAS, within the first 10 years, the maximum possible delay in the detection of GR is 3 years. Thus, we propose that if the difference between the 0.025 quantile of $g(T_j^*)$ and $E_g(T_j^*)$ or $\mathrm{median}_g(T_j^*)$ is more than 3 years, then proposals based on the dynamic risk of GR be used instead.
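
A sketch of the hybrid rule, operating on draws from $g(T_j^*)$. It uses $E_g(T_j^*)$ (the median could be substituted) unless that exceeds the 0.025 quantile by more than 3 years, in which case it returns the risk-based time $\pi_j^{-1}(\kappa \mid t, s)$, i.e., the $(1 - \kappa)$ quantile of $g(T_j^*)$. Quantiles come from `statistics.quantiles` with 999 cut points, so $\kappa$ is effectively rounded to three decimals; the function name is hypothetical.

```python
import statistics

def hybrid_biopsy_time(draws, kappa, max_spread=3.0):
    """Return (u, rule) for the hybrid rule applied to draws from g(T*)."""
    mean_time = statistics.fmean(draws)
    # cuts[k - 1] is the k/1000 quantile of the draws
    cuts = statistics.quantiles(draws, n=1000, method="inclusive")
    q025 = cuts[24]
    if mean_time - q025 > max_spread:
        # Large spread: use pi^{-1}(kappa), the (1 - kappa) quantile of g(T*)
        k = min(max(round((1 - kappa) * 1000), 1), 999)
        return cuts[k - 1], "risk"
    return mean_time, "mean"
```

The fallback makes the proposal earlier precisely when the predictive distribution is too diffuse for its mean to be trusted.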

3.3 Estimation

Since there is no closed-form solution available for $E_g(T_j^*)$, for its estimation we utilize the following relationship between $E_g(T_j^*)$ and $\pi_j(u \mid t, s)$:
$$E_g(T_j^*) = t + \int_t^\infty \pi_j(u \mid t, s)\, \mathrm{d}u. \tag{3}$$
However, as mentioned earlier, selecting the optimal biopsy time based on $E_g(T_j^*)$ alone will not be practically useful when the variance $\mathrm{var}_g(T_j^*)$ is large. This variance is given by:
$$\mathrm{var}_g(T_j^*) = 2\int_t^\infty (u - t)\, \pi_j(u \mid t, s)\, \mathrm{d}u - \Big\{\int_t^\infty \pi_j(u \mid t, s)\, \mathrm{d}u\Big\}^2. \tag{4}$$

Since there is no closed-form solution for the integrals in (3) and (4), we approximate them using Gauss-Kronrod quadrature (see Web Appendix B). The variance depends both on the time of the last biopsy $t$ and on the PSA history $\mathcal{Y}_j(s)$, as demonstrated in Section 5.2.
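
Equations (3) and (4) can be evaluated with any one-dimensional quadrature. The paper uses Gauss-Kronrod; the sketch below swaps in composite Simpson's rule from plain Python and truncates the integrals at a user-chosen `horizon`, both assumptions made for self-containment. For an exponential curve $\pi_j(u \mid t, s) = e^{-\lambda(u - t)}$ the exact answers are $t + 1/\lambda$ and $1/\lambda^2$, which gives a convenient check. Function names are hypothetical.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b]; n is rounded up to an even number of panels."""
    n += n % 2
    h = (b - a) / n
    total = f(a) + f(b)
    total += 4 * sum(f(a + (2 * k - 1) * h) for k in range(1, n // 2 + 1))
    total += 2 * sum(f(a + 2 * k * h) for k in range(1, n // 2))
    return total * h / 3

def mean_and_var_gr(pi, t, horizon):
    """Equations (3) and (4), with both integrals truncated at t + horizon."""
    i1 = simpson(pi, t, t + horizon)                         # int pi(u) du
    i2 = simpson(lambda u: (u - t) * pi(u), t, t + horizon)  # int (u - t) pi(u) du
    return t + i1, 2 * i2 - i1 ** 2

def exponential_pi(rate, t):
    """Illustrative curve pi(u | t) = exp{-rate (u - t)}; mean t + 1/rate, var 1/rate^2."""
    return lambda u: math.exp(-rate * (u - t))
```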

For schedules based on the dynamic risk of GR, the choice of the threshold $\kappa$ has important consequences because it dictates the timing of biopsies. It may often depend on the amount of risk that is acceptable to the patient (e.g., if the maximum acceptable risk is 5%, then $\kappa = 0.95$). When $\kappa$ cannot be chosen on the basis of the patient's input, we propose to automate its choice. More specifically, given the time $t$ of the latest biopsy, we propose to choose the $\kappa$ that maximizes a binary classification accuracy measure (López-Ratón et al., 2014) discriminating between cases (patients who experience GR) and controls. In joint models, a patient $j$ is predicted to be a case in the time window $\Delta t$ if $\pi_j(t + \Delta t \mid t, s) \le \kappa$, or a control if $\pi_j(t + \Delta t \mid t, s) > \kappa$ (Rizopoulos, 2016; Rizopoulos et al., 2017). We choose $\Delta t$ to be 1 year, because in AS programs it is of interest, at any point in time, to identify and give extra attention to patients who may obtain GR in the next year. As the binary classification accuracy measure, we chose the $F_1$ score, since it is in line with our goal of focusing on potential cases in the time window $\Delta t$. The $F_1$ score combines sensitivity and positive predictive value (PPV) and is defined as:
$$F_1(t, \Delta t, s, \kappa) = 2\,\frac{\mathrm{TPR}(t, \Delta t, s, \kappa)\,\mathrm{PPV}(t, \Delta t, s, \kappa)}{\mathrm{TPR}(t, \Delta t, s, \kappa) + \mathrm{PPV}(t, \Delta t, s, \kappa)},$$
where $\mathrm{TPR}(\cdot)$ and $\mathrm{PPV}(\cdot)$ denote the time-dependent true positive rate (sensitivity) and positive predictive value (precision), respectively. The estimation of both is similar to the estimation of the time-dependent $\mathrm{AUC}(t, \Delta t, s)$ given by Rizopoulos et al. (2017). Since a high $F_1$ score is desired, the corresponding value of $\kappa$ is $\arg\max_{\kappa} F_1(t, \Delta t, s, \kappa)$. We compute the latter using a grid search: first, the $F_1$ score is computed on the available dataset over a fine grid of $\kappa$ values between 0 and 1, and then the $\kappa$ corresponding to the highest $F_1$ score is chosen. In this article, we use $\kappa$ chosen only on the basis of the $F_1$ score.
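
The grid search for $\kappa$ can be sketched as below. It assumes, for simplicity, that each patient's case status in the window $(t, t + \Delta t]$ is fully observed, so TPR and PPV are plain proportions; the estimators of Rizopoulos et al. (2017) additionally handle censoring, which this sketch does not. Inputs are the predicted values $\pi_j(t + \Delta t \mid t, s)$, and the function names are hypothetical.

```python
def f1_score(pi_window, is_case, kappa):
    """F1 when patient j is predicted a case iff pi_j(t + dt | t, s) <= kappa."""
    tp = fp = fn = 0
    for p, case in zip(pi_window, is_case):
        predicted = p <= kappa
        if predicted and case:
            tp += 1
        elif predicted:
            fp += 1
        elif case:
            fn += 1
    if tp == 0:
        return 0.0
    tpr = tp / (tp + fn)  # sensitivity
    ppv = tp / (tp + fp)  # precision
    return 2 * tpr * ppv / (tpr + ppv)

def choose_kappa(pi_window, is_case, grid_size=1000):
    """Grid search over kappa in (0, 1); ties go to the smallest kappa."""
    grid = [i / grid_size for i in range(1, grid_size)]
    return max(grid, key=lambda k: f1_score(pi_window, is_case, k))
```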

3.4 Algorithm

When a biopsy scheduled at time $u$ turns out negative ($u < T_j^*$), GR is not detected at $u$ and at least one more biopsy is required, at an optimal time $u_{\text{new}} > u$. This process is repeated until GR is detected. To aid medical decision making, we elucidate this process via an algorithm in Figure 1. AS programs strongly advise a gap of at least 1 year between two biopsies. Thus, when $u - t < 1$, the algorithm postpones $u$ to $t + 1$, because it is the time nearest to $u$ at which the 1-year gap condition is satisfied.

Figure 1: Algorithm for creating a personalized schedule for patient $j$. The time of the latest biopsy is denoted by $t$. The time of the latest available PSA measurement is denoted by $s$. The proposed personalized time of biopsy is denoted by $u$. The time at which a repeat biopsy was proposed on the last visit to the hospital is denoted by $u_p$. The time of the next visit for the measurement of PSA is denoted by $s_{\text{next}}$. This figure appears in color in the electronic version of this article.
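
The repeat-biopsy loop and the 1-year gap rule can be sketched as a simulation harness (in practice $T_j^*$ is unknown; it is supplied here only to decide when the loop stops). `propose(t)` is a hypothetical stand-in for the joint-model proposal after a negative biopsy at $t$, for example $\pi_j^{-1}(\kappa \mid t, s)$. The sketch ignores PSA visits, so $s$ and the previously proposed time in Figure 1 do not appear, and it assumes GR is detected by the first biopsy at or after $T_j^*$.

```python
def next_biopsy_time(t, proposed_u, min_gap=1.0):
    """Postpone u to t + min_gap when it violates the minimum gap after t."""
    return t + min_gap if proposed_u - t < min_gap else proposed_u

def run_schedule(true_gr_time, propose, max_biopsies=50):
    """Biopsy repeatedly until GR is detected; return the biopsy times.

    `propose(t)` is a hypothetical stand-in for the joint-model proposal made
    after a negative biopsy at time t (t = 0 at the start of AS).
    """
    t, biopsies = 0.0, []
    for _ in range(max_biopsies):
        u = next_biopsy_time(t, propose(t))
        biopsies.append(u)
        if u >= true_gr_time:  # assumed: a biopsy at or after T* detects GR
            return biopsies
        t = u  # negative biopsy: re-plan conditional on T* > u
    return biopsies
```

For instance, a proposal rule that always suggests a biopsy 0.4 years after the last one is pushed back to yearly biopsies by the gap rule.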

4 Evaluation of Schedules

In order to compare schedules of biopsies, we require measures of their efficacy. We propose two measures, namely the number of biopsies (burden) $N_j^S$ that a schedule $S$ conducts for the $j$-th patient to detect GR, and the offset $O_j^S$ by which it overshoots $T_j^*$. The offset is defined as $O_j^S = T_{jN_j^S}^S - T_j^*$, where $T_{jN_j^S}^S$ is the time at which GR is detected. Our interest lies in the joint distribution $p(N_j^S, O_j^S)$ of the number of biopsies and the offset. The least burdensome scenario is $N_j^S = 1$ and $O_j^S = 0$. Realistically, we should therefore select a schedule with a low mean number of biopsies $E(N_j^S)$ as well as a low mean offset $E(O_j^S)$. It is also desirable that a schedule has a low variance for both the number of biopsies, $\mathrm{var}(N_j^S)$, and the offset, $\mathrm{var}(O_j^S)$, so that the schedule works similarly for most patients.
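
Under the assumption that GR is detected by the first biopsy at or after $T_j^*$, the burden $N_j^S$ and offset $O_j^S$ of a fixed schedule, and their empirical means and variances over simulated patients, can be computed as follows. The schedule is a list of biopsy times after inclusion (an initial biopsy at time 0 is not counted here), and the helper names are hypothetical.

```python
from statistics import fmean, pvariance

def burden_and_offset(schedule, true_gr_time):
    """Return (N_j^S, O_j^S): biopsies until GR is detected, and the detection delay."""
    for n, time in enumerate(sorted(schedule), start=1):
        if time >= true_gr_time:  # first biopsy at or after T* detects GR
            return n, time - true_gr_time
    raise ValueError("schedule ends before GR; extend its horizon")

def summarize(schedule, gr_times):
    """Empirical mean and variance of N and O over simulated GR times."""
    pairs = [burden_and_offset(schedule, g) for g in gr_times]
    n = [p[0] for p in pairs]
    o = [p[1] for p in pairs]
    return {"E_N": fmean(n), "var_N": pvariance(n), "E_O": fmean(o), "var_O": pvariance(o)}
```

For a patient with $T_j^* = 3.2$ years, an annual schedule needs 4 biopsies and a PRIAS-like schedule (years 1, 4, 7, ...) needs 2, both with an offset of 0.8 years.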

4.1 Choosing a Schedule

Given multiple schedules of biopsies, it is of clinical interest to choose a suitable one. Using principles from compound optimal designs (Läuter, 1976), we propose to choose the schedule $S$ that minimizes a loss function of the following form:
$$L(S) = \sum_{r=1}^{R} \epsilon_r\, \mathcal{C}_r\{p(N_j^S, O_j^S)\}, \tag{5}$$
where $\mathcal{C}_r\{\cdot\}$ is a function of either $p(N_j^S)$ or $p(O_j^S)$ (for brevity, only the joint distribution $p(N_j^S, O_j^S)$ is used in the equation above). Some examples of $\mathcal{C}_r$ are the mean, median, variance, and quantile function. The constants $\epsilon_1, \ldots, \epsilon_R$, where $0 \le \epsilon_r \le 1$ and $\sum_{r=1}^{R} \epsilon_r = 1$, are weights that differentially weigh the contribution of each of the $R$ criteria. An example loss function is:
$$L(S) = \epsilon_1 E(O_j^S) + \epsilon_2 E(N_j^S). \tag{6}$$

The choice of $\epsilon_1$ and $\epsilon_2$ is not easy, because the burden of a biopsy cannot easily be compared to a unit increase in offset. To obviate this problem, we utilize the equivalence between compound and constrained optimal designs (Cook and Wong, 1994). More specifically, it can be shown that for any $\epsilon_1$ and $\epsilon_2$ there exists a constant $C > 0$ for which minimization of the loss function in (6) is equivalent to minimization of $E(O_j^S)$ subject to the constraint $E(N_j^S) \le C$. That is, among the schedules that conduct at most $C$ biopsies on average, the one that detects GR earliest should be chosen. The choice of $C$ could be based on the number of biopsies a patient is willing to undergo. In the more generic case in (5), a schedule can be chosen by minimizing $\mathcal{C}_1\{p(N_j^S, O_j^S)\}$ under the constraints $\mathcal{C}_r\{p(N_j^S, O_j^S)\} \le C_r$, $r = 2, \ldots, R$.
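
The constrained form of the selection rule reduces to a filter followed by a minimum. In the sketch, each schedule is summarized by its $E(N_j^S)$ and $E(O_j^S)$ (for instance from simulation). The schedule names `S1` to `S3` and their summary numbers are made up purely to illustrate the mechanics and are not results from this study.

```python
def choose_schedule(summaries, max_mean_biopsies):
    """Pick the schedule with the smallest E(O) among those with E(N) <= C."""
    feasible = {name: s for name, s in summaries.items() if s["E_N"] <= max_mean_biopsies}
    if not feasible:
        raise ValueError("no schedule satisfies E(N) <= C")
    return min(feasible, key=lambda name: feasible[name]["E_O"])

# Made-up summaries, only to show the mechanics (not results of this study).
EXAMPLE = {
    "S1": {"E_N": 5.0, "E_O": 0.5},
    "S2": {"E_N": 3.0, "E_O": 1.2},
    "S3": {"E_N": 2.5, "E_O": 0.9},
}
```

With $C = 3$, only `S2` and `S3` are feasible and `S3` wins on mean offset; relaxing to $C = 6$ admits `S1`, which then has the smallest offset.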

5 Demonstration of Personalized Schedules

To demonstrate the personalized schedules, we apply them to patients enrolled in the PRIAS study. To this end, we divide the PRIAS dataset into a training part (5264 patients) and a demonstration part (three patients). We fit a joint model to the training dataset and then use it to create schedules for the demonstration patients. We fit the joint model using the R package JMbayes (Rizopoulos, 2016), which uses a Bayesian approach for parameter estimation.

5.1 Fitting the Joint Model to the PRIAS Dataset

For each PRIAS patient, we know his age at the time of inclusion in AS, his PSA history, and the time interval in which GR is detected. For the longitudinal analysis of PSA, we use $\log_2(\mathrm{PSA} + 1)$ measurements instead of the raw data (Pearson et al., 1994; Lin et al., 2000). The longitudinal sub-model of the joint model we fit is given by:
$$\begin{aligned} \log_2(\mathrm{PSA} + 1)_i(t) ={}& \beta_0 + \beta_1(\mathrm{Age}_i - 70) + \beta_2(\mathrm{Age}_i - 70)^2 + \sum_{k=1}^{4}\beta_{k+2}B_k(t, \mathcal{K})\\ &+ b_{i0} + b_{i1}B_1(t, 0.1) + b_{i2}B_2(t, 0.1) + \varepsilon_i(t), \end{aligned} \tag{7}$$
where $B_k(t, \mathcal{K})$ denotes the $k$-th basis function of a B-spline with a set $\mathcal{K}$ of three internal knots, and boundary knots at 0 and 7 years (the 0.99 quantile of the observed follow-up times). The spline for the random effects, with basis functions $B_k(t, 0.1)$, consists of one internal knot at 0.1 years and boundary knots at 0 and 7 years. For the relative risk sub-model, the hazard function we fit is given by:
$$h_i(t) = h_0(t)\exp\{\gamma_1(\mathrm{Age}_i - 70) + \gamma_2(\mathrm{Age}_i - 70)^2 + \alpha_1 m_i(t) + \alpha_2 m_i'(t)\}, \tag{8}$$
where $\alpha_1$ and $\alpha_2$ measure the strength of the association between the hazard of GR and the $\log_2(\mathrm{PSA} + 1)$ value $m_i(t)$ and velocity $m_i'(t)$, respectively.

From the fitted joint model, we found that the $\log_2(\mathrm{PSA} + 1)$ velocity and the age at the time of inclusion in AS were significantly associated with the hazard of GR. For any patient, an increase in $\log_2(\mathrm{PSA} + 1)$ velocity from −0.06 to 0.14 (the first and third quartiles of the fitted velocities, respectively) corresponds to a 2.05-fold increase in the hazard of GR. In terms of predictive performance, the areas under the time-dependent receiver operating characteristic curves (Rizopoulos et al., 2017) were 0.61, 0.65, and 0.59 at years 1, 2, and 3 of follow-up, respectively. Parameter estimates are presented in detail in Web Appendix C.
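
As a worked example of how the reported hazard ratio relates to model (8): holding age and the current value $m_i(t)$ fixed, a change in velocity from $-0.06$ to $0.14$ multiplies the hazard by $\exp(0.2\,\alpha_2)$. Back-calculating $\alpha_2$ from the reported 2.05-fold increase gives the value below; this is arithmetic on the stated numbers under that linear-velocity assumption, not the published estimate, which is in Web Appendix C.

```latex
\frac{h_i\{t \mid m_i'(t) = 0.14\}}{h_i\{t \mid m_i'(t) = -0.06\}}
  = \exp\{\alpha_2\,(0.14 - (-0.06))\}
  = \exp(0.2\,\alpha_2) = 2.05
  \;\Longrightarrow\;
  \alpha_2 = \frac{\ln 2.05}{0.2} \approx \frac{0.718}{0.2} \approx 3.59
```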

In PRIAS, the interval $l_i < T_i^* \le r_i$ in which GR is detected depends on the PSA-DT of the patient. However, because the parameters are estimated using a full likelihood approach (Tsiatis and Davidian, 2004), the joint model gives valid estimates of all parameters, provided that the model is correctly specified (see Web Appendices A.2 and C.3). To this end, we performed several sensitivity analyses (e.g., changing the positions of the knots) to investigate the fit of the model and the robustness of the results. In all of our attempts, the same conclusion was reached, namely that the velocity of the longitudinal outcome is more strongly associated with the hazard of GR than its value.

5.2 Personalized Schedules for the First Demonstration Patient

We now demonstrate the functioning of the personalized schedules for the first demonstration patient (see Web Appendix D for the other two demonstration patients). The fitted and observed $\log_2(\mathrm{PSA} + 1)$ profile, the time of the latest biopsy, and the proposed biopsy times $u$ for him are shown in the top panel of Figure 2. With a consistently decreasing PSA and a negative repeat biopsy between years 3 and 4.5, the proposed time of biopsy based on the dynamic risk of GR increased from 3.05 years (urn:x-wiley:15410420:media:biom12940:biom12940-math-0171) to 14.73 years (urn:x-wiley:15410420:media:biom12940:biom12940-math-0172) in this period. The proposed time of biopsy based on the expected time of GR also increased, from 14.40 to 15.97 years. The bottom panel of Figure 2 shows that the standard deviation $\mathrm{SD}_g(T_j^*)$ decreases sharply after each negative repeat biopsy. Thus, if the expected time of GR-based approach is used, the offset $O_j^S$ will be smaller on average for biopsies scheduled after the second repeat biopsy than for those scheduled after the first repeat biopsy.

Figure 2. Top panel: fitted versus observed urn:x-wiley:15410420:media:biom12940:biom12940-math-0175 profile, history of repeat biopsies, and corresponding personalized schedules for the first demonstration patient. Bottom panel: history of repeat biopsies and standard deviation urn:x-wiley:15410420:media:biom12940:biom12940-math-0176 of the posterior predictive distribution of time of GR over time for the first demonstration patient.

6 Simulation Study

In Section 5.2, we demonstrated that personalized schedules plan future biopsies according to the historical data of each patient. However, we could not perform a full-scale comparison between personalized and PRIAS schedules, because the true time of GR was not known for the PRIAS patients. We therefore conducted a simulation study comparing personalized schedules with the PRIAS and annual schedules; its details are presented next.

6.1 Simulation Setup

The population of AS patients in this simulation study is assumed to have the same entry criteria as PRIAS. The PSA levels and hazard of GR for these patients follow a joint model of the form postulated in Section 5.1, with the only change that urn:x-wiley:15410420:media:biom12940:biom12940-math-0177 levels are used as the outcome. The population joint model parameters are set equal to the posterior means of the parameters estimated from the corresponding joint model fitted to the PRIAS dataset. We intend to test the efficacy of different schedules in a population that contains patients with both faster- and slowly progressing PCa. This rate of progression manifests not only in the PSA profiles but also in the baseline hazard. We assume that there are three equal-sized subgroups urn:x-wiley:15410420:media:biom12940:biom12940-math-0178, urn:x-wiley:15410420:media:biom12940:biom12940-math-0179, and urn:x-wiley:15410420:media:biom12940:biom12940-math-0180 of patients in the population, each with a baseline hazard from a Weibull distribution with the following shape and scale parameters urn:x-wiley:15410420:media:biom12940:biom12940-math-0181): urn:x-wiley:15410420:media:biom12940:biom12940-math-0182, urn:x-wiley:15410420:media:biom12940:biom12940-math-0183, and urn:x-wiley:15410420:media:biom12940:biom12940-math-0184 for urn:x-wiley:15410420:media:biom12940:biom12940-math-0185, and urn:x-wiley:15410420:media:biom12940:biom12940-math-0186, respectively. As a result, the mean GR time is lowest in urn:x-wiley:15410420:media:biom12940:biom12940-math-0187 (fast PCa progression) and highest in urn:x-wiley:15410420:media:biom12940:biom12940-math-0188 (slow PCa progression).
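The subgroup structure can be illustrated with a short Python sketch. It covers only the Weibull baseline part: it ignores the association with PSA in the joint model, and the (shape, scale) pairs are hypothetical placeholders, not the values used in the study, chosen so that the mean event time increases from the fast to the slow subgroup. For a Weibull baseline with shape k and scale λ, the hazard is h0(t) = (k/λ)(t/λ)^(k−1) and the mean event time is λΓ(1 + 1/k).

```python
import math

import numpy as np

# Hypothetical (shape k, scale lambda) pairs in years; NOT the study's values.
SUBGROUPS = {
    "fast": (1.5, 4.0),
    "medium": (2.5, 6.0),
    "slow": (3.5, 8.0),
}


def weibull_baseline_hazard(t, k, lam):
    """Weibull baseline hazard h0(t) = (k / lam) * (t / lam) ** (k - 1)."""
    return (k / lam) * (np.asarray(t, dtype=float) / lam) ** (k - 1)


def weibull_mean(k, lam):
    """Mean event time of a Weibull(k, lam) distribution: lam * Gamma(1 + 1/k)."""
    return lam * math.gamma(1.0 + 1.0 / k)


def sample_baseline_event_times(n_per_group, rng):
    """Draw event times from each subgroup's Weibull baseline (PSA effect ignored)."""
    # numpy's Generator.weibull(k) draws with scale 1, so rescale by lam.
    return {g: lam * rng.weibull(k, size=n_per_group) for g, (k, lam) in SUBGROUPS.items()}


rng = np.random.default_rng(2018)
samples = sample_baseline_event_times(100_000, rng)
for g, (k, lam) in SUBGROUPS.items():
    print(g, round(weibull_mean(k, lam), 2), round(float(samples[g].mean()), 2))
```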

From this population, we sample 500 datasets with 1000 patients each. We generate a true GR time for each patient and then sample a set of PSA measurements at the time points specified in the PRIAS protocol (see Web Appendix C). We then split each dataset into a training part (750 patients) and a test part (250 patients), and generate a random, non-informative censoring time for the training patients. Next, we fit a joint model with the specification given in (7) and (8) to each of the 500 training datasets and obtain MCMC samples from each of the 500 resulting posterior distributions of the parameters. Using these fitted joint models, we obtain the posterior predictive distribution of the time of GR for each of the urn:x-wiley:15410420:media:biom12940:biom12940-math-0189 test patients, which is then used to create personalized biopsy schedules for them. For every test patient, we conduct hypothetical biopsies using the following six types of schedules (abbreviated names in parentheses): personalized schedules based on expected time of GR (Exp. GR time) and median time of GR (Med. GR time), personalized schedules based on dynamic risk of GR (Dyn. risk GR), a hybrid approach between median time of GR and dynamic risk of GR (Hybrid), the PRIAS schedule, and the annual schedule. The biopsies are conducted according to the algorithm in Figure 1.
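For the fixed schedules, one hypothetical biopsy run reduces to a loop: biopsies are conducted at the planned times until the first one at or after the true GR time, which yields the number of biopsies and the offset. The Python sketch below assumes that a biopsy always detects GR once it has occurred (no misclassification), and it uses a simplified, assumed PRIAS-like schedule (years 1, 4, 7, and 10, then every 5 years) that ignores the PSA-DT-triggered extra biopsies. The names are hypothetical, and the personalized schedules of Figure 1, where the next biopsy time would be recomputed from the updated patient history, are not shown.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class BiopsyOutcome:
    n_biopsies: int        # biopsies up to and including the one that detects GR
    offset_months: float   # detection time minus true GR time, in months


def run_fixed_schedule(true_gr_time: float, schedule: Sequence[float]) -> BiopsyOutcome:
    """Conduct biopsies at the planned times (years) until one detects GR.

    Simplifying assumption: a biopsy at time t detects GR whenever t >= true_gr_time.
    """
    for i, t in enumerate(sorted(schedule), start=1):
        if t >= true_gr_time:
            return BiopsyOutcome(i, 12.0 * (t - true_gr_time))
    raise ValueError("schedule ends before GR occurs; extend the schedule")


def annual_schedule(horizon: int = 30) -> List[float]:
    return [float(y) for y in range(1, horizon + 1)]


def prias_like_schedule(horizon: int = 30) -> List[float]:
    # Assumed simplification: years 1, 4, 7, 10, then every 5 years;
    # PSA-DT-triggered annual biopsies are ignored.
    return [1.0, 4.0, 7.0, 10.0] + [float(y) for y in range(15, horizon + 1, 5)]


for gr_time in (3.5, 4.2):
    print(gr_time, run_fixed_schedule(gr_time, annual_schedule()),
          run_fixed_schedule(gr_time, prias_like_schedule()))
```

Averaging `n_biopsies` and `offset_months` over test patients gives per-dataset estimates of the kind pooled in the next step.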

To compare the aforementioned schedules, we require estimates of the measures of efficacy described in Section 4. To this end, for schedule $S$ we compute pooled estimates of the mean offset $E(O^S)$ and the variance of the offset $\mathrm{var}(O^S)$ as below (estimates for the number of biopsies $N^S$ are similar):

$$\widehat{E}(O^S) = \frac{\sum_{k=1}^{500} n_k\,\widehat{E}_k(O^S)}{\sum_{k=1}^{500} n_k}, \qquad \widehat{\mathrm{var}}(O^S) = \frac{\sum_{k=1}^{500} (n_k - 1)\,\widehat{\mathrm{var}}_k(O^S)}{\sum_{k=1}^{500} (n_k - 1)},$$

where $n_k$ denotes the number of test patients in the $k$-th dataset, $\widehat{E}_k(O^S) = \frac{1}{n_k}\sum_{l=1}^{n_k} O^S_{kl}$ is the estimated mean and $\widehat{\mathrm{var}}_k(O^S) = \frac{1}{n_k - 1}\sum_{l=1}^{n_k} \{O^S_{kl} - \widehat{E}_k(O^S)\}^2$ is the estimated variance of the offset for the $k$-th simulation. The offset for the $l$-th test patient of the $k$-th dataset is denoted by $O^S_{kl}$.
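The pooling step can be sketched in Python as follows. It assumes the pooled mean is the sample-size-weighted average of the per-dataset means and the pooled variance is the standard pooled within-dataset variance, Σ(n_k − 1)var_k / Σ(n_k − 1); with n_k = 250 test patients in every dataset, the weighted mean reduces to the plain average over the 500 datasets. The function name and the exponential offsets in the usage example are hypothetical.

```python
import numpy as np


def pooled_offset_estimates(offsets_per_dataset):
    """Pool per-dataset offset summaries into one mean and one variance.

    offsets_per_dataset: sequence of 1-D arrays, one per simulated dataset k,
        holding the offsets O_kl of its test patients l = 1..n_k.
    Returns (pooled mean, pooled within-dataset variance).
    """
    n = np.array([len(o) for o in offsets_per_dataset], dtype=float)
    means = np.array([np.mean(o) for o in offsets_per_dataset])
    variances = np.array([np.var(o, ddof=1) for o in offsets_per_dataset])
    pooled_mean = np.sum(n * means) / np.sum(n)
    pooled_var = np.sum((n - 1.0) * variances) / np.sum(n - 1.0)
    return float(pooled_mean), float(pooled_var)


# Hypothetical offsets (months) for 500 datasets of 250 test patients each.
rng = np.random.default_rng(7)
sims = [rng.exponential(6.0, size=250) for _ in range(500)]
print(pooled_offset_estimates(sims))
```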

6.2 Results

The pooled estimates of the aforementioned measures are summarized in Table 1. In addition, the estimated values of urn:x-wiley:15410420:media:biom12940:biom12940-math-0198 are plotted against urn:x-wiley:15410420:media:biom12940:biom12940-math-0199 in Figure 3. The figure shows an inverse relationship across schedules between the mean number of biopsies urn:x-wiley:15410420:media:biom12940:biom12940-math-0200 and the mean offset urn:x-wiley:15410420:media:biom12940:biom12940-math-0201. For example, the annual schedule conducts on average 5.2 biopsies to detect GR, the most among all schedules, but it also has the smallest average offset, 6 months. In contrast, the schedule based on expected time of GR conducts only 1.9 biopsies on average, the fewest among all schedules, but it also has the largest average offset, 15 months (results for median time of GR are similar). Because the annual schedule limits the offset to at most a year, it has the smallest urn:x-wiley:15410420:media:biom12940:biom12940-math-0202 (Figure 5). To achieve this, however, the number of biopsies it conducts varies widely from patient to patient, i.e., it has the highest urn:x-wiley:15410420:media:biom12940:biom12940-math-0203 (Figure 4). In this regard, the schedules based on expected and median time of GR behave in the opposite way to the annual schedule.

Table 1. Estimated mean and standard deviation (SD) of the number of biopsies urn:x-wiley:15410420:media:biom12940:biom12940-math-0204 conducted until Gleason reclassification (GR) is detected, and of the offset urn:x-wiley:15410420:media:biom12940:biom12940-math-0205 (difference between the time at which GR is detected and the true time of GR, in months), for the simulated (500 datasets) test patients, across different schedules and subgroups. Patients in subgroup urn:x-wiley:15410420:media:biom12940:biom12940-math-0206 have the fastest prostate cancer progression rate, whereas patients in subgroup urn:x-wiley:15410420:media:biom12940:biom12940-math-0207 have the slowest progression rate. Types of personalized schedules (full names in brackets): Exp. GR time (expected time of GR), Med. GR time (median time of GR), Dyn. risk GR (schedules based on dynamic risk of GR), Hybrid (a hybrid approach between median time of GR and dynamic risk of GR). Annual corresponds to a schedule of yearly biopsies and PRIAS corresponds to biopsies as per the PRIAS protocol.

a) All hypothetical subgroups

| Schedule | Mean number of biopsies | Mean offset (months) | SD of number of biopsies | SD of offset (months) |
|---|---|---|---|---|
| Annual | 5.24 | 6.01 | 2.53 | 3.46 |
| PRIAS | 4.90 | 7.71 | 2.36 | 6.31 |
| Dyn. risk GR | 4.69 | 6.66 | 2.19 | 4.38 |
| Hybrid | 3.75 | 9.70 | 1.71 | 7.25 |
| Med. GR time | 2.06 | 13.88 | 1.41 | 11.80 |
| Exp. GR time | 1.92 | 15.08 | 1.19 | 12.11 |

b) Hypothetical subgroup urn:x-wiley:15410420:media:biom12940:biom12940-math-0212

| Schedule | Mean number of biopsies | Mean offset (months) | SD of number of biopsies | SD of offset (months) |
|---|---|---|---|---|
| Annual | 4.32 | 6.02 | 3.13 | 3.44 |
| PRIAS | 4.07 | 7.44 | 2.88 | 6.11 |
| Dyn. risk GR | 3.85 | 6.75 | 2.69 | 4.44 |
| Hybrid | 3.25 | 10.25 | 2.16 | 8.07 |
| Med. GR time | 1.84 | 20.66 | 1.76 | 14.62 |
| Exp. GR time | 1.72 | 21.65 | 1.47 | 14.75 |

c) Hypothetical subgroup urn:x-wiley:15410420:media:biom12940:biom12940-math-0217

| Schedule | Mean number of biopsies | Mean offset (months) | SD of number of biopsies | SD of offset (months) |
|---|---|---|---|---|
| Annual | 5.18 | 5.98 | 2.13 | 3.47 |
| PRIAS | 4.85 | 7.70 | 2.00 | 6.29 |
| Dyn. risk GR | 4.63 | 6.66 | 1.82 | 4.37 |
| Hybrid | 3.68 | 10.32 | 1.37 | 7.45 |
| Med. GR time | 1.89 | 12.33 | 1.16 | 9.44 |
| Exp. GR time | 1.77 | 13.54 | 0.98 | 9.83 |

d) Hypothetical subgroup urn:x-wiley:15410420:media:biom12940:biom12940-math-0222

| Schedule | Mean number of biopsies | Mean offset (months) | SD of number of biopsies | SD of offset (months) |
|---|---|---|---|---|
| Annual | 6.20 | 6.02 | 1.76 | 3.46 |
| PRIAS | 5.76 | 7.98 | 1.71 | 6.51 |
| Dyn. risk GR | 5.58 | 6.58 | 1.56 | 4.33 |
| Hybrid | 4.32 | 8.55 | 1.26 | 5.91 |
| Med. GR time | 2.45 | 8.70 | 1.15 | 6.32 |
| Exp. GR time | 2.27 | 10.09 | 0.99 | 7.47 |
Figure 3. Estimated mean number of biopsies conducted until Gleason reclassification (GR) is detected, and mean offset (difference between the time at which GR is detected and the true time of GR, in months), for the simulated (500 datasets) test patients, across different schedules. Types of personalized schedules (full names in brackets): Exp. GR time (expected time of GR), Med. GR time (median time of GR), Dyn. risk GR (schedules based on dynamic risk of GR), Hybrid (a hybrid approach between median time of GR and dynamic risk of GR). Annual corresponds to a schedule of yearly biopsies and PRIAS corresponds to biopsies as per the PRIAS protocol.
Figure 4. Boxplot showing the variation in the number of biopsies conducted by the various biopsy schedules for the simulated (500 datasets) test patients. Biopsies are conducted until Gleason reclassification (GR) is detected. Types of personalized schedules (full names in brackets): Exp. GR time (expected time of GR), Med. GR time (median time of GR), Dyn. risk GR (schedules based on dynamic risk of GR), Hybrid (a hybrid approach between median time of GR and dynamic risk of GR). Annual corresponds to a schedule of yearly biopsies and PRIAS corresponds to biopsies as per the PRIAS protocol.
Figure 5. Boxplot showing the variation in biopsy offset (difference between the time at which Gleason reclassification (GR) is detected and the true time of GR, in months) for the simulated (500 datasets) test patients, across different schedules. Types of personalized schedules (full names in brackets): Exp. GR time (expected time of GR), Med. GR time (median time of GR), Dyn. risk GR (schedules based on dynamic risk of GR), Hybrid (a hybrid approach between median time of GR and dynamic risk of GR). Annual corresponds to a schedule of yearly biopsies and PRIAS corresponds to biopsies as per the PRIAS protocol.

The PRIAS schedule conducts only 0.3 fewer biopsies than the annual schedule, but because its urn:x-wiley:15410420:media:biom12940:biom12940-math-0227 is higher, early detection is not always guaranteed. In comparison, the schedule based on the dynamic risk of GR performs slightly better than the PRIAS schedule on all four criteria. The hybrid approach combines the benefits of methods with low urn:x-wiley:15410420:media:biom12940:biom12940-math-0228 and urn:x-wiley:15410420:media:biom12940:biom12940-math-0229 and of methods with low urn:x-wiley:15410420:media:biom12940:biom12940-math-0230 and urn:x-wiley:15410420:media:biom12940:biom12940-math-0231. It conducts 1.5 fewer biopsies than the annual schedule on average, and with a urn:x-wiley:15410420:media:biom12940:biom12940-math-0232 of 9.7 months it detects GR, on average, within a year of its occurrence. Moreover, both its urn:x-wiley:15410420:media:biom12940:biom12940-math-0233 and its urn:x-wiley:15410420:media:biom12940:biom12940-math-0234 are comparable to those of PRIAS.

The performance of each schedule differs across the three subgroups urn:x-wiley:15410420:media:biom12940:biom12940-math-0235, and urn:x-wiley:15410420:media:biom12940:biom12940-math-0236. The annual schedule remains the most consistent across subgroups in terms of the offset, but it conducts about two more biopsies in subgroup urn:x-wiley:15410420:media:biom12940:biom12940-math-0237 (slowly progressing PCa) than in urn:x-wiley:15410420:media:biom12940:biom12940-math-0238 (faster-progressing PCa). The schedule based on expected time of GR is the most consistent in terms of the number of biopsies, but on average it detects GR about a year later in subgroup urn:x-wiley:15410420:media:biom12940:biom12940-math-0239 than in urn:x-wiley:15410420:media:biom12940:biom12940-math-0240. The schedule based on the dynamic risk of GR and the hybrid schedule follow a pattern similar to that of the annual schedule. Unlike these two schedules, the PRIAS schedule not only conducts more biopsies in urn:x-wiley:15410420:media:biom12940:biom12940-math-0241 than in urn:x-wiley:15410420:media:biom12940:biom12940-math-0242 but also detects GR later in urn:x-wiley:15410420:media:biom12940:biom12940-math-0243 than in urn:x-wiley:15410420:media:biom12940:biom12940-math-0244.
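The subgroup comparisons above are simple differences between rows of Table 1. The Python snippet below recomputes them from the tabulated means for the first subgroup (fastest progression) and the last one (slowest progression); the dictionary only copies values from Table 1, while its layout and the function name are our own.

```python
# Mean number of biopsies and mean offset (months) from Table 1, for the
# fastest-progressing (first) and slowest-progressing (last) subgroups.
TABLE1 = {
    # schedule: ((mean N fast, mean offset fast), (mean N slow, mean offset slow))
    "Annual": ((4.32, 6.02), (6.20, 6.02)),
    "PRIAS": ((4.07, 7.44), (5.76, 7.98)),
    "Dyn. risk GR": ((3.85, 6.75), (5.58, 6.58)),
    "Hybrid": ((3.25, 10.25), (4.32, 8.55)),
    "Med. GR time": ((1.84, 20.66), (2.45, 8.70)),
    "Exp. GR time": ((1.72, 21.65), (2.27, 10.09)),
}


def slow_minus_fast(table):
    """Extra biopsies and extra offset (months) in the slow vs the fast subgroup."""
    return {
        s: (round(slow[0] - fast[0], 2), round(slow[1] - fast[1], 2))
        for s, (fast, slow) in table.items()
    }


for schedule, (d_n, d_o) in slow_minus_fast(TABLE1).items():
    print(f"{schedule:13s} extra biopsies {d_n:+.2f}, extra offset {d_o:+.2f} months")
```

For example, the annual schedule needs 1.88 more biopsies in the slow subgroup with no change in mean offset, while the schedule based on expected time of GR detects GR 11.56 months earlier there.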

The choice of a suitable schedule using (5) depends on the measure chosen for evaluating schedules. In this regard, the schedules we compared have either high urn:x-wiley:15410420:media:biom12940:biom12940-math-0245 and low urn:x-wiley:15410420:media:biom12940:biom12940-math-0246, or vice versa (Table 1). Thus, applying a cutoff on urn:x-wiley:15410420:media:biom12940:biom12940-math-0247 when urn:x-wiley:15410420:media:biom12940:biom12940-math-0248 is high may not be as fruitful (the same holds for urn:x-wiley:15410420:media:biom12940:biom12940-math-0249) as applying a cutoff on urn:x-wiley:15410420:media:biom12940:biom12940-math-0250 or on quantile(s) of urn:x-wiley:15410420:media:biom12940:biom12940-math-0251. For example, the schedule based on the dynamic risk of GR is suitable if the goal is to conduct the fewest biopsies on average to detect GR while ensuring that at least 90% of the patients have an offset of less than 1 year.
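A selection rule of this kind can be sketched as a constrained minimization over the schedules. Table 1 does not report quantiles of the offset, so the Python sketch below uses the simpler cutoff on the mean offset instead; the cutoff values and the function name are hypothetical, and the inputs are the overall means from Table 1.

```python
# Overall means from Table 1: (mean number of biopsies, mean offset in months).
OVERALL = {
    "Annual": (5.24, 6.01),
    "PRIAS": (4.90, 7.71),
    "Dyn. risk GR": (4.69, 6.66),
    "Hybrid": (3.75, 9.70),
    "Med. GR time": (2.06, 13.88),
    "Exp. GR time": (1.92, 15.08),
}


def select_schedule(summary, max_mean_offset_months):
    """Fewest mean biopsies among schedules whose mean offset meets the cutoff."""
    feasible = {s: v for s, v in summary.items() if v[1] <= max_mean_offset_months}
    if not feasible:
        return None  # no schedule satisfies the cutoff
    return min(feasible, key=lambda s: feasible[s][0])


print(select_schedule(OVERALL, 12.0))  # -> Hybrid
print(select_schedule(OVERALL, 7.0))   # -> Dyn. risk GR
```

With per-patient offsets available, replacing the mean in the feasibility check by a quantile (e.g., the 90th percentile) would give the quantile-based rule described in the text.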

7 Discussion

In this article, we presented personalized schedules based on joint models for time-to-event and longitudinal data, for surveillance of PCa patients. These schedules are dynamic in nature, and at any given follow-up time, utilize a patient's historical PSA measurements and repeat biopsies conducted up to that time. We proposed two types of personalized schedules, namely those based on expected and median time of GR of a patient, and those based on the dynamic risk of GR. We also proposed a combination (hybrid approach) of these two approaches, which is useful in scenarios where the variance of time of GR for a patient is high. We then proposed criteria for evaluation of various schedules and a method to select a suitable schedule.

We demonstrated the dynamic and personalized nature of our schedules using the PRIAS dataset. We observed that a recent biopsy affects the schedules more than recent PSA measurements, which is consistent with biopsies being more reliable. Since the true GR time is not known for PRIAS patients, we conducted a simulation study to compare personalized schedules with the PRIAS and annual schedules. Both of these schedules are already used in practice. Hence, it can be argued that their maximum possible offsets (3 years for PRIAS and 1 year for the annual schedule) are acceptable to doctors. Thus, less frequent schedules with an average offset under 1 year may reduce the burden of biopsies while remaining practical. For example, for slowly progressing patients in our simulation study, we observed that the schedule based on expected time of GR conducts two biopsies on average and has an average offset of 10 months. In comparison, the annual schedule conducts six biopsies on average and gives an offset only 4 months smaller, making the personalized schedule a suitable alternative. For high-risk (faster-progressing) patients, however, early detection (annual or PRIAS schedule) may be necessary, given the rapidity of progression. When it is not known in advance whether a patient will have fast- or slow-progressing PCa, the hybrid approach may be used. It conducts one biopsy fewer than the annual schedule in patients with faster-progressing PCa and has an average offset of 10.25 months. For patients with slowly progressing PCa, it conducts two biopsies fewer than the annual schedule and has an average offset of 8.55 months.

Further personalized schedules could be added to the current set by using loss functions that penalize overshooting and undershooting the target GR time asymmetrically. For schedules based on the dynamic risk of GR, more simulations are required to compare data-driven urn:x-wiley:15410420:media:biom12940:biom12940-math-0252 values (e.g., based on the urn:x-wiley:15410420:media:biom12940:biom12940-math-0253 score) with urn:x-wiley:15410420:media:biom12940:biom12940-math-0254 chosen using decision-analytic approaches such as the net benefit measure (Vickers and Elkin, 2006), and with the various fixed urn:x-wiley:15410420:media:biom12940:biom12940-math-0255 values used by doctors in practice. In general, Gleason scores are susceptible to inter-observer variation (Carlson et al., 1998). Schedules that account for error in the measurement of the time of GR would be worth investigating further (Coley et al., 2017). Lastly, there is potential for including diagnostic information from magnetic resonance imaging (MRI) or digital rectal examination (DRE). When such information is not continuous, our proposed methodology can be easily extended by using the framework of generalized linear mixed models.
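As an illustration of the asymmetric-loss idea, and not a method from this paper, consider a linear loss that charges `cost_late` per year of delay when the biopsy falls after GR and `cost_early` per year when it falls before GR. A standard decision-theoretic result is that the expected loss is minimized at the τ-quantile of the predictive distribution of the time of GR, with τ = cost_early / (cost_early + cost_late); equal costs give the median, which links this family to the schedule based on median time of GR. The Python sketch below assumes posterior predictive draws of the time of GR are available; all names, costs, and the Weibull draws are hypothetical.

```python
import numpy as np


def asymmetric_loss_biopsy_time(gr_samples, cost_late, cost_early):
    """Biopsy time minimizing the empirical expected asymmetric linear loss."""
    s = np.asarray(gr_samples, dtype=float)
    tau = cost_early / (cost_early + cost_late)
    # Smallest u with empirical P(T <= u) >= tau; this minimizes the empirical loss.
    return float(np.quantile(s, tau, method="inverted_cdf")), tau


def expected_loss(u, gr_samples, cost_late, cost_early):
    """Average loss of a biopsy at time u over the draws of the GR time."""
    s = np.asarray(gr_samples, dtype=float)
    late = cost_late * np.clip(u - s, 0.0, None)     # biopsy after GR: detection delay
    early = cost_early * np.clip(s - u, 0.0, None)   # biopsy before GR: premature biopsy
    return float(np.mean(late + early))


rng = np.random.default_rng(3)
draws = 4.5 + 6.0 * rng.weibull(2.0, size=2_000)  # hypothetical predictive draws (years)
u_star, tau = asymmetric_loss_biopsy_time(draws, cost_late=3.0, cost_early=1.0)
print(round(u_star, 2), tau, round(expected_loss(u_star, draws, 3.0, 1.0), 3))
```

Penalizing delay three times as heavily as a premature biopsy pulls the proposed time down to the 25th percentile, i.e., earlier than the median-based schedule.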

8 Supplementary Materials

Web Appendices A, B, and C and D, referenced in Sections 2, 3.3, and 5, respectively, as well as the R code for fitting the joint model to the PRIAS dataset and for the simulation study, are available with this article at the Biometrics website on Wiley Online Library.

Acknowledgements

The first and last authors would like to acknowledge support by the Netherlands Organization for Scientific Research's VIDI grant nr. 016.146.301, and Erasmus MC funding. The authors also thank the Erasmus MC Cancer Computational Biology Center for giving access to their IT-infrastructure and software that was used for the computations and data analysis in this study. Lastly, we thank Frank-Jan H. Drost from the Department of Urology, Erasmus University Medical Center, for helping us in accessing the PRIAS data set.
