Personalized schedules for surveillance of low-risk prostate cancer patients
Summary
Low-risk prostate cancer patients enrolled in active surveillance (AS) programs commonly undergo biopsies on a frequent basis for examination of cancer progression. AS programs employ a fixed schedule of biopsies for all patients. Such fixed and frequent schedules may schedule unnecessary biopsies. Since biopsies are burdensome, patients do not always comply with the schedule, which increases the risk of delayed detection of cancer progression. Motivated by the world's largest AS program, Prostate Cancer Research International Active Surveillance (PRIAS), we present personalized schedules for biopsies to counter these problems. Using joint models for time-to-event and longitudinal data, our methods combine information from historical prostate-specific antigen levels and repeat biopsy results of a patient, to schedule the next biopsy. We also present methods to compare personalized schedules with existing biopsy schedules.
1 Introduction
Prostate cancer (PCa) is the second most frequently diagnosed cancer (14% of all cancers) in males worldwide (Torre et al., 2015). The increase in diagnosis of low-grade PCa has been attributed to the increase in life expectancy and in the number of screening programs (Potosky et al., 1995). An issue with screening programs, which has also been established for other types of cancer (e.g., breast cancer), is over-diagnosis. To avoid overtreatment, patients diagnosed with low-grade PCa are commonly advised to join active surveillance (AS) programs. In AS, serious treatments such as surgery, chemotherapy, or radiotherapy are delayed, and PCa progression is routinely examined via serum prostate-specific antigen (PSA) levels, digital rectal examination, medical imaging, and biopsy.
Biopsies are the most reliable PCa progression examination technique used in AS, yet they are also the most painful and are prone to medical complications (Loeb et al., 2013). When a patient's biopsy Gleason grading becomes larger than 6 (Gleason reclassification or GR), he is advised to switch from AS to active treatment (Bokhorst et al., 2015). Hence the timing of biopsies has significant medical implications. The world's largest AS program, Prostate Cancer Research International Active Surveillance (PRIAS), conducts biopsies at years 1, 4, 7, and 10 of follow-up, and every 5 years thereafter. However, it switches to a more frequent, annual biopsy schedule for faster-progressing patients. These are patients with a PSA doubling time (PSA-DT) between 0 and 10 years, where PSA-DT is measured as the inverse of the slope of the regression line through the base two logarithm of PSA values. In contrast, many AS programs use an annual schedule for all patients (Tosoian et al., 2011; Welty et al., 2015). Consequently, for slowly-progressing PCa patients many unnecessary biopsies are scheduled. Furthermore, patients may not always comply with such schedules (Bokhorst et al., 2015), which can lead to delayed detection of PCa progression and reduce the effectiveness of AS.
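The PSA-DT definition above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, an ordinary least-squares fit is assumed for the "regression line", and the handling of the interval endpoints (0 and 10 years) is our guess, since the text does not specify it.

```python
import math

def psa_doubling_time(times_years, psa_values):
    """PSA-DT: inverse of the least-squares slope of log2(PSA) regressed on time."""
    n = len(times_years)
    if n < 2 or n != len(psa_values):
        raise ValueError("need at least two paired (time, PSA) observations")
    log2_psa = [math.log2(v) for v in psa_values]
    mean_t = sum(times_years) / n
    mean_y = sum(log2_psa) / n
    sxx = sum((t - mean_t) ** 2 for t in times_years)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_years, log2_psa))
    if sxx == 0:
        raise ValueError("measurement times must not all be identical")
    slope = sxy / sxx
    # A flat profile never doubles; report an infinite doubling time.
    return math.inf if slope == 0 else 1.0 / slope

def switches_to_annual(psa_dt_years):
    """Assumed reading of the rule in the text: PSA-DT strictly between 0 and 10 years."""
    return 0 < psa_dt_years < 10

# PSA doubling every year: log2(PSA) rises by 1 per year, so PSA-DT is 1 year.
dt_fast = psa_doubling_time([0.0, 1.0, 2.0], [4.0, 8.0, 16.0])
# A decreasing PSA gives a negative PSA-DT, which does not trigger annual biopsies.
dt_falling = psa_doubling_time([0.0, 1.0, 2.0], [16.0, 8.0, 4.0])
```

A negative or infinite PSA-DT corresponds to a stable or falling PSA, which is why only values inside the interval flag a faster-progressing patient.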
This article is motivated by the need to reduce the medical burden of repeat biopsies while simultaneously avoiding late detection of PCa progression. To this end, we intend to develop personalized schedules for biopsies using historical PSA measurements and biopsy results of patients. Personalized schedules for screening have received much interest in the literature, especially in the medical decision making context. For example, Markov decision process (MDP) models have been used to create personalized screening schedules for diabetic retinopathy (Bebu and Lachin, 2017), breast cancer (Ayer et al., 2012), cervical cancer (Akhavan-Tabatabaei et al., 2017), and colorectal cancer (Erenay et al., 2014). Another type of model called joint model for time-to-event and longitudinal data (Tsiatis and Davidian, 2004; Rizopoulos, 2012) has also been used to create personalized schedules for the measurement of longitudinal biomarkers (Rizopoulos et al., 2016). In the context of PCa, Zhang et al. (2012) have used partially observable MDP models to personalize the decision of (not) deferring a biopsy to the next check-up time during the screening process. This decision is based on the baseline characteristics as well as a discretized PSA level of the patient at the current check-up time.
In comparison to the work referenced above, the schedules we propose in this article account for the latent between-patient heterogeneity. We achieve this by using joint models, which are inherently patient-specific because they utilize random effects. Secondly, joint models allow a continuous time scale and utilize the entire history of PSA levels. Lastly, instead of making a binary decision of (not) deferring a biopsy to the next pre-scheduled check-up time, we schedule biopsies at a per-patient optimal future time. To this end, using joint models we first obtain a full specification of the joint distribution of PSA levels and time of GR. We then use it to define a patient-specific posterior predictive distribution of the time of GR, given the observed PSA measurements and repeat biopsies up to the current check-up time. Using the general framework of Bayesian decision theory, we propose a set of loss functions which are minimized to find the optimal time of conducting a biopsy. These loss functions yield us two categories of personalized schedules, those based on expected time of GR and those based on the risk of GR. In addition, we analyze an approach where the two types of schedules are combined. We also present methods to evaluate and compare the various schedules for biopsies.
The rest of the article is organized as follows. Section 2 briefly covers the joint modeling framework. Section 3 details the personalized scheduling approaches we propose in this article. In Section 4, we discuss methods for evaluation and selection of a schedule. In Section 5, we demonstrate the personalized schedules by employing them for patients from the PRIAS program. Lastly, in Section 6, we present the results of a simulation study we conducted to compare personalized schedules with the PRIAS and annual schedules.
2 Joint Model for Time-to-Event and Longitudinal Outcomes
We start with a short introduction of the joint modeling framework we will use in our following developments. Let $T_i^*$ denote the true GR time for the i-th patient and let S be the schedule of his biopsies. Let the vector of the time of biopsies be denoted by $T_i^S = \{T_{i0}^S, T_{i1}^S, \ldots, T_{iN_i^S}^S;\ T_{ij}^S < T_{ik}^S, \forall j < k\}$, where $N_i^S$ is the total number of biopsies conducted. Because biopsy schedules are periodical, $T_i^*$ cannot be observed directly and it is only known to fall in an interval $l_i < T_i^* \le r_i$, where $l_i = T_{iN_i^S - 1}^S$, $r_i = T_{iN_i^S}^S$ if GR is observed, and $l_i = T_{iN_i^S}^S$, $r_i = \infty$ if GR is not observed yet. Further let $y_i$ denote the $n_i \times 1$ vector of PSA levels for the i-th patient. For a sample of n patients the observed data is denoted by $\mathcal{D}_n = \{l_i, r_i, y_i;\ i = 1, \ldots, n\}$.
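To make the interval-censoring of the GR time concrete, the following minimal sketch derives the interval known to contain the GR time from a patient's biopsy times. The function name and the convention that the first entry is the biopsy at enrolment are assumptions made for illustration.

```python
import math

def gr_interval(biopsy_times, gr_detected):
    """Return (l, r) such that the true GR time lies in the interval (l, r].

    biopsy_times: strictly increasing biopsy times in years; the first entry is
                  assumed to be the biopsy at enrolment.
    gr_detected:  True if the latest biopsy showed Gleason reclassification.
    """
    times = list(biopsy_times)
    if not times or any(b <= a for a, b in zip(times, times[1:])):
        raise ValueError("biopsy times must be non-empty and strictly increasing")
    if gr_detected:
        if len(times) < 2:
            raise ValueError("need an earlier negative biopsy to bound GR from below")
        # GR occurred after the last negative biopsy and by the positive one.
        return times[-2], times[-1]
    # GR not observed yet: only a lower bound is known.
    return times[-1], math.inf

observed = gr_interval([0.0, 1.0, 4.0], gr_detected=True)
not_yet = gr_interval([0.0, 1.0, 4.0], gr_detected=False)
```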




















These formulations of the association structure postulate that the hazard of GR at time t may be associated with the underlying level $m_i(t)$ of the PSA at t, or with both the level and the velocity $m_i'(t)$ of the PSA at t. Lastly, $h_0(t)$ is the baseline hazard at time t, and is modeled flexibly using P-splines. The detailed specification of the baseline hazard, and parameter estimation using the Bayesian approach, are presented in Web Appendix A of the supplementary material.
3 Personalized Schedules for Repeat Biopsies
We intend to use the joint model fitted to the observed data $\mathcal{D}_n$ to create personalized schedules of biopsies. To this end, let us assume that a schedule is to be created for a new patient j, who is not present in $\mathcal{D}_n$. Let t be the time of his latest biopsy, and $\mathcal{Y}_j(s)$ denote his historical PSA measurements up to time s. The goal is to find the optimal time $u > \max(t, s)$ of the next biopsy.
3.1 Posterior Predictive Distribution for Time to GR




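The posterior predictive distribution $g(T_j^*)$ of the GR time depends on the PSA history $\mathcal{Y}_j(s)$ and the training data $\mathcal{D}_n$ via the posterior distribution of the random effects $b_j$ and the posterior distribution of the vector of all parameters $\theta$, respectively.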
3.2 Loss Functions



Various loss functions have been proposed in literature (Robert, 2007). The ones we utilize, and the corresponding motivations are presented next.
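As background for the loss functions, two textbook results from Bayesian decision theory are worth keeping in mind: the posterior expected squared loss is minimized by the posterior mean, and the posterior expected absolute loss by the posterior median. The sketch below checks this numerically on a handful of made-up draws of the GR time; the sample values, the grid, and all function names are illustrative assumptions and are not taken from the PRIAS analysis.

```python
def expected_loss(samples, u, loss):
    """Monte Carlo estimate of the posterior expected loss of biopsy time u."""
    return sum(loss(t, u) for t in samples) / len(samples)

def squared_loss(t, u):
    return (t - u) ** 2

def absolute_loss(t, u):
    return abs(t - u)

def grid_minimizer(samples, loss, grid):
    """Candidate time on the grid with the smallest expected loss."""
    return min(grid, key=lambda u: expected_loss(samples, u, loss))

# Hypothetical draws (in years) from a right-skewed distribution of the GR time.
samples = [2.0, 3.0, 4.0, 5.0, 16.0]
grid = [i / 10 for i in range(0, 201)]  # candidate biopsy times 0.0, 0.1, ..., 20.0

best_squared = grid_minimizer(samples, squared_loss, grid)    # the sample mean, 6.0
best_absolute = grid_minimizer(samples, absolute_loss, grid)  # the sample median, 4.0
```

With a skewed distribution the two answers differ noticeably, which is one reason to consider both the mean and the median time of GR.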

























In practice, for some patients, we may not have sufficient information to accurately estimate their PSA profile. The resulting high variance of the posterior predictive distribution $g(T_j^*)$ of the GR time could lead to a mean (or median) time of GR which overshoots the true $T_j^*$ by a big margin. In such cases, the approach based on the dynamic risk of GR with smaller risk thresholds is more risk-averse and thus could be more robust to large overshooting margins. This consideration leads us to a hybrid approach, namely, to select u using the dynamic risk of GR-based approach when the spread of $g(T_j^*)$ is large, while using the mean $E_g(T_j^*)$ or the median $\mathrm{median}_g(T_j^*)$ when the spread of $g(T_j^*)$ is small. What constitutes a large spread will be application-specific. In PRIAS, within the first 10 years, the maximum possible delay in detection of GR is 3 years. Thus, we propose that if the difference between the 0.025 quantile of $g(T_j^*)$ and $E_g(T_j^*)$ or $\mathrm{median}_g(T_j^*)$ is more than 3 years, then proposals based on the dynamic risk of GR be used instead.
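The hybrid rule just described can be sketched as follows. The sketch assumes that draws from the posterior predictive distribution of the GR time are available and that the dynamic-risk proposal has been computed elsewhere and is passed in; the function names and the linear-interpolation quantile are our own choices.

```python
import statistics

def quantile(sorted_values, q):
    """Quantile of pre-sorted values using linear interpolation."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])

def hybrid_biopsy_time(gr_time_samples, dynamic_risk_time,
                       max_spread_years=3.0, use_median=True):
    """Use the median (or mean) GR time unless it sits more than
    max_spread_years above the 0.025 quantile; otherwise fall back to the
    (externally supplied) dynamic-risk proposal."""
    s = sorted(gr_time_samples)
    center = statistics.median(s) if use_median else statistics.mean(s)
    lower = quantile(s, 0.025)
    if center - lower > max_spread_years:
        return dynamic_risk_time
    return center

# Narrow distribution: the median GR time (6.0 years) is used.
narrow = hybrid_biopsy_time([5.0, 5.5, 6.0, 6.5, 7.0], dynamic_risk_time=4.0)
# Wide distribution: the rule falls back to the dynamic-risk proposal (2.5 years).
wide = hybrid_biopsy_time([1.0, 2.0, 8.0, 14.0, 20.0], dynamic_risk_time=2.5)
```

The 3-year default mirrors the PRIAS-specific threshold proposed above; other programs would likely need a different value.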
3.3 Estimation







Since there is no closed-form solution available for the integrals in (3) and (4), we approximate them using Gauss-Kronrod quadrature (see Web Appendix B). The variance $\mathrm{var}_g(T_j^*)$ depends both on the last biopsy time t and the PSA history $\mathcal{Y}_j(s)$, as demonstrated in Section 5.2.
























3.4 Algorithm
When a biopsy gets scheduled at a time $u < T_j^*$, then GR is not detected at u and at least one more biopsy is required at an optimal time $u^{new} > \max(u, s)$. This process is repeated until GR is detected. To aid in medical decision making, we elucidate this process via an algorithm in Figure 1. AS programs strongly advise that two biopsies have a gap of at least 1 year. Thus, when $u - t < 1$, the algorithm postpones u to $t + 1$, because it is the time nearest to u at which the 1-year gap condition is satisfied.
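The repeat-until-detection loop with the 1-year gap rule can be sketched as below. This is a simplified reading of the process described in the text, not a transcription of Figure 1: `propose_time` is a hypothetical callable standing in for any of the personalized proposals (in practice it would also depend on the PSA history), and a biopsy is assumed to detect GR without error whenever it takes place at or after the true GR time.

```python
def run_personalized_schedule(true_gr_time, propose_time, start_time=0.0,
                              min_gap_years=1.0, max_biopsies=100):
    """Conduct hypothetical biopsies until GR is detected; return their times.

    propose_time(t): proposed time u of the next biopsy, given the time t of
                     the latest biopsy (hypothetical stand-in for the model).
    """
    t = start_time
    biopsy_times = []
    while len(biopsy_times) < max_biopsies:
        u = propose_time(t)
        if u - t < min_gap_years:
            u = t + min_gap_years  # postpone to satisfy the 1-year gap
        biopsy_times.append(u)
        if u >= true_gr_time:      # GR has occurred by u, so it is detected
            return biopsy_times
        t = u                      # negative biopsy: update and propose again
    raise RuntimeError("GR not detected within max_biopsies")

# A proposal only 6 months ahead is always postponed to a full year.
eager = run_personalized_schedule(2.5, lambda t: t + 0.5)
# A proposal 2 years ahead is used as is.
relaxed = run_personalized_schedule(3.0, lambda t: t + 2.0)
```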



4 Evaluation of Schedules
In order to compare various schedules of biopsies, we require measures of their efficacy. We propose to use two measures, namely the number of biopsies $N_j^S \ge 1$ (burden) a schedule S conducts for the j-th patient to detect GR, and the offset $O_j^S \ge 0$ by which it overshoots the true GR time $T_j^*$. The offset $O_j^S$ is defined as $O_j^S = T_{jN_j^S}^S - T_j^*$, where $T_{jN_j^S}^S \ge T_j^*$ is the time at which GR is detected. Our interest lies in the joint distribution $p(N_j^S, O_j^S)$ of the number of biopsies and the offset. The least burdensome scenario is when $N_j^S = 1$ and $O_j^S = 0$. Hence, realistically we should select a schedule with a low mean number of biopsies $E(N_j^S)$ as well as a low mean offset $E(O_j^S)$. It is also desired that a schedule has a low variance for both the number of biopsies, $\mathrm{var}(N_j^S)$, and the offset, $\mathrm{var}(O_j^S)$, so that the schedule works similarly for most patients.
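For fixed schedules, the two measures are easy to compute once a true GR time is given, as in a simulation. The sketch below does this for an annual schedule and for the PRIAS schedule of biopsies at years 1, 4, 7, and 10 and every 5 years thereafter. It ignores the PRIAS switch to annual biopsies for patients with a short PSA-DT and assumes error-free biopsies; the function names are hypothetical.

```python
import statistics
from itertools import count

def annual_times():
    """Biopsy times 1, 2, 3, ... (years)."""
    return (float(k) for k in count(1))

def prias_times():
    """Biopsy times 1, 4, 7, 10, then every 5 years (years)."""
    yield from (1.0, 4.0, 7.0, 10.0)
    for k in count(1):
        yield 10.0 + 5.0 * k

def burden_and_offset(true_gr_time, biopsy_times):
    """Number of biopsies until GR is detected, and the detection delay."""
    for n, time in enumerate(biopsy_times, start=1):
        if time >= true_gr_time:
            return n, time - true_gr_time

def summarize(true_gr_times, schedule_factory):
    """Mean and standard deviation of both measures over a set of patients."""
    results = [burden_and_offset(t, schedule_factory()) for t in true_gr_times]
    biopsies = [r[0] for r in results]
    offsets = [r[1] for r in results]
    return {
        "mean_biopsies": statistics.mean(biopsies),
        "mean_offset": statistics.mean(offsets),
        "sd_biopsies": statistics.pstdev(biopsies),
        "sd_offset": statistics.pstdev(offsets),
    }

# A patient whose GR occurs at 2.5 years:
annual_result = burden_and_offset(2.5, annual_times())  # detected at year 3
prias_result = burden_and_offset(2.5, prias_times())    # detected at year 4
```

Calling `summarize` with a list of simulated GR times and one of the two generators gives the kind of per-schedule summary that the comparison of schedules relies on.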
4.1 Choosing a Schedule










The choice of the two weights in the loss function is not easy, because the burden of a biopsy cannot be compared to a unit increase in offset easily. To obviate this problem we utilize the equivalence between compound and constrained optimal designs (Cook and Wong, 1994). More specifically, it can be shown that for any choice of the two weights there exists a constant $C > 0$ for which minimization of the loss function in (6) is equivalent to minimization of the mean offset $E(O_j^S)$ subject to the constraint that $E(N_j^S) < C$. That is, a schedule which conducts at most C biopsies on average and detects GR earliest should be chosen. The choice of C could be based on the number of biopsies a patient is willing to undergo. In the more generic case in (5), a schedule can be chosen by minimizing one of the evaluation criteria under constraints on each of the remaining criteria.
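The constrained choice can be illustrated with the pooled simulation estimates reported in Table 1 (panel a): among the schedules whose mean number of biopsies stays below C, pick the one with the smallest mean offset. The dictionary layout and the function name are our own; the numbers are the mean number of biopsies and the mean offset in months from that table.

```python
# (mean number of biopsies, mean offset in months), from Table 1, panel a.
TABLE_1A = {
    "Annual": (5.24, 6.01),
    "PRIAS": (4.90, 7.71),
    "Dyn. risk GR": (4.69, 6.66),
    "Hybrid": (3.75, 9.70),
    "Med. GR time": (2.06, 13.88),
    "Exp. GR time": (1.92, 15.08),
}

def choose_schedule(summary, max_mean_biopsies):
    """Smallest mean offset among schedules with mean biopsies below the cap."""
    feasible = {name: v for name, v in summary.items() if v[0] < max_mean_biopsies}
    if not feasible:
        return None
    return min(feasible, key=lambda name: feasible[name][1])

choice_c4 = choose_schedule(TABLE_1A, max_mean_biopsies=4.0)  # "Hybrid"
choice_c5 = choose_schedule(TABLE_1A, max_mean_biopsies=5.0)  # "Dyn. risk GR"
```

Raising the cap C moves the choice toward more frequent schedules, which reflects the trade-off between burden and offset.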
5 Demonstration of Personalized Schedules
To demonstrate the personalized schedules, we apply them to the patients enrolled in the PRIAS study. To this end, we divide the PRIAS dataset into a training part (5264 patients) and a demonstration part (three patients). We fit a joint model to the training dataset and then use it to create schedules for the demonstration patients. We fit the joint model using the R package JMbayes (Rizopoulos, 2016), which uses the Bayesian approach for parameter estimation.
5.1 Fitting the Joint Model to the PRIAS Dataset











From the fitted joint model we found that $\log_2$ PSA velocity and the age at the time of inclusion in AS were significantly associated with the hazard of GR. For any patient, an increase in $\log_2$ PSA velocity from −0.06 to 0.14 (first and third quartiles of the fitted velocities, respectively) corresponds to a 2.05-fold increase in the hazard of GR. In terms of the predictive performance, we found that the area under the receiver operating characteristic curves (Rizopoulos et al., 2017) was 0.61, 0.65, and 0.59 at years 1, 2, and 3 of follow-up, respectively. Parameter estimates are presented in detail in Web Appendix C.
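As a back-of-the-envelope reading of the reported hazard ratio, suppose the log-hazard is linear in the velocity $v$ with an association coefficient that we call $\alpha_2$ (a label assumed here for illustration), and that everything else is held fixed. The reported 2.05-fold increase over a velocity difference of 0.20 then implies:

```latex
\frac{h(t \mid v = 0.14)}{h(t \mid v = -0.06)}
  = \exp\{\alpha_2\,(0.14 - (-0.06))\}
  = \exp(0.20\,\alpha_2) = 2.05
\quad\Longrightarrow\quad
\alpha_2 = \frac{\ln 2.05}{0.20} \approx 3.59
```

This is only a consistency check under the stated assumption; the actual parameter estimates are those in Web Appendix C.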
In PRIAS, the interval in which GR is detected depends on the PSA-DT of the patient. However, because the parameters are estimated using a full likelihood approach (Tsiatis and Davidian, 2004), the joint model gives valid estimates for all of the parameters, under the condition that the model is correctly specified (see Web Appendix A.2 and C.3). To this end, we performed several sensitivity analyses of our model (e.g., changing the position of the knots) to investigate the fit of the model and the robustness of the results. In all of our attempts, the same conclusion was reached, namely that the velocity of the longitudinal outcome is more strongly associated with the hazard of GR than the value.
5.2 Personalized Schedules for the First Demonstration Patient
We now demonstrate the functioning of the personalized schedules for the first demonstration patient (see Web Appendix D for the other two demonstration patients). The fitted and observed PSA profile, time of latest biopsy, and proposed biopsy times u for him are shown in the top panel of Figure 2. We can see that with a consistently decreasing PSA and a negative repeat biopsy between years 3 and 4.5, the proposed time of biopsy based on the dynamic risk of GR has increased from 3.05 years to 14.73 years in this period. The proposed time of biopsy based on expected time of GR has also increased from 14.40 to 15.97 years. We can also see in the bottom panel of Figure 2 that after each negative repeat biopsy, the variance $\mathrm{var}_g(T_j^*)$ of the predicted GR time decreases sharply. Thus, if the expected time of GR-based approach is used, then the offset $O_j^S$ will be smaller on average for biopsies scheduled after the second repeat biopsy than those scheduled after the first repeat biopsy.



6 Simulation Study
In Section 5.2, we demonstrated that the personalized schedules place future biopsies according to the historical data of each patient. However, we could not perform a full-scale comparison between personalized and PRIAS schedules, because the true time of GR was not known for the PRIAS patients. We therefore conducted a simulation study comparing personalized schedules with the PRIAS and annual schedules, the details of which are presented next.
6.1 Simulation Setup
The population of AS patients in this simulation study is assumed to have the same entrance criteria as that of PRIAS. The PSA and hazard of GR for these patients follow a joint model of the form postulated in Section 5.1, with the only change that levels are used as the outcome. The population joint model parameters are equal to the posterior mean of parameters estimated from the corresponding joint model fitted to the PRIAS dataset. We intend to test the efficacy of different schedules for a population which has patients with both faster as well as slowly-progressing PCa. This rate of progression is not only manifested via PSA profiles but also via the baseline hazard. We assume that there are three equal sized subgroups
$G_1$, $G_2$, and $G_3$ of patients in the population, each with a baseline hazard from a Weibull distribution with subgroup-specific shape and scale parameters. The effect of these parameters is that the mean GR time is lowest in $G_1$ (fast PCa progression) and highest in $G_3$ (slow PCa progression).
From this population, we have sampled 500 datasets with 1000 patients each. We generate a true GR time for each of the patients, and then sample a set of PSA measurements at the same time points as given in the PRIAS protocol (see Web Appendix C). We then split the dataset into a training (750 patients) and a test (250 patients) part, and generate a random and non-informative censoring time for the training patients. We next fit a joint model of the specification given in (7) and (8) to each of the 500 training datasets and obtain MCMC samples from the 500 sets of the posterior distribution of the parameters. Using these fitted joint models, we obtain the posterior predictive distribution of time of GR for each of the test patients. This distribution is further used to create personalized biopsy schedules for the test patients. For every test patient we conduct hypothetical biopsies using the following six types of schedules (abbreviated names in parentheses): personalized schedules based on expected time of GR (Exp. GR time) and median time of GR (Med. GR time), personalized schedules based on dynamic risk of GR (Dyn. risk GR), a hybrid approach between median time of GR and dynamic risk of GR (Hybrid), the PRIAS schedule, and the annual schedule. The biopsies are conducted as per the algorithm in Figure 1.








6.2 Results
The pooled estimates of the aforementioned measures are summarized in Table 1. In addition, estimated values of the mean number of biopsies $E(N_j^S)$ are plotted against the mean offset $E(O_j^S)$ in Figure 3. The figure shows that across the schedules there is an inverse relationship between $E(N_j^S)$ and $E(O_j^S)$. For example, the annual schedule conducts on average 5.2 biopsies to detect GR, which is the highest among all schedules. However, it also has the lowest average offset, 6 months. On the other hand, the schedule based on expected time of GR conducts only 1.9 biopsies on average to detect GR, the fewest among all schedules, but it also has the highest average offset of 15 months (similar for median time of GR). Since the annual schedule attempts to contain the offset within a year, it has the lowest standard deviation of the offset, $\mathrm{SD}(O_j^S)$ (Figure 5). However, to achieve this, it conducts a widely varying number of biopsies from patient to patient, i.e., it has the highest $\mathrm{SD}(N_j^S)$ (Figure 4). In this regard, schedules based on expected and median time of GR perform the opposite of the annual schedule.
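A quick sanity check on the annual schedule's offset: if, as a rough assumption of ours, the true GR time falls uniformly within the 12-month gap between two consecutive annual biopsies, the offset $O$ is uniform on (0, 12) months, so that

```latex
O \sim \mathcal{U}(0, 12)
\quad\Longrightarrow\quad
E(O) = \frac{12}{2} = 6 \text{ months},
\qquad
\mathrm{SD}(O) = \frac{12}{\sqrt{12}} = \sqrt{12} \approx 3.46 \text{ months}
```

These values agree closely with the pooled estimates for the annual schedule in Table 1 (6.01 and 3.46 months).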




a) All hypothetical subgroups

| Schedule | Mean number of biopsies | Mean offset (months) | SD of number of biopsies | SD of offset (months) |
|---|---|---|---|---|
| Annual | 5.24 | 6.01 | 2.53 | 3.46 |
| PRIAS | 4.90 | 7.71 | 2.36 | 6.31 |
| Dyn. risk GR | 4.69 | 6.66 | 2.19 | 4.38 |
| Hybrid | 3.75 | 9.70 | 1.71 | 7.25 |
| Med. GR time | 2.06 | 13.88 | 1.41 | 11.80 |
| Exp. GR time | 1.92 | 15.08 | 1.19 | 12.11 |

b) Hypothetical subgroup $G_1$

| Schedule | Mean number of biopsies | Mean offset (months) | SD of number of biopsies | SD of offset (months) |
|---|---|---|---|---|
| Annual | 4.32 | 6.02 | 3.13 | 3.44 |
| PRIAS | 4.07 | 7.44 | 2.88 | 6.11 |
| Dyn. risk GR | 3.85 | 6.75 | 2.69 | 4.44 |
| Hybrid | 3.25 | 10.25 | 2.16 | 8.07 |
| Med. GR time | 1.84 | 20.66 | 1.76 | 14.62 |
| Exp. GR time | 1.72 | 21.65 | 1.47 | 14.75 |

c) Hypothetical subgroup $G_2$

| Schedule | Mean number of biopsies | Mean offset (months) | SD of number of biopsies | SD of offset (months) |
|---|---|---|---|---|
| Annual | 5.18 | 5.98 | 2.13 | 3.47 |
| PRIAS | 4.85 | 7.70 | 2.00 | 6.29 |
| Dyn. risk GR | 4.63 | 6.66 | 1.82 | 4.37 |
| Hybrid | 3.68 | 10.32 | 1.37 | 7.45 |
| Med. GR time | 1.89 | 12.33 | 1.16 | 9.44 |
| Exp. GR time | 1.77 | 13.54 | 0.98 | 9.83 |

d) Hypothetical subgroup $G_3$

| Schedule | Mean number of biopsies | Mean offset (months) | SD of number of biopsies | SD of offset (months) |
|---|---|---|---|---|
| Annual | 6.20 | 6.02 | 1.76 | 3.46 |
| PRIAS | 5.76 | 7.98 | 1.71 | 6.51 |
| Dyn. risk GR | 5.58 | 6.58 | 1.56 | 4.33 |
| Hybrid | 4.32 | 8.55 | 1.26 | 5.91 |
| Med. GR time | 2.45 | 8.70 | 1.15 | 6.32 |
| Exp. GR time | 2.27 | 10.09 | 0.99 | 7.47 |



The PRIAS schedule conducts only 0.3 biopsies fewer than the annual schedule, but with a higher standard deviation of the offset, $\mathrm{SD}(O_j^S)$, early detection is not always guaranteed. In comparison, the dynamic risk of GR-based schedule performs slightly better than the PRIAS schedule in all four criteria. The hybrid approach combines the benefits of methods with a low mean and standard deviation of the number of biopsies, and methods with a low mean and standard deviation of the offset. It conducts 1.5 biopsies fewer than the annual schedule on average, and with a mean offset of 9.7 months it detects GR within a year of its occurrence. Moreover, it has both standard deviations (of the number of biopsies and of the offset) comparable to PRIAS.
The performance of each schedule differs for the three subgroups $G_1$, $G_2$, and $G_3$. The annual schedule remains the most consistent across subgroups in terms of the offset, but it conducts two more biopsies for the subgroup $G_3$ (slowly-progressing PCa) than for $G_1$ (faster-progressing PCa). The performance of the schedule based on expected time of GR is the most consistent in terms of the number of biopsies, but it detects GR a year later on average in subgroup $G_1$ than in $G_3$. For the dynamic risk of GR-based schedule and the hybrid schedule, the dynamics are similar to that of the annual schedule. Unlike the latter two schedules, the PRIAS schedule not only conducts more biopsies in $G_3$ than in $G_1$ but also detects GR later in $G_3$ than in $G_1$.
The choice of a suitable schedule using (5) depends on the chosen measure for evaluation of schedules. In this regard, the schedules we compared either have a high mean number of biopsies and a low mean offset
, or vice versa (Table 1). Thus, applying a cutoff on
when
is high may not be as fruitful (same for
) as applying a cutoff on
or quantile(s) of
. For example, the schedule based on the dynamic risk of GR is suitable if on average the least number of biopsies are to be conducted to detect GR, while simultaneously making sure that at least 90% of the patients have an average offset less than 1 year.
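A quantile-type rule such as "at least 90% of the patients have an offset of less than 1 year" is simple to check on simulated offsets. The sketch below is a minimal illustration; the function names and the example offsets are made up.

```python
def share_within(offsets_months, limit_months=12.0):
    """Fraction of patients whose offset is below the limit."""
    return sum(o < limit_months for o in offsets_months) / len(offsets_months)

def meets_offset_rule(offsets_months, limit_months=12.0, min_share=0.90):
    """True if at least min_share of patients have an offset below the limit."""
    return share_within(offsets_months, limit_months) >= min_share

# Hypothetical offsets (months) for ten simulated patients under two schedules.
offsets_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 15.0]   # 9 of 10 under a year
offsets_b = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 15.0, 20.0]  # 8 of 10 under a year

rule_a = meets_offset_rule(offsets_a)
rule_b = meets_offset_rule(offsets_b)
```

Among the schedules that pass such a rule, the one with the smallest mean number of biopsies would then be preferred.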
7 Discussion
In this article, we presented personalized schedules based on joint models for time-to-event and longitudinal data, for surveillance of PCa patients. These schedules are dynamic in nature, and at any given follow-up time, utilize a patient's historical PSA measurements and repeat biopsies conducted up to that time. We proposed two types of personalized schedules, namely those based on expected and median time of GR of a patient, and those based on the dynamic risk of GR. We also proposed a combination (hybrid approach) of these two approaches, which is useful in scenarios where the variance of time of GR for a patient is high. We then proposed criteria for evaluation of various schedules and a method to select a suitable schedule.
We demonstrated the dynamic and personalized nature of our schedules using the PRIAS dataset. We observed that a recent biopsy impacts the schedules more than recent PSA measurements, which is consistent with biopsies being more reliable. Since the true GR time is not known for PRIAS patients, we conducted a simulation study to compare personalized schedules with the PRIAS and annual schedules. The latter two schedules are already in practice. Hence, it can be argued that the maximum possible offsets due to these schedules (3 and 1 years, respectively) are acceptable to doctors. Thus, less frequent schedules with an offset under 1 year may reduce the burden of biopsies while simultaneously being practical. For example, for slowly-progressing patients in our simulation study, we observed that the schedule based on expected time of GR conducts on average two biopsies and has an average offset of 10 months. In comparison, the annual schedule conducts six biopsies on average and gives an offset smaller by only 4 months, making the personalized schedule a suitable alternative. For high-risk patients, however, early detection (annual or PRIAS schedule) may be necessary, given the rapidness of progression. When it is not known in advance whether a patient will have fast or slow progression of PCa, the hybrid approach may be used. It conducts one biopsy fewer than the annual schedule in faster-progressing PCa patients and has an average offset of 10.25 months. For slowly-progressing PCa patients it conducts two biopsies fewer than the annual schedule and has an average offset of 8.55 months.
More personalized schedules can be added to the current set, using loss functions which asymmetrically penalize overshooting/undershooting the target GR time. For dynamic risk of GR-based schedules, more simulations are required to compare data-driven risk thresholds (e.g., chosen via the $F_1$ score) with thresholds chosen using decision analytic approaches such as the net benefit measure (Vickers and Elkin, 2006), and with various fixed thresholds used by doctors in practice. In general, the Gleason scores are susceptible to inter-observer variation (Carlson et al., 1998). Schedules which account for error in the measurement of time of GR will be interesting to investigate further (Coley et al., 2017). Lastly, there is potential for including diagnostic information from magnetic resonance imaging (MRI) or digital rectal examination (DRE). When such information is not continuous in nature, our proposed methodology can be easily extended by utilizing the framework of generalized linear mixed models.
8 Supplementary Materials
Web Appendices A, B, C, and D, referenced in Sections 2, 3.3, and 5, together with the R code for fitting the joint model to the PRIAS dataset and for the simulation study, are available with this article at the Biometrics website on Wiley Online Library.
Acknowledgements
The first and last authors would like to acknowledge support by the Netherlands Organization for Scientific Research's VIDI grant nr. 016.146.301, and Erasmus MC funding. The authors also thank the Erasmus MC Cancer Computational Biology Center for giving access to their IT-infrastructure and software that was used for the computations and data analysis in this study. Lastly, we thank Frank-Jan H. Drost from the Department of Urology, Erasmus University Medical Center, for helping us in accessing the PRIAS data set.