Volume 27, Issue 6 pp. 658-666
RESEARCH PAPER

Misleading prioritizations from modelling range shifts under climate change

Helen R. Sofaer

Corresponding Author

U.S. Geological Survey, Fort Collins Science Center, Fort Collins, Colorado

Correspondence: Helen R. Sofaer, U.S. Geological Survey, Fort Collins Science Center, 2150 Centre Avenue, Building C, Fort Collins, CO 80526, U.S.A. Email: [email protected]
Catherine S. Jarnevich

U.S. Geological Survey, Fort Collins Science Center, Fort Collins, Colorado

Curtis H. Flather

U.S. Department of Agriculture Forest Service, Rocky Mountain Research Station, Fort Collins, Colorado

First published: 09 March 2018

Funding information: U.S. Geological Survey

Abstract

Aim

Conservation planning requires the prioritization of a subset of taxa and geographical locations to focus monitoring and management efforts. Integration of the threats and opportunities posed by climate change often relies on predictions from species distribution models, particularly for assessments of vulnerability or invasion risk for multiple taxa. We evaluated whether species distribution models could reliably rank changes in species range size under climate and land use change.

Location

Conterminous U.S.A.

Time period

1977–2014.

Major taxa studied

Passerine birds.

Methods

We estimated ensembles of species distribution models based on historical North American Breeding Bird Survey occurrences for 190 songbirds, and generated predictions to recent years given c. 35 years of observed land use and climate change. We evaluated model predictions using standard metrics of discrimination performance and a more detailed assessment of the ability of models to rank species vulnerability to climate change based on predicted range loss, range gain, and overall change in range size.

Results

Species distribution models yielded unreliable and misleading assessments of relative vulnerability to climate and land use change. Models could not accurately predict range expansion or contraction, and therefore failed to anticipate patterns of range change among species. These failures occurred despite excellent overall discrimination ability and transferability to the validation time period, which reflected strong performance at the majority of locations that were either always or never occupied by each species.

Main conclusions

Models failed for the questions and at the locations of greatest interest to conservation and management. This highlights potential pitfalls of multi-taxa impact assessments under global change; in our case, models provided misleading rankings of the most impacted species, and spatial information about range changes was not credible. As modelling methods and frameworks continue to be refined, performance assessments and validation efforts should focus on the measures of risk and vulnerability useful for decision-making.

1 INTRODUCTION

Climate change impact assessments can be used to direct monitoring and management towards the taxa that are expected to pose invasion risks or face population declines under climate change, and to inform spatial management and reserve design priorities. Assessments of species vulnerability to climate change have typically been based on species traits and/or quantitative models of physiology, demography, or distributions (Pacifici et al., 2015). For assessments of many taxa, correlative species distribution models (SDMs) have been (Guisan & Thuiller, 2005) and continue to be (Evans, Merow, Record, McMahon, & Enquist, 2016) the predominant tool used to evaluate broad-scale climate change impacts by projecting future distributions (Langham, Schuetz, Distler, Soykan, & Wilsey, 2015; Thuiller, Richardson et al., 2005). The ease of implementation and the ability to produce spatially explicit projections have led to the continued use of SDMs despite longstanding questions about their biological realism and reliability (Bahn & McGill, 2007; Pearson & Dawson, 2003). The integration of predictions from SDMs into conservation planning faces several challenges, including perceived relevance and reliability for conservation and management decision-making (Guisan et al., 2013; Sinclair, White, & Newell, 2010). An outstanding question is whether common summaries used to represent species vulnerability and risk provide a reliable basis for conservation prioritization.

Validation of model projections under climate change is inherently difficult because we do not know the future. Studies have used simulations (e.g., Elith & Graham, 2009), fossil records (e.g., Maguire et al., 2016) and long-term data or temporal partitioning (e.g., Araújo, Pearson, Thuiller, & Erhard, 2005; Brun, Kiørboe, Licandro, & Payne, 2016; Morán-Ordóñez, Lahoz-Monfort, Elith, & Wintle, 2017) to test model assumptions, identify reliable approaches, and validate predicted distributional patterns. Results of these validation studies have been mixed, with both optimistic and pessimistic findings regarding model transferability. However, model evaluations have largely focused on discrimination performance rather than on the metrics used to summarize relative vulnerability, primarily proportional range loss and change in range size. Overall discrimination may be an insufficient performance metric because accuracy at sites where range gain or loss occurred can be lower (perhaps much lower) than at consistently occupied or unoccupied locations (Rapacciuolo et al., 2012). Although climate-induced range shifts are increasingly observed (Chen, Hill, Ohlemüller, Roy, & Thomas, 2011) and, in turn, affect ecosystem function and services (Pecl et al., 2017), it remains unclear whether SDMs can anticipate these shifts and reliably sort species by their relative climate change vulnerability. If so, then the use of multispecies correlative SDMs in management planning would be supported. Indeed, consistent bias in projections of absolute range change may be tolerable provided that vulnerability rankings are consistent and reliable (Wright, Hijmans, Schwartz, & Shaffer, 2015), as these rankings would still identify species that should receive monitoring or management priority.

We used a long-term dataset of North American birds, the Breeding Bird Survey (BBS; Pardieck, Ziolkowski, & Hudson, 2015), to test the ability of distribution models to generate reliable rankings of species vulnerability and risk under climate change. We fitted ensembles of SDMs to historical presence–absence data and generated predicted occurrences in recent years based on c. 35 years of observed climate and land use change. We compared avian survey data in recent years with model predictions, focusing on a set of 512 routes that were consistently surveyed in both the historical and the recent period. We compared predicted and observed patterns of range loss and change in range size, and evaluated model accuracy under different methodological decisions. Our analyses focused on the following questions: (a) can models consistently identify the most vulnerable species; and (b) can models predict where range expansion and contraction are likely to occur?

2 METHODS

We summarized occurrence of passerine bird species at BBS routes in the conterminous U.S. during historical (1977–1979) and recent (2012–2014) periods. We used 3-year periods to minimize the effects of imperfect detection, such that only routes surveyed for all three years contributed absences for each time period. The selected years were chosen to provide a continental extent and a large number of routes that were surveyed in all years, while maintaining a long interval between historical and recent periods. We considered a species present on a c. 40 km roadside route if it was detected on at least one stop (out of 50 systematically arranged stops) in at least one of the 3 years. Validation routes (n = 512) were defined as those that were surveyed during all 6 years of the historical and recent periods (Supporting Information Figure S1) and were used to compare predicted and observed changes in species presence. Our analysis was limited to the 190 passerine species that were detected on at least 30 routes during the historical period and on at least 10 of the validation routes during either time period.
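The occurrence rules above can be illustrated with a minimal sketch. The `surveys` layout and all function names are hypothetical and are not taken from the published data release; the sketch only encodes the stated rules (a detection in any of the 3 years counts as a presence, an absence requires surveys in all 3 years, and validation routes require surveys in all 6 years).

```python
from typing import Dict, Optional, Set, Tuple

HISTORICAL = (1977, 1978, 1979)
RECENT = (2012, 2013, 2014)

# Hypothetical layout: surveys[(route_id, year)] holds the species detected on
# at least one stop of that route in that year. A missing key means the route
# was not surveyed in that year.
Surveys = Dict[Tuple[str, int], Set[str]]


def surveyed_all_years(surveys: Surveys, route: str, years: Tuple[int, ...]) -> bool:
    """True if the route has a survey record for every year in `years`."""
    return all((route, year) in surveys for year in years)


def occurrence(surveys: Surveys, route: str, species: str,
               years: Tuple[int, ...]) -> Optional[bool]:
    """Presence (True), absence (False), or None if the route cannot
    contribute an absence because it was not surveyed in every year."""
    detected = any(species in surveys.get((route, year), set()) for year in years)
    if detected:
        return True
    if surveyed_all_years(surveys, route, years):
        return False
    return None


def validation_routes(surveys: Surveys) -> Set[str]:
    """Routes surveyed in all six years of the historical and recent periods."""
    routes = {route for route, _ in surveys}
    return {r for r in routes if surveyed_all_years(surveys, r, HISTORICAL + RECENT)}
```

Returning `None` rather than `False` for incompletely surveyed routes keeps unusable records distinct from true absences.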

We evaluated how model accuracy was affected by methodological decisions regarding the selection of climate covariates, the inclusion of land use covariates, and the spatial extent at which species absences were included in model estimation. We developed three groups of climate covariates based on annual, quarterly or all bioclimatic variables, and estimated models with only climate covariates and with both climate and land use covariates (Supporting Information Table S1). We extracted climate and land use covariates within allometrically sized buffers surrounding each route centroid (see Supporting Information for details of these and other methods and for comprehensive results). Bioclimatic covariates were based on PRISM data (PRISM Climate Group, 2014), using a 4-year time period corresponding to each set of avian observations and the prior year, to capture potential lag effects. Land use covariates were the proportion of the buffer surrounding each route in developed and conservation or low human use classes based on the 1974 and 2012 versions of the U.S. conterminous wall-to-wall anthropogenic land use trends dataset (NWALT; Falcone, 2015). The NWALT dataset was selected because it was designed to estimate land use change. For each covariate set, we fitted models based on two versions of the response dataset; both versions included all routes where a given species was present historically. In one case, absences from routes across the conterminous U.S. were included, and in the other case, absences were geographically restricted to routes within 600 km of a historically occupied route. Data are available online (https://doi.org/10.5066/F7NS0S4R).

We estimated ensembles of distribution models for each species based on eight correlative modelling methods suitable for presence–absence data using the biomod2 (Thuiller, Georges, Engler, & Breiner, 2016) package in R (R Core Team, 2016). These correlative modelling approaches were regression- or tree-based methods, including those found to perform well (Elith et al., 2006); MaxEnt was not included because we had presence–absence data, rather than only presences. Models were fitted with historical data and predicted to historical and recent years; we developed an ensemble by weighting predictions proportionally by the area under the receiver operating characteristic curve (AUC; Marmion, Parviainen, Luoto, Heikkinen, & Thuiller, 2009). Continuous predictions were thresholded to binary predictions by maximizing sensitivity plus specificity in cross-validation. To evaluate whether poor calibration might underlie problems with predicting relative vulnerability, we also thresholded continuous predictions using the optimal cut-off point for each period (which would not be known in practice when predicting to the future).
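As a rough illustration of the two steps described above, the following sketch computes an AUC-weighted mean of model predictions and selects the cut-off that maximizes sensitivity plus specificity. It is written in plain Python and does not reproduce biomod2 internals. The function names are hypothetical, and the threshold search here uses a single set of observations, whereas the study chose thresholds in cross-validation.

```python
from typing import List, Sequence


def weighted_ensemble(predictions: Sequence[Sequence[float]],
                      aucs: Sequence[float]) -> List[float]:
    """predictions[m][i] is the prediction of model m at site i.
    Each model is weighted in proportion to its AUC."""
    total = sum(aucs)
    n_sites = len(predictions[0])
    return [sum(a * p[i] for a, p in zip(aucs, predictions)) / total
            for i in range(n_sites)]


def best_threshold(probs: Sequence[float], observed: Sequence[bool]) -> float:
    """Cut-off maximizing sensitivity + specificity, where a site is
    predicted present if its probability is >= the cut-off."""
    n_pos = sum(1 for o in observed if o)
    n_neg = len(observed) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one presence and one absence")
    best_cut, best_score = 0.0, -1.0
    for cut in sorted(set(probs)):
        tp = sum(1 for p, o in zip(probs, observed) if o and p >= cut)
        tn = sum(1 for p, o in zip(probs, observed) if not o and p < cut)
        score = tp / n_pos + tn / n_neg
        if score > best_score:  # ties keep the lowest cut-off
            best_cut, best_score = cut, score
    return best_cut
```

Candidate cut-offs are limited to the observed prediction values, which is sufficient because the confusion matrix only changes at those values.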

Species vulnerability was quantified as the proportional range loss and change in range size (i.e., change in the number of occupied routes, considering both loss and gain), which were calculated based on the 512 validation routes. These metrics of vulnerability are commonly used for assessing species-level climate change impacts (Broennimann et al., 2006; Buisson, Thuiller, Casajus, Lek, & Grenouillet, 2010; Erasmus, Van Jaarsveld, Chown, Kshatriya, & Wessels, 2002; Thuiller, Lavorel, Araújo, Sykes, & Prentice, 2005). Observed change in range size was correlated with BBS estimates of species population trends (Sauer et al., 2014), indicating that this metric captured observed population trajectories (Supporting Information Figure S2).
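The two vulnerability metrics can be sketched as simple set operations on occupied validation routes. The function names are illustrative only, and the sketch reports change in range size as a raw count of routes; any further scaling (for example, relative to the historical count) would be an additional assumption.

```python
from typing import Set


def proportional_range_loss(historical: Set[str], recent: Set[str]) -> float:
    """Fraction of historically occupied routes that are no longer occupied."""
    if not historical:
        raise ValueError("species has no historically occupied routes")
    return len(historical - recent) / len(historical)


def change_in_range_size(historical: Set[str], recent: Set[str]) -> int:
    """Net change in the number of occupied routes (gains minus losses)."""
    return len(recent) - len(historical)
```

The same functions apply to observed and to predicted occupancy, so the two can be compared on an equal footing.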

For each set of methodological decisions (i.e., estimation area, climate covariates, and inclusion of land use; Supporting Information Table S1), we ranked predictions of relative vulnerability among species. To compare performance among model ensembles based on these different decisions, we calculated the absolute value of the difference between predicted and observed range loss for each species and each set of model predictions, and fitted generalized linear models to evaluate the effects of different methodological decisions (see Supporting Information; Methods). Likewise, we modelled how methodological choices affected the magnitude of the difference between the predicted and observed change in range size.
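A minimal sketch of the ranking comparison follows. Ranking with rank 1 as the most vulnerable species matches the text, but the use of a Spearman-type rank correlation as the summary of agreement is an assumption made here for illustration, not a statistic reported in the paper. All names are hypothetical.

```python
import math
from typing import Dict, List


def vulnerability_ranks(scores: Dict[str, float]) -> Dict[str, float]:
    """Rank 1 = largest score (most vulnerable); ties share the average rank.
    For change in range size, pass the negated change so that the greatest
    decline receives rank 1."""
    ordered = sorted(scores, key=lambda s: -scores[s])
    ranks: Dict[str, float] = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[ordered[k]] = average
        i = j + 1
    return ranks


def pearson(x: List[float], y: List[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when all ranks are equal")
    return sxy / math.sqrt(sxx * syy)


def rank_agreement(predicted: Dict[str, float], observed: Dict[str, float]) -> float:
    """Correlation between predicted and observed vulnerability ranks
    (assumed summary statistic; +1 means identical rankings)."""
    species = sorted(predicted)
    pr, ob = vulnerability_ranks(predicted), vulnerability_ranks(observed)
    return pearson([pr[s] for s in species], [ob[s] for s in species])


def absolute_error(predicted: Dict[str, float], observed: Dict[str, float]) -> Dict[str, float]:
    """Per-species |predicted - observed|, the response analysed in the text."""
    return {s: abs(predicted[s] - observed[s]) for s in predicted}
```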

Species distribution models provide spatially explicit predictions of habitat suitability, which can be used to inform spatial conservation planning decisions. We evaluated whether model ensembles could identify the locations where range gain and loss occurred. For each species, we selected the set of survey routes that were observed to be historically occupied and which had the lowest predicted suitability in the recent period; across species, we then tested whether more of these routes showed range contraction, rather than stable occupancy (see Supporting Information; Methods). To evaluate whether the predicted direction and magnitude of range shifts aligned qualitatively with observations, we compared the geographical coordinates of the centre of occupied validation routes for observed and predicted range shifts between the historical and recent time periods. We also visualized predictive ability relative to multivariate climatic dissimilarity (Mahalanobis distance) from historically occupied routes; this was used to evaluate whether prediction was most difficult at climatically marginal versus climatically distant survey routes.
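The comparison of range centres can be sketched as follows, assuming that the centre is the arithmetic mean of the coordinates of occupied validation routes. The paper does not specify the projection or averaging method, so this is an illustrative simplification with hypothetical names.

```python
import math
from typing import Dict, Set, Tuple

Coord = Tuple[float, float]  # (x, y), e.g. projected easting and northing


def centre(coords: Dict[str, Coord], occupied: Set[str]) -> Coord:
    """Mean coordinate of the occupied routes (assumed definition of centre)."""
    if not occupied:
        raise ValueError("no occupied routes")
    n = len(occupied)
    return (sum(coords[r][0] for r in occupied) / n,
            sum(coords[r][1] for r in occupied) / n)


def range_shift(coords: Dict[str, Coord], historical: Set[str],
                recent: Set[str]) -> Coord:
    """Vector from the historical centre to the recent centre."""
    x0, y0 = centre(coords, historical)
    x1, y1 = centre(coords, recent)
    return (x1 - x0, y1 - y0)


def shift_distance(shift: Coord) -> float:
    """Length of a shift vector, in the units of the coordinates."""
    return math.hypot(shift[0], shift[1])
```

Computing `range_shift` once from observed occupancy and once from predicted occupancy gives the paired vectors whose direction and length can then be compared.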

Finally, we evaluated whether variation in model performance for range loss and range change could be explained by variation in four species attributes: the size of each species' breeding and resident range in the conterminous U.S.A. (in square kilometres); the proportion of the total breeding and resident range located within the conterminous U.S.A.; estimated long-term population trend based on BBS data (Sauer et al., 2014); and migratory strategy (neotropical migrant, short-distance migrant, or resident). We predicted that model performance would increase with a larger proportion of the species' range within the conterminous U.S.A. because estimation would encompass more of the species' occupied environmental conditions. We predicted better performance for resident species compared with migrants, because migrants can be limited by processes during non-breeding periods that would not be captured by breeding area land use or climate change. Predictions regarding total range size and population trend were less straightforward; for example, discrimination can be poor for wide-ranging generalist species (Evangelista et al., 2008; Hernandez, Graham, Master, & Albert, 2006), but conversely, metrics of range loss and range change may be noisy for species with few validation routes, and the risk of overfitting may increase for species with restricted ranges (Thorne et al., 2013).

3 RESULTS

Predicted range loss and change in range size were highly sensitive to modelling decisions regarding the selection of covariates and the geographical extent from which absences were drawn (Supporting Information Figure S3). Predictions from model ensembles based on a covariate set that included annual climate and land use, and a model estimation extent that included absences from routes across the entire conterminous U.S.A., showed the smallest differences between observed and predicted vulnerability across all species (see Supporting Information for model performance comparison: Figures S4 and S5; Table S1). Based on this model set, we explored the ability of the ensemble to predict relative vulnerability among species and to identify the routes where range loss and gain occurred.

Species distribution models showed a striking inability to identify which species were observed to have the greatest range loss or change in range size between the historical and recent periods (Figure 1). Nevertheless, the performance metrics typically used to validate SDMs yielded a favourable view of model performance and transferability. Model ensembles generally showed excellent discrimination ability, even when using temporally independent validation routes to estimate model performance (median AUC for predictions to recent years was 0.96 for focal model set; Supporting Information Figures S5 and S6). Discrimination performance during the recent period remained high when excluding distant absences that could lead to overly optimistic performance assessments (Supporting Information Figure S6; median AUC = 0.92).

Figure 1. Model predictions of range loss and change in range size were not consistently related to observed patterns of loss and change in range size across species. Each point represents a species, with point sizes scaled according to the observed number of occupied validation routes during the historical period. A positive correlation would be seen in each panel if models correctly identified the most vulnerable species. Species ranks were calculated such that one represented the most vulnerable species (i.e., greatest proportional range loss, greatest relative decline in range size). Results are shown for models estimated using absences from the entire conterminous U.S.A., with the set of annual climate covariates and land use.

Model ensembles did not capture relative vulnerability among species because of systematic failures to predict observed changes in occupancy correctly. Specifically, binary predictions for the recent period were usually correct for routes that were always or never occupied by a given species, but were usually wrong for routes where a gain or loss of occupancy was observed (Figure 2). Binary predictions often indicated that a focal species would always or never occur at locations where gain or loss was observed (Figure 2a). Model performance evaluations focused on predictions to geographically restricted validation routes (see Supporting Information Methods; model estimation was based on the conterminous U.S.A.) to ensure that performance metrics were not inflated by distant routes with true absences (Lobo, Jiménez-Valverde, & Real, 2008). Poor model calibration also could not fully explain these differences, because applying the optimal threshold for each period improved predictions for the most prevalent species but did not align overall predicted and observed vulnerability ranks (Supporting Information Figure S7).
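The contrast described above, between routes with stable occupancy and routes where occupancy changed, can be made concrete with a small sketch. It groups routes by observed transition and reports the share of recent-period binary predictions that were correct in each group. The names and data layout are hypothetical, and judging correctness on the recent-period prediction alone is a simplifying assumption.

```python
from typing import Dict


def transition(historical: bool, recent: bool) -> str:
    """Observed occupancy category for one species at one route."""
    if historical and recent:
        return "always"
    if historical:
        return "lost"
    if recent:
        return "gained"
    return "never"


def accuracy_by_category(obs_historical: Dict[str, bool],
                         obs_recent: Dict[str, bool],
                         pred_recent: Dict[str, bool]) -> Dict[str, float]:
    """Proportion of routes in each observed category whose recent-period
    binary prediction matched the recent observation."""
    counts: Dict[str, int] = {}
    correct: Dict[str, int] = {}
    for route, was_present in obs_historical.items():
        category = transition(was_present, obs_recent[route])
        counts[category] = counts.get(category, 0) + 1
        if pred_recent[route] == obs_recent[route]:
            correct[category] = correct.get(category, 0) + 1
    return {c: correct.get(c, 0) / n for c, n in counts.items()}
```

A model that simply repeats historical occupancy scores perfectly on the "always" and "never" categories and zero on "gained" and "lost", which is how high overall discrimination can coexist with poor prediction of range change.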

Figure 2. Distribution model predictions were less accurate for routes where a species was observed to have gained or lost occupancy, compared with routes where a species was always or never present. (a) Across all species and routes, those that were gained or lost were often predicted to be always or never occupied. (b) Boxplots showing variation among species in the mean change in continuous predictions for routes that were gained, lost, always or never occupied. (c) Boxplots showing variation among species in the proportion of routes in each observed category that were correctly classified. All summaries are based only on predictions to validation routes within the geographically restricted area, so distant routes that were never occupied were not included. Model estimation included absences from the conterminous U.S.A.

Of each species' historically occupied routes, those with the lowest predicted suitability in recent years were no more likely to be lost than to remain occupied; likewise, the routes predicted to be most suitable among observed historically unoccupied routes were no more likely to be gained than to remain unoccupied (see Supporting Information Results). Failure to identify correctly those locations where species experienced range contraction or expansion led to unreliable predictions of the direction and magnitude of shifts in the centre of each species' occupied range (Figure 3; Supporting Information Figure S8). Model predictions were often incorrect for routes at intermediate climatic distances to a species' historically occupied range, whereas routes with climate distances ‘far’ from the historically occupied range were often correctly predicted to be consistently unoccupied (Figure 4). Distribution model performance for both range loss and range change was lower for species with smaller range sizes, and declining species also showed lower performance for range loss (Supporting Information Figure S11; Supporting Information Results).

Figure 3. The direction and distance of predicted range shifts (gold arrows) did not align well with the observed change in the geographical centre of occupied validation routes (blue arrows). The origin of each arrow set represents the geographical centre of occupied validation routes during the historical period for a particular species. Results are based on predictions to the 512 validation routes, from models estimated from data throughout the conterminous U.S.A. with the covariate set including annual climate and land use. See Supporting Information Figure S8 for results with geographically restricted estimation and validation and all covariate sets.

Figure 4. Changes in continuous ensemble predictions at validation routes relative to changes in climatic Mahalanobis distance between the historical (start of arrow) and recent (arrowhead) time periods for nine example species. Purple arrows correspond to routes at which occupancy was correctly predicted during both time periods; orange arrows correspond to routes at which binary predictions were wrong during one or both periods. Horizontal lines correspond to the cut-off used to threshold predictions. Selected species differed in predicted and observed vulnerability (Supporting Information Figure S9). A common pattern was for model predictions to be less accurate at intermediate Mahalanobis distance for each species, which often included routes at which range expansion or contraction occurred (Supporting Information Figure S10). Results are shown for a model set that included only climate covariates.

4 DISCUSSION

Species distribution models have been a primary tool in multispecies climate change impact assessments that aim to identify taxa in need of monitoring and management attention (Langham et al., 2015; Thuiller, Richardson et al., 2005). Our results were consistent with previous findings that SDMs have good to excellent discrimination ability in new time periods (Araújo et al., 2005; Morán-Ordóñez et al., 2017), but highlight a striking inability to translate this performance into reliable predictions of relative range expansion and contraction among species (Figure 1). Prediction was most difficult in locations where range contraction or expansion was observed (Figure 2), yet these areas of environmental and geographical space are of great interest for management and conservation under climate change. The finding that model ensembles could not identify the survey routes most likely to be gained or lost (see Supporting Information Methods and Results) suggests that output from correlative SDMs based on generic covariates across species may have limited utility for the identification of vulnerable species and the incorporation of climate projections into spatially explicit conservation planning.

The failure of SDMs to predict vulnerability could reflect one of the major assumptions inherent in distribution modelling, which is that species ranges are limited by climatic and other variables included in the model covariate set (Dormann, 2007). Models that are not tailored to a species of interest may capture correlates of range limits, rather than factors truly limiting distributions, and can make qualitatively different predictions from models based on species-specific expertise (Thorne et al., 2013). The validation of predictions of range change based on higher quality models for individual species remains necessary. Multispecies modelling exercises may often fail to identify which species will show substantial range shifts and where such shifts will occur (Scherrer, Massy, Meier, Vittoz, & Guisan, 2017). Unfortunately, it is this information that would be most useful for informing prioritization decisions surrounding the allocation of resources among species and locations (Sinclair et al., 2010). We found that realistic modelling strategies expected to improve performance across species, such as including land use covariates, were insufficient to align predicted and observed rankings of species by range loss and change in range size.

Our results suggest that predictive errors did not arise simply from extrapolation in climate space, and that neither range expansion nor contraction was predicted more reliably than the other. We found that model ensembles often struggled to predict occupancy correctly at routes that were intermediate in multivariate climatic distance from each species' historically occupied routes (Figure 4; see Supporting Information Methods); the routes at intermediate climatic distances included many of those where a change in occupancy was observed (Supporting Information Figure S10). This pattern suggests that errors largely arose not from extrapolation, but rather in potentially marginal conditions, where a complex interplay between abiotic and biotic factors, phenotypic plasticity, and dispersal patterns shapes observed occupancy. The inability of our models to identify where range expansion and contraction occurred also translated into an inability to predict the overall magnitude and direction of range shifts (Figure 3). Previous work has suggested that models may overestimate vulnerability and be more accurate for predicting range gain than range loss (Schwartz, 2012), but our results showed poor performance at predicting both expansion and contraction (Figure 2). Models had lower performance for declining species and those with smaller ranges (Supporting Information Figure S11). Our work also supports the importance of model calibration and suggests potential interactions between calibration and prevalence, because using the optimal calibration for each time period improved rankings of range change for common species (Supporting Information Figure S7).

The lack of agreement between observed and predicted vulnerability is of particular concern because multiple attributes of our study should have led to an optimistic view of the utility of correlative SDMs, including: (a) basing our work on high-quality presence–absence data collected via standardized methods, which is generally more accurate than basing models on presence-only data (Boitani et al., 2011; Brotons, Thuiller, Araújo, & Hirzel, 2004); (b) predicting presence–absence, which can be easier than predicting abundance (Bahn & McGill, 2013); (c) excluding species with < 30 historical presence records (Wisz et al., 2008); (d) focusing on a taxonomic group with high dispersal capacity, which should have reduced the time-lag in species responses and enhanced alignment between predictions and observations (Zurell, Jeltsch, Dormann, & Schröder, 2009), although lags also reflect avian responses to vegetative communities (Yackulic & Ginsberg, 2016); (e) basing models on land use and climate data from the historical and recent periods, thereby avoiding the uncertainties associated with future projections (Northrop & Chandler, 2014); and (f) including validation routes in model estimation, so that validation was in time only, rather than in both space and time. Each of these characteristics should have favoured reliable model predictions.

Our results highlight a striking incongruity between models' excellent discrimination ability and poor capacity to predict range change. Models both failed to anticipate observed changes and predicted change where none was observed. We designed our methods to reflect approaches often used in multi-taxa studies, although we summarized climate data over 4 years, rather than the typical 30-year period. The implications of this decision could be explored further, but standard performance criteria indicated that continental climatic patterns during those 4 years did allow models to discriminate between species' presences and absences. Our models appeared to have high transferability, because discrimination performance remained high during the validation time period. However, the low reliability of predicted range expansion and contraction over a 35-year historical period should caution against the uncritical use of SDMs to quantify relative risk and vulnerability among species, both in the context of ongoing climate change and for risk assessment among potential invasive species.

We suggest that ecological modellers and conservation practitioners must re-evaluate how best to approach multispecies vulnerability and risk assessments under climate and land use change. In particular, more attention must be given to performance assessment designs that capture accuracy and bias along the dimensions of intended use (Araújo & Guisan, 2006; Loiselle et al., 2003; Rapacciuolo et al., 2012). Studies aimed at validating SDMs in the context of risk and vulnerability should focus explicitly on the quality of predictions at locations where occupancy status changed. The type of validation we apply here should be used to evaluate more advanced modelling methods, including those that account for spatial autocorrelation (Dormann et al., 2007) and patterns of species co-occurrence (Kissling et al., 2012). Dynamic occupancy models explicitly estimate colonization and extinction and could improve predictions compared with static SDMs (Naujokaitis-Lewis & Fortin, 2016; Yackulic, Nichols, Reid, & Der, 2015). In addition, trait-based approaches may be useful for identifying vulnerable species (Pearson et al., 2014) and, when available, data on spatio-temporal variation in demography could improve predictions of range change (Merow et al., 2014). Future work should evaluate whether life history and natural history traits can reliably predict species' sensitivity to climate and other environmental factors, because models that integrate these relationships can derive spatial predictions of distributional change (Brown et al., 2014; Evans et al., 2016). It will be crucial to validate whether existing and emerging modelling methods effectively identify both the species that should receive conservation priority and the locations where range contraction and expansion are anticipated.

Identification of the taxa that are most sensitive to climate change, in terms of either population declines or invasion risk, should increase both the efficiency and the efficacy of conservation and management. Our study highlights how one of the primary tools used in climate change impact assessments could not reliably predict relative risk and vulnerability to distributional changes. Correlative SDMs have been criticized and questioned (Evans et al., 2016; McMahon et al., 2011; Pearson & Dawson, 2003), but may be considered the only available or practical tools for deriving spatial information about potential future distributions of many taxa, particularly those for which only presence information is available. The hope for these models is that they can provide actionable information that can complement other sources of information available to decision-makers. Our work raises the question of whether these tools effectively inform priority setting and whether species risk and vulnerability ranks derived from correlative SDMs can improve conservation and management planning under climate change. Indeed, models showed poor performance in precisely the areas of environmental and geographical space that would be of greatest interest: those areas where range contraction or expansion occurred.

ACKNOWLEDGMENTS

This work was funded by a U.S. Geological Survey Mendenhall Postdoctoral Fellowship to H.R.S. Any use of trade, firm or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government. We thank the many coordinators and volunteers who have participated in the Breeding Bird Survey since its establishment. Our manuscript was improved by comments from J. Hoeting, C. Yackulic and two anonymous referees.

DATA ACCESSIBILITY

Data are publicly available via ScienceBase, with the following DOI: https://doi.org/10.5066/F7NS0S4R.

BIOSKETCHES

Helen R. Sofaer is a Mendenhall Postdoctoral Fellow with the U.S. Geological Survey Fort Collins Science Center. A major focus of her research is understanding organisms' responses to ongoing global change.

Catherine S. Jarnevich is a Research Ecologist with the U.S. Geological Survey Fort Collins Science Center. Her research focuses on understanding patterns of invasion from local to global scales.

Curtis H. Flather is a Research Ecologist with the U.S. Department of Agriculture Forest Service at the Rocky Mountain Research Station. His research is focused on understanding the response of biodiversity to changing climate, land use, natural disturbance and land management activities to support resource planning activities within the agency.
