Volume 30, Issue 9 e13906
RESEARCH ARTICLE
Open Access

Predictor importance in habitat suitability models for invasive terrestrial plants

Demetra A. Williams

Corresponding Author

Student Contractor to the U.S. Geological Survey, Fort Collins Science Center, Fort Collins, CO, USA

Correspondence

Demetra A. Williams, U.S. Geological Survey, Fort Collins Science Center, 2150 Centre Ave Bldg C, Fort Collins, CO 80526, USA.

Keana S. Shadwell

Student Contractor to the U.S. Geological Survey, Fort Collins Science Center, Fort Collins, CO, USA

Ian S. Pearse

U.S. Geological Survey, Fort Collins Science Center, Fort Collins, CO, USA

Janet S. Prevéy

U.S. Geological Survey, Fort Collins Science Center, Fort Collins, CO, USA

Peder Engelstad

Graduate Degree Program in Ecology, Colorado State University in Cooperation with the U.S. Geological Survey, Fort Collins Science Center, Fort Collins, CO, USA

Grace C. Henderson

Student Contractor to the U.S. Geological Survey, Fort Collins Science Center, Fort Collins, CO, USA

Catherine S. Jarnevich

U.S. Geological Survey, Fort Collins Science Center, Fort Collins, CO, USA

First published: 22 July 2024
Citations: 6

Demetra A. Williams and Keana S. Shadwell should be considered joint first authors.

Editor: Fernanda Thiesen Brum

Abstract

Aim

Due to the socioeconomic and environmental damages caused by invasive species, predicting the distribution of invasive plants is fundamental for effectively targeting management efforts. A habitat suitability model (HSM) is a powerful tool to predict potential habitat of invasive species to help guide the early detection of invasive plants. Despite numerous studies of the predictors used in HSMs, there is little consensus about the most appropriate predictors to use in creating ecologically realistic predictions from HSMs.

Location

The contiguous United States.

Methods

We explore existing HSMs for 220 invasive terrestrial plant species, all constructed with consistent modelling algorithms, background generation methods, predictor resolution, and geographic extent, and calculate the relative importance of predictors for each species. We sort predictors into eight groups (topography, temperature, disturbance, atmospheric water, landscape water, substrate, biotic interaction, and radiation) and compare the importance of predictor groups by plant lifeform and phylogenetic relatedness.

Results

Human modification and minimum winter temperature were generally the two highest performing individual predictors across the species studied. The highest-performing predictor groups were disturbance, temperature, and atmospheric water. Across lifeforms, there were minimal differences in the influences of predictor groups, although woody plant models exhibited the largest differences in predictor importance when compared with non-woody plant models. Additionally, we found no significant relationship between the importance of predictor groups and phylogenetic relatedness.

Main Conclusions

This study has implications for informing predictor selection in invasive plant HSMs, leading to more reliable and accurate models of invasive terrestrial plants. Our results emphasize the need to critically select the predictors included in HSMs, with special consideration given to temperature and disturbance predictors, to accurately predict invasive plant habitat in support of detection and response efforts. With more accurate predictions, managers will be better prepared to address invasive species and reduce their threats to landscapes.

1 INTRODUCTION

The habitat requirements and geographic distributions of species are of fundamental interest to both basic and applied ecologists. The geographic distributions of species are driven by a myriad of environmental and biological dimensions, which ecologists define as a species' niche (Hutchinson, 1957). Plant species' niches are known to be defined by a multitude of factors such as climate, soil nutrients, radiation, disturbance, and biotic interactions (Mod et al., 2016). Over the previous decades, fuelled by greater computational power and improved algorithms, correlative habitat suitability models (HSMs; also referred to as species distribution models or environmental niche models) have become some of the most powerful tools for describing and predicting habitat requirements and geographic distributions of species (Anderson et al., 2011; Guisan et al., 2017). These tools establish correlative relationships between a species' occurrence and environmental conditions in the location of occurrence and may be used to predict the suitability of environments in locations that have not been surveyed or where a species has not yet become established (Guisan et al., 2017). The use of HSMs has increased dramatically in recent years, and HSMs are routinely created for numerous species to better understand conservation concerns (e.g., where there are hotspots of rare species), invasion risk, and the effects of climate change (Chai et al., 2016; Coulter et al., 2022; Fyllas et al., 2022).

Many of the real-world scenarios in which HSMs are used (e.g., defining the habitat of an endangered species or targeting efforts in invasive species control) require substantial model credibility (Guisan et al., 2013). There is likely a tradeoff between the ability to model suitable habitat for a great number of species versus the ability to establish an accurate and credible distribution model for any one species (Sofaer et al., 2019). Part of this tradeoff can be attributed to the thoroughness with which the habitat suitability for a species is statistically described. Past work has demonstrated that the environmental data used to model habitat suitability are fundamental for model accuracy, and that the most influential predictors (also referred to as environmental predictors, environmental predictor variables, or covariates) vary among taxa (Bradie & Leung, 2017). However, handpicking the predictors that will be initially offered to individual species' HSMs is time-consuming, prone to human bias, and not feasible when creating HSMs for a multitude of species.

HSMs are of particular interest to invasion biologists and land managers for providing an understanding of invasive species' potential to establish in areas where they have not yet been documented. Understanding the potential spread of invasive species is of interest due to the immense economic and ecological damages they cause (Rai & Singh, 2020; Turbelin et al., 2023). With this understanding, managers can better target efforts to minimize the spread of invasive plants and reduce their effects. Currently, there are large-scale efforts to create HSMs for invasive plant species of interest to resource managers in the United States, and these models have guided the early detection of invasive plant populations, aided in the creation of watch lists for emerging invaders, and informed response efforts (Cook et al., 2019; Engelstad et al., 2022; Jarnevich, Engelstad, et al., 2023). As with any model, the accuracy and reliability of HSMs for invasive plant species are limited by the quality and relevance of their input data. Thus, understanding the relative importance of including certain predictors is essential for producing more accurate and reliable HSMs, and therefore improving invasive species detection efforts (Bradie & Leung, 2017). Invasive plant species tend to have generalist life histories and occupy a broader range of suitable environmental states than native species (Evangelista et al., 2008). However, invasive species also have diverse traits and can occupy distinct habitat types, indicating that some predictors may be more important than others in shaping their distributions.

Many HSM studies report metrics that reflect the impact that individual predictors (i.e., predictor importance metrics) have on their models, but there is considerable variation in the metrics reported due to differences in modelling algorithms, taxa, spatial resolution, study area extent, importance metrics considered, and predictors (Bucklin et al., 2015; Chauvier et al., 2021; Guevara et al., 2018; Roe et al., 2022). Due to this variability, literature attempting to understand the contributions of predictors across studies remains limited. For example, Bradie and Leung (2017) examined predictor importance for 227 HSM studies across multiple taxa but were limited to studies using the MaxEnt algorithm (Phillips et al., 2017) due to its prevalence and accessibility of predictor importance metrics. HSM researchers largely agree that climatic predictors are among the most influential predictors for these models (Bradie & Leung, 2017; Bucklin et al., 2015; Chauvier et al., 2021). However, others contend that models can be improved by including more edaphic predictors (Walthert & Meier, 2017), land use predictors (Fournier et al., 2017), or other predictors that are physiologically relevant to the modelled species (Austin, 2002; Gardner et al., 2019; Guevara et al., 2018; Mod et al., 2016). The number of predictors and diversity of predictor groups (e.g., predictors related to water, temperature, or disturbance) offered to models have also varied, and while some argue for maximizing the number of predictors offered to certain algorithms (such as Bradie and Leung (2017) in MaxEnt algorithms), others recommend the use of predictors across more categories (Mod et al., 2016), and some are inclined to offer fewer predictors to models for the sake of parsimony (Bucklin et al., 2015; Mod et al., 2016). 
Clearly, there is no consensus regarding the number of predictors and the predictor types that should be initially considered for use in invasive plant habitat suitability models. It is generally recommended that modellers tailor predictors to the species being modelled by considering species habitat and natural history (Sofaer et al., 2019). However, the predictors initially offered for inclusion prior to the predictor selection step of modelling could be better delineated through a more thorough understanding of predictor importance in these models. This could reduce the time modellers spend obtaining predictors by prioritizing the predictor groups that could be more important to offer prior to the predictor selection process.

If the importance of predictors varies considerably among species, it is possible that this variation is predictable based on species traits. Differing plant lifeforms exhibit different functional traits and therefore may require different environmental conditions (Cheng et al., 2022). For example, short-lived non-native grasses and forbs were more abundant following wildfires in the western United States, whereas non-native shrubs and trees did not benefit from fire, indicating that disturbance predictors such as burn frequency may differentially influence habitat suitability for different non-native lifeforms (Prevéy et al., 2024). An additional consideration is that phylogenetic niche conservatism, i.e., the tendency of closely related species to have similar environmental requirements (Elliott & Davies, 2017; Warren et al., 2008), may result in more similar sets of predictors being useful in describing habitat in HSMs when species are closely related. If there are patterns in predictor importance based on lifeform or phylogeny, this information could potentially be used to improve predictor selection criteria for non-native species invading new regions (Liu et al., 2022; McCune et al., 2020; Morales-Castilla et al., 2017; Synes & Osborne, 2011).

In this study, we explore predictor importance in 220 previously created HSMs for invasive terrestrial plants within the contiguous United States, offering consistency in modelling algorithms, methods, taxa, and objectives to the predictor conversation. These HSMs used a consistent modelling methodology (e.g., modelling workflow, modelling algorithms, and model validation procedures), allowing us to compare which predictors tend to be more important across species and species groups for this set of models. Our main objectives are threefold: (1) identify the top performing predictors across all species and within each plant lifeform group, (2) identify the groups of predictors that are most important for each plant lifeform, and (3) discern whether there are patterns of predictor importance based on species' phylogenetic relatedness.

We hypothesize that disturbance predictors will be more important predictors for the distribution of non-native short-lived plant species because these fast-growing species may be able to establish quickly following disturbances (Montesinos, 2022). Precipitation predictors may also be important for these species, since gradients in precipitation can strongly influence native plant competition with invasive annuals (Chambers et al., 2007). We also predict that climate and soil predictors will be the more important predictors for long-lived plant species. Cold temperatures and precipitation are well known determinants of woody plant distributions (Boucher-Lalonde et al., 2012; Lembrechts, Lenoir, et al., 2019), and longer-lived, deeper-rooted lifeforms, such as shrubs and trees, may be more limited by suitable soil conditions than short-lived herbaceous species (Walthert & Meier, 2017). Longer-lived woody species are also more likely to persist for longer periods in specific habitat types and thus the relationships between these environmental variations and the species may be more detectable through habitat suitability modelling (Hanspach et al., 2010; Syphard & Franklin, 2010). Finally, we predict that HSMs for more closely related non-native species will exhibit more similar sets of important predictors (Morales-Castilla et al., 2017).

2 METHODS

2.1 Modelling process

All data for this project were obtained from 220 correlative HSMs of invasive terrestrial plant species (Jarnevich, LaRoe, et al., 2023). We provide a brief explanation of the modelling process implemented by Jarnevich, LaRoe, et al. (2023) relevant to our analyses in this section as well as in Section 2.2, but for specific details on the modelling process, see Young et al. (2020), Jarnevich, Engelstad, et al. (2023), and Appendix S3. All models were created using consistent methodology, allowing robust comparisons across species' models, which used five algorithms: boosted regression trees (BRT; Elith et al., 2008), generalized linear models (GLM; McCullagh & Nelder, 1989), multivariate adaptive regression splines (MARS; Elith & Leathwick, 2007), maximum entropy models (MaxEnt; Phillips et al., 2017), and random forests (RF; Breiman, 2001). Species modelled were limited to non-native terrestrial plants that are of concern to land managers and believed to be important to monitor and control. Based on the duration and growth habit designated by the USDA Plants Database, we assigned each species to a lifeform-duration class, hereafter referred to as “lifeform” (USDA NRCS, 2019). These classes included tree, shrub, vine, long-lived forb, short-lived forb, long-lived graminoid, and short-lived graminoid (for details see Appendix S3).

2.2 Predictors and predictor importance

All models started with a suite of 53 predictors (Appendix S1), representing atmospheric water, biotic interactions, disturbance, landscape water, solar radiation, substrate, temperature, and topography, at the extent of the contiguous United States (Table 1 and final column of Appendix S1). Including a wide range of predictors – both climatic and non-climatic – to characterize habitat in invasive plant HSMs has been established in HSM literature as a best practice (Chauvier et al., 2021; Dubuis et al., 2013; Gardner et al., 2019; Guevara et al., 2018). Jarnevich, LaRoe, et al. (2023) generated this predictor suite with that consideration in mind. During the predictor selection process, Jarnevich, LaRoe, et al. (2023) selected predictors for each species' model based on the species' natural history (e.g., winter annual) and the habitat of the invaded region. They removed one of any pair of predictors with a correlation coefficient >.7 (maximum of Pearson, Spearman, or Kendall; Dormann et al., 2013) to limit covariation. They also removed predictors that did not make ecological sense for a particular species (e.g., tree cover when modelling tree species, as it can act as a confounding variable; Bradley et al., 2012). Therefore, each species had a different combination of predictors used for model fitting. Jarnevich, LaRoe, et al. (2023) also ensured a ratio of at least 10:1 of occurrence points to predictors (Hosmer & Lemeshow, 2000).
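The correlation screen and the 10:1 occurrence-to-predictor rule described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the workflow of Jarnevich, LaRoe, et al. (2023): the function names, the toy predictor values, and the greedy rule of keeping the earlier-listed member of a correlated pair are all hypothetical (in the source, the choice within a pair was informed by species' natural history).

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall(x, y):
    """Kendall's tau-b: concordant minus discordant pairs, adjusted for ties."""
    conc = disc = tied_x = tied_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        tied_x += dx == 0
        tied_y += dy == 0
        if dx * dy > 0:
            conc += 1
        elif dx * dy < 0:
            disc += 1
    n0 = len(x) * (len(x) - 1) // 2
    return (conc - disc) / sqrt((n0 - tied_x) * (n0 - tied_y))


def max_abs_correlation(x, y):
    """Largest absolute value among the three coefficients."""
    return max(abs(f(x, y)) for f in (pearson, spearman, kendall))


def screen(predictors, threshold=0.7):
    """Greedy screen (assumption): keep a predictor only if it is not
    correlated above the threshold with any predictor already kept."""
    kept = []
    for name, values in predictors.items():
        if all(max_abs_correlation(values, predictors[k]) <= threshold for k in kept):
            kept.append(name)
    return kept


def meets_ratio(n_occurrences, n_predictors, ratio=10):
    """True when there are at least `ratio` occurrence points per predictor."""
    return n_occurrences >= ratio * n_predictors


# Hypothetical predictor values sampled at six locations.
toy = {
    "tmin": [-12.0, -8.5, -3.0, 1.5, 4.0, 9.0],
    "tmean": [2.0, 4.5, 9.0, 12.5, 14.0, 19.5],  # rises monotonically with tmin
    "clay": [30.0, 12.0, 25.0, 8.0, 33.0, 15.0],
}
print(screen(toy))  # ['tmin', 'clay']
```

Taking the maximum of the three coefficients is the conservative choice: a pair is flagged if any one of the measures exceeds the threshold.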

TABLE 1. Description of predictor groups and examples of individual predictors within each group used in habitat suitability models to predict potential habitat of invasive species.
Predictor group | Description | Number of predictors | Examples
Temperature | Related to air temperature | 7 | Annual temperature range; Maximum summer temperature
Atmospheric Water | Includes precipitation, evapotranspiration, and water deficit | 22 | Precipitation of the warmest quarter; Potential water deficit of spring; Evapotranspiration of summer
Disturbance | Related to human disturbances | 2 | Burn frequency; Human modification index
Landscape Water | Related to surface water | 6 | Normalized difference moisture index (NDMI); Annual water recurrence
Substrate | Predictors that describe soil or substrate content | 9 | Percent clay; Mean depth to bedrock
Topography | Related to topography or elevation | 4 | Distance to water; Multi-Scale Topographic Position index (mTPI)
Biotic Interaction | Describe interaction with other plant species | 2 | Percent bare ground; Mean tree cover
Radiation | Related to solar energy | 1 | Continuous Heat-Load Index
Note: For full descriptions of all 53 predictors, see Appendix S1.

During the modelling process, predictor importance was calculated for each algorithm individually by randomly permuting the values of the predictor between the presence and background locations and comparing the area under the curve (AUC) values with and without permutation. The metric for predictor importance is hereafter referred to as ΔAUC (the change in AUC). Predictors with greater importance in a model have a larger ΔAUC value, because permuting a more influential predictor is expected to produce a greater change in model performance.
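As an illustration of the permutation idea, the sketch below computes ΔAUC for a toy scoring function. It is a hedged Python example, not the code used to build the models: `toy_model`, the predictor names, the data, and the number of repeats are hypothetical, and the source does not state how many permutations were used.

```python
import random


def auc(scores, labels):
    """AUC as the share of presence/background pairs in which the presence
    point receives the higher score (ties count one half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def delta_auc(model, rows, labels, predictor, n_repeats=50, seed=0):
    """Mean drop in AUC after shuffling one predictor across all
    presence and background locations (n_repeats is an assumption)."""
    rng = random.Random(seed)
    baseline = auc([model(r) for r in rows], labels)
    drops = []
    for _ in range(n_repeats):
        shuffled = [r[predictor] for r in rows]
        rng.shuffle(shuffled)
        permuted = [dict(r, **{predictor: v}) for r, v in zip(rows, shuffled)]
        drops.append(baseline - auc([model(r) for r in permuted], labels))
    return sum(drops) / len(drops)


def toy_model(row):
    """Hypothetical fitted model: suitability depends only on 'ghm'."""
    return row["ghm"]


# Four presence points (label 1) followed by four background points (label 0).
rows = [
    {"ghm": 0.9, "noise": 0.3}, {"ghm": 0.8, "noise": 0.7},
    {"ghm": 0.7, "noise": 0.1}, {"ghm": 0.6, "noise": 0.9},
    {"ghm": 0.4, "noise": 0.5}, {"ghm": 0.3, "noise": 0.2},
    {"ghm": 0.2, "noise": 0.8}, {"ghm": 0.1, "noise": 0.6},
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

print(delta_auc(toy_model, rows, labels, "ghm"))    # positive: the model relies on ghm
print(delta_auc(toy_model, rows, labels, "noise"))  # 0.0: the model ignores noise
```

A predictor the model never uses leaves every score unchanged when shuffled, so its ΔAUC is exactly zero, which matches the convention of assigning 0 to predictors dropped by an algorithm.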

2.3 Analyses of predictor importance

We calculated the average ΔAUC of each predictor for each species across all model algorithms and background methods. Three of the five algorithms contained internal predictor reduction (BRT, MARS, and GLM). When a predictor was offered to the algorithm but was subsequently dropped by that algorithm, the ΔAUC of that predictor for that algorithm was assigned a value of 0. The MaxEnt and RF algorithms retained every predictor offered to the model; therefore, any predictor offered to a model had a minimum of four inclusions (two background methods for MaxEnt and RF) and a maximum of 10 inclusions.
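The averaging rule above, including the zero assigned to dropped predictors, can be written compactly. The following Python sketch uses hypothetical names (`mean_delta_auc`, the background labels `bg1` and `bg2`) and invented ΔAUC values purely for illustration.

```python
def mean_delta_auc(offered, runs):
    """Average ΔAUC per predictor across all model runs.

    offered: names of the predictors offered to every algorithm.
    runs: {(algorithm, background_method): {predictor: ΔAUC}}; a predictor
    missing from a run was dropped by that algorithm and counts as 0.
    """
    return {
        p: sum(run.get(p, 0.0) for run in runs.values()) / len(runs)
        for p in offered
    }


# Five algorithms x two background methods = 10 runs (values are invented).
runs = {
    (alg, bg): {"ghm": 0.10, "tmin": 0.05}
    for alg in ("BRT", "GLM", "MARS", "MaxEnt", "RF")
    for bg in ("bg1", "bg2")
}
# Suppose GLM and MARS dropped tmin during internal predictor reduction.
for alg in ("GLM", "MARS"):
    for bg in ("bg1", "bg2"):
        del runs[(alg, bg)]["tmin"]

averaged = mean_delta_auc(["ghm", "tmin"], runs)
print(averaged)  # tmin is averaged over 10 runs, 4 of which contribute 0
```

Dividing by the full number of runs, rather than by the number of runs that retained the predictor, is what penalizes predictors that algorithms repeatedly discard.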

We reported our results in two ways: ranked data and percent contribution. We calculated ranked data by ordering the predictors from highest to lowest averaged ΔAUC for each species and assigning a value between 1 and n, where n is the number of predictors offered to the algorithm. For example, a species with 20 predictors offered to the algorithms for model fitting had a value of 20 assigned to the predictor with the highest ΔAUC, and a value of 1 to the predictor with the lowest ΔAUC. Because ΔAUC values were reported to the seventh decimal place, ties only occurred between 2 predictors for 3 species; in these circumstances, the predictors were ranked alphabetically. We then divided these values by n so that each predictor's ranked score was comparable between species, resulting in the highest performing predictor (i.e., the single most important predictor in the model) having a ranked value of 1, and the lowest performing predictor having a ranked value near 0.
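The ranking step can be sketched in a few lines of Python. The function name and the ΔAUC values are hypothetical, and the direction of the alphabetical tie-break (the alphabetically earlier name receives the lower rank) is an assumption, since the text does not specify it.

```python
def ranked_scores(mean_dauc):
    """Scale ranks to (0, 1]: the predictor with the highest averaged ΔAUC
    scores 1 and the lowest scores 1/n.

    Ties are broken alphabetically (assumed direction: the alphabetically
    earlier predictor receives the lower rank).
    """
    n = len(mean_dauc)
    ordered = sorted(mean_dauc, key=lambda p: (mean_dauc[p], p))
    return {p: (i + 1) / n for i, p in enumerate(ordered)}


# Invented averaged ΔAUC values for one species; clay and mtpi are tied.
species = {"ghm": 0.12, "tmin": 0.08, "clay": 0.01, "mtpi": 0.01}
print(ranked_scores(species))
# {'clay': 0.25, 'mtpi': 0.5, 'tmin': 0.75, 'ghm': 1.0}
```

Dividing by n is what makes a species offered 6 predictors comparable with one offered 31.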

We calculated the percent that a single predictor's averaged ΔAUC contributed to the summed total ΔAUC of all predictors offered to the species' models, referred to as a predictor's percent contribution. We assigned each of the 53 predictors to a predictor group to account for seasonality of predictors (see Appendix S1 for more information on seasonality of predictors) and correlations among predictors within the same predictor group (Table 1). For each species, we then summed the percent contribution within predictor groups. Additionally, this approach accounted for the fact that a different set of predictors may have been offered to different species' models due to different predictors exhibiting correlations, although all algorithms for the same species were offered the same set of predictors. This percent contribution method allowed us to better compare predictors across species. We also examined trends of algorithm ΔAUC across predictors by calculating the average ΔAUC of each algorithm for each predictor. As with the average ΔAUC, for circumstances in which a predictor was offered to the algorithm but was subsequently dropped by that algorithm, the ΔAUC of that predictor for that algorithm was assigned a value of 0. We calculated assessment metrics of all predictors for each algorithm, including ranked data (mean and maximum rank) and mean, minimum, and maximum ΔAUC. Additionally, we fit a regression model in R version 4.3.0 using the lm() function in the stats package (R Core Team, 2023) to compare differences across algorithms and their ΔAUC values across species.
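Percent contribution and its aggregation to predictor groups amount to a normalization followed by a grouped sum. The Python sketch below is illustrative only: the function names, predictor names, and ΔAUC values are hypothetical, and it assumes the summed ΔAUC is positive.

```python
def percent_contribution(mean_dauc):
    """Each predictor's share (%) of the summed averaged ΔAUC."""
    total = sum(mean_dauc.values())  # assumed to be greater than zero
    return {p: 100 * v / total for p, v in mean_dauc.items()}


def group_contribution(pct, groups):
    """Sum percent contributions within each predictor group."""
    out = {}
    for predictor, value in pct.items():
        out[groups[predictor]] = out.get(groups[predictor], 0.0) + value
    return out


# Invented averaged ΔAUC values for one species and their group assignments.
species = {"ghm": 0.10, "burn_frequency": 0.00, "tmin": 0.06, "tmax_summer": 0.04}
groups = {
    "ghm": "disturbance",
    "burn_frequency": "disturbance",
    "tmin": "temperature",
    "tmax_summer": "temperature",
}

pct = percent_contribution(species)
by_group = group_contribution(pct, groups)
print(by_group)  # disturbance and temperature each account for about 50%
```

Because the shares always sum to 100 for a species, group totals remain comparable even when species were offered different sets of predictors.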

We tested for differences in the percent contribution of each predictor group across lifeforms using the adonis2 function, a non-parametric permutational multivariate analysis of variance (MANOVA) using distance matrices in the R package vegan (Oksanen et al., 2022; version 2.6-6.1). When we found statistically significant differences among groups, we used a non-parametric ANOVA, the Kruskal–Wallis test, to delineate the predictor groups causing these differences. Then, we used the Nemenyi function in the R package DescTools (Signorell, 2023; version 0.99.49), a non-parametric equivalent to the Tukey HSD test, to identify which pairs of lifeforms differed significantly and thus drove the differences within predictor groups. To visualize the distribution of points across multidimensional space, we conducted a principal component analysis (PCA) that assessed whether lifeform groups clustered along the principal components of predictor groups.
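To make the Kruskal–Wallis step concrete, the sketch below computes the H statistic for one predictor group across lifeforms and attaches a permutation p-value. It is a Python illustration under stated assumptions, not the R workflow described above: the data are invented, only three lifeforms are shown, no tie correction is applied, and the permutation p-value stands in for the chi-square approximation that statistical packages typically report.

```python
import random


def kruskal_h(groups):
    """Kruskal-Wallis H statistic (without tie correction)."""
    pooled = sorted(v for g in groups for v in g)
    rank = {}
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1] == pooled[i]:
            j += 1
        rank[pooled[i]] = (i + j) / 2 + 1  # tied values share their mean rank
        i = j + 1
    n = len(pooled)
    total = sum(sum(rank[v] for v in g) ** 2 / len(g) for g in groups)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


def permutation_p(groups, n_perm=999, seed=0):
    """H statistic plus a p-value from shuffling values among groups."""
    rng = random.Random(seed)
    h_obs = kruskal_h(groups)
    pooled = [v for g in groups for v in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        start, shuffled = 0, []
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if kruskal_h(shuffled) >= h_obs - 1e-9:
            hits += 1
    return h_obs, (hits + 1) / (n_perm + 1)


# Invented percent contributions of the disturbance group, three species per lifeform.
disturbance_pct = {
    "tree": [42.0, 38.5, 45.1],
    "long-lived forb": [30.2, 28.7, 33.0],
    "short-lived graminoid": [22.4, 25.9, 19.8],
}
samples = list(disturbance_pct.values())
h, p = permutation_p(samples)
print(h, p)
```

A significant H only indicates that at least one lifeform differs; identifying which pairs differ requires a post hoc procedure such as the Nemenyi test named in the text.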

To understand the relationship between phylogenetic similarity and similarities in predictor group importance in the 220 species' HSMs, we created both a phylogenetic distance matrix and a predictor group distance matrix (or a niche dimensionality matrix). We matched phylogenetic data from Zanne et al. (2014) to our list of 220 species to generate the phylogenetic distance matrix. Species matched at the genus or family levels were grafted onto the phylogeny as a polytomy, and the phylogeny was ultrametricized using rate smoothing. We generated the phylogenetic distance matrix using the ape package in R (Paradis & Schliep, 2019; version 5.6.2). We calculated the Euclidean distance between all 220 species' percent contribution for each predictor group using 1000 repetitions to generate the niche dimensionality matrix. We then performed a Mantel test on the two matrices to discern the significance of the similarity between two species' phylogenetic distance and niche dimensionality distance values using the function “mantel.rtest” in the R package ade4 (Dray & Dufour, 2007; version 1.7-22).
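The logic of a Mantel test can be sketched as a correlation between the off-diagonal entries of two distance matrices, with significance assessed by permuting the species labels of one matrix. The Python example below is a generic one-sided permutation version written for illustration; it is not a reimplementation of `mantel.rtest`, and the species names, distances, percent contributions, and permutation count are hypothetical.

```python
import random
from math import sqrt


def euclidean_matrix(profiles):
    """Pairwise Euclidean distances between species' group-contribution profiles."""
    rows = list(profiles.values())
    return [[sqrt(sum((a - b) ** 2 for a, b in zip(r, s))) for s in rows] for r in rows]


def upper(matrix, order):
    """Upper-triangle entries after relabelling rows and columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel(d1, d2, n_perm=999, seed=0):
    """Mantel r and a one-sided permutation p-value (both matrices symmetric)."""
    identity = list(range(len(d1)))
    x = upper(d1, identity)
    r_obs = pearson(x, upper(d2, identity))
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        perm = identity[:]
        rng.shuffle(perm)
        if pearson(x, upper(d2, perm)) >= r_obs - 1e-12:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)


# Hypothetical phylogenetic distances among five species (symmetric).
phylo = [
    [0, 10, 80, 80, 120],
    [10, 0, 80, 80, 120],
    [80, 80, 0, 30, 120],
    [80, 80, 30, 0, 120],
    [120, 120, 120, 120, 0],
]
# Hypothetical percent contributions (disturbance, temperature, atmospheric water).
profiles = {
    "sp_a": [50.0, 30.0, 20.0],
    "sp_b": [45.0, 35.0, 20.0],
    "sp_c": [20.0, 50.0, 30.0],
    "sp_d": [25.0, 40.0, 35.0],
    "sp_e": [35.0, 35.0, 30.0],
}
niche = euclidean_matrix(profiles)
r, p = mantel(phylo, niche)
print(r, p)
```

Rows and columns are permuted together because the entries of a distance matrix are not independent observations; shuffling individual cells would overstate significance.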

3 RESULTS

3.1 Summary of models

Our analysis represented models for a diverse group of invasive plants: 26 trees, 30 shrubs, 18 vines, 54 long-lived forbs, 41 short-lived forbs, 29 long-lived graminoids, and 22 short-lived graminoids. The number of predictors offered for model fitting for each species ranged from six to 31 predictors, with a mean of 22.4 predictors for a species. The number of predictor groups offered for model fitting for each species ranged from four to eight, with an average of 7.74 groups included for a species. The number of predictors used from each predictor group, averaged across all species, ranged from 0.85 (radiation) to 4.70 (landscape water). After the predictor selection stage, the most used predictors across all species for model fitting were human modification (220 species, 100%), minimum winter temperature (217 species, 98.64%), distance to water (213 species, 96.82%), and multi-scale topographic position index (mTPI) (210 species, 95.45%).

3.2 Ranked analyses

Examination of algorithm-specific predictor importance data, including ranked data and mean/minimum/maximum ΔAUC values, revealed similar trends across algorithms (Appendix S2). For example, both human modification and minimum winter temperature were ranked as the two most influential predictors and had the highest mean ΔAUC across algorithms (Appendix S2). Although there are some statistical differences between algorithms across species (linear regression, n = 24,670, p < .01), these differences are accounted for by averaging the ΔAUC values across algorithms (Appendix S2). Therefore, we chose to represent all our findings using the averaged ΔAUC metric in terms of ranked data and percent contribution.

Human modification was the most important predictor for the greatest number of species (102 of 220 species, 46.4%). Minimum winter temperature was the second most important predictor when it was offered to a species' model (59 of 217 species, 26.8%). Human modification was the highest performing predictor for trees, shrubs, vines, and long-lived forbs (Table 2). Minimum winter temperature was the highest performing predictor for short-lived graminoids. These two predictors were nearly equally the top performing predictor for short-lived forbs and long-lived graminoids.

TABLE 2. The highest performing predictor(s) used in habitat suitability models to predict potential habitat of invasive species based on ranked analyses for each lifeform.
Lifeform | Highest ranked predictor(s) | Number of species' models within each lifeform it was #1 in | % of species' models within each lifeform it was #1 in
Tree | gHM | 18 | 69.2%
Shrub | gHM | 16 | 53.3%
Vine | gHM | 13 | 72.2%
Short-lived forb | gHM | 17 | 41.5%
Short-lived forb | TMIN | 13 | 31.7%
Long-lived forb | gHM | 23 | 42.6%
Short-lived graminoid | TMIN | 11 | 50.0%
Long-lived graminoid | gHM | 10 | 35.7%
Long-lived graminoid | TMIN | 9 | 32.1%
Note: Data used to generate this table were obtained from all species' data (n = 220). The human modification index predictor is denoted as gHM and the mean minimum winter temperature predictor is denoted as TMIN.

The lowest performing predictors when offered to species' models (i.e., the predictors that had the lowest ΔAUC for the greatest number of species) were water recurrence (51 of 75 species, 68%) and burn frequency (50 of 198 species, 25.25%). Water recurrence was the worst performing predictor for trees, short-lived forbs, and long-lived graminoids. Burn frequency was the lowest performing predictor for short-lived graminoids and long-lived forbs. Both predictors were equally poor for shrubs. No single predictor was consistently ranked last for the 18 vine species.

3.3 Percent contribution analyses

To compare predictor group importance across lifeforms, we used the percent contribution metric to quantify predictor groups (i.e., predictor group is numeric, lifeform is categorical; Appendix S4). The three most important groups of predictors across all species based on percent contribution were disturbance, temperature, and atmospheric water, which varied in importance depending on plant lifeform (Figure 1). Almost all predictor groups had greater than 5% contribution, indicating predictors within most predictor groups contributed a sizeable amount to the HSM. Disturbance predictors were the most important group for woody plants (trees, vines, and shrubs). Temperature was the most important predictor group for graminoids (both long- and short-lived). Disturbance, temperature, and atmospheric water predictor groups were relatively evenly important for forb species (both long- and short-lived). The radiation group contributed the least to the models compared with other groups (<5%). There is a significant difference in the percent contribution of predictor groups across lifeforms, although this relationship was weak (MANOVA Test, n = 220, p = .001; Figure 2).

FIGURE 1. Heatmap of the mean percent contribution of each predictor group for each lifeform category in habitat suitability models that predict potential habitat of invasive species. Percentage of total predictor importance is the summed predictor importance for individual predictors within the group, averaged within each lifeform class. High numbers indicate high predictor importance of that predictor group for the corresponding lifeform. Appendix S4 contains species-specific predictor group importance data.

FIGURE 2. A principal component analysis (PCA) of the 1st and 2nd principal component axes of the relationship between lifeforms and the percent contribution of each predictor group for each species' habitat suitability model. Each point represents an individual species (n = 220). There is a significant difference between predictor percent contribution across lifeform (p < .01, adonis2) although this relationship is weak, as seen by the overlapping ranges of lifeforms in this visualization.

Predictor group percent contributions differed by lifeform, and the differences were generally driven by markedly different structural characteristics, specifically when comparing woody plants (trees, shrubs, vines) with non-woody plants (forbs and graminoids) (Table 3). Within landscape water and substrate predictor groups, there were no differences among lifeform percent contribution (Kruskal–Wallis Test, n = 220, p > .05; Table 3). Trees and forbs differed in percent contributions of atmospheric water, disturbance, and topography (Nemenyi Test, n = 220, p < .05; Table 3). Shrubs differed from forbs in the percent contribution of the atmospheric water predictor group. In the radiation predictor group, trees, vines, and shrubs all differed from graminoids. Trees and graminoids also differed in biotic interaction and disturbance predictors. Interestingly, forbs and graminoids also exhibited significant differences within two predictor groups: for biotic interaction predictors, short-lived graminoids differed from long-lived forbs, and for temperature predictors, long-lived graminoids differed from both short- and long-lived forbs.

TABLE 3. Results of ANOVA and Nemenyi test on predictor groups and lifeforms exploring the drivers of the differences between percent contribution of predictor groups and lifeform in habitat suitability models of invasive plant species.
Predictor group | ANOVA result (α = .05) | Nemenyi test significant pairs (p < .05)
Atmospheric water | Significant | Shrub & short-lived forb; Tree & short-lived forb
Biotic interaction | Significant | Short-lived graminoid & long-lived forb; Tree & short-lived graminoid
Disturbance | Significant | Tree & long-lived forb; Tree & long-lived graminoid; Tree & short-lived graminoid
Landscape water | Not significant | None
Radiation | Significant | Tree & long-lived graminoid; Vine & long-lived graminoid; Shrub & short-lived graminoid; Tree & short-lived graminoid; Vine & short-lived graminoid
Substrate | Not significant | None
Temperature | Significant | Long-lived graminoid & long-lived forb; Long-lived graminoid & short-lived forb
Topography | Significant | Tree & long-lived forb; Tree & short-lived forb

  • Note: ANOVA was conducted using the Kruskal–Wallis test for non-parametric data to identify whether lifeforms differed in their percent contribution to a given predictor group. The Nemenyi test is a non-parametric analogue of the Tukey HSD test and was used to identify the specific lifeforms driving the differences within a predictor group's total percent contribution.
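
The first step described in the note, an overall Kruskal–Wallis test for one predictor group across lifeforms, can be illustrated with a minimal Python sketch. This is not the code used in the study: the function names and the toy percent contributions are hypothetical, the p-value is obtained by permuting lifeform labels rather than from the usual chi-square approximation, and the Nemenyi post hoc step is omitted.

```python
import random
from collections import defaultdict


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(values, labels):
    """Tie-corrected Kruskal-Wallis H statistic for values grouped by labels."""
    n = len(values)
    ranks = average_ranks(values)
    rank_sums = defaultdict(float)
    counts = defaultdict(int)
    for rank, group in zip(ranks, labels):
        rank_sums[group] += rank
        counts[group] += 1
    h = 12.0 / (n * (n + 1)) * sum(
        rank_sums[g] ** 2 / counts[g] for g in counts
    ) - 3 * (n + 1)
    tie_sizes = defaultdict(int)
    for v in values:
        tie_sizes[v] += 1
    correction = 1 - sum(t ** 3 - t for t in tie_sizes.values()) / (n ** 3 - n)
    return h / correction if correction > 0 else 0.0


def permutation_p_value(values, labels, n_perm=999, seed=0):
    """Return (H, p), where p comes from shuffling the group labels."""
    rng = random.Random(seed)
    observed = kruskal_wallis_h(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if kruskal_wallis_h(values, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical percent contributions of one predictor group for nine species.
    contributions = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
    lifeforms = ["tree"] * 3 + ["forb"] * 3 + ["graminoid"] * 3
    h, p = permutation_p_value(contributions, lifeforms)
    print(f"H = {h:.2f}, permutation p = {p:.3f}")
```

In practice the test would be repeated once per predictor group, with pairwise comparisons run only for groups where the overall test is significant.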

HSMs of closely related plant species did not have more similar sets of important predictors than did distantly related plant species (Mantel Test; n = 220, r = .022, p = .18).
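
A Mantel test of this kind correlates two pairwise distance matrices (here, phylogenetic distance and dissimilarity in predictor importance) and judges significance by permuting the species order of one matrix. The sketch below is a minimal, one-sided, pure-Python illustration under assumed inputs; the function names and the toy distances are hypothetical and it is not the implementation used in the study.

```python
import math
import random


def pairwise_abs_diff(values):
    """Build a symmetric distance matrix from one value per species."""
    return [[abs(a - b) for b in values] for a in values]


def upper_triangle(matrix, order=None):
    """Flatten the upper triangle, optionally after reordering rows and columns."""
    n = len(matrix)
    idx = order if order is not None else list(range(n))
    return [matrix[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation; raises ValueError if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant distances")
    return sxy / math.sqrt(sxx * syy)


def mantel_test(dist_a, dist_b, n_perm=999, seed=0):
    """Return (r, p) for a one-sided Mantel test of positive association."""
    rng = random.Random(seed)
    a = upper_triangle(dist_a)
    observed = pearson(a, upper_triangle(dist_b))
    order = list(range(len(dist_b)))
    hits = 0
    for _ in range(n_perm):
        # Permute species identities in the second matrix only.
        rng.shuffle(order)
        if pearson(a, upper_triangle(dist_b, order)) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical one-dimensional positions standing in for real distances.
    phylo_dist = pairwise_abs_diff([0.0, 1.0, 2.0, 3.0, 4.0])
    importance_dist = pairwise_abs_diff([0.3, 0.1, 0.9, 0.4, 0.6])
    r, p = mantel_test(phylo_dist, importance_dist)
    print(f"Mantel r = {r:.3f}, permutation p = {p:.3f}")
```

Permuting rows and columns together, rather than shuffling individual distances, preserves the dependence among distances that share a species.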

4 DISCUSSION

We analysed predictor importance across 220 invasive plant species' HSMs at the extent of the contiguous United States. The predictors that most substantially defined invasive plant habitat suitability were human modification and minimum winter temperature. There were significant differences in predictor group importance for different lifeforms; however, those differences were relatively small and generally related to differences in structural characteristics and functional traits among lifeforms, such as when comparing woody plant lifeforms (e.g., trees and shrubs) with non-woody plant lifeforms (e.g., forbs and graminoids). Contrary to our predictions, phylogenetic distance among the studied species did not explain differences in predictor group importance across species. However, this lack of phylogenetic signal in predictor importance aligns with other research showing limited relationships between phylogenetic relatedness and environmental niche preferences or species traits (Losos, 2008; Zhang et al., 2017).

4.1 Comparison to existing literature

Traditionally, climatic predictors have been considered the most important to include in an HSM, with many studies choosing to only include climate predictors (Abdulwahab et al., 2022; Beaumont et al., 2005). However, in this analysis, we found that the disturbance predictor human modification index was the top performing predictor in most of the analysed invasive plant species' HSMs, with the climate predictor minimum winter temperature as the second highest performing predictor. The patterns in predictor importance by lifeform did not align with our hypotheses, as human modification index was a more influential predictor for trees and other long-lived lifeforms and minimum winter temperature was a more influential predictor for short-lived species than disturbance or precipitation predictors. Bradie and Leung (2017) and Gardner et al. (2019) argued that solely using climatic predictors is insufficient for creating accurate habitat suitability models, and the results of this analysis further support this argument. These results highlight the influence of human modification on the distribution of non-native plants. The importance of disturbance in the habitat suitability of invasive plants is supported by invasion ecology theory. Lozon and MacIsaac (1997) found that approximately two-thirds of 257 plant species' invasions involved some sort of disturbance (e.g., human activities, animal activities, agriculture, soil disturbance), otherwise known as disturbance-facilitated invasion (Early et al., 2016). Many invasive plant species in the United States are found in habitats that are heavily anthropogenically altered, as disturbance tends to act as a catalyst of invasion (Simberloff et al., 2012).

These results also support existing literature that emphasizes the importance of minimum winter temperature in plant distribution in the northern hemisphere. Qian et al. (2022) found that minimum winter temperature limited the distribution and richness of trees in North America, and Jorgensen and Renz (2021) found minimum winter temperature to be one of the most influential predictors of invasive plant habitat suitability in Wisconsin. Thus, one of the most important factors limiting the potential niche of non-native plant species in the contiguous United States, minimum temperature, is the same limiting factor that shapes the distribution and evolution of most organisms on Earth (Bennett et al., 2021; Box, 1981).

The substrate, landscape water, topography, biotic interaction, and radiation predictor groups all had limited importance in the invasive plant HSMs analysed. Additionally, the burn frequency predictor included in the disturbance predictor group was one of the least important predictors for many species, contrary to our hypotheses. However, multiple other studies have highlighted the importance of including predictors beyond simple climate predictors (Bradie & Leung, 2017; Gardner et al., 2019; Walthert & Meier, 2017). Our conflicting findings may make ecological sense because many non-native invasive species are generalists, so perhaps variation in predictors such as soil type and topography, especially at the continental scale, is not as important as it is for more specialized groups of species (Evangelista et al., 2008; Stigall, 2012). Another possibility is that the resolution of the non-climatic predictors is too coarse to be ecologically relevant. For example, averaging topographic diversity or percent soil clay content across a 90-m grid cell – the resolution of the models analysed in this study – could inaccurately characterize the actual topography or soil conditions at the locations of species' occurrences. Our results do align with those of other HSM studies, however, in emphasizing the importance of including climatic predictors (e.g., temperature, precipitation, evapotranspiration) in HSMs (Bucklin et al., 2015; Chauvier et al., 2021).

4.2 Implications for HSM predictor selection

Our results emphasize the need to include predictors that reflect anthropogenic disturbance as well as traditional climatic predictors in invasive plant HSMs. Although the disturbance predictor group was overall very important across the HSMs analysed, its influence was primarily driven by the human modification predictor; burn frequency was also included in the disturbance category but was one of the least important predictors for many species. The human modification predictor utilized by these models is a measure of 13 anthropogenic stressors across 5 categories and their estimated impacts: human settlement (human population density, built-up areas), agriculture (cropland, livestock/pasture), transportation (major/minor roads, two-tracks, railroads), mining and energy production (mining, oil wells, wind turbines), and electrical infrastructure (powerlines, night-time lights) (Kennedy et al., 2019). Edaphic, topographic, and surface water predictors should also be included, as our results indicate average importance >5% for these groups, echoing other studies which have found these predictors essential in accurately predicting suitable habitat (Gardner et al., 2019). Disturbance, temperature, and atmospheric water predictor groups should be prioritized for most invasive plant species when selecting predictors for models of terrestrial invasive plant suitability. Disturbance is known as a major determinant of the success of many invasive plant species (Early et al., 2016), and it also defines a major axis of variation in native plant communities (Burkle et al., 2015; Root-Bernstein & Svenning, 2018). It will be interesting to explore whether disturbance-related predictors have a similar influence on native plant models, or whether the primacy of disturbance in delimiting suitable habitat is unique to invasive plant HSMs.

Although we emphasize the utility of incorporating predictors outside of the traditional bioclimatic predictors (such as WorldClim data; Fick & Hijmans, 2017) to reflect multiple dimensions of a plant's niche, we acknowledge that bioclimatic predictors are widely used and well established, whereas obtaining other predictor data with adequate quality may be a limitation in many habitat suitability models. We recognize the lack of diversity in high-quality predictors for other study regions or at different scales might make incorporating diverse predictors in HSMs less feasible.

There was variation in the importance of predictors among species, but this variation was weakly related to plant lifeform and not at all to plant phylogeny. As such, it would not be useful to automate predictor selection based on these attributes. This leaves habitat suitability modellers with the tradeoff of individually tailoring predictors to other known attributes of a species based on external biological information or accepting a likely loss of model performance by offering the same suite of predictors to all species. This tradeoff may be acceptable in certain situations depending on the intended use of the model outputs, such as creating HSMs for the broad-scale assessment of invasion risk at a regional level as opposed to fine resolution HSMs used to plan field surveys or create watch lists for smaller management units.

Our findings will help guide and inform predictor selection for habitat suitability models of invasive terrestrial plants to create better and more reliable HSMs. We emphasize the importance of including disturbance predictors relating to anthropogenic stressors and infrastructure in invasive species HSMs. Although the results of this study highlight the utility of prioritizing metrics of human modification, temperature, and atmospheric water for inclusion in invasive plant habitat suitability modelling, modellers should continue the best practice of selecting predictors with individual species' biological characteristics and invaded range characteristics in mind (Elith & Leathwick, 2009). With more accurate invasive species HSMs, land managers will be better equipped to detect and treat invasive species populations early for conservation efforts.

4.3 Limitations and future directions

To create the HSMs analysed in this study, Jarnevich, LaRoe, et al. (2023) followed best practices by removing highly correlated predictors (Dormann et al., 2013). Additionally, predictors for the models were selected for each species individually based on natural history knowledge of a species, such that a winter annual would have a different suite of predictors from a shrub. This resulted in a slightly different set of predictors used for each species; thus, one limitation to our analysis is that we did not offer a standard suite of predictors to each model. We chose to compare across predictor groups to account for this imbalance in the suite of predictors offered to each species. Because the correlations and geographic context of each species varied, we were unable to conduct a one-to-one comparison of individual predictors. Although a future direction of this work could be to include the same predictors in all models to compare across individual predictors, this does not follow best practices of including predictors thought to be important in limiting the distribution of a specific species. Instead, one route to pursue would be to extend gaming statistics, such as Microsoft's TrueSkill™ (Bradie & Leung, 2017), to perform a one-to-one comparison of individual predictors, regardless of whether they were used in all models, to analyse predictor importance. Additionally, the 220 invasive plant species models used in this analysis were generated using occurrence locations restricted to the contiguous United States, excluding data from species' entire global distribution, which may lead to niche truncation. As such, our findings of predictor importance are limited in scope to important predictors used to define these species' invaded ranges and not necessarily their entire geographic distribution.
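
The comparison across predictor groups described above can be sketched as summing each species' per-predictor percent contributions within its group, so that models offered different predictor suites are still compared on a shared set of groups. The following Python example is a hypothetical illustration only: the predictor names, the predictor-to-group mapping, and the percent contributions are invented for the example and are not taken from the study's data release.

```python
from collections import defaultdict

# Illustrative mapping of predictors to predictor groups (an assumption).
PREDICTOR_GROUP = {
    "human_modification": "disturbance",
    "burn_frequency": "disturbance",
    "min_winter_temperature": "temperature",
    "soil_clay_percent": "substrate",
    "topographic_diversity": "topography",
}


def group_contributions(predictor_contributions, predictor_group=PREDICTOR_GROUP):
    """Sum per-predictor percent contributions into per-group totals."""
    totals = defaultdict(float)
    for predictor, percent in predictor_contributions.items():
        totals[predictor_group[predictor]] += percent
    return dict(totals)


if __name__ == "__main__":
    # Two hypothetical species whose models were offered different predictors.
    species_a = {
        "human_modification": 40.0,
        "burn_frequency": 5.0,
        "min_winter_temperature": 55.0,
    }
    species_b = {
        "human_modification": 30.0,
        "min_winter_temperature": 50.0,
        "soil_clay_percent": 20.0,
    }
    for name, model in (("species_a", species_a), ("species_b", species_b)):
        print(name, group_contributions(model))
```

A group that was never offered to a species is simply absent from that species' totals; whether to treat it as zero or as missing is a separate analytical choice that this sketch does not make.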

The focus of this study is occurrence-based HSMs, meaning these models are predicting suitability for species occurrence, and not necessarily where a species could thrive and reach high abundance. The predictor importance in this analysis therefore demonstrates the importance of predictors in predicting invasive plant occurrence, but not necessarily in predicting where species can establish abundant populations and dominate an ecosystem. A future study could analyse predictors of HSMs that reflect plant abundance. To derive more ecologically significant conclusions from the results of this analysis (e.g., extrapolate conclusions about predictors influential in invasive plant HSMs), we would want to perform this analysis on abundance models to understand which predictors permit or hinder abundant populations (Beaury et al., 2023). Another caveat in this study is that for larger lifeform species, such as trees or shrubs, occurrence locations are typically documented where the plant is fully grown and not necessarily reflective of germination or juvenile conditions. Consequently, habitat based on adult-stage species occurrence locations may not reflect habitat suitable for germination, although this could be addressed in a future study that accounts for the life stages of plant occurrences and models habitat conditions at various growth stages.

Predictor quality could be a driving factor of these results, as the usage of predictors that do not accurately characterize the landscape would likely result in an inaccurate predictor importance metric. Predictor quality may be a limitation in this study, perhaps explaining consistently weak predictor contributions in the 220 invasive plant HSMs. Although soil composition is important for plant growth, mapped soil data may not accurately capture true conditions at occurrence sites (Driscoll & Strong, 2018; Roe et al., 2022). Additionally, soils and other types of predictors (e.g., climate) often vary across smaller scales than the resolution of the predictors used in these HSMs, which may also be important in capturing species habitat (Lembrechts, Nijs, & Lenoir, 2019). Thus, predictor quality and scale likely influenced predictor importance along with true underlying habitat relationships.

Although the focus of this study was to understand the patterns of predictor importance as it relates to HSMs, a future direction of this work could be focused on the lowest performing predictors (e.g., water recurrence and burn frequency in the case of this study), assessing the impact of removing low performing predictors or predictor groups on species' HSMs.

5 CONCLUSION

Through the analyses of predictor importance in HSMs for invasive plants, we collected information that can guide future modelling decisions. The findings of this study illustrate the importance of including disturbance-related predictors for invasive terrestrial plants in addition to climatic predictors, emphasizing predictors that are most important to include when modelling habitat suitability for invasive plants. Regardless of lifeform, we show that disturbance, temperature, and atmospheric water predictors are the most important predictor groups for occurrence-based HSMs of invasive plant species in the contiguous United States. Our research can guide predictor selection in the future creation of HSMs for invasive terrestrial plants, leading to more proactive and targeted invasive species management and conservation efforts.

ACKNOWLEDGEMENTS

This research was funded by the USGS Biological Threats & Invasive Species Program and the USGS Science Analytics and Synthesis Program. We would like to thank former U.S. Geological Survey Fort Collins Science Center student services contractors Jillian LaRoe and Brandon Hayes for their contribution in modelling species for the INHABIT web tool. Thanks to Nathan Teich for reviewing this article and providing comments. Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.

    CONFLICT OF INTEREST STATEMENT

    The authors declare that there are no competing interests.

    DATA AVAILABILITY STATEMENT

    The data that support the findings of this study are openly available in a U.S. Geological Survey data release at https://doi.org/10.5066/P9V54H5K.

    BIOSKETCHES

    The authors of this study are biological technicians and ecologists at the U.S. Geological Survey and Colorado State University. Our research focuses on the biogeography of species invasions and on advancing the use of quantitative tools to inform invasive species prevention and management.

    Author contributions: DAW: Data curation, investigation, formal analysis, visualization, writing - original draft preparation, writing - review & editing; KSS: Data curation, investigation, formal analysis, writing - original draft preparation, writing - review & editing; JSP: Conceptualization, methodology, writing - review & editing; ISP: Methodology, investigation, writing - review & editing; PE: Data curation, methodology, writing - review & editing; GCH: Data curation, writing - review & editing; CSJ: Conceptualization, methodology, supervision, writing - review & editing.
