Model-based mapping of assemblages for ecology and conservation management: A case study of demersal fish on the Kerguelen Plateau
Abstract
Aim
Quantifying biological assemblages and their environment is a fundamental, yet statistically challenging task in conservation ecology. Here, we use a recently developed approach called Regions of Common Profile (RCP) to quantify and map the distribution of demersal fish assemblages in an ecologically significant region of the Southern Ocean to (1) gain ecological and management insights and (2) evaluate the utility of the new method for ecoregionalization.
Location
Northern Kerguelen Plateau, Subantarctic Islands, Southern Ocean.
Methods
The RCP approach is a multispecies, model-based approach that can overcome many limitations of traditional distance-based approaches. It simultaneously groups sites with a similar composition of species and describes the patterns of variation in assemblages using environmental data, allowing the prediction of assemblages across the study region. We apply RCP to a unique dataset of demersal fish occurrences across the northern Kerguelen Plateau to model and map the distribution of assemblages and examine the representativeness of the Heard Island and McDonald Island marine reserve.
Results
We demonstrate that the RCP approach allows a direct and quantitative interpretation of the composition of assemblages as well as their environment. Further, the model reasonably predicts the occurrence of individual species across the plateau as well as the species composition of sites. We distinguish and map seven assemblages defined by depth, surface temperature and chlorophyll a. Shallow-water assemblages contain a high proportion of endemic species, while deep-water assemblages contain more cosmopolitan species. With the exception of one deep-water assemblage, assemblages were well represented within the current Heard and McDonald Islands marine reserve.
Main conclusions
The RCP is a valuable tool for classifying biological regions with a range of ecological and conservation management applications. Our results extend current ecological and biogeographic knowledge for the northern Kerguelen Plateau, and maps of the distribution of assemblages will be useful for ongoing spatial management.
1 INTRODUCTION
Biodiversity plays a vital role in maintaining properly functioning and productive marine ecosystems (Palumbi et al., 2008). Consequently, emphasis on the conservation and sustainable use of biodiversity has grown, aided by international initiatives such as the Convention on Biological Diversity (United Nations, 1992) and the Commission for the Conservation of Antarctic Marine Living Resources (CCAMLR). Fundamental to achieving biodiversity conservation across large scales is identifying where different groups of species are found. This increases our ecological understanding and allows managers to define and prioritize areas for conservation, target monitoring efforts and manage human activity.
Delineating and mapping biological assemblages has been termed bioregionalization, ecoregionalization and ecological mapping (De Broyer & Koubbi, 2014; Koubbi et al., 2011; O'Hara, 2008) and is challenging to achieve in a quantitative, robust way. Quantitative regionalizations that use biological data commonly rely on dissimilarity metrics calculated between pairs of sites. Sites are usually classified into groups representing assemblages using a clustering technique (e.g., hierarchical clustering). These groups are then related to environmental variables using a different model (referred to here as a two-stage analysis) to allow prediction into geographic space where only environmental data exist (e.g., Rubidge, Gale, and Curtis, 2016). Sometimes, the pairwise dissimilarities themselves are related to environmental variables (e.g., generalized dissimilarity modelling; Ferrier and Guisan, 2006; Koubbi et al., 2010). There are two drawbacks of these approaches. First, the use of pairwise dissimilarities means that analyses cannot be directly interpreted (i.e., which species belong to which groups and how these groups relate to the environment), and the inability to accommodate the underlying structure of most biological data can result in distorted inferences (amongst other issues outlined in Warton, Wright, and Wang (2012)). Second, it is not clear how to propagate uncertainty in the classification stage through the second stage of the analysis and the final distribution map, which is fundamental for informing conservation outcomes (Barry & Elith, 2006).
Model-based approaches that model multiple species directly and simultaneously offer some solutions to these issues and are at the forefront of modern community ecology (see Warton et al. (2015b) and Warton, Foster, De'ath, Stoklosa, and Dunstan (2015a) for reviews). These approaches are based on multivariate generalized linear models (GLMs) and use latent (unobserved) factors to determine groupings. Model-based approaches are attractive because they: (1) remove the dependence on dissimilarity measures; (2) allow direct prediction of assemblages into unsampled locations; and (3) have diagnostics for the appropriateness of the analysis.
Here, we focus on a new multispecies, model-based method, called Regions of Common Profile (RCP; Foster, 2013) and its application to ecoregionalization. RCP aims to simultaneously group sites with a similar composition of species (the site's species profile) and describe the patterns of variation in assemblages using environmental data (Foster, Givens, Dornan, Dunstan, & Darnell, 2013). RCP advances previous model-based approaches that either classify sites or species without considering their environment (Pledger & Arnold, 2014) or classify species with respect to their environment (Dunstan, Foster, & Darnell, 2011). This makes RCPs one of the first truly one-stage methods for spatially delineating biological assemblages, meaning that (1) direct inference is possible in terms of sites, species and their environment simultaneously, and (2) uncertainty in the final ecoregionalization can appropriately be quantified from the one model. RCP has the potential to provide important ecological insights about assemblages and their drivers as well as a valuable contribution to the conservation management of biodiversity. However, due to its recent development, the RCP approach has had limited application to complex, real-world datasets and questions. We apply the RCP method to quantify and map the distribution of demersal fish assemblages in an ecologically significant region of the Southern Ocean, the northern Kerguelen Plateau, to gain ecological and management insights and to evaluate the utility of the method for ecoregionalization.
The Kerguelen Plateau is a large isolated outcrop in the Indian Sector of the Southern Ocean (Figure 1a) and our study region comprises the section north of the Fawn Trough (Figure 1b). The plateau forms an obstacle to the Antarctic Circumpolar Current (ACC), affecting regional circulation and resulting in complex oceanographic patterns (Park & Vivier, 2011). It is a highly productive region that supports a diversity of marine organisms and is an important foraging area for marine birds and mammals (Hindel et al., 2011; Thiers, Delord, Bost, Guinet, & Weimerskirch, 2016). The uniqueness and ecological importance of this region are formally recognized in the World Heritage Listing of the Heard and the McDonald Islands and ongoing spatial management that aims to conserve biodiversity and habitats (Commonwealth of Australia, 2014; Koubbi et al., 2016).

Demersal fish are a key component of the Kerguelen Plateau ecosystem. A relatively high proportion of fish species is endemic to the Plateau (Duhamel, Gasco, & Davaine, 2005), and several species, such as the Patagonian toothfish (Dissostichus eleginoides) and mackerel icefish (Champsocephalus gunnari), are of commercial importance (Duhamel & Williams, 2011). The Northern Plateau is divided into two zones; the northern section is administered by France and the southern section by Australia. Key commercial and bycatch species have been well studied on a species by species basis separately for each zone (e.g., Duhamel and Hautecoeur, 2009; Welsford et al., 2011). However, there has been limited integration of assemblage-level data (Duhamel & Hautecoeur, 2009) and none across both zones, potentially limiting biodiversity-focussed conservation efforts. Here, for the first time, we draw together data on demersal fish assemblages across the entire northern plateau. Our specific aims are to: (1) use RCP to quantitatively characterize demersal fish assemblages and delineate their spatial distribution across the plateau; (2) describe the environmental characteristics associated with assemblages; (3) examine representation of assemblages within the current marine reserve at the Heard and the McDonald Islands; (4) evaluate the performance and utility of the RCP approach.
2 METHODS
2.1 Spatial management on the Kerguelen Plateau
Exclusive Economic Zones (EEZs) divide the Kerguelen Plateau into a French zone in the north and an Australian zone in the south. Contemporary fisheries primarily target Patagonian toothfish (D. eleginoides) by longline and mackerel icefish (C. gunnari) by trawling. Historically, species such as Notothenia rossii and Lepidonotothen squamifrons were also targeted by trawling (Duhamel & Williams, 2011). Fisheries are managed by each nation separately but consistently with the CCAMLR ecosystem-based management approach (Constable, 2011). There are also two marine protected areas on the plateau. The Australian government established a marine reserve (IUCN category 1A) surrounding the World Heritage listed Heard and McDonald Islands (HIMI) in 2002, the boundaries of which were expanded in 2014 to comprise an area of 71,000 km² (Commonwealth of Australia, 2014). In 2006, the French government established a shallow-water, coastal marine reserve within 12 nm of the Kerguelen Islands (Falguier & Marteau, 2011) and in December 2016 adopted an extension which covers 387,000 km² of the EEZ (Koubbi et al., 2016). A further extension of the reserves to cover the whole French EEZ is now under evaluation.
2.2 Demersal fish data
Comparable research surveys have been undertaken in both EEZs to monitor the status of target and bycatch species. In the Australian EEZ, a Random Stratified Trawl Survey (RSTS) has been conducted annually since 1997, visiting different sites each year (Welsford et al., 2011). In the French EEZ, three RSTSs were conducted in 2006, 2010 and 2013, visiting the same sites (within ~10 km) as part of the POKER (POissons de KERguelen) campaigns (Duhamel & Hautecoeur, 2009). Surveys are conducted on a commercial trawling vessel and fish are identified by a trained scientific observer (RSTS) or a scientific team on board a chartered trawler (POKER). In both the RSTS and POKER surveys, otter bottom trawls were towed for ~30 min at a speed of ~3 kts at sites between 100 and 1,200 m depth. Shelf trawls were conducted during the daytime to capture icefish, which diurnally aggregate near the seafloor (icefish do not occur in deep waters). Trawls of a similar dimension were used in both EEZs; however, the mesh at the codend differed: all RSTS surveys used 50 mm mesh, the 2006/2010 POKER surveys used 40 mm and the 2013 POKER survey used 90 mm (Duhamel & Hautecoeur, 2009; Nowara, Lamb, & Welsford, 2014). Most RSTS surveys were undertaken in the austral autumn (in 2010, a spring survey was also conducted), and all POKER surveys were undertaken in spring. The similarities and differences between surveys are tabulated in Table S1. All fish species caught in trawls were recorded.
We used RSTS and POKER surveys conducted in 2006, 2010 and 2013, resulting in 1,197 trawls across the Plateau (Figure 1b). Species nomenclature was based on the Biogeographic Atlas of the Southern Ocean (appendix 5, Duhamel et al., 2014) and common names were based on Gon and Heemstra (1990). The species list was checked for consistency in the level of identification across surveys, and some species were aggregated. These included the unicorn icefish Channichthys rhinoceratus and Channichthys velifer and the genera Paraliparis, Macrourus and Muraenolepis. Primarily pelagic species were removed from analyses. Data were presence–absence. A total of 39 species/taxa were recorded across all sites. Many species were relatively rare and contained little information for the ecoregionalization. We therefore analysed the occurrences of the 21 species of demersal fish that were present at more than 2% of sites (see Table S2).
2.3 Environmental data
Six environmental variables representing sea surface conditions and nine environmental variables representing seafloor conditions likely to affect demersal fish communities (Majewski, Lynn, Lowdon, Williams, & Reist, 2013; McClatchie et al., 1997; Pakhomov, Bushula, Kaehler, Watkins, & Leslie, 2006) were obtained from Raymond (2012), Ridgway, Dunn, and Wilkin (2002) or derived directly from satellite products (Johnson, Strutton, Wright, McMinn, & Meiners, 2013; Reynolds et al., 2007) and gridded at 0.1-degree resolution (Table 1). Environmental data were assigned to trawls using the location of the trawl mid-point.
Variable | Description | Units | Source | References | Derivation |
---|---|---|---|---|---|
Seafloor | |||||
Depth* | Seafloor depth | m | Trawl data/Polar Environmental Data | Raymond (2012) | Subsampled and interpolated from ETOP1- global topography from satellite and ship soundings, Smith and Sandwell (1997) as described in Raymond (2012) |
Slope* | Seafloor slope | Degrees | Polar Environmental Data | Raymond (2012) | As above |
Floor temp* | Average temperature near seafloor | °C | Polar Environmental Data | Raymond (2012) | Derived from depth-stratified regional oceanographic model built for the Southern Ocean originally described in Galton-Fenzi, Hunter, Coleman, Marsland, and Warner (2012) |
Floor salinity | Average salinity near seafloor | PSU | Polar Environmental Data | Raymond (2012) | Derived from depth-stratified regional oceanographic model built for the Southern Ocean originally described in Galton-Fenzi et al. (2012) |
Floor current* | Average current speed near seafloor | m/s | Polar Environmental Data | Raymond (2012) | Derived from depth-stratified regional oceanographic model built for the Southern Ocean originally described in Galton-Fenzi et al. (2012) |
Oxy mean | Average oxygen concentration near seafloor | mg/L | CSIRO Atlas of Regional Seas (CARS), 2009 | Ridgway et al. (2002) | Depth-stratified climatology based on interpolation of all available in situ oceanographic data |
Oxy range | Seasonal range in oxygen concentration near seafloor | mg/L | CSIRO Atlas of Regional Seas (CARS), 2009 | Ridgway et al. (2002) | Depth-stratified climatology based on interpolation of all available in situ oceanographic data |
No3 mean* | Average nitrate concentration near seafloor | μmol/L | CSIRO Atlas of Regional Seas (CARS), 2009 | Ridgway et al. (2002) | Depth-stratified climatology based on interpolation of all available in situ oceanographic data |
No3 range | Seasonal range in nitrate concentration near seafloor | μmol/L | CSIRO Atlas of Regional Seas (CARS), 2009 | Ridgway et al. (2002) | Depth-stratified climatology based on interpolation of all available in situ oceanographic data |
Sea surface | |||||
Temp mean* | Average of daily surface temperature | °C | NOAA OI SST v2 | Reynolds et al. (2007) | Derived from daily satellite re-analysis product for the period 1982–2014 |
Temp var | Variance in daily surface temperature | °C² | NOAA OI SST v2 | Reynolds et al. (2007) | Derived from daily satellite re-analysis product for the period 1982–2014 |
Temp dt var | Variance after removing seasonal cycle | °C² | NOAA OI SST v2 | Reynolds et al. (2007) | Derived from daily satellite re-analysis product for the period 1982–2014 |
Chl a mean* | Mean of yearly mean chl a | mg/m³ | L3 SeaWiFS data corrected for Southern Ocean | Johnson et al. (2013) | Derived from Southern Ocean corrected, daily satellite ocean colour data for the period 1997–2010 |
Chl a SD | Standard deviation of yearly mean chl a | mg/m³ | L3 SeaWiFS data corrected for Southern Ocean | Johnson et al. (2013) | Derived from Southern Ocean corrected, daily satellite ocean colour data for the period 1997–2010 |
ssha sd* | Standard deviation sea surface height (indicates surface currents and fronts) | mm/km | Polar Environmental Data | Raymond (2012) | Derived from weekly satellite altimetry data (AVISO) for the period 1992–2007 |
Note. Variables were screened by removing highly collinear variables; * indicates variables included in the RCP model selection process.
Where environmental variables were highly correlated (Pearson's |r| > 0.7, Table S3), the variable considered most proximal to the distribution of demersal fish was retained for analysis (indicated by * in Table 1). Generalized additive models (GAMs) were used as a qualitative tool to assess the shape of species’ responses to the retained environmental variables and inform whether linear, quadratic or higher order polynomials should be considered in the RCP models. Specifically, GAMs with Bernoulli sampling variation were fitted and plotted for each environmental variable and each species separately. Plots were visually inspected, and linear and quadratic terms of the following variables were considered in RCP models: depth, the log of seafloor slope, average seafloor temperature, the log of seafloor currents, average nitrate, average sea surface temperature, average chlorophyll a and the standard deviation of sea surface height (see Table 1). Interactions were not considered.
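The collinearity screen described above can be illustrated with a minimal Python sketch (the analyses themselves were run in R). The function names and the toy covariate values below are hypothetical and serve only to show how pairs exceeding the 0.7 threshold would be flagged; deciding which variable of a flagged pair to retain remains an ecological judgement.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def collinear_pairs(variables, threshold=0.7):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    flagged = []
    for a, b in combinations(sorted(variables), 2):
        r = pearson_r(variables[a], variables[b])
        if abs(r) > threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged


# Hypothetical toy values at five sites (not data from the study).
toy = {
    "depth": [100, 300, 500, 800, 1200],
    "floor_temp": [4.0, 3.1, 2.6, 2.2, 1.9],
    "chl_a": [0.9, 0.4, 1.3, 0.5, 1.0],
}
flagged = collinear_pairs(toy)
```

In this toy example only the depth and seafloor temperature pair is flagged, so one of the two would be dropped before model selection.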
2.4 Statistical analyses
Regions of Common Profile is a statistical method that simultaneously models multispecies and environmental data at sampled sites to produce groupings (the RCP) that can be interpreted as assemblages. A Region of Common Profile is formally defined as a region of environmental space where the probability of observing a set of species is approximately constant within a region and distinct between regions (Foster et al., 2013). The membership of a site to each RCP is regarded as a latent (unobserved) variable whose expectation varies as a function of the environmental data (a mixture of experts model; Foster et al. (2013)). The dependency of RCP groups on environmental data allows us to predict the probability of each (and their associated species’ composition) in new locations where only environmental data exist. RCP models have recently been extended to account for sampling factors or artefacts (e.g., gear type) that alter the catchability of species (Foster, Hill, & Lyons, 2017). This can be useful when multiple surveys within a region are combined for analyses. Briefly, this is achieved by including terms in the model that accommodate one or more sampling factor/s and the effect of the sampling factor/s on observing each species separately. The term for a particular level of the sampling factor/s is common to all RCPs; it reduces or increases the expectation of each species by the same amount for all RCPs (Foster et al., 2017).
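The structure described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the RCPmod implementation: we assume a multinomial-logit form for RCP membership and a species-specific offset on the logit scale for the sampling factor, and all coefficient values are hypothetical.

```python
from math import exp


def softmax(scores):
    """Convert linear predictors to probabilities that sum to one."""
    m = max(scores)
    exps = [exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def membership_probs(env, coefs):
    """P(site belongs to each RCP) as a function of environmental covariates.

    env: covariate values at the site; coefs: one [intercept, slope, ...]
    list per RCP (assumed multinomial-logit parameterization).
    """
    scores = [c[0] + sum(b * x for b, x in zip(c[1:], env)) for c in coefs]
    return softmax(scores)


def inv_logit(x):
    return 1.0 / (1.0 + exp(-x))


def species_probs_by_rcp(profile_logits, artefact_offset=0.0):
    """Occurrence probability of one species in each RCP.

    The sampling-factor offset is the same for every RCP, so it shifts the
    species' expectation in the same direction everywhere.
    """
    return [inv_logit(a + artefact_offset) for a in profile_logits]


# Hypothetical values: three RCPs and one standardized covariate (e.g., depth).
coefs = [[0.0, 0.0], [-1.0, 2.0], [0.5, -1.5]]
pi = membership_probs([0.8], coefs)
profile = [-2.0, 0.0, 1.5]
baseline = species_probs_by_rcp(profile)
shifted = species_probs_by_rcp(profile, artefact_offset=-0.7)
```

A negative offset, as for a gear type that catches the species less often, lowers the occurrence probability in every RCP without changing which RCPs the species is most associated with.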
We modelled the occurrence of demersal fish using RCP models with a Bernoulli sampling distribution and logit link function. We used the combination of year, season and mesh size in the codend of the trawl (gear type) of each survey as the sampling factor that affects catchability (Figure 1b, Table S1). The model is estimated with maximum penalized likelihood (Foster, 2013). We used the mild default penalties from Foster et al. (2017) except that we used the moderately severe penalty of “1” on the sampling artefacts (gamma) as we suspect that they may be partially confounded with any north–south biogeographic pattern because of the differences in season and gear type between the two datasets. Increasing the gamma penalty increases the amount of biological variability assumed to come from the environment (i.e., favours RCP effects). We chose to emphasize RCP effects over sampling artefacts because some environmental conditions differ from north to south, and we may reasonably expect assemblage differences along these gradients. In Appendix S1, we explore potential seasonal effects in the data and investigate the consequences of using different gamma penalties. Multiple starts are required to avoid making inference from a local likelihood maximum, and for each model selection step outlined below, we performed 500 optimizations from random starting values. Mixture models can also have spikes in the likelihood that manifest as RCP groups with few associated sites. We identified and excluded models where any RCP was associated with fewer than two sites (Foster et al., 2013).
When fitting mixture models, the number of groups (RCPs in our case) needs to be specified. The optimal number is rarely known a priori, but can be inferred by specifying models with varying numbers of groups and using the Bayesian information criterion (BIC) to choose the “best” number of groups (Foster et al., 2013; Hui, Taskinen, Pledger, Foster, & Warton, 2015). Here, we also wanted to perform variable selection, so we used a forward selection procedure to select environmental variables and the number of RCPs simultaneously. Starting from the null model, for each step we considered the addition of each environmental variable (linear and quadratic term simultaneously) for between one and eight RCPs. The best model for that step was the combination of environmental variables and number of RCPs that minimized BIC. The process was repeated until there was no improvement in BIC between selection steps. Model assumptions were checked by examining randomized quantile residuals (RQR; Dunn and Smyth (1996)) modified for mixture models (Dunstan, Foster, Hui, & Warton, 2013; Foster et al., 2017). Five hundred Bayesian bootstraps (Rubin, 1981), where sites (and their complement of species) are resampled, were used to quantify uncertainty in parameter estimates (Foster et al., 2017). Uncertainty in RCP predictions is quantified by generating predictions for each set of bootstrap estimates and empirically calculating means and confidence intervals (Foster et al., 2017). Further details of RCP models, including their derivation, likelihood function and estimation can be found in Foster (2013) and Foster et al. (2017).
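Two ingredients of this procedure can be sketched in Python: choosing the candidate model with the lowest BIC at one selection step, and drawing site weights for one Bayesian bootstrap replicate. This is a minimal sketch with assumptions: the log-likelihoods and parameter counts are hypothetical, BIC is assumed to use the number of sites as the sample size, and the weights are drawn as a flat Dirichlet (normalized exponential draws) following Rubin (1981). The model fitting itself is not shown.

```python
import random
from math import log


def bic(loglik, n_params, n_sites):
    """Bayesian information criterion; smaller is better."""
    return -2.0 * loglik + n_params * log(n_sites)


def best_candidate(candidates, n_sites):
    """candidates maps (variables, n_rcp) to (loglik, n_params)."""
    return min(candidates, key=lambda k: bic(*candidates[k], n_sites))


def bayesian_bootstrap_weights(n_sites, rng):
    """One set of site weights from a flat Dirichlet distribution."""
    draws = [rng.expovariate(1.0) for _ in range(n_sites)]
    total = sum(draws)
    return [d / total for d in draws]


# Hypothetical fits for one forward-selection step (not results from the study).
candidates = {
    (("depth",), 3): (-5200.0, 70),
    (("depth",), 4): (-5100.0, 95),
    (("depth", "sst"), 4): (-5090.0, 101),
}
choice = best_candidate(candidates, n_sites=1197)

# Each bootstrap replicate would refit the model with these site weights.
weights = bayesian_bootstrap_weights(1197, random.Random(1))
```

In this toy step the extra variable does not lower BIC enough to offset its additional parameters, so the depth-only model with four RCPs is retained and selection would stop.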
2.5 Validation
As we cannot directly observe the groupings, validation of RCPs (or any clustering technique in this context) is difficult. We can, however, observe the species themselves at a given location, and we use this in conjunction with an independent dataset (536 RSTS trawls conducted across 2007, 2008, 2009 and 2012 in the Australian EEZ that were not used in RCP model construction) as our basis for validation. RCP predictions were generated for each validation trawl site based on its environmental covariates. We determined how well the model could discriminate the presence or absence of each species across all sites. The probability that a species occurs at each validation site was calculated as the probability that the site belongs to an RCP multiplied by the probability that a species occurs in that RCP, summed across all RCPs (Foster et al., 2013). The threshold-independent measure of discrimination, the area under the curve (AUC), was then calculated for each species (Franklin, 2009).
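The two calculations in this paragraph can be written out as a short Python sketch. The function names and toy numbers are hypothetical; the AUC is computed here with the rank-based (Mann-Whitney) formulation, which is one standard way to obtain it and not necessarily the routine used in the original analysis.

```python
def species_occurrence_prob(site_rcp_probs, species_rcp_probs):
    """Marginal probability that a species occurs at a site.

    Sum over RCPs of P(site belongs to RCP) * P(species occurs | RCP).
    """
    return sum(p * q for p, q in zip(site_rcp_probs, species_rcp_probs))


def auc(scores, labels):
    """Probability that a random presence site scores above a random absence site.

    Ties count as one half. Returns None when a species is never (or always)
    recorded, because the AUC is then undefined.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical site with two RCPs: membership 0.7/0.3, species profile 0.9/0.1.
p_site = species_occurrence_prob([0.7, 0.3], [0.9, 0.1])

# Hypothetical predictions and observations at four validation sites.
toy_auc = auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0])
undefined = auc([0.4, 0.6], [0, 0])
```

The `None` return mirrors the situation in which a species is absent from the validation data and no AUC can be reported.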
We also considered how well the RCP classifications represent the mix of species observed at sites. For the validation sites, we calculated the weighted mean occurrence of each species observed in each RCP where the probability of a site belonging to each RCP was used as the weighting. This was compared to the overall expected mean occurrence and 95% CI for each species in each RCP. If the model is entirely correct, we could expect the observed means to fall within the 95% CI of the expected means 95% of the time. The same 21 species that were used in the construction of the RCPs were considered in both aspects of model validation.
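As a minimal Python sketch of this comparison, the weighted mean below uses each site's membership probability for one RCP as its weight, and a second helper reports the share of observed means falling outside their expected intervals. The names, presence records and interval bounds are hypothetical.

```python
def weighted_mean_occurrence(presence, rcp_weights):
    """Observed mean occurrence of a species in one RCP.

    presence: 0/1 records at validation sites; rcp_weights: probability that
    each site belongs to the RCP of interest.
    """
    weighted = sum(y * w for y, w in zip(presence, rcp_weights))
    return weighted / sum(rcp_weights)


def share_outside(observed, intervals):
    """Fraction of observed means that fall outside their (lower, upper) interval."""
    outside = sum(1 for obs, (lo, hi) in zip(observed, intervals) if not lo <= obs <= hi)
    return outside / len(observed)


# Hypothetical records for one species at four validation sites.
obs_mean = weighted_mean_occurrence([1, 0, 1, 1], [0.9, 0.8, 0.1, 0.2])

# Hypothetical observed means and expected 95% intervals for four species-RCP pairs.
frac = share_outside([0.60, 0.10, 0.95, 0.40],
                     [(0.50, 0.70), (0.15, 0.30), (0.90, 0.99), (0.10, 0.35)])
```

If the model were entirely correct, the share returned by `share_outside` would be expected to sit near 0.05 across all species and RCPs.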
2.6 Coverage of RCP groups within HIMI Marine Protected Area
We assessed the degree to which the HIMI Marine Reserve covers the predicted spatial distribution of demersal fish RCPs by calculating the percentage of cells of each RCP type within the HIMI MR boundaries compared to those within the entire Australian EEZ boundary. To account for the probabilistic classification of RCPs, the “effective number of cells” for each RCP in each boundary was calculated as the sum of the probability of occurrence in each RCP across all cells. At the time of our analyses, the declared French marine reserve only covered coastal waters (<100 m), and so a comparable analysis was not performed here.
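The coverage calculation can be sketched in Python as follows. The grid, field names and probabilities are hypothetical; the sketch only shows how summing membership probabilities gives an "effective number of cells" and how the reserve percentage follows from it.

```python
def effective_cells(cells, rcp):
    """Effective number of cells of one RCP: the sum of its probabilities."""
    return sum(cell["probs"][rcp] for cell in cells)


def percent_in_reserve(cells, rcp):
    """Percentage of an RCP's effective cells that lie inside the reserve."""
    inside = [cell for cell in cells if cell["in_reserve"]]
    return 100.0 * effective_cells(inside, rcp) / effective_cells(cells, rcp)


# Hypothetical four-cell grid with two RCPs (probabilities sum to one per cell).
grid = [
    {"in_reserve": True, "probs": [0.75, 0.25]},
    {"in_reserve": True, "probs": [0.50, 0.50]},
    {"in_reserve": False, "probs": [0.25, 0.75]},
    {"in_reserve": False, "probs": [0.50, 0.50]},
]
coverage = [percent_in_reserve(grid, k) for k in range(2)]
```

Working with summed probabilities rather than hard classes means that uncertain cells contribute partially to several RCPs instead of being counted wholly towards one.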
All analyses were conducted in the R computing environment v3.3.1 (R Development Core Team, 2015). RCP models were fitted and predictions generated with the package RCPmod v2.152 (Foster, 2013). Maps were plotted in ArcGIS v10.1. A series of interactive maps of the results can be found at https://doi.org/10.4225/15/58169d06ee8fc.
3 RESULTS
3.1 Number, composition and environmental characteristics of RCPs
Of the 21 species modelled, 11 species are endemic to the Southern Ocean, and five species are endemic to the Kerguelen Plateau (Table S2), highlighting the importance of this region. Seven RCPs defined by depth, mean sea surface temperature and mean chl a (and their quadratics) were identified using the model selection procedure (see Fig. S1). Diagnostic plots indicate the model was adequate to describe variation in the data (Fig. S2). There was no evidence of significant residual spatial autocorrelation for most species, and for the few species with significant autocorrelation at short lags, the magnitude of this correlation was low (~0.1; see Fig. S3 and caption for details). Depth appeared the most influential variable, causing the greatest reduction in BIC (∆BIC 4776 at step 1), and all RCPs had a characteristic depth. Dissostichus eleginoides (Patagonian toothfish) was the only species with a high probability of occurrence in all RCPs. Most species were associated with at least one RCP with a moderate (~0.5) to high (>0.75) average probability (Figure 2). However, some rarer species, such as Lycodapus antarcticus and Paraliparis spp., had a low probability of occurrence across all RCPs.

Regions of Common Profile 1 and 2 are deep-water groups characterized by a high probability of occurrence of Macrourus spp. (grenadiers; Figure 2). While both RCPs are more likely found at depths >600 m, RCP 2 is also more likely to be found where sea surface temperature and surface production are low (<2°C and <1 mg/m³, respectively; Figure 3). RCP 1 is more species rich than RCP 2 with a higher chance of finding the three Bathyraja species (skates), Antimora rostrata (blue antimore), Etmopterus viator (lantern shark) and Paraliparis spp. (snailfish; Figure 2). Etmopterus viator is unlikely to occur in any other RCP.

Region of Common Profile 4 occurs between 400 and 800 m (most likely around 600 m) and at chl a values <1.5 mg/m³. While RCP 4 contains a high probability of finding Macrourus spp., it differs from RCPs 1 and 2 in that there is a moderate probability of occurrence of the three Bathyraja species, Bathydraco antarcticus (dragonfish), Muraenolepis spp. (eel cods) and Paradiplospinus gracilis (snake mackerel; Figure 2).
Regions of Common Profile 3 and 6 occur between ~200 and 700 m with RCP 3 most likely at 400 m and RCP 6 at 500 m. RCP 6 corresponds with lower chl a and temperature values (Figure 3). RCP 3 is characterized by a high probability of occurrence of the unicorn icefish C. rhinoceratus/velifer, Bathyraja eatonii and L. squamifrons (grey notothen), and a moderate probability of occurrence of Bathyraja murrayi, Bathyraja irrasa, Macrourus spp., Mancopsetta maculata (spotted flounder) and Muraenolepis spp. RCP 6 is similar to RCP 3, but B. eatonii, B. irrasa and C. rhinoceratus/velifer are less likely to occur (Figure 2).
Regions of Common Profile 5 and 7 are shallow-water groups. RCP 5 is most likely to occur in a narrow zone around 200 m depth, between 2 and 5°C surface temperature and at lower chl a values (Figure 3). It is characterized by a high probability of occurrence of C. rhinoceratus/velifer (unicorn icefish) and C. gunnari (mackerel icefish) as well as L. squamifrons, and a moderate probability of occurrence of Gobionotothen acuta (triangular rockcod), Lepidonotothen mizops, B. eatonii, Zanclorhynchus spinifer (horsefish), M. maculata and Muraenolepis spp. (Figure 2). RCP 7 is found in depths less than 300 m and is positively associated with chl a (Figure 3). It is similar to RCP 5, but has a higher probability of finding C. gunnari and a lower probability of finding L. squamifrons, D. eleginoides and M. maculata (Figure 2). Notothenia rossii (marbled rockcod) was only likely to occur in RCPs 5 and 7.
The sampling factor (the combination of year, season and gear type for each survey) had a variable effect on the “catchability” of species (Fig. S4). For some species, such as C. gunnari and C. rhinoceratus/velifer, the sampling factor had little effect. For others, such as M. maculata and L. antarcticus, it had a large effect. There were no obvious trends in the way the levels of the sampling factor affected the catchability of species.
3.2 Spatial patterns and predictions
The predicted spatial distribution of the RCPs across the northern Kerguelen Plateau is shown in Figure 4. The probabilistic model output has been summarized by assigning each cell a hard class based on its most likely RCP (Figure 4a). The probability of the hard classification for each cell is also shown (Figure 4b). Maps of the average probability of occurrence (and 95% CIs) of each RCP separately are in Fig. S5.
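The hard classification amounts to taking, for each grid cell, the RCP with the highest predicted probability and keeping that probability as a measure of certainty. A minimal Python sketch with hypothetical cell probabilities:

```python
def hard_class(probs):
    """Return (most likely RCP numbered from 1, its probability) for one cell."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return best + 1, probs[best]


# Hypothetical membership probabilities for two cells and four RCPs.
confident_cell = hard_class([0.02, 0.03, 0.90, 0.05])
uncertain_cell = hard_class([0.05, 0.45, 0.10, 0.40])
```

The second cell is assigned to RCP 2 with a probability of only 0.45, which is the kind of cell that would appear as uncertain in a map of classification probability.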

The most extensive RCP was RCP 7 with 19% coverage across the Plateau, and the least extensive was RCP 6 with 7% coverage. All other RCPs covered between 13% and 17% of the Plateau. Most RCPs were represented in the north and south of the study region, with the exception of the deep-water RCP 2 that was predicted to mostly occur in the south-east. The spatial patterns closely reflected the complex underlying bathymetry of the Plateau. The classification is most certain (with the probability of a single RCP group approaching one) in the shallow region in the north of the Plateau surrounding the Kerguelen Islands and in the deep sections of the Plateau. It is most uncertain (~0.5) in the intermediate depths and in the trough connecting the Heard and the McDonald Islands Plateau with that around the Kerguelen Islands.
3.3 Validation
Overall, the RCP model performed well in terms of its ability to predict when a species occurred at a validation site. One species, M. maculata, was not recorded in the validation dataset, and therefore an AUC could not be calculated. For the remaining 20 species, five species achieved AUC values >0.9 (excellent predictive capacity), and all but three species (B. murrayi, B. irrasa and Muraenolepis spp.) achieved AUCs >0.7 indicating good predictive capacity (Table 2; Franklin, 2009).
Species | Validation prevalence | Training prevalence | AUC |
---|---|---|---|
Alepocephalus antipodianus | 0.02 | 0.03 | 0.88 |
Antimora rostrata | 0.07 | 0.09 | 0.87 |
Bathydraco antarcticus | 0.04 | 0.10 | 0.79 |
Bathyraja eatonii | 0.44 | 0.47 | 0.73 |
Bathyraja irrasa | 0.32 | 0.20 | 0.60 |
Bathyraja murrayi | 0.24 | 0.37 | 0.65 |
Champsocephalus gunnari | 0.27 | 0.43 | 0.95 |
Channichthys rhinoceratus/velifer | 0.48 | 0.71 | 0.93 |
Dissostichus eleginoides | 0.86 | 0.95 | 0.72 |
Etmopterus viator | 0.01 | 0.04 | 0.74 |
Gobionotothen acuta | 0.21 | 0.27 | 0.93 |
Lepidonotothen mizops | 0.08 | 0.20 | 0.81 |
Lepidonotothen squamifrons | 0.38 | 0.51 | 0.77 |
Lycodapus antarcticus | 0.01 | 0.03 | 0.82 |
Macrourus spp | 0.51 | 0.37 | 0.91 |
Mancopsetta maculata | 0.00 | 0.21 | NA |
Muraenolepis spp | 0.27 | 0.30 | 0.64 |
Notothenia rossii | 0.02 | 0.15 | 0.83 |
Paradiplospinus gracilis | 0.05 | 0.15 | 0.78 |
Paraliparis spp | 0.01 | 0.04 | 0.92 |
Zanclorhynchus spinifer | 0.05 | 0.23 | 0.82 |
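
As an illustration of how a per-species AUC of the kind listed in the table can be computed, the following sketch uses the rank-based (Mann-Whitney) definition: the probability that a randomly chosen presence site receives a higher predicted probability than a randomly chosen absence site, with ties counted as one half. The function name `auc` and the toy data are hypothetical; this is not necessarily the routine used in the study.

```python
def auc(observed, predicted):
    """Rank-based AUC for one species.

    observed: 0/1 occurrences at validation sites.
    predicted: predicted probabilities of occurrence at the same sites.
    Returns None when the species is recorded at no sites or at all sites,
    because the AUC is undefined in those cases.
    """
    presences = [p for o, p in zip(observed, predicted) if o == 1]
    absences = [p for o, p in zip(observed, predicted) if o == 0]
    if not presences or not absences:
        return None
    wins = 0.0
    for p in presences:
        for a in absences:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5  # ties count as half a win
    return wins / (len(presences) * len(absences))


# Hypothetical validation data for a single species at six sites.
observed = [1, 0, 1, 0, 0, 1]
predicted = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
print(auc(observed, predicted))

# A species never recorded at the validation sites has no AUC.
print(auc([0, 0, 0], [0.3, 0.1, 0.2]))
```

The `None` return corresponds to the NA entry in the table: an AUC cannot be calculated for a species with no recorded occurrences in the validation data.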
The model was also able to predict the composition of species within sites reasonably well. Twenty-three per cent of species’ occurrences across the seven RCPs fell outside the 95% CI of the predicted mean, which is more than the 5% expected by chance (Figure 5). Many of these species fell just outside the 95% CI (i.e., by <0.02 probability of occurrence), while for some (e.g., D. eleginoides in RCPs 3 and 5), the discrepancy was larger (Figure 5).
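The composition check described above can be sketched as follows, assuming that for each species-by-RCP combination we hold an observed prevalence and the 95% CI of the predicted mean. The function name `ci_check`, the record layout and all values are hypothetical placeholders rather than results from the study.

```python
def ci_check(records):
    """Compare observed prevalences with predicted 95% confidence intervals.

    records: dicts with keys 'species', 'rcp', 'observed', 'lower', 'upper'.
    Returns the fraction of records outside their interval and a list of
    (species, rcp, gap) for the misses, where gap is the distance from the
    observed value to the nearest interval bound.
    """
    misses = []
    for rec in records:
        gap = max(rec["lower"] - rec["observed"],
                  rec["observed"] - rec["upper"],
                  0.0)
        if gap > 0.0:
            misses.append((rec["species"], rec["rcp"], gap))
    return len(misses) / len(records), misses


# Hypothetical records: two species in two groups.
records = [
    {"species": "species A", "rcp": 1, "observed": 0.30, "lower": 0.25, "upper": 0.40},
    {"species": "species A", "rcp": 2, "observed": 0.10, "lower": 0.12, "upper": 0.20},
    {"species": "species B", "rcp": 1, "observed": 0.90, "lower": 0.60, "upper": 0.75},
    {"species": "species B", "rcp": 2, "observed": 0.05, "lower": 0.01, "upper": 0.08},
]
fraction, misses = ci_check(records)
print(fraction)
for species, rcp, gap in misses:
    print(species, rcp, round(gap, 3))
```

Reporting the gap alongside the fraction separates near misses from larger discrepancies, which is the distinction drawn in the text.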

3.4 HIMI Marine Reserve coverage
The Heard Island and McDonald Island Marine Reserve (HIMI MR; Figure 6) covers 34% of the Australian EEZ region modelled (i.e., <1,200 m depth). RCPs 3, 5, 6 and 7 had more than 40% of their predicted area represented within the HIMI MR, while 31% and 25% of the predicted area of RCP 4 and RCP 1, respectively, fell within the reserve boundaries. The deep-water group, RCP 2, had the least coverage, with just 12% of its area within the reserve.
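A representativeness calculation of this kind can be sketched as an area-weighted tally. The sketch assumes that each prediction cell has already been assigned to a single group and flagged as inside or outside the reserve; the function name `reserve_coverage`, the group labels and the areas are hypothetical.

```python
def reserve_coverage(cells):
    """Proportion of each group's predicted area that lies inside a reserve.

    cells: iterable of (group_label, area_km2, inside_reserve) tuples.
    Returns a dict mapping each group label to a proportion in [0, 1].
    """
    total, inside = {}, {}
    for label, area, in_reserve in cells:
        total[label] = total.get(label, 0.0) + area
        if in_reserve:
            inside[label] = inside.get(label, 0.0) + area
    return {label: inside.get(label, 0.0) / total[label] for label in total}


# Hypothetical cells for three groups.
cells = [
    ("group A", 10.0, True), ("group A", 30.0, False),
    ("group B", 20.0, True), ("group B", 20.0, False),
    ("group C", 15.0, False),
]
for label, share in reserve_coverage(cells).items():
    print(label, share)
```

Weighting by cell area rather than counting cells matters when the prediction grid is not equal-area, which is an assumption to check for any real grid.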

4 DISCUSSION
4.1 Biogeographic patterns on the Kerguelen Plateau
Defining and mapping assemblages is a fundamental task in ecology that can give deeper insight into the composition of communities and their potential environmental drivers, as well as reveal broad biogeographic patterns. Here, we discuss the biogeographic information and insight that the RCP approach gives with respect to the Kerguelen Plateau.
Our work provides the first quantitative and fine-scale map of demersal fish assemblages on the northern Kerguelen Plateau. Fish assemblages here are primarily depth-structured, as is common in marine taxa (Hill et al., 2014; Rubidge et al., 2016). We found evidence for two shallow assemblages (<400 m), three intermediate depth assemblages (~400–700 m) and two deep assemblages (>600 m depth). Despite the northern plateau extending more than 1,000 km north–south and crossing two frontal zones (Park & Vivier, 2011), most assemblages are represented in the north and south of the region. The broad latitudinal extent of assemblages, combined with the number of endemic species, supports the current designation of the entire Kerguelen Plateau as a single biogeographic province within the Southern Ocean (Duhamel et al., 2005).
The distribution of the targeted species on the Kerguelen Plateau has been well described. The RCP approach confirms the distributional patterns of these species, but also provides a quantitative picture of how the less common species covary, revealing depth-related patterns in endemicity. For example, as expected, juvenile Patagonian toothfish are prevalent in all assemblages on the Plateau, and icefish in differing proportions are key components of the two shallow assemblages (Constable, Williams, & De la Mare, 1998; Duhamel & Hautecoeur, 2009; Péron et al., 2016). However, shallow assemblages also feature species such as G. acuta, L. mizops and B. murrayi that are endemic to the Kerguelen Plateau, while the two deep assemblages feature macrourids that are generally more cosmopolitan in their distribution (Duhamel et al., 2014). Depth-related patterns in the distribution and endemicity of demersal fish exist around other subantarctic islands (Gregory, Collins, & Belchier, 2017; Pakhomov et al., 2006), potentially because deep ocean basins are more connected than shallow, isolated subantarctic islands and plateaux (Duhamel et al., 2005).
Chlorophyll a and surface temperature were also important in determining the distribution of assemblages and may provide insight into processes structuring the Plateau's assemblages. For example, the sharply rising Plateau diverts the eastward flowing ACC creating areas of relatively high current (Park, Roquet, Durand, & Fuda, 2008). Similarly, currents interacting with the shallow shelf release iron resulting in plumes of surface productivity that advect to the north-east (Mongin, Molina, & Trull, 2008). Both currents and an increase in surface-derived food can affect the type of benthic habitat (Welsford, Ewing, Constable, Hibberd, & Kilpatrick, 2014). On the steep, low productivity, south-western slopes, dense filter-feeding and structure-forming taxa have been recorded (Koubbi et al., 2016; Welsford et al., 2014) corresponding with our RCP 6. RCP 3 is found in similar depths mostly in areas of higher surface productivity where deposit-feeders have been recorded as dominant (Hibberd, 2015). Likewise, RCPs 1 and 2 occurred at similar depths (>600 m) but RCP 2 only occurred in the south-east of the plateau, coinciding with both lower surface temperature and productivity and likely sandy/silty substratum (Welsford et al., 2014). This indicates that pelagic-benthic coupling, likely through its influence on benthic habitat, may be an important factor for demersal fish in this region.
4.2 Management application
Maps depicting the distribution (and associated uncertainty) of assemblages are vital for informed and robust management of biodiversity, particularly for planning or evaluating spatial management, and we have shown that the RCP approach is useful in providing this kind of information. The HIMI Marine Reserve was established under the CAR principles (Comprehensive, Adequate and Representative) using a qualitative biogeography comprising 13 local units delineated using environmental conditions supplemented with information on the distribution of benthic invertebrates, target fish species and marine mammal and seabird foraging areas (Commonwealth of Australia, 2014; Meyer, Constable, & Williams, 2000). These units match reasonably well with our quantitative demersal fish ecoregionalization. A key difference is that the demersal fish regionalization has a finer depth discrimination, while the qualitative regionalization distinguished each of the banks as separate units (Commonwealth of Australia, 2014; Meyer et al., 2000). Perhaps not surprisingly then, we find that most of our quantitatively derived demersal fish assemblages are well represented within the reserve, except for the deep-water RCP 2 assemblage. RCP 2 has a high occurrence of the cosmopolitan macrourids and Patagonian toothfish (which are widely distributed across the Plateau) but is otherwise relatively species poor. While this group does not contain species that are endemic to the plateau, increased representation could be achieved by incorporating a larger area of this assemblage into the reserve. Further, these maps may provide useful input into the spatial management process currently underway in the French EEZ.
4.3 Evaluation of the RCP approach
While the RCP approach shares the reported advantages of other model-based approaches over dissimilarity-based approaches (Warton et al., 2015a), a key feature of RCP is the identification of the species profile for each assemblage. As we demonstrate, the species profile allows us to directly and robustly interpret: (1) which species we expect to see in what groups and with what certainty; and (2) how those groups (and therefore species) are associated with the environment. It also allows validation at the level of species where we have additional survey data. Upon validation in the southern section of our survey area, we find our model was able to usefully predict the occurrence of most species at new sites. A few species that are closely associated with the bottom (e.g., the flatfish and skates) were more difficult to predict accurately, which may reflect gear-related artefacts that we were unable to completely account for in our model because of confounding in gear type between the datasets (see Appendix S1 for more details on sampling effects). The model also performed reasonably well when predicting the composition of species at sites. Some species, such as icefish and toothfish, which form dense aggregations and/or exhibit variable recruitment (Duhamel, Pruvost, Bertignac, Gasco, & Hautecoeur, 2011), tended to be overpredicted in some groups (e.g., RCP 5 and 7); however, this does not substantially alter the ecological interpretation of these assemblages. Future surveys planned in the French EEZ will be especially useful for further validating our ecoregionalization. In contrast to dissimilarity-based approaches, using RCP we are able to get direct and transparent insights into the composition of assemblages and gain a detailed picture of when our model performs well, to the benefit of both ecology and conservation management.
Because the RCP approach is truly a “one-stage” classification that considers both species and the environment simultaneously, our prediction maps capture the uncertainty in the entire model. Two-stage approaches, in contrast, either do not quantify the uncertainty in the initial biological groupings in stage one (treating site allocation to groups as fixed) or, if it is quantified, are unable to incorporate it when calculating the model uncertainty for stage two (e.g., Rubidge et al., 2016). This leads to an overly optimistic representation of model outputs, which is a concern for conservation applications. When we propagate uncertainty through to the final maps, groupings are generally less certain in the intermediate depths of the Plateau and near boundaries of groups. This may reflect the relatively sparse sampling between the Kerguelen and Heard Island shelves or that the environmental variables available may not entirely capture biological differences in this section of the plateau. This uncertainty may also be due to the transition between shallow and deep fauna and a reflection of the long-standing difficulty in defining assemblages in a continuum of multispecies responses (Leaper, Dunstan, Foster, Barrett, & Edgar, 2014). Additionally, we used climatologies of environmental variables, and the temporal mismatch between the environmental variables and the biological sampling is likely to have added noise to our data and contributed to the uncertainty in the results (Barry & Elith, 2006).
While there are many advantages to using the RCP approach, there are some challenges in its implementation and some caveats in its use. Because the model has many parameters and requires multiple starts, multi-core machines or high-performance computing facilities are required for large datasets. Additionally, while some advice exists for selecting the number of RCPs (Foster et al., 2013), little is available for selecting the number of RCPs and environmental covariates simultaneously; hence the pragmatic approach adopted in this study. Model selection tools are being developed for latent variable models (e.g., Hui, 2016), and adaptations should be applicable to mixture models in the near future. The ability of RCP models to account for sampling artefacts (such as different gear types) lends itself to the amalgamation of datasets, but an assumption of the model is that there is no interaction between the sampling artefacts and the biogeography. This could occur if there is separation between the environmental coverage of the different sampling factors or if species utilize the environment differently between sampling factors (see Appendix S1). Despite these caveats, the interpretability of the RCP approach makes it an attractive method for ecoregionalization.
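The need for multiple starts can be illustrated with a deliberately simplified sketch. It fits a plain Bernoulli mixture by expectation-maximisation from several random starting points and keeps the fit with the highest log-likelihood. This is not the RCP model itself: group membership here does not depend on environmental covariates, a small smoothing constant is added in the update step to keep probabilities away from zero and one, and the function names and toy data are hypothetical.

```python
import math
import random


def e_step(data, weights, profiles):
    """Return the log-likelihood and per-site group membership probabilities."""
    loglik, resp = 0.0, []
    for row in data:
        logs = []
        for w, profile in zip(weights, profiles):
            lp = math.log(w)
            for y, p in zip(row, profile):
                lp += math.log(p if y else 1.0 - p)
            logs.append(lp)
        top = max(logs)  # subtract the maximum for numerical stability
        terms = [math.exp(v - top) for v in logs]
        total = sum(terms)
        loglik += top + math.log(total)
        resp.append([t / total for t in terms])
    return loglik, resp


def fit_once(data, n_groups, rng, n_iter=100):
    """Run EM from one random start; return (loglik, weights, profiles)."""
    n_sites, n_species = len(data), len(data[0])
    weights = [1.0 / n_groups] * n_groups
    profiles = [[rng.uniform(0.2, 0.8) for _ in range(n_species)]
                for _ in range(n_groups)]
    for _ in range(n_iter):
        _, resp = e_step(data, weights, profiles)
        for k in range(n_groups):
            nk = sum(r[k] for r in resp)
            # Smoothed updates (an assumption of this sketch, not of RCP).
            weights[k] = (nk + 0.01) / (n_sites + 0.01 * n_groups)
            for j in range(n_species):
                hits = sum(r[k] * row[j] for r, row in zip(resp, data))
                profiles[k][j] = (hits + 0.01) / (nk + 0.02)
    loglik, _ = e_step(data, weights, profiles)
    return loglik, weights, profiles


def fit_multistart(data, n_groups, n_starts=10, seed=1):
    """Fit from several random starts and keep the best log-likelihood."""
    rng = random.Random(seed)
    fits = [fit_once(data, n_groups, rng) for _ in range(n_starts)]
    best = max(fits, key=lambda fit: fit[0])
    return best, [fit[0] for fit in fits]


# Hypothetical presence/absence data: 8 sites by 4 species.
sites = [
    [1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 1, 0],
    [0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 0, 1],
]
best, logliks = fit_multistart(sites, n_groups=2, n_starts=5)
print(best[0])
print(logliks)
```

Because each start is independent of the others, the starts can be distributed across cores, which is one reason multi-core machines help with larger datasets.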
5 CONCLUSIONS
Here, we have demonstrated the capabilities of RCP, a new model-based method for ecoregionalization. The method shares many of the reported advantages of other model-based methods, but a key feature of RCP is the species profile which allows a direct interpretation of assemblages and their contents as well as their environment. We provide a method for validating the results of RCP models and find that the model performs reasonably well for the southern section of our study region where we have additional data. The ability of RCP to accommodate sampling artefacts promotes the amalgamation of multiple datasets, increasing the coverage and extent of future classifications. Additionally, because RCP is an analytical approach to ecoregionalization, it is easily updated as new data become available. Overall, the RCP approach should be a valuable tool that can be applied to a range of ecological and conservation management scenarios.
In the context of our case study, we provide the first quantitative and fine-scale map of the distribution of demersal fish assemblages across the northern Kerguelen Plateau. Our results extend current ecological and biogeographic knowledge for the region, and maps of the distribution of assemblages will be useful for ongoing spatial management and monitoring.
ACKNOWLEDGEMENTS
We thank Tim Lamb and Patrice Pruvost for establishing and curating the Australian and French fish databases and for providing access to and clarifying the data; the Australian Fisheries Management Authority and Muséum national d'histoire naturelle for permission to use the data; the crew, scientific teams and observers who collected the survey data; Eric Oliver for providing derived statistics of the NOAA High Resolution SST data (itself provided by the NOAA/OAR/ESRL PSD, Boulder, Colorado, USA); Stuart Corney for providing temperature and current outputs from a Southern Ocean Regional Oceanographic Model; Michael Sumner for providing tools to generate chlorophyll a variables derived from Ocean colour data (provided by NASA Goddard Space Flight Centre and the ESA GlobColour Project; http://oceancolor.gsfc.nasa.gov/) and for assistance with the interactive maps. We also thank Rebecca Leaper for the inspiration for this project and two anonymous referees for their comments. This work was completed as part of Australian Antarctic Science project 4124.
DATA ACCESSIBILITY
Demersal fish data are restricted access but may be made available on request by contacting the Australian Fisheries Management Authority (RSTS) or Guy Duhamel (POKER). CSV files and plots of results are available through the Australian Antarctic Division Data Centre (https://doi.org/10.4225/15/57a01de46ed1e). Interactive maps of the distribution of RCP groups and RCP group representation within the Heard and McDonald Islands Marine Reserve can be found at https://doi.org/10.4225/15/58169d06ee8fc.
References
BIOSKETCH
Nicole Hill is a quantitative ecologist whose recent research focuses on understanding, quantifying and mapping biodiversity to support decision-making in the marine environment. She currently leads a project that is applying novel statistical methods to model and map marine biodiversity in the Southern Ocean.
Author contribution: NH, SF and CJ conceived and designed the manuscript; GD and DW led the programs that collected, identified and maintained the data and provided and interpreted the data; NH and SF analysed the data; all authors contributed to the interpretation of analyses; NH wrote the manuscript with input from all authors.