Volume 48, Issue 8 pp. 2171-2184

RESEARCH ARTICLE

Equivalence of citizen science and scientific data for modelling species distribution of birds from a tropical savanna

Corresponding Author

Eduardo Guimarães Santos

orcid.org/0000-0002-9858-1784

Programa de Pós-graduação em Ecologia, Instituto de Ciências Biológicas, Universidade de Brasília, Brasília, Brazil

Correspondence

Eduardo Guimarães Santos, Programa de Pós-graduação em Ecologia, Instituto de Ciências Biológicas, Universidade de Brasília, 70919-970 Brasília, DF, Brazil.

Contribution: Conceptualization (equal), Data curation (equal), Formal analysis (equal), Investigation (equal), Methodology (equal), Validation (equal), Visualization (equal), Writing - original draft (equal), Writing - review & editing (equal)

Helga Correa Wiederhecker

orcid.org/0000-0002-6454-0829

Independent Researcher, Brasília, Brazil

Contribution: Conceptualization (equal), Investigation (equal), Methodology (equal), Validation (equal), Writing - review & editing (equal)

Leonardo Esteves Lopes

orcid.org/0000-0003-4014-9128

Laboratório de Biologia Animal, IBF, Universidade Federal de Viçosa – Campus Florestal, Florestal, Brazil

Contribution: Data curation (equal), Validation (equal), Writing - review & editing (equal)

Miguel Ângelo Marini

orcid.org/0000-0002-7300-7321

Departamento de Zoologia, Instituto de Ciências Biológicas, Universidade de Brasília, Brasília, Brazil

Contribution: Conceptualization (equal), Investigation (equal), Methodology (equal), Supervision (equal), Writing - review & editing (equal)

First published: 05 November 2023

https://doi.org/10.1111/aec.13454

Abstract

The Wallacean deficit continues to be a challenge to species distribution modelling. Although some authors have suggested that data collected by citizen scientists can be relevant for a better understanding of biodiversity, to our knowledge, no work has quantitatively tested the equivalence between scientific and citizen science data. Here, we investigate the hypothesis that data collected by citizen scientists can be equivalent to data collected by professional scientists when generating species spatial distribution models. For 42 bird species in the Cerrado region we generated and compared species distribution models based on three data sources: (1) scientific data, (2) citizen science data and (3) sample size corrected citizen science data. To test our hypothesis, we compared the equivalence of these datasets. We rejected the hypothesis of equivalence for about one-third (38%) of the evaluated species, revealing that, for most of the species considered, the models generated were equivalent irrespective of the data set used. The distances between centroids of the models that were equivalent were on average smaller than the distances between non-equivalent models. Also, the direction of change in the models showed no pattern, with no trend towards more populated regions. Our results show that the use of data collected by citizen scientists can be an ally in filling the Wallacean deficit gap. In fact, the lack of use of this wide range of data collected by citizen scientists seems to be an unjustified caution. We indicate the potential of using citizen science data for modelling the distribution of species, mainly due to the large set of data collected, which is impracticable for scientists alone to collect. Conservation measures will be favoured by the union of professional and amateur data, aiming for a better understanding of species distribution and, consequently, biodiversity conservation.

Resumo

O déficit wallaceano continua a ser um desafio para a modelagem da distribuição das espécies. Embora alguns autores tenham sugerido que os dados recolhidos por cientistas cidadãos podem ser relevantes para uma melhor compreensão da biodiversidade, pelo nosso conhecimento, nenhum trabalho testou quantitativamente a equivalência entre dados científicos e de ciência cidadã. Aqui, investigamos a hipótese de que os dados coletados por cientistas cidadãos podem ser equivalentes aos dados coletados por cientistas profissionais na geração de modelos de distribuição espacial de espécies. Para 42 espécies de aves na região do Cerrado, geramos e comparamos modelos de distribuição de espécies baseados em três fontes de dados: 1) dados científicos, 2) dados da ciência cidadã, e 3) dados da ciência cidadã corrigidos pelo tamanho da amostra. Para testar a nossa hipótese, comparamos a equivalência destes conjuntos de dados. Rejeitamos a hipótese de equivalência para cerca de 1/3 (38%) das espécies avaliadas, revelando que, para a maioria das espécies consideradas, os modelos gerados eram equivalentes independentemente do conjunto de dados utilizado. As distâncias entre os centroides dos modelos equivalentes foram, em média, menores do que as distâncias entre os modelos não equivalentes. Ainda, a direção da mudança nos modelos não mostrou nenhum padrão, sem tendência para regiões mais populosas. Os nossos resultados mostram que a utilização de dados recolhidos por cientistas cidadãos pode ser um aliado no preenchimento da lacuna do déficit wallaceano. De fato, não utilizar esta vasta gama de dados recolhidos por cientistas cidadãos parece ser uma precaução injustificada. Indicamos o potencial da utilização de dados da ciência cidadã para a modelação da distribuição das espécies, principalmente devido ao grande conjunto de dados recolhidos, cujo recolhimento é impraticável apenas para os cientistas. As medidas de conservação serão favorecidas pela união de dados profissionais e amadores, visando uma melhor compreensão da distribuição das espécies e, consequentemente, a conservação da biodiversidade.

INTRODUCTION

One of the most dramatic aspects of the biodiversity crisis is that biodiversity is being lost much faster than information about species is produced. Species distribution modelling has improved over the last 20 years and now plays an important role in understanding species' spatial distribution patterns. Understanding how these patterns change through time can help researchers and managers intervene in the event of a distribution shift or decline in biodiversity, with direct implications for species conservation (Melo-Merino et al., 2020; Peterson et al., 2011, 2015). Despite this improvement, the Wallacean deficit, that is, the lack of knowledge about where species occur geographically (Lomolino, 2004; Proença et al., 2017; Whittaker et al., 2005), continues to be a challenge to species distribution modelling, which in turn limits knowledge of species' geographic ranges. This is because the higher the number of records, the greater the probability of generating adequate predictive models (Feeley & Silman, 2011). Thus, methods that accelerate the accumulation of biodiversity information, such as citizen science, can be an ally of conservationists.

Citizen science is a way of engaging volunteer citizens (mostly non-specialists) in scientific production, including recording data of potential scientific use, such as species occurrence (Auerbach et al., 2019; Heigl et al., 2019). Citizen science began ~1900 (Cohn, 2008) and has advanced worldwide. It is regarded as a promising area, given that it can increase our knowledge of biodiversity (Follett & Strezov, 2015; Hannibal, 2016; Kerstes et al., 2019; MacPhail & Colla, 2020). Although it is not the only objective of citizen science (see Cohn, 2008; Haklay et al., 2021), involving volunteers in data collection often increases the volume of data produced. However, one limitation that has plagued scientists using citizen science data is the spatial bias associated with human population centres, as most citizen science data are collected in the vicinity of urban centres (Geldmann et al., 2016). Thus, it is important to understand the strengths and limitations of these large data sets that are being collected around the world.

Citizen science has become an ally for scientists interested in understanding the natural world, and citizen-collected data have been used to supplement scientific data in many studies. Birds have received the majority of citizen science attention. For example, citizen science data are improving the understanding of several fields, such as community structuring (La Sorte et al., 2018; La Sorte & Somveille, 2020), breeding (Ferreira et al., 2019; Turella et al., 2022), migration (Schubert et al., 2019), conservation (Steven et al., 2019), patterns of occurrence and abundance (Lepczyk et al., 2017) and, more recently, the impact of COVID-19 (Schrimpf et al., 2021). However, although some authors have highlighted that data collected by citizen scientists can be relevant for a better understanding of biodiversity (Chandler et al., 2017; La Sorte & Somveille, 2020; Poisson et al., 2020; Steven et al., 2019), including demonstrating its importance (e.g. Aceves-Bueno et al., 2017; Kosmala et al., 2016), to our knowledge, no work has quantitatively tested the equivalence of data sets collected by citizen scientists and by professional scientists.

Here, we used data on 42 Neotropical bird species to investigate the hypothesis that citizen scientists' (CIT) data are equivalent to professional scientists' (SCI) data when generating species geographic distribution models. To compare the equivalence of these data sets, we tested two hypotheses. First, we compared the equivalence of models using an identity test, controlling for the discrepancy in the number of records (H1). Second, to assess the spatial distribution of SCI and CIT models, we compared their overlaps and the centroid shifts of each generated model (H2). We expected that models generated only with CIT data would be geographically biased towards locations with higher human concentration, due to data collection biases. We then pinpointed the cases in which data from CIT can help fill in the gaps of species occurrence. Finally, we investigated the influence of biological and ecological characteristics on SCI and CIT data equivalence and discussed the implications of our results.

METHODS

Species evaluated

We used 42 species that occur in the Cerrado, a savanna-like biome and biodiversity hotspot that has been intensely threatened by landscape changes due mainly to pasture and agriculture expansion (Klink & Machado, 2005; Myers et al., 2000). The 42 species differ in: extent of occurrence (min = 17 800 km², max = 7 160 000 km², www.datazone.birdlife.org); body size (min = 9.5 cm, max = 34.5 cm – Gwynne et al., 2010; Ridgely & Tudor, 2009); conservation status (Least Concern, Near Threatened, Vulnerable, Endangered – www.iucnredlist.org); population trend (decreasing, increasing, or stable – www.datazone.birdlife.org); habitat use (grasslands – GR, savannas – SV, and forests – FO); and sensitivity to disturbance (low, medium and high) (Sousa et al., 2021). The 42 species chosen for our analysis have been extensively studied by scientists over the last 20 years in our region (e.g. Lopes, 2008; Marini et al., 2009; Sousa et al., 2021), which has guaranteed reliable scientific data compiled from different databases (for more details see the topic Range delimitation in Appendix S1).

Species occurrence data

We collated CIT occurrence points from three data sources: iNaturalist (n = 1287, research-grade data from www.inaturalist.org), eBird (n = 53 423, www.ebird.org) and WikiAves (n = 47 816, www.wikiaves.com.br). We collated SCI occurrence points (professionally collected) from GBIF (n = 1652, www.gbif.org), ‘Portal da Biodiversidade’ (n = 7781, www.portaldabiodiversidade.icmbio.gov.br), and from scientific articles published until 2021 (n = 5172). We proceeded with the literature review using both the scientific name (including synonyms and previous combinations), and the common name (Portuguese and English) for each species. Also, our review included specimens deposited in 11 Brazilian institutions, six North American institutions, and seven European institutions (to access the complete scientific data review methodology, see Range delimitation in Appendix S1). We collated a total of 117 075 occurrence points (SCI = 14 605 and CIT = 102 470). We obtained, for each species, three distribution model ensembles (SCI, CITcorr and CITuncorr; Table S1).

Species distribution modelling – SDM

We used 19 bioclimatic variables from WorldClim (Hijmans et al., 2005; www.worldclim.org), together with altitude, slope (Amatulli et al., 2018) and slope aspect (Holland & Steyn, 1975). Spatial layers were adjusted to 5 km resolution. We condensed environmental information with a Principal Component Analysis (PCA) (De Marco & Nóbrega, 2018), retaining the components that together explained 95% of the total variation (Table S2). For each set of points (CIT and SCI) we removed occurrences that were within 10 km of each other (2× cell size) (Andrade et al., 2020; Veloz, 2009). Since CIT records are typically more abundant than SCI records (see Table S1), we randomly selected from the CIT data the same number of points as in the SCI data to create the sample-size corrected subset CITcorr. This controlled for the difference in sample size between the two data sets. We created two comparison scenarios: (1) Total occurrences, using all records of SCI and CIT (SCI × CITuncorr); and (2) Equal occurrences, using a reduced set of CIT records (SCI × CITcorr).
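The thinning and sample-size correction steps can be sketched as follows. This is a minimal Python illustration, not the R workflow used in the study: the function names (`thin_points`, `subsample`), the greedy thinning strategy, the haversine distance and the coordinates are all assumptions made for the example.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def thin_points(points, min_dist_km=10.0):
    """Greedy thinning: keep a point only if it lies at least
    min_dist_km away from every point already kept."""
    kept = []
    for p in points:
        if all(haversine_km(p, k) >= min_dist_km for k in kept):
            kept.append(p)
    return kept


def subsample(points, n, seed=0):
    """Randomly draw n points without replacement
    (or all points if fewer than n are available)."""
    rng = random.Random(seed)
    return rng.sample(points, min(n, len(points)))


# Hypothetical (lat, lon) records, for illustration only.
sci = [(-15.80, -47.90), (-15.95, -47.60), (-16.30, -48.20)]
cit = [(-15.80, -47.90), (-15.81, -47.91), (-15.60, -47.70),
       (-16.00, -48.00), (-16.50, -48.50), (-15.20, -47.20)]

sci_thin = thin_points(sci)
cit_uncorr = thin_points(cit)                    # all thinned CIT records
cit_corr = subsample(cit_uncorr, len(sci_thin))  # rarefied to the SCI sample size
```

The greedy filter depends on the order of the input points; a thinning routine that maximises the number of retained points would need a different algorithm.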

Next, we generated species distribution models using the three data sets (SCI, CITcorr and CITuncorr). We applied five widely used algorithms: Maximum Entropy (MXD); Support Vector Machine (SVM); Random Forest (RF); Generalized Linear Models (GLM); and Bayesian Gaussian Process (GAU). We used the bootstrap technique (10 replicates) (Fielding & Bell, 1997), separating 70% of occurrences for model training and 30% for testing. For algorithms that require pseudo-absences, we kept a 1:1 ratio with the presence records in each scenario, allocating pseudo-absences to geographic areas with lower environmental suitability, as predicted by a Bioclim model (Engler et al., 2004). We used the True Skill Statistic (TSS) as a performance metric for each model generated (Allouche et al., 2006). To generate the final models for each data set, we assembled the best models, that is, those with TSS greater than the overall mean value; these ensembles are hereafter referred to as distribution models. All procedures were performed with the ENMTML package (Andrade et al., 2020) in R.
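As an illustration of the performance metric and the selection rule, the sketch below computes TSS from a confusion matrix (sensitivity + specificity − 1; Allouche et al., 2006) and keeps the algorithms whose TSS exceeds the mean. It is a Python sketch under stated assumptions, not the ENMTML implementation: the function names and all scores are hypothetical.

```python
def tss(tp, fn, tn, fp):
    """True Skill Statistic = sensitivity + specificity - 1."""
    sensitivity = tp / (tp + fn)  # share of presences predicted as presences
    specificity = tn / (tn + fp)  # share of absences predicted as absences
    return sensitivity + specificity - 1


def select_for_ensemble(scores):
    """Return the names of models whose TSS is above the mean TSS."""
    mean_tss = sum(scores.values()) / len(scores)
    return sorted(name for name, s in scores.items() if s > mean_tss)


# Hypothetical TSS values for the five algorithms of one data set.
scores = {"MXD": 0.72, "SVM": 0.65, "RF": 0.80, "GLM": 0.55, "GAU": 0.70}
selected = select_for_ensemble(scores)  # models above the mean of 0.684

# Hypothetical confusion matrix: 27 of 30 presences and 24 of 30
# pseudo-absences classified correctly.
example_tss = tss(tp=27, fn=3, tn=24, fp=6)
```

TSS ranges from −1 to 1, with values near 0 indicating performance no better than random.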

Comparison between models

We compared the three distribution models for each species using the identity test (also called the equivalence test). We performed pairwise comparisons (SCI × CITuncorr and SCI × CITcorr), contrasting the models' values for each cell. We considered two similarity metrics, D and I, which range from 0 to 1 (0 = no overlap between models, 1 = models are identical) (Warren et al., 2008). We then performed the hypothesis test based on null distributions of D and I values derived from 1000 randomized null models for each comparison. We tested equivalence by comparing the observed value with the null distribution, with a cut-off of p < 0.05 (Warren et al., 2008). Comparisons were performed using the ENMTools package (Warren et al., 2021; Warren & Dinnage, 2021) in R (R Core Team, 2020).
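The two similarity metrics can be written out explicitly. The Python sketch below follows the published definitions of Schoener's D and the Hellinger-based I (Warren et al., 2008, as corrected in the erratum), applied to suitability values normalised to sum to 1 across cells. It does not reproduce ENMTools: the function names, the toy suitability values and the one-sided p-value convention are assumptions made for the example.

```python
import math


def normalise(suitability):
    """Rescale raw suitability scores so they sum to 1 across cells."""
    total = sum(suitability)
    return [s / total for s in suitability]


def schoener_d(px, py):
    """Schoener's D = 1 - 0.5 * sum(|px_i - py_i|)."""
    return 1 - 0.5 * sum(abs(a - b) for a, b in zip(px, py))


def warren_i(px, py):
    """I = 1 - 0.5 * sum((sqrt(px_i) - sqrt(py_i))^2)."""
    h2 = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(px, py))
    return 1 - 0.5 * h2


def one_sided_p(observed, null_values):
    """Share of null similarity values that are <= the observed value
    (assumed convention: low observed similarity rejects equivalence)."""
    return sum(v <= observed for v in null_values) / len(null_values)


# Hypothetical suitability values for four grid cells.
sci = normalise([0.9, 0.7, 0.2, 0.1])
cit = normalise([0.8, 0.6, 0.4, 0.2])

d_obs = schoener_d(sci, cit)
i_obs = warren_i(sci, cit)
```

In an identity test, the null values come from models refitted on random partitions of the pooled occurrence records; the observed D or I is then compared with that distribution, for example with `one_sided_p(d_obs, null_d)` where `null_d` is a hypothetical list of 1000 null D values.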

We used the equivalence test to classify each species' models as either equivalent or non-equivalent. To identify the most relevant variables for distinguishing equivalent from non-equivalent models, we used the species characteristics listed above (see Species evaluated) as predictors in a guided regularized random forest analysis (GRRF, Breiman, 2001), using the ‘RRF’ (Deng, 2013) and ‘randomForest’ packages (Liaw & Wiener, 2002). We used the Mean Decrease Accuracy (MDA) value to rank variable importance and assessed model accuracy with the confusionMatrix function of the ‘caret’ package (Kuhn, 2021).

To test for spatial differences between the models, we used the scientific model (SCI) as a baseline and characterized every citizen science modelled distribution (CITuncorr and CITcorr) by its range and position (latitude and longitude of the range centroid). We calculated: (1) the range increase for each species, that is, the percentage of modelled area that did not intersect the SCI model; and (2) the range shift, that is, the distance and direction between the centroids of the models, using the ‘sp’ (Bivand et al., 2013) and ‘rearrr’ (Olsen, 2023) packages. Our objective was to present a spatial view of the predicted change, testing the hypothesis of a shift towards more populated areas in southeastern Brazil (IBGE, 2022).
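The range increase and range shift measures can be illustrated with a short Python sketch. It is not the ‘sp’/‘rearrr’ workflow: the function names, the representation of a modelled range as a list of (lat, lon) cell centres, the unweighted centroid, and the choice to express range increase relative to the number of SCI cells are all assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0


def centroid(cells):
    """Unweighted mean latitude and longitude of the predicted cells."""
    lats = [lat for lat, _ in cells]
    lons = [lon for _, lon in cells]
    return (sum(lats) / len(lats), sum(lons) / len(lons))


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def bearing_deg(p, q):
    """Initial bearing from p to q, in degrees clockwise from north."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlon = lon2 - lon1
    x = math.sin(dlon) * math.cos(lat2)
    y = (math.cos(lat1) * math.sin(lat2)
         - math.sin(lat1) * math.cos(lat2) * math.cos(dlon))
    return math.degrees(math.atan2(x, y)) % 360


def range_increase_pct(sci_cells, cit_cells):
    """Cells predicted by the CIT model but not by the SCI model,
    as a percentage of the number of SCI cells."""
    outside = len(set(cit_cells) - set(sci_cells))
    return 100.0 * outside / len(set(sci_cells))


# Hypothetical presence cells (lat, lon) for one species.
sci_cells = [(-15.0, -48.0), (-15.0, -47.0), (-16.0, -48.0), (-16.0, -47.0)]
cit_cells = [(-15.0, -47.0), (-16.0, -47.0), (-15.0, -46.0), (-16.0, -46.0)]

increase = range_increase_pct(sci_cells, cit_cells)
shift_km = haversine_km(centroid(sci_cells), centroid(cit_cells))
direction = bearing_deg(centroid(sci_cells), centroid(cit_cells))
```

In this toy case the CIT range is displaced one degree of longitude to the east of the SCI range, so the direction is close to 90° from north.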

RESULTS

Comparison between scientific and citizen science distribution models

All models performed adequately, with average TSS values greater than 0.5 (Figure 1, Table S3, Figure S1). Mean similarity values (D and I) obtained by comparing SCI and CIT models were similar between the reduced (CITcorr) and full (CITuncorr) citizen science models (SCI × CITcorr, D = 0.655, I = 0.910; SCI × CITuncorr, D = 0.659, I = 0.910; Figure 2, Figure S1).

**FIGURE 1** Results from the species distribution models generated in our study. True Skill Statistic (TSS) values and standard deviations of the three generated models. *SCI* = Model generated with scientific data only; *CITcorr* = Model generated with data collected by citizen scientists that have been rarefied to equal the number of records of scientific data; and *CITuncorr* = Model generated with all data collected by citizen scientists. An asterisk marks species with p-values <0.05 in our hypothesis test (representing non-equivalence). Colourblind-friendly colour combinations generated with the *colorBlindness* package (Ou, 2021).

Comparison of the models revealed that most of the species in our investigation had equivalent distribution models. We rejected the hypothesis of equivalence between CIT and SCI distribution models for just over one-third (38%) of the evaluated species (p < 0.05); for the remaining 62%, the models generated were equivalent irrespective of the data set used. This result was consistent regardless of the number of citizen science records used (CITcorr or CITuncorr). According to Random Forest results, the extent of occurrence, body size and habitat use were the most important variables distinguishing species with equivalent models from species with non-equivalent models (Figure 3).

The models indicated as equivalent showed a lower average percentage difference in their spatial distribution (Equivalent SCI × CITuncorr: 20%, Equivalent SCI × CITcorr: 21%) than the non-equivalent models (Non-equivalent SCI × CITuncorr: 32%, Non-equivalent SCI × CITcorr: 28%). Also, the distances between centroids of the models that were equivalent were on average smaller (Equivalent SCI × CITuncorr: 99 km, Equivalent SCI × CITcorr: 105 km) than the distances between non-equivalent models (Non-equivalent SCI × CITuncorr: 214 km, Non-equivalent SCI × CITcorr: 233 km), demonstrating that the equivalent models are spatially closer (Table 1). The direction of change in the models showed no pattern and, contrary to our prediction, no trend towards more populated regions (Figure 4).

TABLE 1. Estimated range size from scientific data (SCI), percentage of range increase relative to SCI, and direction of range shift from SCI under CITcorr and CITuncorr estimated ranges for 42 bird species.

Species	Area SCI (km²)	CITcorr: Range increase (%)	CITcorr: Range shift (km)	CITcorr: Direction of range change (° to north)	CITuncorr: Range increase (%)	CITuncorr: Range shift (km)	CITuncorr: Direction of range change (° to north)
Nothura minor	1 043 182	9	114	31	30	149	25
Taoniscus nanus	1 117 099	31	130	210	22	120	288
Uropelia campestris	3 871 768	15	135	238	15	130	246
Augastes scutatus	144 334	58	282	260	85	153	161*
Heliactin bilophus	4 130 344	7	126	263	7	120	269
Campylopterus calcirupicola	115 352	23	103	7	29	75	11
Campylopterus diamantinensis	23 564	20	80	164	20	98	171
Celeus obrieni	947 739	32	22	115	30	20	98
Alipiopsitta xanthops	3 553 351	4	82	256	6	83	267
Pyrrhura pfrimeri	16 027	18	49	271	23	50	290
Herpsilochmus longirostris	2 844 509	26	186	245	28	183	243*
Thamnophilus torquatus	3 295 494	12	43	25	16	76	23*
Cercomacra ferdinandi	350 698	45	72	272	23	60	227
Melanopareia torquata	3 709 474	29	192	289	29	224	281*
Scytalopus novacapitalis	44 646	46	449	155	36	132	177*
Geositta poeciloptera	1 477 521	55	280	160	56	430	149*
Syndactyla dimidiata	1 308 886	24	82	89	30	80	84
Clibanornis rectirostris	1 842 529	27	26	330	26	83	59
Asthenes luizae	39 441	25	55	274	17	50	253
Synallaxis simoni	3 328 488	19	105	266	11	100	225
Antilophia galeata	2 633 265	5	101	254	5	70	258
Phylloscartes roquettei	228 768	26	112	135	25	100	120
Euscarthmus rufomarginatus	4 921 760	13	158	257	7	150	248
Phyllomyias reiseri	1 618 435	27	230	293	24	233	289
Culicivora caudacuta	3 751 684	8	230	233	12	165	246*
Polystictus superciliaris	532 913	25	99	274	55	150	196*
Guyramemua affine	3 768 105	22	482	200	21	457	194*
Alectrurus tricolor	2 505 625	11	267	271	12	223	278*
Knipolegus franciscanus	398 409	27	112	339	30	110	266
Cyanocorax cristatellus	3 336 090	2	119	241	4	91	273
Myiothlypis leucophrys	1 381 891	35	131	300	28	86	213
Charitospiza eucosma	3 374 720	7	284	291	8	308	287*
Coryphaspiza melanotis	4 253 439	18	310	335	17	314	331*
Embernagra longicauda	330 734	37	32	214	15	26	223
Porphyrospiza caerulescens	3 835 194	5	168	232	8	163	235
Saltatricula atricollis	4 250 500	34	103	281	37	126	293*
Conothraupis mesoleuca	162 115	43	177	232	43	112	204*
Cypsnagra hirundinacea	3 688 177	30	254	358	27	223	1*
Microspingus cinereus	1 155 530	10	153	60	45	150	53
Neothraupis fasciata	3 776 245	24	96	274	24	150	262*
Schistochlamys ruficapillus	3 259 912	17	123	105	7	120	229
Paroaria baeri	654 666	50	124	349	33	50	149

* Represents our hypothesis test, with p-values <0.05 (representing non-equivalence).

DISCUSSION

Our results show that data collected by citizen scientists can help fill the Wallacean deficit gap. In our case, data from citizen scientists represented more than 88% of all records found for the studied species (Figure S1). Almost two thirds of the evaluated species presented model equivalence when using SCI and CIT data sets, and the models generated with the different data sets showed a high geographical overlap, as expected, especially in the models that were equivalent. Furthermore, we did not observe bias of the CIT models related to data collection in more populated regions. Our findings demonstrate the high value of CIT data, which cannot be overlooked in light of the current biodiversity crisis and the need to implement effective conservation planning (Butchart et al., 2010; Isbell et al., 2023; Jaureguiberry et al., 2022).

The selection of reliable data sets is challenging and can influence the quality of the distribution model generated (Duputié et al., 2014; Guisan et al., 2017). Thus, caution has been required in the use of citizen science data due to possible identification biases and inaccuracies in occurrence data (Anderson, 2012). Nevertheless, we did not detect a pattern relating species ecology to whether SCI and CIT models were equivalent (Figure 3b). A few factors might contribute to that, including specialized amateur birdwatchers' groups that can make remarkable records even of species that are rare and difficult to detect. In addition, we did not observe bias due to possible collection in more populated regions (Figure 4). Further investigation of the determinants of equivalence might reveal which species cannot be satisfactorily recorded by citizen scientists and thus guide species prioritization. Even so, we argue that data from citizen scientists are an important ally and can be used in distribution modelling. This argument is supported by the equivalence tests of the species distribution models, the high geographic overlap of models, and the closeness of centroids between models. Thus, data from citizen scientists are not only useful to model the distribution of most species but can also contribute to other relevant topics (La Sorte et al., 2018; La Sorte & Somveille, 2020; Zulian et al., 2021) such as biological invasion (Encarnação et al., 2021), phenology and diversity gradients (Soroye et al., 2018; Suzuki-Ohno et al., 2017), conservation (MacPhail & Colla, 2020) and the evaluation of the impacts of global changes on present and future distribution patterns.

Well-planned citizen science has enormous potential for collecting much-needed data more quickly to improve species distribution models, and can reach a level of quality that matches the data collected by experts (Hoyer et al., 2012; van der Velde et al., 2017). Our results, however, show that high quality (demonstrated by the equivalence of CIT models) can be achieved without elaborate planning or scientific supervision. Our hypothesis of data bias due to concentrated collection in more populated regions was not corroborated, which supports our argument for the usefulness of data collected by citizen scientists for species distribution modelling. In fact, one could expect high reliability whenever the only requirements are correct identification and location, which is the case for correlative species distribution models (e.g. Hedblom et al., 2014). In addition, some citizen science projects can persist for long periods, as they do not depend on scarce funding, which limits sampling time (Poisson et al., 2020; Steven et al., 2019; Theobald et al., 2015). At present, it is essential to recognize that most bird species are under some level of human-mediated threat and that scientific effort alone will not be able to gather species data at a desirable speed.

The mismatch between biodiversity information demand and production is even more acute in tropical regions, which harbour a significant number of bird species and lack proper historical efforts to register and share species occurrence data. Moreover, the studied species occur within the Cerrado biome, the world's most biodiverse savanna. In 35 years, more than half of the 2 million km² biome has been converted to agriculture (Klink & Machado, 2005). These figures reinforce the Cerrado as one of the most threatened biogeographic provinces in the world, urging the use of all available information to support conservation decision-making, including citizen science data. Thus, we emphasize that, regardless of the intrinsic characteristics of the evaluated species (size, area, habitat use, threat level, etc.), the non-use of citizen science data for SDMs is unjustifiably cautious.

Although the two methods of data collection are able to generate statistically equivalent and highly overlapping distribution models, we strongly disagree that scientific data are directly replaceable. It is essential to recognize that scientific and citizen science data have strengths and limitations, and we can derive most benefit by exploring their complementarities (La Sorte et al., 2018; La Sorte & Somveille, 2020; Zulian et al., 2021). Scientific data, such as biological collection vouchers, harbour unique historical information that allows historical changes to be mapped (e.g. Marini et al., 2020; Navarro et al., 2021). In addition, scientific efforts can reach isolated areas within reserves or less populated areas (least visited by citizens), directing data production to redress information gaps. Nevertheless, we point out that part of the species distribution data present in museums worldwide is not available for public consultation, hindering spatial analyses from supporting conservation (Marini et al., 2020; Peterson et al., 2005). Indeed, there is a debate regarding maintenance cost and the resources directed to curators and museums (for further details, see Graves, 2000; Peterson et al., 2005). In contrast, citizen science data may lack standardization and might fail to present detailed information about each specimen, such as geographical coordinates, measurements and sex. Moreover, this can be aggravated for species that need to be closely assessed to determine the identification. Finally, although we are demonstrating a high equivalence of citizen-collected data for generating species distribution models, standardized scientific collection is important and should be preferred.

In summary, we have shown the equivalence of SDMs generated with data collected by scientists and citizen scientists for most of the species analysed. Although we approached the question using only one set of birds present in the Cerrado region, our results indicate the potential of citizen science data for modelling the distribution of species, mainly due to the large volume of data collected, which is impracticable for scientists to collect alone. Understanding the distribution patterns of species is important and will be key to mitigating current population declines of wildlife. Conservation measures will be favoured by the union of professional and amateur data, aiming for a better understanding of species distribution and, consequently, biodiversity conservation.

AUTHOR CONTRIBUTIONS

Eduardo Guimarães Santos: Conceptualization (equal); data curation (equal); formal analysis (equal); investigation (equal); methodology (equal); validation (equal); visualization (equal); writing – original draft (equal); writing – review and editing (equal). Helga Correa Wiederhecker: Conceptualization (equal); investigation (equal); methodology (equal); validation (equal); writing – review and editing (equal). Leonardo Esteves Lopes: Data curation (equal); validation (equal); writing – review and editing (equal). Miguel Ângelo Marini: Conceptualization (equal); investigation (equal); methodology (equal); supervision (equal); writing – review and editing (equal).

ACKNOWLEDGEMENTS

We thank the Brazilian research agency ‘Conselho Nacional de Desenvolvimento Científico e Tecnológico’ (CNPq) and the Brazilian education agency ‘Coordenação de Aperfeiçoamento de Pessoal de Nível Superior’ (CAPES – Finance Code 001) for fellowships.

CONFLICT OF INTEREST STATEMENT

The authors declare no conflicts of interest.

Open Research

DATA AVAILABILITY STATEMENT

Data available in article Appendix S1.


REFERENCES

Aceves-Bueno, E., Adeleye, A.S., Feraud, M., Huang, Y., Tao, M., Yang, Y. et al. (2017) The accuracy of citizen science data: a quantitative review. The Bulletin of the Ecological Society of America, 98, 278–290.
10.1002/bes2.1336
Allouche, O., Tsoar, A. & Kadmon, R. (2006) Assessing the accuracy of species distribution models: prevalence, kappa and the true skill statistic (TSS). Journal of Applied Ecology, 43(6), 1223–1232.
10.1111/j.1365-2664.2006.01214.x
Amatulli, G., Domisch, S., Tuanmu, M.N., Parmentier, B., Ranipeta, A., Malczyk, J. et al. (2018) A suite of global, cross-scale topographic variables for environmental and biodiversity modeling. Scientific Data, 5, 1–15.
10.1038/sdata.2018.40
Anderson, R.P. (2012) Harnessing the world's biodiversity data: promise and peril in ecological niche modeling of species distributions. Annals of the New York Academy of Sciences, 1260(1), 66–80.
10.1111/j.1749-6632.2011.06440.x
Andrade, A.F.A., Velazco, S.J.E. & de Marco, P.J. (2020) ENMTML: an R package for a straightforward construction of complex ecological niche models. Environmental Modelling & Software, 125, 104615.
10.1016/j.envsoft.2019.104615
Auerbach, J., Barthelmess, E.L., Cavalier, D., Cooper, C.B., Fenyk, H., Haklay, M. et al. (2019) The problem with delineating narrow criteria for citizen science. Proceedings of the National Academy of Sciences, 116(31), 15336–15337.
10.1073/pnas.1909278116
Bivand, R.S., Pebesma, E. & Gomez-Rubio, V. (2013) Applied spatial data analysis with R, 2nd edition. New York: Springer. Available from: https://asdar-book.org/ [Accessed 10th March 2022].
10.1007/978-1-4614-7618-4
Breiman, L. (2001) Random forests. Machine Learning, 45, 5–32.
10.1023/A:1010933404324
Butchart, S.H.M., Walpole, M., Collen, B., van Strien, A., Scharlemann, J.P.W., Almond, R.E.A. et al. (2010) Global biodiversity: indicators of recent declines. Science, 328, 1164–1168.
10.1126/science.1187512
Chandler, M., See, L., Copas, K., Bonde, A.M.Z., López, B.C., Danielsen, F. et al. (2017) Contribution of citizen science towards international biodiversity monitoring. Biological Conservation, 213, 280–294.
10.1016/j.biocon.2016.09.004
Cohn, J.P. (2008) Citizen science: can volunteers do real research? Bioscience, 58(3), 192–197.
10.1641/B580303
De Marco, P. & Nóbrega, C.C. (2018) Evaluating collinearity effects on species distribution models: an approach based on virtual species simulation. PLoS One, 13(9), 1–25.
10.1371/journal.pone.0202403
Deng, H. (2013) Guided random forest in the RRF package. arXiv:1306.0237.
Duputié, A., Zimmermann, N.E. & Chuine, I. (2014) Where are the wild things? Why we need better data on species distribution. Global Ecology and Biogeography, 23(4), 457–467.
10.1111/geb.12118
Encarnação, J., Teodósio, M.A. & Morais, P. (2021) Citizen science and biological invasions: a review. Frontiers in Environmental Science, 8, 1–13.
10.3389/fenvs.2020.602980
Engler, R., Guisan, A. & Rechsteiner, L. (2004) An improved approach for predicting the distribution of rare and endangered species from occurrence and pseudo-absence data. Journal of Applied Ecology, 41(2), 263–274.
10.1111/j.0021-8901.2004.00881.x
Feeley, K.J. & Silman, M.R. (2011) Keep collecting: accurate species distribution modelling requires more collections than previously thought. Diversity and Distributions, 17, 1132–1140.
10.1111/j.1472-4642.2011.00813.x
Ferreira, D.F., de Aquino, M.M., Heming, N.M., Marini, M.Â., Leite, F.S.F. & Lopes, L.E. (2019) Breeding in the gray-headed tody-flycatcher (Aves: Tyrannidae) with comments on geographical variation in reproductive traits within the genus Todirostrum. Journal of Natural History, 53(9), 595–610.
10.1080/00222933.2019.1599458
Fielding, A.H. & Bell, J.F. (1997) A review of methods for the assessment of prediction errors in conservation presence/absence models. Environmental Conservation, 24(1), 38–49.
10.1017/S0376892997000088
Follett, R. & Strezov, V. (2015) An analysis of citizen science based research: usage and publication patterns. PLoS One, 23, 1–14.
Geldmann, J., Heilmann-Clausen, J., Holm, T.E., Levinsky, I., Markussen, B., Olsen, K. et al. (2016) What determines spatial bias in citizen science? Exploring four recording schemes with different proficiency requirements. Diversity and Distributions, 22(11), 1139–1149.
10.1111/ddi.12477
Graves, G.R. (2000) Costs and benefits of web access to museum data. Trends in Ecology & Evolution, 15(9), 374–375.
Guisan, A., Thuiller, W. & Zimmermann, N.E. (2017) Habitat suitability and distribution models: with applications in R. Cambridge: Cambridge University Press, p. 462.
Gwynne, J.A., Ridley, R.S., Tudor, G. & Argel, M. (2010) Aves do Brasil: Pantanal & Cerrado, 1a edition. São Paulo: Horizonte, pp. 1–322.
Haklay, M.M., Dörler, D., Heigl, F., Manzoni, M., Hecker, S. & Vohland, K. (2021) What is citizen science? The challenges of definition. In: K. Vohland, A. Land-Zandstra, L. Ceccaroni, R. Lemmens, J. Perelló, M. Ponti et al. (Eds.) The science of citizen science. Cham: Springer International Publishing, pp. 13–33.
10.1007/978-3-030-58278-4_2
Hannibal, M.E. (2016) Citizen scientist: searching for heroes and hope in an age of extinction. New York: The Experiment LLC, p. 423.
Hedblom, M., Heyman, E., Antonsson, H. & Gunnarsson, B. (2014) Bird song diversity influences young people's appreciation of urban landscapes. Urban Forestry & Urban Greening, 13(3), 469–474.
10.1016/j.ufug.2014.04.002
Heigl, F., Kieslinger, B., Paul, K.T., Uhlik, J. & Dörler, D. (2019) Toward an international definition of citizen science. Proceedings of the National Academy of Sciences, 116(17), 8089–8092.
10.1073/pnas.1903393116
Hijmans, R.J., Cameron, S.E., Parra, J.L., Jones, G. & Jarvis, A. (2005) Very high resolution interpolated climate surfaces for global land areas. International Journal of Climatology, 25, 1965–1978.
10.1002/joc.1276
Holland, P.G. & Steyn, D.G. (1975) Vegetational responses to latitudinal variations in slope angle and aspect. Journal of Biogeography, 2(3), 179.
10.2307/3037989
Hoyer, M.V., Wellendorf, N., Frydenborg, R., Bartlett, D. & Canfield, D.E. (2012) A comparison between professionally (Florida Department of Environmental Protection) and volunteer (Florida LAKEWATCH) collected trophic state chemistry data in Florida. Lake and Reservoir Management, 28(4), 277–281.
10.1080/07438141.2012.736016
IBGE. (2022) Instituto Brasileiro de Geografia e Estatística – IBGE. Available from: https://www.ibge.gov.br/ [Accessed 1st January 2023].
Isbell, F., Balvanera, P., Mori, A.S., He, J.S., Bullock, J.M., Regmi, G.R. et al. (2023) Expert perspectives on global biodiversity loss and its drivers and impacts on people. Frontiers in Ecology and the Environment, 21, 94–103.
10.1002/fee.2536
Jaureguiberry, P., Titeux, N., Wiemers, M., Bowler, D.E., Coscieme, L., Golden, A.S. et al. (2022) The direct drivers of recent global anthropogenic biodiversity loss. Science Advances, 8(45), eabm9982.
10.1126/sciadv.abm9982
Kerstes, N.A.G., Breeschoten, T., Kalkman, V.J. & Schilthuizen, M. (2019) Snail shell colour evolution in urban heat islands detected via citizen science. Communications Biology, 2(1), 1–11.
10.1038/s42003-019-0511-6
Klink, C.A. & Machado, R.B. (2005) Conservation of the Brazilian Cerrado. Conservation Biology, 19(3), 707–713.
10.1111/j.1523-1739.2005.00702.x
Kosmala, M., Wiggins, A., Swanson, A. & Simmons, B. (2016) Assessing data quality in citizen science. Frontiers in Ecology and the Environment, 14, 551–560.
10.1002/fee.1436
Kuhn, M. (2021) Caret: classification and regression training. R package. Available from: https://CRAN.R-project.org/package=caret [Accessed 10th June 2022].
La Sorte, F.A., Lepczyk, C.A., Burnett, J.L., Hurlbert, A.H., Tingley, M.W. & Zuckerberg, B. (2018) Opportunities and challenges for big data ornithology. The Condor, 120, 414–426.
10.1650/CONDOR-17-206.1
La Sorte, F.A. & Somveille, M. (2020) Survey completeness of a global citizen-science database of bird occurrence. Ecography, 43, 34–43.
10.1111/ecog.04632
Lepczyk, C.A., La Sorte, F.A., Aronson, M.F.J., Goddard, M.A., MacGregor-Fors, I., Nilon, C.H. et al. (2017) Global patterns and drivers of urban bird diversity. Ecology and Conservation of Birds in Urban Environments, 13–33.
10.1007/978-3-319-43314-1_2
Liaw, A. & Wiener, M. (2002) Classification and regression by randomForest. R News, 2(3), 18–22.
Lomolino, M.V. (2004) Conservation biogeography. In: M.V. Lomolino & L.R. Heaney (Eds.) Frontiers of biogeography: new directions in the geography of nature. Sunderland, MA: Sinauer Associates, pp. 293–296.
Lopes, L.E. (2008) The range of the Curl-crested Jay: lessons for evaluating bird endemism in the South American Cerrado. Diversity and Distributions, 14, 561–568.
10.1111/j.1472-4642.2007.00441.x
MacPhail, V.J. & Colla, S.R. (2020) Power of the people: a review of citizen science programs for conservation. Biological Conservation, 249, 108739.
10.1016/j.biocon.2020.108739
Marini, M.Â., Barbet-Massin, M., Lopes, L.E. & Jiguet, F. (2009) Predicted climate-driven bird distribution changes and forecasted conservation conflicts in a neotropical savanna. Conservation Biology, 23, 1558–1567.
10.1111/j.1523-1739.2009.01258.x
Marini, M.Â., Hall, L., Bates, J., Steinheimer, F.D., McGowan, R., Silveira, L.F. et al. (2020) The five million bird eggs in the world's museum collections are an invaluable and underused resource. The Auk: Ornithological Advances, 137(4), 1–7.
Melo-Merino, S.M., Reyes-Bonilla, H. & Lira-Noriega, A. (2020) Ecological niche models and species distribution models in marine environments: a literature review and spatial analysis of evidence. Ecological Modelling, 415, 108837.
10.1016/j.ecolmodel.2019.108837
Myers, N., Mittermeier, R.A., Mittermeier, C.G., Fonseca, G.A.B. & Kent, J. (2000) Biodiversity hotspots for conservation priorities. Nature, 403(6772), 853–858.
10.1038/35002501
Navarro, A.B., Magioli, M., Bogoni, J.A., Silveira, L.F., Moreira, M.Z., Alexandrino, E.R. et al. (2021) Isotopic niches of tropical birds reduced by centenary anthropogenic impacts. Oikos, 130(11), 1–13.
10.1111/oik.08386
Olsen, L.R. (2023) Rearrr: rearranging data. R package version 0.3.3. Available from: https://CRAN.R-project.org/package=rearrr [Accessed 2nd July 2022].
Ou, J. (2021) colorBlindness: safe color set for color blindness. R package version 0.1.9. Available from: https://CRAN.R-project.org/package=colorBlindness [Accessed 3rd March 2023].
Peterson, A.T., Cicero, C. & Wieczorek, J. (2005) Free and open access to bird specimen data: why? Auk, 122(3), 987–990.
10.1093/auk/122.3.987
Peterson, A.T., Papeş, M. & Soberón, J. (2015) Mechanistic and correlative models of ecological niches. European Journal of Ecology, 1(2), 28–38.
10.1515/eje-2015-0014
Peterson, A.T., Soberón, J., Pearson, R.G., Anderson, R.P., Martínez-Meyer, E., Nakamura, M. et al. (2011) Ecological niches and geographic distributions. New Jersey: Princeton University Press, p. 316.
Poisson, A.C., McCullough, I.M., Cheruvelil, K.S., Elliott, K.C., Latimore, J.Á. & Soranno, P.A. (2020) Quantifying the contribution of citizen science to broad-scale ecological databases. Frontiers in Ecology and the Environment, 18(1), 19–26.
10.1002/fee.2128
Proença, V., Martin, L.J., Pereira, H.M., Fernandez, M., McRae, L., Belnap, J. et al. (2017) Global biodiversity monitoring: from data sources to essential biodiversity variables. Biological Conservation, 213, 256–263.
10.1016/j.biocon.2016.07.014
R Core Team. (2020) R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing.
Ridgely, R.S. & Tudor, G. (2009) Field guide to the songbirds of South America: the passerines. Austin: University of Texas Press.
Schrimpf, M.B., Des Brisay, P.G., Johnston, A., Smith, A.C., Sánchez-Jasso, J., Robinson, B.G. et al. (2021) Reduced human activity during COVID-19 alters avian land use across North America. Science Advances, 7(39), 1–12.
10.1126/sciadv.abf5073
Schubert, S.C., Manica, L.T. & Guaraldo, A.C. (2019) Revealing the potential of a huge citizen-science platform to study bird migration. Emu, 119(4), 364–373.
10.1080/01584197.2019.1609340
Soroye, P., Ahmed, N. & Kerr, J.T. (2018) Opportunistic citizen science data transform understanding of species distributions, phenology, and diversity gradients for global change research. Global Change Biology, 24(11), 5281–5291.
10.1111/gcb.14358
Sousa, N.O.M., Lopes, L.E., Costa, L.M., Motta-Junior, J.C., Silva, G.H.F., Dornas, T. et al. (2021) Adopting habitat-use to infer movement potential and sensitivity to human disturbance of birds in a Neotropical Savannah. Biological Conservation, 254, 108921.
10.1016/j.biocon.2020.108921
Steven, R., Barnes, M., Garnett, S.T., Garrard, G., Connor, J.O., Oliver, J.L. et al. (2019) Aligning citizen science with best practice: threatened species conservation in Australia. Conservation Science and Practice, 1(10), e100.
10.1111/csp2.100
Suzuki-Ohno, Y., Yokoyama, J., Nakashizuka, T. & Kawata, M. (2017) Utilization of photographs taken by citizens for estimating bumblebee distributions. Scientific Reports, 7, 11215.
10.1038/s41598-017-10581-x
Theobald, E.J., Ettinger, A.K., Burgess, H.K., DeBey, L.B., Schmidt, N.R., Froehlich, H.E. et al. (2015) Global change and local solutions: tapping the unrealized potential of citizen science for biodiversity research. Biological Conservation, 181, 236–244.
10.1016/j.biocon.2014.10.021
Turella, I.Z., Silva, T.L., Rumpel, L. & Marini, M.Â. (2022) Breeding biology of swallow-tailed hummingbird (Eupetomena macroura) based on citizen science data. Ornithology Research, 30, 181–189.
10.1007/s43388-022-00098-x
Van der Velde, T., Milton, D.A., Lawson, T.J., Wilcox, C., Lansdell, M., Davis, G. et al. (2017) Comparison of marine debris data collected by researchers and citizen scientists: is citizen science data worth the effort? Biological Conservation, 208, 127–138.
10.1016/j.biocon.2016.05.025
Veloz, S.D. (2009) Spatially autocorrelated sampling falsely inflates measures of accuracy for presence-only niche models. Journal of Biogeography, 36(12), 2290–2299.
10.1111/j.1365-2699.2009.02174.x
Warren, D.L. & Dinnage, R. (2021) ENMTools: analysis of niche evolution using niche and distribution models. Version 1.0.5.
Warren, D.L., Glor, R.E. & Turelli, M. (2008) Environmental niche equivalency versus conservatism: quantitative approaches to niche evolution. Evolution, 62(11), 2868–2883.
10.1111/j.1558-5646.2008.00482.x
Warren, D.L., Matzke, N.J., Cardillo, M., Baumgartner, J.B., Beaumont, L.J., Turelli, M. et al. (2021) ENMTools 1.0: an R package for comparative ecological biogeography. Ecography, 44(4), 504–511.
10.1111/ecog.05485
Whittaker, R.J., Araújo, M.B., Jepson, P., Ladle, R.J., Watson, J.E.M. & Willis, K.J. (2005) Conservation biogeography: assessment and prospect. Diversity and Distributions, 11(1), 3–23.
10.1111/j.1366-9516.2005.00143.x
Zulian, V., Miller, D.A.W. & Ferraz, G. (2021) Integrating citizen-science and planned-survey data improves species distribution estimates. Diversity and Distributions, 27, 2498–2509.
10.1111/ddi.13416

Volume 48, Issue 8

Special Issue: Australian Freshwater Turtle

December 2023

Pages 2171-2184
