Equivalence of citizen science and scientific data for modelling species distribution of birds from a tropical savanna
Abstract
The Wallacean deficit continues to be a challenge for species distribution modelling. Although some authors have suggested that data collected by citizen scientists can be relevant for a better understanding of biodiversity, to our knowledge no study has quantitatively tested the equivalence between scientific and citizen science data. Here, we investigate the hypothesis that data collected by citizen scientists can be equivalent to data collected by professional scientists when generating species distribution models. For 42 bird species in the Cerrado region, we generated and compared species distribution models based on three data sets: (1) scientific data, (2) citizen science data and (3) sample-size-corrected citizen science data. To test our hypothesis, we assessed the equivalence of the models generated from these data sets. We rejected the hypothesis of equivalence for about one-third (38%) of the evaluated species, revealing that, for most of the species considered, the models generated were equivalent irrespective of the data set used. The distances between the centroids of equivalent models were, on average, smaller than those between non-equivalent models. Also, the direction of change in the models showed no pattern, with no trend towards more populated regions. Our results show that data collected by citizen scientists can be an ally in filling the Wallacean deficit. In fact, not using this wide range of data collected by citizen scientists seems to be an unjustified caution. We indicate the potential of using citizen science data for modelling the distribution of species, mainly because of the large volume of data collected, which would be impracticable for scientists to collect alone. Conservation measures will be favoured by combining professional and amateur data, aiming for a better understanding of species distribution and, consequently, biodiversity conservation.
INTRODUCTION
One of the most dramatic aspects of the biodiversity crisis is the mismatch between the pace of biodiversity loss and the speed at which species information is produced: biodiversity is being lost much faster than information on species is gathered. Over the last 20 years, species distribution modelling has improved considerably and now plays an important role in understanding species' spatial distribution patterns. In fact, understanding how species distributions change through time can help researchers and managers intervene in the event of a distribution shift or a decline in biodiversity, with direct implications for species conservation (Melo-Merino et al., 2020; Peterson et al., 2011, 2015). Despite this improvement, the Wallacean deficit, that is, the lack of knowledge about where species occur geographically (Lomolino, 2004; Proença et al., 2017; Whittaker et al., 2005), continues to be a challenge for species distribution modelling, which in turn limits knowledge of species' geographic ranges. This matters because the greater the number of records, the greater the probability of generating adequate predictive models (Feeley & Silman, 2011). Thus, approaches that accelerate the accumulation of biodiversity information, such as citizen science, can be an ally of conservationists.
Citizen science is a way of engaging volunteer citizens (mostly non-specialists) in scientific production, including recording data of potential scientific use, such as species occurrences (Auerbach et al., 2019; Heigl et al., 2019). Citizen science began around 1900 (Cohn, 2008) and has since expanded worldwide. It is regarded as a promising area because it can increase our knowledge of biodiversity (Follett & Strezov, 2015; Hannibal, 2016; Kerstes et al., 2019; MacPhail & Colla, 2020). Although it is not the only objective of citizen science (see Cohn, 2008; Haklay et al., 2021), involving volunteers in data collection often increases the volume of data produced. However, one of the limitations that has concerned scientists using citizen science data is the spatial bias associated with human population centres, as most citizen science data are collected in the vicinity of urban centres (Geldmann et al., 2016). Thus, it is important to understand the strengths and limitations of the large data sets being collected around the world.
Citizen science has become an ally for scientists interested in understanding the natural world, and citizen-collected data have been used to supplement scientific data in many studies. Birds have received most of the attention in citizen science. For example, bird data from citizen science have improved understanding in several fields, such as community structure (La Sorte et al., 2018; La Sorte & Somveille, 2020), breeding (Ferreira et al., 2019; Turella et al., 2022), migration (Schubert et al., 2019), conservation (Steven et al., 2019), patterns of occurrence and abundance (Lepczyk et al., 2017) and, more recently, the impact of COVID-19 (Schrimpf et al., 2021). However, although some authors have highlighted that data collected by citizen scientists can be relevant for a better understanding of biodiversity (Chandler et al., 2017; La Sorte & Somveille, 2020; Poisson et al., 2020; Steven et al., 2019), including demonstrating their importance (e.g. Aceves-Bueno et al., 2017; Kosmala et al., 2016), to our knowledge no study has quantitatively tested the equivalence of data sets collected by citizen scientists and by scientists.
Here, we used data on 42 Neotropical bird species to investigate the hypothesis that citizen scientists' (CIT) data are equivalent to professional scientists' (SCI) data when generating species geographic distribution models. To assess the equivalence of these data sets, we tested two hypotheses. First, we compared the equivalence of models using an identity test, controlling for the discrepancy in the number of records (H1). Second, to examine the spatial distribution of SCI and CIT models, we compared their overlap and the centroid shift of each generated model (H2). We predicted that models generated only with CIT data would be geographically biased towards locations with higher human population density, owing to biases in data collection. We then identified the cases in which CIT data can help fill gaps in knowledge of species occurrence. Finally, we investigated the influence of biological and ecological characteristics on SCI and CIT data equivalence and discussed the implications of our results.
METHODS
Species evaluated
We used 42 species that occur in the Cerrado, a savanna-like biome and biodiversity hotspot that has been intensely threatened by landscape change, due mainly to pasture and agricultural expansion (Klink & Machado, 2005; Myers et al., 2000). The 42 species differ in extent of occurrence (min = 17 800 km², max = 7 160 000 km², www.datazone.birdlife.org); body size (min = 9.5 cm, max = 34.5 cm; Gwynne et al., 2010; Ridgely & Tudor, 2009); conservation status (Least Concern, Near Threatened, Vulnerable, Endangered; www.iucnredlist.org); population trend (decreasing, increasing or stable; www.datazone.birdlife.org); habitat use (grasslands – GR, savannas – SV and forests – FO); and sensitivity to disturbance (low, medium and high) (Sousa et al., 2021). The 42 species chosen for our analysis have been extensively studied by scientists in our region over the last 20 years (e.g. Lopes, 2008; Marini et al., 2009; Sousa et al., 2021), which ensured the availability of reliable scientific data compiled from different databases (for more details, see Range delimitation in Appendix S1).
Species occurrence data
We collated CIT occurrence points from three data sources: iNaturalist (n = 1287, research-grade observations from www.inaturalist.org), eBird (n = 53 423, www.ebird.org) and WikiAves (n = 47 816, www.wikiaves.com.br). We collated SCI (professionally collected) occurrence points from GBIF (n = 1652, www.gbif.org), the ‘Portal da Biodiversidade’ (n = 7781, www.portaldabiodiversidade.icmbio.gov.br) and scientific articles published up to 2021 (n = 5172). We searched the literature using both the scientific name (including synonyms and previous combinations) and the common names (Portuguese and English) of each species. Our review also included specimens deposited in 11 Brazilian, six North American and seven European institutions (for the complete scientific data review methodology, see Range delimitation in Appendix S1). In total, we collated 117 075 occurrence points (SCI = 14 605 and CIT = 102 470). For each species, we obtained three distribution model ensembles (SCI, CITcorr and CITuncorr; Table S1).
Species distribution modelling – SDM
We used the 19 bioclimatic variables from WorldClim (Hijmans et al., 2005; www.worldclim.org), together with altitude, slope (Amatulli et al., 2018) and slope aspect (Holland & Steyn, 1975). All spatial layers were resampled to a 5 km resolution. We condensed the environmental information with a Principal Component Analysis (PCA) (De Marco & Nóbrega, 2018), retaining the axes that together explained 95% of the total variation (Table S2). For each set of points (CIT and SCI), we removed occurrences that were within 10 km of one another (2× cell size) (Andrade et al., 2020; Veloz, 2009). Because CIT records are typically more abundant than SCI records (see Table S1), we randomly subsampled the CIT data to the same number of points as the SCI data, creating the sample-size-corrected subset CITcorr; this controlled for the difference in sample size between the two data sets. We thus created two comparison scenarios: (1) Total occurrences, using all SCI and CIT records (SCI × CITuncorr); and (2) Equal occurrences, using the reduced set of CIT records (SCI × CITcorr).
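The three data-preparation steps in the paragraph above (PCA with a 95% variance cut-off, 10 km spatial thinning, and random subsampling of CIT records to the SCI sample size) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the ENMTML implementation: we assume the variables are standardized before the PCA, that thinning is done greedily in record order using haversine distances on a spherical Earth, and that CITcorr is a simple random draw without replacement. The coordinates in the usage lines are made up for illustration.

```python
import math
import random

import numpy as np


def pca_axes(env, variance_cutoff=0.95):
    """Project environmental data onto the leading principal components.

    env: array of shape (n_cells, n_variables), one row per raster cell.
    Keeps the fewest axes whose cumulative explained variance reaches
    `variance_cutoff` (95% in the text).
    """
    env = np.asarray(env, dtype=float)
    # Assumption: variables are standardized so that layers measured in
    # different units (e.g. temperature vs. altitude) are comparable.
    z = (env - env.mean(axis=0)) / env.std(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(z, rowvar=False))
    order = np.argsort(eigvals)[::-1]  # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    n_axes = min(int(np.searchsorted(cumulative, variance_cutoff)) + 1, len(eigvals))
    return z @ eigvecs[:, :n_axes]


def haversine_km(p, q, radius_km=6371.0):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))


def thin(points, min_km=10.0):
    """Greedy thinning: keep a record only if it lies at least `min_km`
    from every record already kept (records processed in input order)."""
    kept = []
    for p in points:
        if all(haversine_km(p, k) >= min_km for k in kept):
            kept.append(p)
    return kept


def subsample(cit_points, n_target, seed=1):
    """CITcorr: random draw, without replacement, of as many CIT records
    as there are SCI records."""
    rng = random.Random(seed)
    return rng.sample(cit_points, min(n_target, len(cit_points)))


# Hypothetical (lon, lat) records, for illustration only.
sci = thin([(-47.90, -15.80), (-47.91, -15.80), (-44.00, -19.90)])
cit = thin([(-47.9 + 0.2 * i, -15.8) for i in range(20)])
cit_corr = subsample(cit, len(sci))
print(len(sci), len(cit), len(cit_corr))  # 2 20 2
```

In this toy run the second SCI record (about 1 km from the first) is dropped, all CIT records (about 21 km apart) survive thinning, and CITcorr then holds as many records as the thinned SCI set.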
Next, we generated species distribution models using the three data sets (SCI, CITcorr and CITuncorr). We used five widely used algorithms: Maximum Entropy (MXD), Support Vector Machine (SVM), Random Forest (RF), Generalized Linear Models (GLM) and Bayesian Gaussian Process (GAU). We used a bootstrap procedure with 10 replicates (Fielding & Bell, 1997), setting aside 70% of occurrences for model training and 30% for testing in each replicate. For algorithms that require pseudo-absences, we kept a 1:1 ratio with the presence records in each scenario, allocating pseudo-absences to geographic areas of low environmental suitability as predicted by a Bioclim model (Engler et al., 2004). We used the True Skill Statistic (TSS) to evaluate the performance of each model (Allouche et al., 2006). To generate the final model for each data set, we built an ensemble from the best models, namely those with TSS greater than the overall mean TSS; we hereafter refer to these ensembles as distribution models. All procedures were performed with the ENMTML package (Andrade et al., 2020) in R.
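The performance metric and the ensemble rule described above can be illustrated with a short Python sketch. TSS is computed as sensitivity + specificity − 1 from presence/absence (or pseudo-absence) predictions; the ensemble keeps the algorithms whose TSS exceeds the mean TSS and averages their suitability maps. The unweighted averaging, the fallback when no algorithm exceeds the mean, and all numbers in the example are assumptions for illustration; ENMTML's own ensemble options may differ in detail.

```python
def tss(observed, predicted):
    """True Skill Statistic = sensitivity + specificity - 1.

    observed, predicted: sequences of 1 (presence) and 0 (absence or
    pseudo-absence) for the test records of one bootstrap replicate.
    """
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


def ensemble_above_mean(suitability, scores):
    """Average the suitability maps of algorithms whose TSS exceeds the mean.

    suitability: dict mapping algorithm -> list of per-cell suitability.
    scores: dict mapping algorithm -> TSS (e.g. mean over bootstrap replicates).
    """
    mean_tss = sum(scores.values()) / len(scores)
    selected = [name for name, s in scores.items() if s > mean_tss]
    if not selected:  # all scores tied: assumption, fall back to every algorithm
        selected = list(scores)
    n_cells = len(next(iter(suitability.values())))
    ensemble = [
        sum(suitability[name][i] for name in selected) / len(selected)
        for i in range(n_cells)
    ]
    return selected, ensemble


# Illustrative numbers only (not results from the study).
print(tss([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 1]))  # 0.5
scores = {"MXD": 0.82, "SVM": 0.61, "RF": 0.88, "GLM": 0.57, "GAU": 0.74}
maps = {"MXD": [0.9, 0.1], "SVM": [0.5, 0.5], "RF": [0.6, 0.3],
        "GLM": [0.2, 0.8], "GAU": [0.3, 0.2]}
selected, ensemble = ensemble_above_mean(maps, scores)
print(selected)  # ['MXD', 'RF', 'GAU']
```

With a mean TSS of 0.724 in this example, MXD, RF and GAU enter the ensemble and the two cells receive mean suitabilities of about 0.6 and 0.2.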
Comparison between models
We compared the distribution models of each species using the identity test (also called the equivalence test). We performed two pairwise comparisons, SCI × CITuncorr and SCI × CITcorr, contrasting the models' values cell by cell. We considered two similarity metrics, Schoener's D and I, which range from 0 (no overlap) to 1 (identical models) (Warren et al., 2008). We then tested each observed D and I value against a null distribution derived from 1000 randomized null models for each pair of compared data sets, using a significance threshold of p < 0.05 (Warren et al., 2008). Comparisons were performed with the ENMTools package (Warren et al., 2021; Warren & Dinnage, 2021) in R (R Core Team, 2020).
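As a sketch of what the identity test measures, the Python code below computes D = 1 − ½ Σ|p_x − p_y| and I = 1 − ½ Σ(√p_x − √p_y)², with each model's values rescaled to sum to 1 over the cells (one common formulation following Warren et al., 2008), and then builds a null distribution by pooling the two sets of occurrences and randomly reassigning them to pseudo-data sets of the original sizes. The `toy_model` function is a hypothetical stand-in for a full SDM (a Gaussian kernel density along a single environmental gradient), and only 99 replicates are used to keep the example fast; the study itself used ENMTools with 1000 null replicates.

```python
import math
import random


def _normalize(values):
    total = sum(values)
    return [v / total for v in values]


def schoener_d(a, b):
    """D = 1 - 0.5 * sum(|pa - pb|), with both maps rescaled to sum to 1."""
    pa, pb = _normalize(a), _normalize(b)
    return 1 - 0.5 * sum(abs(x - y) for x, y in zip(pa, pb))


def hellinger_i(a, b):
    """I = 1 - 0.5 * sum((sqrt(pa) - sqrt(pb))**2), maps rescaled to sum to 1."""
    pa, pb = _normalize(a), _normalize(b)
    return 1 - 0.5 * sum((math.sqrt(x) - math.sqrt(y)) ** 2 for x, y in zip(pa, pb))


def toy_model(env_values, cells, bandwidth=1.0):
    """Hypothetical SDM stand-in: Gaussian kernel density of the occurrences'
    environmental values, evaluated at each cell's environmental value."""
    return [sum(math.exp(-0.5 * ((c - e) / bandwidth) ** 2) for e in env_values)
            for c in cells]


def identity_test(sci_env, cit_env, cells, n_reps=99, seed=42):
    """Return the observed D and a one-sided permutation p-value.

    A small p means the two models are less similar than expected if both
    data sets came from the same distribution (equivalence rejected).
    """
    observed = schoener_d(toy_model(sci_env, cells), toy_model(cit_env, cells))
    pooled = list(sci_env) + list(cit_env)
    rng = random.Random(seed)
    null = []
    for _ in range(n_reps):
        rng.shuffle(pooled)
        # Reassign records to pseudo-data sets with the original sample sizes.
        fake_sci, fake_cit = pooled[:len(sci_env)], pooled[len(sci_env):]
        null.append(schoener_d(toy_model(fake_sci, cells), toy_model(fake_cit, cells)))
    p_value = (1 + sum(d <= observed for d in null)) / (1 + n_reps)
    return observed, p_value


cells = [0.5 * k for k in range(-10, 31)]  # environmental gradient, -5 to 15
rng = random.Random(0)
sci_env = [rng.gauss(5, 2) for _ in range(40)]    # synthetic records
cit_env = [rng.gauss(10, 2) for _ in range(40)]   # deliberately shifted niche
d_obs, p = identity_test(sci_env, cit_env, cells)
print(round(d_obs, 2), p)
```

Because the two synthetic niches are deliberately shifted, the observed D falls far below the values obtained from randomly relabelled records, so equivalence is rejected in this example.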
We used the outcome of the identity test to classify each species' models as either equivalent or non-equivalent. To identify the species characteristics most relevant for distinguishing equivalent from non-equivalent models, we analysed the species variables described above with a guided regularized random forest (GRRF), an extension of random forest (Breiman, 2001), using the ‘RRF’ (Deng, 2013) and ‘randomForest’ (Liaw & Wiener, 2002) packages. We ranked variable importance by Mean Decrease Accuracy (MDA) and evaluated classification performance by model accuracy, computed with the confusionMatrix function of the ‘caret’ package (Kuhn, 2021).
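The idea behind Mean Decrease Accuracy can be sketched as permutation importance: shuffle one variable, re-predict, and record how much accuracy drops. In the minimal Python example below, a nearest-centroid classifier is swapped in for the GRRF purely to keep the code short and dependency-free, and the trait data are synthetic (one informative and one uninformative variable); neither reflects the study's data or the RRF/randomForest implementation, which computes MDA on out-of-bag samples.

```python
import random


def fit_centroids(X, y):
    """Hypothetical stand-in classifier: mean variable vector of each class."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, X):
    """Assign each row to the class with the nearest centroid (squared Euclidean)."""
    def sqdist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [min(centroids, key=lambda lab: sqdist(x, centroids[lab])) for x in X]


def accuracy(y_true, y_pred):
    """Share of correct predictions (diagonal of the confusion matrix / total)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_decrease_accuracy(centroids, X, y, n_repeats=20, seed=0):
    """Permutation importance: mean drop in accuracy when one variable is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(y, predict(centroids, X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, predict(centroids, X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


# Synthetic data: y = 1 (equivalent) or 0 (non-equivalent); variable 0 carries
# signal and variable 1 is pure noise. These are not the study's species traits.
rng = random.Random(1)
y = [i % 2 for i in range(200)]
X = [[3.0 * lab + rng.gauss(0, 1), rng.gauss(0, 1)] for lab in y]
model = fit_centroids(X, y)
print(mean_decrease_accuracy(model, X, y))
```

Shuffling the informative variable sharply lowers accuracy, whereas shuffling the noise variable barely changes it, which is the contrast MDA is designed to capture.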
To test for spatial differences between models, we used the scientific model (SCI) as a baseline and characterized each CIT distribution model (CITuncorr and CITcorr) by its range and position (latitude and longitude of the range centroid). We calculated: (1) the range increase for each species, that is, the percentage of the CIT-modelled area that did not intersect the SCI model, representing areas predicted only from the citizen science data; and (2) the range shift, that is, the distance and direction between the centroids of the SCI and CIT models, using the ‘sp’ (Bivand et al., 2013) and ‘rearrr’ (Olsen, 2023) packages. Our objective was to provide a spatial view of the predicted change and to test the hypothesis that changes are directed towards the more populated areas of southeastern Brazil (IBGE, 2022).
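The two spatial metrics can be sketched in Python on a grid of presence cells. The example assumes equal-area cells (so area is proportional to cell count), an unweighted centroid of cell coordinates, a great-circle (haversine) distance on a sphere of radius 6371 km, and a direction expressed as the initial bearing in degrees clockwise from north; the packages used in the study (‘sp’ and ‘rearrr’) may compute these quantities differently. The cell sets in the usage lines are hypothetical.

```python
import math


def range_increase(sci_cells, cit_cells):
    """Percentage of the CIT-predicted cells that fall outside the SCI prediction.

    Cells are sets of (lon, lat) centres predicted as suitable; equal-area
    cells are assumed, so area is proportional to the number of cells.
    """
    return 100.0 * len(cit_cells - sci_cells) / len(cit_cells)


def centroid(cells):
    """Unweighted mean longitude and latitude of the predicted cells."""
    lons, lats = zip(*cells)
    return sum(lons) / len(lons), sum(lats) / len(lats)


def shift_and_bearing(a, b, radius_km=6371.0):
    """Haversine distance (km) and initial bearing (degrees clockwise from
    north) from centroid a to centroid b, both given as (lon, lat)."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*a, *b))
    dlon = lon2 - lon1
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    distance = 2 * radius_km * math.asin(math.sqrt(h))
    y = math.sin(dlon) * math.cos(lat2)
    x = math.cos(lat1) * math.sin(lat2) - math.sin(lat1) * math.cos(lat2) * math.cos(dlon)
    bearing = (math.degrees(math.atan2(y, x)) + 360.0) % 360.0
    return distance, bearing


# Hypothetical 1-degree cells: the CIT prediction is displaced one cell east.
sci = {(-48.0, -16.0), (-47.0, -16.0), (-48.0, -15.0), (-47.0, -15.0)}
cit = {(-47.0, -16.0), (-47.0, -15.0), (-46.0, -16.0), (-46.0, -15.0)}
print(range_increase(sci, cit))  # 50.0
km, deg = shift_and_bearing(centroid(sci), centroid(cit))
print(round(km), round(deg))
```

Here half of the CIT cells lie outside the SCI prediction, and the centroid moves roughly 107 km almost due east (a bearing close to 90°).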
RESULTS
Comparison between scientific and citizen science distribution models
All models performed adequately, with mean TSS values greater than 0.5 (Figure 1, Table S3, Figure S1). Mean overlap values between SCI and CIT models were similar for the full (CITuncorr) and reduced (CITcorr) citizen science models (SCI × CITcorr: D = 0.655, I = 0.910; SCI × CITuncorr: D = 0.659, I = 0.910; Figure 2, Figure S1).


Comparison of the models revealed that most of the species in our investigation obtained equivalent distribution models. We rejected the hypothesis of equivalence between CIT and SCI distribution models for about one-third (38%) of the evaluated species (p < 0.05); for most species (62%), therefore, the models were equivalent irrespective of the data set used. This result was consistent regardless of the number of citizen science records used (CITcorr or CITuncorr). According to the random forest analysis, extent of occurrence, body size and habitat use were the most important variables distinguishing species with equivalent models from species with non-equivalent models (Figure 3).

The models classified as equivalent showed a lower mean percentage difference in their spatial distribution (equivalent SCI × CITuncorr: 20%; equivalent SCI × CITcorr: 21%) than the non-equivalent models (non-equivalent SCI × CITuncorr: 32%; non-equivalent SCI × CITcorr: 28%). Likewise, the distances between the centroids of equivalent models were on average smaller (equivalent SCI × CITuncorr: 99 km; equivalent SCI × CITcorr: 105 km) than those between non-equivalent models (non-equivalent SCI × CITuncorr: 214 km; non-equivalent SCI × CITcorr: 233 km), indicating that equivalent models are spatially closer (Table 1). Contrary to our prediction, the direction of change in the models showed no pattern, with no trend towards more populated regions (Figure 4).
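Table 1. Range increase, range shift and direction of range change of the citizen science models (CITcorr and CITuncorr) relative to the scientific (SCI) model for each species.

Species | SCI area (km²) | CITcorr range increase (%) | CITcorr range shift (km) | CITcorr direction of range change (° from north) | CITuncorr range increase (%) | CITuncorr range shift (km) | CITuncorr direction of range change (° from north) |
---|---|---|---|---|---|---|---|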
Nothura minor | 1 043 182 | 9 | 114 | 31 | 30 | 149 | 25 |
Taoniscus nanus | 1 117 099 | 31 | 130 | 210 | 22 | 120 | 288 |
Uropelia campestris | 3 871 768 | 15 | 135 | 238 | 15 | 130 | 246 |
Augastes scutatus | 144 334 | 58 | 282 | 260 | 85 | 153 | 161* |
Heliactin bilophus | 4 130 344 | 7 | 126 | 263 | 7 | 120 | 269 |
Campylopterus calcirupicola | 115 352 | 23 | 103 | 7 | 29 | 75 | 11 |
Campylopterus diamantinensis | 23 564 | 20 | 80 | 164 | 20 | 98 | 171 |
Celeus obrieni | 947 739 | 32 | 22 | 115 | 30 | 20 | 98 |
Alipiopsitta xanthops | 3 553 351 | 4 | 82 | 256 | 6 | 83 | 267 |
Pyrrhura pfrimeri | 16 027 | 18 | 49 | 271 | 23 | 50 | 290 |
Herpsilochmus longirostris | 2 844 509 | 26 | 186 | 245 | 28 | 183 | 243* |
Thamnophilus torquatus | 3 295 494 | 12 | 43 | 25 | 16 | 76 | 23* |
Cercomacra ferdinandi | 350 698 | 45 | 72 | 272 | 23 | 60 | 227 |
Melanopareia torquata | 3 709 474 | 29 | 192 | 289 | 29 | 224 | 281* |
Scytalopus novacapitalis | 44 646 | 46 | 449 | 155 | 36 | 132 | 177* |
Geositta poeciloptera | 1 477 521 | 55 | 280 | 160 | 56 | 430 | 149* |
Syndactyla dimidiata | 1 308 886 | 24 | 82 | 89 | 30 | 80 | 84 |
Clibanornis rectirostris | 1 842 529 | 27 | 26 | 330 | 26 | 83 | 59 |
Asthenes luizae | 39 441 | 25 | 55 | 274 | 17 | 50 | 253 |
Synallaxis simoni | 3 328 488 | 19 | 105 | 266 | 11 | 100 | 225 |
Antilophia galeata | 2 633 265 | 5 | 101 | 254 | 5 | 70 | 258 |
Phylloscartes roquettei | 228 768 | 26 | 112 | 135 | 25 | 100 | 120 |
Euscarthmus rufomarginatus | 4 921 760 | 13 | 158 | 257 | 7 | 150 | 248 |
Phyllomyias reiseri | 1 618 435 | 27 | 230 | 293 | 24 | 233 | 289 |
Culicivora caudacuta | 3 751 684 | 8 | 230 | 233 | 12 | 165 | 246* |
Polystictus superciliaris | 532 913 | 25 | 99 | 274 | 55 | 150 | 196* |
Guyramemua affine | 3 768 105 | 22 | 482 | 200 | 21 | 457 | 194* |
Alectrurus tricolor | 2 505 625 | 11 | 267 | 271 | 12 | 223 | 278* |
Knipolegus franciscanus | 398 409 | 27 | 112 | 339 | 30 | 110 | 266 |
Cyanocorax cristatellus | 3 336 090 | 2 | 119 | 241 | 4 | 91 | 273 |
Myiothlypis leucophrys | 1 381 891 | 35 | 131 | 300 | 28 | 86 | 213 |
Charitospiza eucosma | 3 374 720 | 7 | 284 | 291 | 8 | 308 | 287* |
Coryphaspiza melanotis | 4 253 439 | 18 | 310 | 335 | 17 | 314 | 331* |
Embernagra longicauda | 330 734 | 37 | 32 | 214 | 15 | 26 | 223 |
Porphyrospiza caerulescens | 3 835 194 | 5 | 168 | 232 | 8 | 163 | 235 |
Saltatricula atricollis | 4 250 500 | 34 | 103 | 281 | 37 | 126 | 293* |
Conothraupis mesoleuca | 162 115 | 43 | 177 | 232 | 43 | 112 | 204* |
Cypsnagra hirundinacea | 3 688 177 | 30 | 254 | 358 | 27 | 223 | 1* |
Microspingus cinereus | 1 155 530 | 10 | 153 | 60 | 45 | 150 | 53 |
Neothraupis fasciata | 3 776 245 | 24 | 96 | 274 | 24 | 150 | 262* |
Schistochlamys ruficapillus | 3 259 912 | 17 | 123 | 105 | 7 | 120 | 229 |
Paroaria baeri | 654 666 | 50 | 124 | 349 | 33 | 50 | 149 |

Note: * marks species for which the identity test rejected equivalence between the SCI and CIT models (p < 0.05).
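
As a consistency check, the SCI × CITuncorr group means reported in the Results can be recomputed from the CITuncorr columns of Table 1. The short Python snippet below transcribes those columns in table order, with the asterisked species coded as non-equivalent; it reproduces 16 of 42 species (38%) as non-equivalent, a mean range increase of 20.0% versus 31.6%, and a mean range shift of 98.7 km versus 214.1 km for equivalent versus non-equivalent models, which round to the values given in the text.

```python
# CITuncorr columns of Table 1, transcribed in table order (42 species).
increase_pct = [30, 22, 15, 85, 7, 29, 20, 30, 6, 23,
                28, 16, 23, 29, 36, 56, 30, 26, 17, 11,
                5, 25, 7, 24, 12, 55, 21, 12, 30, 4,
                28, 8, 17, 15, 8, 37, 43, 27, 45, 24,
                7, 33]
shift_km = [149, 120, 130, 153, 120, 75, 98, 20, 83, 50,
            183, 76, 60, 224, 132, 430, 80, 83, 50, 100,
            70, 100, 150, 233, 165, 150, 457, 223, 110, 91,
            86, 308, 314, 26, 163, 126, 112, 223, 150, 150,
            120, 50]
# 1 = species marked * in Table 1 (equivalence rejected, p < 0.05).
non_equivalent = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
                  1, 1, 0, 1, 1, 1, 0, 0, 0, 0,
                  0, 0, 0, 0, 1, 1, 1, 1, 0, 0,
                  0, 1, 1, 0, 0, 1, 1, 1, 0, 1,
                  0, 0]


def group_mean(values, flags, flag):
    """Mean of `values` over species whose equivalence flag equals `flag`."""
    selected = [v for v, f in zip(values, flags) if f == flag]
    return sum(selected) / len(selected)


assert len(increase_pct) == len(shift_km) == len(non_equivalent) == 42
print(f"Non-equivalent: {sum(non_equivalent)}/42 = {sum(non_equivalent) / 42:.0%}")
for label, flag in (("Equivalent", 0), ("Non-equivalent", 1)):
    print(f"{label}: range increase {group_mean(increase_pct, non_equivalent, flag):.1f}%, "
          f"range shift {group_mean(shift_km, non_equivalent, flag):.1f} km")
```

The same approach applies to the CITcorr columns by transcribing them in the same order.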

DISCUSSION
Our results show that data collected by citizen scientists can help fill the Wallacean deficit. In our case, citizen science data represented about 88% of all records found for the studied species (Figure S1). Furthermore, almost two-thirds of the evaluated species presented model equivalence between the SCI and CIT data sets, and, as expected, the models generated with the different data sets showed high geographical overlap, especially those that were equivalent. We also found no bias in the CIT models towards more populated regions, where data collection might be concentrated. Our findings demonstrate the high value of CIT data, which cannot be overlooked in light of the current biodiversity crisis and the need to implement effective conservation planning (Butchart et al., 2010; Isbell et al., 2023; Jaureguiberry et al., 2022).
The selection of reliable data sets is challenging and can influence the quality of the resulting distribution model (Duputié et al., 2014; Guisan et al., 2017). Thus, caution has been advised in the use of citizen science data because of possible identification biases and inaccuracies in occurrence data (Anderson, 2012). Nevertheless, we did not detect a pattern relating species ecology to whether SCI and CIT models would be equivalent (Figure 3b). Several factors might contribute to this, including groups of specialized amateur birdwatchers who can obtain remarkable records even of rare and hard-to-detect species. Moreover, we did not observe bias due to data collection concentrated in more populated regions (Figure 4). Further investigation of the determinants of equivalence might reveal which species cannot be satisfactorily recorded by citizen scientists and thus guide species prioritization. Even so, we argue that data from citizen scientists are an important ally and can be used in distribution modelling, as supported by the equivalence tests of the species distribution models, the high geographic overlap between models and the proximity of their centroids. Thus, data from citizen scientists are not only useful for modelling the distribution of most species but can also contribute to other relevant topics (La Sorte et al., 2018; La Sorte & Somveille, 2020; Zulian et al., 2021), such as biological invasions (Encarnação et al., 2021), phenology and diversity gradients (Soroye et al., 2018; Suzuki-Ohno et al., 2017), conservation (MacPhail & Colla, 2020) and the impacts of global change on present and future distribution patterns.
Well-planned citizen science projects have enormous potential to collect much-needed data more quickly and to improve species distribution models, and they can reach a level of quality that matches data collected by experts (Hoyer et al., 2012; van der Velde et al., 2017). Our results, however, show that high quality (demonstrated by the equivalence of CIT models) can be achieved even without elaborate planning or scientific supervision. Our hypothesis of data bias due to collection concentrated in more populated regions was not corroborated, which supports our argument for the usefulness of data collected by citizen scientists for species distribution modelling. In fact, high reliability can be expected whenever the only requirements are correct identification and location, as is the case for correlative species distribution models (e.g. Hedblom et al., 2014). In addition, some citizen science projects can persist for long periods because they do not depend on scarce funding, which often limits sampling time (Poisson et al., 2020; Steven et al., 2019; Theobald et al., 2015). It is essential to recognize that most bird species are currently under some level of human-mediated threat; therefore, scientific effort alone will not be able to gather species data at the speed required.
The mismatch between the demand for and production of biodiversity information is even more acute in tropical regions, which harbour a large number of bird species but lack a history of systematic efforts to record and share species occurrence data. Moreover, the studied species occur within the Cerrado, the world's most biodiverse savanna. In 35 years, more than half of this 2 million km² biome has been converted to agriculture (Klink & Machado, 2005). These figures reinforce the status of the Cerrado as one of the most threatened biogeographic provinces in the world, making it urgent to use all available information, including citizen science data, to support conservation decision-making. Thus, we emphasize that, regardless of the intrinsic characteristics of the evaluated species (size, area, habitat use, threat level, etc.), not using citizen science data for SDMs is an unjustified caution.
Although the two data collection methods can generate statistically equivalent and highly overlapping distribution models, we strongly disagree that scientific data can simply be replaced. It is essential to recognize that scientific and citizen science data each have strengths and limitations, and that the greatest benefit comes from exploring their complementarity (La Sorte et al., 2018; La Sorte & Somveille, 2020; Zulian et al., 2021). Scientific data, such as biological collection vouchers, hold unique historical information that allows historical changes to be mapped (e.g. Marini et al., 2020; Navarro et al., 2021). In addition, scientific efforts can reach isolated areas within reserves or sparsely populated areas (least visited by citizens), directing data production towards filling information gaps. Nevertheless, we point out that part of the species distribution data held in museums worldwide is not available for public consultation, which hinders spatial analyses that could support conservation (Marini et al., 2020; Peterson et al., 2005). Indeed, there is an ongoing debate about maintenance costs and the resources directed to curators and museums (for further details, see Graves, 2000; Peterson et al., 2005). In contrast, citizen science data may lack standardization and may not include detailed information about each specimen, such as geographical coordinates, measurements and sex. This limitation is aggravated for species that must be examined closely to be identified. Finally, although we demonstrate a high equivalence of citizen-collected data for generating species distribution models, standardized scientific collection remains important and should be preferred.
In summary, we have shown that SDMs generated with data collected by scientists and by citizen scientists are equivalent for most of the species analysed. Although we addressed the question using only one set of birds from the Cerrado region, our results indicate the potential of citizen science data for modelling species distributions, mainly because of the large volume of data collected, which would be impracticable for scientists to collect alone. Understanding species distribution patterns is important and will be key to mitigating current wildlife population declines. Conservation measures will be favoured by combining professional and amateur data, aiming for a better understanding of species distribution and, consequently, biodiversity conservation.
AUTHOR CONTRIBUTIONS
Eduardo Guimarães Santos: Conceptualization (equal); data curation (equal); formal analysis (equal); investigation (equal); methodology (equal); validation (equal); visualization (equal); writing – original draft (equal); writing – review and editing (equal). Helga Correa Wiederhecker: Conceptualization (equal); investigation (equal); methodology (equal); validation (equal); writing – review and editing (equal). Leonardo Esteves Lopes: Data curation (equal); validation (equal); writing – review and editing (equal). Miguel Ângelo Marini: Conceptualization (equal); investigation (equal); methodology (equal); supervision (equal); writing – review and editing (equal).
ACKNOWLEDGEMENTS
We thank the Brazilian research agency ‘Conselho Nacional de Desenvolvimento Científico e Tecnológico’ (CNPq) and the Brazilian education agency ‘Coordenação de Aperfeiçoamento de Pessoal de Nível Superior’ (CAPES – Finance Code 001) for fellowships.
CONFLICT OF INTEREST STATEMENT
The authors declare no conflicts of interest.
Open Research
DATA AVAILABILITY STATEMENT
Data available in article Appendix S1.