International Journal of Intelligent Systems

Volume 2025, Issue 1 3145236

Review Article

Open Access

Continual Learning Inspired by Brain Functionality: A Comprehensive Survey

Muhammad Azeem Aslam (Corresponding Author)

[email protected]

orcid.org/0000-0002-4026-1294

School of Information Engineering, Xi’an Eurasia University, Xi’an, Shaanxi, 710071, China, eurasia.edu

Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, China, cas.cn

Muhammad Hamza

School of Economics and Management, Xidian University, Xi’an, Shaanxi, China, xidian.edu.cn

Zhu Shuangtong

Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, China, cas.cn

Hu Hongfei

School of Information Engineering, Xi’an Eurasia University, Xi’an, Shaanxi, 710071, China, eurasia.edu

Xu Wei

Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, China, cas.cn

Muhammad Irfan

School of Software, Northwestern Polytechnical University, Xi’an, China, nwpu.edu.cn

Zheng Jiangbin

School of Software, Northwestern Polytechnical University, Xi’an, China, nwpu.edu.cn

Saba Aslam

School of Humanities and Social Sciences, Xi’an Jiaotong University, Xi’an, China, xjtu.edu.cn

First published: 26 July 2025

https://doi.org/10.1155/int/3145236

Academic Editor: Mohamadreza (Mohammad) Khosravi

Abstract

Neural network–based models have shown tremendous achievements in various fields. However, standard AI-based systems suffer from catastrophic forgetting when undertaking sequential learning of multiple tasks in dynamic environments. Continual learning has emerged as a promising approach to address catastrophic forgetting. It enables AI systems to learn, transfer, augment, fine-tune, and reuse knowledge for future tasks. The techniques used to achieve continual learning are inspired by the learning processes of the human brain. In this study, we present a comprehensive review of research and recent developments in continual learning, highlighting key contributions and challenges. We discuss essential functions of the biological brain that are pivotal for achieving continual learning and map these functions to the recent machine-learning methods to aid understanding. Additionally, we offer a critical review of five recent types of continual learning methods inspired by the biological brain. We also provide empirical results, analysis, challenges, and future directions. We hope that this study will benefit both general readers and the research community by offering a complete picture of the latest developments in this field.

1. Introduction

In recent years, neural network–based models have made impressive progress in various application areas. Current AI-based systems perform well with isolated, well-organized, and stationary data. However, real-world scenarios are dynamic and involve multiple tasks [1]. For example, systems such as autonomous cars, drones, and robots may encounter dynamic and versatile situations. For AI-based systems to perform effectively in real-world scenarios, they must be able to learn continuously, much like the human brain [2]. To achieve this capability, AI systems need to acquire new knowledge and enhance or augment previously learned knowledge without forgetting [3]. Currently, most learning systems are unable to learn continuously and may underperform when exposed to dynamic or incremental data environments. The primary challenge facing current AI-based systems is catastrophic forgetting of past knowledge when performing new tasks [4].

Neuroscientists have discovered various aspects of how the human brain functions. The human brain is capable of continuous learning across a lifespan, processing a constant stream of information that becomes progressively available over time [5]. Learned knowledge is retained, augmented, fine-tuned, and applied to perform new tasks [6]. The brain retains specific memories of episodic-like events and generalizes learned experiences to solve future tasks [7]. The well-known complementary learning system (CLS) model describes brain learning as the extraction of the statistical structure of perceived events, while retaining specific memories of episodic-like events to generalize to novel situations [8]. The hippocampal system is responsible for short-term adaptation, facilitating the rapid learning of new knowledge, which is then transferred and integrated into the neocortical system for long-term storage [9]. As the brain accumulates knowledge over time, it becomes increasingly capable of handling complex tasks [10]. Although humans may occasionally forget old information, a complete loss of prior knowledge is rarely observed [11, 12].

Continual learning (CL) is an emerging field that aims to train machines in a way similar to how the human brain learns [13], such that knowledge learned from past tasks is retained, accumulated, fine-tuned, and subsequently used to solve future tasks [14]. CL is considered an inherent capability of the biological brain [15]. Key characteristics of CL include forward and backward knowledge transfer, overcoming catastrophic forgetting, adaptability, scalability, robustness, resource efficiency, task identification, and learning task similarity [16]. Another important goal of CL is the ability to use learned knowledge to enhance performance on other tasks [17].

In recent years, various studies have adopted biologically inspired CL mechanisms in AI-based systems [18, 19]. These methods can be divided into five categories: (1) regularization-based, (2) architecture-based, (3) replay-based, (4) meta-learning-based, and (5) hybrid methods. Regularization-based approaches iteratively adjust model parameters for each new task, optimizing them to find a new local minimum [20]. Architecture-based approaches expand networks by adding or, in some cases, pruning a subnetwork, branch, or node to learn a new task [21]. Replay-based approaches store learned knowledge either as selected old input samples or as encoded data, which are then replayed while new tasks are learned [22]. Meta-learning, or learning to learn, underlies the fourth category: meta-learning-based approaches tune network hyperparameters to adapt learning to new tasks and to mitigate biases introduced by manual settings [23]. Hybrid methods combine the above-mentioned approaches [24]. In recent years, hybrid methods have aimed to offer state-of-the-art solutions, particularly in class incremental learning (CIL) scenarios [25].

There are a few review studies on CL. Mai et al. [26] and Masana et al. [27] presented empirical studies comparing several methods for the image classification problem. Aljundi et al. [12] offered an overview of state-of-the-art CL methods, a framework to assess the stability–plasticity trade-off, and an empirical study of eleven CL methods using three benchmark datasets, examining the impact of different parameters. Febrinanto et al. [28] presented a survey of recent progress in graph lifelong learning, classifying existing methods, potential applications, and research challenges. Kudithipudi et al. [? ] identified key capabilities of CL, biological mechanisms to address catastrophic forgetting, and biologically inspired models implemented in artificial systems to achieve CL. Mundt et al. [29] described open set learning and the challenges associated with it, particularly regarding unknown examples beyond the observed set. The authors also proposed a unified perspective integrating CL, active learning, and open set recognition within deep neural networks, along with an empirical study on three datasets.

It can be noted that the aforementioned studies cover a limited scope, and the latest research developments and trends are not addressed. To the best of our knowledge, no extensive survey covers all aspects, such as brain functioning, the link between brain functions and machine learning (ML), CL methods, benchmarks, experimental configurations, results, challenges, and future directions. This survey provides researchers and general readers with an overview of the state-of-the-art developments in CL, particularly for supervised learning scenarios. Compared to previous studies, our work is more recent and comprehensive, and it includes the following:

• A detailed overview of the brain’s learning mechanisms and key capabilities that play a crucial role in learning.
• A mapping of brain capabilities to current CL-based ML mechanisms.
• A comprehensive and critical review of the latest CL methodologies.
• The presentation of results achieved by recent CL-based methods in various application areas.
• Discussion of benchmark datasets, metrics, and experimental configurations.
• A conclusion that addresses key challenges and future directions.

Overall, this study covers the characteristics of CL; the brain functions that inspire CL-based methods; the mapping between brain functions and CL-based methods; the CL-based methods themselves; benchmark datasets, metrics, and experimental configurations; and results, empirical analysis, challenges, and future directions. The details are provided in the following sections.

2. Characteristics of CL

In this section, we discuss the key characteristics of CL. These characteristics are ideally desired in any CL-enabled system designed for a real-world dynamic environment, as shown in Figure 1.

Figure 1. Desired characteristics of the continual learning–based system.

2.1. Knowledge Transfer and Adaptation

The real-world dynamic environment consists of multiple problems or tasks with varying situations and conditions. The system should be capable of learning knowledge, transferring it, and reusing it to perform new tasks. In addition, the system should be able to adapt to any new environment.

2.2. Control Catastrophic Forgetting

Currently, most AI systems suffer from catastrophic forgetting. A CL-based system should be capable of learning new knowledge while also retaining previously learned information. The system must be stable enough not to forget this information while being plastic enough to learn new tasks. This is known as the stability-plasticity dilemma.

2.3. Task Identification

In a real-world scenario, the system should be capable of performing multiple tasks, which requires it to identify new tasks. In addition, the system should be able to handle online learning on continuous streaming data [30].

2.4. Learning Task Similarity

The system should be able to learn the similarities among different tasks. This ability will help enhance knowledge transfer, both for forward transfer (FWT) and backward transfer (BWT). In addition, the capability to divide complex tasks into simpler tasks can also facilitate the reuse of knowledge among subtasks [31].
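The two transfer directions can be made concrete with a small sketch that computes them from a matrix of test accuracies. It assumes the definitions commonly used in the CL literature, which this section does not itself specify: `acc[i][j]` is the accuracy on task `j` after training on tasks `0..i`, and `baseline[j]` is the accuracy of an untrained model on task `j`. All names are illustrative.

```python
from typing import Sequence


def backward_transfer(acc: Sequence[Sequence[float]]) -> float:
    """Average change in accuracy on earlier tasks after all tasks are learned.

    Negative values indicate forgetting; positive values indicate that later
    learning improved performance on earlier tasks.
    """
    n = len(acc)
    if n < 2:
        raise ValueError("need at least two tasks")
    final = acc[n - 1]
    # Compare the final accuracy on task j with the accuracy right after
    # task j was learned (the diagonal entry).
    return sum(final[j] - acc[j][j] for j in range(n - 1)) / (n - 1)


def forward_transfer(acc: Sequence[Sequence[float]],
                     baseline: Sequence[float]) -> float:
    """Average gain on a task before it is trained, relative to a baseline."""
    n = len(acc)
    if n < 2:
        raise ValueError("need at least two tasks")
    # acc[j - 1][j] is the accuracy on task j just before training on it.
    return sum(acc[j - 1][j] - baseline[j] for j in range(1, n)) / (n - 1)


if __name__ == "__main__":
    # Hypothetical accuracies for three sequential tasks.
    acc = [
        [0.90, 0.20, 0.10],
        [0.80, 0.85, 0.30],
        [0.70, 0.75, 0.88],
    ]
    baseline = [0.10, 0.10, 0.10]
    print("BWT:", round(backward_transfer(acc), 4))
    print("FWT:", round(forward_transfer(acc, baseline), 4))
```

In this hypothetical matrix, accuracy on the first two tasks drops after later training, so BWT is negative, while the nonzero accuracy on tasks before they are trained gives a positive FWT.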

2.5. Noise Tolerance

Generally, AI systems are trained on controlled, well-organized, and clean datasets. In contrast, real-world environments may contain noisy, disorganized, and uncontrolled data. The system should be capable of handling such noise, dynamic environments, and varying conditions. Various studies propose robust models; however, these models have not been studied in relation to CL.

2.6. Resource Efficiency

To mimic the CL functions of the brain, the system should work within constrained resources; allocating new resources for every new task is undesirable. Adaptability and robustness are also key characteristics for any system to achieve CL, because in real-world scenarios the system may encounter noise, out-of-distribution inputs, and unseen, dynamic environments. As shown in Table 2, CL is achieved in ML systems by implementing methods such as neuromodulation [32], multisensory integration [33], and episodic replay [34].

Table 1. Continual learning methods based on architecture.

Network type	Study	Implementation method	Application
Dynamic network	[119]	CNN	Image resolution
	[120]	Model with multiple blocks to learn and store actions	Human action recognition
	[121]	ResNet, plastic blocks for old knowledge retention and learning new	Classification
	[122]	Task-specific network block new scene, NAS for network unit search	Matching driving scenes
	[123]	Multistage multitasking network. RL + LSTM.	Text classification
	[124]	Bayesian neural networks. Task modeling and representation. Feature space through Gaussian posterior distributions.	Classification
	[57]	Shared backbone network. Prediction head training	Image quality assessment
	[58]	Decoder networks, mixed label uncertainty strategy for robustness.	Classification
	[125]	Dynamic branches fixed and trainable. Knowledge distillation of spatial and channel dimensions	Semantic segmentation
	[126]	Main-branch evolution and sub-branch param modification	Classification
	[127]	Stacked generalization principle. Deep neural networks.	Multiple
	[128]	Memory networks	Classification
	[96]	Reinforced continual learning, Bayesian optimized continual learning	Multiple
	[129]	CNN, growing network for new class	Classification
	[130]	Global and local parameters	Classification
	[131]	Cross-domain, backbone with extra params	Cross-domain classification
	[118]	Linear filtering, network pruning	Classification
	[132]	SOINN. Network modification, noisy data mapping	Unsupervised learning
	[133]	Linear combinations, network pruning, progressive learning	Classification, speech recognition
	[134]	Manifold alignment for CL. Learning without premonition.	Intrusion detection

Static network	[135]	Out-of-distribution (OOD) detection and task masking	Classification
	[136]	Class-specific convolution and modulation parameters	Label-to-image translation
	[137]	Incremental sparse convolution layers with uncertainty loss variable	3D segmentation

Note: The first column shows the network type used to achieve continual learning. Implementation methodology is shown in the third column.

3. Brain Functions That Support and Inspire CL

The neurological processes of the human brain involve various phenomena, such as neural signaling, neuronal connections, and the actions of neurotransmitters, which together form the basis of brain activity and function [35]. Humans have evolved several methods for CL. Some studies suggest that a CL-based method may handle learning in dynamic environments [36]. While not all ML methods for CL are explicitly designed to mimic these biological mechanisms, many exhibit functional parallels that offer insights into robust learning systems. Below, we analyze brain functions that either directly inspire CL techniques (e.g., hippocampal replay) or share convergent principles (e.g., sparsity for efficiency) with engineered solutions. Figure 2 and Table 2 map these relationships, distinguishing between explicit bioinspired methods and implicit analogies. The matrix depicts the connections between key goals of CL and biological mechanisms in the brain. We describe the brain functions that support the goal of CL, the relationship between different CL characteristics and brain functions, and recent CL-based ML implementations. The circle with a tick mark indicates that the brain function contributes to the CL characteristic. The line below shows the part of the brain that is responsible for the said characteristic.

Table 2. Bridging brain functions with continual learning models.

Continual learning characteristics	Neurogenesis	Memory	Modularity	Metaplasticity	Multisensors	Context learning
Knowledge transfer and reuse	[21, 57–59]	—	[32, 60–62]	—	—	[33, 63–65]
Control catastrophic forgetting	[66–68]	[69–71]	[61, 72–74]	[16, 50, 75, 76]	—	[32, 77]
Task management	[59, 78–80]	[69, 81–83]	[57, 61, 84]	[16, 50, 85]	[86, 87]	[33, 64, 78]
Adaptability	—	[22, 34, 81, 88]	[32, 89–91]	[16, 74]	[33, 92]	[33, 64, 89]
Resource management	[66, 93, 94]	[8, 69, 71, 95]	[21, 42, 61, 96]	—	—	—

Note: Each cell lists the studies that address the given biological mechanism of the brain (top row) and continual learning characteristic (first column).

3.1. Neurogenesis

Neurogenesis is the process of generating new neurons in the central nervous system [37]. It occurs in the dentate gyrus of the hippocampal formation and in the subventricular zone of the lateral ventricles (LVs) [38, 39]. This process is expedited when the brain undergoes a variety of experiences. Additionally, changes in the size and configuration of the brain have been observed [15]. Structural connections remain almost the same throughout life [15]. Studies of the mouse brain have shown that newly created neuroblasts travel from the subventricular zone of the LV through the rostral migratory stream (RMS) into the olfactory bulb (OB), where mature interneuron populations are produced [40]. Although this process remains active throughout life, it is most active in the early years. Some neuroimaging studies suggest that the infant brain is more plastic and continues to grow and develop. Biological neurons use constructive algorithms in the form of plasticity, involving the growth and pruning of synapses during neurogenesis. Several studies suggest that synaptic plasticity and neurogenesis are important mechanisms for alleviating catastrophic forgetting in the biological brain [41].

In the context of ML, the explicit inspiration from neurogenesis is achieved by various architecture-based methods, where new neurons, blocks, or branches are added to the existing network to learn new tasks [42]. In addition, the implicit analogy is that both brains and ML systems face trade-offs between stability (retaining old knowledge) and plasticity (integrating new information), leading to similar solutions like dynamic capacity expansion.

Biological neurogenesis is much more dynamic and adaptable than what current algorithms can replicate. In the brain, it is carefully controlled by factors such as environmental enrichment, stress, and chemicals like serotonin and BDNF [43].

3.2. Memory System

In the biological brain, replay is defined as the repetition during sleep, in some form, of activity that has already occurred while awake. This activity occurs in the hippocampus alone or jointly in the hippocampus and neocortex [44]. Similarly, studies have observed the presence of multiple memory systems in the brain, such as hippocampal–cortical memory [45]. It has been observed that the hippocampus is used for fast learning, whereas slow learning occurs in the cortex. This explains how the brain forms declarative memories. However, studies suggest that there may be other memory models in addition to these. The CLS theory presents the idea of two memory blocks in the brain [46]: the first is the hippocampus for fast learning and the second is the neocortex for slow and detailed learning [44].

According to the modified description of CLS, the hippocampal model continuously trains the neocortical model by replaying nonoverlapping representations of previously learned knowledge [44, 47]. Neuromodulators like acetylcholine help switch between encoding and retrieval modes. Oscillatory coupling, such as theta–gamma phase coordination, plays a key role during memory replay. Additionally, behavioral context, such as stress hormones, enhances the recall of important or emotionally significant memories. Several studies, such as [8], have proposed methods to control memory size while learning multiple tasks.

In artificial neural networks, the explicit inspiration is implemented in the form of dual-network architectures (e.g., [48]) and generative replay [49], which replicate hippocampal–neocortical interactions, and in the replay of stored samples [44]. As an implicit analogy, replay-based methods adopt the convergent principle of addressing forgetting by “rehearsing” past data, mirroring the brain’s need to counteract interference with limited resources [50]. Taking inspiration from the time-varying plasticity of biological synapses, various studies have tried to replicate metaplasticity by combining memory-, architecture-, and regularization-based methods [24, 49, 51].
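Sample replay under a fixed memory budget can be sketched as follows. This is a minimal illustration, not the method of any cited study: it assumes reservoir sampling as the selection rule, so that the buffer holds a uniform random subset of everything seen so far, and the names `ReservoirReplayBuffer` and `mixed_batch` are hypothetical.

```python
import random
from typing import Any, List, Optional, Sequence


class ReservoirReplayBuffer:
    """Fixed-size memory holding a uniform random subset of a data stream."""

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.items: List[Any] = []
        self.seen = 0  # number of stream items offered so far
        self._rng = random.Random(seed)

    def add(self, item: Any) -> None:
        """Offer one stream item; it is kept with probability capacity / seen."""
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
            return
        slot = self._rng.randrange(self.seen)
        if slot < self.capacity:
            self.items[slot] = item

    def sample(self, k: int) -> List[Any]:
        """Draw up to k stored items without replacement."""
        k = min(k, len(self.items))
        return self._rng.sample(self.items, k)


def mixed_batch(new_batch: Sequence[Any],
                buffer: ReservoirReplayBuffer,
                replay_size: int) -> List[Any]:
    """Combine current-task samples with replayed samples from memory."""
    return list(new_batch) + buffer.sample(replay_size)


if __name__ == "__main__":
    buffer = ReservoirReplayBuffer(capacity=5, seed=0)
    for i in range(100):
        buffer.add(("task0", i))
    batch = mixed_batch([("task1", 0), ("task1", 1)], buffer, replay_size=3)
    print(len(batch), "samples in the mixed batch")
```

Training on such mixed batches is what "rehearsing" amounts to in practice: memory stays bounded regardless of how many tasks have been seen, at the cost of keeping only a sample of the past.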

The following two paragraphs discuss two important topics related to the memory system: active forgetting and memory reconsolidation. Active forgetting is a controlled process in the brain. During sleep, the brain sheds less important neuronal connections to free memory for new learning, a process called synaptic downscaling [52]. In addition, less important memories are identified for elimination through dopamine mapping [53]. Moreover, newly added neurons replace old memory cells to incorporate new learning [54].

Memory reconsolidation is the process in which recalling a previously stored memory renders it temporarily unstable, or malleable. This process depends on synaptic plasticity: retrieving a memory temporarily weakens it, allowing it to be updated with new learning [55, 56].

3.3. Neuromodulation

Studies suggest that the human brain is modular in nature. Neuromodulation plays an important role in learning and adaptation across environments and contexts; in remembering, augmenting, and refining knowledge [61]; and in robustness to noise and adjustment under uncertainty [97]. Sparsity is closely associated with neuromodulation. Sparse coding is common in the brain [98]. The hippocampal dentate gyrus is reported to use very sparse representations for pattern separation and knowledge retention [99]. Sparsity also offers robustness, memory capacity, storage efficacy, and efficiency [100]. In this context, the explicit inspiration is the gating mechanisms in ML (e.g., [62]), which mimic neuromodulatory attention.

As shown in Table 2, brain functions such as neuromodulation, neurogenesis, and context-dependent learning are closely related to knowledge transfer. Neuromodulation has inspired various studies, especially those using architecture-based models [62].

3.4. Metaplasticity

Metaplasticity refers to synaptic updating that depends on the current state, the history, and the neural activity of the synapse. It also determines the direction, duration, and degree of future synaptic plasticity [16, 101]. Studies suggest that metaplasticity plays various roles in learning in the biological brain, such as controlling the threshold and activation for plasticity, synchronization, and consolidation of learned knowledge [102–104]. Synaptic plasticity is the mechanism by which the brain stores information. The efficacy of different synapses can be updated through neural activity [105]. Subcortically, the main neuromodulators originate in distinct regions: acetylcholine in the substantia innominata and the medial septum, dopamine in the ventral tegmental area and the substantia nigra pars compacta, noradrenaline in the locus coeruleus, and serotonin in the dorsal and medial raphe nuclei [106]. Learning new knowledge can interfere with previous learning, which may cause forgetting, and limited resources may make forgetting rapid. This may also occur in biological synaptic weights. The metaplasticity rules that govern learning inspire the design of ANNs that counter forgetting and address the stability–plasticity dilemma with the available resources. As an explicit inspiration, regularization-based methods penalize changes to important weights, akin to synaptic tagging [107].

In biological metaplasticity, synaptic changes are shaped by dynamic factors like neuromodulator activity patterns such as dopamine bursts aligning with theta brain waves, and behavioral states such as sleep, which can reset plasticity through specific processes [108]. In contrast, artificial methods like elastic weight consolidation (EWC) simplify this complexity by stabilizing model weights using basic importance measures [109]. Unlike these methods, biological synapses adapt flexibly, incorporating multiple influences like sensory inputs, energy constraints, and genetic regulation to fine-tune plasticity.
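The way a regularization-based method such as EWC stabilizes weights can be sketched with its usual quadratic penalty, (λ/2) Σ_i F_i (θ_i − θ*_i)², added to the loss of the new task. The sketch below assumes this standard form; the per-weight importance values `importance` (typically a diagonal Fisher information estimate) are taken as given, and the function names are illustrative.

```python
from typing import List, Sequence


def ewc_penalty(theta: Sequence[float],
                theta_star: Sequence[float],
                importance: Sequence[float],
                lam: float) -> float:
    """Quadratic penalty (lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2.

    theta:      current weights while learning the new task
    theta_star: weights saved after the previous task
    importance: per-weight importance F_i (assumed precomputed)
    lam:        strength of the stability constraint
    """
    if not (len(theta) == len(theta_star) == len(importance)):
        raise ValueError("all vectors must have the same length")
    return 0.5 * lam * sum(
        f * (t - s) ** 2 for t, s, f in zip(theta, theta_star, importance)
    )


def ewc_penalty_grad(theta: Sequence[float],
                     theta_star: Sequence[float],
                     importance: Sequence[float],
                     lam: float) -> List[float]:
    """Gradient of the penalty: important weights are pulled back harder."""
    if not (len(theta) == len(theta_star) == len(importance)):
        raise ValueError("all vectors must have the same length")
    return [lam * f * (t - s) for t, s, f in zip(theta, theta_star, importance)]


def total_loss(new_task_loss: float,
               theta: Sequence[float],
               theta_star: Sequence[float],
               importance: Sequence[float],
               lam: float) -> float:
    """New-task loss plus the penalty that protects old-task weights."""
    return new_task_loss + ewc_penalty(theta, theta_star, importance, lam)


if __name__ == "__main__":
    theta = [1.0, 2.0]
    theta_star = [0.0, 2.5]
    importance = [2.0, 4.0]
    print(ewc_penalty(theta, theta_star, importance, lam=1.0))
    print(ewc_penalty_grad(theta, theta_star, importance, lam=1.0))
```

A single scalar importance per weight is exactly the simplification the paragraph above contrasts with biological synapses, where plasticity depends on many interacting signals.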

3.5. Multisensor Input Learning

Biological organisms take input from multiple senses, such as vision, touch, and hearing. These input signals may be noisy, distributed, nonlinear, and out of sync [110]. The superior colliculus integrates the signals from multiple sensors into a final input that enables coordinated motor functions across different organisms [110]. Initiating movements through a common motor map offers efficiency in handling these signals and responses [111]. Knowledge of how different organisms handle, filter, and process multisensory input points the way toward task-agnostic methods, a desired property for achieving CL capability in AI systems [111].

The OB is the cortical area that receives sensory signals from the other parts of the brain and the nose. Neurons that connect directly to brain regions involved in memory, context, and emotion are mainly driven by internal states, behavioral expectations, and the context of learned odors [112]. Low-level sensitized tissues and response at the subcortical level facilitate the interface with the environment. The high-level cortical brain executes only necessary plans and selects and tunes them. This method offers better resource utilization and better learning [113]. These inputs facilitate the fast learning of tasks, associating rewards with stimuli, coordinating appropriate motor actions, and helping modulate responses [114]. As an implicit analogy, task-agnostic ML methods [86] resemble biological multisensory processing but often lack modularity seen in cortical hierarchies.

3.6. Context-Dependent Learning

Studies suggest that context plays a pivotal role in how the biological brain understands and processes information, makes inferences, and forms strategies [64]. Context information plays an important role in the formation of responses in the olfactory system. Similarly, neurons related to emotion, memory, and context also take context information, including state and behavior, into account in decision-making. This additional context information offers a complete picture and flexibility in learning, decisions, and actions. Context modulation and gating are also involved in selective attention [115]. For example, gain modulation helps to encode target trajectories in insect vision. Studies suggest that some biases are present from birth. These inductive biases play a role in complete learning from the environment [116]. In the brain, context-dependent learning relies on three mechanisms: neuromodulatory states, in which arousal controlled by norepinephrine dynamically adjusts the importance of sensory inputs; cross-regional oscillatory coupling, in which theta–gamma brainwave interactions link related information; and developmental metaplasticity, in which the brain fine-tunes its sensitivity to context through genetic and molecular changes [113]. In comparison, ML methods such as task-specific gating handle context changes in a fixed way and lack the brain’s ability to continuously adapt and scale context sensitivity. Context-dependent learning is achieved in ML through meta-learning and task-specific gating [117]. Several studies use network pruning to control the growth of dynamic networks [118].
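Task-specific gating can be illustrated with a small sketch in which each task activates only a sparse, fixed subset of hidden units. Fixed random binary masks are an assumption made here for simplicity; published methods differ in how gates are chosen or learned, and the names `make_task_masks` and `gate` are hypothetical.

```python
import random
from typing import Dict, List, Sequence


def make_task_masks(num_tasks: int,
                    num_units: int,
                    keep_fraction: float,
                    seed: int = 0) -> Dict[int, List[int]]:
    """Assign each task a fixed random binary mask over the hidden units."""
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    rng = random.Random(seed)
    num_kept = max(1, round(keep_fraction * num_units))
    masks: Dict[int, List[int]] = {}
    for task_id in range(num_tasks):
        kept = set(rng.sample(range(num_units), num_kept))
        masks[task_id] = [1 if unit in kept else 0 for unit in range(num_units)]
    return masks


def gate(hidden: Sequence[float], mask: Sequence[int]) -> List[float]:
    """Zero out the hidden units that are not assigned to the current task."""
    if len(hidden) != len(mask):
        raise ValueError("hidden and mask must have the same length")
    return [h * m for h, m in zip(hidden, mask)]


if __name__ == "__main__":
    masks = make_task_masks(num_tasks=3, num_units=10, keep_fraction=0.2)
    hidden = [0.5] * 10
    for task_id, mask in masks.items():
        print(task_id, gate(hidden, mask))
```

Because the masks never change after they are drawn, this sketch shows the "fixed" handling of context described above: the gate selects a context but does not adapt its sensitivity over time.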

4. CL Methods

CL-based methods can be categorized into five types: regularization-based, architecture-based, replay-based, meta-learning-based, and hybrid methods, as shown in Figure 3. These methods are considered biologically inspired, with roots in brain functions.

4.1. Architecture-Based Methods

In this approach, the architectural properties and parameters of the model are updated in response to a new task to accommodate the new information, as shown in Figure 4(c). These models are inspired by modularity, a capability of biological learning systems that defines the functional specialization of brain subsystems. Although the working principles of neurogenesis and structural plasticity are still topics of active research, the available findings suggest that modular neural networks with plasticity offer an effective mechanism for knowledge retention and reuse, even in a dynamic environment. Architecture-based methods can be further divided into dynamic networks and static networks. In both cases, the existing model is preserved and reused while new learning capacity is added. These approaches are regarded as a knowledge retention and reuse mechanism that cannot be obtained with a model of fixed capacity, especially in the case of multiple dynamic tasks. Table 1 lists recent CL methods that adopt an architecture-based approach to learning. It also highlights the type of network and the neural network implementation methodology.

4.1.1. Dynamic Network

A dynamic network may grow for every new task by adding new branches [42], submodels or blocks [91], new layers, or new nodes [73]. Some studies suggest that nodes, layers, or parameters may also be pruned in this learning method [138, 139]. Because the newly added parameters, blocks, and branches do not interfere with each other, learning new tasks does not disturb performance on past tasks [96]. Various schemes have been proposed for dynamically growing networks. Ref. [57] presented a shared backbone model with a prediction head for each new task, with all heads trained simultaneously. Overall performance is calculated by a weighted sum of the results of all heads. Ref. [21] presented a network that grows with each new feature space and devised a procedure to compact the enlarged network through a self-activator mechanism. Ref. [58] added a new decoder to the main network for every new class with mixed label uncertainty, and computed the results from the average probability of perturbed samples. Ref. [59] devised a network composed of two branches, one dedicated to the learned knowledge of old tasks and the other to learning new tasks, with a feature fusion block synchronizing the two. Such methods rely on a task-specific oracle and generally follow a multihead configuration [80]. In dynamic networks, hyperparameters are usually assigned manually. However, a few studies have used neural architecture search (NAS) to determine the number of cells and blocks in the network [122]. NAS is also used to select parameters already learned on old tasks for reuse [140].
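The multihead configuration described above can be sketched as a shared backbone with one linear head per task. This is a simplified illustration rather than the architecture of any cited study: the backbone is an arbitrary feature function, heads are plain weight matrices, and the weighted sum is interpreted here as combining head outputs of equal size. The class and method names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]


class MultiHeadModel:
    """Shared feature extractor with one linear prediction head per task."""

    def __init__(self, backbone: Callable[[Sequence[float]], Vector]) -> None:
        self.backbone = backbone
        self.heads: Dict[str, List[Vector]] = {}  # task id -> weight rows

    def add_head(self, task_id: str, weights: Sequence[Sequence[float]]) -> None:
        """Grow the model for a new task; existing heads are left untouched."""
        if task_id in self.heads:
            raise ValueError(f"head for task {task_id!r} already exists")
        self.heads[task_id] = [list(row) for row in weights]

    def predict(self, x: Sequence[float], task_id: str) -> Vector:
        """Score an input with the head selected by the task identity."""
        features = self.backbone(x)
        return [
            sum(w * f for w, f in zip(row, features))
            for row in self.heads[task_id]
        ]

    def predict_weighted(self, x: Sequence[float],
                         head_weights: Dict[str, float]) -> Vector:
        """Weighted sum of head outputs (heads must share an output size)."""
        outputs = {task: self.predict(x, task) for task in head_weights}
        sizes = {len(out) for out in outputs.values()}
        if len(sizes) != 1:
            raise ValueError("need at least one head, all with equal output size")
        size = sizes.pop()
        return [
            sum(head_weights[task] * outputs[task][i] for task in outputs)
            for i in range(size)
        ]


if __name__ == "__main__":
    model = MultiHeadModel(backbone=lambda x: [x[0] + x[1], x[0] - x[1]])
    model.add_head("task_a", [[1.0, 0.0]])
    model.add_head("task_b", [[0.0, 2.0]])
    print(model.predict([2.0, 1.0], "task_a"))
    print(model.predict_weighted([2.0, 1.0], {"task_a": 0.5, "task_b": 0.5}))
```

Note that `predict` needs the task identity at inference time, which is the task oracle that multihead methods rely on; adding a head never modifies earlier heads, which is why old-task outputs are unaffected as long as the backbone is unchanged.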

In dynamic networks, a few studies utilized self-organizing incremental neural networks (SOINNs). Such networks are capable of mapping the probability distribution of the input data to the network structure through competitive learning in order to grow the network [132]. However, to prepare the network for new tasks, filters/layers are deleted according to user-defined fixed patterns, which results in abrupt alterations of the network structure [132]. In addition, the formation of the network structure is also a major challenge [141]. Recent studies try to address these issues by optimizing the network updating process for unsupervised learning [132, 141]. SOINN was extended to supervised learning by using virtual nodes that carry label information [142].

In dynamic network-based methods, the network may keep growing due to the addition of new parameters, blocks, or branches. In a growing network, a large number of old neurons dominate the network, which limits its learning capacity [140]. In recent years, several techniques have been adopted to control and compact the growing network. First, network pruning or quantization is adopted to control the network size and to preserve the learning [118]. The second technique is weight rectification, which adapts the model to new tasks while limiting the network size [143]. This technique proved effective even for zero-shot scenarios and data without task identification. The third technique prunes and compacts the model to learn multiple tasks based on the variational Bayesian approximation [144]. It utilizes sparsity-inducing priors to compact the hidden units. It can also be implemented through sparse storage and constructive algorithms. The fourth technique uses the power law to grow sparse networks through preferential attachment, topology driver method, and node self-activation [145]. As an additional benefit, these methods also alleviate negative FWT and overfitting by limiting the parameter transfer [133].

4.1.2. Static Network

In static network–based methods, a fixed block is generally allocated for each task, with the option to mask out blocks dedicated to old tasks while learning new tasks [138]. To achieve this goal, some studies devised gating mechanisms. Along this line, [146] used a gate that governs the selection of filters for training on new tasks. Ref. [147] used a gate that calculates the dependence between previously stored knowledge and new knowledge.
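As a rough illustration of the block allocation and masking idea above, the following sketch keeps a fixed-size weight vector, assigns a disjoint block of units to each task, and masks gradients so that only the current task's block is updated. The class `MaskedLayer`, the fixed `units_per_task` allocation, and the plain SGD update are assumptions made for this example, not the mechanism of any cited study.

```python
class MaskedLayer:
    """Fixed-capacity layer in which every task owns a disjoint block of units."""

    def __init__(self, n_units, units_per_task):
        self.weights = [0.0] * n_units
        self.units_per_task = units_per_task
        self.task_units = {}  # task_id -> indices of the units owned by the task
        self.next_free = 0

    def allocate(self, task_id):
        """Reserve the next free block; fail once the fixed capacity is used up."""
        end = self.next_free + self.units_per_task
        if end > len(self.weights):
            raise RuntimeError("network capacity exhausted")
        self.task_units[task_id] = list(range(self.next_free, end))
        self.next_free = end

    def mask(self, task_id):
        """Binary mask: 1.0 for units owned by task_id, 0.0 for all other units."""
        owned = set(self.task_units[task_id])
        return [1.0 if i in owned else 0.0 for i in range(len(self.weights))]

    def sgd_step(self, task_id, grads, lr=0.1):
        """Masked update: units that belong to other tasks stay frozen."""
        m = self.mask(task_id)
        self.weights = [w - lr * g * mi
                        for w, g, mi in zip(self.weights, grads, m)]
```

Because the capacity is fixed, `allocate` eventually raises an error; this mirrors the capacity limit that static networks face as the number of tasks grows.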

The main challenge in static networks is balancing learning between old and new tasks [140]. Various schemes have been proposed to retain knowledge by modifying the parameters of static networks. Ref. [148] combined spectral attention semantics and coefficient locations with pruning to store the knowledge. Ref. [135] utilized out-of-distribution detection and task masking for the learning of parameters to alleviate catastrophic forgetting. Ref. [136] presented a semantic-aware convolution filter and a normalization technique to train the fixed parameters of the network, whereas [137] proposed terms for learning the fixed parameters at both the network and layer levels, such that a residual propagation method for sparse convolution is defined at the layer level while an uncertainty variable is defined at the network level.

4.1.3. Limitations of Architecture-Based Methods

Although architecture-based methods can retain learned knowledge, they may reach the limits of learning capacity and size. Therefore, these methods require an appropriate mechanism to manage the increasing number of parameters [149] and a mechanism to decide which parameters should be used at test time [124]. In addition, most architecture-based methods need a task-controlling mechanism at test time [149]. It is also observed that as the number of tasks increases, the performance on old tasks gradually decreases [150]. Similarly, the dissimilarity between the data of old and new tasks may also affect performance, as the model may suffer from gradient conflict problems [59]. Moreover, it may take a large amount of time to learn distributions to form the tree structure in the case of large datasets [151].

4.2. Regularization-Based Methods

Different studies suggest that synaptic plasticity governs the synchronization between previously learned and new knowledge [13]. Regularization-based methods are inspired by this mechanism of the brain [13]. This methodology differs from architecture-based and replay-based methods in that it does not generally require growing the memory or the network; rather, it adds a regularization factor to the loss term during the training of new tasks [12], as shown in Figure 4(a).
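One common way to realize such a regularization factor is a quadratic penalty that pulls each parameter toward the value it had after the old tasks, weighted by a per-parameter importance. The sketch below assumes this generic form; the function names, the `importance` weights, and the strength `lam` are illustrative assumptions rather than the formulation of a specific cited study.

```python
def regularized_loss(task_loss, params, old_params, importance, lam):
    """New-task loss plus a quadratic penalty for drifting from old parameters.

    loss = task_loss + (lam / 2) * sum_i importance[i] * (params[i] - old_params[i]) ** 2
    """
    penalty = sum(w * (p - q) ** 2
                  for w, p, q in zip(importance, params, old_params))
    return task_loss + 0.5 * lam * penalty


def regularized_grad(task_grads, params, old_params, importance, lam):
    """Gradient of regularized_loss with respect to each parameter."""
    return [g + lam * w * (p - q)
            for g, w, p, q in zip(task_grads, importance, params, old_params)]
```

For example, with `params=[1.0, 2.0]`, `old_params=[0.0, 2.0]`, `importance=[2.0, 5.0]`, and `lam=0.5`, only the first parameter has moved, so the penalty is `0.5 * 0.5 * 2.0 * 1.0 = 0.5`. A parameter with high importance is therefore expensive to change, which protects old tasks but also restricts how far the model can move for the new task.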

Table 3 lists the recent CL methods that adopt regularization-based learning. It also highlights the type of regularization term and the neural network implementation methodology.

Table 3. Continual learning studies based on the regularization method.

Regularization	Study	Implementation method	Application
Prior focused	[152]	Incremental hashing loss function. CNN	Image retrieval
	[153]	RL. Policy relaxation for new tasks. Weighting for instances of episodic replay.	RL
	[154]	Regularization at the functional space.	Classification
	[155]	Random theory and Bayes’ rule. Randomized neural networks. A single pass-over data.	Classification
	[20]	Class-correlation loss function. Low computation for training	Classification

Data focused	[156]	Regularization of the feature space. For generative models	Classification
	[157]	Intertask synaptic mapping (ISYANA) to consider the shared feature space between the tasks. Parameter importance matrix	Classification
	[158]	The regularizing term is modified based on the difference in probability distribution at the target layer based on the Cramer–Wold distance	Classification
	[80]	Remember and adjust model behavior. Two loss functions.	Classification
	[159]	Gradient descent for online learning. Dynamic gradient descent settings	NLP
	[160]	Feature map weighting. Restricts modification of critical features	Classification

Param focused	[161]	Adaptive uncertainty-based regularization. Three uncertainty factors used to regularize weights. A task-specific residual adaptation block for the network	NLP
	[162]	Federated learning class-aware gradient loss and a semantically related loss. Protect privacy by gradient-based communication.	Classification
	[163]	Parameter significance-based weight update	Image deraining
	[164]	For embedding networks. The computed semantic drift of features.	Classification
	[165]	Quadratic gradient based on Hessian approximation. Regularize parameters of BN for continual learning.	Classification

Note: The first column shows the gradient type. Implementation methodology is shown in the third column.

4.2.1. Prior Focused

Prior-focused methods regularize the gradient updates for new tasks, with an emphasis on mitigating catastrophic forgetting of previously learned knowledge. In these methods, the model trained on old tasks becomes a prior for learning the new task through the regularization factor. The general strategy is to regularize the weights of the model to keep them relevant to old tasks [156].

Some studies calculate the importance of the model parameters through distributions and update the task-specific parameters according to this importance and a gating mechanism [13]. However, updating parameters through controlled weighted penalties may not succeed in optimizing the loss function when the number of tasks is large [96]. Various schemes have been proposed for updating the weights of the network. Ref. [166] proposed training the network in the null space of past problems; however, a strong null space projection may affect learning on the current problems. Ref. [167] utilized mode connectivity in the loss term to train the network for multiple tasks such that null space projection is used for past problems and SGD for current problems; however, it needs to store past samples, similar to rehearsal-based methods. Ref. [168] presented the idea of gradient decomposition into a shared gradient and task-specific gradients. Task-specific gradients are inclined toward new tasks while remaining consistent with the shared gradient. The same study [168] further presented the idea of separate gradients for each layer to avoid the dissimilarity and dominance of gradients across different layers. It is further observed that convergence of the shared gradient is good for retaining the knowledge of old tasks; however, the convergence of a task-specific gradient may disturb the learning of other tasks [168]. Similar methods such as dropout and training regimes also emphasize the modification of the loss function through a weight selection mechanism [13]. These methods are considered suitable for multitask learning problems [13].
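The null space idea mentioned above can be approximated, in a much simplified form, by removing from the new-task gradient every component that lies along directions considered important for past tasks. The sketch below assumes those protected directions are given as plain vectors and uses Gram-Schmidt orthonormalization; the helper names are hypothetical, and the procedure is a stand-in for illustration, not the algorithm of Ref. [166] or [167].

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def orthonormal_basis(vectors, eps=1e-12):
    """Gram-Schmidt: orthonormal basis of the span of the given vectors."""
    basis = []
    for v in vectors:
        r = list(v)
        for b in basis:
            c = dot(r, b)
            r = [ri - c * bi for ri, bi in zip(r, b)]
        norm = dot(r, r) ** 0.5
        if norm > eps:  # skip directions already covered by the basis
            basis.append([ri / norm for ri in r])
    return basis


def project_out(grad, protected_dirs):
    """Remove from grad every component inside the span of protected_dirs."""
    g = list(grad)
    for b in orthonormal_basis(protected_dirs):
        c = dot(g, b)
        g = [gi - c * bi for gi, bi in zip(g, b)]
    return g
```

With `grad=[1, 1, 1]` and protected directions `[1, 0, 0]` and `[1, 1, 0]`, only the third coordinate survives. If the protected directions span the whole space, the projected gradient is zero and nothing can be learned, which illustrates why a strong projection may hamper learning on the current problem.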

4.2.2. Data Focused

To solve the issue of coherence of embeddings between old tasks and new tasks [169], a few studies utilize knowledge distillation to retain learning [96]. Knowledge distillation offers a compression method to transfer knowledge from one model to another [170]. This is done by taking a copy of the shared weights and layers of the model from the previous problems. In addition, a supplementary loss function is used for knowledge distillation. In the data-focused paradigm, various schemes have been proposed for updating the weights of the network. For example, [171] presented weight aligning to regularize the biased weights of the fully connected (FC) layer. The FC layer is selected based on the assumption that its weights are very sensitive and biased in CIL, particularly in the case of class-imbalanced data. Generally, CL methods use standard gradient descent with a fixed number of steps per iteration. Such settings are important for cross-task learning. However, keeping these steps predefined may undermine learning and generalization across multiple tasks/problems. Recent studies suggest making the gradient descent settings dynamic at run time, called continual gradient descent [159].
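A supplementary distillation loss of the kind mentioned above is often written as a divergence between the softened outputs of the old model (teacher) and the current model (student). The sketch below assumes a temperature-scaled softmax and a Kullback-Leibler divergence; the temperature value and function names are illustrative assumptions, not taken from the cited studies.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)  # frozen copy of the old model
    q = softmax(student_logits, temperature)  # model being trained on the new task
    return sum(pi * math.log(pi / max(qi, 1e-12))
               for pi, qi in zip(p, q) if pi > 0.0)
```

The loss is zero when the student reproduces the teacher's outputs on the old classes and grows as the two diverge. In practice it would be added to the new-task loss with a weighting coefficient, which is a further hyperparameter not shown here.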

4.2.3. Param Focused

Generally, three types of approaches are adopted for incorporating regularization terms in the network. The first introduces the regularization term in the parameter space; most studies fall into this category. The second introduces the regularization term in the functional space [154] by regulating the model’s predictions to stay aligned with the old tasks through task-specific functions. The third applies regularizing terms in the shared feature space [156].

4.2.4. Limitations of Regularization-Based Methods

Regularization-based methods offer little or no forgetting; however, such methods are less able to learn new tasks because the learned weights are inclined toward old tasks [128]. In such scenarios, the model may suffer from a gradient conflict problem, where the gradients of multiple task objectives are not aligned, so adopting the average gradient can be disadvantageous [59]. Second, such methods are likely to fail if the model is presented with many tasks or if the sequence of presented tasks is very diverse, because they depend on the correct estimation of the loss function. Third, such methods do not consider intertask correlation to exploit shared features. In such cases, even the parameter importance matrix (if used) may explode [157]. In this regard, various recent studies formulate the loss function as the cross-correlation among different tasks and classes [20]. Fourth, in few-shot learning scenarios, regularization-based methods may fail because the few input samples available for the new task require a high learning rate [129]. Fifth, these methods may not perform well for lifetime learning augmentation. Sixth, a few empirical studies suggest that regularization-based methods offer comparatively low accuracy compared with other methods and are a poor proxy for preserving the network output [194].
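The gradient conflict problem mentioned at the start of this subsection can be made concrete with a small sketch. It assumes that conflict is measured by the cosine similarity between two task gradients, which is one common convention and not necessarily the measure used in Ref. [59]; the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two gradient vectors (0.0 if either is zero)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def average_gradient(grads):
    """Element-wise mean of a list of gradient vectors."""
    n = len(grads)
    return [sum(col) / n for col in zip(*grads)]


def is_conflicting(a, b):
    """Two task gradients conflict when they point in opposing directions."""
    return cosine_similarity(a, b) < 0.0
```

For two exactly opposed gradients such as `[1.0, 0.0]` and `[-1.0, 0.0]`, the cosine similarity is -1 and the averaged gradient is the zero vector, so a step along the average improves neither task.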

4.3. Replay-Based Methods

Memory is a core subsystem of the human brain that retains learning over a lifetime. This retention of learning offers the baseline for knowledge reuse and augmentation. It inspires memory-based mechanisms for knowledge retention and reuse in ML-based methods to achieve CL [195–198], as shown in Figure 4(b).

Table 4 lists the recent CL methods that adopt replay-based learning. It also highlights the replay type and the neural network implementation methodology.

Table 4. Continual learning studies based on the replay method.

Replay type	Study	Implementation method	Application
Rehearsal	[172]	Memory augmented. Sparse knowledge replay. GRU	Trajectory prediction
	[173]	Cortex, hippocampus mapping, semantic building for memory management	Anomaly detection
	[174]	Instance/class-level correlations in knowledge replay	Classification
	[175]	Dual memory for samplers and statistics of past classes	Classification
	[176]	Contrastive learning for feature extraction. Self-distillation to preserve knowledge	Classification
	[177]	Reservoir and class balance methods were utilized. 3D geometry data are saved for future tasks	3D localization
	[178]	Continual learning for noisy data. Self-replay to tackle forgetting. Self-centered filter to clean data	Classification
	[179]	Causal inference to identify and remove causal effects. Incremental momentum exclusion method	Classification
	[180]	Intertask attention strategy to enhance knowledge. Dual-classifier structure	Classification

Pseudo rehearsal	[181]	Dynamically growing dual memory-based network with memory replay	3D object recognition
	[182]	Task-related knowledge dictionary, complementary-aware latent space based on nonlinear relation	Sentiment analysis
	[183]	Two-level parameterized input samples	Classification
	[184]	Knowledge extraction through prototype matching. Feature sparsification to optimize the memory. Contrastive learning	Segmentation
	[185]	Distinguished feature embedding for objects. Few-shot object detection and instance segmentation.	Segmentation and detection.
	[186]	Cross-domain knowledge transfer. Two graph structures to preserve learned knowledge	Person identification
	[187]	Multilevel pooling for knowledge and feature relation extraction and storage. Entropy-based pseudolabeling.	Segmentation
	[188]	Word embedding for semantic knowledge. Mechanism for visual-semantic information arrangement	Classification
	[189]	Dreaming memory to protect privacy. Knowledge distillation, contrastive learning	Person identification

Generative	[190]	Generative memory. Memory-based data distribution, temporary, and long-term	Classification
	[191]	Dynamic generative memory knowledge retention through parameter-level attention mechanism	Classification
	[192]	Clustering of features. Synthesized, saved features combined for new tasks.	Classification
	[193]	Semisupervised learning classifier. Conditional data generation, pseudolabel estimation for enhanced learning.	Classification

Note: The first column shows the replay type. Implementation methodology is shown in the third column.

4.3.1. Rehearsal

Rehearsal or replay-based techniques require storing a subset of the inputs from previously learned tasks [196]. These stored samples are replayed to recall the learning when solving future problems []. A similar approach is the episodic memory method, which uses the previously stored samples not only for training but also for making inferences [].

Due to data diversity and storage limitations, selecting an input subset effectively and efficiently for replay is a challenging problem [199]. Various schemes have been proposed to select, preserve, and retrieve the samples effectively and efficiently. Ref. [200] proposed a scheme to save supplementary low-fidelity samples; Ref. [201] selected samples based on importance and class affiliation; Ref. [202] utilized a weighted k-nearest neighbor rule to select and store memory cells; Ref. [199] utilized a combination of multiple functions, such as the variance within a class and the loss function, to select appropriate samples; Ref. [203] proposed a replay strategy for task-free boundaries; Ref. [204] proposed a procedure to update the memory efficiently and applied knowledge distillation to cover CIL; Ref. [205] used linear mode connections of the network to store the knowledge; Ref. [22] tried to mimic declarative memory in the form of an instance memory, which saves the samples, and a task memory, which saves the semantic information; and Ref. [88] proposed a methodology to preserve the inter-instance spatial relations by using knowledge-invariant and spread-out properties.
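A simple baseline for the sample selection problem described above is reservoir sampling, which keeps a fixed-size buffer in which every item of the stream seen so far has the same probability of being retained. The sketch below is this generic baseline only; the class name `ReservoirBuffer` is hypothetical, and the importance-based or class-aware selection rules of the cited studies are not reproduced.

```python
import random


class ReservoirBuffer:
    """Fixed-capacity replay memory filled by reservoir sampling."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0  # number of stream items offered so far
        self.rng = random.Random(seed)

    def add(self, item):
        """Offer one stream item; it is kept with probability capacity / seen."""
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            j = self.rng.randrange(self.seen)  # uniform index in [0, seen)
            if j < self.capacity:
                self.items[j] = item  # overwrite a stored sample

    def sample(self, k):
        """Draw up to k stored samples (without replacement) for replay."""
        return self.rng.sample(self.items, min(k, len(self.items)))
```

A training loop would typically call `add` for every incoming example and mix the output of `sample` into each mini-batch of the new task. The buffer never exceeds `capacity`, regardless of how long the stream is.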

4.3.2. Pseudo-Rehearsal

The pseudo-rehearsal technique was proposed to solve the scaling issues of sample selection in rehearsal-based methods [206]. Random inputs drawn from distributions that are assumed to approximate the samples of previous tasks are used as input to the model [207]. This technique has proven effective for small networks. However, for large models and huge datasets, such random inputs may not represent the complete input feature space.

Various schemes have been proposed to convert the input space to distributions and vice versa. Ref. [17] utilized a code fragment–based knowledge extraction and transfer scheme in LCS. Ref. [208] utilized a Bayesian statistical representation of the latent knowledge space for clustering. Ref. [181] presented a dynamically growing dual memory-based network with the concept of memory replay for neural activation. Ref. [209] saved the mean and standard deviation of the extracted features and then regenerated the features from a Gaussian distribution for replay. Ref. [210] saved pixel affiliation information using self-attention maps obtained from the last layer of a multihead transformer encoder and utilized class-specific region pooling. Ref. [83] randomly selected episodes to generalize the features and used a mechanism to generate self-promoted prototypes. Ref. [211] presented the idea of preparing and reserving an embedding space to represent the input classes.
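The statistics-based replay attributed to Ref. [209] above can be sketched as follows. The example assumes that each feature dimension is modeled independently with its own mean and standard deviation, which is a simplification made here for illustration; the class `FeatureStats` and its methods are hypothetical names.

```python
import random
import statistics


class FeatureStats:
    """Stores per-class feature statistics instead of the raw samples."""

    def __init__(self):
        self.stats = {}  # label -> list of (mean, std) per feature dimension

    def store(self, label, features):
        """Summarize a list of feature vectors by per-dimension mean and std."""
        dims = list(zip(*features))
        self.stats[label] = [(statistics.fmean(d), statistics.pstdev(d))
                             for d in dims]

    def replay(self, label, n, rng):
        """Regenerate n pseudo-features for a class from Gaussian distributions."""
        return [[rng.gauss(mu, sd) for mu, sd in self.stats[label]]
                for _ in range(n)]
```

For instance, after `store("cat", [[1.0, 10.0], [3.0, 10.0]])`, the memory holds only two (mean, std) pairs, `(2.0, 1.0)` and `(10.0, 0.0)`, from which any number of pseudo-features can be drawn with `replay("cat", n, random.Random(0))`. Because dimensions are sampled independently, correlations between features are lost, which relates to the concern that such inputs may not represent the complete feature space.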

4.3.3. Generative

In generative learning methods, instead of storing input samples and feeding them to the model when learning future tasks, the input data are generated using different statistical distributions and generative ML models. These distributions are derived from the input samples of the previous tasks [190]. Various schemes have been proposed for generating the input samples from different distributions. Ref. [212] proposed schemes for knowledge extraction and transfer by combining pseudo-rehearsal and generative memory methods; Ref. [202] saved the input samples in the form of statistical distributions and then generated the replay samples with the help of these distributions; Ref. [213] proposed the idea of generative negative replay; and Ref. [82] utilized an adaptive feature generation mechanism by exploiting the diverse knowledge of noisy unlabeled data. However, such approaches have high computational complexity due to the long training time of the generative models, particularly for large datasets [213].

4.3.4. Limitations of Replay-Based Methods

In rehearsal-based methods, the data are presented sequentially, and the solution is prone to remain in the low-loss area for each task. This phenomenon causes overfitting which hampers generalizing capability of the model [173]. Second, as the number of tasks increases, the forgetting issue becomes more prominent due to the memory and computational constraints and inevitable overfitting of the model to a particular feature space [96]. Third, in the case of generative learning, the model may take a vast amount of time to learn appropriate distributions of large datasets [151]. Fourth, such methods generally suffer from inadequate generalization ability and are more fragile to noise [214].

Regarding pseudo-rehearsal, the approach may suffer from scaling and overfitting issues in the case of complex tasks [192]. Second, it requires an effective and efficient mechanism for encoding, retrieving, and reusing the learned knowledge for future tasks [169]. Third, if the encoding scheme is not updated with the arrival of new data, it may not learn the new tasks well [215]. Fourth, for each new task, the embedding space is updated to accommodate new labels; however, the model may become confused when learning new labels [216].

4.4. Meta-Learning–Based Methods

The above-mentioned methods may suffer from inductive biases introduced by the manual setting of parameters. Such biases may cause suboptimal performance due to differences between the objective and the expected output [117]. Meta-learning offers better solutions in complex and diverse environments where a complicated setting of parameters is required.

Meta-learning is learning to learn. It is inspired by the capability of the human brain to find new solutions after little learning and experience. By utilizing the learning-to-learn mechanism, this technique tries to learn dynamic settings in the way the brain does. Meta-learning operates at two levels, i.e., an outer loop and an inner loop. The outer loop controls the working of the inner loops in dynamic scenarios through meta-updates of meta-objectives. The inner loop manages the task-specific fine-tuning for each specific task [217]. Generally, meta-learning methods are implemented through support sets and query sets. The support set is utilized for fast learning, and the query set is utilized to learn the adaptation [218].
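The two-level structure described above can be sketched on a toy problem. The example assumes one-parameter linear regression tasks (y = a * x with a task-specific slope a) and a first-order approximation of the meta-gradient, in which the query-set gradient at the adapted parameter is applied directly to the meta-parameter. Both choices, along with all function names and learning rates, are assumptions made for illustration and do not correspond to a specific cited method.

```python
def mse(w, data):
    """Mean squared error of the model y = w * x on (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def mse_grad(w, data):
    """Derivative of mse with respect to w."""
    return sum(2.0 * x * (w * x - y) for x, y in data) / len(data)


def inner_adapt(w, support, inner_lr=0.05, steps=1):
    """Inner loop: fast task-specific fine-tuning on the support set."""
    for _ in range(steps):
        w = w - inner_lr * mse_grad(w, support)
    return w


def meta_train(tasks, w0=0.0, inner_lr=0.05, outer_lr=0.05, epochs=200):
    """Outer loop: update the shared initialization using query-set gradients."""
    w = w0
    for _ in range(epochs):
        meta_grad = 0.0
        for support, query in tasks:
            w_task = inner_adapt(w, support, inner_lr)
            # First-order approximation: ignore the dependence of w_task on w.
            meta_grad += mse_grad(w_task, query)
        w = w - outer_lr * meta_grad / len(tasks)
    return w


def make_task(slope):
    """Hypothetical task generator: returns a (support set, query set) pair."""
    support = [(x, slope * x) for x in (1.0, 2.0)]
    query = [(x, slope * x) for x in (1.5, 2.5)]
    return support, query
```

With two tasks of slopes 1.0 and 3.0, the learned initialization settles between the two slopes, from which a single inner step on a task's support set reduces that task's query loss. This is the sense in which the outer loop learns a starting point that the inner loop can adapt quickly.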

Meta-learning methods can be categorized as model-based, optimization-based, and parameter-based methods [219]. Reference [23] presented a meta-learning methodology to tackle deterministic problems by CL using neural networks. The method utilized a coreset as memory, and the degree of relationship between global and individual tasks is exploited to tackle forgetting. Ref. [219] implemented meta-learning by combining preconditioning matrices to update the gradient, sharing the learned knowledge across multiple tasks to tackle forgetting. Ref. [220] devised an approach in which an attention mechanism empowers the meta-learning algorithm to highlight and acquire nontrivial important features; the meta-learning algorithm then retains knowledge learned from past tasks and reuses it for future tasks. Ref. [117] proposed a two-level regularization based on meta-learning to regulate the network for knowledge retention and utilization across multiple tasks; an optimization methodology is devised for each task. Ref. [221] exploited meta-learning to achieve CL through a vision transformer network, i.e., attention to self-attention. A new mask is trained for each new task and added to the main vision transformer model to achieve learning across multiple tasks. Ref. [222] utilized domain randomization and meta-learning to achieve CL. A meta-learning-based mechanism is used to regularize the loss function related to the learning of multiple domains. The concept of meta-domains/auxiliary domains, which are generated by randomized image manipulations, is also utilized. Inspired by the slow and fast memory concepts of the human brain, Ref. [218] devised a mechanism of fast and slow weight updates, in which weight updating, feature extraction, and semantic learning are each defined in fast and slow forms.

4.4.1. Limitations of Meta-Learning–Based Methods

Though the prospect of meta-learning–based solutions is exciting, such methods have a few limitations [223]. First, these methods are computationally expensive. Second, such methods require a controlled, well-defined selection of tasks, which may not be available in real-world scenarios. Third, it is observed that meta-learning-based methods may undergo task overfitting in multitask problems. This weakness limits their generalization capability when deployed in dynamic real-world scenarios with multiple tasks [220]. Fourth, in the case of incremental learning, the model needs to balance stability and plasticity. Currently, most meta-learning-based methods lack this capability. Fifth, when task-specific feedback mechanisms are used to update the parameters, the meta-learning technique may not perform well for complex datasets [224].

4.5. Hybrid Methods

A few studies suggest that the above-mentioned approaches may fail even for controlled and well-arranged datasets when labels or task IDs are unavailable at test time [135]. In view of these limitations, some studies proposed hybrid solutions that combine the above-mentioned methods [24, 49, 51]. Recent studies show that hybrid methods offer state-of-the-art solutions to avoid catastrophic forgetting, particularly in CIL scenarios [25]. Such methods may be categorized into the following types.

4.5.1. Replay + Architecture

In these studies, catastrophic forgetting is tackled by combining the replay- and architecture-based approaches such that a memory is devised for storing samples or embeddings, whereas the network is modified for learning new knowledge and maintaining consistency across different tasks. However, in such cases, because the semantic information of the weights is incomplete, it is difficult to store the knowledge continuously. Hence, a few studies suggest separating the memory module and the task module for better performance [225]. Reference [146] utilized episodic or generative memories to replay the saved samples. In addition, a static network mechanism is adopted. For training, a gating mechanism governs the selection of the filters for each new task. Ref. [226] proposed a framework consisting of a network, a memory matrix to store the features, and a recall network with a conditional generative adversarial structure to retrieve old concept memories. Based on probability distribution alignment, Ref. [227] managed the replay buffer with correct offset probabilities of buffers. In addition, parameter consistency during the learning of multiple tasks is maintained by updating the loss function for a subset of weights. Ref. [228] presented the idea of prompts, which are small learnable parameters. A prompt pool is used for CL across multiple tasks, and the learned knowledge is kept in memory in the form of this prompt pool.

4.5.2. Replay + Regularization

Instead of simply selecting previous samples from the retained ones, a few studies [229–231] devised a regularization-based strategy for sample selection. Ref. [230] selected the samples whose gradients are most interfered with by the newly arrived samples. Ref. [12] proposed the idea of prototyping actual data for efficient management of learned knowledge and memory. Because the model suffers from low accuracy and weak knowledge transfer across different classifiers in the case of CIL, Ref. [232] presented a method based on a variational autoencoder network for classification and sample generation. In addition, a contrastive loss function is introduced for better knowledge learning and sharing. Ref. [233] proposed a hashing-based network with a gradient-aware memory system. The hashing is achieved in two steps. In the first step, the network learns to hash the semantics, and in the second step, the learned semantics are converted to particular codes.

To address the limitations of overfitting and overlapping of the embedding/label space, Ref. [216] presented a geometric structure that retains the selected samples together with their geometric relations in the embedding space, combined with a pretrained model trained with episodic samples. In addition, a contrastive loss function is introduced for training to enhance CL. Ref. [234] utilized a fully spiking network trained by biologically plausible local rules. In addition, a regularization term based on synaptic noise and Langevin dynamics is introduced for knowledge extraction, transfer, and utilization.

In recent studies, various researchers utilized coresets in conjunction with memory-based methods for the selection and embedding of samples [235]. Coresets are generally used for clustering and for unsupervised and supervised learning [229]. Reference [236] utilized them for sample selection by maximizing sample gradient diversity through a combination of mini-batch gradient similarity and cross-batch diversity [237]. Reference [240] devised a bilevel optimization mechanism, and [229] proposed a weighted coreset mechanism to select samples from saved knowledge. However, the above approaches define mechanisms that are very specific to their methodologies. Such approaches cannot be generalized to a broad range of CL methods. In addition, they do not cover other memory replay methods such as pseudo-rehearsal. Moreover, these approaches are computationally expensive.

4.5.3. Architecture + Regularization

To accommodate CL in temporal problems, Ref. [241] utilized attentive recurrent neural networks and two new loss functions for knowledge retention and utilization across multiple problems. Various CL methods suffer from knowledge interference across different tasks. To address this issue, Ref. [242] proposed that sparse neurons be reserved for learning current and past tasks, whereas the majority of parameters be reserved for learning future tasks. To achieve this, variational Bayesian sparsity priors are utilized as an activation function. In addition, loss mitigation-based input sampling and replay mechanisms are defined. The proposed method is capable of learning new tasks without explicit boundaries. To protect the privacy of the data, Ref. [243] presented the idea of using sketches of the data instead of replaying the actual data. This is achieved by utilizing a gradient-based consensus mechanism for mapping the actual data to sketches. However, the applicability of the study is very limited, and it comes at a heavy computational cost.

4.5.4. Meta-Learning + Others

CL models may struggle to adapt continuously to unlabeled data in a dynamic environment. To address this limitation, Ref. [243] presented a meta-based optimization and data replay scheme to update the network parameters. Ref. [217] proposed a methodology based on dynamic prototype-guided replay of samples for CL in NLP. In the proposed methodology, an online meta-learning method is adopted that uses a prototyping mechanism for the selection and representation of effective samples.

4.5.5. Miscellaneous

The thalamus in the brain serves as a filter and relay station to manage sensory input to the cortex. It enhances relevant signals and suppresses background noise. In addition, the thalamus module acts as a learned noise filter such that it reduces overfitting [244]. In the neural network, the control of flow of information between layers mimics the thalamus filters. In addition, amplification of key information and suppressing irrelevant ones mimics the brain’s ability to focus on important information while ignoring distractions. Additionally, the top-down feedback is similar to cortical–thalamic loops [245, 246].

5. Benchmark Datasets

In this section, we review existing benchmark datasets specially designed for CL. It is pertinent to mention that most of the studies discussed above use standard benchmark datasets that are not specially designed for CL problems. When standard datasets are used for CL, they are divided and arranged as per the requirements of CL. However, a few authors have described conditions that a dataset designed for CL should satisfy. For example, Ref. [247] states that a real CL benchmark dataset should offer several views. Ref. [248] calls for the repetition of classes in the dataset. Ref. [249] also emphasized the importance of and need for repetition of classes and instances, as these conditions resemble real-world scenarios. A few studies such as [12] emphasized the need for context in the dataset so that it resembles the real world. It is observed from the literature that most benchmark datasets are designed for classification tasks.

5.1. CORe50

Reference [247] proposed CORe50 as a benchmark dataset for object classification that offers different learning scenarios. It comprises 164,866 images of size 128 × 128. There are a total of fifty items arranged in ten classes. It covers two types of environments, indoor and outdoor: eight scenarios are indoor and three are outdoor. These scenarios mimic different controlled contexts. The videos and images cover different lighting, occlusion, pose, and background conditions, and they are taken in such a way that the items are presented to a robot. Overall, it offers context-based new classes as well as new instances scenarios.

5.2. iCubWorld28

This dataset comprises images of 28 items belonging to seven classes. It represents four different sessions. These sessions represent four different days (days 1–4), in which data are recorded under different environments and conditions. It consists of more than 12,000 images.

5.3. Toys200

It comprises 200 unique toy-like object samples, which are synthetically generated shapes of children’s toys. Each object has multiple views in the dataset.

5.4. Clear

Reference [250] presented a dataset of objects whose shape changed through evolution during the period of 2004–07. These items include cameras, computers, etc. The dataset is constructed from the YFCC100M dataset. Its main contribution is that it portrays the evolution of real-world objects over time.

5.5. OpenLORIS

This dataset offers a collection of images taken with an RGB-D camera under different environments. It is a candidate dataset for training and testing CL models in domain incremental scenarios. The environmental conditions include different occlusions, sizes, noise, and difficulty levels. It is a real-world dataset consisting of 186 instances, 63 classes, and 2,138,050 images [251].

5.6. CLAD-C

It is a benchmark dataset for continual object detection [252], constructed based on SODA10M. It comprises an online continuous data stream covering 3 days and nights, in which objects appear every 10 s. The dataset offers data for different domains, with the domain shift exhibited by changing frequencies, time, and weather. It also offers multiple views of the scenes and objects. To portray real-world scenarios faithfully, the backgrounds of subsequent objects overlap.

5.7. CLAD-D

It is also based on the SODA10M dataset [252]. It offers a benchmark for autonomous driving and is proposed for domain incremental learning scenarios. Different domains are portrayed through different highways, day and night time, weather, and locations within and outside the city.

5.8. vCLIMB

vCLIMB is a video dataset constructed for CL to assess forgetting in class incremental CL scenarios. It offers a sequence of disjoint tasks, with the classes evenly distributed over the tasks [131, 159].

5.9. Experimental Configurations Used for CL Methods

Various experimental settings are discussed in the CL literature. These settings mimic the real-world situations and natural settings that the system has to cope with. Each of these scenarios has its own advantages and challenges [255]. It is pertinent to mention that there is no agreed standard or guideline for the formulation of experimental configurations and the division of tasks/problems. In the following subsections, we discuss a few well-known experimental configurations reported in the CL literature.

5.9.1. New Instances Scenario

In this setting, new instances of the same classes become progressively available over time. All classes to be learned are known at the start of the experiments [257]. Let the original task T have the full dataset D, which consists of n records. T is decomposed into four subtasks T₁, T₂, T₃, and T₄ with datasets D₁, D₂, D₃, and D₄, respectively, where D₁ ⊂ D₂, D₂ ⊂ D₃, and D₃ ⊂ D₄. D₁ consists of n₁ records, D₂ consists of n₂ records, and so on. Similarly, these tasks consist of classes C1, C2, C3, and C4, respectively, such that C1 ⊆ C2, C2 ⊆ C3, and C3 ⊆ C4, and n₁, …, n₄ are randomly selected numbers such that n₁ < n₂ < n₃ < n₄ and n₄ = n. For any task, knowledge learned from all classes and instances of previous tasks is reused.
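The nested decomposition described above can be sketched as follows. This is an illustrative construction under stated assumptions, not a procedure taken from [257]: the function name, the toy records, and the round-robin interleaving (used so that even the smallest subset covers every class) are all hypothetical choices.

```python
import random
from collections import defaultdict
from itertools import zip_longest


def nested_instance_splits(records, sizes, seed=0):
    """Return nested subsets D1, D2, ... of (sample, label) records.

    sizes must be strictly increasing and end at len(records), so the
    last subset is the full dataset.
    """
    if list(sizes) != sorted(set(sizes)):
        raise ValueError("sizes must be strictly increasing")
    if sizes[-1] != len(records):
        raise ValueError("the last size must equal the full dataset size")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in records:
        by_class[label].append((sample, label))
    for items in by_class.values():
        rng.shuffle(items)
    # Interleave the classes so that short prefixes still cover every class.
    order = [rec for group in zip_longest(*by_class.values())
             for rec in group if rec is not None]
    # Prefixes of one fixed order are nested by construction.
    return [order[:n] for n in sizes]


# Toy data: 12 records spread over 3 classes.
records = [(f"img_{i}", i % 3) for i in range(12)]
d1, d2, d3, d4 = nested_instance_splits(records, sizes=[3, 6, 9, 12])
print([len(d) for d in (d1, d2, d3, d4)])
```

Because every subset is a prefix of the same ordering, each later task sees all earlier instances plus new ones, while the set of classes stays fixed.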

5.9.2. New Instances and New Classes Scenarios

In this scenario, new instances of both already known and new classes become progressively available in subsequent training or at testing time [257]. This scenario is considered more difficult than the CIL and new instances scenarios. Let the original task T have the full dataset D, which consists of n records. T is decomposed into four subtasks T₁, T₂, T₃, and T₄ with datasets D₁, D₂, D₃, and D₄, respectively, where D₁ ⊂ D₂, D₂ ⊂ D₃, and D₃ ⊂ D₄. D₁ consists of n₁ records, D₂ consists of n₂ records, and so on. n₁, …, n₄ are randomly selected numbers such that n₁ < n₂ < n₃ < n₄ and n₄ = n. The Level 4 task T₄ has dataset D₄, whose n₄ records make up the whole dataset, i.e., D₄ = D.

5.9.3. CIL

In this setting, samples from new classes are gradually presented to the model. The model may face new classes even at test time. In some scenarios, the model may not have information about the task id at test time [155]. The model needs to tackle the confusion between old and new classes caused by overlapping, gradient biases, and class imbalance [80]. In addition, the model needs to handle challenges such as predicting unknown classes without previous knowledge, and learning and enhancing knowledge from few instances of each class [151, 188]. To solve any problem in this scenario, the system should be able to identify the task and the classes belonging to each task.

Let a problem P consist of four tasks T1, T2, T3, and T4. These tasks consist of classes C1, C2, C3, and C4, respectively, such that C1 ⊂ C2, C2 ⊂ C3, and C3 ⊂ C4. For any task, knowledge learned from all classes and instances of previous tasks is reused [258].
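The growing class sets can be made concrete with a small sketch. It assumes, purely for illustration, that each task introduces a fixed number of new classes and that only samples of the new classes are available for training; the function and field names are hypothetical.

```python
def class_incremental_tasks(labels, classes_per_task):
    """Group classes into tasks and track the cumulative set of seen classes."""
    classes = sorted(set(labels))
    tasks = []
    for start in range(0, len(classes), classes_per_task):
        new_classes = classes[start:start + classes_per_task]
        # Classes the model must be able to recognise after this task.
        seen_classes = classes[:start + len(new_classes)]
        # Training samples for this task come only from the new classes.
        train_indices = [i for i, y in enumerate(labels) if y in new_classes]
        tasks.append({
            "new_classes": new_classes,
            "seen_classes": seen_classes,
            "train_indices": train_indices,
        })
    return tasks


# Toy label vector with 8 classes, two samples per class.
labels = [0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5, 6, 7]
tasks = class_incremental_tasks(labels, classes_per_task=2)
for t in tasks:
    print(t["new_classes"], t["seen_classes"])
```

With four tasks of two classes each, the cumulative sets play the role of C1 to C4: each one strictly contains the previous one.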

5.9.4. Domain Incremental Learning

In the domain incremental scenario, the structure of the problem and the number of classes and instances generally remain the same throughout the training and testing phases. What may change are the input data distribution, the environment, and the context [155, 255]. In addition, the system may not know to which task a domain sample belongs. In domain incremental learning, a few studies use three further configurations, as follows [18]:

Partial Seen Data Configurations: All images in the dataset D are divided into four levels L1, L2, L3, and L4, with datasets D₁, D₂, D₃, and D₄, respectively. D₁ consists of 20% of the images, D₂ of 40%, and D₃ of 60% of the images from each class, and D₄ = D, such that D₁ ⊂ D₂, D₂ ⊂ D₃, and D₃ ⊂ D₄.

Unseen - Equal Dataset Size Configurations: The classification task T is divided into three subtasks T₁, T₂, and T₃. The dataset D is divided into three equal parts D₁, D₂, and D₃. Each part consists of 33.3% of the total images in the dataset, such that the parts are mutually disjoint (Dᵢ ∩ Dⱼ = ∅ for i ≠ j) and Sizeof(D₁) = Sizeof(D₂) = Sizeof(D₃).

Unseen - Increasing Dataset Size Configurations: The classification task T is divided into four subtasks T₁, T₂, T₃, and T₄ with datasets D₁, D₂, D₃, and D₄, respectively, such that the datasets are mutually disjoint (Dᵢ ∩ Dⱼ = ∅ for i ≠ j). Each of D₁, D₂, and D₃ consists of 20% of all images, and D₄ consists of the remaining 40% of all images from each class.
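The three configurations differ only in the per-class shares and in whether later parts include earlier ones, so one helper can sketch all of them. This is a hypothetical illustration rather than the procedure of [18]; the function name, the `cumulative` flag, and the toy records are assumptions.

```python
import random
from collections import defaultdict


def split_per_class(records, fractions, cumulative=False, seed=0):
    """Split (sample, label) records class by class.

    fractions gives the share of each class added by each part.
    cumulative=False yields disjoint parts (the "unseen" configurations);
    cumulative=True makes part k also contain all earlier parts
    (the "partial seen" configuration).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for rec in records:
        by_class[rec[1]].append(rec)
    parts = [[] for _ in fractions]
    for items in by_class.values():
        rng.shuffle(items)
        bounds, acc = [0], 0.0
        for f in fractions:
            acc += f
            bounds.append(round(acc * len(items)))
        bounds[-1] = len(items)  # guard against rounding drift
        for k in range(len(fractions)):
            lower = 0 if cumulative else bounds[k]
            parts[k].extend(items[lower:bounds[k + 1]])
    return parts


# Toy data: 3 classes with 15 images each.
records = [(f"img_{c}_{i}", c) for c in range(3) for i in range(15)]
partial_seen = split_per_class(records, [0.2, 0.2, 0.2, 0.4], cumulative=True)
unseen_equal = split_per_class(records, [1 / 3, 1 / 3, 1 / 3])
unseen_increasing = split_per_class(records, [0.2, 0.2, 0.2, 0.4])
print([len(p) for p in partial_seen])
print([len(p) for p in unseen_equal])
print([len(p) for p in unseen_increasing])
```

The same increments (20%, 20%, 20%, 40%) produce the nested 20/40/60/100% levels when accumulated and the disjoint increasing-size parts when kept separate.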

5.9.5. Task Incremental Learning (TIL)

In this scenario, the training and testing data consist of a sequence of disjoint tasks. Generally, the model has access to data from one task at a time. The system may or may not have access to the task id [27, 155]. The system is supposed to learn a sequence of tasks incrementally without forgetting the previously learned tasks. In the case of TIL, the system has knowledge about the sequence and arrangement of tasks during both the training and testing phases [255]; that is, the algorithm is informed, also at test time, which task must be performed. A special case in this context is the single-incremental-task (SIT) scenario [262], in which a single task is incremental in nature. A few studies consider the new instances and new classes scenarios as special cases of this scenario. Generally, the SIT scenario is considered more challenging than the task incremental scenario [262].
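One practical consequence of knowing the task id at test time can be shown with a toy prediction rule. The sketch below is an assumption-laden illustration, not a method from the cited studies: the logits are made-up numbers, and restricting the prediction to the classes of the indicated task is just one common way of using a task id.

```python
def predict(logits, task_classes=None):
    """Return the index of the highest-scoring class.

    If task_classes is given (task id known), only those classes compete;
    otherwise every class learned so far is a candidate.
    """
    candidates = range(len(logits)) if task_classes is None else task_classes
    return max(candidates, key=lambda c: logits[c])


# Hypothetical scores over four classes; classes 2 and 3 belong to the second task.
logits = [0.1, 2.0, 0.3, 1.5]
print(predict(logits))                       # task id unknown
print(predict(logits, task_classes=[2, 3]))  # task id known
```

With the task id, a sample from the second task cannot be confused with classes of the first task, which is one reason why reported accuracies tend to be higher when the task id is available.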

6. Benchmark Metrics Used in CL Methods

This section describes the benchmark metrics used by CL methods to measure performance. The surveyed studies show that CL methods are evaluated on metrics specific to the problem area; for example, accuracy is used for classification problems. There is no universally accepted standardized metric for measuring the performance of models in CL scenarios, although a few metrics are particularly designed to assess the performance of CL-based methods.

6.1. Average Accuracy

A few CL-based studies utilized average accuracy of each task to measure the performance of the methods. Accuracy is computed by the following expression:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

where TP is true positive, TN is true negative, FP is false positive, and FN is false negative. In this study, accuracy is used to measure the performance.
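As a worked illustration, accuracy is conventionally the share of correct predictions, (TP + TN) / (TP + TN + FP + FN), and the average accuracy is its mean over tasks. The confusion counts below are hypothetical and the function names are illustrative only.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of correct predictions among all predictions."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion counts must not all be zero")
    return (tp + tn) / total


def average_accuracy(per_task_counts):
    """Mean accuracy over tasks, each given as a (tp, tn, fp, fn) tuple."""
    return sum(accuracy(*c) for c in per_task_counts) / len(per_task_counts)


# Hypothetical confusion counts for two tasks.
counts = [(40, 45, 5, 10), (30, 50, 10, 10)]
print([accuracy(*c) for c in counts])
print(average_accuracy(counts))
```

Here the two tasks reach 85/100 and 80/100 correct predictions, so the average accuracy is their mean.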

6.2. FWT

FWT is the use of knowledge learned from known tasks to expedite the learning of future tasks [263]. This metric measures whether the CL-based system utilizes knowledge from past tasks to learn a new task. Computing it requires evaluation blocks before and after the first learning of each new task.

6.3. Average BWT

BWT is the use of knowledge from a new task to enhance the accuracy on previously learned tasks [263]. Consider a problem comprising two tasks A and B that the system takes in sequence from A to B. Let P_A.Before be the performance of the system on A before taking task B, and P_A.After its performance on A after taking task B. Then, BWT can be defined as follows:

$$\text{BWT} = P_{A.\mathrm{After}} - P_{A.\mathrm{Before}}$$
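For more than two tasks, transfer metrics are often computed from an accuracy matrix R, where R[i][j] is the accuracy on task j after training on tasks up to i. The sketch below follows the averaged difference-based definitions commonly attributed to [263]; the matrix values and the baseline vector b (accuracy of an untrained model on each task) are hypothetical.

```python
def final_average_accuracy(R):
    """Mean accuracy over all tasks after the last task has been learned."""
    T = len(R)
    return sum(R[T - 1][j] for j in range(T)) / T


def backward_transfer(R):
    """Average change on earlier tasks between first learning and the end."""
    T = len(R)
    return sum(R[T - 1][j] - R[j][j] for j in range(T - 1)) / (T - 1)


def forward_transfer(R, b):
    """Average gain on a task before it is trained, relative to baseline b."""
    T = len(R)
    return sum(R[j - 1][j] - b[j] for j in range(1, T)) / (T - 1)


# Hypothetical accuracies for three tasks learned in sequence.
R = [
    [0.90, 0.20, 0.10],
    [0.80, 0.85, 0.30],
    [0.70, 0.80, 0.88],
]
b = [0.10, 0.10, 0.10]  # assumed accuracy of an untrained model
print(final_average_accuracy(R))
print(backward_transfer(R))
print(forward_transfer(R, b))
```

In this toy matrix the backward transfer is negative, which corresponds to forgetting of earlier tasks, while the forward transfer is positive.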

6.4. Relative Performance (RP)

RP indicates whether the performance of the model is equal to or improved over that on the single task. It offers insight into whether a system is showing an improvement over previous attempts to attain CL or is simply experiencing CL.

6.5. Cumulative Gain

Cumulative gain keeps a record of the performance after the completion of each task over the course of the complete problem [264].

6.6. Performance Maintenance (PM)

PM offers a complete picture of the model with respect to the performance it maintains on each task. It is a relative measure that gives the change in performance on the same task between the first time it is attempted and subsequent attempts [131, 265]. It does not measure the absolute performance level; it measures the change in performance over the complete lifetime of the system [265].

Regarding the values of PM: if PM > 0, the performance on the task improves over the lifetime; if PM = 0, there is neither forgetting nor learning; and if PM < 0, catastrophic forgetting has occurred.
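The sign convention above can be illustrated with a small sketch. The exact formula is an assumption made here for illustration: PM is taken as the mean difference between later evaluations of a task and its first evaluation, and the performance histories are made-up numbers.

```python
def performance_maintenance(history):
    """Mean change of later evaluations relative to the first evaluation.

    history lists the performance on one task in chronological order,
    starting with the evaluation right after the task was first learned.
    """
    first, later = history[0], history[1:]
    if not later:
        raise ValueError("need at least one later evaluation")
    return sum(p - first for p in later) / len(later)


def interpret(pm, tol=1e-9):
    """Map the sign of PM to the reading given in the text."""
    if pm > tol:
        return "improving"
    if pm < -tol:
        return "forgetting"
    return "stable"


# Hypothetical performance histories for three tasks.
for history in ([0.80, 0.70, 0.60], [0.80, 0.85, 0.90], [0.80, 0.80]):
    pm = performance_maintenance(history)
    print(history, round(pm, 3), interpret(pm))
```

Because PM is a difference from the first evaluation, two tasks with very different absolute accuracy can have the same PM.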

7. Results and Analysis

The results of various methods using different datasets are presented.

7.1. Image Classification

Tables 5, 6, 7, 8, and 9 show the results of different CL methods for the image classification problem. Table 5 shows the results for the CIFAR-10 dataset for the CIL and TIL scenarios. Overall, the accuracy for TIL scenarios remains higher than for CIL scenarios, as CIL scenarios are considered more difficult than TIL scenarios. In CIL scenarios, the accuracy of memory replay-based methods remained higher than that of network- and gradient-based methods, whereas the accuracy of architecture-based methods remained on the lower side. In the TIL scenario, the accuracy of static architecture-based methods remained highest, whereas the accuracy of the memory replay method remained lowest. In the case of the CIFAR-100 dataset, as shown in Table 6, the accuracy of memory-based methods remained on the higher side for the TIL scenario, whereas the performance of the dynamic network method remained at the highest end.

Table 7 shows the results for the ImageNet dataset. For both the CIL and TIL scenarios, the accuracy achieved by the gradient-based method remained highest. Table 10 displays the results of the statistical t-test for the ImageNet dataset, together with the p-values. The t-statistics and p-values show that Du et al. performed better than other methods.

Table 8 shows the results for the Tiny ImageNet dataset for the CIL and TIL scenarios. It can be observed that the performance of the memory-based and gradient-based methods remained almost the same, on the higher side. Table 11 displays the results of the statistical t-test for the Tiny ImageNet dataset, together with the p-values. The t-statistics and p-values show that Cheraghian et al. performed better than other methods.

Table 9 shows the results for the MNIST dataset for the CIL and TIL scenarios. It can be observed that Abati et al. performed better than other methods in both the CIL and TIL scenarios.

Table 5. Results for CIFAR-10 dataset.

Class incremental learning			Task incremental learning
Method	Technique	Accuracy	Method	Technique	Accuracy
Kim et al. 2022 [135]	SN	88	Abati et al. 2020 [146]	M + N	96
Abati et al. 2020 [146]	M + N	70	Ji et al. 2021 [201]	MR	95
Yang et al. 2023 [144]	DN	68.8	Kim et al. 2022 [135]	SN	96
Rosenfeld et al. 2020 [118]	DN	70.9	Wang et al. 2022 [253]	MR	93
Fayek et al. 2020 [133]	DN	79	Cha et al. 2021 [176]	MR	96
Yang et al. 2023 [144]	DN	56
Wang et al. 2022 [253]	MR	85.6
Cha et al. 2021 [176]	MR	95.9
Du et al. 2023 [58]	DN	87.9
Hong et al. 2022 [254]	MG	65.4
Ji et al. 2021 [201]	MR	85.4
Cha et al. 2021 [176]	MR	84.2

Note: (a) Class incremental scenario and (b) task incremental scenario. M + N = hybrid (memory + network) and MR = replay.
Abbreviations: DN = dynamic network, MG = memory generative, and SN = static network.

Table 6. Results for CIFAR-100 dataset.

Class incremental learning			Task incremental learning
Method	Technique	Accuracy	Method	Technique	Accuracy
Tao et al. 2020 [129]	DN	57.5	Kim et al. 2022 [135]	SN	96
Liu et al. 2021 [121]	DN	67.6	Mazumder et al. 2021 [143]	DN	69.58
Verma et al. 2021 [130]	DN	90	Liu et al. 2022 [137]	GD	62
Boschini et al. 2023 [204]	MR	59	Ji et al. 2021 [201]	MR	89
Zhuang et al. 2022 [199]	MR	64	Wang et al. 2022 [253]	MR	89
Kim et al. 2022 [135]	SN	65
Zhu et al. 2022 [83]	SN	56.8
Fayek et al. 2020 [133]	DN	78.5
Xu et al. 2022 [170]	DN/RCL	58.8
Xu et al. 2022 [170]	DN/BOCL	61.7
Du et al. 2023 [58]	DN	70.7
Du et al. 2023 [58]	GD	65
Ji et al. 2021 [201]	MR	52.3
Wang et al. 2022 [253]	MR	56.6
Zhuang et al. 2022 [199]	MR	64

Note: (a) Class incremental scenario and (b) task incremental scenario. M + N = hybrid (memory + network), MR = replay, and GD = gradient data focus.
Abbreviations: BOCL = biobjective CL, DN = dynamic network, MG = memory generative, RCL = reliable CL, and SN = static network.

Table 7. Results for ImageNet dataset.

Class incremental learning			Task incremental learning
Method	Technique	Accuracy	Method	Technique	Accuracy
Abati et al. 2020 [146]	M + N	35	Mazumder et al. 2021 [143]	DN	98
Liu et al. 2021 [121]	DN	64	Du et al. 2023 [58]	DN	93
Wu et al. 2022 [256]	DN	69
Zhu et al. 2022 [83]	DN	68
Liu et al. 2022 [137]	GD	74
Du et al. 2023 [58]	GP	85

Note: (a) Class incremental scenario and (b) task incremental scenario. M + N = hybrid (memory + network), MR = replay, and GD = gradient data focus.
Abbreviations: DN = dynamic network, MG = memory generative, and SN = static network.

Table 8. Results for tiny ImageNet dataset.

Class incremental learning			Task incremental learning
Method	Technique	Accuracy	Method	Technique	Accuracy
Verma et al. 2021 [130]	DN	66.8	Ji et al. 2021 [201]	MR	70
Zhu et al. 2022 [83]	DN	50	Wang et al. 2022 [253]	MR	72.8
Cheraghian et al. 2021 [188]	MP	68	Cha et al. 2021 [176]	MR	53.1
Yang et al. 2023 [144]	DN	63.5
Du et al. 2023 [58]	GP	55
Ji et al. 2021 [201]	MR	37.6
Wang et al. 2022 [253]	MR	41.5
Zhuang et al. 2022 [199]	MR	56.7
Boschini et al. 2023 [204]	MR	31.7
Cha et al. 2021 [176]	MR	20.1

Note: (a) Class incremental scenario and (b) task incremental scenario. M + N = hybrid (memory + network), MR = replay, MP = pseudorehearsal, and GD = gradient data focus.
Abbreviations: DN = dynamic network, MG = memory generative, and SN = static network.

Table 9. Results for MNIST dataset.

Class incremental learning			Task incremental learning
Method	Technique	Accuracy	Method	Technique	Accuracy
Abati et al. 2020 [146]	M + N	97	Abati et al. 2020 [146]	M + N	99.7
Zhou et al. 2022 [211]	DN	81	Ji et al. 2021 [201]	MR	99
Du et al. 2023 [58]	GP	96	Wang et al. 2022 [253]	MR	99.5
Ji et al. 2021 [201]	MR	95	Cha et al. 2021 [176]	MR	98.6
Wang et al. 2022 [253]	MR	95.6	XDG^∗	—	99.1
EWC^∗	RP	20.64	EWC^∗	RP	99
SI^∗	RP	21.2	SI^∗	RP	99.2
LwF^∗	RD	21.9	LwF^∗	RD	99.6
FROMP^∗	RD	77.3	FROMP^∗	RD	99.1
DGR^∗	MR	90.35	DGR^∗	MR	99.5
BI-R^∗	MR	94.4	BI-R^∗	MR	99.6
ER^∗	MR	88.8	ER^∗	MR	98.98
A-GEM^∗	MR	95.1	A-GEM^∗	MR	98.5
iCaRLof^∗	MG	92.5

Note: (a) Class incremental scenario and (b) task incremental scenario. M + N = hybrid (memory + network), MR = replay, and GD = gradient data focus.
Abbreviations: DN = dynamic network, MG = memory generative, and SN = static network.
^∗results taken from [255].

Table 10. Results of paired t-test for classification accuracy for ImageNet dataset.

Method 1	Method 2	t-statistic
Class incremental learning
Baseline	Liu et al.	28.225^∗∗
	Wu et al.	47.051^∗∗
	Zhu et al.	42.176^∗∗
	Liu et al.	48.313^∗∗
	Zhu et al.	65.726^∗∗

Task incremental learning
Du et al.	Mazumder	16.681^∗∗

Note: We consider Abati et al.’s method as the baseline method.
^∗∗means p < 0.001.
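For readers unfamiliar with the test, a paired t-statistic is the mean of the per-run differences divided by its standard error. The sketch below uses only the standard library and made-up accuracy values; it does not reproduce the numbers in the table, and the number of runs behind the reported statistics is not stated here.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """t = mean(d) / (stdev(d) / sqrt(n)) for paired differences d = a - b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("differences have zero variance")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


# Hypothetical per-run accuracies of two methods on the same splits.
method_a = [3.0, 5.0, 7.0, 9.0]
method_b = [1.0, 2.0, 3.0, 4.0]
t = paired_t_statistic(method_a, method_b)
print(round(t, 4))
```

The statistic has n - 1 degrees of freedom; converting it to a p-value requires the t-distribution, which is omitted from this sketch.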

Table 11. Results of paired t-test for classification accuracy for Tiny ImageNet dataset.

Method 1	Method 2	t-statistic
Class incremental learning
Baseline	Liu et al.	171.695^∗∗
	Verma et al.	9.11^∗∗
	Zhu et al.	86.412^∗∗
	Cheraghian et al.	123.50^∗∗
	Yang et al.	112.402^∗∗
	Du et al.	88.484^∗∗
	Ji et al.	90.054^∗∗
	Wang et al.	204.40^∗∗
	Zhuang et al.	138.974^∗∗
	Boschini et al.	25.987^∗∗
	Cha et al.	6.52^∗∗

Task incremental learning
Cha et al.	Ji et al.	46.904^∗∗
Cha et al.	Wang et al.	47.431^∗∗

Note: We consider Cha et al.’s method as the baseline method.
^∗∗means p < 0.001.

7.2. Text Classification

Table 12 shows the results for text classification. It can be observed that the hybrid method (meta + memory-based) outperformed other methods for the Yelp Review and Twitter datasets, whereas for the Amazon Review dataset, the dynamic network-based method outperformed other methods, and its performance remained far better than that of the compared methods. In the case of the AG News dataset, the gradient-based method outperformed the network and hybrid methods.

Table 12. Results for text classification.

AG news dataset			Amazon review dataset			Yelp review dataset
Method	Way	Result	Method	Way	Result	Method	Way	Result
Shan et al. 2020 [123]	DN	89	Shan et al. 2020 [123]	DN	61.2	Shan et al. 2020 [123]	DN	68.1
Zhang et al. 2023 [161]	GP	92.5	Zhang et al. 2023 [161]	GP	58.4	Ho et al. 2023	Meta	58
Ho et al. 2023	Meta	41	Ho et al. 2023	Meta	60

Note: M + N = hybrid (memory + network), MR = replay, and GP = gradient priori focus.
Abbreviations: DN = dynamic network, MG = memory generative, and SN = static network.

7.3. Segmentation

Tables 13 and 14 display the results for the segmentation problem. Table 13 shows the results for the Pascal VOC dataset for the CIL and TIL scenarios. For the CIL scenario, it can be observed that the performance of most of the methods remained in the range of 60–70%, and the pseudo-rehearsal memory replay-based method outperformed the other compared methods. Similarly, in the TIL scenario, replay-based methods outperformed the other compared methods. It can be observed in Table 14 that hybrid methods outperformed other methods on the ADE20K dataset for the TIL as well as the CIL scenarios. Table 15 displays the results of the statistical t-test for the Pascal VOC dataset, together with the p-values. The t-statistics and p-values show that Douillard et al. performed better than other methods.

Table 13. Results for Pascal VOC dataset. (a) Class incremental scenario and (b) task incremental scenario.

Class incremental learning			Task incremental learning
Method	Technique	Accuracy	Method	Technique	Accuracy
Zhang et al. 2022 [161]	DN	67.3	Qiu et al. 2023 [210]	MR (6 tasks)	76.61
Michieli et al. 2021 [184]	MSR	69	Phan et al. 2022 [259]	G + N (2 task)	76.22
Maracani et al. 2021 [260]	MG/D	65.8
Douillard et al. 2021 [187]	MSR	75.47
Cermelli et al. 2020 [261]	GP/D	63.5

Note: M + N = hybrid (memory + network), MR = replay, and GD = gradient data focus.
Abbreviations: D = disjoint, DN = dynamic network, MG = memory generative, O = overlapping, and SN = static network.

Table 14. Results for ADE20K dataset.

Class incremental learning			Task incremental learning
Method	Technique	Accuracy	Method	Technique	Accuracy
Zhang et al. 2022 [161]	DN	34.5	Qiu et al. 2023 [210]	MR	35.45
Michieli et al. 2021 [184]	MSR	21.7	Phan et al. 2022 [259]	G + N	38.43
Douillard et al. 2021 [187]	MSR	37.39	Cermelli et al. 2020 [261]		37.31
Cermelli et al. 2020 [261]	GP	25.9	Michieli et al. 2021 [184]		37.31

Note: (a) Class incremental scenario and (b) task incremental scenario. M + N = hybrid (memory + network), MR = replay, GD = gradient data focus, and G + N = hybrid (gradient + network).
Abbreviations: DN = dynamic network, MG = memory generative, and SN = static network.

Table 15. Results of paired t-test for performance of Pascal VOC dataset.

Method 1	Method 2	t-statistic
Class incremental learning
Baseline	Zhang et al.	9.125^∗∗
	Michieli et al.	16.968^∗∗
	Maracani et al.	7.660^∗∗
	Douillard et al.	33.502^∗∗

Task incremental learning
Phan et al.	Qiu et al.	0.076^∗∗

Note: We consider Cermelli et al.’s method as the baseline method.
^∗∗means p < 0.001.

7.4. Human Action Recognition

Tables 16 and 17 show the results for the human action recognition datasets. Table 16 shows that the method in the offline setting outperformed the other compared methods for the cross-subject (CS) scenario. It also shows the results for the cross-view (CV) scenario, where the performance of the top five methods remained almost similar; however, the method in the offline setting again outperformed the other methods. The same holds for the NTU RGB-D dataset (Table 17), where the method in the offline setting outperformed other methods for both the CV and CS scenarios.

Table 16. Results for PKU-MMD dataset.

Cross-subject (CS)			Cross-view (CV)
Method	Technique	Accuracy	Method	Technique	Accuracy
Li et al. 2021 [120]	CL	84.6	Li et al. 2021 [120]	CL	87
Li et al. 2021 [120]	Offline	95.3	Li et al. 2021 [120]	Offline	97.2
Hayes et al. 2020 [266]		71.2	Hayes et al. 2020 [266]		75.3
Chao et al. 2018	HCN	92.6	Chao et al. 2018		94.2
Tianhong et al. 2019	RF-action	92.9	Tianhong et al. 2019		94.4
Lopez-Paz et al. 2022 [263]		65.9	Lopez-Paz et al. 2022 [263]		61.3

Note: (a) Cross-subject (CS) and (b) cross-view (CV).

Table 17. Results for NTU-RGB-D dataset.

Cross-subject (CS)			Cross-view (CV)
Method	Technique	Accuracy	Method	Technique	Accuracy
Li et al. 2021 [120]	CL	84.6	Li et al. 2021 [120]	CL	87
Li et al. 2021 [120]	Offline	95.3	Li et al. 2021 [120]	Offline	97.2
Liu et al. 2021 [121]	CL	46.3	Liu et al. 2021 [121]	CL	54.5
Liu et al. 2021 [121]	Offline	91.5	Liu et al. 2021 [121]	Offline	96.2
Yan et al. 2021 [267]		81.5	Yan et al. 2021 [267]		88.3
Hayes et al. 2020 [266]		56	Hayes et al. 2020 [266]		59.8
Hayes et al. 2021 [266]	AGCN	88.5	Hayes et al. 2021 [266]	AGCN	95.1
Lopez-Paz et al. 2022 [263]	GEM	55.3	Lopez-Paz et al. 2022 [263]	GEM	54.5

Note: (a) Cross-subject (CS) and (b) cross-view (CV).

8. Challenges and Looking Ahead

This section describes the challenges faced by the CL-based methods.

8.1. Backward Knowledge Transfer

Currently, most approaches addressing catastrophic forgetting in neural networks focus on minimizing the loss of previously learned tasks, ensuring forward knowledge transfer only. However, this approach limits backward knowledge transfer, where insights from new tasks could enhance the performance on earlier tasks. Effective two-way knowledge transfer, both forward and backward, remains a challenging area that requires attention to develop mechanisms that facilitate bidirectional learning.

8.2. Growing Demand of Resources

Most CL-based methods reported in the literature require increasingly substantial resources, such as computing power or storage capacity. For ML systems to emulate the CL capability of the biological brain, they need to operate with limited dedicated resources. Relying on additional resources for learning each new task may not be sustainable or feasible.

8.3. Graceful Forgetting

Graceful forgetting is an essential aspect of CL. Like the biological brain, ML systems need mechanisms to selectively forget nonessential data or outdated knowledge, freeing up resources for critical functions and future learning. Developing effective strategies for graceful forgetting—ones that conserve resources without sacrificing valuable information—remains an active area of research.

8.4. Online Continuous Stream of Data

Task identification and management are crucial features of CL. However, with an online continuous data stream, identifying and managing tasks becomes particularly challenging.

8.5. Biases

Manually setting hyperparameters can introduce biases into the system. Beyond parameter-related biases, the system may also be affected by biases from other factors, such as data imbalances and environmental influences.

8.6. Availability of Real Test Environment

A crucial factor in advancing lifelong learning technology is developing realistic testing environments that specifically assess CL capabilities, moving beyond static, pre-prepared datasets. Currently, most AI models are trained in controlled, well-defined settings. For a model to operate effectively in real-world, dynamic environments, it needs training with data that closely mirror these conditions.

8.7. Multiagent CL

Traditional ML algorithms are typically centralized and rely on single-agent frameworks. However, the rise of the Internet of Things (IoT) has led to unprecedented data growth, creating demand for more sophisticated AI models that operate across multiple agents. Despite this need, multiagent CL remains relatively underexplored compared to traditional AI approaches.

In multiagent reinforcement learning (MARL), transfer learning (TL) is a crucial technique for accelerating learning by facilitating knowledge sharing among agents. TL enables the reuse of knowledge acquired in a source domain, thereby reducing the reliance on extensive datasets and long training times for target tasks. In the context of reinforcement learning (RL), TL can significantly lower the number of samples needed to derive an optimal policy by leveraging previously learned knowledge.

However, applying TL to real-world problems presents three significant challenges. First, most real-world environments are only partially observable, complicating the application of learned knowledge. Second, collecting prior knowledge in unfamiliar domains is inherently difficult. Third, negative transfer—where transferred knowledge hinders rather than aids learning—poses a serious obstacle to progress.

While TL in RL has been the subject of extensive research, few studies have simultaneously addressed the combined challenges of partial observability, precollecting knowledge, and mitigating negative transfer. Furthermore, the domain of multiagent CL, which involves incorporating both knowledge transfer and incremental learning, remains largely underexplored. This is especially true for scenarios involving evolutionary multitasking, where identifying and transferring relevant and beneficial knowledge across diverse tasks is both essential and highly complex.

We believe that brain biology will continue to serve as a valuable source of inspiration for developing novel lifelong learning approaches. Advances in our understanding of key brain biological mechanisms, such as dynamic memory updating processes like active forgetting, extinction, and memory reconsolidation, will inspire new algorithms beyond those currently discussed. Additionally, deeper insights into intracellular processes like signaling and gene regulation, as well as intercellular communication, could drive lifelong learning innovations beyond the central nervous system.

Lifelong learning systems, with their enhanced capabilities and broader range of behaviors in real-world applications, have the potential to revolutionize fields such as autonomous vehicles, smart cities, and healthcare. Realizing this potential will require ongoing multidisciplinary initiatives that unite researchers from biology, neuroscience, psychology, engineering, and AI. Such collaborations are essential for developing the convergent solutions that this emerging form of AI demands.

Considering the space limitations, this study focuses on the application of CL for computer vision problems using supervised learning methods. Future work may include a comprehensive survey of studies exploring RL and unsupervised learning approaches in domains beyond computer vision. Additionally, other advanced deep learning techniques, such as graph neural networks, could also be examined to provide a broader perspective.

9. Ethical Considerations

CL has attracted various researchers in recent years. However, addressing the ethical challenges arising from advancements in this field is crucial.

CL systems leverage historical data from multiple tasks; however, if these data contain biases, the systems may learn them. To mitigate such issues, it is crucial to use diverse datasets and develop algorithms specifically designed to address bias. Additionally, regular testing and updates are essential to address biases.

Training CL models requires extensive data from multiple tasks, which presents significant privacy and data security challenges. To address such concerns requires the adoption of encryption techniques and strict adherence to privacy and security protocols.

Most CL methods rely on deep learning, and deep learning models are considered “black boxes,” which makes it difficult to comprehend how they reach their conclusions. Various researchers are working on the transparency and explainability of deep learning models.

The ultimate goal of CL models is to integrate them into autonomous systems. Important concerns related to autonomous systems are liability and responsibility. Establishing and enforcing regulations will be essential to ensure the safe and responsible use of such systems.

10. Conclusion

Advances in neuroscience provide important inspiration for the development of CL. In light of this, we present in this study the most comprehensive review of research and the latest developments in the field of CL, highlighting the contributions and challenges from recent research papers. We examine the key functions of the biological brain, map brain functions to recent ML methods, and critically review five types of CL-based methods, along with their results, analysis, challenges, and future directions. We hope that this study will benefit both the general reader and the research community by providing a complete picture of the latest research in the field. Additionally, it aims to motivate the development of brain-inspired CL methods across various application areas. From this comprehensive viewpoint, we anticipate that advancements in CL will ultimately enable AI systems to exhibit human-like adaptability, allowing these systems to respond flexibly to real-world dynamics and continuously evolve throughout their lifespan.

Conflicts of Interest

The authors declare no conflicts of interest.

Funding

This study was funded by the Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences (Grant No. 2024PVB0036), and the School of Information Engineering, Xi’an Eurasia University, Xi’an, Shaanxi, China.

Acknowledgments

All authors thank the Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, China, and the School of Information Engineering, Xi’an Eurasia University, Xi’an, Shaanxi, China, for their financial support.

Open Research

Data Availability Statement

The data that support the findings of this study are available upon request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.

References

1 Guanglei Y., Xu E., Rota D., Ding P., and Nabi M., Uncertainty-Aware Contrastive Distillation for Incremental Semantic Segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence. (2022).
2 Irfan M., Jiangbin Z., Iqbal M., Masood Z., and Arif M. H., Knowledge Extraction and Retention Based Continual Learning by Using Convolutional Autoencoder-based Learning Classifier System, Information Sciences. (2022) 591, 287–305, https://doi.org/10.1016/j.ins.2022.01.043.
3 Feng T., Wang M., and Yuan H., Overcoming Catastrophic Forgetting in Incremental Object Detection via Elastic Response Distillation, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, 9427–9436.
4 French R. M., Catastrophic Forgetting in Connectionist Networks, Trends in Cognitive Sciences. (1999) 3, no. 4, 128–135, https://doi.org/10.1016/S1364-6613(99)01294-2, 2-s2.0-0032923221.
5 Hadsell R., Rao D., Rusu A. A., and Pascanu R., Embracing Change: Continual Learning in Deep Neural Networks, Trends in Cognitive Sciences. (2020) 24, no. 12, 1028–1040, https://doi.org/10.1016/j.tics.2020.09.004.
6 Chen Z. and Liu B., Lifelong Machine Learning, Second Edition, Synthesis Lectures on Artificial Intelligence and Machine Learning. (2018) 12, 1–207, https://doi.org/10.1007/978-3-031-01581-6.
7 Guest O. and Martin A. E., On Logical Inference over Brains, Behaviour, and Artificial Neural Networks, Computational Brain & Behavior. (2023) 6, no. 2, 213–227, https://doi.org/10.1007/s42113-022-00166-x.
8 McClelland J. L., McNaughton B. L., and Lampinen A. K., Integration of New Information in Memory: New Insights from a Complementary Learning Systems Perspective, Philosophical Transactions of the Royal Society B: Biological Sciences. (2020) 375, no. 1799, https://doi.org/10.1098/rstb.2019.0637.
9 Pisupati S. and Niv Y., The Challenges of Lifelong Learning in Biological and Artificial Systems, Trends in Cognitive Sciences. (2022) 26, no. 12, 1051–1053, https://doi.org/10.1016/j.tics.2022.09.022.
10 Moscovitch M., Cabeza R., Winocur G., and Nadel L., Episodic Memory and Beyond: the Hippocampus and Neocortex in Transformation, Annual Review of Psychology. (2016) 67, no. 1, 105–134, https://doi.org/10.1146/annurev-psych-113011-143733, 2-s2.0-84953790747.
11 Chen Z., Li Y., Liang H., and Yu J., Hierarchical Cosine Similarity Entropy for Feature Extraction of ship-radiated Noise, Entropy. (2018) 20, no. 6, https://doi.org/10.3390/e20060425, 2-s2.0-85048712344.
12 Aljundi D. L. M., Masana R., Parisot M., Jia S., and Leonardis X., A Continual Learning Survey: Defying Forgetting in Classification Tasks, IEEE Transactions on Pattern Analysis and Machine Intelligence. (2021) 44, 3366–3385.
13 Hadsell R., Rao D., Rusu A. A., and Pascanu R., Embracing Change: Continual Learning in Deep Neural Networks, Trends in Cognitive Sciences. (2020) 24, no. 12, 1028–1040, https://doi.org/10.1016/j.tics.2020.09.004.
14 Parisi G. I. and Lomonaco V., Online Continual Learning on Sequences, 2020, Springer International Publishing, https://doi.org/10.1007/978-3-030-43883-8_8.
15 Zaadnoordijk L., Besold T. R., and Cusack R., Lessons from Infant Learning for Unsupervised Machine Learning, Nature Machine Intelligence. (2022) 4, no. 6, 510–520, https://doi.org/10.1038/s42256-022-00488-2.
16 Jedlicka P., Tomko M., Robins A., and Abraham W. C., Contributions by Metaplasticity to Solving the Catastrophic Forgetting Problem, Trends in Neurosciences. (2023) 46, no. 10, 893–894, https://doi.org/10.1016/j.tins.2023.07.008.
17 Irfan M., Jiangbin Z., Iqbal M., and Arif M. H., A Novel Lifelong Learning Model Based on Cross Domain Knowledge Extraction and Transfer to Classify Underwater Images, Information Sciences. (2021) 552, 80–101, https://doi.org/10.1016/j.ins.2020.11.048.
18 Irfan M., Jiangbin Z., Iqbal M., Masood Z., Arif M. H., and Hassan S. R., Brain Inspired Lifelong Learning Model Based on Neural Based Learning Classifier System for Underwater Data Classification, Expert Systems with Applications. (2021) 186, https://doi.org/10.1016/j.eswa.2021.115798.
19 Irfan M., Jiangbin Z., Iqbal M., Masood Z., and Arif M. H., Knowledge Extraction and Retention Based Continual Learning by Using Convolutional Autoencoder-based Learning Classifier System, Information Sciences. (2022) 591, 287–305, https://doi.org/10.1016/j.ins.2022.01.043.
20 Liu D. J., Vong P., Chen C. M., Wang C., and Chen T., Class-Incremental Learning Method with Fast Update and High Retainability Based on Broad Learning System, IEEE Transactions on Neural Networks and Learning Systems. (2023) 1–14.
21 Lin Y. B., Zhang M., Liu Y., Liang B., and Ji X., Dynamic Support Network for few-shot Class Incremental Learning, IEEE Transactions on Pattern Analysis and Machine Intelligence. (2023) 45, 2945–2951.
22 Xiao Z., Du Z., Wang R., Gan R., and Li J., Online Continual Learning with Declarative Memory, Neural Networks. (2023) 163, 146–155, https://doi.org/10.1016/j.neunet.2023.03.025.
23 Yao W. X., Wang L., Paik X., and Sen H.-Y. W., Uncertainty Estimation with Neural Processes for meta-continual Learning, IEEE Transactions on Neural Networks and Learning Systems. (2022) 1–11.
24 Karunaratne H. M., Cherubini G., Benini G., Sebastian L., and Abbas A. R., Constrained few-shot class-incremental Learning, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, 9057–9067.
25 Menezes A. G., de Moura G., Alves C., and de Carvalho A. C., Continual Object Detection: a Review of Definitions, Strategies, and Challenges, Neural Networks. (2023) 161, 476–493, https://doi.org/10.1016/j.neunet.2023.01.041.
26 Mai Z., Li R., Jeong J., Quispe D., Kim H., and Sanner S., Online Continual Learning in Image Classification: an Empirical Survey, Neurocomputing. (2022) 469, 28–51, https://doi.org/10.1016/j.neucom.2021.10.021.
27 Masana M., Liu X., Twardowski B., Menta M., Bagdanov A. D., and van de Weijer J., Class-Incremental Learning: Survey and Performance Evaluation on Image Classification, IEEE Transactions on Pattern Analysis and Machine Intelligence. (2023) 45, no. 5, 5513–5533, https://doi.org/10.1109/tpami.2022.3213473.
28 Febrinanto F. G., Xia F., Moore K., Thapa C., and Aggarwal C., Graph Lifelong Learning: a Survey, IEEE Computational Intelligence Magazine. (2023) 18, no. 1, 32–51, https://doi.org/10.1109/mci.2022.3222049.
29 Mundt M., Hong Y., Pliushch I., and Ramesh V., A Wholistic View of Continual Learning with Deep Neural Networks: Forgotten Lessons and the Bridge to Active and Open World Learning, Neural Networks. (2023) 160, 306–336, https://doi.org/10.1016/j.neunet.2023.01.014.
30 He J., Mao R., Shao Z., and Zhu F., Incremental Learning in Online Scenario, arXiv preprint arXiv:2003.13191. (2020).
31 Wang Y., Yao Q., Kwok J. T., and Ni L. M., Generalizing from a Few Examples: a Survey on few-shot Learning, ACM Computing Surveys. (2020) 53, no. 3, 1–34, https://doi.org/10.1145/3386252.
32 Brna A. P., Brown R. C., Connolly P. M., Simons S. B., Shimizu R. E., and Aguilar-Simon M., Uncertainty-Based Modulation for Lifelong Learning, Neural Networks. (2019) 120, 129–142, https://doi.org/10.1016/j.neunet.2019.09.011.
33 Warner J., Devaraj A., and Miikkulainen R., Using Context to Make Gas Classifiers Robust to Sensor Drift, 2020.
34 Tadros T., Krishnan G., Ramyaa R., and Bazhenov M., Biologically Inspired Sleep Algorithm for Increased Generalization and Adversarial Robustness in Deep Neural Networks, International Conference on Learning Representations, 2020.
35 Jin C., Feng X., and Yu H., A brain-inspired Incremental Multitask Reinforcement Learning Approach, IEEE Transactions on Cognitive and Developmental Systems. (2024) 16, no. 3, 1147–1160, https://doi.org/10.1109/TCDS.2023.3338241.
36 Ashfahani A. and Pratama M., Autonomous Deep Learning: Continual Learning Approach for Dynamic Environments, 2019, Society for Industrial and Applied Mathematics.
37 Kharratzadeh M. and Shultz T., Neural Implementation of Probabilistic Models of Cognition, Cognitive Systems Research. (2016) 40, 99–113, https://doi.org/10.1016/j.cogsys.2016.04.002, 2-s2.0-84969534222.
38 Kuhn H. G., Dickinson-Anson H., and Gage F. H., Neurogenesis in the Dentate Gyrus of the Adult Rat: Age-Related Decrease of Neuronal Progenitor Proliferation, Journal of Neuroscience. (1996) 16, no. 6, 2027–2033, https://doi.org/10.1523/jneurosci.16-06-02027.1996.
39 Lim D. A. and Alvarez-Buylla A., The Adult ventricular–subventricular Zone (v-svz) and Olfactory Bulb (Ob) Neurogenesis, Cold Spring Harbor Perspectives in Biology. (2016) 8, no. 5, https://doi.org/10.1101/cshperspect.a018820, 2-s2.0-84964999292.
40 Lennington J. B., Yang Z., and Conover J. C., Neural Stem Cells and the Regulation of Adult Neurogenesis, Reproductive Biology and Endocrinology. (2003) 1, https://doi.org/10.1186/1477-7827-1-99, 2-s2.0-4344640577.
41 Gridchyn I., Schoenenberger P., O’Neill J., and Csicsvari J., Assembly-Specific Disruption of Hippocampal Replay Leads to Selective Memory Deficit, Neuron. (2020) 106, no. 2, 291–300.e6, https://doi.org/10.1016/j.neuron.2020.01.021.
42 Yoon J., Yang E., Lee J., and Hwang S. J., Lifelong Learning with Dynamically Expandable Networks, 2017, https://arxiv.org/abs/1708.01547.
43 Kempermann G., Gage F. H., Aigner L. et al., Human Adult Neurogenesis: Evidence and Remaining Questions, Cell Stem Cell. (2018) 23, no. 1, 25–30, https://doi.org/10.1016/j.stem.2018.04.004, 2-s2.0-85045564681.
44 Oudiette D. and Paller K. A., Upgrading the Sleeping Brain with Targeted Memory Reactivation, Trends in Cognitive Sciences. (2013) 17, no. 3, 142–149, https://doi.org/10.1016/j.tics.2013.01.006, 2-s2.0-84875260023.
45 Preston A. R. and Eichenbaum H., Interplay of Hippocampus and Prefrontal Cortex in Memory, Current Biology. (2013) 23, no. 17, R764–R773, https://doi.org/10.1016/j.cub.2013.05.041, 2-s2.0-84883821745.
46 McClelland J. L., McNaughton B. L., and O’Reilly R. C., Why There Are Complementary Learning Systems in the Hippocampus and Neocortex: Insights from the Successes and Failures of Connectionist Models of Learning and Memory, Psychological Review. (1995) 102, no. 3, 419–457, https://doi.org/10.1037/0033-295X.102.3.419, 2-s2.0-0029340352.
47 McClelland J. L., McNaughton B. L., and Lampinen A. K., Integration of New Information in Memory: New Insights from a Complementary Learning Systems Perspective, Philosophical Transactions of the Royal Society B: Biological Sciences. (2020) 375, no. 1799, https://doi.org/10.1098/rstb.2019.0637.
48 Blakeman S. and Mareschal D., A Complementary Learning Systems Approach to Temporal Difference Learning, Neural Networks. (2020) 122, 218–230, https://doi.org/10.1016/j.neunet.2019.10.011.
49 Deng Y. B. et al., Continual Object Detection via Prototypical Task Correlation Guided Gating Mechanism, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, 9255–9264.
50 Laborieux A., Ernoult M., Hirtzlin T., and Querlioz D., Synaptic Metaplasticity in Binarized Neural Networks Nat, 2021.
51 Meng Q., Shin’ichi, and Adinet S., Attribute Driven Incremental Network for Retinal Image Classification, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, 4033–4042.
52 Davis R. L. and Zhong Y., The Neurobiology of Forgetting, Nature Reviews Neuroscience. (2017) 18, 547–558.
53 Berry J. A., Cervantes-Sandoval I., Chakraborty M., and Davis R. L., Dopamine Mediates Adaptive Forgetting of Olfactory Memories in Drosophila, Current Biology. (2015) 25, 630–635.
54 Akkerman S., Blokland A., and Reneerkens O., Object Recognition Memory: Neurobiological Mechanisms of Encoding, Consolidation and Retrieval, Neuroscience & Biobehavioral Reviews. (2014) 22, 1–14.
55 Nader K., Schafe G. E., and LeDoux J. E., Memory Traces Unbound, Trends in Neurosciences. (2000) 23, 65–72.
56 Lee J. L., Everitt B. J., and Thomas K. L., Reconsolidation and Extinction of Conditioned Fear: Inhibition and Potentiation, Journal of Neuroscience. (2004) 24, 8308–8313.
57 Zhang W., Li D., Ma C., Zhai G., Yang X., and Ma K., Continual Learning for Blind Image Quality Assessment, IEEE Transactions on Pattern Analysis and Machine Intelligence. (2023) 45, no. 3, 2864–2878, https://doi.org/10.1109/TPAMI.2022.3178874.
58 Du F., Yang Y., Zhao Z., and Zeng Z., Efficient Perturbation Inference and Expandable Network for Continual Learning, Neural Networks. (2023) 159, 97–106, https://doi.org/10.1016/j.neunet.2022.10.030.
59 Fu Z., Wang Z., Xu X., Li D., and Yang H., Knowledge Aggregation Networks for Class Incremental Learning, Pattern Recognition. (2023) 137, https://doi.org/10.1016/j.patcog.2023.109310.
60 Zou X., Kolouri S., Pilly P. K., and Krichmar J. L., Neuromodulated Attention and Goal-Driven Perception in Uncertain Domains, Neural Networks. (2020) 125, 56–69, https://doi.org/10.1016/j.neunet.2020.01.031.
61 Madireddy S., Yanguas-Gil A., and Balaprakash P., Neuromodulated Neural Architectures with Local Error Signals for memory-constrained Online Continual Learning, 2020.
62 Beaulieu F., Miconi L., Lehman T., and Stanley J., Learning to Continually Learn, arXiv preprint arXiv:2002.09571. (2020).
63 Hwu T. and Krichmar J. L., A Neural Model of Schemas and Memory Encoding, Biological Cybernetics. (2020) 114, no. 2, 169–186, https://doi.org/10.1007/s00422-019-00808-7.
64 Tutum C., Abdulquddos S., and Miikkulainen R., Generalization of Agent Behavior Through Explicit Representation of Context, 2021 IEEE Conference on Games (CoG), 2021, IEEE, 1–7.
65 Mathieu E., Rainforth T., Siddharth N., and Teh Y. W., Disentangling Disentanglement in Variational Autoencoders, International Conference on Machine Learning, 2019, PMLR, 4402–4412.
66 Pandit T. and Kudithipudi D., Relational Neurogenesis for Lifelong Learning Agents, Proceedings of the Neuro-Inspired Computational Elements Workshop, 2020, 1–9, https://doi.org/10.1145/3381755.3381766.
67 Lee S., Ha J., Zhang D., and Kim G., A Neural Dirichlet Process Mixture Model for Task-free Continual Learning, 2020.
68 Stanley K. O., Clune J., Lehman J., and Miikkulainen R., Designing Neural Networks Through Neuroevolution, Nature Machine Intelligence. (2019) 1, 24–35, https://doi.org/10.1038/s42256-018-0006-z.
69 Van de Ven G. M., Siegelmann H. T., and Tolias A. S., Brain-Inspired Replay for Continual Learning with Artificial Neural Networks, Nature Communications. (2020) 11, no. 1, https://doi.org/10.1038/s41467-020-17866-2.
70 González O. C., Sokolov Y., Krishnan G. P., Delanois J. E., and Bazhenov M., Can Sleep Protect Memories from Catastrophic Forgetting?, eLife. (2020) 9, https://doi.org/10.7554/elife.51005.
71 Rolnick D., Ahuja A., Schwarz J., Lillicrap T., and Wayne G., Experience Replay for Continual Learning, Advances in Neural Information Processing Systems. (2019) 32.
72 Hwu T., Kashyap H. J., and Krichmar J. L., A Neurobiological Schema Model for Contextual Awareness in Robotics, 2020 International Joint Conference on Neural Networks (IJCNN), 2020, IEEE, 1–8.
73 Rusu A. A., Desjardins N. C., Soyer G., and Kirkpatrick H., Progressive Neural Networks, arXiv preprint arXiv:1606.04671. (2016).
74 Terekhov A. V., Montone G., and O’Regan J. K., Knowledge Transfer in Deep block-modular Neural Networks, Lecture Notes in Computer Science. (2015) 4, 268–279, https://doi.org/10.1007/978-3-319-22979-9_27, 2-s2.0-84947087639.
75 Soures N., Helfer P., Daram A., Pandit T., and Kudithipudi D., Tacos: Task Agnostic Continual Learning in Spiking Neural Networks, Theory and Foundation of Continual Learning Workshop at ICML’2021, 2021.
76 Zenke F., Poole B., and Ganguli S., Continual Learning Through Synaptic Intelligence, International Conference on Machine Learning, 2017, PMLR, 3987–3995.
77 Masse N. Y., Grant G. D., and Freedman D. J., Alleviating Catastrophic Forgetting Using Context-dependent Gating and Synaptic Stabilization, Proceedings of the National Academy of Sciences of the United States of America. (2018) 115, no. 44, E10467–E10475, https://doi.org/10.1073/pnas.1803839115, 2-s2.0-85055659881.
78 Tsuda B., Tye K. M., Siegelmann H. T., and Sejnowski T. J., A Modeling Framework for Adaptive Lifelong Learning with Transfer and Savings Through Gating in the Prefrontal Cortex, Proceedings of the National Academy of Sciences. (2020) 117, no. 47, 29872–29882, https://doi.org/10.1073/pnas.2009591117.
79 Mendez J. A. and Eaton E., Lifelong Learning of Compositional Structures, 2020.
80 Hong L. Y., Tao X., Dong X., Shi S., and Gong J., Model Behavior Preserving for class-incremental Learning, IEEE Transactions on Neural Networks and Learning Systems. (2022) 1–12.
81 Krishnan G. P., Tadros T., Ramyaa R., and Bazhenov M., Biologically Inspired Sleep Algorithm for Artificial Neural Networks, arXiv preprint arXiv:1908.02240. (2019).
82 Tang Y.-M., Peng Y.-X., and Zheng W.-S., Learning to Imagine: Diversify Memory for Incremental Learning Using Unlabeled Data, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). (2022) 9539–9548, https://doi.org/10.1109/cvpr52688.2022.00933.
83 Zhu K., Cao Y., Zhai W., Cheng J., and Zha Z.-J., Self-Promoted Prototype Refinement for few-shot class-incremental Learning, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). (2021) 6797–6806, https://doi.org/10.1109/cvpr46437.2021.00673.
84 Soltoggio A., Bullinaria J. A., Mattiussi C., Dürr P., and Floreano D., Evolutionary Advantages of Neuromodulated Plasticity in Dynamic, Reward-based Scenarios, Proceedings of the 11th International Conference on Artificial Life (Alife XI), 2008, MIT Press, 569–576.
85 Zohora F. T., Karia V., Daram A. R., Zyarah A. M., and Kudithipudi D., Metaplasticnet: Architecture with Probabilistic Metaplastic Synapses for Continual Learning, 2021 IEEE International Symposium on Circuits and Systems (ISCAS), 2021, IEEE, 1–5.
86 Tan H., Zhou Y., Tao Q., Rosen J., and van Dijken S., Bioinspired Multisensory Neural Network with Crossmodal Integration and Recognition, Nature Communications. (2021) 12, no. 1, https://doi.org/10.1038/s41467-021-21404-z.
87 Zeng T., Tang F., Ji D., and Si B., Neurobayesslam: Neurobiologically Inspired Bayesian Integration of Multisensory Information for Robot Navigation, Neural Networks. (2020) 126, 21–35, https://doi.org/10.1016/j.neunet.2020.02.023.
88 Han Y. n. and Liu J. w., Online Continual Learning via the Knowledge Invariant and spread-out Properties, Expert Systems with Applications. (2023) 213, https://doi.org/10.1016/j.eswa.2022.119004.
89 Imam N. and Cleland T. A., Rapid Online Learning and Robust Recall in a Neuromorphic Olfactory Circuit, Nature Machine Intelligence. (2020) 2, no. 3, 181–191, https://doi.org/10.1038/s42256-020-0159-4.
90 Soltoggio A., Short-Term Plasticity as cause–effect Hypothesis Testing in Distal Reward Learning, Biological Cybernetics. (2015) 109, no. 1, 75–94, https://doi.org/10.1007/s00422-014-0628-0, 2-s2.0-84922260944.
91 Fini E. et al., Self-Supervised Models Are Continual Learners, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, 9621–9630.
92 Marjaninejad A., Urbina-Meléndez D., Cohn B. A., and Valero-Cuevas F. J., Autonomous Functional Movements in a tendon-driven Limb via Limited Experience, Nature Machine Intelligence. (2019) 1, no. 3, 144–154, https://doi.org/10.1038/s42256-019-0029-0.
93 Miner T. J., Lamb N., Cox C., and Vineyard J., Neurogenesis Deep Learning: Extending Deep Networks to Accommodate New Classes, 2017 International Joint Conference on Neural Networks (IJCNN), 2017, IEEE, 526–533.
94 Parisi G. I., Tani J., Weber C., and Wermter S., Lifelong Learning of Spatiotemporal Representations with dual-memory Recurrent self-organization, Frontiers in Neurorobotics. (2018) 12, https://doi.org/10.3389/fnbot.2018.00078, 2-s2.0-85057716855.
95 Chaudhry A., Gordo A., Dokania P., Torr P., and Lopez-Paz D., Using Hindsight to Anchor past Knowledge in Continual Learning, Proceedings of the AAAI Conference on Artificial Intelligence. (2021) 35, no. 8, 6993–7001, https://doi.org/10.1609/aaai.v35i8.16861.
96 Xu J., Ma J., Gao X., and Zhu Z., Adaptive Progressive Continual Learning, IEEE Transactions on Pattern Analysis and Machine Intelligence. (2022) 44, no. 10, 6715–6728, https://doi.org/10.1109/tpami.2021.3095064.
97 Li A., Rao X., Zhou Y., and Restrepo D., Complex Neural Representation of Odour Information in the Olfactory Bulb, Acta Physiologica. (2020) 228, no. 1, https://doi.org/10.1111/apha.13333, 2-s2.0-85068605611.
98 Van de Ven G. M. and Tolias A. S., Three Scenarios for Continual Learning, 2019, https://arxiv.org/abs/1904.07734.
99 Saumweber T., Rohwedder A., Schleyer M. et al., Functional Architecture of Reward Learning in Mushroom Body Extrinsic Neurons of Larval Drosophila, Nature Communications. (2018) 9, no. 1, https://doi.org/10.1038/s41467-018-03130-1, 2-s2.0-85044220188.
100 Wang, Lehman R., Rawal J., Zhi A., and Li J., Enhanced Poet: Open-Ended Reinforcement Learning Through Unbounded Invention of Learning Challenges and Their Solutions, International Conference on Machine Learning, 2020, PMLR, 9940–9951.
101 Abraham W. C. and Bear M. F., Metaplasticity: the Plasticity of Synaptic Plasticity, Trends in Neurosciences. (1996) 19, no. 4, 126–130, https://doi.org/10.1016/s0166-2236(96)80018-x, 2-s2.0-0029984320.
102 Hulme S. R., Jones O. D., and Abraham W. C., Emerging Roles of Metaplasticity in Behaviour and Disease, Trends in Neurosciences. (2013) 36, no. 6, 353–362, https://doi.org/10.1016/j.tins.2013.03.007, 2-s2.0-84878602729.
103 Hendrikse S. C. F., Kluiver S., Treur J., Wilderjans T. F., Dikker S., and Koole S. L., How Virtual Agents Can Learn to Synchronize: an Adaptive Joint decision-making Model of Psychotherapy, Cognitive Systems Research. (2023) 79, 138–155, https://doi.org/10.1016/j.cogsys.2022.12.009.
104 Benna M. K. and Fusi S., Computational Principles of Synaptic Memory Consolidation, Nature Neuroscience. (2016) 19, no. 12, 1697–1706, https://doi.org/10.1038/nn.4401, 2-s2.0-84989912892.
105 Langille J. J. and Brown R. E., The Synaptic Theory of Memory: a Historical Survey and Reconciliation of Recent Opposition, Frontiers in Systems Neuroscience. (2018) 12, https://doi.org/10.3389/fnsys.2018.00052, 2-s2.0-85059031768.
106 Doya K., Metalearning and Neuromodulation, Neural Networks. (2002) 15, no. 4-6, 495–506, https://doi.org/10.1016/S0893-6080(02)00044-8, 2-s2.0-0036592023.
107 Tyulmankov D., Yang G. R., and Abbott L., Meta-Learning Synaptic Plasticity and Memory Addressing for Continual Familiarity Detection, Neuron. (2022) 110, no. 3, 544–557.e8, https://doi.org/10.1016/j.neuron.2021.11.009.
108 Nadim F. and Bucher D., Neuromodulation of Neurons and Synapses, Current Opinion in Neurobiology. (2014) 29, 48–56, https://doi.org/10.1016/j.conb.2014.05.003, 2-s2.0-84901841077.
109 Feng T., Wang M., and Yuan H., Overcoming Catastrophic Forgetting in Incremental Object Detection via Elastic Response Distillation, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, Los Alamitos, CA, IEEE Computer Society, 9417–9426.
110 Stein B. E., Stanford T. R., and Rowland B. A., Multisensory Integration and the Society for Neuroscience: Then and now, Journal of Neuroscience. (2020) 40, no. 1, 3–11, https://doi.org/10.1523/jneurosci.0737-19.2019.
111 Stein B. E., Stanford T. R., and Rowland B. A., Multisensory Integration and the Society for Neuroscience: Then and now, Journal of Neuroscience. (2020) 40, no. 1, 3–11, https://www.jneurosci.org/content/40/1/3.full.pdf, https://doi.org/10.1523/JNEUROSCI.0737-19.2019.
112 Kay L. M. and Laurent G., Odor- and Context-dependent Modulation of Mitral Cell Activity in Behaving Rats, Nature Neuroscience. (1999) 2, no. 11, 1003–1009, https://doi.org/10.1038/14801, 2-s2.0-0033325669.
113 Kudithipudi D., Aguilar-Simon M., Babb J. et al., Biological Underpinnings for Lifelong Learning Machines, Nature Machine Intelligence. (2022) 4, no. 3, 196–210, https://doi.org/10.1038/s42256-022-00452-0.
114 Kiselycznyk C. L., Zhang S., and Linster C., Role of Centrifugal Projections to the Olfactory Bulb in Olfactory Processing, Learning & Memory. (2006) 13, no. 5, 575–579, https://doi.org/10.1101/lm.285706, 2-s2.0-33749328104.
115 Benn Y., Webb T. L., Chang B. P. I., Sun Y. H., Wilkinson I. D., and Farrow T. F. D., The Neural Basis of Monitoring Goal Progress, Frontiers in Human Neuroscience. (2014) 8, https://doi.org/10.3389/fnhum.2014.00688, 2-s2.0-84933678663.
116 Bartol T. M. Jr, Bromer C., Kinney J. et al., Nanoconnectomic Upper Bound on the Variability of Synaptic Plasticity, Elife. (2015) 4, https://doi.org/10.7554/elife.10778, 2-s2.0-84955254040.
117 Chi Z., Gu L., Liu H., Wang Y., Yu Y., and Tang J., Metafscil: a meta-learning Approach for few-shot Class Incremental Learning, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, 14146–14155, https://doi.org/10.1109/cvpr52688.2022.01377.
118 Rosenfeld A. and Tsotsos J. K., Incremental Learning Through Deep Adaptation, IEEE Transactions on Pattern Analysis and Machine Intelligence. (2020) 42, no. 3, 651–663, https://doi.org/10.1109/tpami.2018.2884462, 2-s2.0-85057824333.
119 He J., Dong C., and Qiao Y., Modulating Image Restoration with Continual Levels via Adaptive Feature Modification Layers, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, Los Alamitos, CA, IEEE Computer Society, 11048–11056.
120 Ke L. T., Rahmani Q., Ho H., Ding R. E., and Liu H., Else-Net: Elastic Semantic Network for Continual Action Recognition from Skeleton Data, 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 2021, 13414–13423.
121 Liu Y., Schiele B., and Sun Q., Adaptive Aggregation Networks for class-incremental Learning, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, Los Alamitos, CA, IEEE Computer Society, 2544–2553.
122 Tian Z. C., Fan K., Meng B., Zhang G., and Pan Z., Continual Stereo Matching of Continuous Driving Scenes with Growing Architecture, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, 18879–18888.
123 Shan G., Xu S., Yang L., Jia S., and Xiang Y., Learn#: a Novel Incremental Learning Method for Text Classification, Expert Systems with Applications. (2020) 147, https://doi.org/10.1016/j.eswa.2020.113198.
124 Li H., Barnaghi P., Enshaeifar S., and Ganz F., Continual Learning Using Bayesian Neural Networks, IEEE Transactions on Neural Networks and Learning Systems. (2021) 32, no. 9, 4243–4252, https://doi.org/10.1109/tnnls.2020.3017292.
125 Zhang C.-B., Xiao J.-W., Liu X., Chen Y.-C., and Cheng M.-M., Representation Compensation Networks for Continual Semantic Segmentation, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, 7043–7054, https://doi.org/10.1109/cvpr52688.2022.00692.
126 Zhu K., Zhai W., Cao Y., Luo J., and Zha Z.-J., Self-Sustaining Representation Expansion for Non-exemplar class-incremental Learning, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, 9286–9295, https://doi.org/10.1109/cvpr52688.2022.00908.
127 Pratama M. and Wang D., Deep Stacked Stochastic Configuration Networks for Lifelong Learning of Non-stationary Data Streams, Information Sciences. (2019) 495, 150–174, https://doi.org/10.1016/j.ins.2019.04.055, 2-s2.0-85065260542.
128 Park G.-M., Yoo S.-M., and Kim J.-H., Convolutional Neural Network with Developmental Memory for Continual Learning, IEEE Transactions on Neural Networks and Learning Systems. (2021) 32, no. 6, 2691–2705, https://doi.org/10.1109/tnnls.2020.3007548.
129 Tao X. et al., Few-Shot class-incremental Learning, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, 12183–12192.
130 Verma V. K., Liang K. J., Mehta N., Rai P., and Carin L., Efficient Feature Transformations for Discriminative and Generative Continual Learning, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, 13865–13875.
131 Simon C., Faraki M., Tsai Y. H., Yu X., and Schulter S., On Generalizing Beyond Domains in cross-domain Continual Learning, 2022, https://doi.org/10.1109/CVPR52688.2022.00905.
132 Wiwatcharakoses C. and Berrar D., Soinn+, A Self-Organizing Incremental Neural Network for Unsupervised Learning from Noisy Data Streams, Expert Systems with Applications. (2020) 143, https://doi.org/10.1016/j.eswa.2019.113069.
133 Fayek H. M., Cavedon L., and Wu H. R., Progressive Learning: a Deep Learning Framework for Continual Learning, Neural Networks. (2020) 128, 345–357, https://doi.org/10.1016/j.neunet.2020.05.011.
134 Mahdavi E., Fanian A., Mirzaei A., and Taghiyarrenani Z., Itl-Ids: Incremental Transfer Learning for Intrusion Detection Systems, Knowledge-Based Systems. (2022) 253, https://doi.org/10.1016/j.knosys.2022.109542.
135 Kim G., Esmaeilpour S., Xiao C., and Liu B., Continual Learning Based on Ood Detection and Task Masking, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, 3856–3866.
136 Chen P., Zhang Y., Li Z., and Sun L., Few-Shot Incremental Learning for label-to-image Translation, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, 3687–3697, https://doi.org/10.1109/cvpr52688.2022.00368.
137 Liu L., Zheng T., Lin Y., Ni K., and Fang L., Ins-Conv: Incremental Sparse Convolution for Online 3d Segmentation, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
138 Mallya A. and Lazebnik S., Packnet: Adding Multiple Tasks to a Single Network by Iterative Pruning, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, 7765–7773.
139 Golkar S., Kagan M., and Cho K., Continual Learning via Neural Pruning, 2019.
140 Gao Q., Luo Z., Klabjan D., and Zhang F., Efficient Architecture Search for Continual Learning, IEEE Transactions on Neural Networks and Learning Systems. (2022) 1–11.
141 Yu H., Lu J., and Zhang G., Online Topology Learning by a Gaussian Membership-based self-organizing Incremental Neural Network, IEEE Transactions on Neural Networks and Learning Systems. (2020) 31, no. 10, 3947–3961, https://doi.org/10.1109/tnnls.2019.2947658.
142 Wiwatcharakoses C. and Berrar D., A self-organizing Incremental Neural Network for Continual Supervised Learning, Expert Systems with Applications. (2021) 185, https://doi.org/10.1016/j.eswa.2021.115662.
143 Mazumder P., Singh P., Rai P., and Namboodiri V. P., Rectification-Based Knowledge Retention for Task Incremental Learning, IEEE Transactions on Pattern Analysis and Machine Intelligence. (2022) 1–13.
144 Yang Y., Chen B., and Liu H., Bayesian Compression for Dynamically Expandable Networks, Pattern Recognition. (2022) 122, https://doi.org/10.1016/j.patcog.2021.108260.
145 Feng F., Hou L., She Q., Chan R. H. M., and Kwok J. T., Power Law in Deep Neural Networks: Sparse Network Generation and Continual Learning with Preferential Attachment, IEEE Transactions on Neural Networks and Learning Systems. (2022) 1–15.
146 Tomczak A. D., Blankevoort J., Calderara T., Cucchiara S., and Bejnordi R., Conditional Channel Gated Networks for task-aware Continual Learning, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, 3931–3940.
147 Ye F. and Bors A. G., Lifelong Infinite Mixture Model Based on knowledge-driven Dirichlet Process, 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 2021, Los Alamitos, CA, IEEE Computer Society, 10675–10684.
148 Gao H., Wu M., Chen Z. et al., Ssa-Icl: Multi-Domain Adaptive Attention with Intra-dataset Continual Learning for Facial Expression Recognition, Neural Networks. (2023) 158, 228–238, https://doi.org/10.1016/j.neunet.2022.11.025.
149 Douillard A., Rame A., Couairon G., and Cord M., DyTox: Transformers for Continual Learning with Dynamic Token Expansion, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, Los Alamitos, CA, IEEE Computer Society, 9275–9285.
150 Li Z. and Hoiem D., Learning Without Forgetting, IEEE Transactions on Pattern Analysis and Machine Intelligence. (2018) 40, no. 12, 2935–2947, https://doi.org/10.1109/tpami.2017.2773081, 2-s2.0-85035137409.
151 Zhou D.-W., Yang Y., and Zhan D.-C., Learning to Classify with Incremental New Class, IEEE Transactions on Neural Networks and Learning Systems. (2022) 33, no. 6, 2429–2443, https://doi.org/10.1109/tnnls.2021.3104882.
152 Wu D., Dai Q., Liu J., Li B., and Wang W., Deep Incremental Hashing Network for Efficient Image Retrieval, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, 9069–9077.
153 Wang Z., Li H.-X., and Chen C., Incremental Reinforcement Learning in Continuous Spaces via Policy Relaxation and Importance Weighting, IEEE Transactions on Neural Networks and Learning Systems. (2020) 31, no. 6, 1870–1883, https://doi.org/10.1109/tnnls.2019.2927320.
154 Titsias M. K., Schwarz J., Matthews A. G. d. G., Pascanu R., and Teh Y. W., Functional Regularisation for Continual Learning with Gaussian Processes, arXiv preprint arXiv:1901. (2019) .
155 Li D. and Zeng Z., Crnet: a Fast Continual Learning Framework with Random Theory, IEEE Transactions on Pattern Analysis and Machine Intelligence. (2023) 45, no. 9, 10731–10744, https://doi.org/10.1109/tpami.2023.3262853.
156 Lao Q., Mortazavi M., Tahaei M., Dutil F., and Fevens T., Focl: Feature-Oriented Continual Learning for Generative Models, 2020.
157 Mao F., Weng W., Pratama M., and Yee E. Y. K., Continual Learning via Inter-task Synaptic Mapping, Knowledge-Based Systems. (2021) 222, https://doi.org/10.1016/j.knosys.2021.106947.
158 Mazur M., Łukasz P., Knop S., Pagacz P., and Spurek P., Target Layer Regularization for Continual Learning Using cramer-wold Generator, 2021.
159 Hu H., Sener O., Sha F., and Koltun V., Drinking from a Firehose: Continual Learning with web-scale Natural Language, 2020.
160 Kang M., Park J., and Han B., Class-Incremental Learning by Knowledge Distillation with Adaptive Feature Consolidation, 2022, https://doi.org/10.1109/CVPR52688.2022.01560.
161 Zhang L., Wang S., Yuan F., Geng B., and Yang M., Lifelong Language Learning with Adaptive Uncertainty Regularization, Information Sciences. (2023) 622, 794–807, https://doi.org/10.1016/j.ins.2022.11.141.
162 Dong J., Wang L., Fang Z., Sun G., and Xu S., Federated class-incremental Learning, 2022, https://doi.org/10.1109/CVPR52688.2022.00992.
163 Xiao Z. M., Chang J., Fu Y., Liu X., and Pan A., Image de-raining via Continual Learning, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, 4905–4914.
164 Yu L., Twardowski B., Liu X., Herranz L., and Wang K., Semantic Drift Compensation for class-incremental Learning (2020), 2004.
165 Lee J., Hong H. G., Joo D., and Kim J., Continual Learning With Extended Kronecker-Factored Approximate Curvature, 2020, https://doi.org/10.1109/CVPR42600.2020.00902.
166 Wang S., Li X., Sun J., and Xu Z., Training Networks in Null Space of Feature Covariance for Continual Learning, 2021, https://doi.org/10.1109/CVPR46437.2021.00025.
167 Lin G., Chu H., and Lai H., Towards Better Plasticity-Stability Trade-Off in Incremental Learning: A Simple Linear Connector, 2022.
168 Tang S., Chen D., Zhu J., Yu S., and Ouyang W., Layerwise Optimization by Gradient Decomposition for Continual Learning, 2021, https://doi.org/10.1109/CVPR46437.2021.00951.
169 Wan T. S. T., Chen J.-C., Wu T.-Y., and Chen C.-S., Continual Learning for Visual Search with Backward Consistent Feature Embedding, 2022, https://doi.org/10.1109/CVPR52688.2022.01620.
170 Xu G., Liu Z., and Loy C. C., Computation-Efficient Knowledge Distillation via uncertainty-aware Mixup, Pattern Recognition. (2023) 138, https://doi.org/10.1016/j.patcog.2023.109338.
171 Zhao B., Xiao X., Gan G., Zhang B., and Xia S., Maintaining Discrimination and Fairness in Class Incremental Learning, 2019.
172 Yang B., Fan F., Ni R., Li J., Kiong L., and Liu X., Continual Learning-based Trajectory Prediction with Memory Augmented Networks, Knowledge-Based Systems. (2022) 258, https://doi.org/10.1016/j.knosys.2022.110022.
173 Corizzo R., Baron M., and Japkowicz N., Cpdga: Change Point Driven Growing auto-encoder for Lifelong Anomaly Detection, Knowledge-Based Systems. (2022) 247, https://doi.org/10.1016/j.knosys.2022.108756.
174 Wang Q., Liu J., Ji Z., Pang Y., and Zhang Z., Hierarchical Correlations Replay for Continual Learning, Knowledge-Based Systems. (2022) 250, https://doi.org/10.1016/j.knosys.2022.109052.
175 Belouadah E. and Popescu A., Il2m: Class Incremental Learning with Dual Memory, 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019, 583–592, https://doi.org/10.1109/iccv.2019.00067.
176 Cha H., Lee J., and Shin J. C., Contrastive Continual Learning, 2021.
177 Wang S., Laskar Z., Melekhov I., Li X., and Kannala J., Continual Learning for Image-based Camera Localization, 2022.
178 Kim C. D., Jeong J., Moon S., and Kim G., Continual Learning on Noisy Data Streams via self-purified Replay, 2021, https://doi.org/10.1109/ICCV48922.2021.00058.
179 Hu X., Tang K., Miao C., Hua X.-S., and Zhang H., Distilling Causal Effect of Data in class-incremental Learning, 2021, https://doi.org/10.1109/CVPR46437.2021.00395.
180 Wang Z., Liu L., Duan Y., Kong Y., and Tao D., Continual Learning with Lifelong Vision Transformer, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, 171–181, https://doi.org/10.1109/cvpr52688.2022.00027.
181 Santhakumar K. and Kasaei H., Lifelong 3d Object Recognition and Grasp Synthesis Using Dual Memory Recurrent self-organization Networks, Neural Networks. (2022) 150, 167–180, https://doi.org/10.1016/j.neunet.2022.02.027.
182 Lin Y., Ji P., Chen X., and He Z., Lifelong text-audio Sentiment Analysis Learning, Neural Networks. (2023) 162, 162–174, https://doi.org/10.1016/j.neunet.2023.02.008.
183 Liu Y., Su Y., Liu A.-A., Schiele B., and Sun Q., Mnemonics Training: Multi-Class Incremental Learning Without Forgetting, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, IEEE.
184 Michieli U. and Zanuttigh P., Continual Semantic Segmentation via repulsion-attraction of Sparse and Disentangled Latent Representations, 2021, https://doi.org/10.1109/CVPR46437.2021.00117.
185 Ganea D., Boom B., and Poppe R., Incremental few-shot Instance Segmentation, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, Los Alamitos, CA, IEEE Computer Society, 1185–1194.
186 Pu N., Chen W., Liu Y., Bakker E. M., and Lew M. S., Lifelong Person re-identification via Adaptive Knowledge Accumulation, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, Los Alamitos, CA, IEEE Computer Society, 7897–7906.
187 Douillard A., Chen Y., Dapogny A., and Cord M., Plop: Learning Without Forgetting for Continual Semantic Segmentation, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, Los Alamitos, CA, IEEE Computer Society, 4039–4049.
188 Rahman C. A., Fang S., Roy P., Petersson S. K., and Harandi L., Semantic-Aware Knowledge Distillation for few-shot class-incremental Learning, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, Los Alamitos, CA, IEEE Computer Society, 2534–2543.
189 Lu Y., Wang M., and Deng W., Augmented Geometric Distillation for Data-free Incremental Person Reid, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, 7319–7328, https://doi.org/10.1109/cvpr52688.2022.00718.
190 Su X., Guo S., Tan T., and Chen F., Generative Memory for Lifelong Learning, IEEE Transactions on Neural Networks and Learning Systems. (2020) 31, no. 6, 1884–1898, https://doi.org/10.1109/tnnls.2019.2927369.
191 Ostapenko O., Puscas M., Klein T., Jähnichen P., and Nabi M., Learning to Remember: A Synaptic Plasticity Driven Framework for Continual Learning, 2019.
192 Rahman C. A., Ramasinghe S., Fang S., Simon P., and Petersson C., Synthesized Feature Based few-shot class-incremental Learning on a Mixture of Subspaces, 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 2021, 8641–8650.
193 Yang W. L., Li K., Hong C., Li L., and Zhu Z. O., Effective and Efficient Usage of Incremental Unlabeled Data for Semi-supervised Continual Learning, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, 5379–5388.
194 Hsu Y.-C., Liu Y.-C., Ramasamy A., and Kira Z., Re-evaluating Continual Learning Scenarios: a Categorization and Case for Strong Baselines, 2018.
195 Li D., Liu S., Gao F., and Sun X., Continual Learning Classification Method with constant-sized Memory Cells Based on the Artificial Immune System, Knowledge-Based Systems. (2021) 213, https://doi.org/10.1016/j.knosys.2020.106673.
196 Fan L., Xiong P., Wei W., and Wu Y. F., A Unified Prototype Framework for few-sample Lifelong Active Recognition, 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 2021, 15374–15383.
197 Chaudhry A., Ranzato M., Rohrbach M., and Elhoseiny M., Efficient Lifelong Learning with a-gem, 2019.
198 Bang J., Koh H., Park S., Song H., and Ha J.-W., Online Continual Learning on a Contaminated Data Stream with Blurry Task Boundaries, 2022, https://doi.org/10.1109/CVPR52688.2022.00906.
199 Zhuang C., Huang S., Cheng G., and Ning J., Multi-Criteria Selection of Rehearsal Samples for Continual Learning, Pattern Recognition. (2022) 132, https://doi.org/10.1016/j.patcog.2022.108907.
200 Zhao H., Wang H., Fu Y., Wu F., and Li X., Memory-Efficient class-incremental Learning for Image Classification, IEEE Transactions on Neural Networks and Learning Systems. (2022) 33, no. 10, 5966–5977, https://doi.org/10.1109/tnnls.2021.3072041.
201 Ji Z., Liu J., Wang Q., and Zhang Z., Coordinating Experience Replay: a Harmonious Experience Retention Approach for Continual Learning, Knowledge-Based Systems. (2021) 234, https://doi.org/10.1016/j.knosys.2021.107589.
202 Li D., Gu M., Liu S., Sun X., Gong L., and Qian K., Continual Learning Classification Method with the Weighted k-nearest Neighbor Rule for time-varying Data Space Based on the Artificial Immune System, Knowledge-Based Systems. (2022) 240, https://doi.org/10.1016/j.knosys.2022.108145.
203 Gautam C., Parameswaran S., Mishra A., and Sundaram S., Tf-GCZSL: Task-Free Generalized Continual zero-shot Learning, Neural Networks. (2022) 155, 487–497, https://doi.org/10.1016/j.neunet.2022.08.034.
204 Boschini M., Bonicelli L., Buzzega P., Porrello A., and Calderara S., Class-Incremental Continual Learning into the Extended DER-Verse, IEEE Transactions on Pattern Analysis and Machine Intelligence. (2022) 1–16.
205 Chen Q., Sun J., Palade V., and Yu Z., Continual Relation Extraction via Linear Mode Connectivity and Interval Cross Training, Knowledge-Based Systems. (2023) 264, https://doi.org/10.1016/j.knosys.2023.110288.
206 Song Z. C., Lin N., Zheng G., Pan Y., and Xu P., Few-Shot Incremental Learning with Continually Evolved Classifiers, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, 12455–12464.
207 Zhu F., Zhang X.-Y., Wang C., Yin F., and Liu C.-L., Prototype Augmentation and self-supervision for Incremental Learning, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, 5871–5880.
208 Zhao T., Wang Z., Masoomi A., and Dy J., Deep Bayesian Unsupervised Lifelong Learning, Neural Networks. (2022) 149, 95–106, https://doi.org/10.1016/j.neunet.2022.02.001.
209 Zhang B., Guo Y., Li Y., He Y., Wang H., and Dai Q., Memory Recall: a Simple Neural Network Training Framework Against Catastrophic Forgetting, IEEE Transactions on Neural Networks and Learning Systems. (2022) 33, no. 5, 2010–2022, https://doi.org/10.1109/tnnls.2021.3099700.
210 Qiu Y., Shen Y., Sun Z. et al., Sats: Self-Attention Transfer for Continual Semantic Segmentation, Pattern Recognition. (2023) 138, https://doi.org/10.1016/j.patcog.2023.109383.
211 Zhou D. W., Wang F. Y., Ye H. J., and Ma L., Forward Compatible few-shot class-incremental Learning, 2022, https://doi.org/10.1109/CVPR52688.2022.00884.
212 Lao Q., Jiang X., Havaei M., and Bengio Y., A two-stream Continual Learning System with Variational domain-agnostic Feature Replay, IEEE Transactions on Neural Networks and Learning Systems. (2022) 33, no. 9, 4466–4478, https://doi.org/10.1109/tnnls.2021.3057453.
213 Graffieti G., Maltoni D., Pellegrini L., and Lomonaco V., Generative Negative Replay for Continual Learning, Neural Networks. (2023) 162, 369–383, https://doi.org/10.1016/j.neunet.2023.03.006.
214 Gao Y., Ascoli G. A., and Zhao L., Schematic Memory Persistence and Transience for Efficient and Robust Continual Learning, 2021, https://doi.org/10.1016/j.neunet.2021.08.011.
215 Toldo M. and Ozay M., Bring Evanescent Representations to Life in Lifelong Class Incremental Learning, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, 16711–16720.
216 Zhang X., Jiang M., Chen H., Zheng J., and Pan Z., Incorporating Geometry Knowledge into an Incremental Learning Structure for few-shot Intent Recognition, Knowledge-Based Systems. (2022) 251, https://doi.org/10.1016/j.knosys.2022.109296.
217 Ho S., Liu M., Du L., Gao L., and Xiang Y., Prototype-Guided Memory Replay for Continual Learning, IEEE Transactions on Neural Networks and Learning Systems. (2024) 35, no. 8, 10973–10983, https://doi.org/10.1109/tnnls.2023.3246049.
218 Lee E., Huang C.-H., and Lee C.-Y., Few-Shot and Continual Learning with Attentive Independent Mechanisms, Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, 9455–9464.
219 Joseph K. J., Rajasegaran J., Khan S., Khan F., and Balasubramanian V. N., Incremental Object Detection via meta-learning, IEEE Transactions on Pattern Analysis and Machine Intelligence. (2022) 44, no. 12, 9209–9216, https://doi.org/10.1109/tpami.2021.3124133.
220 Qin Y., Zhang W., Zhao C. et al., Prior-Knowledge and Attention Based meta-learning for few-shot Learning, Knowledge-Based Systems. (2021) 213, https://doi.org/10.1016/j.knosys.2020.106609.
221 Xue M., Zhang H., Song J., and Song M., Meta-Attention for vit-backed Continual Learning, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, 150–159, https://doi.org/10.1109/cvpr52688.2022.00025.
222 Volpi R., Larlus D., and Rogez G., Continual Adaptation of Visual Representations via Domain Randomization and meta-learning, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, 4443–4453.
223 Yu T., Quillen D., He Z., Julian R., and Narayan A., Meta-World: a Benchmark and Evaluation for multi-task and Meta Reinforcement Learning, 2021.
224 Rajasegaran J., Khan S., Hayat M., Khan F. S., and Shah M., iTAML: An Incremental task-agnostic meta-learning Approach, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, 13588–13597.
225 Wei X., Liu S., Xiang Y., Duan Z., Zhao C., and Lu Y., Incremental Learning Based multi-domain Adaptation for Object Detection, Knowledge-Based Systems. (2020) 210, https://doi.org/10.1016/j.knosys.2020.106420.
226 Li H., Dong W., and Hu B.-G., Incremental Concept Learning via Online Generative Memory Recall, IEEE Transactions on Neural Networks and Learning Systems. (2021) 32, no. 7, 3206–3216, https://doi.org/10.1109/tnnls.2020.3010581.
227 Wang Z., Zhang Y., Xu X., Fu Z., Yang H., and Du W., Federated Probability Memory Recall for Federated Continual Learning, Information Sciences. (2023) 629, 551–565, https://doi.org/10.1016/j.ins.2023.02.015.
228 Zhang W. Z., Lee Z., Zhang C. Y., Sun H., and Ren R., Learning to Prompt for Continual Learning, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, 139–149.
229 Tiwari R., Killamsetty K., Iyer R., and Shenoy P. G., Gradient Coreset Based Replay Buffer Selection for Continual Learning, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, 99–108.
230 Yang G. Y., Wei X., and Deng K., Not Just Selection, but Exploration: Online class-incremental Continual Learning via Dual View Consistency, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, 7442–7451.
231 Tian Y. Z., Shi Y., Guo X., Wang P., and Zha P., Continual Neural Mapping: Learning an Implicit Scene Representation from Sequential Observations, Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, 15782–15792.
232 Sun W., Li Q., Zhang J., Wang D., Wang W., and Geng Ya, Exemplar-Free Class Incremental Learning via Discriminative and Comparable Parallel one-class Classifiers, Pattern Recognition. (2023) 140, https://doi.org/10.1016/j.patcog.2023.109561.
233 Song G., Tan X., and Yang M., Deep Continual Hashing with gradient-aware Memory for cross-modal Retrieval, Pattern Recognition. (2023) 137, https://doi.org/10.1016/j.patcog.2022.109276.
234 Antonov D., Sviatov K., and Sukhov S., Continuous Learning of Spiking Networks Trained with Local Rules, Neural Networks. (2022) 155, 512–522, https://doi.org/10.1016/j.neunet.2022.09.003.
235 Ros F. and Guillaume S., Sampling Techniques for Supervised or Unsupervised Tasks, 2020, Springer, https://doi.org/10.1007/978-3-030-29349-9.
236 Aljundi R., Lin M., Goujaud B., and Bengio Y., Gradient Based Sample Selection for Online Continual Learning, Advances in Neural Information Processing Systems. (2019) 32.
237 Yoon J., Madaan D., Yang E., and Hwang S. J., Online Coreset Selection for Rehearsal-based Continual Learning, arXiv preprint arXiv:2106.01085. (2021) .
238 Shim D., Mai Z., Jeong J., Sanner S., Kim H., and Jang J., Online class-incremental Continual Learning with Adversarial Shapley Value, Proceedings of the AAAI Conference on Artificial Intelligence. (2021) 35, no. 11, 9630–9638, https://doi.org/10.1609/aaai.v35i11.17159.
239 Rebuffi S.-A., Kolesnikov A., Sperl G., and Lampert C. H., Icarl: Incremental Classifier and Representation Learning, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, 2001–2010.
240 Borsos Z., Mutny M., and Krause A., Coresets via Bilevel Optimization for Continual Learning and Streaming, Advances in Neural Information Processing Systems. (2020) 33, 14879–14890.
241 Yin S.-Y., Huang Y., Chang T.-Y., Chang S.-F., and Tseng V. S., Continual Learning with Attentive Recurrent Neural Networks for Temporal Data Classification, Neural Networks. (2023) 158, 171–187, https://doi.org/10.1016/j.neunet.2022.10.031.
242 Yan Q., Gong D., Liu Y., van den Hengel A., and Shi J. Q., Learning Bayesian Sparse Networks with Full Experience Replay for Continual Learning, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, 109–118, https://doi.org/10.1109/cvpr52688.2022.00021.
243 Zhang H. Z., Lan Z., Zeng C., Chu W., and You P., Lifelong Unsupervised Domain Adaptive Person re-identification with Coordinated Anti-forgetting and Adaptation, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, 14288–14297.
244 Bernardi S., Benna M. K., Rigotti M., Munuera J., Fusi S., and Salzman C. D., The Geometry of Abstraction in the Hippocampus and Prefrontal Cortex, Cell. (2020) 183, no. 4, 954–967.e21, https://doi.org/10.1016/j.cell.2020.09.031.
245 Hummos A., Thalamus: A brain-inspired Algorithm for biologically-plausible Continual Learning and Disentangled Representations, 2023.
246 Meulemans A., Carzaniga F. S., Suykens J. A. K., Sacramento J., and Grewe B. F., A Theoretical Framework for Target Propagation, CoRR. (2020) 14331.
247 Lomonaco V. and Maltoni D., Core50: a New Dataset and Benchmark for Continuous Object Recognition, Conference on Robot Learning, 2017, PMLR, 17–26.
248 Cossu A., Graffieti G., Pellegrini L., and Maltoni D., Is class-incremental Enough for Continual Learning?, 2021.
249 Cai Z., Sener O., and Koltun V., Online Continual Learning with Natural Distribution Shifts: an Empirical Study with Visual Data, Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, 8281–8290.
250 Lin Z., Shi J., Pathak D., and Ramanan D., The Clear Benchmark: Continual Learning on real-world Imagery, Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2), 2021.
251 Lan C., Feng F., Liu Q. et al., Towards Lifelong Object Recognition: a Dataset and Benchmark, Pattern Recognition. (2022) 130, https://doi.org/10.1016/j.patcog.2022.108819.
252 Verwimp E., Yang K., Parisot S. et al., Clad: a Realistic Continual Learning Benchmark for Autonomous Driving, Neural Networks. (2023) 161, 659–669, https://doi.org/10.1016/j.neunet.2023.02.001.
253 Wang L., Lei B., Li Q., Su H., Zhu J., and Zhong Y., Triple-Memory Networks: a brain-inspired Method for Continual Learning, IEEE Transactions on Neural Networks and Learning Systems. (2022) 33, no. 5, 1925–1934, https://doi.org/10.1109/tnnls.2021.3111019.
254 Hong Y., Mundt M., Park S., Uh Y., and Byun H., Return of the Normal Distribution: Flexible Deep Continual Learning with Variational auto-encoders, Neural Networks. (2022) 154, 397–412, https://doi.org/10.1016/j.neunet.2022.07.016.
255 van de Ven G. M., Tuytelaars T., and Tolias A. S., Three Types of Incremental Learning, Nature Machine Intelligence. (2022) 1–13.
256 Wu Z., Baek C., You C., and Ma Y., Incremental Learning via Rate Reduction, CoRR abs/2011. (2020) 14593.
257 Lomonaco V., Maltoni D., and Pellegrini L., Fine-Grained Continual Learning, 2019.
258 Hayes T. L., Kafle K., Shrestha R., Acharya M., and Kanan C., Remind Your Neural Network to Prevent Catastrophic Forgetting, Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part VIII 16, 2020, Springer, 466–483, https://doi.org/10.1007/978-3-030-58598-3_28.
259 Phan M. H., Phung S. L., Tran-Thanh L. et al., Class Similarity Weighted Knowledge Distillation for Continual Semantic Segmentation, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, 16866–16875.
260 Maracani A., Michieli U., Toldo M., and Zanuttigh P., Recall: Replay-Based Continual Learning in Semantic Segmentation, 2021.
261 Cermelli F., Mancini M., Bulò S. R., Ricci E., and Caputo B., Modeling the Background for Incremental Learning in Semantic Segmentation, 2020, https://doi.org/10.1109/CVPR42600.2020.00925.
262 Maltoni D. and Lomonaco V., Continuous Learning in single-incremental-task Scenarios, 2019, https://doi.org/10.1016/j.neunet.2019.03.010.
263 Lopez-Paz D. and Ranzato M., Gradient Episodic Memory for Continual Learning, 2022.
264 Powers S., Xing E., Kolve E., Mottaghi R., and Gupta A., Cora: Benchmarks, Baselines, and Metrics as a Platform for Continual Reinforcement Learning Agents, 2022.
265 Baker M. M., New A., Aguilar-Simon M. et al., A domain-agnostic Approach for Characterization of Lifelong Learning Systems, Neural Networks. (2023) 160, 274–296, https://doi.org/10.1016/j.neunet.2023.01.007.
266 Hayes T. L. and Kanan C., Lifelong Machine Learning with Deep Streaming Linear Discriminant Analysis, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020, 220–221.
267 Yan S., Xie J., and He X., DER: Dynamically Expandable Representation for Class Incremental Learning, CoRR abs/2103. (2021) 16788.
