Detecting modulated signals in modulated noise: (II) neural thresholds in the songbird forebrain
Abstract
Sounds in the real world fluctuate in amplitude. The vertebrate auditory system exploits patterns of amplitude fluctuations to improve signal detection in noise. One experimental paradigm demonstrating these general effects has been used in psychophysical studies of ‘comodulation detection difference’ (CDD). The CDD effect refers to the finding that thresholds for detecting a modulated narrowband noise signal are lower when the envelopes of flanking bands of modulated noise are comodulated with each other but fluctuate independently of the signal, compared with conditions in which the envelopes of the signal and flanking bands are all comodulated. Here, we report results from a study of the neural correlates of CDD in European starlings (Sturnus vulgaris). We manipulated: (i) the envelope correlations between a narrowband noise signal and a masker composed of six flanking bands of noise; (ii) the signal onset delay relative to masker onset; (iii) signal duration; and (iv) masker spectrum level. Masked detection thresholds were determined from neural responses using signal detection theory. Across conditions, the magnitude of neural CDD ranged between 2 and 8 dB, which is similar to that reported in a companion psychophysical study of starlings [U. Langemann & G.M. Klump (2007) Eur. J. Neurosci., 26, 1969–1978]. We found little evidence to suggest that neural CDD resulted from the across-channel processing of auditory grouping cues related to common envelope fluctuations and synchronous onsets between the signal and flanking bands. We discuss a within-channel model of peripheral processing that explains many of our results.
Introduction
An important goal of auditory neuroscience is to uncover the mechanisms by which the brain segregates behaviourally relevant signals from background masking noise (Feng & Ratnam, 2000; Carlyon, 2004). Natural sounds exhibit correlated fluctuations in amplitude (‘comodulation’) across the frequency spectrum, and the vertebrate auditory system can exploit this comodulation to improve signal detection (e.g. Klump, 1996; Nelken et al., 1999). In studies of ‘comodulation masking release’ (CMR; reviewed in Verhey et al., 2003), for example, thresholds for detecting a tone masked by a narrow band of amplitude-modulated noise are reduced when ‘flanking bands’ of noise with correlated envelopes are added to different regions of the frequency spectrum. Studies of ‘comodulation detection difference’ (CDD; McFadden, 1987) have revealed related effects. Compared with conditions in CDD experiments in which a modulated narrowband noise signal and flanking band noises all have correlated envelopes, detection thresholds are lower when the signal envelope fluctuates independently of the comodulated flanking bands (Cohen & Schubert, 1987; McFadden, 1987). There is some debate about the mechanisms underlying this CDD effect (Wright, 1990; Fantini & Moore, 1994; Borrill & Moore, 2002; Moore & Borrill, 2002; see Langemann & Klump, 2007, for further discussion). Investigating the perception of modulated signals in modulated noise and its underlying neural correlates in appropriate animal models could help to resolve this debate.
For humans, the problem of detecting amplitude-modulated signals in modulated noise is exemplified by the ‘cocktail party problem’ (Bronkhorst, 2000), which refers to the difficulty we have in perceiving speech in noisy social settings. Many other animals, such as songbirds singing in a dawn chorus, also face their own cocktail-party-like problems (Klump, 1996; Hulse, 2002; Bee & Micheyl, 2007), thus making them good candidates for studies of acoustic signal detection in noise. European starlings (Sturnus vulgaris) share an impressive number of similarities with humans in auditory processing (Klump et al., 2000). For example, previous psychoacoustic studies have shown that starlings experience a magnitude of CMR similar to that described for humans (Klump & Langemann, 1995; Klump et al., 2001; Langemann & Klump, 2001). In a companion study, Langemann & Klump (2007) show that starlings also experience magnitudes of CDD similar to those in humans. Here, we report results from a study of starlings that investigated signal detection thresholds in the tonotopically organized avian homologue of mammalian primary auditory cortex (field L2; Jarvis et al., 2005) using the same acoustic stimulus conditions tested by Langemann & Klump (2007). Our study had three objectives. First, we tested the hypothesis that neural detection thresholds exhibit patterns of CDD similar to those reported in the behavioural study of Langemann & Klump (2007). Second, we tested the hypothesis that improvements in the detection of a modulated signal in the presence of a modulated masker result from the across-channel grouping of spectrally separated sounds. Finally, we tested the hypothesis that a within-channel model of signal processing (Buschermöhle et al., 2006) could explain the patterns of neural responses, signal detection thresholds and CDD reported here.
Materials and methods
Readers are referred to our previous work on starlings for more detailed descriptions of surgical, electrophysiological and experimental procedures (e.g. Nieder & Klump, 1999a; Bee & Klump, 2004).
Experimental subjects
The subjects for this experiment were six wild-caught, adult starlings (three males, three females; 76–91 g). The birds were housed in individual cages (80 × 40 × 40 cm, L × W × H) located in a common room with other birds, given ad libitum access to food and water, and kept on a natural day/night cycle. The care and treatment of the animals were in accordance with the procedures of animal experimentation approved by the Bezirksregierung Weser-Ems and with the European Communities Council Directive of 24 November 1986 (86/609/EEC).
Surgery and electrophysiological recordings
Surgery was performed under general anaesthesia (Isoflurane: 5% for induction, 1.5–2.5% for maintenance). Anaesthetized animals were fixed in a stereotaxic holder with the bill inclined about 45° below the horizontal plane. Recording electrodes (3.6–12.1 MΩ; 1 kHz a/c) were fixed to a head-mounted microdrive and chronically implanted into field L2 of the right hemisphere. Two indifferent electrodes were implanted through a second small opening in the left rostral hemisphere. Recordings began 3–9 days following surgery.
Multi-unit recordings were made from awake and freely behaving birds placed in a test cage (56 × 36 × 33 cm, L × W × H), located inside a radio-shielded sound chamber (IAC 402A, Industrial Acoustics, Niederkrüchten, Germany). The inside dimensions of the chamber measured 193 × 183 × 198 cm (L × W × H), and the inside walls, ceiling and floor were covered with acoustic foam wedges to reduce reverberations (Illbruck waffle 125/65 mounted on Illbruck Plano 50 mm, absorption coefficient α > 0.85 for frequencies above 500 Hz). Due to the nearly anechoic conditions above 500 Hz in the setup, the reverberation time was sufficiently short for the measurements of CDD. The decay of reverberations after an amplitude peak of a stimulus was measured in 1/3-octave bands using a Stanford Research SR780 Analyser and a Brüel & Kjær 2238 Mediator sound level meter as the recording device. The drop in amplitude of the reverberations after a stimulus offset was 3.5 dB/ms, 5.1 dB/ms and 6.0 dB/ms for frequencies of 1, 2 and 4 kHz, respectively, indicating that the temporal structure of the envelope of the different frequency bands was not compromised.
Neural activity was recorded via radio telemetry using a small FM radio transmitter (FHC type 40-71-1, Frederick Haer). The radio signal was received by a dipole antenna inside the sound chamber and demodulated by an FM tuner (Technics ST-GT 550, Panasonic, Hamburg, Germany) located outside the chamber. These radio signals were bandpass filtered (600–4500 Hz), amplified, digitized (Sound Blaster PCI 128, 16-bit, 44.1 kHz) and stored on the hard drive of a Linux workstation (AMD Athlon™ XP 1900+) for later analysis. At the beginning of an experimental recording session, the microdrive was lowered stepwise until a site was found at which auditory-evoked activity was elicited in response to a series of test tones.
Once a suitable recording site was found, the bird was released into the test cage and given ad libitum access to food and water. Continuous video observations confirmed that subjects remained awake during recording sessions (Bee & Klump, 2004). At the completion of all experimental recordings, the subject was killed with an overdose of sodium pentobarbital, its brain was fixed by transcardial perfusion of Zamboni's reagent, and frozen sagittal sections (50 µm) were cut and stained with Cresyl violet to confirm the position of the electrodes (Nieder & Klump, 1999a).
Stimulus generation and presentation
Acoustic stimuli (44.1 kHz, 16-bit resolution) were generated using custom software that allowed for the synchronous playback of acoustic stimuli and recording of neural responses. The analogue sound output of the computer was attenuated (Hewlett-Packard 350D, Böblingen, Germany; TDT PA 4, Tucker-Davis Technologies, Alachua, FL, USA), amplified (Rotel RB-1050, Sussex, England) and presented through a speaker (Type SP3253, KEF Audio, Maidstone, England) mounted from the ceiling of the sound chamber approximately 70 cm above the position of a starling sitting in the test cage. The frequency response inside the test cage was flat (± 4 dB) over the range of frequencies used in this study.
Readers are referred to the study by Langemann & Klump (2007) for more detailed descriptions of stimulus synthesis. Briefly, the signal consisted of an amplitude-modulated narrowband noise (100 Hz bandwidth) that was spectrally centred within a masker composed of six flanking noise bands with a bandwidth of 100 Hz each (Fig. 1). The signal band and the flanking bands were generated on two separate channels in software, output through separate channels of the computer sound card, independently attenuated, and then added together at the input stage of the amplifier and output through the single overhead speaker. The centre frequency of the signal band was fixed to be the same as the recording site's characteristic frequency (CF) determined from a frequency-tuning curve (FTC; see below). The flanking bands were amplitude-modulated narrowband noises that were 600 ms in duration, were synchronously gated, and had centre frequencies differing from the recording site's CF by ± 300 Hz, ± 600 Hz and ± 900 Hz. These spectral separations between the flanking bands and the signal band fall within the range of spectral separations used in previous studies of humans (e.g. McFadden & Wright, 1990; Wright, 1990; Borrill & Moore, 2002; Moore & Borrill, 2002). The duration of the signal and its onset relative to the masker onset were two of the variables in our experimental design (see below). Signals and flanking band maskers were shaped with 10 ms onset and offset Hanning ramps.

Fig. 1. Example spectrograms of the three envelope conditions. Depicted here is the condition with a 400-ms signal band (SB) with a delayed onset (100 ms) relative to the onset of the 600-ms flanking bands (FB). Different shades of grey represent the envelope fluctuations of the noise bands (amplitude maxima and minima). (A) The SB and FBs had the same envelope in the ‘all correlated’ (AC) condition. (B) In the ‘co-uncorrelated’ (CU) condition, the FBs had a common envelope that was different from that of the SB. (C) In the ‘all uncorrelated’ (AU) condition, all noise band envelopes were different; see Langemann & Klump (2007) for depictions of the temporal waveforms.
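The band layout and the three envelope conditions can be illustrated with a short NumPy sketch. It is not the synthesis procedure of Langemann & Klump (2007), to which the text defers; it assumes that each 100-Hz-wide band is made by multiplying a 50-Hz lowpass Gaussian noise (the envelope) by a sinusoidal carrier, so that sharing one lowpass noise across bands produces comodulation. The function names (`lowpass_noise`, `gate`, `make_stimulus`) are hypothetical, and calibration to the stated SPLs is omitted.

```python
import numpy as np

FS = 44100  # sampling rate (Hz), as used in the study
FLANK_OFFSETS_HZ = (-900, -600, -300, 300, 600, 900)  # flanking-band offsets from CF


def lowpass_noise(n, cutoff_hz, rng):
    """Gaussian noise band-limited to 0..cutoff_hz by zeroing FFT bins."""
    spec = np.fft.rfft(rng.standard_normal(n))
    spec[np.fft.rfftfreq(n, 1 / FS) > cutoff_hz] = 0
    return np.fft.irfft(spec, n)


def gate(n_total, start_s, dur_s, ramp_s=0.010):
    """Rectangular gate with 10-ms Hanning onset and offset ramps."""
    g = np.zeros(n_total)
    i0, n, r = (int(round(v * FS)) for v in (start_s, dur_s, ramp_s))
    w = np.hanning(2 * r)
    seg = np.ones(n)
    seg[:r], seg[-r:] = w[:r], w[r:]
    g[i0:i0 + n] = seg
    return g


def make_stimulus(cf_hz, condition, delay_s=0.1, sig_dur_s=0.4,
                  mask_dur_s=0.6, rng=None):
    """Return (signal, masker, envelopes) for one stimulus realization.

    condition: 'AC', 'CU' or 'AU'. envelopes maps offset from CF (Hz) to the
    envelope used for that band (offset 0 is the signal band).
    """
    rng = np.random.default_rng() if rng is None else rng
    n = int(round(mask_dur_s * FS))
    t = np.arange(n) / FS
    shared = lowpass_noise(n, 50.0, rng)  # 50-Hz lowpass -> 100-Hz-wide band
    envs = {}
    for off in (0,) + FLANK_OFFSETS_HZ:
        if condition == "AC":    # signal and all flanking bands share one envelope
            envs[off] = shared
        elif condition == "CU":  # flanking bands share; signal is independent
            envs[off] = shared if off != 0 else lowpass_noise(n, 50.0, rng)
        elif condition == "AU":  # every band has its own envelope
            envs[off] = lowpass_noise(n, 50.0, rng)
        else:
            raise ValueError(f"unknown condition: {condition}")
    band = {off: envs[off] * np.sin(2 * np.pi * (cf_hz + off) * t) for off in envs}
    masker = sum(band[off] for off in FLANK_OFFSETS_HZ) * gate(n, 0.0, mask_dur_s)
    # In AC the signal is the gated portion of the same comodulated band.
    signal = band[0] * gate(n, delay_s, sig_dur_s)
    return signal, masker, envs
```

Generating the signal as a gated portion of a full 600-ms band keeps its envelope identical to the flanking-band envelope over the signal window in the AC condition, whatever the onset delay.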
Hypotheses
Our primary aims in this study were: (i) to test the hypothesis that starling forebrain neurons exhibit patterns of CDD similar to those described in the study by Langemann & Klump (2007); (ii) to test the hypothesis that neural CDD, if present, reflects the operation of across-channel processes related to auditory grouping (e.g. Borrill & Moore, 2002; Moore & Borrill, 2002); and (iii) to test a recently developed within-channel model of signal detection in comodulated noise (see below; Buschermöhle et al., 2006).
According to the ‘auditory grouping hypothesis’, signal detection is impaired when the signal is grouped with the flanking band maskers into a single auditory object, but improved when only the flanking bands are grouped together, because the signal can then be more easily segregated (and hence detected). We investigated whether two well-known auditory grouping cues, common patterns of amplitude modulation and synchronous onsets (Bregman, 1990, 1993; Cusack & Carlyon, 2004), function to bind signals and flanking bands together, thus impairing signal detection.
To investigate the role of common patterns of amplitude modulation, we tested three treatment levels of envelope correlation ‘within subjects’ (i.e. within a single recording session at a particular recording site). One condition was an ‘all correlated’ (AC) masking condition, in which the envelopes of the signal band and of each flanking band were correlated (Fig. 1A). In a second masking condition, the ‘co-uncorrelated’ (CU) condition, the envelopes of the six flanking bands were correlated with each other, but that of the signal band fluctuated independently of the flanking bands (Fig. 1B). In the third masking condition, the ‘all uncorrelated’ (AU) condition, the envelopes of the signal band and each flanking band were independently modulated (Fig. 1C). According to the general hypothesis that neural signal detection thresholds exhibit a CDD effect, we predicted that signal detection thresholds would be lower in the CU conditions than in the AC conditions (Table 1; e.g. McFadden, 1987). Following McFadden (1987), we calculated the magnitude of CDD as the difference between thresholds in the CU and AC conditions (CU − AC), so that negative values indicate a CDD effect. Further details on how the various correlation conditions were generated are provided in our companion paper (Langemann & Klump, 2007).
Table 1. Predicted ordering of signal detection thresholds under each hypothesis

| Hypothesis | Predictions |
|---|---|
| CDD hypothesis | CU < AC |
| Auditory grouping hypothesis | |
| Cue: common envelope modulation | CU < AU < AC |
| Cue: signal onset delay | 100 ms < 0 ms |

AC, all correlated; AU, all uncorrelated; CU, co-uncorrelated.
According to the auditory grouping hypothesis, we further predicted that signal detection thresholds would be highest in the AC conditions, lowest in the CU conditions and intermediate in the AU conditions (Table 1). We reasoned that the comodulated envelopes of the signal and flanking bands in the AC condition (Fig. 1A) should promote the grouping of the signal with the flanking bands, thus impairing signal detection. In the CU condition, in contrast, the flanking bands have comodulated envelopes that differ from the signal envelope (Fig. 1B); thus, we reasoned that the signal in these conditions should tend to ‘pop out’ from the grouped flanking bands, leading to relatively lower signal detection thresholds in the CU conditions. Notice that these predictions provide one potential explanation for the CDD effect (i.e. lower thresholds in the CU compared with the AC conditions) according to the hypothesis that correlated amplitude fluctuations promote across-channel grouping, but these predictions do not necessarily exclude the operation of other across-channel processes, such as ‘dip listening’. Based on an across-channel grouping hypothesis, we further predicted that thresholds in the AU condition should be intermediate between those in the AC and CU conditions because the signal and each flanking band have independently fluctuating envelopes (Fig. 1C). That is, for the AU condition, across-channel processes related to auditory grouping based on common amplitude fluctuation should neither improve signal detection (as in the CU condition) nor impair signal detection (as in the AC condition) because all noise bands fluctuated independently in the AU condition.
To investigate the role of synchronous onsets between the signal and masker, we tested two levels of signal onset delay within subjects. The signal onset was either synchronous with that of the flanking band maskers (0 ms signal onset delay), or the signal onset was delayed by 100 ms relative to the onset of the flanking bands. According to the auditory grouping hypothesis, we predicted that synchronous onsets would promote the grouping of signals and flanking bands and, thus, impair signal detection. Therefore, we expected to find lower signal detection thresholds when the signal onset was delayed by 100 ms relative to the onset of the flanking bands (Table 1).
To generalize our results for the specified treatment levels of correlation and signal onset delay, we manipulated two additional variables that included the signal duration (60 ms and 400 ms) and the spectrum level of the flanking band maskers [15 dB sound pressure level (SPL) or 50 dB SPL]. The former was manipulated within subjects, while the latter was manipulated ‘between subjects’ (i.e. at different recording sites in different recording sessions). The 15-dB and 50-dB masker spectrum levels correspond to overall masker levels of 42.8 dB SPL and 77.8 dB SPL, respectively. At the masker spectrum levels of 15 dB and 50 dB, the overall levels of each individual 100-Hz-wide flanking band were 35 dB SPL and 70 dB SPL, respectively.
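The stated overall levels follow from two standard conversions, assuming a flat spectrum within each band and non-overlapping bands whose powers add: the level of one band is the spectrum level plus 10 log10(bandwidth), and the level of six equal bands is that value plus 10 log10(6). A minimal check (function names are illustrative):

```python
import math


def band_level_db(spectrum_level_db, bandwidth_hz=100.0):
    """Overall level of one flat-spectrum band: L_s + 10 log10(BW)."""
    return spectrum_level_db + 10 * math.log10(bandwidth_hz)


def overall_masker_level_db(spectrum_level_db, n_bands=6, bandwidth_hz=100.0):
    """Power sum of n non-overlapping bands of equal level: + 10 log10(n)."""
    return band_level_db(spectrum_level_db, bandwidth_hz) + 10 * math.log10(n_bands)


for ls in (15, 50):
    print(ls, round(band_level_db(ls), 1), round(overall_masker_level_db(ls), 1))
```

This prints 35.0 and 42.8 dB SPL for the 15-dB spectrum level and 70.0 and 77.8 dB SPL for the 50-dB spectrum level, matching the values given in the text.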
Experimental design
The overall design of the experiment was a three-correlation (within: AU, AC, CU) × two-signal onset delay (within: 0 ms, 100 ms) × two-signal duration (within: 60 ms, 400 ms) × two-masker spectrum level (between: 15 dB, 50 dB) fully factorial design. Thus, at each masker spectrum level there were 12 possible combinations of correlation, signal onset delay and signal duration. For each of these 12 combinations, we created a sequence of 30 stimulus realizations, each consisting of a signal and six flanking bands with the specified combination of correlation, signal onset delay and signal duration. Within a sequence, stimulus realizations repeated with a period of 1.3 s (Fig. 2). The narrowband noises used to generate the signal and flanking bands in a sequence of 30 realizations were drawn randomly from a 60-s-long noise that was created de novo at each new recording site for each of the 12 stimulus conditions tested.

Fig. 2. Oscillograms of stimuli and neural responses. (A) Two exemplars of a 60-ms signal presented with a 100-ms delay relative to the onset of the 600-ms flanking bands in the AC condition. (B) Multiunit neural responses from a representative recording in response to the two stimulus exemplars depicted in (A). The figure also shows the relative timing of the stimulus period, the signals and the maskers, and of the neural responses to the signals and maskers.
To determine neural thresholds, we varied the overall level of the signal in 5-dB steps between 0 and 85 dB SPL (18 levels for the 15-dB masker spectrum level) or between 5 and 100 dB SPL (20 levels for the 50-dB masker spectrum level) while keeping the long-term spectrum level of the masker constant. At each recording site, all 12 stimulus sequences were presented once at each signal level and the nominal signal level within a sequence was constant. At both masker spectrum levels, we also determined the neural response to the masker alone by presenting each of the 12 stimulus sequences with the signal channel muted. Within a recording session at each new recording site, the possible combinations of envelope correlation, signal duration, signal onset delay and nominal signal level were presented in a randomized order. The attenuator settings used to realize the nominal signal and masker levels stated above were calibrated using 30-s-long stimuli recorded with the microphone of a Brüel and Kjær Type 2238 sound level meter placed at the approximate position of a bird's head in the test cage.
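The design and level grids above can be expressed as a per-site presentation schedule. This is a sketch under stated assumptions: the masker-alone sequences (represented here by a level of `None`) are assumed to be randomized together with the signal-plus-masker sequences, and the names `signal_levels_db` and `session_schedule` are hypothetical.

```python
import itertools
import random

CORRELATIONS = ("AC", "CU", "AU")
DELAYS_MS = (0, 100)
DURATIONS_MS = (60, 400)
REALIZATIONS_PER_SEQUENCE = 30
PERIOD_S = 1.3  # repetition period of stimulus realizations within a sequence


def signal_levels_db(spectrum_level_db):
    """Nominal signal levels (dB SPL) in 5-dB steps for each masker spectrum level."""
    grids = {15: range(0, 86, 5), 50: range(5, 101, 5)}
    return list(grids[spectrum_level_db])


def session_schedule(spectrum_level_db, seed=None):
    """Randomized list of (condition, level) sequences for one recording site.

    A level of None stands for the masker-alone sequence (signal channel muted).
    """
    conditions = list(itertools.product(CORRELATIONS, DELAYS_MS, DURATIONS_MS))
    levels = signal_levels_db(spectrum_level_db) + [None]
    schedule = [(cond, lvl) for cond in conditions for lvl in levels]
    random.Random(seed).shuffle(schedule)
    return schedule


sched = session_schedule(15, seed=0)
hours = len(sched) * REALIZATIONS_PER_SEQUENCE * PERIOD_S / 3600
print(len(sched), round(hours, 2))
```

Under these assumptions, a 15-dB site receives 12 × 19 = 228 sequences, or about 2.47 h of stimulus playback, ignoring pauses and time spent searching for a site.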
Data acquisition and threshold determination
We quantified neural activity by calculating the mean multiunit firing rate (impulses/s) over the duration of the signal, averaged over artefact-free responses to 20 stimulus realizations in each sequence of 30 such realizations (see Bee & Klump, 2004). The timing of analysis windows was delayed by 14 ms relative to the signal onset to compensate for the average response latency of field L2 neurons. Within each analysis window, a neural response (‘impulse’) was scored when the voltage trace of the multiunit recording exceeded a predetermined ‘impulse threshold’. All of our recordings had sufficient signal-to-noise ratios (> 2 : 1) to reliably discriminate multiunit neural activity from the background noise floor created by the preamplifier of the head-mounted radio transmitter (Fig. 2 shows a recording that just passed this criterion). Spike-sorting methods for single-unit analyses were explored and found to be unreliable given the typical amplitude ratio of between 2 : 1 and 4 : 1. One potential problem with analyses based on multiunit impulse rates is that these rates could vary with stimulus-dependent or level-dependent changes in the relative latencies of action potentials generated by the different neurons comprising a multiunit cluster. To evaluate whether this was the case for our data, we computed a phase-independent estimate of neural activity as the integral of the part of the voltage trace within an analysis window that exceeded the impulse threshold, again averaged over artefact-free responses to 20 signals in each stimulus sequence and standardized to 1 s. Threshold estimates based on multiunit impulse rates and on integrated multiunit activity differed by less than 1 dB on average. Moreover, across the 12 stimulus conditions at the two spectrum levels, the correlation between mean thresholds based on impulse rates and integrals was r = 0.95 (P < 0.001, N = 24). Below, we report thresholds based on impulse rates.
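The two activity measures described above can be sketched as follows. Two interpretive assumptions are built in: an impulse is counted at each upward crossing of the impulse threshold (positive polarity), and the "integral of that part of the voltage trace that exceeded the impulse threshold" is taken as the time integral of the suprathreshold sample values. The function names are hypothetical.

```python
import numpy as np

FS = 44100          # sampling rate of the digitized recordings (Hz)
LATENCY_S = 0.014   # analysis-window delay relative to signal onset (s)


def analysis_window(trace, onset_s, dur_s, fs=FS, latency_s=LATENCY_S):
    """Slice of a recording analysed for one signal presentation."""
    i0 = int(round((onset_s + latency_s) * fs))
    return trace[i0:i0 + int(round(dur_s * fs))]


def impulse_rate(segment, thresh, fs=FS):
    """Impulses/s, counting each upward crossing of thresh as one impulse."""
    above = segment > thresh
    n_imp = int(above[0]) + int(np.count_nonzero(above[1:] & ~above[:-1]))
    return n_imp / (len(segment) / fs)


def integrated_activity(segment, thresh, fs=FS):
    """Phase-independent measure: integral of suprathreshold voltage per second."""
    integral = segment[segment > thresh].sum() / fs  # V*s within the window
    return integral / (len(segment) / fs)             # standardized to 1 s


def mean_over_realizations(segments, measure, thresh):
    """Average a measure across artefact-free responses (20 per sequence)."""
    return float(np.mean([measure(s, thresh) for s in segments]))
```

Because the integral depends only on how much suprathreshold voltage falls in the window, not on whether spikes from different units overlap in time, it is less sensitive than crossing counts to latency-dependent summation of action potentials.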



Within-channel model
Buschermöhle et al. (2006) recently proposed a within-channel model to explain the CDD effect by simulating signal processing in the auditory periphery. The model's basic assumption is that the mean neural firing rate of auditory neurons reflects the mean compressed envelope of the filtered stimulus. This assumption is based on the well-known compression and frequency filtering that occur in the peripheral auditory systems of birds and mammals (e.g. Köppl & Yates, 1999; Robles & Ruggero, 2001), as well as the observation that the firing rates of auditory nerve fibres can follow a temporally modulated stimulus envelope (e.g. Joris et al., 2004). Buschermöhle et al. (2006) derived analytical expressions for approximating the time- and trial-averaged compressed envelope values of spectrally filtered stimuli that resembled those used in the present study. In comparison to results from one multiunit recording site in the starling forebrain, the Buschermöhle et al. (2006) model successfully reproduced the rate-level functions and detection thresholds in the three correlation conditions (the CU, AC and AU correlation conditions; 400-ms signal duration, 100-ms signal onset delay, 50-dB spectrum level). Here, we assess the robustness of the Buschermöhle et al. (2006) approach. We used their model to simulate the signal detection thresholds that were determined from the actual recording sites on which our analyses are based, using as input the same stimulus sequences and experimental parameters we used to evoke responses from forebrain neurons. In other words, we ran the model in a way that simulated how each of our recording sites should respond as a function of signal level to stimuli that varied in correlation condition (CU, AC, AU), signal onset delay (0 ms, 100 ms), signal duration (60 ms, 400 ms) and masker spectrum level (15 dB, 50 dB).
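The model's core idea (filter, extract the envelope, compress it, average it over time, then map it to a firing rate) can be sketched numerically. This is not the analytical formulation of Buschermöhle et al. (2006). The Gaussian filter shape and its width `sigma_hz`, the power-law compression `env ** beta`, and the form of the saturating function linking the mean compressed envelope to the rate through `r0`, `rmax` and `c` are all assumptions made for illustration.

```python
import numpy as np

FS = 44100


def analytic_envelope(x):
    """Hilbert envelope |x + i H{x}|, computed via the FFT."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * h))


def peripheral_filter(x, cf_hz, sigma_hz=150.0, fs=FS):
    """Assumed Gaussian-shaped bandpass filter centred on the site's CF."""
    spec = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), 1 / fs)
    return np.fft.irfft(spec * np.exp(-0.5 * ((f - cf_hz) / sigma_hz) ** 2), len(x))


def model_rate(x, cf_hz, beta, r0, rmax, c):
    """Mean firing rate from the mean compressed envelope of the filtered input."""
    env = analytic_envelope(peripheral_filter(x, cf_hz))
    m = np.mean(env ** beta)                # compression, then time average
    return r0 + (rmax - r0) * m / (m + c)   # hypothetical saturating function
```

Because the envelope is compressed before it is averaged, two stimuli with the same long-term power but different envelope fluctuations (for example, the AC and CU mixtures) produce different mean rates. This is how a purely within-channel mechanism can yield correlation-dependent thresholds.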


Fig. 3. Schematic diagram showing the stages of the peripheral model. Clockwise from top left to bottom left, the panels show: (A) the stylized spectrum of the raw stimulus, with the signal band in black and the flanking bands in grey; (B) the spectrum of the filtered stimulus along with the corresponding filter shape; (C) the time signal with its envelope; (D) the time signal with the compressed envelope; (E) the mean firing rates as a function of signal level, calculated from the compressed envelopes using a saturating function (line styles for the three correlation conditions are identified in the legend in F); and (F) the sensitivity curves (da), with the signal detection criterion shown by the dash-dotted line. AC, all correlated; AU, all uncorrelated; CU, co-uncorrelated.
A total of four model parameters could be adjusted for each recording site, and each combination of signal onset delay and signal duration. Because of the phasic-tonic response properties of field L2 neurons (e.g. Nieder & Klump, 1999a), different fitted parameters for r0, rmax and c were required to model responses to the two signal onset delays (0 ms, 100 ms) and the two signal durations (60 ms, 400 ms). Model parameters were determined as follows. The four free parameters (β, r0, rmax and c) were set to realistic initial values for each recording site, and then varied within physiologically realistic boundaries in order to minimize the mean squared differences between the model firing rates and the experimentally obtained firing rates from field L2 neurons. This procedure was performed independently for all four combinations of signal duration and signal onset delay to derive initial estimates of the optimal model parameters. We assumed that the compression β was constant for a given recording site; therefore, we averaged the derived values of the compression β across the four combinations of signal onset delay and signal duration. With β fixed, the remaining three parameters were readjusted for each combination of signal onset delay and signal duration by again minimizing the mean squared differences between the model and experimental firing rates. Hence, the same value of β was used for a given recording site to model the rate-level curves for all 12 stimulus conditions tested at that recording site (three correlations × two signal onset delays × two signal durations), whereas different values of r0, rmax and c were used to model the rate-level curves for the different combinations of signal onset delay and signal duration.
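The two-stage fitting procedure can be sketched as follows. As a stand-in for the (unspecified) optimizer, this sketch uses an exhaustive search over bounded parameter grids. It also assumes that, for each signal level, the envelope samples of the filtered stimulus are precomputed, and it reuses the hypothetical saturating function from the model sketch. All names are illustrative.

```python
import itertools
import numpy as np


def predicted_rates(envs, beta, r0, rmax, c):
    """envs: (n_levels, n_samples) filtered-stimulus envelopes, one row per level."""
    m = np.mean(envs ** beta, axis=1)         # mean compressed envelope per level
    return r0 + (rmax - r0) * m / (m + c)     # hypothetical saturating function


def mse(curves, params):
    """Mean squared model-data difference pooled over correlation conditions."""
    err = [predicted_rates(e, *params) - r for e, r in curves]
    return float(np.mean(np.concatenate(err) ** 2))


def grid_fit(curves, betas, r0s, rmaxs, cs):
    """Exhaustive search within bounded grids (stand-in for an optimizer)."""
    return min(itertools.product(betas, r0s, rmaxs, cs), key=lambda p: mse(curves, p))


def two_stage_fit(site, grids):
    """site: {(delay_ms, dur_ms): [(envs, rates) for AC, CU, AU]}."""
    # Stage 1: fit all four parameters separately for each delay x duration.
    stage1 = {k: grid_fit(v, grids["beta"], grids["r0"], grids["rmax"], grids["c"])
              for k, v in site.items()}
    # Compression is assumed constant per site: average beta across combinations.
    beta = float(np.mean([p[0] for p in stage1.values()]))
    # Stage 2: refit r0, rmax and c with beta fixed.
    stage2 = {k: grid_fit(v, [beta], grids["r0"], grids["rmax"], grids["c"])[1:]
              for k, v in site.items()}
    return beta, stage2
```

Within each delay × duration combination, the three correlation conditions share one parameter set. Differences between their predicted rate-level curves therefore come only from the stimulus envelopes, as the model intends.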
Finally, the average firing rates were converted to measures of sensitivity (da) to determine the model's signal detection threshold as the signal level at which da first exceeded a threshold criterion of 1.8 (Fig. 3F). This is the same threshold criterion used for determining neural response thresholds. In the model, da was calculated from the model rate-level curves by subtracting the rate response to the masker alone from the response to the signal plus masker and then dividing by an average standard deviation, σ, of neural firing rates. Averaged across experimental conditions, the standard deviations of the physiologically determined impulse rates elicited in response to the two signal durations were different (mean ± 95% CI: 60-ms condition, 50.8 ± 0.4 impulses/s; 400-ms condition, 25.1 ± 0.3 impulses/s); therefore, we used standard deviations of σ60 = 50.8 impulses/s and σ400 = 25.1 impulses/s for all recordings with 60-ms and 400-ms signal durations, respectively. We did not vary the standard deviation based on other manipulated variables (e.g. correlation and signal onset delay); therefore, differences in firing rate variability could not account for any CDD effect in the model results.
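The threshold rule can be written in a few lines. The sketch takes the lowest tested signal level at which da exceeds 1.8, without interpolating between levels; whether the original analysis interpolated is not stated here, so this is an assumption. The example rates in the usage check are made up for illustration.

```python
import numpy as np

SIGMA = {60: 50.8, 400: 25.1}   # impulses/s, by signal duration (ms)
CRITERION = 1.8


def sensitivity(rates_sm, rate_m, duration_ms):
    """d_a = (signal-plus-masker rate - masker-alone rate) / sigma."""
    return (np.asarray(rates_sm, dtype=float) - rate_m) / SIGMA[duration_ms]


def detection_threshold(levels_db, rates_sm, rate_m, duration_ms, criterion=CRITERION):
    """Lowest tested signal level at which d_a exceeds the criterion (None if never)."""
    above = np.flatnonzero(sensitivity(rates_sm, rate_m, duration_ms) > criterion)
    return float(levels_db[above[0]]) if above.size else None


def cdd_db(threshold_cu, threshold_ac):
    """CDD magnitude as CU - AC; negative values indicate a CDD effect."""
    return threshold_cu - threshold_ac
```

Because σ depends only on signal duration, any difference between the CU and AC thresholds computed this way must come from the rate-level curves themselves.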
Effects of frequency tuning
A secondary aim of this study was to investigate the influence of frequency tuning on the magnitude of CDD and on threshold differences that result from effects related to envelope correlations and common onsets. At each new recording site, and prior to presenting our experimental stimuli, we generated a pure-tone FTC (Fig. 4) using procedures fully described elsewhere (Bee & Klump, 2004). We derived four quantitative descriptors of each recording site's FTC: (i) the recording site's response threshold (defined as the lowest level at which the firing rate exceeded the spontaneous rate by a factor of 1.8); (ii) the CF, as the tone frequency with the lowest threshold; (iii) the bandwidth of the excitatory response field at 10 dB above the CF threshold; and (iv) the FTC's Q10dB value (defined as the CF divided by the bandwidth at 10 dB above threshold).
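The four FTC descriptors can be computed from a level × frequency matrix of firing rates. In this sketch, bandwidths are measured between sampled frequencies without interpolation, and the 10-dB bandwidth is taken as the contiguous frequency range around CF whose threshold lies within 10 dB of the CF threshold. Both choices are assumptions, and `ftc_descriptors` is a hypothetical name.

```python
import numpy as np

FACTOR = 1.8  # response criterion relative to the spontaneous rate


def ftc_descriptors(freqs_hz, levels_db, rates, spont_rate):
    """rates: array (n_levels, n_freqs); levels_db must be in ascending order."""
    freqs = np.asarray(freqs_hz, dtype=float)
    levels = np.asarray(levels_db, dtype=float)
    excit = np.asarray(rates) > FACTOR * spont_rate
    # Threshold per frequency: lowest level with an excitatory response.
    thr = np.array([levels[np.argmax(col)] if col.any() else np.inf for col in excit.T])
    i_cf = int(np.argmin(thr))
    cf, thr_cf = freqs[i_cf], thr[i_cf]
    # Contiguous range around CF whose threshold is <= CF threshold + 10 dB.
    ok = thr <= thr_cf + 10
    lo, hi = i_cf, i_cf
    while lo > 0 and ok[lo - 1]:
        lo -= 1
    while hi < len(freqs) - 1 and ok[hi + 1]:
        hi += 1
    bw10 = freqs[hi] - freqs[lo]
    q10 = cf / bw10 if bw10 > 0 else np.inf
    return {"threshold_db": thr_cf, "cf_hz": cf, "bw10_hz": bw10, "q10db": q10}
```

For a symmetric V-shaped tuning curve whose threshold rises by 5 dB per 100 Hz from 10 dB SPL at 2 kHz, this returns a CF of 2000 Hz, a 10-dB bandwidth of 400 Hz and a Q10dB of 5.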

Fig. 4. FTCs based on multiunit activity recorded in field L2. The solid black lines in (A–C) delineate the excitatory region where the impulse rate was above threshold, defined as being 1.8 times greater than the spontaneous rate. The thin black lines in (B) and (C) delineate areas of suppression in which the impulse rate was less than the spontaneous rate divided by 1.8. Each plot also depicts seven vertical bars of 100-Hz bandwidth that represent the signal (black bar) and the six flanking bands (grey bars). Thin lines to the left and right of each FTC depict the lower and upper frequencies over which FTCs were determined. (A–C) illustrate three broad categories into which FTCs could be grouped based on the dispersion of the centre frequencies of the flanking bands relative to the frequencies where suppression was observed in the FTC. (A) Type 1 FTC: no suppression was observed within the range of frequencies and levels over which the FTC was determined; (B) Type 2 FTCs: there were areas of suppression, but the flanking bands did not extend into the frequencies where suppression was observed; (C) Type 3 FTCs: one or more flanking bands fell in a frequency region where suppression was observed. SPL, sound pressure level.
As illustrated in Fig. 4, the excitatory regions of FTCs in field L2 were often flanked by one or two suppressive sidebands, but in some cases no suppression was observed (Nieder & Klump, 1999a). As a qualitative descriptor of these differences, we divided FTCs into three categories that related the spectral separation between the signal band and six flanking bands (CF ± 300 Hz, CF ± 600 Hz and CF ± 900 Hz) to the frequency-tuning characteristics of the recording site. In Type 1 FTCs (Fig. 4A), there were no suppressive sidebands in the frequency region over which the FTC was generated and, therefore, none of the flanking bands fell into areas of suppression. Recording sites with Type 2 FTCs (Fig. 4B) had clear areas of suppression, but the masker flanking bands did not extend into the range of frequencies where suppression was observed in the FTC. For sites with Type 3 FTCs (Fig. 4C), one or more of the flanking bands fell in a frequency region where suppression was observed in the FTC. There were no FTCs for which only the signal band fell into the excitatory region; usually several flanking bands also fell in the excitatory region. To examine the potential effects of these categorical differences, we included FTC type as a between-subjects factor in our statistical analyses.
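The three-way classification can be stated compactly in code. Following the Fig. 4 legend, the sketch compares flanking-band centre frequencies with the frequency ranges in which suppression was observed. It assumes, for simplicity, that suppression is summarized as frequency ranges pooled across levels. The input format and function names are hypothetical.

```python
def flanking_cfs(cf_hz):
    """Centre frequencies of the six flanking bands (CF +/- 300, 600, 900 Hz)."""
    return [cf_hz + d for d in (-900, -600, -300, 300, 600, 900)]


def ftc_type(suppression_ranges_hz, cf_hz):
    """Classify an FTC as Type 1, 2 or 3.

    suppression_ranges_hz: list of (f_low, f_high) ranges where suppression was
    observed at any tested level (hypothetical input format).
    """
    if not suppression_ranges_hz:
        return 1  # no suppressive sidebands at all
    in_suppression = any(lo <= f <= hi
                         for f in flanking_cfs(cf_hz)
                         for lo, hi in suppression_ranges_hz)
    return 3 if in_suppression else 2
```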
Statistical analyses
We analysed our data using multiway analyses of variance (anova) or covariance (ancova) and other parametric statistics (e.g. t-tests, Pearson's product-moment correlations). In describing the results of various statistical analyses, we also report partial η² as a measure of effect size. All analyses were performed using Statistica 7.0, and an experiment-wide criterion of α = 0.05 was used to determine statistical significance.
To examine the potential influence of frequency tuning on neural responses, we explored the use of properties of the FTC (CF, threshold, bandwidth at 10 dB above threshold, Q10dB) as covariates in our analyses. Only properties that were significantly correlated with the dependent variables across experimental conditions following a sequential Bonferroni correction (Rice, 1989) for multiple comparisons were used as covariates. In our analyses of the CDD effect, the correlation between the magnitude of CDD and the CF in the 0-ms delay/60-ms duration condition was significant (r = −0.48; P = 0.006; N = 32); however, none of the other 15 correlations between our four quantitative descriptors of FTCs and the magnitudes of CDD in the other signal onset delay × signal duration conditions was significant (0.01 ≤ |r| ≤ 0.30; 0.092 ≤ P ≤ 0.935; N = 32). Therefore, no covariates were used in our analyses of the magnitude of CDD. In our analyses of masked thresholds, we included Q10dB as a covariate because the correlations between masked thresholds and Q10dB were significant and negative (−0.56 < r < −0.43; 0.001 < P < 0.013; N = 32) across all 12 stimulus combinations of envelope correlation, signal onset delay and signal duration. There were no significant correlations between the masked detection thresholds and the CF (−0.42 < r < −0.19), the pure-tone threshold (0.15 < r < 0.47) or the bandwidth at 10 dB above threshold (0.28 < r < 0.46).
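The sequential Bonferroni correction described by Rice (1989) is Holm's step-down procedure. The P values are sorted in ascending order, and the i-th smallest (counting from 0) is compared with α/(m − i), stopping at the first failure. A minimal sketch follows. How the families of tests were defined is not stated here, so the usage example assumes, for illustration only, that all 16 CDD–FTC correlations form one family and sets the 15 unreported P values to their reported lower bound of 0.092.

```python
def sequential_bonferroni(pvalues, alpha=0.05):
    """Holm's step-down procedure; returns reject flags in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvalues[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger P values are retained
    return reject


p = [0.006] + [0.092] * 15
print(any(sequential_bonferroni(p)))  # False: 0.006 > 0.05 / 16
```

Under that assumption, the smallest P value (0.006) exceeds the first-step criterion of 0.05/16 ≈ 0.0031, so no correlation survives the correction.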
Results
We obtained FTCs and signal detection thresholds in response to all 12 stimulus sequences from 32 different recording sites (15-dB spectrum level: N = 16 sites; 50-dB spectrum level: N = 16 sites). The mean (± SD) CF was 2000 ± 357 Hz, which is the same as the centre frequency of the signals used in the behavioural study by Langemann & Klump (2007). The mean neural threshold was 7.5 ± 11.2 dB, and the excitatory region of the frequency-tuning curve had a mean bandwidth of 628 ± 278 Hz at 10 dB above the pure-tone threshold (mean Q10dB = 3.7 ± 1.5). These results are similar to those reported previously by Nieder & Klump (1999a). Readers are referred to Buus et al. (1995), Langemann et al. (1995) and Nieder & Klump (1999a) for discussions of the relationships between the bandwidths of tuning curves in the forebrain and auditory periphery, and psychophysically determined critical bandwidths.
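Q10dB is the CF divided by the bandwidth of the FTC at 10 dB above the pure-tone threshold. The sketch below computes CF, threshold, the 10-dB bandwidth and Q10dB from an FTC sampled on an ascending frequency grid, and locates the band edges by linear interpolation. It assumes a single excitatory region and ignores suppressive sidebands. The function name and example data are hypothetical, and this is not necessarily the procedure used to analyse the recordings.

```python
from typing import Sequence, Tuple


def ftc_metrics(freqs_hz: Sequence[float],
                thresholds_db: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (CF, threshold, bandwidth 10 dB above threshold, Q10dB)."""
    n = len(freqs_hz)
    i_cf = min(range(n), key=lambda i: thresholds_db[i])  # most sensitive frequency
    cf, thr = freqs_hz[i_cf], thresholds_db[i_cf]
    level = thr + 10.0

    def crossing(i_in: int, i_out: int) -> float:
        # Frequency at which the FTC reaches `level`, interpolated between a point
        # inside (threshold <= level) and a point outside (threshold > level).
        f0, f1 = freqs_hz[i_in], freqs_hz[i_out]
        t0, t1 = thresholds_db[i_in], thresholds_db[i_out]
        return f0 + (level - t0) * (f1 - f0) / (t1 - t0)

    lo = i_cf
    while lo > 0 and thresholds_db[lo - 1] <= level:
        lo -= 1
    hi = i_cf
    while hi < n - 1 and thresholds_db[hi + 1] <= level:
        hi += 1
    if lo == 0 or hi == n - 1:
        raise ValueError("FTC does not rise 10 dB above threshold on both sides")
    bw10 = crossing(hi, hi + 1) - crossing(lo, lo - 1)
    return cf, thr, bw10, cf / bw10
```

With thresholds of 40, 20, 8, 0, 6, 20 and 40 dB at 1400–2600 Hz in 200-Hz steps, the 10-dB edges fall at about 1767 Hz and 2257 Hz, giving a bandwidth of about 490 Hz and a Q10dB of about 4.1.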
CDD
The average magnitudes of CDD ranged between −2.3 dB and −8.3 dB across experimental treatments. They were significantly less than 0 dB at all combinations of signal onset delay and signal duration at the 50-dB spectrum level, and at both signal durations at the 0-ms signal onset delay and 15-dB spectrum level (Table 2). Hence, the signal detection thresholds of starling field L2 neurons exhibited the CDD effect. We compared the magnitude of CDD across stimulus conditions using a two (signal onset delay, within) × two (signal duration, within) × two (spectrum level, between) × three (FTC type, between) anova. The main effect of signal duration was significant (F1,26 = 13.5, P = 0.0011, η2 = 0.34). The signal onset delay × signal duration (F1,26 = 3.9, P = 0.0585, η2 = 0.13) and signal duration × masker spectrum level (F1,26 = 3.5, P = 0.0720, η2 = 0.12) interactions approached significance. Other effects in the anova were non-significant and associated with small effect sizes (η2 < 0.10). In general, the magnitude of CDD was greater (more negative) when the signal duration was 400 ms than when it was 60 ms (Fig. 5A). These duration-dependent differences in CDD were similar at both levels of signal onset delay when the masker spectrum level was 50 dB (Fig. 5A, right panel). At the lower masker spectrum level of 15 dB (Fig. 5A, left panel), however, the duration-dependent differences in the magnitude of CDD were more pronounced in the 0-ms delay conditions, in which a significant CDD effect was present, than in the 100-ms delay conditions, in which no significant CDD effect was observed (Table 2).
Masker spectrum level (dB) | Signal onset delay (ms) | Signal duration (ms) | Neural CDD, mean ± SD (dB) | t (d.f. = 15) | P | Behavioural CDD, mean (dB) | Absolute difference between neural and behavioural CDD (dB)
---|---|---|---|---|---|---|---
15 | 0 | 60 | −3.5 ± 4.3 | −3.2 | 0.0060 | −3.7 | 0.2
15 | 0 | 400 | −6.2 ± 5.4 | −4.6 | 0.0003 | −2.7 | 3.5
15 | 100 | 60 | −2.8 ± 6.4 | −1.8 | 0.1003 | −5.2 | 2.4
15 | 100 | 400 | −2.3 ± 8.6 | −1.1 | 0.2985 | −3.2 | 0.9
50 | 0 | 60 | −2.7 ± 3.2 | −3.3 | 0.0046 | −2.2 | 0.5
50 | 0 | 400 | −8.3 ± 3.4 | −9.7 | < 0.0001 | −6.6 | 1.7
50 | 100 | 60 | −4.2 ± 3.6 | −4.7 | 0.0003 | −3.1 | 1.1
50 | 100 | 400 | −7.4 ± 2.8 | −10.7 | < 0.0001 | −6.6 | 0.8
- The t-tests evaluated the null hypothesis that the mean magnitude of comodulation detection difference (CDD) based on neural thresholds was 0 dB.
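Each CDD value is the difference between the threshold SMRs in the CU and AC conditions at the same recording site. The tests in Table 2 ask whether the mean of these differences departs from 0 dB. The minimal sketch below uses hypothetical function names. Recomputing t from the rounded means and SDs in the table reproduces the reported values only approximately, e.g. −6.2/(5.4/√16) ≈ −4.6. Converting t to a P-value additionally requires the t distribution with n − 1 degrees of freedom, which a statistics package would supply.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def cdd_values(smr_cu: Sequence[float], smr_ac: Sequence[float]) -> List[float]:
    """Per-site CDD = SMR_CU - SMR_AC (dB); negative values indicate a CDD effect."""
    return [cu - ac for cu, ac in zip(smr_cu, smr_ac)]


def one_sample_t(m: float, sd: float, n: int) -> float:
    """t statistic for H0: population mean = 0, from summary statistics (d.f. = n - 1)."""
    return m / (sd / math.sqrt(n))


def cdd_t_test(smr_cu: Sequence[float], smr_ac: Sequence[float]) -> Tuple[float, float, float]:
    """Mean CDD, its sample SD and the one-sample t statistic."""
    d = cdd_values(smr_cu, smr_ac)
    m, sd = mean(d), stdev(d)
    return m, sd, one_sample_t(m, sd, len(d))
```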

Comodulation detection difference (CDD) and masked detection thresholds in the starling forebrain. (A) The mean (± SE) magnitude of CDD for the 15-dB (left) and 50-dB (right) masker spectrum levels depicted as a function of signal duration (60 ms or 400 ms) with signal onset delay as the parameter. The CDD effect was calculated as $\mathrm{CDD} = \mathrm{SMR}_{\mathrm{CU}} - \mathrm{SMR}_{\mathrm{AC}}$, where SMR is the signal-to-masker ratio at threshold (McFadden, 1987). The horizontal dashed line indicates no difference between threshold SMRs in the co-uncorrelated (CU) and all correlated (AC) conditions, and negative values indicate a CDD effect. (B) The mean (± SE) SMRs (in dB) at thresholds for the 15-dB (top) and 50-dB (bottom) masker spectrum levels depicted as a function of correlation condition [all uncorrelated (AU), AC and CU] for each signal duration (60 ms or 400 ms), with signal onset delay (0 ms or 100 ms) as the parameter.
Auditory grouping
To test the auditory grouping hypothesis, we compared masked thresholds using a three (correlation, within) × two (signal onset delay, within) × two (signal duration, within) × two (masker spectrum level, between) × three (FTC type, between) ancova that included the recording site's Q10dB value as a covariate. The mean (± SE) detection thresholds are depicted in Fig. 5B as functions of envelope correlation, signal onset delay, signal duration and masker spectrum level. For brevity, we report below only what we regard as the most important trends, and do not discuss relatively small differences across treatments.
The main effect of envelope correlation was significant (F2,50 = 15.7, P < 0.0001, η2 = 0.39), but differences among the three correlation conditions were not entirely consistent with our predictions (Table 1). Averaged over the levels of other factors in the ancova model, the mean (± SD) threshold in the CU condition was the lowest (3.5 ± 11.3 dB), as we predicted. However, the mean threshold in the AU condition (10.9 ± 14.1 dB) was the highest and that for the AC condition (8.3 ± 10.8 dB) was intermediate (Fig. 5B), which is the reverse of the rank order that we predicted for these two correlation conditions (Table 1). The thresholds in the CU, AC and AU conditions differed significantly in all pairwise comparisons across the three levels of correlation (post hoc Scheffé tests: P < 0.0001). Generally similar trends in the threshold differences among the CU, AC and AU conditions were evident at most combinations of signal onset delay, signal duration and masker spectrum level (Fig. 5B).
The effect of signal onset delay was also significant (F1,25 = 23.5, P < 0.0001, η2 = 0.48) and in the predicted direction (Table 1). When averaged over the levels of other factors in the ancova model, however, the mean thresholds in the 100-ms delay conditions (6.6 ± 12.7 dB) were only about 2 dB lower than those in the synchronous onset conditions (8.6 ± 15.6 dB). As illustrated in Fig. 5B, there were no clear associations between the effects of signal onset delay and envelope correlation, signal duration, and masker spectrum level. The interaction between signal onset delay and the covariate (Q10dB) was also significant, but was associated with a much smaller effect size (F1,25 = 6.2, P = 0.0194, η2 = 0.20) compared with the main effect of signal onset delay.
The main effects of masker spectrum level (F1,25 = 8.6, P = 0.0070, η2 = 0.26) and signal duration (F1,25 = 4.8, P = 0.0382, η2 = 0.16) were also significant. Signal detection thresholds were on average about 6 dB lower when the masker spectrum level was 50 dB (4.5 ± 6.0 dB) than when it was 15 dB (10.6 ± 5.6 dB; see Fig. 5B). Thresholds averaged about 4 dB lower when the signal duration was 400 ms (5.7 ± 14.6 dB) than when it was 60 ms (9.5 ± 14.3 dB; see Fig. 5B). The main effect of FTC type was non-significant (F2,25 = 2.8, P = 0.0819, η2 = 0.18), and none of the interactions between FTC type and any of our manipulated variables approached significance (0.1300 < P < 0.8750). All other interaction terms in the ancova model were non-significant (0.1125 < P < 0.9840) and associated with small effect sizes (η2 < 0.09).
Masker-driven responses and rate-level functions
The effects of differences in envelope correlation on signal detection thresholds are best considered in light of rate-level functions and the responses to the masker alone. As illustrated in Fig. 6, the AU correlation conditions consistently elicited stronger masker-driven responses than the AC and CU conditions, which were similar to each other. These differences were also evident in the peristimulus time histograms (PSTHs) (Fig. 7) and the rate-level functions (Fig. 8A) of individual recording sites. The greater responses to the masker in the AU conditions (Fig. 6) probably contributed to the higher signal detection thresholds in the AU conditions compared with the AC and CU conditions. But what about the threshold differences between the AC and CU conditions, namely the CDD effect? In the AC and CU conditions, the maskers were the same (i.e. six comodulated flanking bands), and these two conditions differed only in whether or not the signal envelope was correlated with that of the flanking bands. Not surprisingly, the masker-driven responses in these two correlation conditions were also quite similar (Fig. 6). The threshold differences between the AC and CU conditions instead resulted from differences in the rate-level functions in these two conditions. Impulse rates (and da) generally increased with increasing signal level in both the AU and CU conditions. This was not the case in the AC conditions, whose rate-level functions were non-monotonic (Fig. 8A and B). The prominent ‘dip’ in the AC rate-level functions was a common feature across recording sites. It tended to occur when the overall level of the correlated signal was similar to or slightly lower than the overall level of the masker, at both the 15-dB and 50-dB masker spectrum levels (Figs 7 and 8). The functions of Köppl & Yates (1999) did not fit the dip in the AC condition (see Fig. 8A and B). They were, however, very effective at fitting the regions of the da-level function at signal levels below and above the dip, as well as the region in which the da-level function first exceeded our threshold criterion. The non-monotonic dip was present in the AC rate-level functions but not in the CU rate-level functions, and it resulted in relatively higher thresholds in the AC conditions. Hence, some portion of the CDD effect can ultimately be traced back to differences in the shapes of the rate-level functions between the CU and AC correlation conditions (Buschermöhle et al., 2006).
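Eq. (1) is not reproduced in this section. The sketch below assumes the standard sensitivity index d_a = (μ_s − μ_n)/√((σ_s² + σ_n²)/2), computed from impulse counts on signal-plus-masker and masker-alone trials. It then finds the lowest signal level at which the d_a-level function first reaches the criterion of 1.8. As a simplification, it uses linear interpolation between tested levels in place of the Köppl & Yates (1999) fits used in the study. Function names and example values are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Optional, Sequence


def d_a(signal_counts: Sequence[float], masker_counts: Sequence[float]) -> float:
    """d_a: difference of mean counts divided by the RMS of the two sample SDs."""
    pooled_sd = math.sqrt((variance(signal_counts) + variance(masker_counts)) / 2.0)
    return (mean(signal_counts) - mean(masker_counts)) / pooled_sd


def threshold_level(levels: Sequence[float], da_values: Sequence[float],
                    criterion: float = 1.8) -> Optional[float]:
    """Signal level at which the d_a-level function FIRST reaches the criterion,
    interpolated linearly between tested levels; None if it never does."""
    for i, da in enumerate(da_values):
        if da >= criterion:
            if i == 0:
                return levels[0]
            x0, x1 = levels[i - 1], levels[i]
            y0, y1 = da_values[i - 1], da  # y0 < criterion <= y1
            return x0 + (criterion - y0) * (x1 - x0) / (y1 - y0)
    return None
```

Because only the first crossing counts, a non-monotonic function such as d_a = 0.5, 2.0, 1.0 and 3.0 at 40, 45, 50 and 55 dB yields a threshold near 44.3 dB even though d_a later dips below the criterion. Conversely, a dip that occurs before the first crossing delays the crossing and raises the threshold.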

Masker-driven responses. The mean (± SE) impulse rate elicited in the masker-alone conditions for the 15-dB (top) and 50-dB (bottom) masker spectrum levels depicted as a function of correlation condition [all uncorrelated (AU), all correlated (AC) and co-uncorrelated (CU)] for each signal duration (60 ms or 400 ms), with signal onset delay (0 ms or 100 ms) as the parameter. To assess the effects of duration and onset delay, masker-driven responses were measured over an analysis window with the same duration and onset delay as the window over which responses to signals were analysed in the corresponding signal-plus-masker conditions.

Colour-coded PSTHs. These PSTHs depict the responses from a representative recording site to each of 12 stimulus conditions when the flanking band masker was presented at a spectrum level of 50 dB sound pressure level (SPL). The 12 stimulus conditions varied according to correlation condition [all uncorrelated (AU), all correlated (AC) or co-uncorrelated (CU)], signal duration (60 ms or 400 ms) and signal onset delay relative to the masker onset (0 ms or 100 ms). Each of the 12 plots shows 21 PSTHs, one for the masker-driven response (‘M’, bottom row of each plot) and one for each of the 20 overall signal levels that were tested (5–100 dB SPL). Each PSTH depicts the distributions of impulse counts (in 5-ms bins) in a 1.3-s window centred around the stimulus. Impulse counts (colour-coded z axis) are summed over 20 stimulus realizations and depicted as a function of time (x axis) and overall signal level (y axis).

Representative rate-level and da-level functions for neural data and model output. (A) Neural rate-level functions (top panels) and da-level functions (bottom panels) from a representative recording site tested with the 50-dB masker spectrum level illustrating the changes in impulse rate that occurred as a function of overall signal level for the all uncorrelated (AU), all correlated (AC) and co-uncorrelated (CU) correlation conditions (signal onset delay: 0 ms; signal duration: 60 ms). ‘M’ and the filled data points depict the masker-driven responses in each condition. The da-level functions in the lower panels are from the same recording site and conditions depicted in the upper panels after converting impulse rates to a measure of sensitivity (da) using signal detection theory [Eq. (1)]. Black horizontal lines depict a threshold criterion of da = 1.8. Smooth curves represent the fitted da-level curves from the functions of Köppl & Yates (1999). Vertical lines show the point where the fitted da-level function crossed the threshold criterion. (B) The rate-level functions (top panels) and the da-level functions (bottom panels) from the model output for the same recording site shown in (A). (C) The rate-level functions (top panels) and the da-level functions (bottom panels) from the model output for a recording site with one of the poorest model fits. SPL, sound pressure level.
As reported above, the effects of signal onset delay were small but consistent with an across-channel auditory grouping hypothesis in that detection thresholds were higher when the signal and masker were gated synchronously (Table 1). Examination of the masker-driven responses, however, suggests a possible alternative explanation. Higher detection thresholds in the 0-ms signal onset conditions may, in part, be related to the ‘primary-like’ responses of field L2 neurons (Nieder & Klump, 1999b). For example, in the masker-alone conditions, and for relatively low overall signal levels, the flanking band maskers (350–950 ms) elicited a strong onset response followed by a lower and sustained response over the duration of the masker (see neural traces in Fig. 2 and PSTHs in Fig. 7). Similar patterns in masker-driven responses were observed at both masker spectrum levels. Hence, masker-driven impulse rates were greater in the 0-ms delay conditions, compared with the 100-ms delay conditions, and this difference was more pronounced for the 60-ms analysis window (Fig. 6). The phasic response to the onset of the masker therefore likely contributed to the small but significant effect of signal onset delay. In other words, detection thresholds were higher when the signal was gated synchronously with the masker because the response to the masker included the large phasic onset response. Indeed, such an overshoot effect has been reported in a previous study of simultaneous masking of starling forebrain neurons (Nieder & Klump, 1999b).
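A toy response profile can be used to check the argument that a phasic onset response inflates masker-driven rates in the 0-ms delay windows, and more so for the short 60-ms window. The sketch assumes a hypothetical masker-driven rate made up of a constant sustained rate plus an exponentially decaying onset component. The parameter values are illustrative and were not fitted to field L2 data. The mean rate over an analysis window [delay, delay + duration] then has a closed form.

```python
import math

# Hypothetical response profile (not fitted to any recording):
# r(t) = SUSTAINED_HZ + ONSET_PEAK_HZ * exp(-t / ONSET_TAU_MS), t in ms from masker onset.
SUSTAINED_HZ = 50.0
ONSET_PEAK_HZ = 200.0
ONSET_TAU_MS = 20.0


def mean_rate_in_window(delay_ms: float, duration_ms: float) -> float:
    """Mean of r(t) over [delay, delay + duration] (impulses/s)."""
    a, b = delay_ms, delay_ms + duration_ms
    # Integral of the onset term over the window, divided by the window length.
    onset_area = ONSET_PEAK_HZ * ONSET_TAU_MS * (math.exp(-a / ONSET_TAU_MS) - math.exp(-b / ONSET_TAU_MS))
    return SUSTAINED_HZ + onset_area / duration_ms
```

With these values, the 60-ms windows give about 113 impulses/s at a 0-ms delay versus about 50 impulses/s at a 100-ms delay. The 400-ms windows give about 60 versus 50 impulses/s. The onset response therefore raises the masker-driven rate mainly for synchronous, short analysis windows.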
Within-channel model
The rate-level (and da-level) functions computed from the time- and trial-averaged firing rates output by the model were similar to the rate-level functions measured in field L2 (Fig. 8). Although the model reproduced the rate-level functions from some recording sites better than others (cf. Fig. 8B and C), it generally captured two important aspects of real rate-level functions. First, the masker-driven responses in the AU conditions were higher than those in the AC and CU conditions (e.g. Fig. 8B). Second, the model successfully reproduced the prominent ‘dip’ in the rate-level functions of the AC conditions (e.g. Fig. 8B).
Across the eight combinations of signal onset delay, signal duration and masker spectrum level, the differences between signal detection thresholds generated by the model in the CU and AC conditions were significantly less than zero, indicating that the model generated a CDD effect in all stimulus conditions tested (Table 3; Fig. 9A). To examine the effects of signal onset delay, signal duration and masker spectrum level on the model's output, we compared the magnitudes of CDD across stimulus conditions using a two (signal onset delay, within) × two (signal duration, within) × two (spectrum level, between) anova. The only significant main effect was signal duration (F1,30 = 43.7, P < 0.0001, η2 = 0.59). The mean magnitude of CDD in the model was slightly, but significantly, larger when the signal duration was 400 ms (−2.7 ± 1.5 dB) compared with 60 ms (−1.6 ± 1.5 dB). This duration-dependent difference in the magnitude of CDD was more pronounced at the 50-dB masker spectrum level compared with the 15-dB spectrum level (Fig. 9A), which accounts for the significant signal duration × spectrum level interaction (F1,30 = 18.3, P = 0.0002, η2 = 0.38). No other main effects or interactions were significant.
Masker spectrum level (dB) | Signal onset delay (ms) | Signal duration (ms) | Model CDD, mean ± SD (dB) | t (d.f. = 15) | P | Neural CDD, mean (dB) | Absolute difference between model and neural CDD (dB)
---|---|---|---|---|---|---|---
15 | 0 | 60 | −1.4 ± 1.9 | −2.9 | 0.0109 | −3.5 | 2.1
15 | 0 | 400 | −1.5 ± 1.1 | −5.5 | 0.0001 | −6.2 | 4.7
15 | 100 | 60 | −1.0 ± 0.6 | −6.6 | < 0.0001 | −2.8 | 1.8
15 | 100 | 400 | −1.7 ± 1.0 | −6.6 | < 0.0001 | −2.3 | 0.6
50 | 0 | 60 | −2.1 ± 4.3 | −4.7 | 0.0003 | −2.7 | 0.6
50 | 0 | 400 | −3.9 ± 5.4 | −13.8 | < 0.0001 | −8.3 | 4.4
50 | 100 | 60 | −2.0 ± 6.4 | −6.2 | < 0.0001 | −4.2 | 2.2
50 | 100 | 400 | −3.7 ± 8.6 | −13.0 | < 0.0001 | −7.4 | 3.7
- The t-tests evaluated the null hypothesis that the mean magnitude of comodulation detection difference (CDD) based on model thresholds was 0 dB.

Comodulation detection difference (CDD) and masked detection thresholds from a within-channel peripheral model. (A) The mean (± SE) magnitude of CDD for the 15-dB (left) and 50-dB (right) masker spectrum levels depicted as a function of signal duration (60 ms or 400 ms), with signal onset delay as the parameter. Negative values indicate a CDD effect. (B) The mean (± SE) signal-to-masker ratios (SMR in dB) at thresholds for the 15-dB (top) and 50-dB (bottom) masker spectrum levels depicted as a function of correlation condition [all uncorrelated (AU), all correlated (AC) and co-uncorrelated (CU)] for each signal duration (60 ms or 400 ms), with signal onset delay (0 ms or 100 ms) as the parameter. For comparison to neural data, the results depicted in Fig. 5 are reproduced here in greyscale.
The mean magnitudes of CDD in the model's output across stimulus conditions (−1.0 dB to −3.9 dB) were uniformly smaller than those measured in field L2 (Table 3). This underestimation of the CDD effect by the model is directly related to the model's overestimation of masked neural thresholds. Across stimulus conditions, the model overestimated neural thresholds by about 3.7 dB (range: 0.3–7.2 dB; Fig. 9B). Importantly, however, this overestimation was not random with respect to the three correlation conditions. Rather, the model overestimated the mean thresholds in the CU, AC and AU conditions by 5.8 dB, 3.2 dB and 1.9 dB, respectively. Given that the actual thresholds in these three conditions were ranked in the opposite order (i.e. AU > AC > CU), the threshold overestimations by the model had the effect of reducing the threshold differences among the correlation conditions (and hence the magnitude of CDD) in the model's output.
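Substituting the average overestimations into the definition of CDD shows why the model's CDD is smaller. This worked example uses only the condition-averaged values given above. It is therefore an approximation, not an exact relation for any single condition:

$$
\begin{aligned}
\mathrm{CDD}_{\mathrm{model}} &= \mathrm{SMR}^{\mathrm{model}}_{\mathrm{CU}} - \mathrm{SMR}^{\mathrm{model}}_{\mathrm{AC}} \\
&\approx \left(\mathrm{SMR}_{\mathrm{CU}} + 5.8\ \mathrm{dB}\right) - \left(\mathrm{SMR}_{\mathrm{AC}} + 3.2\ \mathrm{dB}\right) \\
&= \mathrm{CDD}_{\mathrm{neural}} + 2.6\ \mathrm{dB}.
\end{aligned}
$$

Averaged over the eight conditions in Table 3, the neural and model CDD values are about −4.7 dB and −2.2 dB, a difference of about 2.5 dB, in line with this estimate.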
We assessed the effects of various stimulus manipulations on the model's output by comparing model thresholds in a three (correlation, within) × two (signal onset delay, within) × two (signal duration, within) × two (masker spectrum level, between) ancova. To facilitate direct comparison with the physiological data, we included the recording site's Q10dB as a covariate. We report effect sizes both for the model ($\eta^2_{\mathrm{model}}$) and from the parallel analysis of neural thresholds reported above ($\eta^2_{\mathrm{neural}}$). The main effects of correlation (F2,58 = 26.2, P < 0.0001, $\eta^2_{\mathrm{model}} = 0.47$, $\eta^2_{\mathrm{neural}} = 0.39$), signal duration (F1,29 = 37.5, P < 0.0001, $\eta^2_{\mathrm{model}} = 0.56$, $\eta^2_{\mathrm{neural}} = 0.16$) and spectrum level (F1,29 = 9.8, P = 0.0040, $\eta^2_{\mathrm{model}} = 0.25$, $\eta^2_{\mathrm{neural}} = 0.26$) were significant in analyses of both the model's output and the physiologically determined neural thresholds. The following general trends were similar in both the model and physiological results (Fig. 9B): (i) thresholds were highest in the AU condition, lowest in the CU condition and intermediate in the AC condition; (ii) thresholds were lower with the 400-ms signal than with the 60-ms signal; and (iii) thresholds were lower at the 50-dB spectrum level than at the 15-dB spectrum level. One important difference between the model and neural thresholds concerns the effect of signal onset delay. This effect was significant for neural thresholds (see above) but not for the model output (F1,29 = 0.7, P = 0.4253, $\eta^2_{\mathrm{model}} = 0.02$, $\eta^2_{\mathrm{neural}} = 0.48$). A number of two-way and three-way interactions were significant at the α = 0.05 level in analyses of model thresholds, but not in analyses of neural thresholds. However, most of these significant interactions were associated with relatively small effect sizes ($0.11 < \eta^2_{\mathrm{model}} < 0.19$) compared with the significant main effects in the same analyses.
Discussion
In the real world, behaviourally relevant signals and masking background noises exhibit patterns of fluctuation in amplitude that the auditory system can exploit to segregate signals from noise (Nelken et al., 1999; Singh & Theunissen, 2003; Verhey et al., 2003). The overarching goal of this study and our companion study (Langemann & Klump, 2007) was to investigate signal detection in an animal model when both signals and background noise fluctuate in amplitude. In-depth comparisons of perceptual CDD effects in humans and starlings are discussed by Langemann & Klump (2007). Here, we focus on comparisons between starling perception and physiology. The absolute SMRs at threshold reported here for neural responses averaged about 11.5 dB higher than those reported by Langemann & Klump (2007). This difference in magnitude between behaviourally and physiologically determined thresholds in parallel experiments likely stems from two causes. First, behavioural decisions may, in part, be mediated by the most sensitive neurons (Parker & Newsome, 1998). Second, in starlings, auditory filters in the CNS have wider bandwidths than behaviourally measured filters, and thus integrate more of the masker energy (Langemann et al., 1995; Nieder & Klump, 1999a; see Langemann & Klump, 2007 for further discussion). When comparing detection thresholds determined using behavioural and neurophysiological methods, the important comparison is therefore between the relative effects of the various stimulus manipulations.
Envelope correlation had strong and similar effects on behavioural and neural thresholds. Thresholds in the AU condition were generally highest, those in the CU condition were lowest and those in the AC condition were intermediate (cf. Fig. 5B in this study and fig. 2 in Langemann & Klump, 2007). We return to these effects below. Behavioural and neural thresholds were also influenced by signal duration, with lower thresholds in response to the 400-ms signal than to the 60-ms signal. These results most likely reflect two factors: improved signal detection resulting from the longer temporal integration of signal energy over the 400-ms signal (Heil & Neubauer, 2004), and the overshoot effect associated with the primary-like responses of field L2 neurons (Smith & Zwislocki, 1971; Nieder & Klump, 1999b). In both behaviour and neurophysiology, masked detection thresholds were lower in the presence of the higher (50 dB) masker spectrum level. One difference between behavioural and neural thresholds concerns the effects of signal onset delay, which were significant in this study but not in the study by Langemann & Klump (2007). We believe this difference is relatively minor, considering that the average difference in thresholds due to signal onset delay was about 0.5 dB in behaviour and 2 dB in the neural responses. The only other differences in the effects of manipulated variables on behavioural and neural thresholds involved three interactions (two-way: correlation × spectrum level and signal onset delay × signal duration; three-way: correlation × signal duration × spectrum level). These interactions were significant in analyses of behavioural thresholds, but not of neural thresholds. Readers are referred to Langemann & Klump (2007) for a discussion of these effects on behavioural thresholds.
CDD
European starlings experience a CDD effect that is similar in magnitude to that reported for humans (Langemann & Klump, 2007). The signal detection thresholds of starling forebrain neurons were about 2–8 dB lower in the CU conditions than in the AC conditions, indicating a considerable CDD effect (Table 2; Fig. 5A). In other words, the ability of forebrain neurons to detect an amplitude-modulated signal improved when the signal envelope fluctuated independently of that of the comodulated flanking bands (the CU condition), compared with conditions in which the signal envelope was correlated with that of the flanking bands (the AC condition). As summarized in Table 2, the deviation between neural and behavioural estimates of CDD across combinations of signal onset delay, signal duration and masker spectrum level was always less than 4 dB and, in most cases, less than 2 dB. These findings support the hypothesis that the responses of forebrain neurons represent neural correlates of the CDD effect demonstrated in the psychoacoustic study of Langemann & Klump (2007).
Auditory grouping
One hypothesis for the CDD effect in humans centres on the across-channel processing of auditory grouping cues (e.g. Cohen & Schubert, 1987; McFadden, 1987; Wright, 1990; Fantini & Moore, 1994; Borrill & Moore, 2002; Moore & Borrill, 2002). Common patterns of amplitude modulation and common onsets between the signal and the flanking bands should promote their fusion into a single auditory object, and hence make signal detection more difficult. Under the auditory grouping hypothesis, we predicted that signal detection thresholds would be highest in the AC condition, lowest in the CU condition and intermediate in the AU condition (Table 1). Although thresholds in the AC condition were higher than those in the CU condition, thresholds in the AU condition were generally the highest (Fig. 5B). This general trend in relative thresholds across the three correlation conditions (AU > AC > CU) was also observed in the output of the within-channel model, and is consistent with the behavioural thresholds reported by Langemann & Klump (2007). Thus, the observed thresholds for the CU and AC conditions from behaviour, neurophysiology and a within-channel model were consistent with expectations for an across-channel explanation of the CDD effect (CU < AC). However, as the model results show (Fig. 9), this general pattern of threshold differences as a function of envelope correlation is also consistent with the operation of within-channel processes that do not require across-channel auditory grouping.
We also predicted that signal detection thresholds would be relatively higher when the signal and flanking bands had synchronous onsets and were consequently grouped together (Table 1). Our results on neural thresholds are generally consistent with this prediction (Fig. 3B). However, the threshold improvements due to the 100-ms delay were small, and were not strongly and consistently associated with particular combinations of envelope correlation, signal duration and masker spectrum level (Fig. 5B). Thus, if auditory grouping based on common onset explains the effect of signal onset delay on neural thresholds reported here, we would conclude that this grouping cue had relatively weak effects on neural signal detection in our experimental conditions. Moreover, the effects of signal onset delay on neural thresholds may have been related, at least in part, to the larger masker-driven responses in the synchronous onset conditions (Fig. 6). Importantly, signal onset delay had no significant effect on masked thresholds in behaviour (Langemann & Klump, 2007) or in the model's output. Taken together, we believe the results of this study and that of Langemann & Klump (2007) do not provide strong evidence for the operation of across-channel grouping based on common onset.
Effects of frequency tuning
The influences of frequency tuning on the magnitude of CDD and the SMRs at threshold were small. The recording site's FTC type (Fig. 4) neither influenced the magnitude of CDD nor entered into any interactions with signal onset delay, signal duration or masker spectrum level. Likewise, there were no significant correlations between the magnitude of CDD and the recording site's CF, threshold, bandwidth at 10 dB above threshold or Q10dB value. Thus, we found very little evidence that the magnitude of CDD depended on the frequency-tuning characteristics of the recording site. The recording site's Q10dB value was strongly and negatively related to the SMR at threshold, indicating that more sharply tuned recording sites had lower masked thresholds. This result makes sense: at a given nominal SMR, the ratio of signal energy to flanking band energy would be relatively greater at the output of a more sharply tuned filter centred on the frequency of the signal. Other tuning characteristics, including FTC type, had small or negligible effects on masked thresholds. We believe the lack of effects of frequency tuning, especially bandwidth and FTC type, on CDD and masked thresholds further supports the hypothesis that the across-channel processing of auditory grouping cues plays a minor role in the CDD effect.
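The filter argument above can be made concrete with a toy calculation. The sketch rests on three assumptions. First, the recording site's filter has a Gaussian power response that is 10 dB down at ±BW10/2, with BW10 = CF/Q10dB. Second, the signal acts like a component at CF. Third, the six flanking bands at CF ± 300, 600 and 900 Hz carry equal power, and the nominal SMR is signal power divided by total flanking-band power. None of these choices comes from the study, and the function names are hypothetical. Because the absolute numbers depend entirely on the assumed filter shape, only the direction of the effect is meaningful.

```python
import math

# Hypothetical geometry: signal at CF, six equal-power flanking bands at CF +/- 300, 600, 900 Hz.
FLANK_OFFSETS_HZ = (-900.0, -600.0, -300.0, 300.0, 600.0, 900.0)


def gaussian_power_weight(offset_hz: float, bw10_hz: float) -> float:
    """Power gain of an assumed Gaussian filter that is 10 dB down at +/- bw10/2."""
    sigma = bw10_hz / (2.0 * math.sqrt(2.0 * math.log(10.0)))
    return math.exp(-offset_hz ** 2 / (2.0 * sigma ** 2))


def effective_smr_gain_db(cf_hz: float, q10: float) -> float:
    """Filter-output SMR minus nominal SMR (dB); the signal at CF passes with gain 1."""
    bw10 = cf_hz / q10
    mean_flank_gain = sum(gaussian_power_weight(f, bw10) for f in FLANK_OFFSETS_HZ) / len(FLANK_OFFSETS_HZ)
    return -10.0 * math.log10(mean_flank_gain)
```

For a CF of 2000 Hz, this toy filter raises the effective SMR by about 8 dB at Q10dB = 2 and by about 27 dB at Q10dB = 5. A more sharply tuned site would therefore reach threshold at a lower nominal SMR.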
A within-channel model for CDD
For most recording sites, the model initially proposed by Buschermöhle et al. (2006) successfully reproduced the higher masker-driven responses in the AU conditions and the prominent ‘dip’ in the AC rate-level functions. Importantly, both of these features are already clearly present in the mean compressed envelope values of the filtered CDD stimuli (Buschermöhle et al., 2006). The average compressed envelope of the uncorrelated flanking bands in the AU condition is larger than that of the comodulated flanking bands in the AC and CU conditions because of destructive interference between the correlated bands in the AC and CU conditions. No such destructive interference occurs between the uncorrelated flanking bands in the AU conditions. Destructive interference is also responsible for the dip in the AC rate-level curves. In the AC condition, adding a correlated signal band enhances the existing destructive interference between the comodulated flanking bands. Consequently, as the level of the signal approaches the overall level of the flanking bands, the average compressed envelope gets smaller, before increasing again once the overall envelope becomes dominated by the signal envelope. No such additional interference occurs when the signal is added in the CU conditions, because the signal is not correlated with the flanking bands. Buschermöhle et al. (2006) provide further discussion of the origins of this interference in the CDD stimulus paradigm.
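The interference argument can be illustrated numerically. The sketch below is not the Buschermöhle et al. (2006) model; it rests on simplifying assumptions. Four flanking bands at ±300 and ±600 Hz from the filter centre pass through a flat filter with equal gain. Each band's slowly varying envelope is a unit-power complex Gaussian sample. Compression is a power law with a hypothetical exponent of 0.3. With comodulation, the bands share one envelope, so the summed envelope is that common envelope multiplied by a beating pattern that periodically dips toward zero. The average power is the same as for independent envelopes. For a compressive exponent below 2, however, these dips lower the mean compressed envelope (Jensen's inequality). Names and parameters are illustrative.

```python
import cmath
import math
import random

# Hypothetical illustration only; not the published model.
FLANK_OFFSETS_HZ = (-600.0, -300.0, 300.0, 600.0)  # bands assumed to pass a flat filter equally
COMPRESSION_EXPONENT = 0.3                          # assumed compressive power law


def complex_gauss(rng: random.Random) -> complex:
    """Unit-power circular complex Gaussian sample of a slowly varying band envelope."""
    return complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) / math.sqrt(2.0)


def mean_compressed_envelope(comodulated: bool, n_samples: int = 20000, seed: int = 1) -> float:
    """Average of |sum of bands| ** p at random instants, with shared or independent envelopes."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        t = rng.random() / 300.0   # random instant (s) within one 300-Hz beat period
        shared = complex_gauss(rng)  # common envelope, used only when comodulated
        analytic = 0j
        for offset in FLANK_OFFSETS_HZ:
            envelope = shared if comodulated else complex_gauss(rng)
            analytic += envelope * cmath.exp(2j * math.pi * offset * t)
        total += abs(analytic) ** COMPRESSION_EXPONENT
    return total / n_samples
```

Running `mean_compressed_envelope(True)` and `mean_compressed_envelope(False)` gives a smaller value for the comodulated (AC/CU-like) masker than for the uncorrelated (AU-like) masker, matching the direction of the masker-driven response differences described above.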
In general, the model also performed fairly well in reproducing the effects of signal onset delay, signal duration and masker spectrum level on the magnitudes of CDD (Fig. 9A) and masked detection thresholds (Fig. 9B). The largest discrepancy between the modelled and physiologically determined thresholds was the effect of signal onset delay, which was significant for neural thresholds but not for the model output. As noted earlier, although the effects of signal onset delay on neural thresholds were significant, these effects were small, and no effects of signal onset delay were observed in the study by Langemann & Klump (2007). Hence, the importance of this discrepancy for refining the model remains unclear.
Two notable and related features of the model's performance in this study are that it underestimated the mean magnitudes of the neural CDD effect and overestimated the masked detection thresholds (Fig. 9). There are two general classes of explanation for this shortcoming. First, our model does not include across-channel processing, which some have hypothesized could contribute to the CDD effect (e.g. Cohen & Schubert, 1987; McFadden, 1987). Although the model itself does not rule out the operation of across-channel processes, we found little evidence for auditory grouping in the patterns of neural thresholds in this study or of behavioural thresholds in our companion study (Langemann & Klump, 2007). Second, and more likely, the model in its current form does not capture the full range of within-channel processes that may contribute to CDD. Langemann & Klump (2007) discuss these additional within-channel cues in more detail. One likely within-channel mechanism that is not currently implemented in the model, but that could also contribute to CDD, is suppression arising from cochlear mechanics, which has been demonstrated in the peripheral auditory systems of birds and mammals (Manley, 1990; see Moore & Borrill, 2002, for further discussion).
Conclusions
Results from the present study are consistent with our psychoacoustic experiment with starlings (Langemann & Klump, 2007) and provide the first evidence of neurophysiological correlates of the CDD effect reported in earlier studies of humans (Cohen & Schubert, 1987; McFadden, 1987; Wright, 1990; Fantini & Moore, 1994; Borrill & Moore, 2002; Moore & Borrill, 2002). Together, these studies of humans and starlings provide, at best, only weak evidence for the operation of across-channel processes related to auditory grouping in the CDD effect. Rather, CDD appears to be largely mediated by within-channel processes. Many of the effects of correlation condition reported here were largely due to differences in rate-level functions and masker-driven responses (Fig. 8), which could be reproduced with some success using our implementation of the within-channel model of Buschermöhle et al. (2006). The model also successfully simulated many of the effects of signal onset delay, signal duration and masker spectrum level on the magnitudes of CDD and on masked detection thresholds in the forebrain. While there is still room for improving the model, we believe that the model, in combination with our neural data, lends further support to the hypothesis that within-channel processing of envelope cues plays an important role in the detection of modulated signals in modulated noise. At present, we do not know whether the effects reported here for the auditory forebrain would be observed elsewhere in the auditory system. However, given the lack of strong evidence for across-channel auditory grouping and the results of our within-channel model of peripheral processing, we predict that CDD effects similar to those reported here for forebrain neurons might also be observed at lower levels of the auditory system.
Acknowledgements
This research was supported by the Deutsche Forschungsgemeinschaft (FOR 306 ‘Hörobjekte’, SFB/TRR 31 ‘The Active Auditory System’), and by postdoctoral fellowships to M.A.B. from the National Science Foundation (INT-0107304) and the International Graduate School for Neurosensory Science and Systems (DFG GK 591). Numerous discussions with Ulrike Langemann at various stages of the study were very helpful and much appreciated. The botanical garden of the University of Oldenburg kindly houses the aviary with our stock of starlings.
Abbreviations
AC, all correlated
AU, all uncorrelated
CDD, comodulation detection difference
CF, characteristic frequency
CMR, comodulation masking release
CU, co-uncorrelated
FTC, frequency-tuning curve
PSTH, peristimulus time histogram
SMR, signal-to-masker ratio
SPL, sound pressure level