Volume 26, Issue 7 pp. 1969-1978

Detecting modulated signals in modulated noise: (I) behavioural auditory thresholds in a songbird

Ulrike Langemann

Institut für Biologie und Umweltwissenschaften, Fakultät V, Universität Oldenburg, Carl von Ossietzky Str. 9-11, D-26129 Oldenburg, Germany

Georg M. Klump

Institut für Biologie und Umweltwissenschaften, Fakultät V, Universität Oldenburg, Carl von Ossietzky Str. 9-11, D-26129 Oldenburg, Germany

First published: 25 September 2007
Dr U. Langemann, as above.

Abstract

Most signals from the auditory world have temporal patterns of amplitude modulation that either emanate from the signal source or result from environmental interference (e.g. air turbulence). To investigate mechanisms associated with the segregation and processing of amplitude-modulated signals, we trained European starlings (Sturnus vulgaris) to detect a signal noise band embedded in several flanking noise bands (FBs). We manipulated the envelope correlation between the signal and FBs, the onset synchrony between signal and FBs (0 or 100 ms), signal duration (60 or 400 ms) and the spectrum level of the FBs (15 or 50 dB). The lowest signal-detection thresholds were found when the envelopes of the FBs were correlated with each other but different from the signal envelope (the ‘co-uncorrelated’ condition). Detection thresholds were on average 4 dB higher when both the signal and the FBs had correlated envelopes (the ‘all correlated’ condition). Thresholds were even higher when the envelopes of all noise bands were independent (the ‘all uncorrelated’ condition). The difference in detection thresholds between the co-uncorrelated and the all correlated conditions is termed ‘comodulation detection difference’ (CDD). Differences in signal duration and masker level had significant effects on detection threshold, but not on CDD magnitudes; differences in onset synchrony had no effects. We compare data from starlings with those from previous psychoacoustic studies of humans, and discuss possible mechanisms on which these perceptual effects may rely. Our behavioural data are the reference for a companion study investigating CDD at the neuronal level of the starling [M.A. Bee et al. (2007) Eur. J. Neurosci., 26, 1979–1994].

Introduction

The vertebrate auditory system is evolutionarily adapted to segregate behaviourally relevant sounds from interfering sounds in the environment (Gans, 1992). Interfering sounds have the potential to mask the relevant sounds, such as communication signals, resulting in reduced signal-to-noise ratios for detecting these signals. Signal detection in noise critically depends on the temporal structures of both signals and the environmental background noise, both of which typically have pronounced amplitude fluctuations (e.g. Klump, 1996; Nelken et al., 1999; Singh & Theunissen, 2003). In human auditory scene analysis, differences in the amplitude modulation patterns of simultaneous spectral components play an important role in segregating sound sources (Bregman, 1990). The problem of sound source segregation can also be demonstrated in psychoacoustic experiments studying comodulation detection difference (CDD). The CDD effect refers to an improvement in signal detection for an amplitude-modulated signal masked by amplitude-modulated background noise. The CDD effect has been studied using narrow bands of noise both for the signal (a single noise band) and for the masking background (multiple noise bands). It is determined by comparing detection thresholds in two different conditions. Signal detection is improved when the envelopes of the background noise bands are correlated with each other but differ from the envelope of the signal, compared with when the envelopes of all background noise bands are the same as that of the signal (e.g. Cohen & Schubert, 1987; McFadden, 1987; McFadden & Wright, 1990; Wright, 1990; Fantini & Moore, 1994; Borrill & Moore, 2002; Moore & Borrill, 2002).

There is some debate about the underlying mechanisms of CDD (e.g. Borrill & Moore, 2002). Previous explanations of these mechanisms have been based on theoretical models derived from psychoacoustic studies and applied to the stimuli in the different experimental conditions (e.g. Borrill & Moore, 2002). What is missing to date is an animal model system that would allow us to directly investigate the neuronal mechanisms associated with the perceptual CDD effect. Here we present an animal model, the European starling, a bird species that can easily be trained in psychoacoustic experiments and that also offers the opportunity for collecting data at the neuronal level. One objective in this study was to test the hypothesis that starlings experience perceptual effects of CDD that are similar to those demonstrated in humans. Another objective was to evaluate which auditory processes are used in the analysis of CDD stimuli. One possible type of mechanism involves the analysis of temporal patterns of modulation within a single auditory filter. Alternatively, the auditory system could exploit cues present in the stimulus that promote auditory grouping based on comparisons across different auditory filters. Here, we assess the relevance of the different mechanisms behaviourally. In a companion paper (Bee et al., 2007) we describe neural correlates of CDD in the starling forebrain and apply a model of peripheral coding that predicts the observed neural response patterns and that highlights the contribution of inner ear processing within a single auditory filter to the perceptual CDD effect.

Materials and methods

Subjects

The five European starling (Sturnus vulgaris) subjects were maintained on a natural day/night cycle and were housed in individual cages of 80 × 40 × 40 cm in a common room with other birds. All birds were experienced psychoacoustic subjects. They were kept at about 95% of their free-feeding weight and had unrestricted access to water. The food rewards during the experimental sessions consisted of pieces of mealworms (larvae of Tenebrio molitor) that were favourite food items for the birds. Outside the experimental sessions the starlings were fed with duck-food pellets (Geflügelfinisher, Alleinfutter I, Raiffeisen Kraftfuttermittelwerk Dörpen GmbH, Dörpen, Germany). The care and treatment of the birds were in accordance with the procedures of animal experimentation approved by the Bezirksregierung Weser-Ems, Germany, and with the European Community Council Directive of 24 November 1986 (86/609/EEC).

Experimental setup

Masked auditory thresholds were determined within an experimental cage (24 × 36 × 32 cm) placed in a sound-proof anechoic box (attenuation: 48 dB at 500 Hz, > 57 dB for frequencies of 1 kHz and above). For echo reduction, the double-walled box was lined with sound-absorbing wedges (Illbruck Illsonic Pyramide 100/100 mounted on 50 mm of Illsonic Plano, cutoff frequency of 500 Hz, α > 0.99; Illbruck GmbH, Leverkusen, Germany). Two response keys (observation key and report key) with light-emitting diodes (key lights) were mounted on one inside wall of the experimental cage. A rotary food dispenser that could be turned by a stepping motor was placed below the response keys. The behaviour of the birds during experimental sessions could be observed on a video monitor. All behavioural protocols, including the delivery of food rewards, were controlled by a Linux-operated microcomputer.

Stimulus generation and experimental parameters

Experimental stimuli were synthesized digitally (16 bit, 44.1 kHz sampling rate; Sound Blaster PCI 512, Creative Technology, Singapore, Republic of Singapore) in two separate channels of the soundcard in the computer, and the stimulus levels were adjusted independently by computer-controlled attenuators (TDT PA4; Tucker-Davis Technologies, Alachua, FL, USA). The channel with the background signals (see below) was then added to the channel with the test signals in the input stage of the amplifier (Yamaha A-520; Nippon Gakki, Japan) driving the speaker in the sound-proof chamber. The speaker (Twin 700, 200 Hz−9 kHz, ± 2.5 dB; Canton Elektronik GmbH & Co. KG, Weilrod, Germany) was positioned about 30 cm above the bird's head, and slightly behind it (6 cm).

We presented the birds with stimuli composed of several noise bands with a bandwidth of 100 Hz each. The birds had to detect a signal noise band (termed ‘signal’, with variable duration, see below) that was occasionally presented in the presence of a repeating background of six flanking bands (FBs, 600 ms duration, including 10 ms Hanning ramps, repeated every 1300 ms). The signal was centred at 2000 Hz, and the FBs were centred at 1100, 1400, 1700, 2300, 2600 and 2900 Hz (see Fig. 1 for examples of spectrograms and waveforms). Four experimental parameters were varied in a factorial design. (i) We presented three types of envelope correlation between the noise bands. In one condition, termed the ‘all uncorrelated’ (AU) condition, the temporal patterns of envelope fluctuations between signal and the six FBs were independent from each other. In the ‘all correlated’ (AC) condition, the temporal fluctuations in the envelope patterns of all noise bands were the same; and in the third condition − the ‘co-uncorrelated’ (CU) condition − the envelopes of the FBs were temporally correlated with each other but were not correlated with the envelope fluctuation of the signal. (ii) The signal either had a synchronous onset with the FBs (termed 0 ms delay) or the signal was delayed by 100 ms relative to the onset of the FBs. (iii) We chose signal durations of either 60 ms or 400 ms (including 10 ms Hanning ramps). (iv) The spectrum level of the FBs was either 15 dB SPL or 50 dB SPL.

Fig. 1. Example of spectrograms (left) and waveforms (right) of the three envelope conditions with delayed onset of the signal noise band. Different shades of grey represent the envelope fluctuation of the noise bands (amplitude maxima and minima). Waveforms in red depict the envelopes of six flanking bands (FBs); waveforms in black depict the signal noise band. (A) In the ‘all uncorrelated’ (AU) condition all noise band envelopes were different. (B) The FBs and signal noise band had the same envelope in the ‘all correlated’ (AC) condition. (C) In the ‘co-uncorrelated’ (CU) condition, the FBs had a common envelope that was different from the envelope of the signal noise band.

Each noise band was generated by multiplying a sinusoid of the required centre frequency with a low-pass-filtered noise with a cut-off frequency of 50 Hz, resulting in a 100 Hz wide noise band that was centred at the sinusoidal frequency. We created noise bursts with unique envelope patterns for every sound presentation as follows. A 30-s random sample of the 50-Hz low-pass-filtered noise was computed de novo before each experimental session. For every noise burst presented during an experimental session, 600-ms samples were randomly drawn from the 30-s sample. In the AC condition (signal and FBs correlated), one 600-ms noise band was randomly drawn from the low-pass noise and multiplied with each of the seven sinusoids (1100, 1400, 1700, 2000, 2300, 2600 and 2900 Hz), resulting in seven noise bands with coherently modulated envelopes. In the AU condition (all noise bands uncorrelated), seven different 600-ms samples were randomly drawn from the low-pass noise and each was multiplied with one of the seven sinusoids, thus generating seven noise bands with temporally uncorrelated envelopes. In the CU condition, two 600-ms noise bands were randomly drawn from the low-pass-filtered noise; one noise band was multiplied with the six sinusoids comprising the FBs and the other was multiplied by the 2000 Hz sinusoid constituting the signal noise band. This procedure resulted in six FBs with coherently modulated envelopes and a signal noise band with an envelope that was independent from that of the FBs. In all three correlation conditions (AC, AU and CU), the six, 600-ms-long noise bands comprising the FBs were added together and shaped by 10 ms onset/offset Hanning ramps. The signal noise band, centred at 2000 Hz, had a total duration of either 60 or 400 ms (including 10 ms onset/offset Hanning ramps), and was either gated synchronously with the FBs or it was delayed by 100 ms relative to the onset of the FBs. 
Sound levels were calibrated at least once per day (General Radio type 1982 sound level meter; Concord, MA, USA) by placing the microphone (General Radio 1/2-inch condenser microphone type 1962-9611) at about the location where the bird's head would be in an experiment.
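The stimulus construction described above can be illustrated with a short sketch. It is not the authors' code: the function names are hypothetical, the brick-wall FFT low-pass filter is an assumption (the filter type is not stated), absolute level calibration is omitted, and taking the signal modulator from the same time offset as the delay is likewise an assumption.

```python
import numpy as np

FS = 44100                                      # sampling rate (Hz), as stated in the text
FB_FREQS = [1100, 1400, 1700, 2300, 2600, 2900]  # flanking-band centre frequencies (Hz)
SIGNAL_FREQ = 2000                              # signal centre frequency (Hz)

def lowpass_noise(duration_s, cutoff_hz, rng, fs=FS):
    """Gaussian noise low-pass filtered by zeroing FFT bins above cutoff (assumed filter)."""
    n = int(round(duration_s * fs))
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=n)

def draw_modulator(pool, n, rng):
    """Draw an n-sample excerpt at a random position from the low-pass noise pool."""
    start = rng.integers(0, len(pool) - n + 1)
    return pool[start:start + n]

def draw_modulators(condition, pool, rng, burst_s=0.6, fs=FS):
    """Return (signal modulator, list of six FB modulators) for AC, CU or AU."""
    n = int(round(burst_s * fs))
    if condition == "AC":   # one excerpt shared by all seven bands
        shared = draw_modulator(pool, n, rng)
        return shared, [shared] * 6
    if condition == "CU":   # one excerpt for the FBs, another for the signal
        fb = draw_modulator(pool, n, rng)
        return draw_modulator(pool, n, rng), [fb] * 6
    if condition == "AU":   # seven independent excerpts
        return draw_modulator(pool, n, rng), [draw_modulator(pool, n, rng) for _ in range(6)]
    raise ValueError(f"unknown condition: {condition}")

def hann_ramp(x, ramp_s=0.010, fs=FS):
    """Apply raised-cosine (Hanning) onset and offset ramps."""
    n = int(round(ramp_s * fs))
    window = np.hanning(2 * n)
    y = x.copy()
    y[:n] *= window[:n]
    y[-n:] *= window[n:]
    return y

def flanking_bands(fb_mods, fs=FS):
    """Multiply each modulator with its carrier, sum the six bands, then ramp."""
    t = np.arange(len(fb_mods[0])) / fs
    total = sum(m * np.sin(2 * np.pi * f * t) for m, f in zip(fb_mods, FB_FREQS))
    return hann_ramp(total)

def signal_band(sig_mod, duration_s, delay_s=0.0, fs=FS):
    """Signal band of the given duration, starting delay_s after FB onset (assumed alignment)."""
    start = int(round(delay_s * fs))
    n = int(round(duration_s * fs))
    t = (start + np.arange(n)) / fs
    return hann_ramp(sig_mod[start:start + n] * np.sin(2 * np.pi * SIGNAL_FREQ * t))
```

Multiplying a 50-Hz low-pass modulator with a sinusoid shifts the noise spectrum to either side of the carrier, which is why each band comes out 100 Hz wide. In use, one would build a 30-s pool once per session with `lowpass_noise(30.0, 50.0, rng)` and call `draw_modulators` for every burst.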

Data acquisition: behavioural testing procedure

We trained starlings in a Go/NoGo procedure to repeatedly peck the observation key when only the repeating background of FBs was presented (NoGo condition), and to peck the report key when the signal was added to the background (test signal, Go stimulus). Each trial started when the bird pecked on the observation key. After a random waiting interval of up to 7 s (during which pecking on the observation key had no consequence), another peck on the observation key resulted in the presentation of the test signal. If the bird pecked the report key within 3 s from the start of the presentation of the test signal, the food tray rotated and the bird was reinforced by a food reward with a probability of 80%. This reinforcement schedule ensured that the capacity of the feeder was sufficient for the whole session. We always presented a feeder light as a secondary reinforcer. The trial ended with the reinforcement or, in the case of no response, after the 3-s response time had elapsed. To obtain a measure of spontaneous responding, 30% of the trials were catch trials during which no signal was added to the repeating background. The bird's responses in catch trials were scored as in test-signal trials. The rate of pecking the report key in catch trials was the false-alarm rate. A response during a catch trial or during a waiting interval before presentation of a test signal resulted in a time-out period of between 5 and 20 s with the lights in the experimental cage switched off.

Estimates of thresholds

We obtained masked auditory thresholds using the method of constant stimuli (e.g. Dooling & Okanoya, 1995). A block of 10 trials, consisting of three catch trials and a set of seven signal trials with the signal differing in sound pressure level (step size 3 dB), was repeated 10 times during a session with a randomized sequence of the trials in each block. At the beginning of each session, a block of 10 ‘warm-up’ trials was presented with test signals of an amplitude that was easily detected by the birds. Thus, a session was made up of 110 trials that the birds would usually finish in about 45–60 min. We excluded sessions from the analyses if the false-alarm rate exceeded 20% or if the birds reported the two signals of the set with the highest sound pressure level with a probability of less than 80%. If the estimated thresholds from two consecutive sessions differed by 3 dB or less, the data of both sessions were combined to a psychometric function consisting of 20 signal trials at each sound pressure level. Spontaneous responding was thus rated over 60 catch trials. Applying signal detection theory (e.g. Green & Swets, 1966), a threshold estimate was computed by linear interpolation of the psychometric function as the sound pressure level at which the value of the signal-detection measure d′ was 1.8.
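The threshold estimate can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, d′ is taken as the standard difference of z-transformed hit and false-alarm rates, and the clamping of rates of 0 or 1 is an assumption, since the text does not say how such cases were handled.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def d_prime(hits, n_signal, false_alarms, n_catch):
    """d' = z(hit rate) - z(false-alarm rate)."""
    def rate(k, n):
        # Assumption: clamp to [1/(2n), 1 - 1/(2n)] to avoid infinite z-scores.
        return min(max(k / n, 0.5 / n), 1.0 - 0.5 / n)
    z = _STD_NORMAL.inv_cdf
    return z(rate(hits, n_signal)) - z(rate(false_alarms, n_catch))

def threshold(levels_db, d_primes, criterion=1.8):
    """Linearly interpolate the level at which d' first reaches the criterion."""
    points = list(zip(levels_db, d_primes))
    for (l0, d0), (l1, d1) in zip(points, points[1:]):
        if d0 < criterion <= d1:
            return l0 + (criterion - d0) * (l1 - l0) / (d1 - d0)
    return None  # criterion not crossed within the tested range
```

With the combined data of two sessions, `n_signal` would be 20 per level and `n_catch` would be 60. For d′ values of 1.2 and 2.4 at two levels 3 dB apart, the criterion of 1.8 falls halfway between them, so the interpolated threshold lies 1.5 dB above the lower level.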

At both masker levels, all combinations of signal duration and signal delay were presented in random order for each individual. For any duration-delay combination, we presented the three types of envelope correlation one after another, again in randomized order. The first session of any new combination was regarded as a training session and was never included in the final threshold estimate.

Experimental hypotheses and predictions

The experimental parameters were chosen to identify stimulus characteristics relevant for the emergence of the CDD effect. One relevant parameter is the correlation of the envelopes of signal and FBs. We presented the starlings with three patterns of envelope correlation that varied in their potential to provide suitable cues to group components of a sound. Our predictions for the outcome of the experiment when coherent envelope patterns may serve as a grouping cue were the following. (i) The lowest thresholds should occur in the CU condition, where the FBs might be perceptually grouped and the deviating envelope of the signal should allow the signal to be perceived as an object separate from the FBs. (ii) The AC condition would promote grouping of all noise bands into a single auditory percept by means of their coherent envelope fluctuation, making it more difficult to segregate the signal from the FBs in this condition. Thresholds in the AC conditions should thus be much higher than in the CU condition. (iii) In the third condition with the envelopes of all noise bands independent from each other (AU), no grouping is expected, i.e. the signal should neither be grouped with, nor segregated from, the other noise bands. Accordingly, thresholds in the AU condition were expected to be intermediate between those in the CU and AC conditions.

The synchrony of the onset of signal and FBs is another stimulus parameter that may affect auditory grouping and thus influence thresholds and CDD. Our predictions for the signal-to-masker ratio (SMR) at detection threshold when common onset may serve as a grouping cue were as follows. (i) A common onset of all noise bands would promote grouping of the signal with the FBs (making signal detection more difficult), whereas a delayed signal onset relative to the FBs should impair grouping the signal with the FBs (and thus make the signal more detectable). (ii) Thresholds for different values of onset delay might also vary as a function of the different correlation conditions. These correlation-dependent effects of signal onset delay could, in turn, affect the magnitude of CDD if differences in delay differentially influenced thresholds in the AC and CU conditions.

Signal duration is the third parameter that may affect thresholds and CDD. Due to the effects of temporal summation, we expected to find lower thresholds for long-duration signals than for short-duration signals. To generalize our results for different sound pressure levels, we presented the signal in FBs with spectrum levels of 15 and 50 dB SPL. The spectrum level of the FBs was thus introduced as the fourth experimental parameter.

Signal detection performance should also be a function of signal uncertainty and may differ systematically in the different correlation conditions. Borrill & Moore (2002) argued that level fluctuations of the masker affect the slopes of the psychometric functions. Because the amount of time (i.e. the detection opportunities) during which the SMR is sufficiently large for signal detection increases more rapidly with increasing signal level in the AC condition than in the CU condition, steeper slopes are expected in the AC than in the CU condition.

Data analysis

The anova designs for data analysis were adapted to specifically test the hypotheses and predictions outlined above. All data shown are based on the birds' detection thresholds for the signal presented in masking FBs. The masked thresholds are expressed as SMRs in dB, i.e. the spectrum level of the signal noise band at detection threshold is expressed relative to the spectrum level of the FBs. The magnitude of CDD was calculated by subtracting thresholds obtained in the AC condition from thresholds in the CU condition (CU − AC; McFadden, 1987). We analysed signal detection thresholds and the magnitude of CDD using repeated measures anova with the Greenhouse–Geisser correction (Greenhouse & Geisser, 1959). Detection thresholds were analysed using a 3 (envelope correlation: AU, AC and CU) × 2 (signal onset delay: 0 ms and 100 ms) × 2 (signal duration: 60 ms and 400 ms) × 2 (masker spectrum level: 15 dB and 50 dB) anova. The magnitude of CDD (CU − AC) was analysed using a 2 (signal onset delay: 0 ms and 100 ms) × 2 (signal duration: 60 ms and 400 ms) × 2 (masker spectrum level: 15 dB and 50 dB) anova. Additional contrast analyses were used to test specific predictions related to the CDD effect and auditory grouping hypotheses. anova and contrast analysis were also used to compare slopes of psychometric functions. Slopes of the psychometric functions in the AC and CU conditions were estimated by linear interpolation of d′ values between signal levels above and below the starlings' individual thresholds. We are aware that a sample size of five subjects is relatively small for a multiway analysis, although three to five individuals is a common sample size for psychoacoustic studies, both with humans and animals. Results close to the significance level of 0.05 must therefore be interpreted with caution, as effects with biological significance might not be detected statistically with a small sample size.
For all main effects and interactions, we computed partial η2 as a measure of effect size. The partial η2 measures the degree of association between an effect and the dependent variable, and can be interpreted as the proportion of variance in the dependent variable that is attributable to each effect. This is similar to the more familiar coefficient of determination, r2, although partial η2 is non-additive. In our analyses, special attention was paid to the relative magnitudes of the effect sizes, i.e. the influence of a variable was not solely judged by the magnitude of the associated P-value. Statistical analyses were performed using SPSS 12.0 (Systat Software, Richmond, CA, USA) or Statistica 5.5 (StatSoft, Tulsa, USA).
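For readers who want to check the effect sizes reported in the tables, partial η2 can be recovered from an F-value and its degrees of freedom with the standard identity η2 = (F × d.f. effect) / (F × d.f. effect + d.f. error). The sketch below is only a convenience; the function name is hypothetical and the statistics packages named above were presumably the source of the published values.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Example with the main effect of FB spectrum level on thresholds (Table 1):
# F(1,4) = 13.728
print(round(partial_eta_squared(13.728, 1, 4), 3))  # 0.774
```

The same identity reproduces the value for the correlation condition in Table 1 (F = 49.528 with d.f. 2,8 gives 0.925).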

Results

Although our criteria for the acceptance of individual sessions were rather conservative (see Materials and methods), only a small fraction of all sessions had to be excluded from our analyses. In 64 of 603 sessions (10.6%), the bird's false-alarm rate was above 20% and the sessions were discarded. The average false-alarm rate of the five subjects was 5.8%. In 19 sessions (3.2%), the starlings responded with a probability of less than 80% to the stimuli with the highest sound pressure level, leading to the rejection of these sessions. In 27 cases (4.5%), sessions were not completed by the experimental bird or were aborted by the experimenter because of technical failures.

Mean (± SD, here and throughout) slopes of the psychometric functions expressed as change of d′ units per dB were 0.32 ± 0.17 (AC) and 0.36 ± 0.16 (CU) at 15 dB, and were 0.28 ± 0.15 (AC) and 0.31 ± 0.17 (CU) at 50 dB spectrum level. An anova evaluating the effects of correlation condition, signal delay, signal duration and FB spectrum level revealed no significant relation between these factors and the slopes of the psychometric functions. However, the correlation condition as a factor showed a rather large effect size of 0.52, suggesting that the non-significant result (F2,8 = 4.349, P = 0.053) may be due to the small sample of only five birds. Therefore, we conducted a contrast analysis comparing the slopes of the psychometric functions in the AC and CU conditions. We found no significant difference between the slopes in the two correlation conditions (F1,4 = 0.878, P = 0.401, η2 = 0.18).

Signal detection thresholds

To evaluate the impact of the different experimental manipulations on signal detection thresholds and to relate the results to the exploitation of potential within- and across-channel cues, we compared threshold patterns in all experimental conditions (Fig. 2A). An anova revealed significant differences in signal detection thresholds that were due to differences in the envelope correlation between the noise bands, the signal duration and the FB spectrum level, but there were no significant differences related to differences in signal onset delay (Table 1). The main effects of correlation condition and signal duration explained more than 90% of the variance in the threshold data (see values of partial η2 in Table 1). The mean thresholds in the three correlation conditions, averaged over the other factors, were 0.1 ± 4.2 dB (AU), −3.6 ± 9.2 dB (AC) and −7.7 ± 4.4 dB (CU). Hence, envelope correlation in the CU condition resulted in the lowest signal detection thresholds. Averaging over the different onset delays, signal durations and spectrum levels, contrast analysis revealed statistically significant differences between masked thresholds in the AC and CU conditions (F1,4 = 18.1, P = 0.013, η2 = 0.82), between the AU and CU conditions (F1,4 = 1762.1, P < 0.001, η2 = 0.99), and between the AU and AC conditions (F1,4 = 15.5, P = 0.017, η2 = 0.79). Averaged over the other factors, masked thresholds for signal detection were about 6 dB lower for the 400 ms signal (−6.9 ± 8.2 dB) compared with the 60 ms signal (−0.6 ± 6.2 dB). The significant main effect of FB spectrum level explained about 77% of the variation in thresholds (see Table 1). Averaged across all other factors, masked thresholds were 2.3 dB lower at the 50 dB spectrum level (−4.9 ± 7.2 dB) compared with the 15 dB spectrum level (−2.6 ± 7.5 dB). 
The overall difference in the signal onset delay was non-significant and was associated with a relatively small effect size (Table 1), with the SMR for synchronous onset of signal and FBs (−3.5 ± 8.0 dB) being on average less than half a decibel higher than the SMR for a delayed onset of the signal (−3.9 ± 6.1 dB).

Fig. 2. (A) Behavioural detection thresholds of five starlings (N = 5) expressed as signal-to-masker ratios (SMRs) in dB. Mean thresholds (± SE) are shown as a function of the correlation condition for two different spectrum levels of the flanking bands (FBs) and with signal delay as the parameter. The relation of the temporal envelope fluctuations between signal noise band and the six FBs was either ‘all uncorrelated’ (AU), ‘all correlated’ (AC) or ‘co-uncorrelated’ (CU). See Fig. 1 for examples of the stimulus conditions. (B) The comodulation detection difference (CDD) effect results from the individuals' difference in signal detection in the AC compared with the CU condition (mean data ± SE). Negative values describe a release from masking in the CU condition compared with the AC condition.

Table 1. Results of repeated measures anova

Source of variation                        d.f.   F-value    P-value    η2-value
Level                                      1,4     13.728    0.021*     0.774
Duration                                   1,4    128.420    < 0.001*   0.970
Duration × Level                           1,4      5.723    0.075      0.589
Delay                                      1,4      0.996    0.375      0.199
Delay × Level                              1,4      1.887    0.242      0.321
Correlation                                2,8     49.528    0.002*     0.925
Correlation × Level                        2,8     14.980    0.006*     0.789
Duration × Delay                           1,4      9.972    0.034*     0.714
Duration × Delay × Level                   1,4      0.771    0.430      0.162
Duration × Correlation                     2,8      2.321    0.191      0.367
Duration × Correlation × Level             2,8     12.256    0.009*     0.754
Delay × Correlation                        2,8      0.132    0.833      0.032
Delay × Correlation × Level                2,8      0.088    0.880      0.022
Duration × Delay × Correlation             2,8      0.232    0.765      0.055
Duration × Delay × Correlation × Level     2,8      4.014    0.070      0.501

The repeated-measures anova compared the effects of the spectrum level of the FBs (‘Level’: 15 dB, 50 dB), the duration of the signal (‘Duration’: 60 ms, 400 ms), the delay of the signal relative to the onset of the FBs (‘Delay’: 0 ms, 100 ms) and the correlation condition (‘Correlation’: AU, AC, CU) on the starling subjects' SMR at detection threshold. *P < 0.05.

In the anova model, there were also three significant interactions, each of which explained a high amount of the variance in masked thresholds (Table 1). The significant two-way interaction between correlation condition and spectrum level was associated with a large effect size of η2 = 0.79 (Table 1). This interaction resulted because the effects of correlation condition on thresholds were more pronounced at the 50 dB spectrum level. At the 50 dB spectrum level, SMRs differed by up to 10.8 dB across the three correlation conditions, resulting in a steeper ‘slope’ across correlation conditions compared with the 15 dB spectrum level, for which SMRs differed only up to 4.8 dB across correlation conditions (see Fig. 2A). The signal duration × signal delay interaction was also significant (P = 0.03; Table 1), and was associated with an effect size of η2 = 0.71. This effect size, however, is smaller than the effect size associated with the main effect of signal duration (effect size η2 = 0.97). This interaction may be explained as follows. The effect of signal duration on masked thresholds was slightly larger (i.e. caused larger threshold differences) for a synchronous onset of signal and FBs than for delayed signal onset. This difference, however, is very subtle and cannot be easily spotted in Fig. 2A. The most complex term was a three-way interaction involving the same factors that were significant as main effects (signal duration × correlation condition × spectrum level, P < 0.01, η2 = 0.75; Table 1). Although the effect size associated with this three-way interaction was nearly as large as that associated with differences in FB spectrum level (η2 = 0.77; Table 1), it was still considerably less than the effect size of the two other significant main effects (signal duration: η2 = 0.97; correlation condition: η2 = 0.93; Table 1).

CDD

Following McFadden (1987), the magnitude of CDD was calculated as the difference in the threshold SMRs in the CU and AC correlation conditions (CU − AC); hence, more negative CDD values indicate a release from masking in the CU condition relative to the AC condition. As noted above, signal detection thresholds were significantly lower in the CU condition compared with the AC condition. This result indicates a significant CDD effect (McFadden, 1987). The anova failed to detect any significant differences in the magnitude of CDD that were due to differences in signal onset delay, signal duration or FB masker level. As illustrated in Fig. 2B, the average amount of variation in CDD was rather small across the different treatment conditions, despite considerable individual variability. Individual variation in the magnitude of CDD in all conditions was between +3.9 and −14.4 dB. For most conditions, the magnitude of CDD was on average between about −2 and −5 dB (Fig. 2B). The largest magnitude of CDD of −6.6 dB occurred at the 50 dB spectrum level for 400-ms duration signals of either signal delay (Fig. 2B, right panel). Signal duration was the factor associated with the largest effect size (η2 = 0.28, Table 2), but did not approach statistical significance with our sample size of five birds. When averaged over the other factors, the 60-ms duration signal produced a CDD of −3.5 ± 3.7 dB, whereas signals of 400-ms duration resulted in a slightly larger CDD of −4.8 ± 5.8 dB. The main effects of spectrum level and signal onset delay were associated with small effect sizes of η2 = 0.08 and η2 = 0.06, respectively. On average, the CDD was −3.7 ± 7.4 dB and −4.6 ± 2.5 dB at 15 dB and 50 dB spectrum levels, respectively. There was little variability in CDD that was related to differences in signal onset delay. 
Synchronous onsets of the signal and FBs (0-ms delay) produced an average magnitude of CDD of −3.8 ± 7.3 dB, while the 100-ms delay resulted in an average CDD magnitude of −4.5 ± 2.1 dB. There was one significant interaction between signal duration and FB spectrum level that was associated with a rather large effect size (η2 = 0.78; Table 2). As illustrated in Fig. 2B, we observed a slight decrease in the magnitude of CDD with increasing signal duration at the 15 dB spectrum level, whereas the opposite trend was observed at the 50 dB spectrum level. Other interactions were not significant and were associated with small effect sizes (e.g. η2 < 0.05; Table 2).
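As a worked illustration of the CU − AC definition, the grand-mean CDD can be computed from the mean thresholds given in the section on signal detection thresholds. This is a sketch rather than the authors' analysis, which computed CDD for each individual; in a balanced design the difference of the means equals the mean of the individual differences. The names used are hypothetical.

```python
# Mean masked thresholds (SMR in dB) as reported in the Results
MEAN_SMR_DB = {"AU": 0.1, "AC": -3.6, "CU": -7.7}

def cdd(smr_cu_db, smr_ac_db):
    """CDD magnitude following McFadden (1987): CU threshold minus AC threshold."""
    return smr_cu_db - smr_ac_db

mean_cdd = cdd(MEAN_SMR_DB["CU"], MEAN_SMR_DB["AC"])
print(f"{mean_cdd:.1f} dB")  # -4.1 dB
```

The negative sign indicates a release from masking in the CU condition, and the value agrees with the factor-wise averages of roughly −4 dB reported above.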

Table 2. Results of repeated-measures ANOVA

| Source of variation | d.f. | F-value | P-value | η² |
| --- | --- | --- | --- | --- |
| Level | 1,4 | 0.365 | 0.578 | 0.084 |
| Duration | 1,4 | 1.567 | 0.279 | 0.282 |
| Duration × Level | 1,4 | 14.099 | 0.020* | 0.779 |
| Delay | 1,4 | 0.266 | 0.633 | 0.062 |
| Delay × Level | 1,4 | 0.125 | 0.741 | 0.030 |
| Duration × Delay | 1,4 | 0.179 | 0.694 | 0.043 |
| Duration × Delay × Level | 1,4 | 0.004 | 0.951 | 0.001 |

  • The repeated-measures ANOVA compared the effects of the spectrum level of the FBs (‘Level’: 15 dB, 50 dB), the duration of the signal (‘Duration’: 60 ms, 400 ms) and the delay of the signal relative to the onset of the FBs (‘Delay’: 0 ms, 100 ms) on the starling subjects' CDD. *P < 0.05.
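
The effect sizes in Table 2 appear to be consistent with partial eta squared recovered from the F-values and degrees of freedom. The sketch below checks this under that assumption; the text of this section does not state which variant of η² was computed, so the formula here is an inference rather than the reported method.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F-value and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F-value, reported effect size) pairs copied from Table 2; all tests have d.f. = 1,4.
table2 = {
    "Level": (0.365, 0.084),
    "Duration": (1.567, 0.282),
    "Duration x Level": (14.099, 0.779),
    "Delay": (0.266, 0.062),
    "Delay x Level": (0.125, 0.030),
    "Duration x Delay": (0.179, 0.043),
    "Duration x Delay x Level": (0.004, 0.001),
}

recomputed = {
    source: partial_eta_squared(f_value, 1, 4)
    for source, (f_value, _) in table2.items()
}

for source, (f_value, reported) in table2.items():
    print(f"{source}: reported {reported:.3f}, recomputed {recomputed[source]:.4f}")
```

Small discrepancies in the third decimal place are expected because the tabulated F-values are themselves rounded.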

Discussion

Do birds experience CDD similar to humans?

In general, the bird auditory system exhibits the same functional characteristics as the mammalian auditory system (e.g. see Manley, 1990; Klump et al., 2000). Studies of rate-intensity functions indicate that birds have a cochlear non-linear compression that is functionally similar to that of mammals, including humans (Köppl & Yates, 1999; Saunders et al., 2002). Furthermore, auditory nerve fibres in birds, including the starling, show suppression effects that are similar to those found in mammals (Manley, 1990). The starling's absolute auditory thresholds in the range of 0.5–6 kHz are about 10 ± 10 dB, which is comparable to those of humans between 40 and 50 years of age (Spoor, 1967; Zwicker & Fastl, 1990; Klump et al., 2000). The best sensitivity of the starling auditory system is about 2–5 dB in the frequency range between 2 and 5 kHz. Above 6 kHz the starling's sensitivity declines considerably faster than in humans, and is about 50 dB at 8 kHz. The frequency selectivity of the starling auditory system has been shown to be similar to that of the human auditory system (Langemann et al., 1995; Klump et al., 2000). For example, auditory filter bandwidth at the signal frequency of 2 kHz is 233 Hz in the starling and 300 Hz in humans (Zwicker & Fastl, 1990; Langemann et al., 1995). Thus, this avian model provides a good comparison with characteristics of human auditory perception, including those relevant for CDD.

The primary objective of this study was to test the hypothesis that starlings experience a magnitude of CDD similar to that observed in human psychoacoustic studies in order to establish an animal model system for studies of CDD and related effects. To this end, we presented trained European starlings with experimental stimuli similar to those presented in earlier psychoacoustic studies with human listeners. Our results suggest that starlings experience perceptual effects of CDD very similar to those previously reported in studies with human listeners (Cohen & Schubert, 1987; McFadden, 1987; McFadden & Wright, 1990; Wright, 1990; Fantini & Moore, 1994; Borrill & Moore, 2002). Thus, the starling appears to be an excellent animal model for investigating CDD. This view is supported by results from our companion study of the neural correlates of CDD in the starling forebrain (Bee et al., 2007). Recently, CDD has been demonstrated in another bird species. Jensen (2007) determined a relatively large CDD (mean effect −14.8 dB, N = 2) in hooded crows (Corvus corone cornix). However, the mode of stimulus presentation and the unlimited response time per trial make it difficult to compare the magnitude of CDD in hooded crows to our results with starlings and with results from the human hearing literature.

Having demonstrated CDD in starlings, the more important issue for our purpose concerns the extent to which patterns of CDD in starlings conform to expectations from previous CDD studies of human listeners. Similar to most of the studies on CDD in humans (e.g. McFadden & Wright, 1990; Wright, 1990; Fantini & Moore, 1994), in the starling we used a spacing of the FBs that placed them in auditory filters that were separate from the filter centred at the signal band. Unless stated otherwise, the following comparisons between our data and the human psychophysical literature will be restricted to the 50 dB data, as none of the other psychoacoustic studies used a spectrum level as low as 15 dB. Data from the lower spectrum level were collected for comparisons with the neuronal data of our companion study (Bee et al., 2007). In the remainder of the Discussion, we will first review the human psychophysical literature, and present a summary of data on the magnitude of CDD and SMRs necessary for detection of the signals in the modulated FBs. We then compare the human data to the magnitude of CDD and SMRs determined in the starlings. Finally, we specifically address the question regarding the mechanisms that may explain the observed effects.

We presented the starlings with three correlation conditions that were comparable to conditions in experiments with human listeners by Wright (1990), McFadden & Wright (1990), and Fantini & Moore (1994). Humans experienced magnitudes of CDD (CU − AC) ranging between about −2 dB and −9 dB in the study by McFadden & Wright (1990), and about −7 dB in the study by Wright (1990). Both of these studies used noise bands that were 100 Hz wide, similar to those used in the present study with starlings. Fantini & Moore (1994) presented human listeners with noise bands of different bandwidths (4–64 Hz wide). The average CDD in that study was about −3 to −6 dB for all bandwidths, and was about −5 dB at the 64 Hz bandwidth. These magnitudes of CDD reported in humans are quite similar to those found in the present study of starlings, in which the threshold differences between the CU and AC conditions resulted in an average CDD of between about −2 and −7 dB (see Fig. 2B). In starlings, the average masked thresholds in the AU condition were higher than those in both the CU and AC conditions (Fig. 2A). On average, starlings' thresholds were about 6 dB higher in the AU compared with the AC condition, and 11 dB higher in the AU condition compared with the CU condition. Human data presented by Fantini & Moore (1994) showed on average no such difference between the AU and AC conditions using noise bandwidths of 64 Hz. With a smaller bandwidth, however, AU thresholds were about 3–6 dB higher than thresholds in the AC condition (Fantini & Moore, 1994). Wright (1990) described thresholds for humans in the AU condition (with a single signal noise band) that were either slightly above or below the AC thresholds, with threshold differences less than 2 dB. Differences between the detection thresholds in AU and AC conditions tested by McFadden & Wright (1990) were mostly within about 2–3 dB in either direction. 
In contrast, in a study by Borrill & Moore (2002; two FBs, 20 Hz bandwidth), thresholds in the AU condition were similar to those in the CU conditions and about 4–5 dB lower than those in the AC condition (Borrill & Moore, 2002). Taken together, the correlation condition in which the envelopes of noise bands were AU yielded the most variable data among human studies. While most of these studies made plausible suggestions about possible mechanisms to explain differences between the AC and CU conditions (see below), no convincing models have been offered to explain the higher thresholds in the AU condition (but see Buschermöhle et al., 2006). In our companion paper, we propose a possible mechanism that is consistent with the observed behavioural differences and is supported by physiological evidence in starlings (see Bee et al., 2007).

Delaying the signal relative to the onset of the FBs had on average no effect on masked thresholds or the magnitude of CDD in starlings (Fig. 2B). Two threshold patterns may be found in humans. Five of seven human listeners tested by McFadden & Wright (1990) had lower thresholds when the signal was delayed relative to the FBs, with thresholds in the AC and CU condition changing in a similar way. Thresholds of two additional listeners, however, were hardly affected by signal delay. Human thresholds in experiments by Moore & Borrill (2002) are comparable to our results with starlings in that there were no differences in human thresholds (and in CDD) between signal delays of 0 ms and 200 ms in the AC and CU conditions.

In addition to differences in envelope correlation, differences in signal duration and masker spectrum level had significant effects on the starlings' masked thresholds. Some effects of signal duration on threshold patterns were demonstrated by McFadden & Wright (1990), with human thresholds being on average higher for short-duration signal noise bands (i.e. 50 or 60 ms) compared with a long-duration signal noise band (i.e. 240 ms). In the conditions most similar to the experimental conditions tested here with starlings, human thresholds for detecting a long-duration signal were up to about 17 dB lower than for detecting a short-duration signal. In starlings, signal detection thresholds were on average only about 7 dB lower for a long-duration signal compared with the short-duration signal. Such differences could be explained by temporal summation (Klump & Maier, 1990) and the overshoot effect (i.e. higher thresholds for short-duration signals presented at the beginning of a masker compared with thresholds for signals presented during the ongoing masker, see Nieder & Klump, 1999). In human studies, differences in signal duration have been reported to also influence the magnitude of CDD, which was about −2 to −3 dB for short-duration signals, but about −6 dB for a long-duration signal (McFadden & Wright, 1990). Signal duration has similar, but non-significant, effects on the average magnitude of CDD in starlings, which was about −3 dB for short-duration (60 ms) signals, and −7 dB for longer-duration (400 ms) signals.

Masker level significantly affected the starlings' perception, with the SMRs at threshold being on average about 2 dB lower (i.e. more sensitive) at the higher masker level compared with the lower masker level. In addition, the effects of envelope correlation were somewhat more pronounced at the higher masker level. It is possible that due to the broader excitation pattern in the cochlea at the higher presentation level (e.g. see Buus et al., 1995), noise bands can interact over a broader frequency range and thus contribute to more pronounced differences in the response within auditory filters. The patterns of differences in threshold as a function of envelope correlation (CU < AC < AU), however, were similar at the two masker levels (see Fig. 2A). Because CDD is a measure of difference between SMR thresholds, masker level consequently had no significant effect on the amount of CDD. Similarly, a psychoacoustic study in humans by Borrill & Moore (2002) reported only a slight increase of about 2–3 dB in the amount of CDD when increasing the spectrum level of the FBs from 45 to 65 dB. Individual threshold variation of the human listeners, however, was very high. For example, the individual variation in SMR for uncorrelated noise bands was about 20 dB at the highest masker level in humans, but was less than 5 dB at the high masker level in the comparable condition with starlings (AU, synchronous onset, long-duration signal). In this respect, starlings seem a more consistent model species for investigating CDD than humans.

Some of the experimental conditions for investigating CDD were quite similar to those used in experiments on comodulation masking release (CMR). In CMR experiments, the target signal is commonly a tone centred within a narrow band noise (the on-frequency band). When the envelopes of one or more FBs are correlated with that of the on-frequency band, thresholds for detecting the tonal signal are typically lower than in conditions in which all of the noise bands have uncorrelated envelopes. In CDD experiments, the task is to detect a noise band flanked by other noise bands, and thresholds are lowest when all FB envelopes are correlated but are different to the envelope pattern of the signal noise band (CU) compared with when noise band envelopes are AC or AU. In starlings, CMR effects were on average about 11 dB with FB maskers separated by about 300 Hz from the on-frequency band (Klump et al., 2001; masker bandwidth was 25 Hz). This result was comparable to the human CMR of about 9 dB for similar bandwidth and a frequency separation of 200 Hz (Schooneveldt & Moore, 1987). This shows again that the starling is a suitable animal model for investigating the processing of signals in modulated noise. Both for CMR and CDD experiments, mechanisms related to both across-channel and within-channel processing have been discussed (reviewed, e.g., in Borrill & Moore, 2002). In the following these mechanisms will be evaluated with respect to CDD in the starling.

The relevance of across-channel cues and auditory grouping for CDD

Auditory grouping primarily demands a comparison across different auditory channels. Common envelope correlation between spectral components and common onset times are considered two of the strongest signal characteristics that promote the grouping of different signal components into a single auditory object (Bregman, 1990). Generally, detection of the signal should be enhanced whenever the signal is perceived as an auditory object that is distinct from the FBs compared with conditions where all components fuse into a single auditory object. Cohen & Schubert (1987) and McFadden (1987) were among the first to suggest that mechanisms related to perceptual grouping could account for some of the results observed in CDD experiments. If these grouping cues are useful in CDD experiments, threshold differences between specific experimental conditions should be expected.

We presented the starlings with three patterns of envelope correlation that varied in their potential to provide grouping cues. If the signal and FBs have correlated envelopes, they will be grouped together and the signal should be more difficult to segregate from the FBs than if the envelopes of the signal and the FBs are not correlated with each other. Thus, the grouping hypothesis predicts the following pattern of thresholds as a function of envelope correlation: (i) thresholds should be lowest in the CU condition; (ii) thresholds in the AC condition should be much higher than in the CU condition; and (iii) thresholds in the AU condition should be intermediate between those in the CU and AC conditions (see ‘Experimental hypotheses and predictions’ section for detailed predictions). The first two predictions were supported by our starling data. The third prediction, however, was not supported, as thresholds in the AU conditions were not intermediate between those in the CU and AC conditions, but were, on average, the highest of all thresholds.

Differences in the onset of the signal and FBs should also promote the segregation of the signal from the FBs (see ‘Experimental hypotheses and predictions’). Specifically: (i) a synchronous onset of all stimulus components should promote grouping of the signal with the FBs, while delaying the signal onset relative to the FBs should not; (ii) thresholds for different values of onset delay might also vary as a function of the different correlation conditions. Our results do not support these predictions because of the relatively small effect sizes associated with the main effect of signal onset delay (η² = 0.20) and the interaction between signal onset delay and correlation condition (η² = 0.03) compared with other effects in the model (see Table 1). Also in humans (McFadden & Wright, 1990; Moore & Borrill, 2002), a delay of the signal did not affect thresholds in ways that were consistent with an auditory grouping hypothesis. In summary, signal cues related to auditory grouping seem to play a minor role for explaining CDD in the starling. Also, some of the human data investigating different bandwidths and frequency separations of the noise bands seem to be at odds with mechanisms that only rely on auditory grouping cues (e.g. see Fantini & Moore, 1994; Borrill & Moore, 2002; Moore & Borrill, 2002).

An alternative hypothesis for explaining CDD is based on ‘dip listening’, in which signal detection occurs at instances of low masker amplitude (e.g. Buus, 1985; Schooneveldt & Moore, 1989). This has been viewed as across-channel processing where envelope fluctuations at the output of auditory filters remote from the signal indicate optimum time intervals to listen in the auditory filter centred on the signal. According to this idea, thresholds in a CDD experiment should be high in the AC condition because envelope peaks and dips of all noise bands will coincide across all channels. In contrast, thresholds should be low in the CU conditions where envelope peaks in the auditory channel tuned to the signal will frequently occur at times of envelope dips at the auditory filters tuned to the FBs (Wright, 1990). In the AU condition we would expect that thresholds are either similar to the AC condition or are slightly improved relative to the AC thresholds as the envelope dips of FBs will sometimes indicate suitable listening times in the auditory filter tuned to the signal. Our observation that the signal detection thresholds of starlings were highest in the AU condition is not consistent with these expectations. Moreover, the results of studies in humans are mostly at odds with an across-channel dip-listening mechanism (Wright, 1990; Fantini & Moore, 1994; Borrill & Moore, 2002).

A comparison between the levels of signal and FBs in different frequency bands can also be viewed as a profile-analysis task (for a review on profile analysis, see Green, 1988). In profile analysis the auditory system detects the signal by observing the differences in the spectral profile of multicomponent sounds (for a study demonstrating profile analysis in a bird, see Langemann et al., 2005). Predictions based on profile analysis for the three correlation conditions are as follows. (i) The lowest thresholds should be expected in the AC condition as the spectral profile is constant throughout the signal presentation: only the overall level fluctuates over time. (ii) The highest thresholds should be expected in the AU condition in which the spectral profile varies over time as all FBs and the signal vary in amplitude independently, and only a high signal level relative to the level of the FBs will be detectable. (iii) Thresholds in the CU conditions should be intermediate as the spectral profile provided by the FBs is stable (as in the AC condition), but the contribution of the signal varies over time and thus only temporarily good cues are provided for profile analysis. The threshold differences between the AU and AC conditions in the present experiment are as expected. A similar threshold pattern was observed in a study on the barn owl (Tyto alba; Langemann et al., 2005) and in studies of humans (Fantini & Moore, 1994). The lower thresholds in the CU conditions, compared with the AC conditions, observed in both starlings (this study) and in humans (Fantini & Moore, 1994), however, seem to rule out a profile analysis hypothesis.
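
The predictions of the three across-channel accounts discussed in this section can be summarised by comparing each predicted threshold ordering with the observed one (CU < AC < AU). The sketch below is only a bookkeeping aid: the strict orderings are a simplification (for dip listening, the text allows AU thresholds to be similar to AC thresholds), and the function names are hypothetical.

```python
# Orderings run from lowest to highest predicted threshold.
OBSERVED = ("CU", "AC", "AU")

PREDICTIONS = {
    "auditory grouping": ("CU", "AU", "AC"),
    # Simplification: the text allows AU to be similar to, or slightly below, AC.
    "across-channel dip listening": ("CU", "AU", "AC"),
    "profile analysis": ("AC", "CU", "AU"),
}


def pairwise_relations(order):
    """Return the set of (lower, higher) pairs implied by an ordering."""
    return {
        (order[i], order[j])
        for i in range(len(order))
        for j in range(i + 1, len(order))
    }


def agreement(predicted, observed=OBSERVED):
    """Return the pairwise relations on which prediction and observation agree."""
    return pairwise_relations(predicted) & pairwise_relations(observed)


summary = {name: sorted(agreement(order)) for name, order in PREDICTIONS.items()}

for name, pairs in summary.items():
    matched = ", ".join(f"{low} < {high}" for low, high in pairs)
    print(f"{name}: agrees with the starling data on {matched}")
```

None of the three orderings reproduces all three observed pairwise relations, which is the point made in the preceding paragraphs.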

The relevance of within-channel cues for CDD

Given that the evidence for the use of auditory grouping cues is weak, most psychoacoustic studies in humans favour within-channel mechanisms for explaining the CDD effect (Wright, 1990; Fantini & Moore, 1994; Borrill & Moore, 2002; Moore & Borrill, 2002). These studies suggest that information available from only one auditory filter might contribute to the typical threshold patterns observed in CDD studies. One potential within-channel cue was proposed by Fantini & Moore (1994), who postulated that multiple FBs create temporal fine structure cues that promote the CDD effect when they fall within a single auditory filter. They examined the output of a simulated auditory filter centred at the signal noise band. Adding the signal to multiple 20-Hz-wide FBs resulted in little change in the temporal fine structure in the AC condition, but caused a larger variation in the fine structure that was similar in both the CU and AU conditions (Fantini & Moore, 1994). The observation of the highest thresholds in the AU condition both in humans (Fantini & Moore, 1994) and starlings, however, is not consistent with this explanation. Further within-channel cues are provided by the interaction of the signal and its neighbouring FB that can change the rate and depth of envelope modulations in this filter (Wright, 1990). The nature of these changes in the waveforms depends on the envelope correlations between the signal and FB. For example, if the envelope of the signal is uncorrelated with the neighbouring FB's envelope (i.e. a CU condition), then the modulation depth of the combined waveform would be reduced to a comparatively larger extent than in an AC condition. Borrill & Moore (2002) evaluated data from human listeners applying a psychoacoustic single-channel model that was based on a temporal integration model (introduced by Viemeister & Wakefield, 1991). 
The model considered only two noise bands, the signal and the FB, that were either correlated or uncorrelated with each other (300 ms duration, bandwidth 20 Hz, frequency separation 600 Hz). Similar to Wright (1990), Borrill & Moore (2002) described a reduction in modulation depth of uncorrelated combined noise bands and suggested that the resulting change in the envelope fluctuations was the salient cue responsible for improved signal detection in the uncorrelated conditions. In addition, Moore & Borrill (2002) suggested that within-channel suppression could be of importance. The authors argued that in the AC condition the effect of suppression will not vary over time, whereas in the CU condition suppression should vary during the ongoing stimulus, and instances of low suppression would occur in dips of the masker making signal peaks occurring at that time more detectable.
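
The modulation-depth argument can be illustrated with a deliberately simplified numerical sketch. It is not the model of Borrill & Moore (2002): it assumes that the signal and the neighbouring FB have equal level, that their short-term powers simply add within one filter (beating between the bands is ignored), and that successive samples are independent. The coefficient of variation of the summed power is used as a stand-in for modulation depth; all names and parameter values are hypothetical.

```python
import random
from statistics import mean, pstdev


def band_power(rng, n):
    """Short-term power samples of a narrowband Gaussian noise.

    The squared envelope of Gaussian noise is exponentially distributed,
    generated here as the sum of two squared standard normal deviates.
    """
    return [rng.gauss(0, 1) ** 2 + rng.gauss(0, 1) ** 2 for _ in range(n)]


def modulation_index(power):
    """Coefficient of variation of power; larger values mean deeper fluctuations."""
    return pstdev(power) / mean(power)


rng = random.Random(1)  # fixed seed so the sketch is repeatable
n = 20000

fb = band_power(rng, n)
signal_correlated = list(fb)              # signal shares the FB envelope
signal_uncorrelated = band_power(rng, n)  # signal envelope is independent

combined_correlated = [s + m for s, m in zip(signal_correlated, fb)]
combined_uncorrelated = [s + m for s, m in zip(signal_uncorrelated, fb)]

depth_correlated = modulation_index(combined_correlated)
depth_uncorrelated = modulation_index(combined_uncorrelated)

print(f"correlated envelopes:   {depth_correlated:.2f}")
print(f"uncorrelated envelopes: {depth_uncorrelated:.2f}")
```

Under these assumptions the fluctuation of the combined power is shallower when the two envelopes are independent, because dips in one band are partly filled by the other band; this is the direction of the effect described by Wright (1990) and Borrill & Moore (2002).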

A follow-up study by the same authors (Moore & Borrill, 2002) tested whether the within-channel model proposed by Borrill & Moore (2002) might also explain differences in the slopes of the psychometric functions. Their model predicted steeper slopes in the AC condition than in the CU condition. The data obtained with human listeners confirmed the model's predictions (Moore & Borrill, 2002). In contrast, the slopes of the psychometric functions of starlings were not significantly different between the AC and the CU conditions at either spectrum level. In an ANOVA, the variability of the slopes across the different experimental conditions tested in the starling was high, and the values overlapped broadly; the starling data thus did not bear out the prediction.

The question remains whether the behavioural starling data that show parallels to human psychophysical data could be explained by within-channel processing. In our companion paper we present a within-channel model of peripheral processing that was developed based on the neuronal response patterns observed in the starling (for details of the model, see Bee et al., 2007). This model can consistently explain the observed pattern of threshold differences as a function of the correlation condition (CU < AC < AU), both in physiology and behaviour.

Conclusions

The data presented here, and elsewhere (Klump et al., 2000), establish the European starling as a suitable model for studying the perceptual processes underlying the detection of signals in amplitude-modulated noise, such as those associated with the psychoacoustic effect of CDD. Signal detection thresholds in starlings depended significantly on the envelope correlation between signal and masker components, signal duration, and masker spectrum level, while delaying the signal relative to the onset of the masker noise bands had no effect. These general results are strikingly similar to those reported in earlier studies of human listeners (e.g. Wright, 1990; Fantini & Moore, 1994; Borrill & Moore, 2002; Moore & Borrill, 2002). Starlings also experience a magnitude of CDD that is very similar to that experienced by humans. Mechanisms related to both across-channel processing and the evaluation of within-channel cues have been suggested for explaining CDD (Wright, 1990; Fantini & Moore, 1994; Borrill & Moore, 2002; Moore & Borrill, 2002). Overall, across-channel cues seem to play a minor role for explaining the psychoacoustic thresholds and CDD in humans and starlings. Previous hypotheses for potential within-channel cues can only partly explain the threshold patterns observed in humans and starlings. A physiological study of CDD in the starling using the same stimuli provides alternative explanations based on single auditory frequency channels (Bee et al., 2007).

Acknowledgements

This study was supported by the Deutsche Forschungsgemeinschaft (FOR 306 ‘Hörobjekte’, SFB/TRR 31 ‘Das aktive Gehör’). Many thanks to Susanne Groß for excellent training of the birds. Numerous discussions with Mark Bee at various stages of the study were very helpful and much appreciated. The botanical garden of the University of Oldenburg kindly houses the aviary with our stock of starlings.

Abbreviations

  • AC: all correlated
  • AU: all uncorrelated
  • CDD: comodulation detection difference
  • CMR: comodulation masking release
  • CU: co-uncorrelated
  • FB: flanking band
  • SMR: signal-to-masker ratio.