Spatial location is accurately tracked by human auditory sensory memory: evidence from the mismatch negativity
Abstract
The nature of spatial representation in human auditory cortex remains elusive. In particular, although humans can discriminate the locations of sounds as close as 1–10 degrees apart, such resolution has not been shown in auditory cortex of humans or animals. We used the mismatch negativity (MMN) event related brain potential to measure the neural response to spatial change in humans in narrow 10 degree spatial steps. Twelve participants were tested using a dense array EEG setup while watching a silent movie and ignoring the sounds. The MMN was reliably elicited by infrequent changes of spatial location of sounds in free field. The MMN amplitude was linearly related to the degree of spatial change with a resolution of at least 10 degrees. These electrophysiological responses occurred within a window of 100–200 milliseconds from stimulus onset, and were localized to the posterior superior temporal gyrus. We conclude that azimuthal spatial displacement is rapidly, accurately and automatically represented in auditory sensory memory in humans, at the level of the auditory cortex.
Introduction
A fundamental function of the sensory system is to alert the organism to environmental change. It is easy to envisage natural situations in which the ability to automatically and accurately detect unexpected changes in sound source locations may determine survival. However, animal studies suggest that auditory neurons have very broad receptive fields, frequently encompassing whole hemispaces (Ahissar et al., 1992; Brugge et al., 1996; Middlebrooks et al., 1998; Tian et al., 2001; Stecker et al., 2003, 2005a), and although auditory cortex lesions are reported to disrupt sound localization in humans (Zatorre & Penhune, 2001), nearly nothing is known about the resolution of acoustic space in human auditory cortex.
In humans, automatic change detection is reflected by the mismatch negativity (MMN; Näätänen et al., 2001) event related brain potential. MMN is elicited experimentally by an infrequent unattended sound (deviant) that differs along one or more features from previous unattended sounds (standards). The main generators of the MMN are in primary and secondary auditory cortex in the superior temporal gyrus (e.g. Alho, 1995; Halgren et al., 1995; Rosburg, 2003), with secondary generators in the frontal cortex (Giard et al., 1990; Deouell et al., 1998). The exact location of superior temporal gyrus generators is probably contingent upon the deviating acoustic feature (Giard et al., 1995; Rosburg, 2003). The MMN depends on the magnitude of deviance, increasing monotonically in amplitude as the difference between the standard and the deviant increases (e.g. along sound frequency, Tiitinen et al., 1994; Yago et al., 2001; duration, Amenedo & Escera, 2000; or stimulation rate, Sable et al., 2003). This MMN gain function provides an objective measure of the nature (e.g. linear or logarithmic) and accuracy of representation of the specific dimension in sensory memory (Tiitinen et al., 1994; Näätänen & Alho, 1997).
Conflicting results have been obtained regarding the MMN gain function for spatial change. The MMN amplitude increased with deviance magnitude when space was simulated with headphones, manipulating interaural time or level differences, or even using standard head-related transfer functions (Näätänen et al., 1988; Paavilainen et al., 1989; Doeller et al., 2003; Nager et al., 2003; Sonnadara et al., 2006). Nevertheless, although humans may be able to discriminate deviations as small as 1–10° of azimuth in frontal space (Perrott & Saberi, 1990), and auditory attention can be tightly focused within a range of 6° or better (Teder-Sälejärvi et al., 1999b), the spatial separation between deviants in these studies of nonattentional change detection was 30° at best. Moreover, studies that have employed a realistic ‘free field’ setup, using an array of loudspeakers, failed to find a clear relation between the degree of change and MMN amplitude. Rather, an ‘all or none’ phenomenon was found wherein the MMN simply indexed any change in spatial location without localizing information (Paavilainen et al., 1989; Colin et al., 2002). [The term ‘free field’ is used in the literature either to represent an anechoic environment or to distinguish presentation from environmental sources (including loudspeakers) as opposed to presentation via headphones. In this paper, we take the term to denote the latter meaning.]
Thus, whether the location of sound is automatically tracked at the level of the auditory cortex in humans, with a fine resolution compatible with behaviour, is unknown. We used the MMN to assess the preattentive representation of sound location using a free-field setup and a finer scale of spatial resolution.
Materials and methods
Participants
We recorded EEGs in 12 students at the Hebrew University of Jerusalem (four men, eight women, aged 21–26, 11 right handed) who watched a silent animation film presented straight ahead, and were requested to ignore the sounds. To reduce scanning eye movements, which would cause EEG artifacts, the size of the window in which the movie was seen was restricted to ∼12 × 9 cm at 100 cm from the eyes. Participants' heads and eyes were monitored continuously using a closed circuit video system. The procedures conformed to the Code of Ethics of the World Medical Association (Declaration of Helsinki), and were approved by the institutional ethics committee at the Hebrew University of Jerusalem. Prior to the experiment, the participants signed a written consent form after being properly informed of the nature of the experiment. The exact aims of the study were explained after the study to avoid directing the subjects' attention to the sounds. The participants took part in the experiment in return for payment or course credit.
Stimuli and procedure
Subjects sat in an armchair in a dimly lit, sound attenuated and low reverberation double-wall chamber (Eckel C-26, Eckel, UK). The walls of the chamber were additionally treated all around with acoustic absorbing foam to further reduce echoes. An array of loudspeakers formed a semicircle with a radius of 90 cm centred at the centre of the head (Fig. 1). In the main part of the experiment, eight blocks of five hundred 50 ms (2 ms rise and 2 ms fall time), spectrally rich tones, with 500 Hz fundamental and three partials of descending amplitudes, were played at ∼70 dB (SPL) with random onset-to-onset intervals of 350–450 ms (square distribution, steps of 10 ms). The standard stimuli (probability = 0.76) originated from a loudspeaker located 5° to the right of the midline, whereas the deviant stimuli (probability = 0.06 each) originated from one of four other loudspeakers located at −5°, 15°, 25° and 35°. These locations correspond to deviations of −10°, 10°, 20° and 30° relative to the standard (negative numbers indicate deviation to the left). All locations were mixed within blocks of 500 stimuli, with at least three standards before each deviant. The reason for the inclusion of the −5° (deviation of −10°) location in the experiment was to explore a possible effect of crossing the midline, by comparing it to the deviation of +10° (Fig. 1). In addition to these main blocks, each subject sat through four control blocks of 500 stimuli each. These control blocks were similar to the main blocks except that the standard (frequent) stimuli originated from positions −5°, 15°, 25° or 35° (one block each), leaving the remaining four locations in each block as deviants. We compared the responses to the standards in these control blocks with one another and with the response to the standard in the main block. This allowed us to compare the responses to stimuli at all five locations when each served as the standard (see Fig. 2E).
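The block structure described above can be illustrated with a short sketch. It is not the authors' stimulation code: the randomization scheme (shuffling the deviants and scattering the spare standards at random) and all names such as `make_block` are assumptions made for illustration. Only the probabilities, the minimum of three standards before each deviant, and the 350–450 ms onset-to-onset range in 10 ms steps come from the text.

```python
import random

STANDARD = 5                  # degrees azimuth, right of midline (from the text)
DEVIANTS = [-5, 15, 25, 35]   # degrees azimuth (from the text)
N_TRIALS = 500                # stimuli per block
N_PER_DEVIANT = 30            # 0.06 * 500
MIN_STANDARDS = 3             # standards required before each deviant


def make_block(rng):
    """Return 500 (location_deg, soa_ms) tuples for one main block.

    Assumed scheme: shuffle the deviants, put three standards before each,
    then scatter the remaining standards at random between deviants.
    """
    deviants = [d for d in DEVIANTS for _ in range(N_PER_DEVIANT)]
    rng.shuffle(deviants)
    n_dev = len(deviants)                            # 120 deviants
    n_std = N_TRIALS - n_dev                         # 380 standards (p = 0.76)
    spare = n_std - MIN_STANDARDS * n_dev            # standards left to scatter
    extra = [0] * (n_dev + 1)                        # last slot: end of block
    for _ in range(spare):
        extra[rng.randrange(n_dev + 1)] += 1

    locations = []
    for i, dev in enumerate(deviants):
        locations += [STANDARD] * (MIN_STANDARDS + extra[i])
        locations.append(dev)
    locations += [STANDARD] * extra[n_dev]

    # Onset-to-onset interval: uniform over 350, 360, ..., 450 ms.
    soas = [rng.choice(range(350, 451, 10)) for _ in locations]
    return list(zip(locations, soas))


block = make_block(random.Random(0))
```

Passing a seeded `random.Random` instance makes a block reproducible, which is convenient when checking that the sequence constraints hold.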

Fig. 1. Stimulation setup. Standard sounds in the main block were presented at 5° to the right of the midline.

Fig. 2. The mismatch response is linearly related to the magnitude of change. (A) Response to standards (at 5° to the right, thin line) and deviants. See panel B for the deviant colour legend. (B) Difference waveforms (deviant minus standard). (C) Scalp voltage distribution at the two peaks of the MMN for 30° deviance. Note polarity reversal between frontal and temporal sites for both peaks. (D) MMN peak amplitudes and linear trends as a function of degree of deviation. The 10° point represents the average response to deviations of 10° to the right and left, which did not significantly differ. Error bars, standard error of the mean. (E) Response to the five locations when they were standards in separate control blocks.
EEG recordings
EEG was continuously recorded by an Active 2 system (Biosemi, the Netherlands), using 66 sintered Ag/AgCl electrodes laid out according to the extended 10–20 system, with the addition of mastoid electrodes. Eye movements were recorded using electrodes above and below the right eye, and near the left and right outer canthi. The effective sampling rate was 256 Hz (nine of 12 subjects were recorded at 2048 Hz and later down-sampled to 256 Hz). A lowpass filter with cutoff of 1/4 the sampling rate was used during recording to prevent aliasing of high frequencies. The data was referenced off-line to the nose channel. Epochs contaminated by blinks, gross eye movements, muscle artifacts or abrupt signal changes were rejected. Each subject's signal was then parsed into 500 ms segments starting 100 ms before stimulus onset. The segments were sorted by stimulus type (standard and four deviants) and averaged, band pass filtered (1–20 Hz, 24 dB/octave) and referenced to the mean of the 100 ms prestimulus period. To measure the MMN, differences were computed by subtracting the standard response from each type of deviant. Grand averages were obtained across the group. We estimated the cortical current densities based on the grand average difference wave using the low resolution tomographic analysis (loreta-key) package (Pascual-Marqui et al., 1994; Pascual-Marqui, 1999; http://www.unizh.ch/keyinst/NewLORETA/LORETA01.htm), with regularization based on the cross-validation procedure (Pascual-Marqui, 2002). Mastoid and EOG electrodes were not included in this analysis. besa (MEGIS, Germany) software was used to further assess the loreta results. Using besa, equivalent current dipoles were seeded at the site of the peak loreta activation and allowed to freely rotate, to best explain the leading parts (from onset to peak) of the MMN deflections.
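The segmentation, averaging, baseline referencing and subtraction steps can be sketched for a single channel as follows. This is a minimal illustration, not the authors' pipeline: it omits artifact rejection, band-pass filtering and re-referencing, and the function names are hypothetical. The sampling rate and epoch limits are taken from the text.

```python
FS = 256          # Hz, effective sampling rate (from the text)
PRE_MS = 100      # epoch starts 100 ms before stimulus onset
EPOCH_MS = 500    # total epoch length


def ms_to_samples(ms):
    """Convert a duration in milliseconds to a whole number of samples."""
    return round(ms * FS / 1000)


def extract_epoch(signal, onset_idx):
    """Cut one epoch from a single-channel signal around a stimulus onset."""
    start = onset_idx - ms_to_samples(PRE_MS)
    return signal[start:start + ms_to_samples(EPOCH_MS)]


def average(epochs):
    """Point-by-point mean across epochs of equal length."""
    n = len(epochs)
    return [sum(samples) / n for samples in zip(*epochs)]


def baseline_correct(erp):
    """Subtract the mean of the prestimulus period from every sample."""
    n_pre = ms_to_samples(PRE_MS)
    baseline = sum(erp[:n_pre]) / n_pre
    return [v - baseline for v in erp]


def difference_wave(deviant_epochs, standard_epochs):
    """Deviant-minus-standard waveform, as used to measure the MMN."""
    dev = baseline_correct(average(deviant_epochs))
    std = baseline_correct(average(standard_epochs))
    return [d - s for d, s in zip(dev, std)]
```

At 256 Hz the 100 ms prestimulus period does not fall on a whole sample (25.6 samples), so this sketch rounds it to 26 samples. How the original analysis handled this is not stated.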
Statistical analysis
Mean difference voltages (each deviance minus the standard), within a window of ±20 ms around the peak latencies of the group average deviance-related negativity (N1/MMN), were measured for each subject and served as the dependent variable. They were entered into a one-way repeated measures analysis of variance (anova) to assess the effect of the degree of deviance. This was followed by planned contrasts (one-tailed Student's t-tests) between adjacent degrees of deviation (i.e. 10 vs. 20°, 20 vs. 30°). The one-tailed test is appropriate as we tested for a specific direction of effect. Linear regression was performed on the mean group values to determine the slope of the gain function. To further assess the consistency of the linear relationship across subjects, a two-stage procedure was performed: first, a linear regression analysis was performed within each subject; next, the resultant individual slopes (beta values) were entered into a one-sample t-test against zero. Greenhouse-Geisser (G-G) correction was used where indicated; the G-G epsilon is indicated where it is smaller than one. The uncorrected degrees of freedom are reported (Picton et al., 2000). P-values were rounded to the nearest nonzero digit.
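The two-stage procedure can be sketched as a least-squares slope per subject followed by a one-sample t statistic on those slopes. The function names are hypothetical and the amplitudes below are synthetic placeholders, not data from the study. The sketch returns only the t statistic and its degrees of freedom; a P-value would additionally require the t distribution.

```python
import math
import statistics

DEVIANCE_DEG = [10, 20, 30]   # degrees of deviation (from the text)


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def one_sample_t(values, mu=0.0):
    """Return (t statistic, degrees of freedom) for H0: mean == mu."""
    n = len(values)
    se = statistics.stdev(values) / math.sqrt(n)
    return (statistics.fmean(values) - mu) / se, n - 1


# Synthetic placeholder amplitudes (µV) for three hypothetical subjects
# at 10, 20 and 30 degrees of deviation. NOT data from the study.
subjects = [
    [-0.4, -1.1, -1.5],
    [-0.2, -0.6, -1.4],
    [-0.9, -1.0, -1.8],
]

# Stage 1: one slope (µV/deg) per subject.
slopes = [ols_slope(DEVIANCE_DEG, amps) for amps in subjects]

# Stage 2: test whether the mean slope differs from zero.
t_value, df = one_sample_t(slopes)
```

Testing the individual slopes against zero treats subjects as the random factor, so a significant result indicates that the gain function is consistent across subjects rather than a property of the grand average only.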
Results
Compared to standards at 5° right of midline, rare spatial displacements of 10°, 20° and 30° elicited a frontal negative ERP between approximately 70 and 200 ms (Fig. 2A and B). This negativity had an early and a late phase, most conspicuously seen for the 30° deviants. Both phases reversed polarity at the lower temporal electrodes as typical for the MMN (Fig. 2C).
The early phase peaked at 98/94/105/115 ms for the −10/10/20/30° deviance, respectively. There was no significant amplitude difference between shifts of 10° to the right or to the left at these latencies (t11 = 1.03, P = 0.32), hence these responses were combined for further analysis. All deviants elicited a significant negativity (one-tailed t-test against zero, P = 0.006, 0.0005, and 0.00002 for deviations of 10, 20 and 30°) at these latencies. The degree of deviance significantly affected the amplitude of the early peak (within subject one-way anova, F2,22 = 12.68, P < 0.001, Greenhouse-Geisser epsilon = 0.84). Follow-up contrasts showed that the response to the 10° deviance was significantly smaller than the response to the 20° deviance (t11 = 2.82, P < 0.01), which was smaller than the response to the 30° deviance (t11 = 2.41, P < 0.05). The group average MMN amplitude was linearly related to the degree of deviance (r-squared = 0.97; slope = −0.056 µV/deg, Fig. 2D). This was significant across subjects; the mean regression coefficient (slope) across subjects was significantly smaller than zero (i.e. a negative slope; t11 = −4.59, P < 0.001, see Materials and methods). In fact, all individual regression lines except for one had negative slopes, although the actual slope varied across subjects (Supplementary material Table S1).
The peak latency of the late phase was set at 156 ms based on the response to the 30° shift, where it was most distinct. There was no amplitude difference between shifts of 10° to the right or left at this latency (t11 = 0.25, P = 0.8), and these responses were collapsed. All conditions showed a negativity at this latency, although for the 10° deviation this was only a trend (one-tailed t-test, P = 0.087, 0.008, and 0.00007 for 10, 20 and 30° deviance). Here again, the degree of deviance significantly affected the MMN amplitude (F2,22 = 14.25, P < 0.001, Greenhouse-Geisser epsilon = 0.98). MMN was larger for 30° than for 20° (t11 = 3.47, P < 0.01), which was in turn larger than for the 10° shift (t11 = 2, P < 0.05). The grand average MMN amplitude was linearly related to the degree of deviance (r-squared = 0.98; slope = −0.071 µV/deg; Fig. 2D). The average linear regression coefficient across subjects was again significantly different from zero (t11 = −5.2, P < 0.001), as all individual regression lines had a negative slope, albeit with variable magnitudes (Supplementary material Table S1). To examine the possibility that the differences observed between the responses to the different deviants are due simply to their physical characteristics (rather than to the difference from the standard), we compared the responses to all stimuli when they were standards (in the control blocks). Similar ERPs were observed for all spatial locations (Fig. 2E).
Inverse solutions are approximations which are most accurate with higher signal to noise ratios (SNR); hence, we used the condition with the best SNR for this analysis. loreta analysis, based on the 30° deviance condition, placed the main focus of activity at both time intervals around the posterior part of the superior temporal gyrus (peak current density Talairach coordinates: x, ±59, y, −32, z, 8/15; Fig. 3). Equivalent dipoles seeded bilaterally at the location of the peak loreta activity using the besa package and allowed to rotate explained 93.3% of the actual data variance around the early peak and 93% of the data variance around the late peak. The planum temporale (PT) activity was stronger on the left (i.e. contralateral to the side of the stimuli) than on the right at the early MMN peak but much more symmetrical at the later peak. This was confirmed by measuring in each subject the average loreta current density within a 14 mm sphere around the location of the peak activity of the grand average, restricted to the superior temporal gyrus, within a window spanning ±20 ms around the peak latency of each phase as described above. A side × latency anova revealed a significant interaction (F1,11 = 8.42, P < 0.05) which was due to a significant left side advantage at the early phase (two-tailed t11 = 2.71, P = 0.05) but not at the late phase (t11 = 1.12, P = 0.28).

Fig. 3. Estimated inverse solution. loreta current density maps at the early and late peaks of the MMN for the 30° deviant showing activation around the posterior superior temporal gyrus and adjacent cortex. Right, average values across the 12 subjects in the planum temporale cluster (see text). Error bars, standard error of the mean.
An alternative explanation for the observed spatial gradient is that the mismatch response is in fact ‘all or none’, but with increasing magnitude of deviance, a larger proportion of trials would show the mismatch response. In this case averaging across trials will result in an apparent spatial gradient. To address this alternative explanation, we chose the subject with the maximal number of trials remaining after artifact rejection (subject 103, with 224, 228, 223, and 227 trials for the −10°, 10°, 20° and 30° deviants, respectively). This subject showed the response gradient observed in the group as a whole. To increase the SNR, which is intrinsically low in single trials, we averaged a cluster of frontocentral electrodes (Fz, FCz, Cz, F5/6, F3/4, F1/2, FC3/4, C1/2) in which MMN is seen. The mean voltage at the N1/MMN span (100–200 ms) at each trial was then measured. If the averaged potentials for the deviants are due to two populations of responses with distinct amplitudes, then the distributions of single trial potentials should be bimodal, with one more positive peak for the ‘MMN–’ and one more negative for the ‘MMN+’. Further, the relative height of these peaks should gradually change as the deviance magnitude increases, as more trials will fall in the MMN+ category. In addition, in the ‘all or none’ scenario, the variance of the smaller magnitude responses would be predicted to be larger than that of the larger deviance magnitudes. However, the observed distributions were Gaussian for all conditions [a Gaussian fit, using Matlab Curve Fitting tool (The Mathworks, Inc.) explained over 95% of the variance in all cases; Supplementary material Fig. S1], and the variance tended to be larger, if at all, for the larger magnitude deviants. These results do not support the ‘all or none’ hypothesis.
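The logic of this test can be made concrete with a toy model. The sketch below assumes, purely for illustration, that each single-trial amplitude is Gaussian noise (standard deviation `sigma`) around either zero (no response) or a fixed response `delta`, with a fraction `p` of trials carrying the response. The parameter names and the model are assumptions, not the authors' analysis. Under this model the trial average is p·delta, so it can mimic a gradient as `p` grows, while the trial variance is sigma² + p(1 − p)·delta², which changes with `p`. A graded model with the same mean keeps the variance at sigma².

```python
import random
import statistics


def mixture_mean(p, delta):
    """Expected single-trial amplitude when a fraction p of trials carries
    a response of size delta and the remaining trials carry none."""
    return p * delta


def mixture_variance(p, delta, sigma):
    """Single-trial variance under the two-population ('all or none') model
    with Gaussian noise of standard deviation sigma on every trial."""
    return sigma ** 2 + p * (1 - p) * delta ** 2


def simulate_all_or_none(p, delta, sigma, n_trials, rng):
    """Draw trials that either carry the full response or none at all."""
    return [rng.gauss(delta if rng.random() < p else 0.0, sigma)
            for _ in range(n_trials)]


def simulate_graded(mean, sigma, n_trials, rng):
    """Draw trials whose response size is the same on every trial."""
    return [rng.gauss(mean, sigma) for _ in range(n_trials)]


# Hypothetical parameters chosen only to make the contrast visible.
rng = random.Random(1)
all_or_none = simulate_all_or_none(p=0.5, delta=-6.0, sigma=1.0,
                                   n_trials=5000, rng=rng)
graded = simulate_graded(mean=mixture_mean(0.5, -6.0), sigma=1.0,
                         n_trials=5000, rng=rng)

# Same expected mean, but the all-or-none trials are far more dispersed
# (and bimodal when delta is large relative to sigma).
var_all_or_none = statistics.pvariance(all_or_none)
var_graded = statistics.pvariance(graded)
```

The extra variance term p(1 − p)·delta² is largest when about half of the trials carry the response, and it vanishes when all or none of them do.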
Discussion
Auditory events can be localized quite precisely (resolution of 1–10°) in the frontal plane under optimal conditions (Blauert, 1996). However, animal studies have found that auditory cortex neurons are only coarsely tuned to spatial locations (see below). No parallel single unit data exist for human auditory cortex. Scalp MMN can tap into the nature of sound representation as it reflects the sensitivity of the sensory memory to differences along a given dimension. Previous MMN measurements in humans using interaural cues with headphones indeed found a correlation between MMN magnitude and degree of deviation, but space was sampled very coarsely, with the minimum separation between deviation magnitudes being 30°. This coarse level of discrimination is far from human spatial discrimination capabilities and can hardly be considered an accurate representation of sound location, one that can support for example rapid orientation towards a sound source that has shifted its position. Moreover, with free-field stimulation, no monotonic gradient of response magnitudes was found in previous studies. Paavilainen et al. (1989) presented pure tones of 600 or 3000 Hz with standards at the midline position and deviants at 10, 45 or 90°. For the lower tones, all deviants elicited an MMN of the same amplitude. For the higher tones, the 10° and the 90° deviants elicited similar MMNs, whereas unpredictably the 45° sounds elicited a lower MMN. Colin et al. (2002) used the syllable /pi/ as stimulus, and found similar MMNs for deviants at 20° or 60° compared to midline standards. With no other source of information, the resolution of space representation in human auditory cortex remains unknown.
Here, using a free-field setup, we show that between 100 and 200 ms, auditory sensory memory linearly tracks the magnitude of task-irrelevant spatial deviance, with a resolution of at least 10°, more in accord with the range of human localization behaviour (Blauert, 1996). loreta distributed source analysis mapped the deviance related response to the posterior superior temporal gyrus (planum temporale, PT), compatible with previous EEG source localization studies using much coarser spatial mismatch responses (Tata & Ward, 2005; Sonnadara et al., 2006). This localization is also congruent with a number of nonmismatch functional neuroimaging studies pointing to the medial PT as a main site of auditory spatial processing in humans (e.g. Warren et al., 2002; Warren & Griffiths, 2003), as well as with a recent fMRI study using a spatial mismatch design (Deouell et al., 2004). This region may parallel the monkey caudal belt fields (CM and CL), which were found to harbour spatially sensitive neurons, correlated with localization behaviour (Recanzone et al., 2000; Tian et al., 2001). The present findings show that this part of secondary auditory cortex in humans responds to rare, unattended spatial change, in a linear fashion, and with a much finer resolution than previously seen.
In some auditory cortical zones in the cat and monkey, neurons are sensitive to the direction of sound, albeit with very large receptive fields, frequently encompassing the whole contralateral field (e.g. Ahissar et al., 1992; Brugge et al., 1996; Tian et al., 2001). Several models of population coding have been put forward to accommodate the apparently low resolution at the neuron level with the relatively high behavioural discrimination (Middlebrooks et al., 1998, 2002; Middlebrooks, 2002; Stecker et al., 2003, 2005a, 2005b). According to a very recent model, accurate localization may in fact be an emergent property; that is, it is computed from the sum of responses of auditory cortex neurons that are broadly tuned to either ipsilateral or contralateral fields, but this is achieved at a more central level, perhaps even multisensory or motor (Stecker et al., 2005b; see also Zimmer et al., 2006 for a similar argument). The present findings show accurate representation of spatial information already in the caudal part of the auditory cortex. Thus, the PT may be the site where specific locations are computed based on input from upstream ipsi/contralateral opponent populations (Boehnke & Phillips, 1999; Stecker et al., 2005b), congruent with the suggested role of the PT as a computational hub (Griffiths & Warren, 2002). The present study suggests that this is accomplished even when the task does not require any spatial judgement and in fact not even attention to sounds in general.
It could be argued that although the participants were instructed to ignore the sounds, visual attention, which focused on the centred video monitor, could affect the processing of auditory stimuli, via cross-modal links (Teder-Sälejärvi et al., 1999a). However, in this case, we would expect more negative responses in the 70–200 ms range the closer the stimulus was to the focus of attention (i.e. closer to the standard location at 5°), as previously shown (Teder-Sälejärvi & Hillyard, 1998; Teder-Sälejärvi et al., 1999b). This was not the case for the standards in the control conditions (Fig. 2E), which elicited similar responses regardless of location, and in fact is the exact opposite of what was found for the standards and deviants in the main experiment (Fig. 2A–D). Thus, cross-modal attention effects cannot explain the results.
The results show that, as in the domains of frequency and duration, the MMN amplitude is linearly related to the degree of spatial change and thus represents its saliency. To study this function in detail, we concentrated on a limited frontal sector of 35° to the side of the midline. Whether the function maintains its linearity in more peripheral space remains to be investigated. As humans are not as spatially accurate in lateral auditory space as they are in central space (larger ‘localization blur’, Blauert, 1996), the function may start to plateau at some point. Additionally, further study is needed to examine potential differences in gain function between the two sides of space (cf. Deouell et al., 2003), as well as between individual subjects. A monotonic increase in MMN from 10° to 20° to 30° of spatial deviation was observed in the majority of individual subjects' data. Coupled with the within subject design of the group analysis, this clearly supports the sensitivity of the mismatch generators to the degree of spatial deviation. Nevertheless, as might be expected from the high variability of human subjects in auditory localization accuracy, the slope and exact shape of the gain function varied among subjects (Supplementary material Table S1). The correlation between discrimination levels and MMN gain, across space, is currently under investigation.
The finding of a reliable MMN gain function for spatial changes offers the possibility of measuring the effects of different cognitive factors on background spatial representation. For example, strongly focusing attention on one side of space may or may not alter the resolution in the unattended side of space, which can be measured using an MMN gain function. Likewise, as it was shown that cross-modal auditory–visual interactions can elicit MMN (Colin et al., 2002), it would be intriguing to see how manipulations of visual spatial information affect the automatic neural representation of auditory space in auditory cortex. In addition, the present findings offer the possibility to follow the effects of particular types of brain damage on the spatial gain function (cf. Deouell et al., 2000 for the case of unilateral neglect), to follow it during recovery, and to correlate it with behavioural and cognitive dysfunctions.
Two mechanisms of change detection have been suggested. One is based on different levels of stimulus-specific adaptation of neurons producing the obligatory N1 response (the ‘afferent’ model; Ulanovsky et al., 2003; Jaaskelainen et al., 2004), and another which relies on a separate neuronal memory trace, encoding regularities in the environment (the ‘memory trace’ model; Näätänen & Winkler, 1999; Näätänen et al., 2005). According to the afferent model, the response to the stimulus depends mainly on how frequently that stimulus has been presented, regardless of the context. By the memory trace model, the deviation of the stimulus from the acoustic context is most critical. The earlier phase of the present deviant related response occurred as early as 100 ms, and because of this early latency, might reflect an afferent (N1) contribution to the deviance related negativity (cf. Butler, 1972; Näätänen et al., 1988; Palomaki et al., 2005). Congruently, loreta analysis indicated that at this phase of the response, the PT activity was significantly stronger on the left, contralateral to the stimulus side. However, a recent study examining MMN to spatial change has also found a comparably early MMN, which disappeared when ‘deviant’ stimuli were presented in the context of sounds of varying locations, and hence had no ‘standard’ location context to compare with (Sonnadara et al., 2006). This suggests a memory trace mechanism. Moreover, the three deviants in our experiment occurred with equal frequency, and so the different MMN amplitude elicited by the three deviants cannot be ascribed solely to stimulus probability effects. Thus, the early phase of the response might include contributions from both mechanisms (cf. Halgren et al., 1995). The latency of the second phase and its more symmetrical distribution suggest that it reflects the longer latency memory trace MMN mechanism (Schröger & Wolff, 1996). 
In any case, both phases of the location change mismatch response showed a clear linear relationship to deviance magnitude.
To conclude, these results provide clear evidence that the mismatch response, localized to posterior auditory cortex, linearly tracks the degree of spatial change, with a fine resolution. Thus, as early as ∼100 ms, auditory sensory memory, localized to the region of the planum temporale, accurately represents differences of 10° between locations in space.
Acknowledgements
This work was supported by grant 9-2004-5 from the National Institute of Psychobiology in Israel founded by the Charles E. Smith family to LYD, grant 477/05 from the Israel Science Foundation to LYD, and NINDS Grant NS21135 to RTK.
Abbreviations
loreta: low resolution tomographic analysis
MMN: mismatch negativity
PT: planum temporale