Exploring word-referent mapping in Mandarin-speaking late-talkers at 33 months and its language predictors at 27 months: An eye-tracking study
Hsin-Hui Lu and Wei-Chun Che contributed equally to this study as co-first authors.
Abstract
Background and Aims
This longitudinal study investigated the language skills, phonological working memory and lexical-tone perception of Mandarin-speaking late-talkers (LTs) and toddlers with typical language development (TLD) at 27 months, and examined how these abilities relate to novel word-referent mapping (W-R mapping), measured with eye-tracking at 33 months.
Methods and Procedures
Participants included 22 Mandarin-speaking 27-month-old LTs and 22 toddlers with TLD. Expressive and receptive language abilities, phonological working memory and lexical-tone perception were assessed when participants were 27 months old. At 33 months, an eye-tracking paradigm was employed during word-learning tasks comprising W-R mapping and word-identification test (W-I test) phases. Multilevel models were used to analyse participants’ gaze-pattern trajectories.
Outcomes and Results
At 27 months, LT toddlers exhibited poorer language skills (receptive: p = 0.015; expressive: p < 0.001), lexical-tone perception (p < 0.001) and phonological working memory (p < 0.001) than toddlers with TLD, even after controlling for maternal educational level and participants’ fine-motor ability. During the W-I test phase, LT toddlers showed a slower increase over time than TLD toddlers in fixations on the novel target image while listening to the corresponding novel word (linear: p = 0.011; quadratic: p = 0.007), after adjusting for confounders. Further, expressive language ability at 27 months predicted newly established W-R mappings at 33 months (p = 0.016). Additionally, toddlers’ phonological working memory and lexical-tone perception were associated with their expressive language ability (p = 0.001 and p < 0.001, respectively).
Conclusions and Implications
These findings indicate that novel W-R mapping is less robust in LTs than in toddlers with TLD, and that the skills needed for word learning overlap with a wide range of expressive language abilities. Moreover, poor expressive language abilities were associated with weaker lexical processing abilities, namely phonological working memory and lexical-tone perception. These findings suggest the need for interventions aimed at improving LTs’ lexical processing abilities to strengthen their lagging word-learning skills in toddlerhood.
WHAT THIS PAPER ADDS
What is already known on this subject
- Late-talkers (LTs) exhibit delays in expressive vocabulary development and also perform poorly in word learning.
What this paper adds to existing knowledge
- Using an eye-tracking paradigm, this study found that novel word-referent mapping (W-R mapping) is less robust in LTs than in toddlers with typical language development. Toddlers’ early expressive language ability predicted their ability to establish novel W-R mappings. Furthermore, the better the phonological working memory and lexical-tone perception of LTs, the better their early expressive language ability.
What are the clinical implications of this work?
- Interventions might consider incorporating strategies to improve phonological working memory and lexical-tone perception to help Mandarin-speaking LTs enhance linguistic capacities and build robust novel W-R mapping.
INTRODUCTION
Late-talkers (LTs), toddlers identified with expressive vocabulary delay in the absence of cognitive, hearing, social and physical impairments, comprise about 10% to 20% of children aged 2 years (Desmarais et al., 2008; Moyle et al., 2011). They exhibit delays in expressive vocabulary development during toddlerhood (MacRoy-Higgins et al., 2016), and their vocabulary disadvantage can persist into school age (Fisher, 2017) and even adolescence (Psyridou et al., 2018). A large body of research shows that poor vocabulary ability in LTs is related to environmental factors (e.g., maternal education) (Reilly et al., 2010) and cognitive factors (e.g., phonological working memory) (Baddeley et al., 1998). However, the mechanisms by which LTs learn new vocabulary, and how these relate to other lexical processing abilities, remain unclear. Research on these issues could inform early intervention strategies to improve LTs’ vocabulary learning.
Factors impacting word learning in LTs
One explanation for toddlers’ rapidly accelerating vocabulary size is fast-mapping, the process by which children quickly map, or initially associate, words with their referents after minimal exposure. Through such brief and incidental mappings to referents, this process initiates the encoding of new words and forms memory traces of their representations (Kucker & Seidler, 2022). LT toddlers have poorer fast-mapping ability than toddlers with typical language development (TLD) (Asadi et al., 2021; Weismer et al., 2013). LTs’ difficulties in learning words could be attributed to poorer representations formed when encoding novel words, compared with their peers (Kucker & Seidler, 2022). These representations are more easily compromised in novel word production tasks than in novel word comprehension tasks (Weismer et al., 2013).
McMurray et al. (2012) proposed a dynamic model to explain how children learn new words under referential ambiguity, that is, when multiple potential referents are present. Traditionally, constraint-based approaches were used to explain how children solve the problem of referential ambiguity, that is, determining what a novel word refers to. Constraints such as mutual exclusivity and the novel name–nameless category principle help children disambiguate reference. By contrast, McMurray et al. (2012) proposed that associative learning is sufficient to explain how children learn new words. According to their model, if children establish a strong and reliable connection between a novel label and its referent during the first encounter, they should locate the referent more quickly when they hear the label again in a similar situation with multiple referents. Thus, how readily children find the newly mapped referent can serve as an indicator of the robustness of the newly built representation.
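To make the associative account concrete, the sketch below implements a deliberately simplified cross-situational learner. It is an illustrative toy, not McMurray et al.’s (2012) dynamic associative model: association strengths simply accumulate whenever a label co-occurs with a visible object (attention is assumed to be split equally across the display), and looking at test is assumed to be proportional to those strengths. The exposures and animal names are hypothetical.

```python
"""Toy cross-situational associative learner (illustrative assumption only)."""
from collections import defaultdict


def train(exposures, rate=1.0):
    """Accumulate word-object association strengths.

    Each exposure is (word, objects_on_screen); attention is split equally
    across visible objects, so each object gains rate / len(objects).
    """
    weights = defaultdict(float)
    for word, objects in exposures:
        share = rate / len(objects)
        for obj in objects:
            weights[(word, obj)] += share
    return weights


def look_probabilities(weights, word, objects):
    """Turn association strengths into looking preferences for a test display."""
    strengths = [weights[(word, obj)] for obj in objects]
    total = sum(strengths)
    if total == 0:
        # No learned association: chance looking across the display
        return {obj: 1 / len(objects) for obj in objects}
    return {obj: s / total for obj, s in zip(objects, strengths)}


# Hypothetical exposures: the label always co-occurs with the salamander,
# while the distractors vary across encounters.
exposures = [
    ("rong2-yuan2", ["salamander", "pig", "frog", "cat"]),
    ("rong2-yuan2", ["salamander", "dog", "duck", "cow"]),
    ("rong2-yuan2", ["salamander", "pig", "dog", "bird"]),
]
w = train(exposures)
test_display = ["dog", "salamander", "frog", "cow"]
probs = look_probabilities(w, "rong2-yuan2", test_display)
print(probs)
```

Because the label consistently co-occurs with the salamander while the distractors vary, the salamander accumulates the strongest association and attracts the largest share of looking at test; this is the intuition behind treating rapid location of a newly mapped referent as an index of mapping robustness.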
Other factors closely associated with word learning are phonological working memory and speech perception. Sufficient phonological working memory is a prerequisite for encoding representations of spoken words (Baddeley et al., 1998) and is often measured by non-word repetition (Moyle et al., 2011). LTs perform more poorly on non-word repetition than toddlers with TLD (Stokes & Klee, 2009a, 2009b). Speech perception is another foundation for word learning (Chen et al., 2016; Gervain & Mehler, 2010). Mandarin-speaking toddlers must learn to encode the lexical tones of novel words effectively, because lexical tone distinguishes syllable meaning in Mandarin (Singh & Fu, 2016). For example, /ma/ with Tone 1 means ‘mother’, whereas /ma/ with Tone 3 means ‘horse’. Poor lexical-tone perception may therefore lead children to encode the phonological representations of newly learned words less accurately. In Mandarin, lexical tones are realised as pitch contours on the vowel of the syllable; for example, Tone 1 has a level contour and Tone 3 a dipping contour (Liu et al., 2007). In toddlers with TLD, the ability to perceive lexical tones and vowels predicts vocabulary acquisition (Lu & Tsao, 2014; Tsao et al., 2004). Mandarin-speaking LTs perceive lexical tones less accurately than their peers (Chen et al., 2016), and difficulties in lexical-tone perception are negatively related to their ability to learn new words (Lu & Tsao, 2014). Based on these studies, phonological working memory and lexical-tone perception might serve as predictors of novel word-referent mapping (W-R mapping), particularly among LTs learning a tonal language.
Methods for capturing the word-referent mapping process
The looking-while-listening (LWL) eye-tracking paradigm is suitable for detecting subtle real-time processes during word-learning tasks (Ellis et al., 2015; Weighall et al., 2017). Automated eye-tracking allows the LWL paradigm to include more objects while precisely monitoring various eye-movement parameters and changes in their timing. Gaze-contingent stimulus presentation is particularly valuable for younger participants, whose attention fluctuates. Fernald and Marchman (2012) used the LWL paradigm to test how efficiently LTs and toddlers with TLD processed familiar words. They found that LTs with faster reaction times (i.e., greater processing efficiency) at 18 months were more likely to show greater vocabulary gains by 30 months. Peter et al. (2019) reported a similar pattern: faster processing speed positively predicted subsequent vocabulary and syntactic growth. These studies indicate that temporal parameters in the LWL paradigm are sensitive indices of word-processing ability.
Ellis et al. (2015) made a notable contribution by using eye-tracking to explore the online process of word learning in LTs. To assess LTs’ real-time processing of novel words, Ellis and her team used three indices: target divergence time, the proportion of time spent looking at the target versus the distractor, and the latency of the first fixation. Building on the strengths of Ellis et al.’s work, our study used the same eye-tracking measures and further analysed the data with multilevel models (MLMs). This approach allowed a more detailed characterisation of dynamic changes in these measures over time and a statistical evaluation of performance differences between the LT and TLD groups.
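Two of the three indices used by Ellis et al. (2015) can be computed from a coded gaze stream, as in the minimal sketch below. The sample codes ('T' target, 'D' distractor, anything else counted as looking away), the 60 Hz sampling rate and the six-sample (about 100 ms) run used to define a first fixation are assumptions for illustration, not Ellis et al.’s parameters. Target divergence time is omitted because it requires comparing target and distractor looking across participants at each time point.

```python
SAMPLE_MS = 1000 / 60  # assumed 60 Hz tracker: ~16.67 ms per sample


def proportion_target(samples):
    """Target looking / (target + distractor looking); away samples are ignored."""
    t = samples.count("T")
    d = samples.count("D")
    return t / (t + d) if (t + d) else float("nan")


def first_target_latency_ms(samples, min_run=6):
    """Latency (ms from window onset) of the first run of >= min_run target samples.

    min_run=6 (~100 ms) is a hypothetical fixation criterion.
    Returns None if the target is never fixated.
    """
    run = 0
    for i, s in enumerate(samples):
        run = run + 1 if s == "T" else 0
        if run == min_run:
            return (i - min_run + 1) * SAMPLE_MS
    return None


# Hypothetical trial: distractor, brief look away, then a long target fixation
samples = ["D"] * 12 + ["A"] * 3 + ["T"] * 30 + ["D"] * 5
print(proportion_target(samples), first_target_latency_ms(samples))
```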
Additionally, in Ellis et al.’s (2015) study, children learned two novel word-referent pairs by watching an object move back and forth on the screen while its novel name was played seven times. After training, they heard a newly learned word and had to choose between the target picture and a distractor picture (the other newly learned object). Ellis et al.’s method revealed that LTs face challenges in new word learning, but it did not identify the stage at which these challenges arise, whether during fast-mapping or in later memory and retention phases. Our study focused on evaluating the word-referent mapping ability of LTs under conditions of referential ambiguity, concentrating specifically on the early phase of word learning.
The current study
- Is there a difference in the expressive and receptive language abilities, phonological memory, and lexical-tone perception between LT and TLD toddlers?
- Do LT toddlers show difficulties with novel word-referent mapping in word learning?
- Do expressive and receptive language abilities, phonological memory and lexical-tone perception predict novel word-referent mapping ability?
METHODS
Participants
At 27 months, a total of 72 participants were recruited, and 71 completed the data collection: 35 in the LT group and 36 in the TLD group. By 33 months, one participant in the LT group had dropped out, seven did not complete the eye-tracking task owing to fussiness, two did not pass the vocabulary checklist for the eye-tracking experiment, and data from three were unusable because of experimental procedural errors. In the TLD group, all participants returned to the lab at 33 months to complete the eye-tracking test, but only those matched at 27 months with the remaining LT participants were included in the current sample. Consequently, the sample consisted of 22 Mandarin-speaking LTs [15 boys; mean (M) = 27.60 months, SD = 2.85] and 22 Mandarin-speaking toddlers with TLD (15 boys; M = 27.70 months, SD = 2.74). All participants were monolingual Mandarin speakers. The mean age of participants did not differ significantly between groups (p = 0.912). All participants were recruited from parenting websites or local paediatric clinics in northern Taiwan. The LT group was defined as having a small expressive vocabulary [percentile rank ≤15 on the Mandarin Chinese version of the Words and Sentences form of the MacArthur-Bates Communication Development Inventory-Toddler (MCDI-T; Liu & Tsao, 2010)], whereas participants in the TLD group had a normal expressive vocabulary (percentile rank >25). Participants in both groups had normal cognitive and fine-motor abilities (evaluated using the Bayley Scales of Infant and Toddler Development-III; Bayley, 2006) and passed the Modified Checklist for Autism in Toddlers (M-CHAT; Robins et al., 1999), an autism screening instrument. Each toddler in the LT group was matched with a TLD toddler of the same sex and birth order, with an age difference within 1 month.
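The pairwise matching rule described above (same sex, same birth order, age difference within 1 month) could be implemented as a greedy procedure, sketched below. The record fields, the closest-age tie-breaking and the greedy processing order are assumptions; the text does not describe how a partner was chosen when several TLD toddlers qualified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Toddler:
    pid: str            # hypothetical participant ID
    sex: str
    birth_order: int
    age_months: float


def match_pairs(lts, tlds, max_age_diff=1.0):
    """Greedy 1:1 matching on sex and birth order with |age difference| <= max_age_diff.

    Each LT toddler takes the closest-in-age TLD toddler not yet used.
    """
    used = set()
    pairs = []
    for lt in sorted(lts, key=lambda c: c.pid):
        candidates = [
            c for c in tlds
            if c.pid not in used
            and c.sex == lt.sex
            and c.birth_order == lt.birth_order
            and abs(c.age_months - lt.age_months) <= max_age_diff
        ]
        if candidates:
            best = min(candidates, key=lambda c: abs(c.age_months - lt.age_months))
            used.add(best.pid)
            pairs.append((lt.pid, best.pid))
    return pairs


lts = [Toddler("LT01", "M", 1, 27.2), Toddler("LT02", "F", 2, 26.5)]
tlds = [
    Toddler("TD01", "M", 1, 27.9),
    Toddler("TD02", "M", 1, 27.4),
    Toddler("TD03", "F", 2, 28.0),  # same sex and birth order, but 1.5 months older
    Toddler("TD04", "F", 2, 27.1),
]
pairs = match_pairs(lts, tlds)
print(pairs)
```

Greedy matching is simple but order-dependent; an optimal assignment (e.g., minimising total age difference) could yield different pairs when candidates overlap.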
According to parental reports, all participants were born after at least 36 weeks of gestation, with birth weights >2500 g, and were free from prenatal and perinatal complications. Participants had no reports of other critical incidents, chronic diseases or sensorimotor deficits.
Table 1 presents the cognitive and fine-motor abilities of the TLD and LT participants. The LT group exhibited lower fine-motor skills than the TLD group [F(1, 42) = 5.16, p = 0.028], whereas the groups did not differ significantly in cognitive abilities (p = 0.336). Additionally, maternal educational level was higher in the TLD group than in the LT group [F(1, 42) = 5.59, p = 0.023]. Vocabulary and syntactic skills, based on parental reports, were also lower in the LT group than in the TLD group [Fs(1, 42) = 141.03 and 94.33, respectively; ps < 0.001; ηp²s = 0.77 and 0.69].
Table 1. Participant characteristics and performance at 27 months

| | TLD (n = 22) M | TLD SD | LT (n = 22) M | LT SD | p value |
|---|---|---|---|---|---|
| Demographic characteristics | | | | | |
| Age (months) | 27.68 | 2.76 | 27.60 | 2.85 | 0.933 |
| Maternal education level (years) | 16.73 | 1.45 | 15.45 | 2.06 | 0.023 |
| Performance on tests and tasks | | | | | |
| Cognition (SS)ᵇ | 11.68 | 1.99 | 11.00 | 2.62 | 0.336 |
| Receptive language (SS)ᵇ | 11.64 | 1.59 | 9.55 | 1.99 | 0.015 |
| Expressive language (SS)ᵇ | 9.64 | 1.14 | 6.00 | 1.07 | <0.001 |
| Fine-motor (SS)ᵇ | 11.64 | 1.76 | 10.59 | 1.53 | 0.028 |
| Vocabulary size (z-score)ᵃ | 0.58 | 0.71 | −1.55 | 0.46 | <0.001 |
| Syntax (z-score)ᵃ | 0.59 | 0.78 | −1.53 | 0.66 | <0.001 |
| Lexical-tone perception (CR)ᶜ | 0.86 | 0.10 | 0.72 | 0.11 | <0.001 |
| Phonological working memory (CR)ᶜ | 0.88 | 0.14 | 0.45 | 0.23 | <0.001 |
- Note: p-values were obtained from ANOVAs or from ANCOVAs with participants’ fine-motor ability and maternal educational level as covariates. Data are presented as M and SD.
- Abbreviations: ANCOVA, analysis of covariance; ANOVA, analysis of variance; LT, late-talking; MCDI-T, MacArthur-Bates Communication Development Inventory-Toddler; TLD, typical language development.
- ᵃ z-scores were derived from the MCDI-T norms.
- ᵇ SS: scaled score, derived from the Bayley-III norms.
- ᶜ CR: correct response rate.
Measures at 27 months
M-CHAT
We screened for autism using the Modified Checklist for Autism in Toddlers (Robins et al., 1999), which has been used to assess toddlers aged 16–30 months.
MCDI-T
We assessed participants’ vocabulary production using the Words and Sentences form of the MCDI-T (Liu & Tsao, 2010), which has been used to assess toddlers aged 16–36 months.
Bayley scales of infant development (Bayley-III)
The language, cognitive and fine-motor subscales of the Bayley-III (Bayley, 2006), which is used to assess toddlers aged 16–42 months, were employed to assess participants’ receptive and expressive language, cognitive and fine-motor abilities. The assessments were administered in Mandarin Chinese by a licensed clinical psychologist. The Bayley-III receptive and expressive language, cognitive and fine-motor scales were translated into Mandarin from the English version, and the translation of the receptive and expressive scales used in this study was further adapted to the characteristics of Mandarin and Chinese culture. Although Bayley-III norms have not been established for children in Taiwan, a previous study demonstrated that the Bayley-III is suitable for Taiwanese children (Yu et al., 2013); specifically, its intra- and inter-rater reliabilities were good to excellent in this population. Many studies in Taiwan have also used scores based on the English-version norms for analysis (Lin et al., 2020, 2021). We derived scaled scores from the norms of the Bayley-III English version and used these scores in subsequent analyses.
Non-word repetition task
Participants were asked to repeat pseudo-words to assess their phonological working memory. The pseudo-word list comprised 20 words of one to four syllables, with syllable structures conforming to Mandarin phonotactic rules. The correct response rate was calculated for each participant. The detailed experimental materials have been published (Lu & Tsao, 2020).
Lexical-tone perception task
Minimal pairs of monosyllabic words differing in lexical tone were used to test participants’ ability to discriminate lexical tones. Because Mandarin has four lexical tones, 12 pairs were used, with each tone paired with each of the other three tones (e.g., Tone 1 paired with Tone 2, Tone 3 and Tone 4). The correct response rate for each tone was calculated for each participant. The detailed experimental materials have been published (Lu & Tsao, 2014).
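Read literally, “each tone with three pairs” yields 4 × 3 = 12 ordered tone pairings. The sketch below enumerates them and computes a per-tone correct response rate. Attributing each pair to its first tone, and the (pair, correct) response format, are assumptions made for illustration; the published materials (Lu & Tsao, 2014) define the actual scoring.

```python
from itertools import permutations

TONES = (1, 2, 3, 4)
# Every ordered pairing of a tone with a different tone: 4 x 3 = 12 pairs
TONE_PAIRS = list(permutations(TONES, 2))


def per_tone_correct_rate(responses):
    """responses: iterable of ((tone_a, tone_b), correct: bool).

    Assumption: each pair counts toward its first tone, so every tone
    contributes three pairs.
    """
    totals = {t: 0 for t in TONES}
    hits = {t: 0 for t in TONES}
    for (a, _b), correct in responses:
        totals[a] += 1
        hits[a] += int(correct)
    return {t: hits[t] / totals[t] for t in TONES if totals[t]}


# Hypothetical child who misses Tone 3 contrasts except against Tone 4
responses = [(pair, pair[0] != 3 or pair[1] == 4) for pair in TONE_PAIRS]
rates = per_tone_correct_rate(responses)
print(len(TONE_PAIRS), rates)
```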
Experimental stimuli
The stimuli comprised static images of four animals presented on a white background. The animal images were bright and colourful [mean width: 410.8 pixels (SD = 25.3 pixels); mean height: 349.2 pixels (SD = 11.4 pixels)], and each image was placed at the centre of one of the four screen quadrants. The four images were similar in size so that, in the absence of semantic priming, participants were equally likely to look at each of them.
The word-learning task comprised six trials: two control trials using familiar words and four experimental trials using novel words. Each trial presented four animal pictures: one target picture and three distractors. Considering the potential right visual field preference in LTs (Ellis et al., 2015), we used a multiple-object display (i.e., four images) to reduce the influence of participants’ preference for a particular screen location on the interpretation of the word-learning process. In the familiar-word trials, the target words were names of familiar animals, such as ‘pig’ (/xiao3-zhu1/) and ‘frog’ (/qing1-wa1/). In the novel-word trials, the target words were unfamiliar animal names, such as ‘rong2-yuan2’ (salamander) and ‘bian3-fu2’ (bat), or nonwords, such as ‘zi3-xie1’ and ‘chuang2-jia3’. All target words were disyllabic. Eighteen well-known animals familiar to children aged 2 to 4 years were selected as distractors. The positions of the novel animals on the screen were counterbalanced across trials. Two carrier phrases (XX zai4-na3-li3? ‘Where is the XX?’ and XX bu2-jian4-le ‘XX disappeared’) and the six target words were audio recorded and then concatenated into the final utterances. The auditory stimuli were recorded by a native Mandarin-speaking female with a digital recorder, using a mono channel at a sampling rate of 44 kHz. The mean duration of the familiar and novel words was 1157 ms (SD = 19 ms), and each utterance lasted 2 to 3 s.
Apparatus
Eye-tracking data were collected using a Tobii T60 (Tobii Technology, Stockholm, Sweden) with Tobii Studio 3.0.0.128 software at a sampling rate of 60 Hz. The visual stimuli were presented on a 17-inch monitor with a screen resolution of 1280 × 768 pixels, and the auditory stimuli were played through loudspeakers (BOSE 240 V AP) at 70 dBA.
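Given the reported display resolution and mean image size, the sketch below derives one rectangular AOI per quadrant, assuming each AOI is the image’s bounding box centred in its quadrant (the exact AOI shapes drawn in Tobii Studio are not reported). It also shows a simple point-in-AOI test of the kind used to assign gaze samples to the target.

```python
SCREEN_W, SCREEN_H = 1280, 768   # reported display resolution (pixels)
IMG_W, IMG_H = 410.8, 349.2      # reported mean image size (pixels)

QUADRANTS = {"top_left": (0, 0), "top_right": (1, 0),
             "bottom_left": (0, 1), "bottom_right": (1, 1)}


def quadrant_aois(screen_w=SCREEN_W, screen_h=SCREEN_H, img_w=IMG_W, img_h=IMG_H):
    """Return {quadrant: (left, top, right, bottom)} for an image centred in each quadrant.

    Assumption: each AOI equals the image's bounding box.
    """
    qw, qh = screen_w / 2, screen_h / 2
    aois = {}
    for name, (col, row) in QUADRANTS.items():
        cx, cy = (col + 0.5) * qw, (row + 0.5) * qh
        aois[name] = (cx - img_w / 2, cy - img_h / 2, cx + img_w / 2, cy + img_h / 2)
    return aois


def hit(aois, x, y):
    """Name of the AOI containing gaze point (x, y), or None if outside all AOIs."""
    for name, (left, top, right, bottom) in aois.items():
        if left <= x <= right and top <= y <= bottom:
            return name
    return None


aois = quadrant_aois()
print(aois["top_left"], hit(aois, 320, 192), hit(aois, 640, 384))
```

Under these assumptions the images leave about 115 px of horizontal and 17 px of vertical margin within each quadrant, so neighbouring AOIs never overlap and a gaze sample at the screen centre falls in no AOI.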
Procedures
The experimental procedure of this study is illustrated in Figure 1. Participants visited the laboratory at two time points. Basic information and baseline data were collected at Time 1. Given toddlers’ short attention spans, data collection at Time 1 was divided into two laboratory visits. During the first visit, participants were administered the Bayley-III, and their parents completed the M-CHAT and MCDI-T and provided demographic information. During the second visit, tasks testing early language processing abilities were administered. Data were collected from participants and their parents in a quiet room. Each laboratory visit lasted approximately 1–1.5 h.

At Time 2, 6 months after Time 1, participants visited the laboratory again for the eye-tracking experiment. During the experiment, each child sat on a parent’s lap in a quiet room, 60 cm from the screen. Caregivers wore headphones playing masking music and were instructed not to look at the screen, to prevent them from influencing the child’s responses. At the beginning of the experiment, a five-point infant calibration in Tobii Studio was used, and participants watched video clips for approximately 5 min. Before the eye-tracking experiment began, parents also completed a vocabulary checklist to confirm that their child was familiar with the known animals and could not name the novel animals.
Each experimental trial comprised three phases (Figure 2A): a showing phase for attention-getting, a W-R mapping phase for mapping the target word onto its referent, and a word-identification test (W-I test) phase for testing the W-R mapping outcome. During the showing phase (0–18000 ms from trial onset), four animals were displayed simultaneously on the screen for the first 5 s. To engage the toddler’s attention, the four animals then took turns disappearing for 1 s, at 5000, 9000, 13000 and 17000 ms, respectively. The order of disappearance was pseudorandomised irrespective of whether an item was the target or a distractor. Because this part of the phase lasted from 5000 to 18000 ms (i.e., 13 s) after trial onset, each of the four items in a trial was visible for 12 s and hidden for 1 s. The four animals then reappeared on the screen to start the W-R mapping phase (18000–32000 ms from trial onset), and the children listened to the utterance ‘XX. XX, zai4-na3-li3?’ (XX. XX, where is it?) while looking at the four animals. In the last part of the W-R mapping phase (28000–32000 ms from trial onset), children heard the utterance ‘XX, bu2 jian4 le’ (XX, disappeared) paired with the target animal, which then disappeared gradually within the last 4 s. Next, during the W-I test phase (32000–38000 ms from trial onset), the four animals reappeared on the screen at different positions, and the children heard the phrase with the new word, ‘XX, zai4-na3-li3?’ (XX, where is it?), prompting them to look at the image corresponding to the newly learned word. At the end of the trial, the children were shown cartoon pictures as reinforcers (38000–43000 ms from trial onset).
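The trial timeline above can be encoded as a small lookup table, which is a convenient way to check the phase boundaries and the statement that each animal is visible for 12 s during the disappearing sequence. The sketch assumes half-open phase intervals and exactly one 1-s disappearance per animal; the item names are hypothetical.

```python
# Phase boundaries in ms from trial onset, as described above (half-open intervals)
PHASES = [
    ("showing", 0, 18000),
    ("W-R mapping", 18000, 32000),
    ("W-I test", 32000, 38000),
    ("reinforcer", 38000, 43000),
]
DISAPPEAR_ONSETS = [5000, 9000, 13000, 17000]  # one animal hidden at each onset
HIDE_MS = 1000


def phase_at(t_ms):
    """Name of the phase containing time t_ms, or None after the trial ends."""
    for name, start, end in PHASES:
        if start <= t_ms < end:
            return name
    return None


def visible_ms_per_item(order, window=(5000, 18000)):
    """Visible time per item in the disappearing segment, given the hiding order."""
    start, end = window
    visible = {item: end - start for item in order}
    for item, _onset in zip(order, DISAPPEAR_ONSETS):
        visible[item] -= HIDE_MS
    return visible


print(phase_at(20000), visible_ms_per_item(["bat", "pig", "frog", "cat"]))
```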

To familiarise toddlers with the apparatus and experimental procedures, a familiarisation trial was conducted before the six experimental trials. This trial used fruit words typically known to 2-year-old Mandarin-speaking toddlers, with corresponding images. Before the familiarisation trial, the fruit words were checked with the parents to ensure that the child knew them. Like the experimental trials, the familiarisation trial comprised three phases: the showing phase, the W-R mapping phase and the W-I test phase. The verbal prompts used throughout this trial mirrored those of the experimental trials, including ‘XX, XX, zai4-na3-li3?’ (XX, XX, where is it?), ‘XX, bu2 jian4 le’ (XX disappeared) and ‘XX, zai4-na3-li3?’ (XX, where is it?). The eye-tracking trajectories of one participant in a novel-word trial are shown in Figure 2B.
The Research Ethics Committee of National Taiwan University approved this study, and the parents of all participants provided informed, written consent before participating.
Eye movement metrics and data analysis
Eye movement metrics
The first 5 s of the showing, W-R mapping and W-I test phases were segmented as epochs. Fixation data were captured for predefined areas of interest (AOIs); in the word-learning task, the AOI was the target animal image in each trial. Fixations were identified automatically using Tobii Studio’s default fixation filter setting (the velocity-threshold identification [I-VT] filter). First, the raw eye-tracking data points for fixations in each phase were extracted using Tobii Studio. Three time windows of approximately 2850 ms each, comprising 171 data points (i.e., 16.67 ms per point), were then segmented: the first from the showing phase (500–3350 ms from trial onset), the second from the W-R mapping phase (18500–21350 ms from trial onset) and the third from the W-I test phase (32500–35350 ms from trial onset). The visual stimuli in the first time window (showing phase) were presented without an auditory label, whereas those in the other two time windows (W-R mapping and W-I test phases) were presented with exactly the same auditory label [i.e., XX, zai4-na3-li3 (XX, where is it?)]. Therefore, the W-R mapping and W-I test windows could be compared directly without auditory confounds. We then analysed changes in the trajectories of looking at the AOIs that matched the heard labels, as well as the total fixation time on the AOIs in each time window.
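At 60 Hz, each sample lasts 1000/60 ≈ 16.67 ms, so a 2850 ms window holds 171 samples. The sketch below extracts one window and summarises it. The (timestamp, on-target) sample format is hypothetical, and total target time is approximated as on-target samples × sample duration, whereas Tobii Studio sums the durations of I-VT fixations.

```python
SAMPLE_MS = 1000 / 60  # 60 Hz sampling: ~16.67 ms per sample
WINDOWS = {  # ms from trial onset, as listed above
    "showing": (500, 3350),
    "W-R mapping": (18500, 21350),
    "W-I test": (32500, 35350),
}


def n_samples(start_ms, end_ms):
    """Number of samples a window of this length holds at 60 Hz."""
    return round((end_ms - start_ms) / SAMPLE_MS)


def window_samples(gaze, start_ms, end_ms):
    """gaze: list of (timestamp_ms, on_target: bool); keep samples in [start, end)."""
    return [on for t, on in gaze if start_ms <= t < end_ms]


def target_summary(on_target):
    """Proportion of samples on target and approximate total target time (ms)."""
    hits = sum(on_target)
    return hits / len(on_target), hits * SAMPLE_MS


# Hypothetical W-I test window: the child reaches the target after 60 samples
gaze = [(32500 + i * SAMPLE_MS, i >= 60) for i in range(171)]
on = window_samples(gaze, *WINDOWS["W-I test"])
prop, total_ms = target_summary(on)
print(n_samples(*WINDOWS["showing"]), len(on), round(prop, 3), round(total_ms, 1))
```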
Data analysis
To answer the first research question, “Is there a difference in the expressive and receptive language abilities, phonological memory, and lexical-tone perception between LT and TLD toddlers?”, we compared the LT and TLD groups on receptive and expressive language abilities, lexical-tone perception and phonological working memory using analyses of covariance (ANCOVAs), with participants’ fine-motor ability and maternal educational level as covariates.
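An ANCOVA tests the group effect as the drop in residual sum of squares when the group term is added to a model that already contains the covariates. With 44 toddlers and four parameters (intercept, fine-motor ability, maternal education, group), the error degrees of freedom are 44 − 4 = 40, matching the F(1, 40) statistics in the Results. A minimal NumPy sketch of this nested-model comparison, using simulated data rather than the study’s data:

```python
import numpy as np


def ancova_group_f(y, group, covariates):
    """F test for a two-level group effect adjusted for covariates (nested OLS models).

    y: (n,) outcome; group: (n,) coded 0 = TLD, 1 = LT; covariates: (n, k).
    Returns (F, df1, df2, partial_eta_squared).
    """
    y = np.asarray(y, float)
    n = y.size
    cov = np.asarray(covariates, float).reshape(n, -1)
    X_reduced = np.column_stack([np.ones(n), cov])
    X_full = np.column_stack([X_reduced, np.asarray(group, float)])

    def rss(X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return float(resid @ resid)

    rss_r, rss_f = rss(X_reduced), rss(X_full)
    ss_effect = rss_r - rss_f
    df1, df2 = 1, n - X_full.shape[1]
    F = (ss_effect / df1) / (rss_f / df2)
    return F, df1, df2, ss_effect / (ss_effect + rss_f)


# Simulated example: 22 TLD + 22 LT toddlers, two stand-in covariates
rng = np.random.default_rng(0)
n = 44
group = np.repeat([0, 1], n // 2)
covs = rng.normal(size=(n, 2))  # stand-ins for fine-motor ability and maternal education
y = 10 - 2 * group + 0.5 * covs[:, 0] + rng.normal(size=n)
F, df1, df2, eta = ancova_group_f(y, group, covs)
print(f"F({df1}, {df2}) = {F:.2f}, partial eta^2 = {eta:.2f}")
```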
For the second question, “Do LT toddlers show difficulties with novel W-R mapping in word learning?”, we compared the TLD and LT groups on an eye-tracking metric reflecting preference for the target image, namely toddlers’ fixations on the target image over time. MLMs for change were used to estimate intra-individual change over time (i.e., whether the target was fixated at each of multiple time points; Level 1) as well as inter-individual variability (i.e., individual characteristics; Level 2). The trajectories for familiar and novel words were examined separately in each of the showing, W-R mapping and W-I test phases. This approach is similar to growth curve analysis (Nayar et al., 2022) but is estimated within the MLM framework. Specifically, orthogonal polynomial terms, each representing a different pattern of looking, were added at Level 1: (1) the linear time term reflected a linear increase in fixations on the target images over time; and (2) the quadratic time term reflected the dynamics of fixating on the AOIs, such as switching from the target images to non-target images. The Level 2 model included group (TLD, LT; reference = TLD) on the intercept, linear time and quadratic time, to test whether the linear and quadratic rates of change in fixations on the target differed between the late-talking and TLD groups, and included participants’ fine-motor ability and maternal educational level to adjust for confounders. Together, these analyses characterised the dynamic patterns of looking and delineated how the groups diverged from one another. In addition, the total fixation time in the showing, W-R mapping and W-I test time windows was compared between the LT and TLD groups.
If the TLD group noticed and fixated on the target AOI sooner, their total fixation time should be significantly longer than that of the LT group.
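The Level 1 and Level 2 models described above can be written roughly as follows. The notation is ours and is a sketch, not the authors’ exact specification: η_ti is the model’s linear predictor for whether toddler i fixates the target at sample t (for a binary outcome this would usually be a logit, which the text does not state), P₁ and P₂ are the orthogonal linear and quadratic time codes, LT_i is 1 for late-talkers and 0 for the TLD reference group, and FM_i and ME_i are fine-motor ability and maternal education. Placing the covariates on the intercept only, and including random slopes, are assumptions.

```latex
\begin{aligned}
\text{Level 1:}\quad \eta_{ti} &= \pi_{0i} + \pi_{1i}\,P_1(t) + \pi_{2i}\,P_2(t) \\
\text{Level 2:}\quad \pi_{0i} &= \beta_{00} + \beta_{01}\,\mathrm{LT}_i + \beta_{02}\,\mathrm{FM}_i + \beta_{03}\,\mathrm{ME}_i + r_{0i} \\
\pi_{1i} &= \beta_{10} + \beta_{11}\,\mathrm{LT}_i + r_{1i} \\
\pi_{2i} &= \beta_{20} + \beta_{21}\,\mathrm{LT}_i + r_{2i}
\end{aligned}
```

In this notation, β₁₀ and β₂₀ are the TLD group’s linear and quadratic time effects, and β₁₁ and β₂₁ are the LT-versus-TLD differences in the linear and quadratic rates of change.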
Finally, to answer the question, “Do expressive and receptive language abilities, phonological memory, and lexical-tone perception predict novel word-referent mapping ability?”, MLMs were again used to estimate fixations on the AOI over time in the novel-word W-I test phase. The models included intra-individual change in fixations on the target image over the time window (Level 1) and inter-individual variability (Level 2) in receptive and expressive language abilities, lexical-tone perception, phonological working memory and the covariates (fine-motor ability and maternal educational level). Additionally, we performed a hierarchical multiple regression analysis to investigate predictors of the total fixation time on the novel images during 500–3350 ms of the W-I test phase (i.e., 32500–35350 ms from trial onset). The regression model included seven predictors entered in two steps: Step 1 included maternal education and participants’ cognitive and fine-motor abilities, and Step 2 added receptive and expressive language abilities, lexical-tone perception and phonological working memory. We report the regression statistics (β and p) for the final model (i.e., with all predictors entered).
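In a hierarchical regression, the contribution of the Step 2 block is judged by the change in R² and its F test. With n = 44 and seven predictors, the F-change test for the four Step 2 predictors has (4, 44 − 7 − 1) = (4, 36) degrees of freedom. The NumPy sketch below uses simulated stand-in predictors, not the study’s data, and fits unstandardised models (standardised β would require z-scoring the variables first).

```python
import numpy as np


def r_squared(X, y):
    """R² of an OLS fit of y on X plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def hierarchical_step(X_step1, X_step2, y):
    """R² at each step, R² change and F-change when Step 2 predictors are added."""
    y = np.asarray(y, float)
    n = y.size
    X1 = np.asarray(X_step1, float)
    X2 = np.column_stack([X1, np.asarray(X_step2, float)])
    r2_1, r2_2 = r_squared(X1, y), r_squared(X2, y)
    k_added = X2.shape[1] - X1.shape[1]
    df_resid = n - X2.shape[1] - 1
    f_change = ((r2_2 - r2_1) / k_added) / ((1 - r2_2) / df_resid)
    return r2_1, r2_2, r2_2 - r2_1, f_change, (k_added, df_resid)


rng = np.random.default_rng(1)
n = 44
step1 = rng.normal(size=(n, 3))  # stand-ins: maternal education, cognition, fine motor
step2 = rng.normal(size=(n, 4))  # stand-ins: receptive, expressive, tone perception, PWM
y = 400 + 80 * step2[:, 1] + rng.normal(scale=100, size=n)  # hypothetical fixation time (ms)
r2_1, r2_2, delta_r2, f_change, dfs = hierarchical_step(step1, step2, y)
print(f"R2: {r2_1:.3f} -> {r2_2:.3f}, F-change{dfs} = {f_change:.2f}")
```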
The HLM software package, version 7.03 (Scientific Software International Inc., Skokie, IL) (Raudenbush et al., 2016), was used to fit the MLMs using restricted maximum likelihood estimation. For the MLMs, results from the population-averaged model with robust standard errors are reported. Other statistical analyses were performed using IBM SPSS Statistics for Windows, Version 25.0.0 (IBM Corp.), with the significance level (α) set at 0.05.
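In HLM, Level 1 predictors such as time codes are typically supplied as variables in the data file. One common way to construct orthogonal linear and quadratic codes for the 171 samples in each window is to orthonormalise a polynomial basis, sketched below with NumPy. This is an assumption about preprocessing, not a description of the authors’ pipeline; rescaling the codes changes the scale of the estimated β coefficients.

```python
import numpy as np


def orthogonal_poly(n_points, degree=2):
    """Orthonormal polynomial time codes (columns: linear, quadratic, ...).

    Similar in spirit to R's poly(): QR-decompose the Vandermonde matrix
    [1, t, t^2, ...] and drop the constant column.
    """
    t = np.arange(n_points, dtype=float)
    V = np.vander(t, degree + 1, increasing=True)  # columns: 1, t, t^2
    Q, R = np.linalg.qr(V)
    Q = Q * np.sign(np.diag(R))  # fix signs: linear term rises, quadratic is U-shaped
    return Q[:, 1:]


P = orthogonal_poly(171)  # one column per time term for a 171-sample window
linear, quadratic = P[:, 0], P[:, 1]
print(P.shape, round(float(linear @ quadratic), 12))
```

Because the columns are orthogonal to each other and to the intercept, the linear and quadratic estimates are not confounded by their shared dependence on time.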
RESULTS
Is there a difference in the expressive and receptive language abilities, phonological memory and lexical-tone perception between the LT and TLD toddlers?
Table 1 presents participants’ expressive and receptive language abilities, phonological working memory and lexical-tone perception at Time 1. After controlling for maternal educational level and fine-motor ability, the LT group exhibited lower receptive [F(1, 40) = 6.53, p = 0.015, ηp² = 0.14] and expressive language abilities [F(1, 40) = 89.26, p < 0.001, ηp² = 0.69] than the TLD group. The ANCOVAs also showed that the LT group perceived lexical tones less accurately [F(1, 40) = 15.77, p < 0.001, ηp² = 0.28] and had poorer phonological working memory [F(1, 40) = 43.48, p < 0.001, ηp² = 0.52] than the TLD group.
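Partial eta squared can be recovered from an F statistic and its degrees of freedom as ηp² = F·df₁ / (F·df₁ + df₂). The short check below applies this identity to the four ANCOVA results above and reproduces each reported value to two decimals.

```python
def partial_eta_squared(F, df_effect, df_error):
    """Partial eta squared from an F statistic: F*df1 / (F*df1 + df2)."""
    return F * df_effect / (F * df_effect + df_error)


reported = {  # (F, df1, df2, reported partial eta squared) from the text above
    "receptive language": (6.53, 1, 40, 0.14),
    "expressive language": (89.26, 1, 40, 0.69),
    "lexical-tone perception": (15.77, 1, 40, 0.28),
    "phonological working memory": (43.48, 1, 40, 0.52),
}
for name, (F, d1, d2, eta) in reported.items():
    print(f"{name}: recomputed {partial_eta_squared(F, d1, d2):.2f}, reported {eta:.2f}")
```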
Do LT toddlers show difficulties with novel W-R mapping in word learning?
To examine differences between the TLD and LT groups in the change in fixations on the target image during the showing, W-R mapping and W-I test phases, MLMs for change were estimated. During the familiar and novel showing phases (i.e., without an auditory label), neither the linear nor the quadratic time term was significant (familiar: ps = 0.638 and 0.817; novel: ps = 0.190 and 0.927), and the groups did not differ in their linear or quadratic rates of change (familiar: ps = 0.147 and 0.186; novel: ps = 0.470 and 0.785). As shown in Figure 3A,B, participants’ fixations on the target image did not change over time.

During the familiar and novel W-R mapping phases (i.e., with an auditory label), the linear time term was significant (familiar: β = 0.001916, 95% confidence interval [CI] = 0.000176–0.003656, p = 0.037; novel: β = 0.001622, 95% CI = 0.000076–0.003168, p = 0.046); however, the groups did not differ in their estimated intercepts or in their linear and quadratic rates of change (familiar: ps = 0.089, 0.080 and 0.574; novel: ps = 0.127, 0.066 and 0.313). As shown in Figure 3C,D, participants’ fixations to the target image increased over time, but this change did not differ between the TLD and LT groups. During the familiar W-I test phase (i.e., with an auditory label), both the linear time term (β = 0.004124, 95% CI = 0.002048–0.006200, p < 0.001) and the quadratic time term (β = −0.000023, 95% CI = −0.000035 to −0.000011, p < 0.001) were significant; however, the groups did not differ in their estimated intercepts or in their linear and quadratic rates of change (ps = 0.522, 0.579 and 0.517). As shown in Figure 3E, participants’ fixations to the target image increased over time, but this change did not differ between the TLD and LT groups.
During the novel W-I test phase, both the linear time term (β = 0.004791, 95% CI = 0.002986–0.006596, p < 0.001) and the quadratic time term (β = −0.000024, 95% CI = −0.000034 to −0.000014, p < 0.001) were significant. Fixations to the target at trial onset did not differ between the TLD and LT groups, as indicated by their estimated intercepts (p = 0.715); however, the groups differed in their linear (β = −0.003154, 95% CI = −0.005465 to −0.000843, p = 0.011) and quadratic (β = 0.000018, 95% CI = 0.000004–0.000032, p = 0.007) rates of change. These differences remained after controlling for maternal educational level and children’s fine-motor ability (linear: adjusted β = −0.003154, 95% CI = −0.005464 to −0.000843, p = 0.011; quadratic: adjusted β = 0.000018, 95% CI = 0.000004–0.000031, p = 0.007). As shown in Figure 3F, fixations on the target increased more in the TLD group than in the LT group. We further compared total fixation time on the novel target image in the W-I test phase (500–3350 ms) between the LT group (M = 383.03 ms, SD = 285.16 ms) and the TLD group (M = 603.15 ms, SD = 435.52 ms). With maternal educational level and fine-motor ability as covariates, total fixation time was longer in the TLD group than in the LT group [F(1, 40) = 4.91, p = 0.033, ηp² = 0.11]; that is, the TLD group looked at the novel target image longer than the LT group did.
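To make the group difference in rates of change concrete, the sketch below turns the reported unadjusted fixed effects for the novel W-I test phase into model-implied trajectories. It rests on assumptions not stated in the text: group is coded 0 = TLD (reference) and 1 = LT, so the "linear time" and "quadratic time" estimates describe the TLD curve and the group terms are LT-minus-TLD differences; time is in whatever (unstated) units the model used. Because the intercepts did not differ (p = 0.715), the sketch compares change from the intercept only. It is an illustration, not a re-analysis.

```python
# Fixed effects reported for the novel W-I test phase (unadjusted model).
# ASSUMPTION: group coded 0 = TLD (reference), 1 = LT; time units as in the model.
LINEAR_TIME = 0.004791
QUADRATIC_TIME = -0.000024
GROUP_X_LINEAR = -0.003154
GROUP_X_QUADRATIC = 0.000018


def group_coefficients(is_lt: bool) -> tuple[float, float]:
    """Return the (linear, quadratic) time coefficients implied for one group."""
    group = 1 if is_lt else 0
    return (LINEAR_TIME + group * GROUP_X_LINEAR,
            QUADRATIC_TIME + group * GROUP_X_QUADRATIC)


def change_from_intercept(t: float, is_lt: bool) -> float:
    """Model-implied change in target fixation at time t relative to the intercept."""
    linear, quadratic = group_coefficients(is_lt)
    return linear * t + quadratic * t ** 2


for t in (10, 25, 50):
    tld = change_from_intercept(t, is_lt=False)
    lt = change_from_intercept(t, is_lt=True)
    print(f"t={t}: TLD {tld:+.4f}, LT {lt:+.4f}, difference {tld - lt:+.4f}")
```

Under this coding the LT curve has a linear term of about 0.0016 against about 0.0048 for TLD, with a shallower quadratic bend, which is one way to read "a slower increase in fixations" over the early part of the window.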
Do expressive and receptive language abilities, phonological memory and lexical-tone perception predict novel W-R mapping ability?
An MLM was used to estimate how the change in fixation toward the target image during the novel W-I test phase was associated with expressive and receptive language abilities, phonological working memory and lexical-tone perception. Toddlers with better expressive language ability at 27 months noticed and looked at the target more quickly 6 months later, when retrieving the novel words (β = 0.000294, 95% CI = 0.000016–0.000572, p = 0.045). This association remained after controlling for maternal educational level and cognitive and fine-motor abilities (adjusted β = 0.000306, 95% CI = 0.000020–0.000592, p = 0.043).
Table 2 presents the bivariate correlations among expressive and receptive language abilities, phonological working memory, lexical-tone perception and total fixation time on the novel image during 500–3350 ms from the onset of the W-I test phase (i.e., 32500–35350 ms from trial onset). Higher expressive language scores were associated with longer total fixation time on the novel image during the W-I test phase (r = 0.36, p = 0.016). Better lexical-tone perception (r = 0.49, p = 0.001) and phonological working memory (r = 0.78, p < 0.001) were associated with better expressive language ability.
TABLE 2 Bivariate correlations among study variables.
Variable | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
---|---|---|---|---|---|---|---|
1. Total fixation time, novel W-I test (500–3350 ms) | 1.00 | | | | | | |
2. Receptive language | 0.04 | 1.00 | | | | | |
3. Expressive language | 0.36* | 0.50** | 1.00 | | | | |
4. Phonological working memory | 0.23 | 0.25 | 0.78** | 1.00 | | | |
5. Lexical-tone perception | 0.10 | 0.43** | 0.49** | 0.36* | 1.00 | | |
6. Maternal educational level | 0.07 | 0.39** | 0.31* | 0.21 | 0.11 | 1.00 | |
7. Cognition | 0.02 | 0.75** | 0.22 | −0.07 | 0.09 | 0.32* | 1.00 |
8. Fine-motor ability | −0.09 | 0.53** | 0.38* | 0.25 | 0.36* | 0.24 | 0.47** |
- Abbreviation: W-I test, word-identification test phase.
- *p < 0.05, **p < 0.01.
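The significance flags in Table 2 follow from testing each correlation against zero with a t statistic on n − 2 degrees of freedom. The sketch below assumes that all 44 toddlers (22 LT and 22 TLD) contributed to each correlation, which the table does not state explicitly. Under that assumption, r = 0.36 gives t(42) ≈ 2.50, which corresponds to a two-tailed p of roughly 0.016, in line with the reported value; converting t to p needs a t-distribution tail function (for example SciPy's `scipy.stats.t.sf`), which is left out here to keep the example dependency-free.

```python
import math


def correlation_t(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


N = 44  # ASSUMPTION: every toddler contributes to every correlation
for label, r in [("fixation time x expressive language", 0.36),
                 ("lexical-tone perception x expressive language", 0.49),
                 ("phonological working memory x expressive language", 0.78)]:
    print(f"{label}: r = {r:.2f}, t({N - 2}) = {correlation_t(r, N):.2f}")
```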
A hierarchical multiple regression analysis assessed the predictors of total fixation time during 500–3350 ms from the onset of the novel W-I test phase. Maternal educational level and participants’ cognitive and fine-motor abilities, entered at Step 1, did not contribute significantly to the prediction of total fixation time (ps = 0.573, 0.988 and 0.537). At Step 2, which added receptive and expressive language, phonological working memory and lexical-tone perception at 27 months, receptive language, phonological working memory and lexical-tone perception did not contribute significantly to the prediction (ps = 0.616, 0.484 and 0.900), whereas expressive language ability was a significant predictor of total fixation time during the novel W-I test phase (adjusted β = 0.66, p = 0.025).
A second hierarchical multiple regression analysis assessed the predictors of expressive language ability. Maternal educational level and participants’ cognitive and fine-motor abilities, entered at Step 1, contributed significantly to the prediction of expressive language ability (adjusted R² = 0.13, p = 0.036). Adding phonological working memory and lexical-tone perception at Step 2 significantly improved the prediction (adjusted R² = 0.69, p < 0.001). In the full Step 2 model, greater cognitive ability (adjusted β = 0.23, p = 0.030), lexical-tone perception (adjusted β = 0.22, p = 0.030) and phonological working memory (adjusted β = 0.70, p < 0.001) were associated with greater expressive language ability.
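The step-wise logic of a hierarchical regression is usually tested with an F test on the increase in R² from one step to the next, and adjusted R² penalises R² for the number of predictors. The sketch below shows both standard formulas. The R² values in the example are hypothetical (the text reports only adjusted R²), and n = 44 with three Step 1 covariates and two Step 2 additions is an assumption chosen to mirror the expressive-language model.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for a model with k predictors fitted to n cases."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def r2_change_f(r2_reduced: float, r2_full: float,
                k_reduced: int, k_full: int, n: int) -> tuple[float, int, int]:
    """F test for the R^2 increment from the reduced (Step 1) to the full (Step 2) model."""
    df_num = k_full - k_reduced
    df_den = n - k_full - 1
    f_value = ((r2_full - r2_reduced) / df_num) / ((1 - r2_full) / df_den)
    return f_value, df_num, df_den


# HYPOTHETICAL R^2 values for illustration only: Step 1 = 3 covariates,
# Step 2 adds phonological working memory and lexical-tone perception.
f_value, df_num, df_den = r2_change_f(0.20, 0.50, k_reduced=3, k_full=5, n=44)
print(f"F({df_num}, {df_den}) = {f_value:.2f}; "
      f"adjusted R^2 at Step 2 = {adjusted_r2(0.50, 44, 5):.2f}")
```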
DISCUSSION
Three main findings emerged from this study. First, the expressive and receptive language abilities of 2-year-old LTs fell behind those of toddlers with TLD, consistent with previous research (Henrichs et al., 2011), and the phonological working memory and lexical-tone perception of LTs were significantly poorer than those of toddlers with TLD. These results are in line with Buschmann et al. (2015), who found poor phonological working memory in LTs, and Chen et al. (2016), who found that LTs struggled with lexical-tone perception compared with toddlers with TLD. Second, LT toddlers, like toddlers with TLD, could identify an appropriate referent when hearing a novel name for the first time in a situation of referential ambiguity. However, when later asked to look for the newly learned word again in a multi-referent context (i.e., when word comprehension was tested), LTs took significantly longer than toddlers with TLD to match it to the image. Third, early expressive language ability predicted newly established W-R mappings, and phonological working memory and lexical-tone perception predicted expressive language ability during toddlerhood.
Unlike in the W-R mapping and W-I test phases (with labels), participants’ fixations to the referent did not increase over time in the showing phases (without labels), indicating that fixation to the target in the W-R mapping and W-I test phases was related to the auditory labelling of the novel word. Moreover, in the novel W-R mapping phase, LTs’ fixations to the referent on hearing a novel word did not differ from those of TLD toddlers. This result suggests that LTs, like toddlers with TLD, may use lexical learning constraints, such as mutual exclusivity and the novel name-nameless principle, to map novel labels correctly to novel referents in a multi-referent context. LTs thus do not appear to have difficulty applying these constraints to select the referents of novel words. However, this study did not explicitly test any specific lexical constraint in LTs, so future research should examine this question directly.
In the novel W-I test phase, which examined the robustness of the newly established W-R mappings, LTs performed worse than TLD toddlers: fixations on the target increased more over time in the TLD group than in the LT group, and total fixation time was longer in the TLD group. However, the proportion of gaze directed to the target did not differ significantly between the TLD and LT groups during the novel W-I test phase (LT: M = 0.50, SD = 0.22; TLD: M = 0.56, SD = 0.21; p = 0.349). This suggests that the group differences in the time course and duration of looking during word learning were not reflected in the proportion of gaze to the target. Compared with TLD toddlers, LTs appeared to form weaker word representations, suggesting that they may have more difficulty encoding the phonological details of novel words (consistent with their less accurate lexical-tone perception), retrieving word representations from memory, or both. Alternatively, LTs may have encoded the novel word rapidly during the W-R mapping phase, leading to habituation and reduced attention to the target; a retention trial is needed to distinguish between these accounts. Additionally, the time LTs spent looking at the novel referent during the W-R mapping phase (18000–32000 ms from trial onset) did not differ significantly from that of TLD toddlers (LT: M = 2914.79 ms, SD = 1772.94 ms; TLD: M = 3328.13 ms, SD = 1811.95 ms; p = 0.449). This implies that LTs’ weaker word-referent mapping did not stem from shorter exposure to the novel word and referent during the learning phase.
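Total fixation time and proportion of gaze to the target are computed from the same fixations but answer different questions, which is how one can differ between groups while the other does not. The sketch below assumes a hypothetical fixation record (start and end in ms from trial onset plus an area-of-interest label) and defines the proportion as target looking divided by looking at any image in the same window; the text does not spell out its exact definition, so that choice is an assumption. The analysis window is taken from the text: 500–3350 ms after W-I onset, reported as 32500–35350 ms from trial onset, which places W-I onset at 32000 ms (also where the 18000–32000 ms mapping phase ends).

```python
from dataclasses import dataclass

# 500-3350 ms after W-I onset, i.e. 32500-35350 ms from trial onset.
WI_ONSET_MS = 32000
WINDOW_MS = (WI_ONSET_MS + 500, WI_ONSET_MS + 3350)


@dataclass(frozen=True)
class Fixation:
    """HYPOTHETICAL fixation record: times in ms from trial onset plus an AOI label."""
    start_ms: float
    end_ms: float
    aoi: str  # "target", "distractor" or "none" (off-image)


def clipped_duration(fix: Fixation, window: tuple[float, float]) -> float:
    """Portion of a fixation that falls inside the analysis window."""
    lo, hi = window
    return max(0.0, min(fix.end_ms, hi) - max(fix.start_ms, lo))


def target_measures(fixations: list[Fixation], window=WINDOW_MS) -> tuple[float, float]:
    """Return (total target fixation time in ms, proportion of image looking on target)."""
    on_target = sum(clipped_duration(f, window) for f in fixations if f.aoi == "target")
    on_images = sum(clipped_duration(f, window) for f in fixations if f.aoi != "none")
    proportion = on_target / on_images if on_images > 0 else float("nan")
    return on_target, proportion


# Two hypothetical toddlers: same proportion, different total target fixation time.
brief = [Fixation(32600, 33000, "target"), Fixation(33100, 33500, "distractor")]
sustained = [Fixation(32500, 33300, "target"), Fixation(33300, 34100, "distractor")]
print(target_measures(brief))      # (400, 0.5)
print(target_measures(sustained))  # (800, 0.5)
```

Because the proportion normalises by total looking at the images, two toddlers can share the same proportion while one accumulates twice as much fixation time on the target.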
Therefore, within McMurray et al.’s (2012) model, the connection weights between the novel word and its referent may be weaker in LTs than in toddlers with TLD. Our findings also support the idea that building a robust word-referent representation is a gradual, cumulative process rather than an all-or-none phenomenon (Yurovsky et al., 2014). Recent research on children (Gordon et al., 2021) and adults (McGregor et al., 2017) with developmental language disorder has found that they encode new words less well than their peers. However, little research has investigated how weak representations form during word learning in LTs (Kucker & Seidler, 2022), and our findings shed new light on this topic.
Children’s expressive language ability at 27 months predicted their novel word-mapping performance at 33 months. Socio-economic status (SES) affects LTs’ language development and is a significant predictor of expressive language abilities (Fisher, 2017) and school readiness (Hammer et al., 2017). However, even after controlling for maternal educational level as an index of SES, the difference between the TLD and LT groups in the novel W-I test phase remained significant. This suggests that the observed difference is not attributable to SES but may instead reflect other factors, such as the abilities underlying W-R mapping processes. In addition, toddlers with better phonological working memory and lexical-tone perception had better early expressive language ability. Phonological working memory, lexical-tone perception and novel word-mapping ability may support the learning of new words, which in turn may facilitate the development of expressive language ability. These findings support the idea that the skills required in word learning share properties with those involved in a wide range of linguistic capabilities (Marchman & Fernald, 2008). From the organisational perspective on development, early skills such as speech perception and phonological working memory are incorporated into successive reorganisations over time and transformed into higher-order functions such as lexical capabilities, which in turn reinforce later language learning.
Expressive language abilities were measured with the Bayley-III language subscale, which was chosen primarily for its norms covering a broader age range and its coverage of various language domains, including non-verbal communication, vocabulary comprehension and production, syntax comprehension and production, and narrative skills. Moreover, word learning is a complex process that relies on holistic language abilities rather than solely on the vocabulary and syntax production measured by the MCDI-T.
Our study implies that lexical-tone perception and phonological working memory are closely related to language acquisition in LTs, and it shows that the eye-tracking trajectories of LTs and TLD toddlers differ during word learning. However, to identify clinically useful predictors that differentiate LTs whose language abilities catch up from those whose language difficulties persist, further longitudinal studies that follow participants into school age are required to disentangle the relationships among lexical-tone perception, phonological working memory, eye-tracking measures and lexical growth in LTs.
Lastly, when searching for familiar objects upon hearing familiar words, LTs in our study did not differ from children with TLD in the time taken to find the objects. This suggests that LTs processed familiar words as efficiently as toddlers with TLD, consistent with previous findings (Fernald & Marchman, 2012). Further research could investigate the mechanisms through which LTs develop word representations comparable to those of children with TLD.
Limitations and future research directions
First, although LTs can be diagnosed between 18 and 35 months of age (Hawa & Spanoudis, 2014), we recruited participants older than 2 years to reduce the risk of false diagnosis. The LT group was identified at Time 1, and the eye-tracking data were collected at Time 2, 6 months after the diagnosis. As we did not reassess the participants’ expressive vocabulary size at Time 2, some participants in the LT group may have caught up in their language development by that time. Further research is required to assess whether the rate of growth in expressive vocabulary predicts word learning in toddlers. Second, we used four images rather than the typical two to minimise participants’ expectations about the location of the target image. However, the additional distractor images may have placed greater demands on participants’ working memory and cognitive processing.
Lastly, the Bayley-III used in this study was translated from the English version into Mandarin, with modifications to reflect Mandarin and Chinese cultural characteristics. Although Bayley-III norms have not been established for infants and toddlers in Taiwan, a previous study demonstrated that the Bayley-III is suitable for Taiwanese children (Yu et al., 2013), and many studies in Taiwan have used scores based on the English-version norms (Lin et al., 2020, 2021). We therefore derived scale scores from the norms of the English version of the Bayley-III and used these scores in the analyses. These scores were not used as criteria or cut-off scores for grouping participants; instead, they were used to examine differences between the LT and TLD groups and to identify predictors at 27 months of age. Accordingly, the results of this study reflect only the relative association between children’s early holistic language abilities and subsequent word learning. Additionally, the Bayley-4 has been translated into Traditional Chinese and normed for Taiwan (Guo, 2023); future research could use the Bayley-4 Chinese version to enhance the validity of assessing children’s developmental abilities.
CONCLUSIONS
LTs aged 33 months could map a novel label to its referent, presumably by using lexical constraints as their peers do. However, their word-referent representations were less robust than those of their peers, which could hinder them from retaining newly learned words. Additionally, early expressive language skills predicted the ability to map new words to their referents and were positively related to lexical-tone perception and phonological working memory. This suggests that the skills required in word learning share properties with those involved in a wide range of linguistic capabilities.
Overall, our eye-tracking findings indicate that LTs perform more poorly than toddlers with TLD in early word-learning processes such as word-referent mapping. Eye-tracking has proven to be a viable tool for detecting certain disorders in children, such as autism spectrum disorder (Vargas-Cuentas et al., 2017), so using it to examine toddlers’ word-learning processes as a way of identifying LTs is a potential direction for future research. Additionally, developing standardised tools for measuring lexical-tone perception and phonological working memory could aid the early identification of language delays.
Furthermore, our results suggest a need for strategies that enhance the robustness of word-referent mapping in LTs. Interventions targeting lexical-tone perception and phonological working memory in LTs may benefit their word-learning abilities, although relevant studies are very limited. Previous studies have shown that lexical-tone perception is related to word learning (Tong et al., 2015) and can be trained and improved in naïve learners (Wang et al., 1999) and in children with cochlear implants (Zhang et al., 2021). A recent study showed that an intervention program incorporating speech perception improved vocabulary and lexical-tone discrimination in children with developmental language disorder (Chen & Lin, 2022). Further research on the design and effectiveness of intervention programs incorporating lexical-tone perception and phonological working memory training is required.
ACKNOWLEDGEMENTS
We thank the participating families and children. We also thank all the research study staff and Chung Shan Medical University for assisting in the research. The authors disclosed receipt of the following financial support for the research, authorship or publication of this article: partial funding was provided by grants to H.-H. Lu (NSTC 111-2410-H-182-036-MY4) and F.-M. Tsao (NSTC 112-2410-H-002-203-MY3) from the National Science and Technology Council, Taiwan. No additional external funding was received for this study. The National Science and Technology Council had no role in designing the study, collecting data, analysing and interpreting data or writing the manuscript.
CONFLICT OF INTEREST STATEMENT
The authors declare no conflicts of interest.
PATIENT CONSENT STATEMENT
The parents of all participants provided written informed consent before participation.
PERMISSION TO REPRODUCE MATERIAL FROM OTHER SOURCES
Not applicable
Open Research
DATA AVAILABILITY STATEMENT
The datasets used or analysed during this study are available from the corresponding author on reasonable request.