Syllable Frequency and Spoken Word Recognition: An Inhibitory Effect

Research has shown that syllables play a relevant role in lexical access in Spanish, a shallow language with a transparent syllabic structure. Syllable frequency has been shown to have an inhibitory effect on visual word recognition in Spanish. However, no study has examined the syllable frequency effect on spoken word recognition. The present study tested the effect of the frequency of the first syllable on recognition of spoken Spanish words. A sample of 45 young adults (33 women, 12 men; M = 20.4, SD = 2.8; college students) performed an auditory lexical decision on 128 Spanish disyllabic words and 128 disyllabic nonwords. Words were selected so that lexical and first syllable frequency were manipulated in a within-subject 2 × 2 design, and six additional independent variables were controlled: token positional frequency of the second syllable, number of phonemes, position of lexical stress, number of phonological neighbors, number of phonological neighbors that have higher frequencies than the word, and acoustical durations measured in milliseconds. Decision latencies and error rates were submitted to linear mixed models analysis. Results showed a typical facilitatory effect of the lexical frequency and, importantly, an inhibitory effect of the first syllable frequency on reaction times and error rates.


Introduction
A relevant issue in psycholinguistics is the role played by sublexical components in lexical access. Several sublexical units have been proposed as functionally relevant in visual word recognition, including syllables. Research has shown that some syllable-based variables could influence word processing, particularly the frequency of each syllable within a language. Nevertheless, the evidence suggests that the syllable frequency effect is language-dependent. Thus, the role of phonological syllables in processing English visual words is controversial, and recent research has found facilitative effects of syllable frequency on the performance of naming and lexical decision tasks (Macizo & Van Petten, 2007); that is, words with high-frequency syllables yield shorter latencies than words with low-frequency syllables. We must consider that English is a language with ambiguous and ill-defined syllable boundaries; indeed, there is no consensus among linguists on how words are syllabified, and syllable boundaries tend to be modified by other linguistic factors, such as stress or morphological structure (Eddington, Treiman, & Elzinga, 2013).
Unlike English, Spanish is a language with a shallow and transparent syllabic structure in which every syllable has clear and well-defined boundaries not affected by other factors. Research has found a clear inhibitory effect of syllable frequency on Spanish word recognition in the visual domain. In a seminal paper, Carreiras, Álvarez, and De Vega (1993) observed that Spanish words made up of high-frequency syllables were processed more slowly than words constituted by low-frequency syllables. This apparently counterintuitive result was obtained both in lexical decision times and naming latencies. In an earlier report using a moving window task, De Vega, Carreiras, Gutiérrez, and Alonso (1990) had observed that reading words within texts yielded times that were inversely related to the frequency of their constituent syllables, particularly the token positional frequency of the first syllable. The finding of an inhibitory influence of syllable frequency on visual word recognition has been replicated for French (Mathey & Zagar, 2002) and German (Conrad & Jacobs, 2004). Furthermore, the evidence suggests that syllabic effects are separated from an orthographic redundancy due to the mere effect of letter clusters (Carreiras et al., 1993; Conrad, Carreiras, Tamm, & Jacobs, 2009) and they influence eye-movement behavior (Hutzler, Conrad, & Jacobs, 2005), electrophysiological correlates (Barber, Vergara, & Carreiras, 2004; Hutzler et al., 2004), and brain activity measured by means of functional magnetic resonance imaging (Carreiras, Mechelli, & Price, 2006).
The syllable frequency effect has been interpreted in terms of competition among representations of words: the basic assumption is that syllable neighbors, or words sharing a syllable with the target stimulus (especially the first syllable), reach some level of activation and compete with the target, resulting in slower word processing. It is striking that, since Carreiras et al. (1993) was published, no study has specifically examined the syllable frequency effect on spoken word recognition. The syllable has been considered a relevant functional unit in speech perception, and experimental data suggest that syllables play a key role in the segmentation of fluent speech. For example, in French the detection of a speech fragment is facilitated in words that contain the fragment as a syllable, compared to words in which the fragment crosses a syllable boundary (Mehler, Dommergues, Frauenfelder, & Segui, 1981). This observation has been replicated in Spanish. Thus, Bradley, Sánchez-Casas, and García-Albea (1993) observed a robust syllabification effect for Spanish speakers processing Spanish material: a fragment such as "pal" is detected faster and more easily in "palmera" (a word syllabified as "pal.me.ra") than in "paloma" (a word syllabified as "pa.lo.ma"). However, that study did not find the same syllable sensitivity in English speakers processing English material.
Given the functional relevance of syllables in languages with a clear and unambiguous syllabic structure, the aim of the present experiment was to examine whether syllable frequency has any effect on Spanish spoken word recognition, as was clearly found for visual word recognition (Carreiras et al., 1993), and whether this hypothetical effect is different from that of lexical frequency. Specifically, this question was evaluated by means of an auditory lexical decision task, and stimulus words were selected by manipulating the value of token positional frequency of the first syllable (high vs. low) embedded in words which had high vs. low values of lexical frequency (Table 1). Auditory lexical decision implies a fast classification of spoken verbal stimuli as words or nonwords, and this task has been widely used to study word recognition processes (for a review, see Goldinger, 1996).
Only the first syllable frequency was manipulated in this study because there is strong evidence that the first syllable of a disyllabic or multisyllabic word plays a dominant role and gives more information about the word than other syllables (Álvarez, Carreiras, & De Vega, 2000; Perea & Carreiras, 1998; Taft & Forster, 1976); indeed, after the seminal Carreiras et al. (1993) paper, several works that have studied the syllable frequency effect on visual word recognition manipulated only the first syllable frequency (e.g., Álvarez, Carreiras, & Taft, 2001; Barber et al., 2004; Carreiras et al., 2006; Conrad et al., 2009). This bias towards the first syllable is presumably even more pronounced in the auditory domain, where processing of spoken words is necessarily left-to-right. At the same time, six additional independent variables were controlled (Table 1): token positional frequency of the second syllable, total number of phonemes of the word, position of lexical stress (first vs. second syllable), number of phonological neighbors (PN), number of phonological neighbors that have higher frequencies than the word (HFPN), and acoustical durations of each word stimulus measured in milliseconds. Phonological neighborhoods are defined as sets of words that differ by a single sound (phoneme); for example, "casa" (house) and "cama" (bed) are phonological neighbors. In the auditory domain, previous research has suggested some neighborhood effects in recognition of Spanish spoken words. Thus, in contrast to the inhibitory effect of phonological neighborhood typically found in English, Vitevich and Rodríguez (2005) obtained in an auditory lexical decision task a facilitative effect associated with both the phonological neighborhood density (number of neighbors) and the neighborhood frequency. By controlling phonological neighborhood variables (PN, HFPN) in our stimuli, we wanted to disentangle any hypothetical syllable effect from possible phonological neighborhood effects.
Hypothesis 1. Given Carreiras et al.'s (1993) findings, spoken words with high (first) syllable frequency will result in longer reaction times (RTs) in a lexical decision task and words with low syllable frequency will result in shorter RTs.
Hypothesis 2. Spoken words with high lexical frequency will give shorter RTs in a lexical decision task and words with low lexical frequency will produce longer RTs.

Stimuli
The stimuli consisted of 128 Spanish disyllabic words and 128 disyllabic nonwords, all containing 4 to 5 phonemes. The words were selected by combining two factors in a 2 × 2 repeated-measures design: Word Frequency (high vs. low) and token positional Syllable Frequency of the first phonological syllable (high vs. low), with 32 words for each experimental condition (see Appendix). Words were selected by means of Buscapalabras (abbreviated as B-Pal) (Davis & Perea, 2005), software that offers a broad repertoire of psycholinguistic statistics and includes the Spanish LEXESP database (Sebastián-Gallés, Martí, Carreiras, & Cuetos, 2000). Words with more than 60 occurrences per million were considered of high frequency in the present experiment, and words with fewer occurrences were considered of low frequency. Frequency means of high- versus low-frequency words were significantly different (121 vs. 17; p < .000001). Table 1 shows means, standard deviations, and ranges for every manipulated or controlled variable for each condition.
Within the high-frequency words, we considered two subsets: words with high first syllable frequency (HWHS) and words with low first syllable frequency (HWLS). The first phonological syllable of HWHS words had a token positional frequency above 550 units from a default vocabulary of 31,491 Spanish words in the B-Pal database, whereas the first phonological syllable of HWLS words had a token positional frequency below 550 units (means 2241 vs. 281, respectively; p < .00001; Table 1).
Within the low-frequency words, we considered two subsets: words with high first syllable frequency (LWHS) and words with low first syllable frequency (LWLS). The first phonological syllable of LWHS words had a token positional frequency above 350 units (see Note 1) from a default vocabulary of 31,491 Spanish words in the B-Pal database, whereas the first phonological syllable of LWLS words had a token positional frequency below 350 units (means 1252 vs. 131, respectively; p < .0001; Table 1).
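The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `classify` is hypothetical, values lying exactly at a cutoff are treated as "low" (the text does not say how ties were handled), and the inputs are simply the condition means reported above, reused to exercise the four branches.

```python
# Cutoffs taken from the text; the function name and the handling of
# values exactly at a cutoff are illustrative assumptions.
WORD_FREQ_CUTOFF = 60          # occurrences per million
SYLL_CUTOFF_HIGH_WORDS = 550   # token positional frequency, high-frequency words
SYLL_CUTOFF_LOW_WORDS = 350    # token positional frequency, low-frequency words


def classify(word_freq, first_syll_freq):
    """Return the condition label (HWHS, HWLS, LWHS, or LWLS) for one word."""
    high_word = word_freq > WORD_FREQ_CUTOFF
    # The syllable cutoff depends on the word-frequency class (see Note 1).
    cutoff = SYLL_CUTOFF_HIGH_WORDS if high_word else SYLL_CUTOFF_LOW_WORDS
    high_syll = first_syll_freq > cutoff
    return ("H" if high_word else "L") + "W" + ("H" if high_syll else "L") + "S"


# Condition means reported in the text, used here only as sample inputs:
# (word frequency, first-syllable token positional frequency).
examples = [(121, 2241), (121, 281), (17, 1252), (17, 131)]
labels = [classify(wf, sf) for wf, sf in examples]
print(labels)  # ['HWHS', 'HWLS', 'LWHS', 'LWLS']
```

Note that the same first-syllable frequency can be labeled differently depending on the word-frequency class: a value of 400 counts as low within high-frequency words but as high within low-frequency words.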
Additionally, in order to control some factors that could influence processing times, stimuli were matched across the syllable frequency conditions for the following independent variables (Table 1): token positional frequency of the second phonological syllable; number of phonemes; position of lexical stress (first vs. second syllable); PN; HFPN; and acoustical durations measured in milliseconds. Specifically, PN measured the phonological neighborhood size by counting the number of words that can be formed by substituting a phoneme at any position within the target word and also by deleting or adding any phoneme.
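As a minimal sketch of the PN and HFPN definitions given above (one-phoneme substitution, deletion, or addition), the following Python code counts neighbors in a toy lexicon. The assumptions are that each character of a transcription stands for one phoneme, that the lexicon is a small invented sample, and that the frequency values are made up for illustration; none of this reproduces the B-Pal computation itself.

```python
def is_neighbor(a, b):
    """True if b differs from a by one phoneme substitution, deletion, or addition."""
    if a == b:
        return False
    if len(a) == len(b):
        # Substitution: exactly one position differs.
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    shorter, longer = (a, b) if len(a) < len(b) else (b, a)
    # Deletion or addition: dropping one phoneme from the longer form
    # must yield the shorter form.
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def neighborhood_size(target, lexicon):
    """PN: number of lexicon entries that are phonological neighbors of target."""
    return sum(is_neighbor(target, w) for w in lexicon)


def higher_frequency_neighbors(target, freq):
    """HFPN: number of neighbors whose frequency exceeds that of target."""
    return sum(1 for w, f in freq.items() if is_neighbor(target, w) and f > freq[target])


# Toy phonemic lexicon with invented frequencies (one character = one phoneme).
toy_freq = {"kasa": 100, "kama": 50, "kosa": 150, "asa": 10,
            "kasar": 120, "gato": 30, "pato": 20}

pn = neighborhood_size("kasa", toy_freq)
hfpn = higher_frequency_neighbors("kasa", toy_freq)
print(pn, hfpn)  # 4 2
```

In this toy example "kasa" (casa) has four neighbors ("kama", "kosa", "asa", "kasar"), two of which have a higher invented frequency.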
The nonwords were formed by replacing a phoneme in the second syllable of Spanish disyllabic words (different from the experimental words). As in Vitevich and Rodríguez (2005), consonants were replaced with consonants, and vowels were replaced with vowels. The stress pattern of each original word was preserved. For example, the nonword "frute" was derived from the word "fruta" (fruit), and the nonword "jamén" was derived from "jamón" (ham). The phoneme was changed in the second syllable to decrease the likelihood that participants would listen just to the first part of the stimuli to make the lexical decision.
As in the authors' previous studies (e.g., González, Cervera-Crespo, & McLennan, 2010; González & McLennan, 2007), the stimuli were recorded in a sound-attenuated room by a male speaker (JG), low-pass filtered at 22,050 Hz, and digitized at a sampling rate of 44,100 Hz using a 16-bit analog-to-digital converter. When necessary, several utterances of the same word were recorded in order to match acoustical durations across the experimental conditions. All stimuli were edited into individual sound files (.wav) and stored on a computer disk; the onset of each file coincided with the onset of the signal. Audio files were equated in RMS (root mean square) amplitude.
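Equating files in RMS amplitude amounts to rescaling each waveform by a constant gain so that all files share the same root mean square value. The sketch below illustrates that idea on short invented sample lists; the target level and the function names are assumptions, and the text does not state which software or target level was actually used.

```python
import math


def rms(samples):
    """Root mean square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def scale_to_rms(samples, target_rms):
    """Rescale samples by a constant gain so that their RMS equals target_rms."""
    current = rms(samples)
    if current == 0:
        raise ValueError("cannot rescale a silent signal")
    gain = target_rms / current
    return [s * gain for s in samples]


# Two invented waveforms with different levels (not real recordings).
quiet = [0.1, -0.1, 0.1, -0.1]
loud = [0.5, -0.5, 0.5, -0.5]

target = 0.25  # hypothetical common RMS level
equated = [scale_to_rms(x, target) for x in (quiet, loud)]
print([round(rms(x), 6) for x in equated])  # [0.25, 0.25]
```

Because the gain is constant within a file, this changes overall level only and leaves the duration and the temporal envelope shape of each stimulus untouched.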

Design
The experiment was based on a within-subject 2 × 2 design with two independent variables: Word Frequency (high vs. low) and token positional Syllable Frequency of the first phonological syllable (high vs. low). As mentioned in the previous section, six additional independent variables were controlled. The dependent variables were RTs (measured in milliseconds) and error rates for each experimental condition.

Procedure
The experiment was controlled by the program E-Prime 2.0 Professional on a PC. As in González and McLennan (2007), the stimuli were administered binaurally over calibrated AKG-K55 headphones at 65 to 70 dB. Participants carried out a lexical decision task in which they had to decide as quickly and accurately as possible whether each stimulus they heard was a real Spanish word or a nonword. They indicated their decision by pressing one of two keys on the computer keyboard ("P" for word and "Q" for nonword). Each trial proceeded as follows: A red square was illuminated on the computer screen to indicate the beginning of the trial, and 500 ms later the participant was presented with a speech stimulus over the headphones to make a lexical decision. RTs were measured from the onset of the stimulus to the key press response. Following the response, the next trial was initiated 2 s later. If the maximum trial time (5 s) expired without any response, the computer automatically presented the next trial. Each participant first completed ten practice trials and then received a different random ordering of the 256 stimuli. The session took approximately 15 to 20 minutes.

Analysis
Data were organized in a long format (one observation per row) and submitted to linear mixed models (LMM). This type of model in effect combines the F1 and F2 analyses of variance by treating both participants and items as random variables (e.g., see Baayen, Davidson, & Bates, 2008; Westfall, Kenny, & Judd, 2014). RTs were submitted to a mixed model following Brysbaert's (2007) suggestions for the SPSS program. In the case of accuracy data, the appropriate analysis technique was a binary logistic regression because the dependent variable is dichotomous (success vs. error in each observation).
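For orientation, a mixed model of this kind for the RT data can be written as below. This is a sketch rather than the exact specification fitted: it assumes random intercepts only for participants and items (the text does not report the random-effects structure), and the symbols WF and SF (effect-coded Word Frequency and Syllable Frequency of item j), u (participant intercept), and w (item intercept) are notation introduced here.

```latex
\mathrm{RT}_{ij} = \beta_0 + \beta_1\,\mathrm{WF}_j + \beta_2\,\mathrm{SF}_j
  + \beta_3\,(\mathrm{WF}_j \times \mathrm{SF}_j) + u_i + w_j + \varepsilon_{ij},
\qquad
u_i \sim \mathcal{N}(0, \sigma_u^2),\quad
w_j \sim \mathcal{N}(0, \sigma_w^2),\quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here i indexes participants and j indexes items, so each row of the long-format data set supplies one observation of RT for a participant-item pair. Treating both u and w as random is what allows a single analysis to generalize over participants and items at once.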

Results
As in González and McLennan (2007), any participant whose overall mean RT fell more than 2.5 standard deviations beyond the grand mean was excluded from the calculations, resulting in the elimination of one participant. Table 2 shows means and standard deviations of RTs for correct responses (92.7%) and percentages of error across participants for each experimental condition. RTs and errors were analyzed separately through mixed models including fixed and random components. Both fixed effects (Word Frequency and Syllable Frequency) were significant in a linear mixed model (LMM) using RT as the dependent variable and participants and items as random variables. A significant fixed effect of Word Frequency was obtained; as expected, RTs were shorter for high-frequency words than for low-frequency words, F(1, 5262.79) = 4.71, p = .03. A significant fixed effect of (first) Syllable Frequency was also obtained. This time the effect was in the opposite direction: RTs were significantly slower for words with a high-frequency first syllable than for words with a low-frequency first syllable, F(1, 5274.72) = 12.03, p = .001. Finally, the Word Frequency × Syllable Frequency interaction was not significant, F(1, 5262.99) = 1.37, p = .24.
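The participant-exclusion rule can be illustrated with a short Python sketch. It assumes the criterion is applied in both directions and uses the sample standard deviation of the participant means; the text specifies neither detail, and the participant labels and RT values below are invented.

```python
from statistics import mean, stdev


def exclude_outlying_participants(mean_rts, criterion=2.5):
    """Keep participants whose mean RT lies within criterion SDs of the grand mean."""
    grand_mean = mean(mean_rts.values())
    sd = stdev(mean_rts.values())  # sample SD across participants (assumption)
    return {p: rt for p, rt in mean_rts.items()
            if abs(rt - grand_mean) <= criterion * sd}


# Invented per-participant mean RTs in ms; "p12" is deliberately extreme.
mean_rts = {
    "p01": 880, "p02": 890, "p03": 895, "p04": 900, "p05": 905, "p06": 910,
    "p07": 915, "p08": 920, "p09": 900, "p10": 890, "p11": 910, "p12": 1500,
}

kept = exclude_outlying_participants(mean_rts)
print(sorted(set(mean_rts) - set(kept)))  # ['p12']
```

With small samples a single extreme participant inflates the standard deviation considerably, so a 2.5 SD criterion removes only very deviant cases.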
Error rates were submitted to a binary logistic regression, which is the applicable statistical technique to analyze relationships between a dichotomous dependent variable and metric or dichotomous independent variables. An omnibus test of model coefficients was significant (χ² = 34.55, p < .0001), and both fixed effects (Word Frequency and Syllable Frequency) were significant when the interaction between effects was not included in the analysis. A significant fixed effect of Word Frequency was obtained; as expected, error rates were smaller for high-frequency words than for low-frequency words (Wald Z = 12.91, p < .0001). A significant fixed effect of (first) Syllable Frequency was also obtained. This time the effect was in the opposite direction: error rates were significantly larger for words with a high-frequency first syllable than for words with a low-frequency first syllable (Wald Z = 21.04, p < .0001). When the interaction between fixed effects was included within the binary regression, the Word Frequency effect was significant (Wald Z = 14.18, p < .0001), the Word Frequency × Syllable Frequency interaction was also significant (Wald Z = 8.39, p = .004), but the Syllable Frequency effect did not reach significance (Wald Z = 1.59, p = .21) because most of its variance was accounted for by the interaction.
In visual word processing, Perea and Carreiras (1998) performed a post hoc regression analysis on the RTs of a lexical decision task and found that the number of higher frequency syllabic neighbors was the main contribution to the inhibitory syllable frequency effect rather than the number of syllabic neighbors per se. Similarly, a post hoc analysis was conducted with item RT data of the present experiment regarding the number of token and type syllabic neighbors and the number of higher frequency syllabic neighbors of every item, but no Pearson correlation reached statistical significance (see Note 2).

Discussion
Hypothesis 1 was supported. Similar to the effect found by Carreiras et al. (1993) in visual word recognition, the present study found an inhibitory effect of first syllable frequency during spoken word processing, distinct from the facilitatory effect of lexical frequency. As expected from previous research, spoken high-frequency words were recognized faster than low-frequency words, presumably because their lexical representations are more accessible in the mental lexicon (Hypothesis 2). In contrast, the effect of the (first) syllable frequency was opposite to that observed for lexical frequency. As Carreiras et al. (1993) stated, "Intuitively, frequency should help because the more times an event occurs the more accessible it should be for comprehension and production. So it should follow that as frequency increases, speed in processing should also increase" (p. 770). However, Carreiras et al.'s results, like those of the present experiment, ran in the opposite direction: words including a high-frequency first syllable were identified more slowly, and with more errors, than words including a low-frequency first syllable. The current results are important because an inhibitory effect in visual word processing does not guarantee that the same effect will be observed in spoken word processing. For example, in English the effect of phonological neighborhood is inhibitory in auditory lexical decision tasks (Ziegler, Muneaux, & Grainger, 2003) but facilitatory in visual lexical decision tasks (Yates, Locker, & Simpson, 2004).
According to the current data, it should be noted that syllable frequency apparently has a less pronounced effect on spoken word recognition than on visual word recognition (in absolute values and effect sizes; see Note 3). The syllable effect is interpreted as a result of competing activation between syllabic neighbors; that is, a word with a high-frequency first syllable has a large number of syllabic neighbors, since many other words also begin with that syllable (e.g., "casa," "cama," "calor," "café," "capa," etc., all Spanish words), whereas a word with a low-frequency first syllable has fewer syllabic neighbors. Nonetheless, further research on visual word recognition has found that the key factor responsible for the inhibitory effect is actually the number of higher frequency syllabic neighbors of the target word (Álvarez et al., 2001; Perea & Carreiras, 1998). In a post hoc analysis across items, the present data did not yield a significant correlation between RTs and the number of syllabic neighbors, or the number of higher frequency syllabic neighbors, unlike what was found in visual word processing. This lack of correlation is likely influenced by the fact that the stimuli were matched for neighborhood density, that is, the PN, and for the HFPN (Table 1). Previous research suggested some neighborhood-density effects in recognition of Spanish spoken words (Vitevich & Rodríguez, 2005). By controlling phonological neighborhood variables (PN, HFPN) in the stimuli, we sought to separate any hypothetical syllable effect from possible phonological neighborhood effects, but at the same time we reduced the variability of syllabic neighborhoods. Note, however, that a syllabic neighbor is not the same as a phonological neighbor; for example, "gato" (cat) and "pato" (duck) are phonological but not syllabic neighbors.
The results suggest that syllable frequency influences spoken word processing in Spanish, and likely in other languages with a transparent syllabic structure, beyond the effect of phonological neighborhood variables (controlled in the experiment). Nevertheless, further research on spoken word processing will be necessary to disentangle genuine syllabic effects from the mere frequency of co-occurrence of phonemes within a syllable; that is, phonemes within syllables tend to co-occur in speech more often than phonemes between syllables. Further research should also test the syllable frequency effect on spoken word processing using tasks other than auditory lexical decision.

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.

Notes
1. Within the initial pool of low-frequency disyllabic words (4-5 phonemes), a cut point of 350 units of token positional frequency of the first phonological syllable divided the pool more centrally than a cut point of 550 units.
2. For each word stimulus, values of the number of token and type syllabic neighbors and the number of higher frequency syllabic neighbors were extracted from SYLLABARIUM (Duñabeitia, Cholin, Corral, Perea, & Carreiras, 2010), an online database of Spanish and Basque syllables created for psycholinguistic experiments.
3. In the visual domain, lexical-decision RTs for high versus low syllable frequencies were 790 versus 734 ms for high-frequency words, and 825 versus 783 ms for low-frequency words (first experiment of Carreiras et al., 1993; the authors did not provide SD values). In the auditory domain, our RTs for high versus low syllable frequencies were 911 versus 895 ms (Cohen's d = 0.24) for high-frequency words, and 930 versus 907 ms (Cohen's d = 0.36) for low-frequency words.
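The comparison in Note 3 can be made explicit by computing the size of the inhibitory effect (high minus low syllable frequency) in each modality. The sketch below uses only the mean RTs quoted in the note; the dictionary layout and function name are illustrative choices.

```python
# Mean lexical-decision RTs (ms) as reported in Note 3: (high SF, low SF).
visual = {"high-frequency words": (790, 734), "low-frequency words": (825, 783)}
auditory = {"high-frequency words": (911, 895), "low-frequency words": (930, 907)}


def inhibition(cells):
    """Inhibitory effect in ms: RT for high SF minus RT for low SF."""
    return {label: high_sf - low_sf for label, (high_sf, low_sf) in cells.items()}


visual_effect = inhibition(visual)
auditory_effect = inhibition(auditory)
print(visual_effect)    # {'high-frequency words': 56, 'low-frequency words': 42}
print(auditory_effect)  # {'high-frequency words': 16, 'low-frequency words': 23}
```

In raw milliseconds the auditory effect (16 and 23 ms) is less than half the visual effect (56 and 42 ms), which is the sense in which the effect is described as less pronounced in the auditory domain.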

Table 1. Characteristics of words used in the experiment.
Means (M), standard deviations (SD), and ranges for the following variables: word frequency (WF); token positional frequency of the first syllable (SF; SF 1st syl); token positional frequency of the second syllable (SF 2nd syl); number of phonemes; position of lexical stress (first vs. second syllable); number of phonological neighbors (PN); number of phonological neighbors that have higher frequencies than the word (HFPN); and acoustical durations measured in milliseconds. HWHS: high-frequency words with high first syllable frequency.

Method
Participants
Forty-five undergraduate students (33 females, 12 males) from the University Jaume I participated in the experiment, ranging in age from 18 to 29 years (M = 20.4, SD = 2.8). All were native Spanish speakers and received course credit for their participation. None of them reported a history of speech or hearing disorders. The research conformed to the American Psychological Association's Ethical Principles of Psychologists and Code of Conduct.

Table 2. Means (M) and standard deviations (SD) of reaction times (ms) and percentages of error as a function of word frequency and syllable frequency (SF).