Perception of height and categorization of Brazilian Portuguese front vowels*

Daniel Márcio Rodrigues Silva, Rui Rothe-Neves

Cross-linguistic typological observations and theoretical models in phonology suggest that certain speech sound distinctions are more complex than others. One such example is the opposition between mid-high and mid-low vowels, usually thought to be more complex than the opposition between high and mid vowels. The present study provides experimental evidence on speech sound perception which supports this notion. Native Brazilian Portuguese speakers performed vowel classification tasks involving either the distinction between the front high-mid /e/ and the front high /i/, or the distinction between the front high-mid /e/ and the front low-mid /ε/. Measures of response time and discriminability (d') at the vowel category boundaries were obtained. Participants showed significantly slower responses and lower d' values in the "e-ε" task than in the "i-e" classification task. Results indicate that perceptually distinguishing /e/ from /ε/ requires more processing time and resources, and involves more complex information, than distinguishing /e/ from /i/.

Key-words: speech perception; categorization; vowel height; sound oppositions.

* Acknowledgments: The first author is supported by Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Brazil. The second author is supported by a research grant from CNPq, Brazil.


Introduction
A foundational observation in phonology is that the sound systems of languages are systems of oppositions. Each vowel and consonant in any given language is defined by its opposition to all others in terms of certain features that are taken as relevant in that particular sound system. In German, for example, there is an opposition between /h/ and /ʁ/, as shown by minimal pairs such as Hose 'pants' and Rose 'rose'. No such opposition occurs in Portuguese, as the sounds [h] and [ʁ] are functionally equivalent: using one or the other cannot result in different words or morphemes being perceived by a native listener. Portuguese speakers identify both [ˈkahʊ] and [ˈkaʁʊ] as exemplars of the word carro 'car'. This example illustrates the fact that speech sounds form functional equivalence classes: sound categories whose elements are treated as functionally equivalent, i.e., as having the same value in the language system. Sound systems, however, do not seem to be formed by collections of oppositions among sound categories drawn from a set of equally likely or equally complex possibilities.

2016
Some sounds (and sound oppositions) are very common; others are rare (Maddieson, 1984). Furthermore, sound inventories are seen as comprising basic elements that are found very frequently across and within languages, and less basic elements through which they can be extended (Maddieson, 1999).
As for vowels, the most common across human languages are /a/, /i/, and /u/ (according to the UPSID database; Maddieson, 1984). Oppositions between high-mid and low-mid vowels are less common than oppositions between mid and high vowels. The most widespread pattern across languages is represented by five-vowel systems comprising a high front, a high back, a mid front, a mid back, and a low vowel (Diehl, 2008; Schwartz, Boë, Vallée, & Abry, 1997a, 1997b). This is the case, for example, of the Spanish vowel system (/i, u, e, o, a/), in which no distinction is made between high-mid and low-mid vowels. The inclusion of this distinction (for both front and back vowels) results in a seven-vowel system, also relatively common but considerably less widespread than the former. This is the case of the Portuguese vowel system, in which the opposition between high-mid /e, o/ and low-mid /ε, ɔ/ vowels, attested by minimal pairs such as s[e]de 'thirst' versus s[ε]de 'headquarters' and c[o]rte 'court' versus c[ɔ]rte 'cut', is functional only in stressed syllables (Bisol, 2003; Mateus & d'Andrade, 2000; Wetzels, 2011).
In Brazilian Portuguese, the neutralization of the high-mid/low-mid opposition reduces the seven vowels to five in pre-stressed syllables, and the neutralization of the mid/high opposition in word-final post-stressed syllables results in a further reduction to three vowels, which are realized as [ɪ], [ʊ], and [ɐ] in most Brazilian Portuguese dialects. Here one notes a progression in which the oppositions /e/-/ε/ and /o/-/ɔ/ (which occur less frequently across languages) are the first to be neutralized, followed by the oppositions /e/-/i/ and /o/-/u/ (Bisol, 2003; Mateus & d'Andrade, 2000; Wetzels, 2011). Interestingly, some free variation between high-mid and low-mid vowels is observed even in stressed syllables, as in the cases of p[o]ça ~ p[ɔ]ça 'puddle' and [e]xtra ~ [ε]xtra 'extra'. Therefore, cross- and intra-linguistic findings suggest that certain oppositions are less stable and more complex than others, for example /e/-/ε/ as compared to /i/-/e/ (for related examples in Italian and Catalan, see Krämer, 2009; Wheeler, 2005).

The above observations raise issues about whether and how such differences are reflected in the processing and storage of information related to sound categories and the distinctions that define them. To the extent that phonology is about mental entities and operations, it can provide some (rather indirect) clues. Some phonological analyses of Portuguese are consistent with the idea that certain oppositions involve more complex representations than others. Based on Contrastive Hierarchy Theory (Dresher, 2009), Lee (2010) suggests that Brazilian Portuguese vowels are specified according to a hierarchy of distinctive features (see Figure 1) in which the feature [low] separates /a/ from the remaining six vowels and occupies the highest position, followed by [back], [high], and [ATR]. The latter three features distinguish, respectively, back from front, high from mid, and high-mid from low-mid vowels. Thus, the oppositions /e/-/ε/ and /o/-/ɔ/ are the most complex, since they require that the whole hierarchy be traversed. Another example is found in the analysis presented by Nevins (2012), based on Element Theory, according to which vowels can be described as combinations of three elements: |A|, |I| and |U| (Harris, 1994; Harris & Lindsey, 2000). These elements correspond to the "corner vowels" /a/, /i/ and /u/, considered in the theory as atomic primes. The remaining vowels are derived from combinations of elements. Nevins (2012) describes the mid vowels of Brazilian Portuguese as resulting from the combinations |A|+|I| (for /e/ and /ε/) and |A|+|U| (for /o/ and /ɔ/). The distinction between high-mid /e, o/ and low-mid /ε, ɔ/ is made possible by allowing vowels to differ in the relative prominence of each element in the representation. Accordingly, the vowels /e, ε, ɔ, o/ can be represented as |IA|, |IA|, |UA| and |UA|, where the underline indicates the dominant (head) element.¹ Again, the most complex oppositions are those between high-mid and low-mid vowels.
Data on vowel acquisition in Brazilian Portuguese as a native language (Bonilha, 2004) provide further evidence that, from the point of view of speech production, the distinctions between high-mid and low-mid vowels are the last to be acquired, something also predicted in the model of Fikkert (2005) for European Portuguese. In fact, the data provided by Bonilha (2004) are consistent with the feature hierarchy proposed by Lee (2010).
Concerning speech perception, Bosch and Sebastián-Gallés (2003) provide interesting results from Spanish, with its five-vowel system, and Catalan. The number of vowels varies across Catalan dialects, but they generally distinguish /e/ from /ε/ and /o/ from /ɔ/; a seven-vowel inventory is the most widespread (Recasens & Espinosa, 2009; Wheeler, 2005). Four- and eight-month-old infants from Spanish monolingual, Catalan monolingual, and bilingual environments were tested for their ability to discriminate between /e/ and /ε/ (a distinction that is absent in Spanish). While four-month-olds from the three linguistic environments performed equally well, Spanish monolingual and bilingual eight-month-olds did not succeed. These results indicate that general perceptual abilities that are present early in development include the ability to discriminate /e/ from /ε/, which declines between four and eight months of age unless the linguistic environment establishes a clear separation between the two categories. Results of an additional experiment indicate that infants raised in the bilingual environment regain the sensitivity to the /e/-/ε/ distinction between eight and twelve months of age (Bosch & Sebastián-Gallés, 2003).

1. In the analysis proposed by Nevins (2012), Brazilian Portuguese dialects differ in whether low-mid or high-mid vowels are represented as headed. Further details are beyond the scope of this paper.
The development of language-specific perception of speech sounds involves improvement in sensitivity to distinctions that are used in the language of exposure as well as decline in sensitivity to distinctions that are not (Kuhl, 2004; Kuhl et al., 2008; Werker & Tees, 2005). It has been proposed that, thanks to general auditory processing mechanisms, young infants are sensitive to all distinctions in the world's languages (Kuhl, 2004; Kuhl et al., 2008; but see Mazuka, Hasegawa, & Tsuji, 2014). One example is the ability to distinguish /ε/ from /e/, the maintenance of which depends on linguistic experience, as shown by Bosch and Sebastián-Gallés (2003). However, while it may be true that infants are initially able to distinguish all sounds of human speech, evidence indicates that certain sounds are special, such as the corner vowels /a/, /u/ and /i/, which are proposed to function as salient and stable reference points in the vowel space during the acquisition of vowel systems (Polka & Bohn, 2003).

The present study aims to provide experimental evidence on whether theoretically expected differences in the degree of complexity among vowel oppositions in Brazilian Portuguese are reflected in the way perceptual information related to vowel categories is processed and stored in the speaker's memory. Speakers performed two classification tasks involving either the distinction between /e/ and /i/ or the distinction between /e/ and /ε/, which is thought to be more complex, and hence less stable. If the latter involves more information and/or requires more complex operations than the former, we should expect slower and less precise responses in the task that requires distinguishing the categories /e/ and /ε/.
Therefore, the two tasks were compared in order to test the following two hypotheses: 1) a sharper distinction between categories will be observed in the "i-e" classification task; 2) response times to vowel sounds near the category boundary will be longer in the "e-ε" task than in the "i-e" task.

Stimuli
Vowel sounds were generated by means of a version of the KLSYN88 synthesizer (Klatt & Klatt, 1990) implemented in the software Praat (Boersma & Weenink, 2012; and presented in Weenink, 2009). Twenty-eight vowel sounds were synthesized in such a way as to vary in equal steps along a continuum from /i/ to /e/ to /ε/, based on data from adult male speakers of Brazilian Portuguese (Escudero, Boersma, Rauber, & Bion, 2009; Rauber, 2008). This vowel continuum corresponds to a straight-line segment in the space defined by three orthogonal axes representing frequency values of the three lower formants (F1, F2, and F3; see Figure 2) expressed in the psychoacoustic Bark scale (according to Traunmüller, 1990).
In order to simulate voicing, a glottal flow waveform was generated with a duration of 150 ms and a fundamental frequency (F0) varying linearly from 120 to 90 Hz. These values, which determine the duration and pitch contour of the stimuli, were selected such that the synthesized isolated vowels sounded similar to natural vowels produced by a male speaker (for duration and F0 measurements from Brazilian Portuguese speakers, see, e.g., Rauber, 2008). A cascade of eight formant filters then modeled vocal tract filtering. The three lower formant frequencies varied along the vowel continuum in fixed steps of 0.13 (F1), 0.06 (F2) and 0.03 (F3) Bark, from 2.01 to 5.39 (F1), 14.20 to 12.52 (F2), and 15.69 to 14.76 Bark (F3) or, in Hertz, from 205 to 555, 2390 to 1860, and 3000 to 2600 Hz. The remaining five formant frequencies were the same for all sounds (3900, 4900, 5900, 6900 and 7900 Hz). Bandwidths of F1, F2, and the remaining formants were set at 60, 90 and 150 Hz, respectively. Fade-in and fade-out were applied to the first and last 20 ms of each sound. Root mean square (rms) intensity was then scaled to 70 dB (SPL).
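As a rough illustration of how such a continuum can be laid out, the sketch below places 28 equally spaced points on the straight line between the reported /i/ and /ε/ endpoints in Bark and converts them to Hertz. It assumes the relation z = 26.81 f / (1960 + f) - 0.53 for the Bark scale of Traunmüller (1990), which reproduces the endpoint values given above; the function and variable names are illustrative and are not taken from the original synthesis script.

```python
# Endpoints of the continuum in Bark (F1, F2, F3), as reported in the text.
START_BARK = (2.01, 14.20, 15.69)  # /i/ end
END_BARK = (5.39, 12.52, 14.76)    # /ε/ end
N_SOUNDS = 28


def hz_to_bark(f):
    """Hertz to Bark (assumed formula, without low/high-frequency corrections)."""
    return 26.81 * f / (1960.0 + f) - 0.53


def bark_to_hz(z):
    """Inverse of hz_to_bark."""
    return 1960.0 * (z + 0.53) / (26.28 - z)


def make_continuum(start=START_BARK, end=END_BARK, n=N_SOUNDS):
    """Return n (F1, F2, F3) triples in Bark, equally spaced from start to end."""
    continuum = []
    for i in range(n):
        t = i / (n - 1)  # 0.0 at the /i/ end, 1.0 at the /ε/ end
        continuum.append(tuple(s + t * (e - s) for s, e in zip(start, end)))
    return continuum


continuum_bark = make_continuum()
continuum_hz = [tuple(bark_to_hz(z) for z in sound) for sound in continuum_bark]
```

With 28 sounds there are 27 steps between the endpoints, so the per-step changes come out at roughly 0.125, 0.062 and 0.034 Bark, in line with the rounded step sizes reported above.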

Experimental design and procedure
Participants performed two classification tasks, each composed of 432 two-alternative forced choice trials in which participants were asked to listen to a sound and indicate, by pressing a button, which of two vowel categories the sound best fitted. Response alternatives corresponded to the vowel categories /i/ and /e/ in one of the tasks, and to the categories /e/ and /ε/ in the other. A different subset of sounds was used for each task. The 18 sounds nearest to the /i/ end of the continuum were used as stimuli in the "i-e" classification task, while for the "e-ε" task, the 18 sounds nearest to the /ε/ end were used. Each task comprised 24 repetitions of each sound presented in pseudorandom order (no consecutive trials with the same sound). A 1 s interval was inserted between the response and the presentation of the next sound. Participants were allowed to pause after every 144 trials.
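The trial structure described above (18 sounds, 24 repetitions each, no sound repeated on consecutive trials) can be sketched as follows. The text does not state how the pseudorandom order was generated; the block-wise shuffling used here is only one simple way to satisfy the constraint, and all names are hypothetical.

```python
import random

N_REPETITIONS = 24


def make_trial_order(sounds, repetitions=N_REPETITIONS, rng=None):
    """Concatenate shuffled blocks so that no sound occurs twice in a row.

    Assumption: each block contains every sound exactly once. This is a
    restricted randomization chosen for simplicity, not the authors' method.
    """
    if rng is None:
        rng = random.Random()
    order = []
    for _ in range(repetitions):
        block = list(sounds)
        rng.shuffle(block)
        # Reshuffle if the block would start with the sound that ended the last one.
        while order and block[0] == order[-1]:
            rng.shuffle(block)
        order.extend(block)
    return order


# Continuum positions 1-18 for the "i-e" task and 11-28 for the "e-ε" task.
ie_sounds = list(range(1, 19))
ee_sounds = list(range(11, 29))
ie_trials = make_trial_order(ie_sounds, rng=random.Random(1))
ee_trials = make_trial_order(ee_sounds, rng=random.Random(2))
```

Each list holds 18 × 24 = 432 trials, which splits into three runs of 144 trials separated by the optional pauses.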
The order of the two tasks was counterbalanced across participants, and each task was preceded by 18 practice trials (one for each sound in the task) presented in random order. Participants were instructed to keep their index fingers on the two response buttons and respond immediately after hearing a sound. Classification responses and response times were recorded. Response times below 180 ms and above 1500 ms were excluded from analysis.

Analysis
Data from two participants (a male and a female) were excluded from the analysis because they did not perform the tasks according to the instructions. In a first step of the analysis, a logistic regression curve was fitted to the classification data from each participant and task ("i-e" and "e-ε"). From this curve it is possible to estimate the boundary between the two response categories, defined as the point in the vowel continuum² that corresponds to a response probability p = 0.5 (see Figure 3). This boundary point can be found by substituting p = 0.5 into the logistic regression equation

\[ \ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x \]

where p is the probability of one of the response alternatives, x is the independent variable corresponding to the vowel continuum, and β_0 and β_1 are the intercept and slope coefficients, respectively. Since ln(0.5/0.5) = 0, the value of x corresponding to the category boundary is given by -β_0/β_1. Estimates of the boundaries between /i/ and /e/ and between /e/ and /ε/ were thus assigned to each participant.
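A minimal sketch of this step is given below: a two-parameter logistic curve is fitted to response counts by Newton-Raphson with step halving, and the boundary is read off as -β_0/β_1. The paper does not say which software or algorithm was used for the fits, so the fitting routine, its names, and the idealized example data (expected counts under a known curve, which are fractional) are assumptions made for illustration only.

```python
import math


def sigmoid(eta):
    """Numerically stable logistic function."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def log_likelihood(b0, b1, xs, ks, ns):
    """Binomial log-likelihood (up to a constant) of the logistic model."""
    total = 0.0
    for x, k, n in zip(xs, ks, ns):
        eta = b0 + b1 * x
        total += k * eta - n * (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return total


def fit_logistic(xs, ks, ns, max_iter=100, tol=1e-10):
    """Fit p = sigmoid(b0 + b1 * x) to k responses out of n trials per x."""
    b0 = b1 = 0.0
    ll = log_likelihood(b0, b1, xs, ks, ns)
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, k, n in zip(xs, ks, ns):
            p = sigmoid(b0 + b1 * x)
            resid, w = k - n * p, n * p * (1.0 - p)
            g0 += resid
            g1 += resid * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det <= 0.0:
            break  # degenerate design, e.g. fewer than two distinct x values
        d0 = (h11 * g0 - h01 * g1) / det  # Newton direction
        d1 = (h00 * g1 - h01 * g0) / det
        step = 1.0
        while step > 1e-8:
            nb0, nb1 = b0 + step * d0, b1 + step * d1
            new_ll = log_likelihood(nb0, nb1, xs, ks, ns)
            if new_ll >= ll - 1e-9:
                break
            step /= 2.0  # halve the step until the likelihood does not drop
        else:
            break  # no acceptable step found
        b0, b1, ll = nb0, nb1, new_ll
        if max(abs(step * d0), abs(step * d1)) < tol:
            break
    return b0, b1


def category_boundary(b0, b1):
    """Continuum position at which the response probability equals 0.5."""
    return -b0 / b1


# Idealized example: 18 sounds, 24 trials each, true boundary at position 8.3.
xs = list(range(1, 19))
ns = [24] * len(xs)
ks = [n * sigmoid(-8.3 + 1.0 * x) for x, n in zip(xs, ns)]
b0, b1 = fit_logistic(xs, ks, ns)
boundary = category_boundary(b0, b1)
odds_ratio = math.exp(b1)  # change in odds per unit step along the continuum
```

In practice an established statistics package would normally be preferred for the fit; the hand-written routine is used here only to keep the example self-contained.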
The focus of the present work is on the sharpness of distinctions between categories, rather than on absolute values such as formant frequencies at the category boundaries. Particularly, we intend to compare the /i/-/e/ and /e/-/ε/ distinctions in terms of how sharply (and how rapidly) the vowel categories are separated. Thus, in order for results from the "i-e" and "e-ε" tasks to be comparable, it is important that the measures to be compared be made with reference to the category boundaries.

2. Note that the vowel continuum can be understood as a continuous independent variable that results from a linear combination of F1, F2, and F3.
For testing the hypothesis that the /i/-/e/ distinction is sharper than the /e/-/ε/ distinction, d' (Green & Swets, 1974; Macmillan & Creelman, 2005) was used as a measure of the discriminability between the sound at the second level above and the sound at the second level below the category boundary, s_a and s_b, respectively (see Figure 3 for an example). Thus, for the two classification tasks,

\[ d' = z\big(p(r_j \mid s_a)\big) - z\big(p(r_j \mid s_b)\big) \]

where p(r_j|s_k) is the proportion of responses r_j to the sound s_k, and z(p) is the inverse of the cumulative normal distribution (proportions of 0.00 and 1.00 were converted to 0.01 and 0.99, respectively). Note that the index j represents the response "e" in the "i-e" task and the response "ε" in the "e-ε" task. Thus, as used here, d' expresses a "classification distance" between two sounds (s_a and s_b), and results from the subtraction between the z-transformed probabilities of assigning them to the same response category. It is used in categorical perception studies as an estimate of discriminability, assuming that two stimuli are discriminable to the extent that they are assigned to different categories (Gerrits & Schouten, 2004; Macmillan, Kaplan, & Creelman, 1977; Schouten, Gerrits, & van Hessen, 2003; Silva & Rothe-Neves, 2009). The d' values from the "i-e" and "e-ε" tasks were compared by means of the Wilcoxon signed-rank test.
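The d' measure described above can be sketched in a few lines. The helper names are hypothetical, and the way s_b and s_a are located assumes that the estimated boundary falls strictly between two continuum positions; the text does not say how a boundary lying exactly on a position would be handled.

```python
import math
from statistics import NormalDist


def z(p):
    """Inverse cumulative normal, with 0.00 and 1.00 replaced by 0.01 and 0.99."""
    if p <= 0.0:
        p = 0.01
    elif p >= 1.0:
        p = 0.99
    return NormalDist().inv_cdf(p)


def d_prime(p_resp_given_sa, p_resp_given_sb):
    """Classification distance between s_a and s_b for the same response r_j."""
    return z(p_resp_given_sa) - z(p_resp_given_sb)


def boundary_sounds(boundary):
    """Return (s_b, s_a): second positions below and above a non-integer boundary."""
    first_below = math.floor(boundary)
    return first_below - 1, first_below + 2


# Example: boundary at 8.28, "e" chosen on 22/24 trials for s_a and 3/24 for s_b.
s_b, s_a = boundary_sounds(8.28)
example_d = d_prime(22 / 24, 3 / 24)
```

For a boundary of 8.28 this selects positions 7 and 10, so that the four sounds from s_b to s_a (7, 8, 9, 10) straddle the boundary symmetrically.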
In order to test the hypothesis that response times are longer for the /e/-/ε/ distinction as compared to the /i/-/e/ distinction, the mean of the response times to the four sounds around the category boundary (from s_a to s_b) was used as the dependent measure. The Wilcoxon signed-rank test was used for the comparison between the two tasks.

Figure 3 - Fitted logistic regression curves for tasks "i-e" (top) and "e-ε" (bottom). The curve for one of the participants is depicted in black as an example; the dashed vertical line indicates the category boundary for that participant, and the horizontal thick line represents the range of the continuum from which response time and d' measures were obtained (two steps below and two steps above the boundary). Fitted curves for the remaining participants are also shown (thin gray lines).
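The response-time measure and the paired comparison described in this section can be sketched as below. Two points are assumptions rather than details given in the text: response times are pooled over all retained trials for the four boundary sounds (rather than averaged per sound first), and only the signed-rank statistic V (the sum of ranks of positive differences) is computed. A p-value would normally come from a statistics package, for example `scipy.stats.wilcoxon`.

```python
def mean_boundary_rt(trials, s_b, s_a, rt_min=180.0, rt_max=1500.0):
    """Mean RT (ms) over trials with positions s_b..s_a and RTs within the limits.

    trials is an iterable of (continuum_position, response_time_ms) pairs.
    """
    kept = [rt for pos, rt in trials
            if s_b <= pos <= s_a and rt_min <= rt <= rt_max]
    return sum(kept) / len(kept)


def wilcoxon_v(xs, ys):
    """Sum of ranks of positive paired differences x - y.

    Zero differences are dropped and tied absolute differences share the
    mean of the ranks they occupy.
    """
    diffs = [x - y for x, y in zip(xs, ys) if x != y]
    ordered = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(ordered):
        j = i
        while (j + 1 < len(ordered)
               and abs(diffs[ordered[j + 1]]) == abs(diffs[ordered[i]])):
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[ordered[k]] = mean_rank
        i = j + 1
    return sum(r for r, d in zip(ranks, diffs) if d > 0)


# Made-up trials for one participant and task: (position, RT in ms).
example_trials = [(7, 500), (8, 600), (9, 100), (10, 700), (11, 400), (8, 1600)]
example_mean_rt = mean_boundary_rt(example_trials, s_b=7, s_a=10)
```

Applied across participants, `mean_boundary_rt` would yield one value per participant and task, and the two resulting lists would be passed to the paired test.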

Results
The fitted logistic regression curves for all participants are shown in Figure 3. For the sake of illustration, the category boundaries and the s_a-s_b ranges (both defined on a participant-by-participant basis) for a representative participant are also shown.³ The mean value of the category boundary estimate fell between positions 8 and 9 for the /i/-/e/ distinction, and between positions 19 and 20 for the /e/-/ε/ distinction. The boundary estimates and the corresponding formant values, obtained by linear interpolation in the F1 × F2 × F3 Bark space, are shown in Table 1. The boundary values varied across participants between 5.68 and 11.00 for the /i/-/e/ distinction (mean 8.28; standard deviation 1.13) and between 17.24 and 21.63 for the /e/-/ε/ distinction (mean 19.23; standard deviation 0.99). Participants also varied appreciably regarding the slopes of the regression curves. Odds ratios⁴ from 1.48 to 10.31 were observed in the "i-e" task (80% of the cases between 1.57 and 4.19; median 2.69) and from 1.26 to 7.83 in the "e-ε" task (80% between 1.61 and 3.48; median 2.62). The comparisons between the /i/-/e/ and /e/-/ε/ distinctions regarding both d' and response time measures are summarized in Table 2 and Figure 4. Significantly larger d' values (Wilcoxon signed-rank test, V = 536.5; p < .05) were observed in the "i-e" task (Mdn = 2.88) than in the "e-ε" task (Mdn = 2.33). Regarding response-time measures, a significant difference was also observed, with longer response times in the "e-ε" task than in the "i-e" task.

3. Since logistic regression curves were fitted individually for each participant, presenting all such curves and selecting a representative one in order to illustrate the analysis procedure is more appropriate for the current purposes than plotting a regression curve fitted to data averaged across participants.

4. Here, the odds ratio exp(β_1) represents the ratio by which the odds of selecting a particular response alternative change for a unit change in position along the vowel continuum.

Discussion
Significantly larger d' values and faster responses were observed for the distinction between /i/ and /e/ compared to that between /e/ and /ε/. These results indicate that, at least among front vowels in Brazilian Portuguese, the distinction between mid-high and mid-low vowels is more complex and involves higher perceptual processing costs than the more basic distinction between mid and high vowels. Considerable variability across participants was observed regarding both the boundary positions along the continuum and how sharply the two response categories contrasted (as assessed by slopes, odds ratios, and d'). In the latter case, this may reflect performance differences likely associated with limitations of the perceptual system, attention, and motivation. The variability in boundary values reflects individual differences in vowel recognition and/or in the decision criteria used by participants to select a response alternative. Importantly, although the participants were Brazilian Portuguese native speakers from a region of the state of Minas Gerais, influences of dialectal factors cannot be ruled out. In any case, regardless of such across-participant variability, nonparametric statistical tests for within-participant comparisons revealed significant effects supporting the tested hypotheses on both d' and response time.
As a sensitivity index, d' is more widely used in the analysis of detection and discrimination judgments, but it can also be estimated from the results of two-alternative forced choice classification tasks such as the one employed in the present study (Gerrits & Schouten, 2004; Macmillan et al., 1977; Schouten et al., 2003; Silva & Rothe-Neves, 2009). In such cases, d' provides a measure of "classification distance" that can be understood as reflecting the discriminability between two stimuli in the absence of within-category information, i.e., when the only available information is about the distinction between two categories. It is analogous to the well-known "predicted discrimination" scores used in categorical perception studies to predict discrimination performance from classification responses (Liberman, Harris, Hoffman, & Griffith, 1957). Here, d' was calculated in such a way as to index classification distance in the region of the boundary between categories (the segment of the vowel continuum between sounds s_a and s_b). As it was larger for the "i-e" task, the d' data indicate that the categorization processes involved result in categories that are comparatively less distinct from each other in the case of the opposition between /e/ and /ε/. The latter is also associated with longer response times, suggesting that the /e/-/ε/ distinction involves processing of more complex information.
These results converge with cross-linguistic typological observations (Diehl, 2008; Schwartz et al., 1997a, 1997b), with some proposals regarding underlying representations of vowels in Brazilian Portuguese (Lee, 2010; Nevins, 2012), and with descriptions of seven-vowel systems such as those of Portuguese (Bisol, 2003; Mateus & d'Andrade, 2000; Wetzels, 2011), Catalan (Wheeler, 2005), and Italian (Krämer, 2009), which show that mid-high:mid-low distinctions are particularly prone to neutralization and variation. Previous results by Silva and Rothe-Neves (2009) on perceptual categorization of the back vowels /u o ɔ/ in Brazilian Portuguese are consistent with the present findings in suggesting sharper perceptual distinctions between mid and high than between mid-low and mid-high vowels. Furthermore, they showed that responses in a two-interval forced choice discrimination task were more closely related to classification responses when the stimulus set was composed of sounds from a /u/-/o/ continuum than from an /o/-/ɔ/ continuum. There is therefore some evidence supporting the generalization of the present findings to other vowels. However, response times were not measured in that study (Silva & Rothe-Neves, 2009), and the present study did not include discrimination tasks. Hence, response times associated with back vowel distinctions and the relation between classification and discrimination along the /i/-/e/-/ε/ continuum should be examined in future studies. Importantly, comparisons between the cases of front and back vowels may shed light on typological observations suggesting that languages tend to prefer front vowel distinctions to back vowel distinctions (Maddieson, 1984; Joanisse & Seidenberg, 1997).
Also of interest would be to explore the perception of low:mid and mid-high:mid-low vowel oppositions in different phonological contexts (such as stressed versus pre-stressed syllables), in different languages, and using measures other than the ones provided here, particularly those that can be obtained without any task requirements and in the absence of attention to the stimuli, such as the mismatch negativity brain response (Näätänen, 2001; Silva & Rothe-Neves, 2014).

Conclusion
The present results provide experimental evidence that the distinction between the vowel categories /e/ and /ε/ is less sharp and requires more processing effort than that between /i/ and /e/. This is in agreement with the idea that phonological oppositions among vowels differ in degree of complexity, which would be reflected in typological patterns, phonological representations, speech perception and acquisition. While previous experimental results suggest that the present findings can be extended to the corresponding back vowels /u o ɔ/, further investigations are required to verify whether they would be replicated with back vowels and other speech sound classes.