The contrast between alveolar and velar stops with typical speech data: acoustic and articulatory analyses

Accepted: January 15, 2017
Study carried out at Programa de Pós-graduação em Distúrbios da Comunicação Humana, Universidade Federal de Santa Maria – UFSM, Santa Maria (RS), Brazil.
1 Universidade Federal de Santa Maria – UFSM, Santa Maria (RS), Brazil.
2 Universidade Estadual Paulista – UNESP, Marília (SP), Brazil.
Financial support: Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) (Edital CAPES 025/2011). The first author received a doctoral scholarship from the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) and the Fundação de Amparo à Pesquisa do Estado do Rio Grande do Sul (FAPERGS).
Conflict of interests: nothing to declare.
ABSTRACT


INTRODUCTION
The Brazilian Portuguese (BP) phonic system includes stop segments. These consonants can be categorized by the contrast in voicing (voiced or voiceless, according to the closing or opening of the glottal gesture), degree of constriction (degree of closing, characteristic of the articulatory gestures of stop sounds in BP) and location of constriction of the articulatory gestures (labial /p/ and /b/, alveolar /t/ and /d/, and velar /k/ and /g/) (1).
This class of sounds is acquired early in childhood development. By the age of three, all stop consonants have already been acquired in the phonic system (2).
A study (3) of stops and fricatives in BP-speaking children with typical speech development (TSD) showed the existence of a period of "articulatory refinement", i.e. even after the "end" of phonological acquisition, there seems to be a period of improvement of speech production motor skills, during which the articulatory gestures gradually become more stable in both temporal organization and magnitude.
The notion of gradient states during the process of acquisition is related to the theoretical perspective of Gestural Phonology (4,5). This theory considers speech events as dynamic tasks and advocates the adoption of instrumental analyses to research the articulatory gestures involved in the production of phonic contrasts.
Because ultrasound imaging of the tongue during the production of /t/, /k/, /g/ and /d/ has been relatively little used in research, unlike acoustic analysis, there is no consensus in the literature about the methodology of data collection or the type of articulatory measures to be adopted. To date, stop consonants have been described in relation to constrictions of tongue curves during production of different stops (16,19,21,22) and complex articulatory parameters (13,18,23).
Tongue ultrasound studies in BP are even scarcer. To date, this analytical instrument has been used to research liquid (24,25), stop (19,22) and fricative (26) consonants.
Thus, the present study focuses on the acoustic and articulatory characterization of the typical contrast between alveolar and velar stops in BP. Two research hypotheses have been formulated: (i) A comparison of acoustic and articulatory speech data of stops in adults and children with TSD will reveal particularities in the production of alveolar and velar constrictions.
(ii) The comparison of productions of adults and children with TSD will show differences when using both instruments of speech analysis (acoustic and articulatory analyses).
This article aims to perform acoustic and articulatory characterizations of the contrast between alveolar and velar stops in typical speech data, as well as to compare the parameters (acoustic and articulatory) in adults and children with TSD.

METHODS
This cross-sectional, quantitative and descriptive study is part of a research project approved by the Ethics Committee of the Universidade Federal de Santa Maria (protocol number 14973013.8.0000.5346). All individuals included in this research, or their legal guardians in the case of children, signed an Informed Consent Form. The children's own agreement or refusal to participate was also respected.
The sample was composed of: (i) Twenty adults, 10 female and 10 male, between the ages of 19 and 38 (M=24; SD=5.9 years in the acoustic analysis; and M=24; SD=6.1 years in the articulatory analysis). Three female individuals in the acoustic analysis were excluded because of the poor quality of their ultrasound images at the highest tongue elevation during the production of velar stops. Three new female individuals were included to replace them.
(ii) Fifteen children with TSD, six females and nine males, between the ages of four years and seven months and seven years and five months (M=5 years and 7 months; SD=10.9 months).
Sample selection was based on an initial interview and a speech-therapy screening (orofacial myofunctional, speech, voice and auditory evaluations).
Inclusion criteria were: (i) no omissions and/or substitutions of segments identified by auditory perceptual analysis; (ii) age between 19 and 44 years for adults and between four and eight years for children with TSD. According to the Descritores em Ciências da Saúde (DeCS), individuals in the adult group are within the age range considered to be adult. For the children with TSD, the minimum age of four was selected because stops have often been acquired by this age. As this study is part of a larger study which also analyzes data of children with speech deviation, the maximum age of children with TSD was defined based on the criteria used for the group with speech alterations. This is because, based on the literature, after nine years of age, speech deviation has normally been overcome and substitutions of sounds are considered residual speech errors. In addition, we also aimed to avoid influences related to the maturation of neuromotor structures which occur as children get older; (iii) no previous or current speech therapy; and (iv) being a monolingual speaker of BP, based on the initial interview (residential locations; period and duration of contact with a second language).
Exclusion criteria were: (i) presence of vocal, auditory and/or language alterations; (ii) apparent neurological, cognitive, psychological and/or emotional impairment; and (iii) orofacial myofunctional alterations that could interfere with the correct production of speech sounds.
For the data recording procedure, the following equipment was used: unidirectional microphone (Shure SM48); pedestal; endocavitary transducer (65C10EA, 5 MHz) coupled to a portable ultrasound unit (Mindray DP6600); computer; speaker; acoustic booth; probe-stabilization (or transducer-stabilization) headset; SyncBrightUp unit for audio and video synchronization; and Articulate Assistant Advanced (AAA) software (the last three items from Articulate Instruments Ltd).
The corpus of both analyses consisted of four disyllabic BP words with stress on the first syllable and stop consonants in initial onset position, in the vowel context of /a/: /'kapə/, /'tapə/, /'galo/ and /'daɾə/, based on selection criteria used in the Instrumento de Avaliação de Fala para Análise Acústica (IAFAC) (27).
The target words were represented by pictures and presented on a computer screen for the individuals to name. Individuals were instructed to include the target word in the carrier phrase "Fala ____ de novo" ("Say ____ again"), repeated six times, using a normal vocal pattern (intensity, frequency and speed).
The individuals remained seated during recording, with an erect posture, inside an acoustic booth. The ultrasound transducer was positioned below the jaw and fixed to the head stabilizer. Gel was applied between the skin and the transducer to aid image capture. For children, recordings were supervised by the first author of this article, who also remained in the booth. Data collection took between 15 and 20 minutes and was performed in a single session.
Three repetitions of the target word were used for the acoustic analysis (4 words x 3 repetitions x 35 individuals = 420 stop segments analyzed) and five repetitions were used for the articulatory analysis (4 words x 5 repetitions x 35 individuals = 700 stop segments analyzed). Some productions were excluded from the acoustic analysis due to incorrect naming of the target word or carrier phrase, a long pause between words in the sentence, outside noise and/or an acoustic record in which the burst could not be distinguished.
Due to the exclusion of these segments and the statistical design, it was necessary to select the same number of repetitions of stops for all acoustic parameters and for both groups. Thus, three repetitions of each consonant were used for the acoustic experiment. Five repetitions of each consonant were used in the ultrasound image analysis. Images with poor quality at the greatest point of tongue constriction were excluded, as were those in which the target word or carrier phrase was named incorrectly. The first repetitions of each individual were prioritized for inclusion in the analyses. When a segment was excluded, the next repetition was included until the total number of repetitions was obtained.
Audio and image capture were carried out using AAA software. Images were analyzed using AAA software and audio signals were analyzed using Praat software.
In the acoustic analysis, target sounds were analyzed using the following parameters: voice onset time (VOT); spectral peak at burst; burst spectral moments (centroid, variance, asymmetry and kurtosis); consonant-vowel (CV) transition; and measurements of relative duration (of the stop and of the burst, in relation to the total duration of the segment). These parameters were measured manually, following procedures described in other studies (3,9-11).
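As an illustration of the burst spectral moments named above, the sketch below computes the centroid, variance, asymmetry (skewness) and kurtosis of a spectrum treated as a probability distribution over frequency. It is a minimal example under stated assumptions, not the measurement procedure used in the study: the function name `spectral_moments`, the input format (parallel lists of frequencies and power values) and the use of excess kurtosis (subtracting 3) are all illustrative choices.

```python
def spectral_moments(freqs_hz, power):
    """Return (centroid, variance, skewness, excess kurtosis) of a spectrum.

    freqs_hz and power are parallel lists (hypothetical input format).
    The power values are normalized so they sum to 1 and are then
    treated as weights of a distribution over frequency.
    """
    total = sum(power)
    if total <= 0:
        raise ValueError("spectrum has no energy")
    weights = [p / total for p in power]

    # First moment: the spectral centroid (mean frequency).
    centroid = sum(f * w for f, w in zip(freqs_hz, weights))

    def central_moment(k):
        # k-th central moment of the weighted frequency distribution.
        return sum(((f - centroid) ** k) * w for f, w in zip(freqs_hz, weights))

    variance = central_moment(2)
    if variance == 0:
        raise ValueError("all energy lies at a single frequency")

    skewness = central_moment(3) / variance ** 1.5
    # Assumption: excess kurtosis, so a normal-shaped spectrum gives 0.
    kurtosis = central_moment(4) / variance ** 2 - 3.0
    return centroid, variance, skewness, kurtosis
```

A symmetric spectrum yields zero asymmetry, while energy concentrated at low frequencies with a high-frequency tail gives a positive value.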
In the articulatory analysis, the intervals corresponding to the production of each segment analyzed ([t], [k], [d] and [g]) were selected based on the spectrogram obtained from the program, from the last regular cycle of the second vowel of the word "Fala" ("Say") through the beginning of the vowel following the target stop. A spline was then drawn over the surface of the tongue (sagittal view) at the instant corresponding to the highest tongue elevation (28) during stop production. It is important to note that, during audio and video synchronization using SyncBrightUp and frame selection at the highest tongue elevation, a visual inspection of the selected video frame was performed for each consonant.
After all the splines had been drawn for each of the five repetitions of each stop consonant, a software command was used to calculate an independent mean for each of the 42 axes. A mean tongue contour, together with two standard deviations, was thus drawn based on these 42 points. Then, two mean splines were compared for each of the contrasts investigated ([t] x [k], [t] x [g], [d] x [k] and [d] x [g]), using the t-test calculated by the software for each axis, at p<0.05.
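The per-axis comparison described above can be sketched as follows. This is a hedged illustration only: the source does not state which variant of the t-test the software applies, so the example assumes a pooled-variance two-sample Student's t-test with five repetitions per consonant (df = 8, two-tailed critical value 2.306 at the 5% level). The function names and the representation of each spline as a list of distances along the axes are hypothetical.

```python
import math
from statistics import mean, stdev

# Assumption: 5 + 5 repetitions, so df = 8; two-tailed 5% critical value.
T_CRIT_DF8 = 2.306


def axis_t(a, b):
    """Pooled-variance two-sample t statistic for the values on one axis."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    diff = mean(a) - mean(b)
    if pooled == 0:
        # No variability at all: any nonzero difference is treated as extreme.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(pooled * (1 / na + 1 / nb))


def compare_splines(reps_a, reps_b, t_crit=T_CRIT_DF8):
    """Flag the axes on which two sets of splines differ significantly.

    reps_a, reps_b: one list per repetition, each holding one value per
    axis (hypothetical format). Returns one boolean per axis.
    """
    flags = []
    # zip(*reps) regroups the data so each item holds all repetitions of one axis.
    for axis_a, axis_b in zip(zip(*reps_a), zip(*reps_b)):
        flags.append(abs(axis_t(axis_a, axis_b)) > t_crit)
    return flags
```

With 42 axes per spline, `compare_splines` would return up to 42 flags; only axes crossed by both mean splines would be meaningful in practice.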
Using this statistical test, the total number of axes crossed by the two mean splines was divided by two, thus dividing the tongue into two regions, anterior and posterior. When the total number of axes was odd, the extra axis was counted as belonging to the anterior region. Thus, using the total number of axes for each region, the number of significant axes was obtained for the anterior and posterior regions. The significant axes given by the t-test corresponded to the axes on which the two mean tongue curves (alveolar x velar stops) presented significant differences.
Finally, the proportion of significant axes was calculated by dividing the number of significant axes of the anterior region by the total number of axes of the anterior region. The result was then multiplied by 100. This procedure was performed for the posterior region as well. The proportion of significant axes of each region was calculated for each individual. These values were then submitted to the statistical evaluations described below.
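The region split and percentage calculation described above can be sketched in a few lines. The function name `region_proportions` and the input format are hypothetical, and the example assumes that the per-axis flags are ordered from the most anterior axis to the most posterior one; the source does not specify the axis ordering.

```python
def region_proportions(significant):
    """Return (anterior %, posterior %) of significant axes.

    significant: list of booleans, one per axis crossed by both mean
    splines, ordered anterior-first (an assumption of this sketch).
    """
    n = len(significant)
    if n == 0:
        raise ValueError("no axes crossed by both splines")

    # With an odd number of axes, the extra axis goes to the anterior region.
    n_anterior = (n + 1) // 2
    anterior = significant[:n_anterior]
    posterior = significant[n_anterior:]

    def percentage(region):
        # An empty region (possible only when n == 1) is reported as 0%.
        return 100.0 * sum(region) / len(region) if region else 0.0

    return percentage(anterior), percentage(posterior)
```

For example, with five axes flagged `[True, True, False, True, False]`, the anterior region holds three axes (two significant, about 66.7%) and the posterior region holds two (one significant, 50%).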
Figure 1 shows the software window with the statistical comparison between the two mean tongue splines and the division of anterior and posterior regions.
The statistical analysis of the acoustic data consisted of a series of repeated-measures ANOVAs, one for each acoustic parameter. The within-subject factors were the four consonants and three repetitions, and the between-subject factor was the period of speech development (adults and children with TSD). The post hoc Bonferroni test was performed with the aid of Statistica 7.0 software, at p<0.05.
Statistical analysis of the ultrasound image means consisted first of the Kolmogorov-Smirnov normality test, with the significance level set at p<0.05. The paired t-test was used to detect differences between means of tongue regions for samples with normal distribution, and the Wilcoxon nonparametric test was used for samples with non-normal distribution. The Mann-Whitney test was used to compare the groups, since these variables did not present normality, using the Statistical Package for Social Science (SPSS) 15.0, at p<0.05.
Finally, tongue curves during production of the stops [t], [d], [k] and [g] were also described at the highest tongue elevation, based on Gestural Phonology (4) and using ultrasound images (location and degree of constriction of the tip of the tongue, and location and degree of constriction of the dorsum).

RESULTS
Acoustic analysis of the contrast between alveolar and velar stops
Table 1 presents descriptive values of each acoustic parameter for adults and children with TSD.
Table 2 presents the results of the repeated-measures ANOVA for the nine acoustic parameters.
The centroid was the only acoustic parameter to show a difference between adults and children with TSD, with no significance for the consonant/group interaction.
The following parameters were significantly different between the stop consonants [k], [t], [g] and [d]: VOT, centroid, asymmetry, kurtosis and relative duration of the stop and the burst. The differences between stops were similar between the groups.
A consonant/group interaction effect was observed for the following acoustic parameters: spectral peak, variance and CV transition. In other words, the differences between alveolar and velar stops for these parameters depended on the group.
A series of post hoc analyses was performed to verify the difference between consonants and between consonants as a function of the group (in the case of a significant consonant/group interaction).
Table 3 presents the post hoc analysis performed to evaluate differences between alveolar and velar pairs of stop consonants, regardless of the group.
For VOT, there was a contrast in voicing, in addition to a distinction in the four pairs of alveolar and velar stops: [t] x [k], [t] x [g], [d] x [k] and [d] x [g]. This was also observed for the five other parameters highlighted in Table 3, i.e. all of them showed distinctions both in alveolar vs. velar and in voicing. However, the difference was not statistically significant for all the contrasted pairs.
For parameters presenting a difference between pairs of alveolar and velar stops with differences between the groups, the   suggest the existence of variability between repetitions of stops for variance values, regardless of the group.

Articulatory analysis of contrast between alveolar versus velar stops
In relation to the proposed articulatory parameters, the proportion of significant axes of the anterior tongue region and the proportion of significant axes of the posterior tongue region were compared in each group investigated (adults and children with TSD).
Table 4 shows the results of the adult group. There was a significant difference between the anterior and posterior regions of the tongue for all contrasts: [t] x [k], [t] x [g], [d] x [k] and [d] x [g]. In the four contrasts, the highest percentage mean of significant axes was found in the anterior tongue.
For children with TSD (Table 5), a significant difference was observed between the anterior and posterior regions of the tongue only for [t] x [k] and [t] x [g]. The highest percentage mean of significant axes was also found in the anterior region for the same consonant pairs. Figures 2 and 3 show the tongue splines in the comparison of each contrast for one individual from each of the groups investigated.
The splines are in line with the results in Table 4. In general, when considering the Gestural Phonology variables for alveolar stops, there was a trend toward elevation and anteriorization of the tip of the tongue in the direction of the alveolar region. In contrast, for the velar stops, there was an elevation and posteriorization of the tongue dorsum toward the soft palate.
Figure 3 shows the splines of a child with TSD, which also demonstrate the differences between alveolar and velar constriction. The articulatory gestures (tract variables: the constriction site for the tip and dorsum of the tongue) of children with TSD were similar to those described for the adult group. However, the vast majority of the data present a smaller magnitude of articulatory gestures for these individuals, in addition to greater variability among some repetitions of segments.
When comparing the proportions of significant axes of adults and children with TSD in relation to the anterior tongue region, there was a significant difference between the groups during the production of [t] x [k]. The same result was confirmed for the other three pairs of stops: [t] x [g], [d] x [k] and [d] x [g].

DISCUSSION
This study aimed to investigate acoustic and articulatory variables for the contrast between alveolar and velar stops in BP. Acoustic parameters and ultrasound images of tongue gestures were analyzed in data of adults and children with typical acquisition.

Acoustic analysis of the contrast between alveolar and velar stops
In the acoustic analysis, six of the nine parameters (VOT, centroid, asymmetry, kurtosis, relative stop and burst duration) presented differences between the target consonants regardless of the group type, i.e., they were effective for differentiation of the contrast between alveolar and velar stops both among adults and children with TSD.
The spectral peak values differed between consonants; however, this distinction depended on the group. This is because the adults and children with TSD evaluated use this parameter differently to mark the contrast in place of articulation. Variance and CV transition were the two other parameters that presented a significant consonant/group interaction.
Statistically significant differences between adults and children with TSD were also observed for the centroid and by post hoc analysis of the acoustic parameters that presented a significant consonant/group interaction, which were spectral peak, variance and CV transition.
Acoustic analysis is a well-established method among speech researchers for the characterization of contrasting sounds. VOT is considered one of the most important acoustic parameters for investigating stop segments and is extensively used to mark the voicing contrast. However, it has also been related to the contrast in place of articulation of stops (3,5-8,10,13), which was confirmed by the findings of this study. When comparing the consonants based only on the alveolar versus velar contrast ([t] x [k] and [d] x [g]), higher values of VOT were observed for [k] and [d], respectively.
As for the other parameters, another study (9) observed that all the acoustic parameters investigated here (spectral peak, burst spectral moments, CV transition and relative durations) were employed to differentiate the consonants [t] and [k] in the speech of an adult BP speaker. This partly corroborates the findings of this article, since, in the present study, all parameters showed differences between the pairs of alveolar and velar stops. The same study (9) showed that some of these acoustic measures were primary or secondary in the distinction of the alveolar versus velar contrast.
With regard to the distinction between the groups studied, the differences between the acoustic parameters during adult and child oral production, both in this study and in other studies from the literature (3,6,8), are in line with established knowledge from Speech Therapy and Linguistics. If children with TSD present a stable production compatible with the adult target according to perceptual analysis, why would they present acoustic distinctions in relation to the language standard? This will be discussed in greater depth following the presentation of the articulatory data.

Articulatory analysis of contrast between alveolar versus velar stops
The second instrumental analysis employed in this study consisted of an analysis of tongue images acquired with the aid of ultrasound equipment. One of the challenges in the implementation of this study was obtaining a quantitative analysis of tongue images with the aid of the AAA software. Therefore, an alternative to this type of analysis was employed, using the program's resources to record and analyze speech production data. Because the target contrast is apparently related to articulatory gestures at the tip and dorsum of the tongue, the advantage of using the anterior or posterior region of the tongue to mark the alveolar versus velar contrast could be questioned. Therefore, the proportions of significant axes of the anterior and posterior regions of the tongue were analyzed.
The results pointed to some differences between the two tongue regions, both for adults and children with TSD. For all the significant differences observed, the highest percentage mean of significant axes was found in the anterior tongue region, making it possible to infer that the middle anterior tongue region has a greater influence on the stabilization of the contrast in place of articulation of stops.
The dorsum excursion index (DEI) (13,23) is one of the quantitative ultrasound measures described in the literature for the segments analyzed.This parameter has been used to characterize the contrast between alveolar and velar stops of English speakers (13) and higher DEI values were observed for velar consonants in a child with a noticeable distinction between velar and alveolar segments.
In the description of tongue curves at the maximum point of constriction, a tongue tip and dorsum gesture was observed during the production of alveolar and velar stops, respectively.This was also reported in a study using typical BP speech data (19) .
This study also compared the two groups investigated by means of ultrasound tongue data, which likewise showed some particularities in adult and child productions. Even though both groups presented categorical productions of [t], [d], [k] and [g] (identified by auditory perceptual analysis), a distinction between phases of development was observed when comparing the proportions of significant axes of the anterior and posterior tongue regions, with a higher mean of significant axes for the adult group.
The visual inspection of tongue curves also showed some specific characteristics of the child group, even though the same tract variables were observed during the formation of alveolar and velar constrictions in both groups. For example, in children with TSD, there was less differentiation between the magnitudes of gestures of the tip and dorsum, in addition to greater variability of tongue curves across repetitions of a consonant.
This also highlights the uncertainties already pointed out in this discussion. The results of both the acoustic analysis and the articulatory analysis suggest a period of stabilization in the production of children with TSD. This interpretation can be explained by Gestural Phonology (4,5), since it is possible to observe gradient states even in data with no speech alterations, which corroborates the identification of a period of "articulatory refinement" even after the acquisition of the segment (3).
Differences between the data of children with TSD and adults indicate a period of neuromotor maturation related to the use of articulators in the vocal tract.In these cases, even when some degree of distance is observed between the adult and child stages, where not all parameters are used in the marking of a particular contrast, the use of at least one parameter in a suitable magnitude provides the contrast through audition (8,9) .
Studies with different objectives have also found that motor development and speech motor control seem to be driven by the same maturational constraints (29). Consequently, the maturation of fine motor control in speech occurs with increasing age (30).
Corroborating this, it is accepted that mature speech production is a skill that requires many years of development and improvement of human cognition and of the language and motor systems (14).
Finally, in addition to providing information about the contrast between alveolar and velar stops, this study aimed to use a new method of instrumental speech research, employing ultrasound tongue images. Although we cannot answer all of the questions arising in the investigation of contrasts using this technology, it is hoped that the study will lead to further investigation into the acquisition and development of speech sounds, as well as into speech-therapy practice for speech alterations.

CONCLUSION
The acoustic analysis showed several distinctions between the production of alveolar and velar constrictions in the class of stops. Values of VOT, centroid, asymmetry, kurtosis and relative stop and burst duration presented differences between stop consonants regardless of the group. In contrast, the results for spectral peak, variance and CV transition suggest that adults and children with TSD use these three parameters differently to distinguish stop consonants.
The articulatory parameters also showed differences between all contrasted pairs of alveolar and velar stops in the speech of adult individuals. However, children with TSD only showed differences in the comparison between the proportions of significant axes of the anterior and posterior tongue regions for the pairs [t] x [k] and [t] x [g]. Likewise, tongue curves aided in the identification of articulatory gestures used during the production of alveolar and velar constrictions in both groups.
The differences shown between the two groups in both speech analyses suggest a period of refinement of articulatory gestures in children with TSD, even after stop acquisition, that is, beyond the age of five years and seven months, the mean age of children included in this study.

Figure 2. Tongue splines for each contrast investigated, produced by one adult individual

Figure 3. Tongue splines for each contrast investigated, produced by one child with TSD

[t] x [g], [d] x [k] and [d] x [g], also in relation to the posterior tongue. In all comparisons, the highest mean of significant axes was observed in adults, the only exception being the comparison between the proportions of significant axes of the posterior tongue during the production of the contrast [d] x [k].

Table 5. Comparison of proportions of significant axes in the anterior and posterior tongue regions in children with typical speech development
*Statistically significant. Statistical test: Wilcoxon test at p<0.