Comparison of fundamental frequency and formant frequency measurements in two speech tasks

Purpose: to compare fundamental frequency (F0) and the frequencies of the first two formants (F1 and F2) of the seven oral vowels of Brazilian Portuguese between two speech tasks in adults without voice and speech disorders. Methods: eighty participants aged 18 to 40 years, paired by gender, were selected after orofacial, orthodontic, and auditory-perceptual voice and speech assessments. Speech signals were obtained from carrier phrases and sustained vowels, and F0, F1, and F2 were estimated. Differences were tested with the paired t test, and effect sizes were calculated. Results: F0 differed between the two speech tasks in two vowels in men and in five vowels in women. F1 differed in six vowels in men and in two in women. F2 differed in four vowels in men and in three in women. Conclusion: in Brazilian Portuguese, the speech task used to assess fundamental frequency and formant frequencies can yield distinct results in both glottal and supraglottal measures across the oral vowels of the language. Clinicians and researchers should therefore consider both forms of emission to interpret these data more accurately when assessing oral communication and planning therapy.


INTRODUCTION
Technological advances have expanded research in the speech sciences. Among the many forms of assessment, the acoustic analysis of speech and voice stands out for being noninvasive and relatively low-cost 1, which has made it a frequent tool in research conducted by different professionals, including speech-language-hearing therapists [2][3][4] .
The literature reports different methodologies for analyzing the same phenomena. In the speech-language-hearing sciences, the acoustic parameters most frequently investigated are the fundamental frequency and the vowel formant frequencies [3][4][5][6] .
The fundamental frequency (F0), produced by vocal fold vibration, and its harmonics are modified in the supraglottal cavities, which act as a filter that attenuates some frequencies and amplifies others. The amplified frequency ranges are known as formant frequencies; the most studied are the first two (F1 and F2), as they give vowels their phonetic identity. The first formant frequency (F1) is related to the vertical position of the tongue and to the degree of mandible opening: it varies inversely with the height of the linguomandibular complex, so the lower the tongue and the wider the jaw opening, the higher F1. The second formant frequency (F2) is influenced by the anteroposterior displacement of the tongue: the more anterior the tongue constriction, the higher F2; the more posterior, the lower F2 [7][8][9][10][11][12] .
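The source-filter description above can be illustrated with a minimal sketch. It assumes an idealized source with harmonics at integer multiples of F0 and models each formant as a simple second-order resonance with an illustrative 80 Hz bandwidth; the function names, the bandwidth, and the F0, F1 and F2 values are hypothetical and are not measurements from this study. The point is only that the harmonics lying closest to F1 and F2 are the ones the filter amplifies.

```python
import math

def resonance_gain(f, formant, bandwidth):
    """Magnitude of a simple second-order resonance centred at `formant` Hz.

    Equals 1 at 0 Hz, reaches formant/bandwidth at f == formant,
    and falls off at higher frequencies.
    """
    return formant ** 2 / math.sqrt((formant ** 2 - f ** 2) ** 2 + (bandwidth * f) ** 2)

def filtered_harmonics(f0, formants, bandwidth=80.0, n_harmonics=20):
    """(frequency, gain) for each harmonic of f0 after a cascade of
    formant resonances (a hypothetical, highly simplified vocal tract)."""
    spectrum = []
    for k in range(1, n_harmonics + 1):
        f = k * f0
        gain = 1.0
        for formant in formants:
            gain *= resonance_gain(f, formant, bandwidth)
        spectrum.append((f, gain))
    return spectrum

def amplified_harmonics(spectrum):
    """Harmonics whose gain exceeds both neighbours, i.e. local spectral peaks."""
    return [
        spectrum[i][0]
        for i in range(1, len(spectrum) - 1)
        if spectrum[i][1] > spectrum[i - 1][1] and spectrum[i][1] > spectrum[i + 1][1]
    ]

# Illustrative values only: F0 = 120 Hz, F1 = 700 Hz, F2 = 1200 Hz.
spectrum = filtered_harmonics(120.0, [700.0, 1200.0])
print(amplified_harmonics(spectrum))  # harmonics nearest F1 and F2: [720.0, 1200.0]
```

Changing F0 moves the harmonics, while changing F1 or F2 moves the peaks; this independence of source and filter is what allows glottal (F0) and supraglottal (F1, F2) measures to be interpreted separately.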
Both fundamental frequency and formant frequencies vary with the language. In Portuguese, according to the position of the tongue on the vertical axis, the vowels may be divided into low [a], mid-low [ɛ] and [ɔ], mid-high [e] and [o], and high [i] and [u]; on the anteroposterior axis, they are divided into anterior [i], [e] and [ɛ], central [a], and posterior [ɔ], [o] and [u]. Since the anteroposterior position is related to F2, the anterior vowels present higher F2 values and the posterior ones lower values of this parameter 7,13 (Figure 1). Tongue height also affects F0: high vowels have higher F0 values than low ones, so in Portuguese the vowels with the highest F0 values are [u] and [i]. The position of the vowel on the anteroposterior axis also influences this parameter, since posterior vowels usually present higher F0 values than their anterior counterparts 13. These measures also differ between genders, mainly because of anatomical differences: women generally have shorter vocal folds and vocal tracts and are therefore expected to show higher F0 and formant frequencies than men, whose longer vocal folds and vocal tracts produce lower frequencies 14.
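The expectation that a shorter vocal tract yields higher formant frequencies can be made concrete with the textbook uniform-tube approximation: a tube closed at the glottis and open at the lips resonates at odd multiples of a quarter wavelength, F_n = (2n - 1)c / (4L). The sketch below assumes a speed of sound of 35,000 cm/s and two illustrative tract lengths (17.5 cm and 14.5 cm); these lengths are hypothetical round values, not measurements of the participants, and the model describes only a neutral, schwa-like configuration rather than any specific Portuguese vowel.

```python
def uniform_tube_formants(length_cm, n_formants=3, speed_of_sound_cm_s=35000.0):
    """Resonances of a uniform tube closed at one end (glottis) and open at
    the other (lips): F_n = (2n - 1) * c / (4 * L)."""
    return [
        (2 * n - 1) * speed_of_sound_cm_s / (4 * length_cm)
        for n in range(1, n_formants + 1)
    ]

# Hypothetical tract lengths, chosen only to show the direction of the effect.
longer_tract = uniform_tube_formants(17.5)
shorter_tract = uniform_tube_formants(14.5)
print(longer_tract)                        # [500.0, 1500.0, 2500.0]
print([round(f) for f in shorter_tract])   # [603, 1810, 3017]
```

Every resonance scales with 1/L, so shortening the tube by about 17% raises all formants by about 21%; real vowels depart from this neutral pattern because the tongue and lips change the tube's shape.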
The fundamental frequency is the most robust parameter for studying voice, and the formant frequencies are essential for identifying vowels and allow articulatory interpretation of acoustic data 7 .
The choice to compare two forms of emission (carrier phrases and sustained vowels) was based on the fact that these are the speech tasks most used in research and in the speech-language-hearing clinic, as reported in a recent systematic review on formants and voice and speech production 6 .
Studies that investigated the differences between continuous speech and sustained vowels have focused on the perceived degree of dysphonia 1,[15][16][17] ; the topic has thus received little attention in individuals with normal voices. Only one study was found that analyzed glottal parameters of healthy people in two speech tasks in Brazilian Portuguese 18 . However, no data on formant frequencies with the same design were found.
Therefore, comparing F0 and the first two formants across these speech tasks in people without articulatory and vocal disorders is relevant, especially in Brazilian Portuguese, given the lack of such data in the literature. All vowels should be analyzed, since each requires a particular position of the articulators. Characterizing these aspects will contribute to a more refined knowledge of variation in speech production and may support the work of speech-language-hearing therapists, both in the clinic and in improving oral communication, since different results can be found depending on the form of emission chosen to assess clients.
Accordingly, this study aimed to compare the fundamental frequency and the first two formant frequencies (F1 and F2) of all seven oral vowels of Brazilian Portuguese (BP) between sustained vowels and continuous speech, elicited with carrier phrases, in people without dysphonia or speech disorders.

METHODS
This is an observational, descriptive, cross-sectional study, whose participants were divided into two groups according to their gender.

Selection of the Participants
The sample comprised 80 people, paired by gender, aged between 18 and 40 years (men: mean = 23.3 years, SD = 2.71; women: mean = 22.2 years, SD = 2.66).
The participants were interviewed by the first author and answered a questionnaire on personal data and health conditions. They were then evaluated by orthodontists who are coauthors of this study, and underwent a speech-language-hearing assessment of the orofacial structures and an auditory-perceptual assessment of voice and speech.
The inclusion criteria were: no history of respiratory, auditory, vocal, or speech disorders; being a nonsmoker; being a native speaker of Brazilian Portuguese from the city of Rio de Janeiro; and having normal occlusion or Angle Class I with a balanced maxillomandibular relation in the three dimensions of space, a harmonious profile, and small alterations of dental positioning. Participants with Angle Class I were included because people with normal occlusion, i.e., without dentoskeletal alterations, are rare. Participants were also required to score grade 4 (normal) in the evaluation of the orofacial structures with the OMES-E Protocol 19, to score zero on the general dysphonia degree (G) of the GRBAS auditory-perceptual scale 20, to have balanced resonance, and to present no speech disorders. The exclusion criteria were: open bite, anterior or posterior crossbite, missing teeth, or supernumerary teeth. Participants who reported a cold or allergic process on the day the speech samples were collected, or who for some reason could not adequately perform the emissions, were excluded from the sample.
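The screening rules above can be summarized as a single eligibility check. This is only a sketch of the logic as stated in the text: the record type and all field names are hypothetical, and the Angle Class I qualifiers (balanced maxillomandibular relation, harmonious profile, small dental alterations) are assumed to be folded into the `occlusion` label by the examiners.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Candidate:
    # Hypothetical screening record; field names are illustrative only.
    history_of_respiratory_auditory_vocal_or_speech_disorder: bool
    smoker: bool
    native_bp_speaker_from_rio: bool
    occlusion: str              # assumed labels: "normal" or "angle_class_I"
    omes_e_grade: int           # OMES-E orofacial grade; 4 = normal
    grbas_g: int                # GRBAS general dysphonia degree; 0 = normal
    balanced_resonance: bool
    speech_disorder: bool
    open_bite: bool
    crossbite: bool             # anterior or posterior
    missing_or_supernumerary_teeth: bool
    cold_or_allergy_on_recording_day: bool
    could_perform_emissions: bool

def meets_inclusion(c):
    return (
        not c.history_of_respiratory_auditory_vocal_or_speech_disorder
        and not c.smoker
        and c.native_bp_speaker_from_rio
        and c.occlusion in {"normal", "angle_class_I"}
        and c.omes_e_grade == 4
        and c.grbas_g == 0
        and c.balanced_resonance
        and not c.speech_disorder
    )

def meets_exclusion(c):
    return (
        c.open_bite
        or c.crossbite
        or c.missing_or_supernumerary_teeth
        or c.cold_or_allergy_on_recording_day
        or not c.could_perform_emissions
    )

def is_eligible(c):
    # A candidate must satisfy every inclusion criterion and no exclusion factor.
    return meets_inclusion(c) and not meets_exclusion(c)

eligible = Candidate(
    history_of_respiratory_auditory_vocal_or_speech_disorder=False,
    smoker=False,
    native_bp_speaker_from_rio=True,
    occlusion="angle_class_I",
    omes_e_grade=4,
    grbas_g=0,
    balanced_resonance=True,
    speech_disorder=False,
    open_bite=False,
    crossbite=False,
    missing_or_supernumerary_teeth=False,
    cold_or_allergy_on_recording_day=False,
    could_perform_emissions=True,
)
print(is_eligible(eligible))                           # True
print(is_eligible(replace(eligible, crossbite=True)))  # False
```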

Recording of the speech signals and signal processing
Speech signals were recorded following a methodology tested in previous studies 3,5,21 .
To estimate the fundamental frequency and the formant frequencies, speech signals were obtained from: a) the carrier phrase "Fale____ para mim" ("Say____ to me"), filled in with the words "pápa", "pépe", "pêpe", "pípi", "pópo", "pôpo" and "púpu"; and b) sustained emission of each of the seven oral vowels of Brazilian Portuguese (BP) for three seconds. The participants read the instructions and performed the tasks at a comfortable pitch and loudness. Each speech task was repeated four times, and the two emissions with the best-defined formant tracks were selected.
The recordings took place in a silent room using Praat software, version 6.0.16 (P. Boersma and D. Weenink, University of Amsterdam, Netherlands; free, available at http://www.fon.hum.uva.nl/praat/), in mono, with a sampling rate of 22,050 Hz, in .wav format. An HP notebook (Hewlett-Packard, USA) running the Windows 10 operating system was used, together with a Shure SM58 microphone (Shure, USA). After clipping, each segment was saved as a .wav file. The signals were processed with a Praat script tested in previous studies 3,5,22 . For every participant, measurements were obtained from two samples of each vowel in carrier phrases (CP) and in sustained vowels (SV). Thus, 3,360 values were collected: three parameters (F0, F1 and F2) × seven vowels × two samples × 80 participants. All vowel segments were clipped by the same researcher.
The values obtained with the script were checked at three different stages to ensure that the measurements were correct: the first researcher checked the measurements manually, and another author re-estimated the frequencies with the script and also checked the values manually. When the automatic and manual estimates diverged, the manual measurements were kept. These procedures were adopted to avoid estimation errors, mainly in the posterior vowels, whose first two formant frequencies lie close together 12 .
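The total of 3,360 values follows directly from crossing the four factors listed in the text. The sketch below simply enumerates that design; the record layout and variable names are hypothetical, and "sample" is used exactly as the text uses it.

```python
from itertools import product

# Factors as stated in the text; vowel labels are IPA symbols.
PARAMETERS = ("F0", "F1", "F2")
VOWELS = ("a", "ɛ", "e", "i", "ɔ", "o", "u")
SAMPLES = (1, 2)
PARTICIPANTS = range(1, 81)

# One record per measured value: parameter x vowel x sample x participant.
records = [
    {"parameter": p, "vowel": v, "sample": s, "participant": pid}
    for p, v, s, pid in product(PARAMETERS, VOWELS, SAMPLES, PARTICIPANTS)
]
print(len(records))  # 3 * 7 * 2 * 80 = 3360
```

Storing the values in this long format (one row per measurement) makes it straightforward to select, for example, all F1 values of [u] when running the per-vowel comparisons.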
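The decision rule for reconciling automatic and manual estimates can be written as a small function. The text states only that manual values prevail when the two disagree; the tolerance that defines a "divergence" is a hypothetical parameter, and the example numbers are illustrative, not data from this study.

```python
def reconcile(automatic_hz, manual_hz, tolerance_hz=0.0):
    """Choose the value to keep for one F0, F1 or F2 measurement.

    Manual values prevail when the two estimates diverge by more than
    `tolerance_hz` (a hypothetical threshold). Returns (value_kept, diverged).
    """
    diverged = abs(automatic_hz - manual_hz) > tolerance_hz
    return (manual_hz if diverged else automatic_hz), diverged

# Illustrative case: an automatic estimate that may have mislabeled a formant
# in a posterior vowel, where F1 and F2 lie close together.
print(reconcile(1450.0, 880.0, tolerance_hz=50.0))  # (880.0, True)
print(reconcile(702.0, 700.0, tolerance_hz=50.0))   # (702.0, False)
```

Returning the divergence flag alongside the value keeps an audit trail of which measurements were corrected by hand.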

Statistical analysis
The statistical analysis was conducted with the Statistical Package for the Social Sciences for Windows (SPSS®, SPSS Inc., Chicago, Illinois); means and medians (measures of central tendency) and standard deviations (a measure of dispersion) were computed.
To verify the normality of the data distribution, the nonparametric Kolmogorov-Smirnov test was used, and the results were consistent with a normal distribution of the variables.
To compare the F0, F1 and F2 measurements between the forms of emission, the paired t test was used. The null hypothesis (the frequencies are equal in the two forms of emission) was rejected at a significance level of p ≤ 0.05 (5%). The alternative hypothesis was that the two forms of emission would differ.
The effect size (ES) was also calculated, as it is an important complement to statistical significance tests.
Its purpose was to quantify the magnitude of the differences between the tasks in the population studied: the larger its value, the larger the difference. ES values are considered small (0.20≤d<0.50), medium (0.50≤d<0.80) or large (d≥0.80) 22 .
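For readers who want to see what the normality check computes, the sketch below implements the one-sample Kolmogorov-Smirnov D statistic against a normal distribution using only the Python standard library. It is an illustration, not the SPSS procedure used in the study: it returns only D (no p-value), and when the mean and SD are estimated from the same data, as is usual, Lilliefors critical values rather than classical KS tables are the appropriate reference.

```python
import math
from statistics import mean, stdev

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic_normal(data, mu=None, sigma=None):
    """Largest distance between the empirical CDF of `data` and a normal CDF.

    If mu/sigma are omitted they are estimated from the data (in which case
    Lilliefors critical values apply). Only the D statistic is returned.
    """
    xs = sorted(data)
    n = len(xs)
    mu = mean(xs) if mu is None else mu
    sigma = stdev(xs) if sigma is None else sigma
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = normal_cdf(x, mu, sigma)
        # Compare the theoretical CDF with the empirical step just above and below x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

print(round(ks_statistic_normal([-1.0, 0.0, 1.0], mu=0.0, sigma=1.0), 4))  # 0.1747
```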
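The paired t test compares each participant's value in one task with his or her own value in the other, so it works on the per-participant differences. A minimal standard-library sketch follows; the CP and SV numbers are hypothetical, and the function returns only the t statistic and its degrees of freedom. A statistics package (for example `scipy.stats.ttest_rel`) would also supply the p-value from the t distribution with n - 1 degrees of freedom.

```python
import math
from statistics import mean, stdev

def paired_t(cp_values, sv_values):
    """Paired t statistic for the same participants measured in two tasks.

    t = mean(d) / (sd(d) / sqrt(n)), where d are the CP - SV differences.
    Returns (t, degrees_of_freedom).
    """
    if len(cp_values) != len(sv_values):
        raise ValueError("paired samples must have the same length")
    diffs = [cp - sv for cp, sv in zip(cp_values, sv_values)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Hypothetical F1 values (Hz) for five participants in the two tasks.
t, df = paired_t([711, 712, 713, 714, 715], [710, 710, 710, 710, 710])
print(round(t, 3), df)  # 4.243 4
```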
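The text does not state which variant of Cohen's d was used. For paired designs, one common choice is d_z, the mean of the paired differences divided by their standard deviation; other variants (for example, dividing by the average SD of the two conditions) give different values. The sketch below assumes d_z and applies the thresholds quoted above; the label for values below 0.20 is an assumption, since the text does not name that range.

```python
from statistics import mean, stdev

def cohens_dz(cp_values, sv_values):
    """Effect size for paired data: mean difference / SD of the differences."""
    diffs = [cp - sv for cp, sv in zip(cp_values, sv_values)]
    return mean(diffs) / stdev(diffs)

def classify_effect_size(d):
    """Thresholds as quoted in the text; the sign of d is ignored."""
    d = abs(d)
    if d >= 0.80:
        return "large"
    if d >= 0.50:
        return "medium"
    if d >= 0.20:
        return "small"
    return "below small"  # assumed label; not named in the text

# Hypothetical paired values (Hz) for five participants.
d = cohens_dz([711, 712, 713, 714, 715], [710, 710, 710, 710, 710])
print(round(d, 3), classify_effect_size(d))  # 1.897 large
```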

RESULTS
The means of F0 and of the first two formant frequencies differed between the two forms of emission (CP and SV) in several oral vowels of Brazilian Portuguese.
In men, F0 was higher in CP in two vowels and F1 was higher in CP in six vowels, whereas F2 was lower in CP in four vowels. In this group, the largest effect size was found for F1 of the vowel [i] (Table 1).
In women, CP emission showed lower F0 values in five vowels, higher F1 values in two vowels, and lower F2 values in three vowels (Table 2).

DISCUSSION
In this study, the means of the fundamental frequency and of the first two formant frequencies were compared between emissions in carrier phrases and sustained vowels in people without dysphonia or speech disorders. A literature search revealed few studies analyzing the differences between these two speech tasks across the vowels of Brazilian Portuguese in vocally healthy individuals, which limits comparison with the present results.

Fundamental frequency
When the F0 values were analyzed, higher values in CP were found in men only in the high vowels, anterior [i] and posterior [u], with small effect sizes. One hypothesis is that these results were favored by elevation of the hyoid-laryngeal complex during the coarticulation present in this kind of emission. That differences appeared only in the high vowels may relate to the correspondence between tongue height and F0 in Portuguese 13 . Two studies were found in the literature that compared F0 of the vowel [a] between sustained vowels and continuous speech. In one of them, male voices showed close mean F0 values in the two tasks, a tendency also observed for this vowel in the present study. However, the other study observed lower F0 in sustained [a] than in text reading in men 23 .
In women, F0 was higher in the sustained emissions in five vowels, with medium effect size in four of them, making F0 the parameter that differed most in this gender. As in men, the significant differences were symmetrical with respect to tongue height: the anterior mid-low [ɛ] and mid-high [e] vowels showed the same tendency as their posterior counterparts, mid-low [ɔ] and mid-high [o]. Likewise, F0 was higher in the sustained emission of [a], which, being central, has no counterpart on the anteroposterior axis 13 .
Because these differences appeared in most vowels in women, with lower values in the task closest to everyday communication, and because higher F0 values may reflect greater elevation of the hyoid-laryngeal complex and faster vocal fold vibration, these findings may have clinical implications depending on the speech task used in the assessment. Sustained vowels have historically been the most investigated form of emission and the one most used in the speech-language-hearing clinic 1 . Since the present findings showed higher F0 in SV in most vowels in women without dysphonia, this muscular adjustment may contribute to the higher degrees of dysphonia reported in other studies [15][16][17] , given that, physiologically, higher-pitched sounds require finer muscular control.
One hypothesis for the higher F0 values in SV in most vowels produced by women is that, because this type of emission is not part of an everyday communicative context 18 , the speaker is more likely to interfere with the emission 23 . Higher F0 in sustained [a] than in continuous speech was also observed in three age groups of women in a study of normal voices, although the differences were subtle 18 . Another study observed the same tendency of higher F0 in sustained vowels than in continuous speech in female voices 23 .
Another possible explanation for the symmetry between the anterior and posterior vowels that differed statistically is the correlation between the vertical position of the tongue constriction and fundamental frequency 13 . Hence, even across different emission tasks, F0 values followed similar tendencies according to vowel height in both forms of emission.
The differences found between the forms of emission are also supported by other studies 24,25 that associated changes in F0 with articulatory aspects and highlighted the interaction between the glottal source and the filter. Analyzing the relation between F0 control and vowel articulation, one study 24 reported that changes in fundamental frequency, in addition to originating in the intrinsic laryngeal tensor muscles, could also partly result from the activity of the geniohyoid and genioglossus muscles and from hyoid bone movements. The extrinsic tongue and laryngeal muscles directly and indirectly influence the position of the hyoid-laryngeal complex and the intralaryngeal configuration.

Formant frequencies
In men, the measurements that differed most between the tasks were the F1 frequencies, which were higher in CP in six vowels, with medium effect size in four of them (Figure 3). Women showed the same tendency, though only in the vowels [a] and [ɛ], with medium effect size (Figure 4). Through the acoustic-articulatory correspondence, it can be inferred that in the carrier phrases the tongue was lower, the mandible more open, and the pharynx narrower than in the sustained vowels 7 . One hypothesis for this difference between the two speech tasks is coarticulation in continuous speech, in which each segment influences the adjacent segments; that is, the analyzed vowel carries acoustic cues of the preceding consonant 7,12,26 . In women, the differences found only in the vowels (Figures 3 and 4). Based on these data, the acoustic-articulatory correspondence suggests that in CP the tongue constriction was more posterior and the pharynx narrower 7 than in the sustained vowels. One hypothesis for the reduction in F2 is a greater coarticulatory influence in continuous speech for these vowels, since the movements of the articulators for a given sound change according to the neighboring sounds 7,12,26 . The lower F2 values in CP in all posterior vowels in both genders may have been favored by the posterior tongue constriction inherent to their production.
In men, the higher F1 of the vowel [a] in the carrier phrases indicates lowering of the oromandibular complex, which probably contributed to a more posterior tongue constriction and, consequently, to the lower F2 of this vowel.
That some vowels did not differ statistically between the speech tasks is consistent with the literature, which highlights that, although different forms of emission may involve different muscle adjustments for the same vowel, the articulators may adapt so that differences remain small, particularly in the formant frequencies 14 .
The acoustic values presented in this paper are the means of the population studied, obtained with a methodology tested in other studies 3,5,21 , and are not intended as normative values.
The observed tendencies (higher F0 in some vowels in men and lower F0 in several vowels in women, as well as higher F1 and lower F2 in the posterior vowels in the carrier phrases) should be examined further. Therefore, more research with this scope is needed to expand knowledge of these differences in Brazilian Portuguese.
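The qualitative reasoning used in this section (higher F1 read as a lower tongue and more open mandible; lower F2 read as a more posterior tongue constriction) can be expressed as a small rule set. This is a hypothetical illustration of the stated acoustic-articulatory correspondences, not a diagnostic tool: in practice it would be applied only to differences that reached statistical significance, and the `min_change_hz` threshold and example values are assumptions.

```python
def interpret_formant_shift(f1_cp, f1_sv, f2_cp, f2_sv, min_change_hz=0.0):
    """Qualitative articulatory reading of CP-vs-SV formant differences.

    Rules follow the correspondences described in the text:
    higher F1 -> lower tongue / more open mandible;
    lower F2  -> more posterior tongue constriction.
    `min_change_hz` is a hypothetical threshold for ignoring tiny changes.
    """
    notes = []
    d1 = f1_cp - f1_sv
    d2 = f2_cp - f2_sv
    if d1 > min_change_hz:
        notes.append("lower tongue and more open mandible in CP")
    elif d1 < -min_change_hz:
        notes.append("higher tongue and less open mandible in CP")
    if d2 < -min_change_hz:
        notes.append("more posterior tongue constriction in CP")
    elif d2 > min_change_hz:
        notes.append("more anterior tongue constriction in CP")
    return notes

# Hypothetical means (Hz) for one vowel: F1 up and F2 down in CP.
print(interpret_formant_shift(750.0, 700.0, 1250.0, 1300.0))
```

This mirrors the argument made for [a] in men: a rise in F1 and a fall in F2 together point to a lowered oromandibular complex with a more posterior constriction.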

Limitations
Although this study sheds light on the differences that may be found in acoustic parameters depending on the speech task assessed in people without voice and speech disorders, some limitations must be acknowledged. First, only the two speech tasks most used in research and in the speech-language-hearing clinic were examined; other forms of emission were not considered. Second, although other researchers, depending on their objectives, may focus on measurements from spontaneous or semi-spontaneous speech with more complex excerpts, the carrier phrases most cited in the literature were used here, which may in the future allow comparison with other studies using the same corpus.

Contributions
The results of this research may help speech-language-hearing therapists who work in the clinic, in research, and on the improvement of speech and voice, as they reinforce the assumption that different emission tasks may produce distinct acoustic measurements. This highlights the importance of speech-language-hearing therapists considering more than one form of emission when assessing their clients, which may help guide their work according to the therapeutic objectives of each case. It should also be emphasized that no single measure in the speech-language-hearing clinic is enough to define the course of action; nonetheless, the full set of information helps the clinician make better decisions.

CONCLUSION
Fundamental frequency and F1 and F2 frequencies differed between the two speech tasks. Therefore, it is concluded that, in Brazilian Portuguese, the speech task used to assess fundamental frequency and formant frequencies may yield distinct results in both glottal and supraglottal measurements across the oral vowels of the language. Hence, it is suggested that clinicians and researchers consider both forms of emission for a more accurate interpretation of these data when assessing oral communication and guiding their therapeutic conduct.