Margins of tolerance and reference values for the formant vowels for use in voice therapy for the deaf in commercial computer

Accepted: June 23, 2015
Study carried out at the Programa de Pós-graduação em Engenharia Mecânica, Universidade de Taubaté – UNITAU, Taubaté (SP), Brazil.
1 Instituto Federal de São Paulo – IFSP, Bragança Paulista (SP), Brazil.
2 Universidade Estadual Paulista – UNESP, Guaratinguetá (SP), Brazil.
3 Universidade de Taubaté – UNITAU, Taubaté (SP), Brazil.
Financial support: nothing to declare.
Conflict of interests: nothing to declare.


INTRODUCTION
The deaf have difficulty speaking simply because they cannot hear, except for those who have problems in the vocal tract that prevent them from emitting sounds. Hearing individuals repeat the sounds they hear in order to build their oral repertoire. The ability to recall sounds, present in hearing individuals, is limited or absent in the deaf. This article addresses this issue on the premise that establishing reference frequencies for the first three formants of vowel sounds can assist in the vocal training of deaf adults who wish to improve or acquire oralization in Brazilian Portuguese.
Historically, the deaf have been surrounded by "ouvinista" (of those who can hear) ideology, power, and interests. The term was coined by Skliar (1) to note that the deaf have to follow all the concepts of hearing individuals (2).
The deaf can be fully literate in Portuguese and LIBRAS (Brazilian Sign Language) and have lip-reading ability, and still not be able to properly express their ideas and thoughts to hearing individuals who are not LIBRAS users because of the language barrier, in the same way that Brazilians who speak only their native language would have difficulty being understood by Russian speakers who also know only their own language. The question of orality for the deaf has pervaded the history of education since ancient times. In Greco-Roman antiquity, speech was directly associated with thought, to the point of considering that thought could not be developed without speech (3). The first initiatives that sought a means of education for the deaf had financial motivation and were limited to the children of wealthier families concerned with the passage of assets to their deaf offspring. Although not inclusive in nature, these initiatives flourished because they served as a basis for other educators, who reproduced the teaching methods after contact with deaf children of noble families who had been educated by these methods (4).
This historical reference to education for the deaf reflects the fact that deaf individuals have been the focus of attention because of their difficulty in communicating with a predominantly hearing society. In Brazil, there are approximately 2 million deaf citizens (5), classified as having "severe hearing loss" according to the Convention on the Rights of Persons with Disabilities ratified by the Brazilian government. This is a controversial aspect. The deaf are not disabled; they are simply different and need other means to communicate. In this sense, the history of education for the deaf in Brazil is intertwined with the history of Speech-language pathology (6). In general, the Speech-language therapist is the professional who has the first contact with deafness, along with the physician. The role of Speech-language pathology extends beyond the application of bimodal, oralist, or bilingual techniques. Speech-language therapists play a more comprehensive role in the contact with deafness, because their professional practice involves not only treatment; it makes an enormous difference in a person's life. Whichever approach is considered, the work should promote the global development of individuals regardless of the form of communication they use. Global development refers to linguistic, intellectual, social, and academic development, and especially to identity preservation (3). This paper aims to offer a small contribution to support Speech-language pathology professionals in their work with the adult deaf community.

PURPOSE
The objective of this study is to identify the margins of tolerance and reference values for the frequencies of the formants F1, F2, and F3, according to gender and age range, for the seven vowels of Brazilian Portuguese (/a/, /e/, /Ɛ/, /i/, /o/, /ᴐ/, /u/), which constitute the core of the syllables of the words of that language (7).
It is expected that these reference values will allow adult deaf individuals, supervised and guided by a Speech-language therapist, to train oralization through a computational resource that provides them with a graphic display of the reference values and the values of their own sound production, helping them to calibrate their voice as close to the displayed reference as possible. The use of a regular computer, whether a desktop or a laptop, connected to a simple microphone, will provide the Speech-language therapist with an affordable tool to support voice therapy. In this context, this study seeks to find the values that will compose the parameters used in the configuration of a system for the vocal training of deaf adults.
The reference values found will be used in the construction of a computer system according to the specifications of the "Interactive System to Aid the Deaf" patent application (8), which seeks to provide the Speech-language therapist with a tool to support the development of oral language therapy in individuals with severe hearing impairment. The system will include three-dimensional graphic representations of the frequencies produced by the vocalization of the deaf individual, collected and calculated in real time. The frequencies formed in the vocal tract will be displayed in the same scene, overlaid on the reference values obtained for the F1, F2, and F3 frequencies. The margin of tolerance will be represented by a semi-transparent, three-dimensional geometry. The whole system will be operated by a Speech-language therapist. Therefore, the deaf adult will count on visual support on the computer screen to match their speech production to the references obtained from individuals of the same age and gender.

METHODS
The process of voice production and all the dynamics associated with it, covering the physical, mechanical, and acoustic aspects of the whole vocal tract, is widely described in the literature on speech acoustics. The studies by Chiba and Kajiyama (9), two Japanese researchers, are considered the first milestone in the study of the acoustics of speech production. Their book "The Vowel", from 1941, consolidated their research and has been widely cited to date.
Another researcher who greatly contributed to speech production studies was Gunnar Fant (10,11), a Swedish scholar who devoted his life to research related to vocalic production since his graduation at the Department of Telegraphy and Telephony of the Royal Institute of Technology (12). In 1960, Gunnar Fant published a study (13) that had a significant impact on the evolution of vowel production studies. His experiments and calculations made it possible to clearly observe the relationship between the acoustic parameters of vowel production and the artificial signals produced by a model.
Authors such as Stevens (14), O'Shaughnessy and Deng (15), and Flanagan (16) address not only the physiological aspects, but also the physics involved in the speech production process. These authors demonstrate, mathematically, how the frequencies observed during speech can be calculated. They address variables such as viscosity and thickness of the vocal tract and the section considered to obtain the frequency.
One of the aspects of this process is the identification of the resonance frequencies that characterize a particular sound produced. These specific characteristics are obtained from the mathematical analysis of the transfer function that describes the passage of the acoustic wave along the vocal tract, from its production in the vocal folds to its exit through the lips. Elements such as air density and velocity, vocal tract length, and the cross-sectional area where the wave is analyzed compose the modeling of this system, resulting in the calculation of the sound pressure produced (11,14,15).
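As a rough illustration of how such modeling yields resonance frequencies, the simplest textbook idealization treats the vocal tract as a uniform, lossless tube closed at the glottis and open at the lips, whose resonances fall at odd multiples of c/(4L). The Python sketch below is not taken from the cited works; the function name, the tube length of 17.5 cm, and the speed of sound of 350 m/s are illustrative assumptions.

```python
def tube_resonances(length_m, speed_of_sound=350.0, count=3):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    Quarter-wavelength idealization: F_n = (2n - 1) * c / (4 * L).
    The default speed of sound is an assumed value for warm, humid air.
    """
    return [(2 * n - 1) * speed_of_sound / (4.0 * length_m)
            for n in range(1, count + 1)]


# Assumed vocal tract length of 17.5 cm (illustrative, not from the study).
for n, f in enumerate(tube_resonances(0.175), start=1):
    print(f"F{n} = {f:.0f} Hz")
```

With these assumed values the first three resonances come out near 500, 1500, and 2500 Hz, the pattern of a neutral vowel. Real vowels deviate from it because the tongue, palate, and lips make the tract non-uniform, which is what the more complete models cited above account for.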
Several techniques are used to treat the voice signal. In general, these techniques are based on small anatomic variations of the vocal tract during phonation. The sound wave produced by the vibration of the vocal folds resonates throughout the vocal tract, finding greater or lesser constriction to its passage depending on the configuration of the system components (tongue, soft palate, hard palate, etc.). One of the techniques used for modeling the speech production process is called Linear Predictive Coding (LPC), which the United States Department of Defense standardized in 1984, initially as a model for human speech coding (17).
The most important aspect of LPC is the linear prediction filter, which allows the value of the next sample to be determined by a linear combination of the previous samples (17).
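In symbols, and using notation that is assumed here rather than taken from reference (17), a predictor of order p with coefficients α_k estimates the sample s[n] from the p previous samples, and the coefficients are chosen to minimize the energy of the prediction error e[n]:

```latex
\hat{s}[n] = \sum_{k=1}^{p} \alpha_k \, s[n-k],
\qquad
e[n] = s[n] - \hat{s}[n]
```

The formants are then read from the roots of the polynomial 1 - Σ α_k z^(-k): each complex-conjugate root pair lying close to the unit circle corresponds to one resonance of the vocal tract, with the root angle giving the frequency and the root radius giving the bandwidth.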
This and other technical approaches have been used in the computational process for extracting formants from the collected voice signals, as described below. The recording of voice signals was performed on a laptop computer connected to a microphone. The sampling rate was 11025 Hz. The study sample comprised 53 individuals: 40 men (≈75%) aged 17 to 59 years, 12 women (≈23%) aged 17 to 55 years, and one 6-year-old female child (≈2%). None of the participants presented speech problems. The participants were volunteers who signed an Informed Consent Form (ICF) approved by the Research Ethics Committee (REC) of "Universidade de Taubaté" under protocol no. 985459.
The volunteers were given an explanation of the project content and the importance of their contribution to research aimed at human development and assistive technology therapies (18). A demonstration of the sound recording procedure was conducted so that the volunteers became familiar with the process. The procedure consists of breathing lightly, taking in sufficient air so that the sound is not produced forcibly, which would compromise the intended natural phonation of the vowel. After that, the individual brings the microphone to a distance of 2 cm from the mouth and emits the sound of the requested vowel for approximately 1 s. The graphical interface of the software then displays a green quadrangle at the top right of the screen, a graph of the first three formants, and the message "Recording Performed Successfully", as shown in Figure 1.
The data were stored in one-dimensional vector format, ready for use by MATLAB scripts. Thus, the formant extraction algorithm could be applied to each sample (individual and vowel) following three main sub-routines: signal filtering (Hamming window, pre-emphasis, and high-pass filter); formant extraction (8-coefficient LPC); and track selection, which keeps tracks with formant frequency > 90 Hz and passband < 400 Hz.
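The three sub-routines can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' MATLAB scripts: the function names are hypothetical, the LPC coefficients are obtained with the autocorrelation method, the pre-emphasis coefficient of 0.97 is an assumed value, the separate high-pass filter is omitted because its design is not given, and "passband" is read as the bandwidth of each LPC pole.

```python
import numpy as np


def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC. Returns [1, a1, ..., ap] of the
    prediction-error filter A(z) = 1 + sum_k a_k z^-k."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    # Toeplitz normal equations: R a = -r[1..order].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1:])
    return np.concatenate(([1.0], a))


def formants_from_lpc(a, fs, min_freq=90.0, max_bandwidth=400.0):
    """Track selection: convert LPC roots to frequencies and keep those
    above min_freq whose bandwidth is below max_bandwidth (both in Hz)."""
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]  # one root per conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)
    bandwidths = -np.log(np.abs(roots)) * fs / np.pi
    keep = (freqs > min_freq) & (bandwidths < max_bandwidth)
    return np.sort(freqs[keep])


def estimate_formants(signal, fs, order=8, pre_emphasis=0.97):
    """Signal filtering followed by LPC and track selection.
    pre_emphasis=0.97 is an assumed value, not taken from the study."""
    x = np.asarray(signal, dtype=float)
    x = x * np.hamming(len(x))  # Hamming window
    # First-order pre-emphasis, which boosts the higher frequencies.
    x = np.concatenate(([x[0]], x[1:] - pre_emphasis * x[:-1]))
    return formants_from_lpc(lpc_coefficients(x, order), fs)
```

For a recording held in a one-dimensional array `samples` (a hypothetical name) at the sampling rate reported above, `estimate_formants(samples, 11025)` returns the candidate formant frequencies in ascending order, of which the first three would be taken as F1, F2, and F3.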
Of the 53 participants, the most significant group was composed of male adults aged between 19 and 59 years; therefore, values from the subsample (stratum) of 38 male individuals were considered for this study. Among the 38 samples, all from individuals without speech problems, 22 are from São Paulo state, 12 from Minas Gerais state, and the others from other Brazilian states. Age distribution in years was as follows: two with 19, four with 20, four with 21, three with 22, four with 23, two with 25, two with 26, one with 27, three with 29, one with 34, two with 38, two with 39, one with 50, one with 51, two with 52, one with 57, one with 58, and two with 59.
Regarding data analysis, some utility programs were created to optimize and accelerate the extraction of formants from the whole sample and to guarantee the integrity of the data, that is, to ensure that the extraction of formants from each sample followed the same process. Individual data were stored in a repository in text format to be imported into an Excel spreadsheet to aid in the assembly of statistical graphs and calculations.
Data analysis was performed using statistical concepts (standard deviation, median, and coefficient of variation) (19). Each formant and phoneme received individualized treatment for data collection. To illustrate the method used to obtain the reference values, we took as an example the data of the first formant (F1) of the phoneme /a/ from the indicated sample, as shown in Figure 2.
The number of classes (block groups) was defined according to Sturges' rule (20), and the interval between classes was based on the number of samples used (20). The value for the margin of tolerance was calculated using the coefficient of variation found. The value of the median (reference center) was used for graphical representation of the frequencies of each formant. The margin of tolerance was defined by subtracting from and adding to the median the amount resulting from the application of the coefficient of variation to the median value. These values enabled the drawing of a three-dimensional figure around the central point, as shown below.
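The calculation described above can be sketched in a few lines of Python. This is an illustration rather than the spreadsheet actually used: the function names are hypothetical, the sample standard deviation is assumed, the coefficient of variation is taken as standard deviation divided by mean, Sturges' rule is rounded up to the next integer (the rounding used in the study is not stated), and the frequencies in the usage note are made-up values, not data from Table 1.

```python
import math
import statistics


def sturges_classes(n):
    """Number of histogram classes by Sturges' rule, k = 1 + log2(n),
    rounded up (the rounding direction is an assumption)."""
    return math.ceil(1 + math.log2(n))


def reference_and_margin(frequencies):
    """Return (reference, margin, (lower, upper)) for one formant of one vowel.

    reference = median of the sample
    margin    = coefficient of variation * median
    """
    median = statistics.median(frequencies)
    cv = statistics.stdev(frequencies) / statistics.mean(frequencies)
    margin = cv * median
    return median, margin, (median - margin, median + margin)
```

For the 38 samples of this study, `sturges_classes(38)` evaluates 1 + log2(38) ≈ 6.25 and rounds it up to 7 classes. Calling `reference_and_margin([700, 720, 740, 760, 780])` on these made-up F1 values gives a reference of 740 Hz and a margin of about 31.6 Hz.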
Finally, in this study, the algorithm was structured in a single program to ensure the integrity of the extracted data.

RESULTS
The data obtained with the method described, for the selected samples, are consolidated in Table 1, according to each phoneme. The sample median was used rather than the mean because outlying frequencies (in Hz) in the sample could affect the result. The differences are small, but the purpose of this study is to find the most accurate possible approximation of the reference values of the vowel formants.
To corroborate the results of this research, we adopted frequencies of formants derived from the production of vowels of Brazilian Portuguese from previous studies.
In her study, Oliveira presented a table (Table 4 on page 112; see references) (21) with the mean values of the formants F1, F2, and F3 for the phonation of phoneme /a/ by two male individuals without vocal problems aged 18 and 21 years, respectively. The frequency values extracted from that table were obtained when the pronounced word, according to the research method cited, contained the phoneme /a/ in its stressed syllable, as follows: F1=769 Hz and F2=1325 Hz for the first individual; F1=845 Hz and F2=1371 Hz for the second individual.
A classic study by Behlau (22) presents a table of values for F1 and F2 of the stressed vowels for men, women, and children.The frequency values for men for the phoneme /a/ are F1=807 Hz and F2=1440 Hz.
A more recent study, from 2013, shows a comparison of the production of vowels among children, men, and women. In this article, the frequency values for men for the phoneme /a/ are F1=620 Hz and F2=1478 Hz (23).
Table 2 shows a comparison between the frequency values obtained in this research and those of the other three studies.
As can be observed, the differences between values are within the margin of tolerance found for formants F1 (171.983 Hz) and F2 (410.408 Hz).

Graphic presentation of results
In a comparative graphic approach between F1, F2, and F3 of the seven vowels, the median of the sample frequencies was taken as the central point and surrounded by a parallelepiped that represents the margin of tolerance found. In this representation, visual distinction between the vowels can be observed, which was one of the secondary objectives of this study, and visual feedback on vowel pronunciation can be provided, as shown in several forms in Figure 3.
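The feedback implied by this representation amounts to checking whether a measured (F1, F2, F3) point falls inside the parallelepiped of a vowel. A minimal Python sketch of that check follows; the function name and all numeric values are hypothetical placeholders, not values from Table 1.

```python
def within_tolerance(measured, reference, margin):
    """True when every measured formant lies inside reference +/- margin.

    Each argument is an (F1, F2, F3) tuple in Hz; the box is the
    axis-aligned parallelepiped centered on the reference values.
    """
    return all(abs(m - r) <= t for m, r, t in zip(measured, reference, margin))


# Hypothetical reference, margins, and measurements (Hz), for illustration only.
reference = (700.0, 1300.0, 2400.0)
margin = (100.0, 150.0, 200.0)
print(within_tolerance((750.0, 1400.0, 2500.0), reference, margin))  # inside the box
print(within_tolerance((850.0, 1400.0, 2500.0), reference, margin))  # F1 outside
```

Treating the three formants independently keeps the test simple and matches a box-shaped margin; a system that wanted a graded score instead of a yes/no answer could report the per-formant distances to the reference.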
The graphical display in three different views reinforces research in which the pattern of formants characterizes the vowels according to the vocal tract configuration (24). The predominance of F1 and F2 in vowel characterization is well known, but the literature reports the first three formants as the reference for vowel identification (25).

DISCUSSION
By plotting all formants and their margins of tolerance together in two-dimensional and three-dimensional views, it was possible to observe that these frequency references can serve as a basis for vocal training, considering that they show the distinction between the vowels of Brazilian Portuguese through the first three formants.
The lighter points (in green) represent the pronunciation of the other phonemes as well as that of the /a/ phoneme performed by a hypothetical individual under therapy, as shown in Figure 4. The result of this experiment shows that it is possible to use the obtained references as a basis for characterizing each of the seven vowels of Brazilian Portuguese, providing Speech-language pathology professionals and researchers with an extra support tool.
It is worth noting that two pairs of phonemes (/Ɛ/ and /ᴐ/; /o/ and /u/) showed an overlap of values, as presented in Figure 5.
As the proposal is to offer supporting tools for professionals working in this area, this overlapping of frequencies should be considered on an individual basis at the time the references are used and within the context of the approach adopted for their therapeutic utilization.
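Such an overlap can be detected directly from the reference values and margins: two tolerance boxes intersect only if their intervals intersect on every one of the three formant axes. The Python sketch below illustrates the check; the function name and the numbers are hypothetical and do not reproduce the values behind Figure 5.

```python
def boxes_overlap(ref_a, margin_a, ref_b, margin_b):
    """True when two tolerance boxes share some (F1, F2, F3) region.

    On each axis, the intervals ref +/- margin intersect when the distance
    between the centers does not exceed the sum of the two margins.
    """
    return all(
        abs(a - b) <= ma + mb
        for a, ma, b, mb in zip(ref_a, margin_a, ref_b, margin_b)
    )


# Hypothetical vowels (Hz): close on all three axes, so the boxes overlap.
vowel_x = ((400.0, 900.0, 2400.0), (60.0, 120.0, 200.0))
vowel_y = ((450.0, 1000.0, 2500.0), (60.0, 120.0, 200.0))
# Hypothetical vowel far away on the F2 axis, so the boxes are disjoint.
vowel_z = ((420.0, 2100.0, 2500.0), (60.0, 120.0, 200.0))

print(boxes_overlap(*vowel_x, *vowel_y))
print(boxes_overlap(*vowel_x, *vowel_z))
```

A therapist-facing tool could run this check over all vowel pairs to flag in advance which targets cannot be told apart by the boxes alone.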

CONCLUSION
This study presented a set of reference frequencies that characterize the seven vowels of Brazilian Portuguese. These reference values can be used to support device calibration and also for the production of new assistive technologies directed to voice therapy.
It can be clearly observed that the proposed three-dimensional view presents visual distinction between the vowels. When they are displayed in the same scene, it is possible to observe the gap between them. The reference values place them in different spatial regions. The margins of tolerance found provide the individual under voice therapy with a reference for phonation within frequencies that make the sound intelligible and distinguishable by a hearing individual. Repetitive vocal training will assist the deaf in positioning the vocal tract properly when they have to pronounce one of the vowels.

Figure 1. Graphical interface of the software program for sample collection

Figure 3. Comparative overview of the reference values and margins of tolerance of the frequencies of the first three formants of the seven vowels, extracted from the voice recordings of 38 male adults without vocal problems. (A) shows the comparison between F1 (horizontal) and F2 (vertical), (B) shows the frequencies of F3, and (C) shows the three-dimensional view of the three formants: F1 (horizontal axis to the right), F2 (horizontal axis to the left), and F3 (vertical axis)

Figure 5. Different views of the reference values of F1, F2, and F3 frequencies extracted from the voice recordings of 38 male adults without vocal problems. (A) shows the overlapping of vowels /Ɛ/ and /ᴐ/. (B) shows the overlapping of vowels /o/ and /u/

Table 1. Consolidation of margin of tolerance and reference values of the frequency of formants extracted from the voice recordings of 38 male adults without vocal problems

Table 2. Comparison of the frequencies of formants F1 and F2 between the values found in this study and the values obtained in three other studies