Updating sentence lists for the assessment of speech perception

ABSTRACT Purpose: To adapt a list of sentences for a speech intelligibility test. Methods: A speech material database consisting of 200 phonetically balanced sentences was analyzed and partially updated. In the first stage, 60 reviewers, specialists in linguistics and speech and hearing science, were invited to analyze the sentences in relation to the parameters of familiarity, meaning, and predictability using an online questionnaire. Cronbach's alpha coefficient was used to analyze the internal consistency of the questionnaire. In the second stage, reviewers analyzed whether the sentences met the criteria indicated in the literature for sentence construction. Results: In the first stage, the responses of the 15 reviewers who completed the entire questionnaire were analyzed. Agreement between reviewers was high for all criteria. Seventy-one sentences were recommended for modification in the first stage, with predictability being the parameter most often indicated as requiring change. In the second stage, 28 more sentences were selected for adjustment, with the presence of a proper name in the sentence being the most frequently cited criterion. Conclusion: It was possible to adapt a list of sentences in order to provide speech-language therapists with a free-of-charge speech perception protocol. It is hoped that this new test can assist in standardizing assessment for normal-hearing adults and individuals with hearing loss in Brazilian Portuguese.


INTRODUCTION
The ability to understand speech is fundamental for social integration, since it is the first step that enables communication between individuals. It is considered the most important aspect to be evaluated in auditory functioning, as it generates data showing how individuals listen and understand during their daily routine (1).
Collaboration between Speech-Language Therapy and Linguistics has led to techniques for the early diagnosis of language disorders and to protocols for assessing speech comprehension. The field of speech processing in engineering is constantly evolving, contributing technological resources that improve, standardize, and automate the tests used in audiology (2). This interdisciplinary relationship has brought significant advances in the assessment of speech perception (3).
The literature describes speech materials developed to evaluate the detection and discrimination of sounds and words (4,5), as well as the ability to recognize monosyllables, words, and sentences (6)(7)(8).
To assess real-life communication, the literature recommends presenting sentences in competing noise (8)(9)(10)(11). These sentences can be used both with normal-hearing listeners (12) and with individuals with hearing loss (6). However, their most appropriate application is in evaluating and monitoring candidates for, and users of, hearing aids (HA) and cochlear implants (CI) (13,14).
An analysis of the national literature showed that few speech perception tests are currently available for assessing adults in clinical practice. In this context, a well-known test is the List of Sentences in Portuguese (6), which is available on compact disc and requires the speech-language therapist to adjust the speech and noise levels. Several studies have used this material and recognize its wide applicability (13,15,16).
The Sentence Lists of the Audiological Research Center (Centro de Pesquisas Audiológicas - CPA) are another test for assessing speech recognition; they use phonetically balanced sentences and are widely used in most CI services. However, in most services they are presented with live voice, and there is no uniformity in their application (7).
The Hearing in Noise Test (HINT) is widely used, especially in research (8). Its advantage is international recognition, while the high cost of its license for clinical use is a disadvantage (17).
To create a new speech recognition test for use in speech-language therapy, we analyzed speech banks of Brazilian Portuguese (BP). Speech banks are files containing a large number of sentences that make it possible to capture variations and changes in a speech community (18,19).
Alcaim (12) developed a well-known speech bank in Brazil, which was later adapted by Seara (20). This bank consists of phonetically balanced sentences containing 35 segments (phonemes and their variants) of BP, with a total of 20,178 sound occurrences across the sentences.
The research carried out by Seara had academic purposes and stood out for its rigor in phonetic balance, which was more complete than in the original proposal (12). However, given the clinical purposes of this study and the need to adjust the vocabulary, meaning, and predictability of the speech material, the sentences had to be updated.
To provide a test that assesses speech recognition and facilitates the work of speech-language therapists in clinical practice, the general objective of this study was to adapt a list of sentences to assess speech recognition in adults.

METHODS
The present study was observational, cross-sectional, and analytical, and took place from June to August 2019. It was approved by the Research Ethics Committee of the Federal University of Santa Catarina (UFSC) under approval number 1,997,931 (CAAE 56838816.7.00000.0121).
In the present paper, we sought to adapt lists of sentences to assess speech recognition, based on a speech bank of phonetically balanced sentences authored by Alcaim (12) and adapted by Seara (20). Seara's speech bank (20) is composed of 200 sentences distributed equally into 20 lists of 10 sentences each. These sentences are widely used in engineering as a speech bank for speech intelligibility tests with normal-hearing listeners.
This study was carried out in three stages: analysis by reviewers through an online questionnaire, updating of sentences according to construction criteria based on the literature, and a pilot study with normally hearing individuals.

1st stage - Analysis by the reviewers through an online questionnaire
At this stage, we built an online questionnaire for the reviewers to analyze Seara's sentences (20) .
The speech bank sentences were transferred to an online platform (SurveyMonkey), aiming to facilitate the access and participation of reviewers from various regions of the country.
We invited two groups of reviewers, linguists and speech-language therapists with experience in auditory rehabilitation, to carry out the sentence analysis. Recruited linguists needed to have at least a master's degree, and speech-language therapists needed at least three years of experience. These professionals were recommended by professors and professionals with experience in phonetics, phonology, and auditory rehabilitation. Before the questionnaire was sent, the reviewers were contacted by email or social media.
The group of linguists (GL) was divided into two subgroups (GL1 and GL2), and the group of speech-language therapists (GR) was divided into GR1 and GR2.
Due to the large number of sentences, we divided 187 sentences among four subgroups to facilitate analysis, allocating 47 sentences to each of subgroups GL1, GL2, and GR2 and 46 to subgroup GR1. In addition to these distinct sentences, each subgroup also analyzed 13 randomly chosen common sentences, which were used to evaluate agreement across all subgroups.
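The allocation described above can be sketched in Python. This is a minimal, hypothetical illustration: the text says the 13 common sentences were chosen at random but does not say how the remaining 187 were assigned, so the random shuffle, the seed, and the function name `allocate_sentences` are assumptions made only to show how the counts fit together (47 + 47 + 46 + 47 + 13 = 200).

```python
import random


def allocate_sentences(sentence_ids, n_common=13, sizes=None, seed=2019):
    """Hypothetical sketch: pick common sentences, then split the rest into subgroups.

    `sizes` maps each subgroup to its number of distinct sentences; the default
    follows the counts reported in the text (47 + 47 + 46 + 47 = 187).
    """
    if sizes is None:
        sizes = {"GL1": 47, "GL2": 47, "GR1": 46, "GR2": 47}
    if n_common + sum(sizes.values()) != len(sentence_ids):
        raise ValueError("subgroup sizes do not add up to the number of sentences")
    rng = random.Random(seed)  # assumed fixed seed, so the allocation is reproducible
    shuffled = list(sentence_ids)
    rng.shuffle(shuffled)
    common = shuffled[:n_common]  # rated by every subgroup (agreement check)
    remaining = shuffled[n_common:]
    allocation, start = {}, 0
    for group, size in sizes.items():
        # each subgroup rates its own distinct sentences plus the common ones
        allocation[group] = remaining[start:start + size] + common
        start += size
    return common, allocation


common, allocation = allocate_sentences(range(1, 201))
for group, items in allocation.items():
    print(group, len(items))
```

With the default sizes, each subgroup receives 60 sentences to rate (59 for GR1), of which 13 are shared by all subgroups.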
Prior to the sentence analysis, reviewers answered questions about their undergraduate and graduate education, their institutional affiliation, whether they had experience in acoustic analysis, and their field of study. Therapists were also asked about the year they graduated in Speech-Language Pathology, how long they had worked in speech-language therapy, and whether they applied speech recognition tests.
For the sentence analysis, each reviewer rated every sentence on three criteria: familiarity, sentence meaning, and predictability. Ratings were collected on continuous line scales, implemented in SurveyMonkey as a slider bar ranging from zero (0) to one hundred (100).
Familiarity was rated according to how common or well known each sentence was to the reviewer: 0 represented a sentence that was uncommon or unknown to the reviewer, and 100 a sentence that was very common or well known.
Sentence meaning was rated according to the meaning conveyed by the sentence: 0 represented an absence of meaning and 100 full meaning.
Predictability was rated according to how strongly the first few words of a sentence led the reader to expect a particular continuation. A score of 0 represented no expectation about what could complete the sentence, while 100 indicated that the end could be predicted by reading only the beginning of the sentence, that is, high predictability.
At the end of the first stage, the data were analyzed descriptively, and the internal consistency of the reviewers' answers to the questionnaire was verified with Cronbach's alpha, calculated in SPSS. Cronbach's alpha measures the relationship between the answers to the items of a questionnaire. It reflects the degree of covariance between the items on a scale: the smaller the sum of the item variances relative to the variance of the total score, the more consistent the instrument. Values greater than or equal to 0.70 indicate adequate internal consistency (21).
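For readers who want to reproduce the coefficient outside SPSS, the standard formula is alpha = k/(k - 1) * (1 - sum(var_i)/var_T), where k is the number of items, var_i the variance of item i, and var_T the variance of the total scores. The Python sketch below implements that formula. Treating each sentence as a case and each reviewer's 0-100 rating as an item is an assumption about how the data were arranged, and the ratings in the example are invented for illustration.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a cases-by-items matrix of ratings.

    scores: list of rows; each row is one case (assumed here: one sentence)
    and each column one item (assumed here: one reviewer's 0-100 rating).
    Sample variances are used throughout; using population variances instead
    gives the same alpha, because only their ratio matters.
    """
    if len(scores) < 2 or len(scores[0]) < 2:
        raise ValueError("need at least two cases and two items")
    k = len(scores[0])
    columns = list(zip(*scores))                           # one tuple per item
    sum_item_var = sum(variance(col) for col in columns)   # sum of item variances
    total_var = variance([sum(row) for row in scores])     # variance of total scores
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return (k / (k - 1)) * (1 - sum_item_var / total_var)


# Invented ratings: 4 sentences (rows) rated by 3 reviewers (columns)
ratings = [[2, 3, 3], [4, 4, 5], [3, 3, 4], [5, 5, 5]]
print(round(cronbach_alpha(ratings), 3))
```

A result at or above 0.70 would be read as adequate internal consistency under the criterion cited above.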

2nd stage - Updating the sentences according to construction criteria based on the literature
In the second stage, three reviewers who had not participated in the first analysis updated the sentences flagged by the first-stage reviewers, making the recommended modifications for the criteria of familiarity, sentence meaning, and predictability. Two of the second-stage reviewers held doctorates in linguistics and one a doctorate in auditory rehabilitation.
The second-stage reviewers also verified whether the sentences followed the construction parameters described in the literature (6,22,23): no proper names; affirmative sentences, either simple or compound; three to seven phonological words per sentence; and a low level of abstraction. In this stage, the modifications also prioritized the semantic aspect over the phonetic one, in line with the study objectives.
We defined a phonological (or prosodic) word as a unit carrying only one primary (tonic) stress. This criterion was used to facilitate counting correct answers when applying the speech recognition test.
After these adjustments, we organized the sentences into 10 lists of 20 sentences each to facilitate the clinical assessment of speech recognition. For each list, we also checked the number of sentences modified by the reviewers versus those kept from Seara's original work, in order to distribute them as evenly as possible across lists and thus preserve the phonetic balance of each list as much as possible.
We also adjusted the number of phonological words per list to obtain 100 phonological words in each of the 10 lists. To do so, we reviewed the lists and adjusted only sentences that had already been flagged for modification by the first-stage reviewers, so the original sentences were not altered.
At the end of the review, each list contained 100 phonological words, each worth 1%. The number of phonological words the patient misses in a given list is therefore subtracted from the total score (100%), and the result is expressed as a percentage.
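The list-level constraints described in this stage (20 sentences per list, three to seven phonological words per sentence, 100 phonological words per list) can be checked mechanically once each sentence has been annotated with its phonological word count. The sketch below is a hypothetical helper, not a tool used by the authors: the `(text, count)` annotation format and the function name `check_list` are assumptions, and the counts themselves would still come from a linguist's judgment of where the primary stresses fall.

```python
def check_list(sentences, target_words=100, min_words=3, max_words=7, n_sentences=20):
    """Hypothetical checker for one sentence list.

    sentences: list of (text, phonological_word_count) pairs, with counts
    assumed to be annotated by hand. Returns a list of human-readable
    problems; an empty list means the list meets all constraints.
    """
    problems = []
    if len(sentences) != n_sentences:
        problems.append(f"expected {n_sentences} sentences, found {len(sentences)}")
    for text, count in sentences:
        if not min_words <= count <= max_words:
            problems.append(f"{text!r}: {count} phonological words")
    total = sum(count for _, count in sentences)
    if total != target_words:
        # positive gap: words to add; negative gap: words to remove
        problems.append(f"total is {total}; adjust by {target_words - total:+d} words")
    return problems


# Placeholder sentences standing in for annotated list items
print(check_list([(f"sentence {i}", 5) for i in range(20)]))
```

Because only sentences already flagged in the first stage were edited, any gap reported by such a check would have to be closed within those sentences.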
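The scoring rule above amounts to subtracting one percentage point per missed phonological word. A minimal sketch, with the function name `list_score` chosen only for illustration:

```python
def list_score(errors, words_in_list=100):
    """Percentage score for one list: each phonological word is worth 100/words_in_list %."""
    if not 0 <= errors <= words_in_list:
        raise ValueError("errors must be between 0 and the number of words in the list")
    return 100 * (words_in_list - errors) / words_in_list


print(list_score(9))  # nine phonological-word errors in a 100-word list -> 91.0
```

Keeping `words_in_list` as a parameter also covers lists of other lengths, such as the 50-word lists of other tests, where each word would be worth 2%.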

3rd stage - Pilot study with normal-hearing subjects
To assess the clinical applicability of the sentence lists, we carried out a pilot study with three young (mean age 23.33 years), normal-hearing individuals without attention and/or memory impairments, in the Laboratory of Vibration and Acoustics of the Mechanical Engineering Department at UFSC. During the application of the lists, each individual remained in an acoustic booth wearing Sennheiser HDA200 circumaural headphones, and we used an Interacoustics AC 40 audiometer. The 200 sentences were presented with live voice at a fixed intensity of 50 dB HL bilaterally, monitored with the VU meter. The same evaluator applied all the lists to each individual on the same day, with breaks to avoid fatigue.
Data analysis from the second and third stages was descriptive.

RESULTS
In the first stage, the online questionnaire was sent to 60 reviewers: 37 linguists and 23 therapists. In the linguist group (GL), 26 reviewers accessed the questionnaire, but only seven (18.91%) analyzed all of its items. In the therapist group (GR), 21 reviewers accessed the questionnaire, but only eight (34.78%) analyzed all of its items. The final sample therefore comprised the complete evaluations of 15 reviewers: seven linguists and eight therapists.
Regarding the undergraduate training of the seven GL participants, most were speech-language therapists (42.85%), followed by graduates in Portuguese-Spanish (28.57%), English (14.28%), and Electrical Engineering (14.28%). Most of these participants had a postgraduate degree in Linguistics (42.85%).
In the GR, all eight participants (100%) were speech-language therapists, with an average of 19.5 years of study and 17.5 years of experience in auditory rehabilitation. Regarding experience with speech perception tests, six (75%) reported that they applied speech recognition tests.
In the questionnaire, six GL reviewers (85.70%) reported previous experience in acoustic analysis, and one (14.30%) did not answer the question. Table 1 shows the descriptive statistics of the ratings of the 13 common sentences analyzed by the GL and GR groups.
The predictability criterion runs in the opposite direction to familiarity and sentence meaning in terms of what is desirable: for predictability, a score of zero represents low predictability, whereas for familiarity and sentence meaning the maximum score of 100 represents significant familiarity or a fully meaningful sentence.
Table 1 shows that the two groups responded similarly for the familiarity and sentence meaning criteria, and that the therapist group rated the sentences as more predictable than the linguists did. For the 13 common sentences, Cronbach's alpha showed high reviewer agreement both for predictability (0.986) and for familiarity and sentence meaning (0.960).
For the 187 sentences evaluated by the subgroups, Cronbach's alpha for the predictability criterion was 0.99 for the linguist group and 0.97 for the therapist group; for the familiarity and sentence meaning parameters, it was 0.93 for the linguist group and 0.92 for the therapist group.
We analyzed the subgroups' answers to the online questionnaire separately. Table 2 shows the descriptive statistics of the ratings, for each criterion, of the different sentences (46 or 47) considered by each subgroup.
Table 2 shows that the GL1 subgroup gave higher familiarity scores than the other subgroups, whose averages were similar. GL1 also considered the sentences less predictable, whereas GR2 considered them more predictable. Subgroup answers diverged more for predictability than for familiarity and sentence meaning.
Based on these analyses, we decided to modify those of the 187 different sentences whose ratings were below 69.97% for familiarity (mean 83.33 - SD 13.36), below 74.60% for sentence meaning (mean 86.73 - SD 12.13), or at or above 73.97% for predictability (mean 52.58 + SD 21.39). The thresholds for the 13 sentences analyzed by all subgroups were calculated separately: we chose to modify those rated below 74.77% for familiarity (mean 85.24 - SD 10.47), below 83.37% for sentence meaning (mean 90.20 - SD 6.83), or at or above 76.12% for predictability (mean 56.76 + SD 19.35).
Figures 1 and 2 show the number of sentences that required modification under these thresholds, by predictability, familiarity, sentence meaning, and combined criteria; Figure 1 refers to the different sentences and Figure 2 to the common sentences. Figure 1 also breaks these counts down by reviewer subgroup, showing that GR2 flagged the most sentences based on predictability, while GR1 flagged the most based on familiarity and sentence meaning.
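The modification thresholds above are one standard deviation below the mean for familiarity and sentence meaning, and one standard deviation above the mean for predictability. The sketch below reproduces the cutoffs for the different sentences from the reported means and SDs and flags a sentence on each criterion. Comparing each sentence's mean rating against the cutoffs is an assumption about how the thresholds were applied, and the example sentence ratings are invented.

```python
# Means and SDs reported for the 187 different sentences (0-100 scale)
STATS = {
    "familiarity": (83.33, 13.36),
    "meaning": (86.73, 12.13),
    "predictability": (52.58, 21.39),
}


def cutoffs(stats):
    """Mean - 1 SD for familiarity and meaning; mean + 1 SD for predictability."""
    return {
        "familiarity": stats["familiarity"][0] - stats["familiarity"][1],
        "meaning": stats["meaning"][0] - stats["meaning"][1],
        "predictability": stats["predictability"][0] + stats["predictability"][1],
    }


def flagged_criteria(sentence_means, stats=STATS):
    """Return the criteria on which a sentence's (assumed) mean rating calls for modification."""
    cut = cutoffs(stats)
    flags = []
    if sentence_means["familiarity"] < cut["familiarity"]:        # "below" the cutoff
        flags.append("familiarity")
    if sentence_means["meaning"] < cut["meaning"]:
        flags.append("meaning")
    if sentence_means["predictability"] >= cut["predictability"]:  # "at or above"
        flags.append("predictability")
    return flags


print({name: round(value, 2) for name, value in cutoffs(STATS).items()})
print(flagged_criteria({"familiarity": 90, "meaning": 70, "predictability": 80}))
```

Replacing `STATS` with the means and SDs of the common sentences (85.24/10.47, 90.20/6.83, 56.76/19.35) gives cutoffs that match the second set of thresholds to within rounding.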
Of the 187 different sentences analyzed, 60 (32.00%) were modified. Of these, eight (13.33%) were modified according to the familiarity criterion, three (5.0%) according to the sentence meaning criterion, 30 (50.00%) according to the predictability criterion, 16 (26.66%) according to the familiarity and sentence meaning criteria concomitantly, one (1.66%) according to the sentence meaning and predictability criteria, and two (3.33%) according to the three criteria concomitantly.
Eleven (84.6%) of the 13 sentences analyzed by all subgroups were judged to require modification: one (9.09%) according to the sentence meaning criterion, three (27.27%) according to the predictability criterion, four (36.36%) according to the familiarity and sentence meaning criteria concomitantly, one (9.09%) according to the familiarity and predictability criteria concomitantly, and two (18.18%) according to the sentence meaning and predictability criteria concomitantly.
In the second stage, the reviewers identified 28 more sentences to modify according to the parameters indicated in the literature: three were non-affirmative, five presented excessive abstraction or inadequate semantics, seven contained more than eight phonological words, and 13 contained proper names. The number of phonological words per list was also analyzed at this stage. Only list 7 had exactly 100 phonological words; lists 1 and 5 exceeded this number, and lists 2, 3, 4, 6, and 8 to 10 ranged from 89 to 98 words. The number of phonological words per list was standardized to 100 by adjusting only the sentences that the first-stage reviewers had recommended for modification.
Chart 1 shows the complete list with the final 200 sentences.
In the pilot study, carried out with three normal-hearing male listeners with a mean age of 23.33 years, the average sentence recognition index was 99.16%.
The first participant answered 99.9% of the test correctly, with only one error, in List 1. The second participant answered 98.8% correctly, with nine errors in List 3 and one error each in Lists 4, 5, and 9, totaling 12 errors. The third participant answered 99.3% correctly, with six errors in List 5 and one in List 7, totaling seven errors. The participants' errors involved lapses of attention during sentence presentation, leading to non-repetition of the sentence, word substitutions, changes from plural to singular and, in one case, the reduction of 'para a' to 'pra' (used in colloquial speech).
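The per-participant percentages follow from the scoring rule described in the Methods: the full test comprises 10 lists of 100 phonological words, that is, 1,000 scored words, so each error lowers the overall score by 0.1 percentage point. The sketch below recomputes the three reported percentages from the error counts listed above; the data layout and the function name `overall_score` are illustrative choices.

```python
WORDS_PER_LIST = 100
N_LISTS = 10

# Errors per list reported for each pilot participant (lists not shown had no errors)
pilot_errors = {
    "participant 1": {1: 1},
    "participant 2": {3: 9, 4: 1, 5: 1, 9: 1},
    "participant 3": {5: 6, 7: 1},
}


def overall_score(errors_by_list, words_per_list=WORDS_PER_LIST, n_lists=N_LISTS):
    """Percentage of phonological words repeated correctly across all lists."""
    total_words = words_per_list * n_lists
    total_errors = sum(errors_by_list.values())
    return 100 * (total_words - total_errors) / total_words


for name, errors in pilot_errors.items():
    print(name, sum(errors.values()), round(overall_score(errors), 1))
```

Each printed line gives the participant's total number of errors and overall percentage correct.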

DISCUSSION
Of the 200 sentences analyzed in this study, the first-stage reviewers suggested modifying 71 (35.5%). The internal consistency coefficient was high for both different and common sentences, showing agreement among reviewers on the evaluated criteria. This finding supports a more reliable selection of the speech material to be produced.
Predictability was the criterion most frequently cited by the reviewers as requiring sentence modification (Figure 1), with mean scores of 52.58% for the different sentences and 56.76% for the common sentences. Comparing the mean answers of the reviewer subgroups, we found greater variation for the predictability criterion than for the other criteria (Tables 1 and 2).
For the predictability criterion, the closer the score to zero, the less predictable the sentence. This criterion has also been used to construct other speech perception materials for normal-hearing adults (24) and for elderly people with and without hearing loss (16).
Predictability implies that the keyword, normally located at the end of the sentence, can be predicted from other words in the sentence that are semantically linked to it. Low predictability means that the keyword cannot be predicted from context, because no other words in the sentence are semantically linked to it (16). Thus, the less predictable the sentence, the more reliable the speech recognition assessment.
A national study (16) with elderly people with and without hearing loss used sentences of higher and lower predictability to assess speech recognition in quiet and in noise. The elderly participants performed better in quiet than in noise, and those with greater hearing loss benefited more from contextual support. These data show the importance of considering predictability when constructing speech perception tests.
Another study (24), carried out with normal-hearing individuals to assess speech recognition, analyzed the influence of word predictability using sentences with low and high predictability. It found that the score differences between the two types of sentences indicate the degree to which the listener can benefit from the semantic, syntactic, and prosodic information provided by the sentence context, that is, the extent to which a person can use context.
The therapist group recommended the largest number of modifications based on the predictability criterion, whereas the linguist group based its recommendations mainly on familiarity and sentence meaning. We believe that speech-language therapists, possibly because they deal with hearing-impaired patients in their clinical practice, are more inclined to take predictability into account. With the decline of auditory and cognitive functions, these patients struggle to understand information to be memorized and become more dependent on word intelligibility and on the linguistic context of the sentence for support (25).
Linguists, in contrast, tend to analyze sentences from a lexical point of view, considering words as isolated elements. Accordingly, the degree of familiarity of each word and its meaning are the features this group most considers when selecting and controlling materials (26). This highlights the value of consulting both groups, with their distinct perspectives, to update the sentence lists more effectively.
Familiarity refers to how frequently a given linguistic input is heard and used, that is, how well known the expression is (27). This criterion has also been applied in constructing speech materials for adults (6), normal-hearing children (4,5), adults using CI (11), elderly people with and without hearing loss (16), and children, adolescents, and adults who are native speakers of BP (28).
The familiarity criterion is essential in the construction of speech perception tests, especially those intended for children, since the use of words unknown to the child can lead to poorer results in the auditory recognition of speech sounds.
A study (5) that applied closed-set speech perception material to analyze the percentage rate of speech recognition (PRSR) in children with hearing impairment found that the familiarity and sentence meaning criteria are as important as the chosen speech stimulus.
The proportion of sentences that needed modification after the first-stage evaluation (35.5%) to make the speech bank more reliable and applicable in clinical practice was relatively small. This result shows that this speech bank, already revised in 1998, has content that, in addition to being phonetically balanced, is largely familiar, meaningful, and of low predictability.
Reviewer experience helped in updating the sentences. In general, both linguists and therapists had graduate degrees in areas of interest for this research (84.21%). Therapists had an average of 17 years of experience in auditory rehabilitation, including experience in applying speech perception tests. Additionally, linguists typically had more than one area of expertise, mostly phonetics and phonology, which are of great importance for studies of the production and perception of speech sounds (29).
Most therapist reviewers in this study (75%) use speech recognition tests in their clinical practice. A descriptive study (13) of the speech tests used in cochlear implant centers in Brazil found that 63% of the services evaluated apply the tests in the therapeutic context and that assessment procedures using speech perception tests are not uniform.
In addition to the 71 sentences recommended for modification in the first stage, the second-stage reviewers recommended a further 28. In total, 99 sentences (49.50%) of the original list were modified to produce a list of sentences for evaluating speech perception that is suitable for the clinical context.
In the second stage, proper names were the element most often indicated for modification. We believe the literature recommends excluding proper names to avoid regionalisms, since some names may be unfamiliar or uncommon, and because proper names make translation into other languages difficult.
Another criterion adopted in this study was that sentences contain no more than seven phonological words. Controlling the number of words has been recommended since 1955 (22) to prevent memory demands from interfering with speech recognition assessment; that study recommended that sentences have no more than 12 words. A 1979 study (23) recommended that sentences not exceed seven syllables. A national study (7) that used the criteria of both studies considered that sentences should have four to seven phonological words.
Another study reported that the phonological loop is accessed when working memory is assessed in people with hearing loss. In individuals with impaired auditory sensory function, there may be problems in accessing the phonological loop and processing auditory information. To compensate for this reduced perception, CI users depend more on top-down processing, which relies on phonological/lexical access and long-term memory storage (30).
Thus, controlling the number of words per sentence is important to avoid cognitive strain in these individuals, which could lead to errors in speech recognition tests.
In addition to limiting the number of phonological words per sentence, we also controlled the number of phonological words per list. In another widely used speech recognition test (7), especially for cochlear implant users, the lists are made up of 50 phonological words. In the present study, 100 phonological words per list were chosen so that the lists can be used to calculate both the percentage speech recognition index and the speech recognition threshold. For the same reason, we used 20 sentences per list, more than the original study, which had only 10 sentences per list (20). Other more recent speech recognition tests, such as the HINT (8), present the same number of sentences per list as this study.
In the pilot study carried out to assess the clinical viability of the lists, the participants' scores were very close to 100% correct, and none of them reported difficulty in performing the test. We believe that the individuals showed excellent speech recognition because, in addition to the lists presenting content with good meaning and familiarity, the number of phonological words per sentence was controlled to avoid attentional and/or memory issues, as recommended in the literature (7,21,22,30).
The next stage of this research will be to record the sentences in a studio and use the recordings in trials with normal-hearing subjects and individuals with different degrees of hearing loss, aiming to standardize the speech recognition threshold in quiet and in noise. To that end, the recorded sentences will be inserted into the perSONA software, developed by the Laboratory of Vibration and Acoustics at UFSC, which allows speech perception to be assessed with competing noise by reproducing complex static and dynamic acoustic scenes.
This study shows the relevance of updating speech recognition tests to reflect everyday language and contributes to the creation of reliable material for clinical practice that can help standardize the assessment of speech perception in Brazil, both in normal-hearing listeners and in individuals with different degrees of hearing loss.

CONCLUSION
After analyzing the data and results, we consider that this study achieved its objective of adapting a list of sentences to assess speech recognition in adult speakers of BP.
It was possible to update the sentences in the speech bank, with predictability being the criterion most often indicated by the reviewers in the first stage of the study. In the second stage, the most frequent recommendations were the exclusion of proper names and of sentences containing excessive abstraction or inadequate semantics.
