Semantic analysis of words for the Virtual Tool for Speech Assessment

ABSTRACT Purpose: to carry out the semantic analysis of a list of words that will compose a virtual tool for speech assessment for children and adolescents. Methods: twenty-three participants, aged between 2 years old and 17 years and 11 months old, from the central region of Rio Grande do Sul, Brazil, provided the concept of 91 words. Data analysis was performed quantitatively, considering the concept of each word as correct or incorrect. The Content Validity Ratio (CVR) and Gwet's first-order agreement coefficient (AC1) were calculated. Results: from the word list analyzed, 42 stimuli presented CVR = 1; 30 words obtained CVR = 0.9; 11 had CVR = 0.8; six had CVR = 0.7; and two had CVR = 0.4. Gwet's AC1 was 0.92 [CI = 0.90 - 0.94] for the semantic analysis. Conclusion: the list consisted of 91 semantically validated words that can be used to assess the speech production of children and adolescents.


INTRODUCTION
Oral language is the most widely used means of communication in modern society 1. Speech production comes from the interaction of several brain regions; therefore, it is considered a very complex process 2. Speech acquisition occurs gradually, with individual variations according to the target-adult and also to the linguistic community in which the child is inserted 3. Specifically, speech sound development can be described as the acquisition of sounds (phonemes) and their organization into patterns, encompassing both phonetic (i.e., articulatory) and phonological (i.e., phonemic) development 4.
When this progress does not occur properly, children may have Speech Sound Disorders (SSD), a broad term that refers to any type of difficulty encountered in this process, which involves speech perception, production and mental representations of speech sounds 5.
Children with SSD may have deficits in lexical retrieval, phonological encoding, articulomotor planning, programming, and/or execution 4. These children constitute a very heterogeneous group, with a mean prevalence of 15.26%, although values vary from 8.26 to 20.63%, depending on the age group 6,7. Therefore, the assessment of speech production is crucial. To this end, a standardized assessment and different sampling procedures (naming and word imitation) are recommended to verify the accuracy of productions, speech errors and error patterns 4,8.
Nowadays, there is an increasing number of studies, mainly in Neuropsychology, with the aim of building or adapting assessment instruments that correspond to the cultural and linguistic characteristics of our country 9. However, there are few standardized and properly validated instruments in Brazilian Portuguese (BP) for assessing children's speech, and no assessments have been found that also assess speech in adolescents 10. Thus, a virtual instrument to assess the speech production of children and adolescents who speak BP is under development, and it will be composed of different tasks such as naming, imitation (listening to a word and imitating), repetition (repeating the same word several times in a row) and diadochokinesis.
However, as part of the process of developing verbal instruments, semantic analysis of the terms selected/used is necessary, especially when involving children, to ensure that they are part of the lexicon. Constructs should be based on concepts from the theory and transformed into items that can be measured operationally.
Once the items have been created, they should be subjected to analysis to check their comprehension and then presented to part of the target population 11 .
In the process of elaborating and validating a speech assessment instrument, specifically before the pilot study, it is necessary to select the stimuli, have these templates analyzed by expert and non-expert judges, and perform a semantic analysis. The purpose of the latter is to verify the comprehension of the tasks by the test takers 12. In this case, the semantic analysis aims to assess whether the selected words are part of the lexicon of the instrument's target audience (children and adolescents between two and 18 years old), that is, how familiar these subjects are with the proposed words. It is noteworthy that, to compose a speech assessment instrument with verbal stimuli, it is necessary to pay attention to the order of acquisition of phonemes, the number of syllables of the words, the structure of the syllables, and the tonicity, in addition to the representativeness and familiarity of the words for those assessed. Furthermore, this age group was chosen to make a longitudinal analysis and qualitatively compare the change in responses over the years, and because the speech assessment instrument under development allows the assessment of children and adolescents in this age group.
Considering the above aspects, the aim of this study was to perform a semantic analysis of the list of words that will compose a Virtual Tool for Speech Assessment.

Participants
The sample for the analysis by expert judges was composed of 12 speech-language pathologists and doctors, whose selection was based on clinical and/or scientific experience with the research content (speech assessment), after analyzing the Lattes curriculum of different professionals, from different regions of the country, who accepted the invitation to participate. For the analysis by expert judges, the anonymity of all participants is necessary to reduce authority bias, that is, the risk of suggestions being adopted on the basis of the notoriety of those who issue them 13. Furthermore, for the semantic analysis, participants were children and adolescents, chosen for convenience, living in the countryside of a southern state of Brazil. For selection based on the inclusion and exclusion criteria, an interview was conducted with parents/guardians to verify possible complaints, difficulties, aspects of neuropsychomotor development and communication.
As inclusion criteria, subjects should: be between 2 years old and 17 years and 11 months old; have BP as their mother tongue; have oral and/or written language comprehension and expression skills consistent with what is expected for their age group, which the evaluator verified at the time of collection; and consent to participate in the research by signing the Informed Consent Form (parents/guardians) and the Assent Form (children and adolescents).
Participants who had any of these altered aspects were excluded from the research: diagnosis of a genetic disorder and/or mutation, or any complex neurobehavioral disorder; relevant socio-emotional alterations that could be affecting interaction, detected by the speech therapist at the time of the interview with the guardians and during the assessment/collection; orofacial myofunctional alterations relevant to the production of intelligible speech (for example: dentofacial alterations and/or disproportion); and complaints of hearing and/or visual difficulties.
Participants of the research were 23 children and adolescents from 2 years old to 17 years and 11 months old. The established age range allowed for a longitudinal analysis, as it covers from preschool age to adolescence. Thus, up to 10 years old, two children from each age group and of both genders participated, and, after that age, one from each age group.
The research participants' data were organized considering the variables gender, age, type of school and mother's level of education (Table 1).

Semantic Analysis of Words from the Virtual Tool for Speech Assessment
Afterwards, we moved on to the semantic analysis of the words, that is, to check whether the words were part of the vocabulary of children over 2 years old (the minimum age that the test is designed to assess). For this analysis, the order of phonological acquisition was not taken into account, as the aim was to verify the content of each response, not its phonetic/phonological adequacy.
We asked the children and teenagers to explain, in their own way, what each word on the list meant. We presented the words orally and individually to each participant. We started with an example, so that the children would understand what they had to do: "What is school?", and a possible answer: "A place where we go to study and learn". Next, all the questions were as follows: "What is _______?".
Due to the COVID-19 pandemic, the recordings were made either in person, when the family indicated that they felt safe in the presence of the evaluator, or online by sending a video, or by video conference using Google Meet. The duration of each collection was approximately 20-30 minutes and, at times, it was necessary to split it into two assessment sessions due to the children's tiredness, especially the younger ones.
The entire data collection process lasted approximately 6 months (from January to June 2021).
The flowchart (Figure 1) describes the stages in the process of selecting and semantically analyzing the words to make up the Virtual Tool for Speech Assessment.

Procedures
The development of the Virtual Tool for Speech Assessment was based on the analysis of several speech assessment tools: the Computer Articulation Instrument 14, INFONO 15 and an adaptation of the Dynamic Evaluation of Motor Speech Skills (DEMSS) for BP 16. To select the test items, the following linguistic criteria were considered: phonemes assessed in each position of the syllable and the word, covering all the phonemes of BP and all the basic syllable structures of BP.
Twelve speech and hearing therapists and specialists in the areas of language/speech and linguistics analyzed and judged the list of words that had previously been chosen to compose the instrument. Initially, the list consisted of 759 words.
The judges had to select the appropriate words to make up the list, regarding familiarity. The guidelines for selecting the words were as follows: "Select at least three (3) of the best words for each phoneme, in each position in the syllable, taking into account its existence in the children's vocabulary (familiarity). Feel free to suggest words in the "Other" option." The responses of the expert judges were analyzed by the frequency of responses in the Google© Forms platform, considering the best words to be those selected by the largest number of judges. Thus, we selected words with a CVR ≥ 0.33, that is, words considered appropriate by four judges or more.
After this selection, a new list of 352 words was sent to the judges for further analysis, in order to reduce the variability of the answers, since there were many options to judge. Only nine of the 12 judges who had participated in the previous stage also participated in this one, as three of them did not provide their analyses. Words with a CVR ≥ 0.66 were selected for the final list.
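The two selection rounds can be illustrated with a minimal sketch. It assumes that the ratio reported here is the proportion of judges who selected a word, which is what the gloss "four judges or more" out of 12 implies for the 0.33 cut-off; the text does not give the formula, and a different definition of the ratio would yield different values. The function names and the example words are hypothetical.

```python
def endorsement_ratio(n_endorsing: int, n_judges: int) -> float:
    """Proportion of judges who selected a word (assumed reading of the ratio)."""
    if n_judges <= 0 or not 0 <= n_endorsing <= n_judges:
        raise ValueError("n_endorsing must lie between 0 and n_judges")
    return n_endorsing / n_judges


def select_words(votes: dict[str, int], n_judges: int, threshold: float) -> list[str]:
    """Keep the words whose ratio reaches the cut-off, preserving input order."""
    return [
        word
        for word, n_endorsing in votes.items()
        if endorsement_ratio(n_endorsing, n_judges) >= threshold
    ]


# Hypothetical vote counts, for illustration only.
first_round = {"word_a": 4, "word_b": 3, "word_c": 12}
second_round = {"word_a": 6, "word_c": 5}

kept_first = select_words(first_round, n_judges=12, threshold=0.33)
kept_second = select_words(second_round, n_judges=9, threshold=0.66)
```

Under this reading, four of 12 judges gives 0.33 in the first round, and six of nine judges gives 0.66 in the second.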
The final list of 91 words included all the phonemes of BP, in all possible positions in the word. After selecting the words, we divided them into the following semantic fields: 11 words classified as "body parts"; 18 as "animals"; 25 as "objects"; 10 as "food"; five as "clothing"; five as "means of transportation"; nine as "nature/beings"; two as "places"; two as "actions"; two as "fantasy"; and two as "symbols". The purpose of this division was to obtain a better qualitative analysis of the productions.
The category in which the children made gestures the most was "animals" (onomatopoeia), followed by "body parts" (pointing). From 12 years old onward, none of the adolescents used gestures and/or onomatopoeia; they only described the words orally.
In the semantic fields "body parts", "means of transportation" and "nature/beings", all the children and adolescents were able to conceptualize all the words. In the semantic field "animals", a 3-year-old child (S2) was unable to say the meaning of the words "alligator", "snake" and "tiger". In the category "objects", children aged 3 (S2) and 2 (S1) did not conceptualize the words "arrow" and "sign", respectively. As for "food", a 3-year-old child (S2) was unable to answer the meaning of the words "soft drink" and "chewing gum". The two 4-year-olds (S4 and S5) did not define the word "diaper" in the semantic field "clothing". A 4-year-old child (S4) was unable to conceptualize "house" and "explosion". Also, a 5-year-old child (S6) could not define the words "dragon" and "witch". Regarding the category "symbols", children aged 2 (S1), 3 (S2), 4 (S4) and 9 (S15) did not conceptualize the words "cross" and "zero" either.
Chart 1 describes some examples of meanings given by the children and adolescents who took part in the study, considering semantic fields and age group, respectively (Table 1).

Data analysis
Data analysis was quantitative and qualitative, considering: (0) when the child/adolescent did not know the meaning of the word or answered incorrectly; (1) when the child/adolescent said the correct concept. At this stage, two speech therapists (PhD students with experience in speech assessment and psychometrics) gave the scores separately and then compared the results. Another speech therapist with the same qualifications analyzed the scores and classifications that differed, to break the tie. In addition, we carried out a qualitative analysis of the participants' speech production, comparing the production of each concept in relation to the age of each child.
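The scoring and tie-breaking step described above can be sketched as follows. This is only an illustration of the logic, not the procedure actually used to tabulate the data: the function name, the list-based representation of the scores and the example values are assumptions.

```python
def final_scores(
    rater_a: list[int],
    rater_b: list[int],
    tie_breaker: dict[int, int],
) -> list[int]:
    """Combine two independent 0/1 scorings into one final scoring.

    Where the two raters agree, their shared score is kept. Where they
    differ, the score given by the third rater is used; `tie_breaker`
    maps the index of each disputed item to that third score.
    """
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must score the same items")
    combined = []
    for index, (a, b) in enumerate(zip(rater_a, rater_b)):
        combined.append(a if a == b else tie_breaker[index])
    return combined


# Hypothetical scores for four productions (1 = correct concept, 0 = incorrect).
scores_a = [1, 0, 1, 1]
scores_b = [1, 1, 1, 0]
third_rater = {1: 1, 3: 0}  # only the disputed items (indices 1 and 3)

resolved = final_scores(scores_a, scores_b, third_rater)
```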
We analyzed the data using Gwet's first-order agreement coefficient (AC1) and calculated the Content Validity Ratio (CVR) per word analyzed by the judges.
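For reference, a minimal sketch of Gwet's AC1 in its simplest form, two raters and binary (0/1) scores, is given below. The text does not state which ratings entered the reported coefficient or which software was used, so the two-rater setting, the function name and the example data are assumptions; multi-rater data require the generalized form of the coefficient. In this case AC1 = (pa - pe) / (1 - pe), where pa is the observed agreement and pe = 2π(1 - π), with π the mean proportion of "1" scores across the two raters.

```python
def gwet_ac1_binary(rater_a: list[int], rater_b: list[int]) -> float:
    """Gwet's AC1 for two raters scoring the same items as 0 or 1."""
    n_items = len(rater_a)
    if n_items == 0 or n_items != len(rater_b):
        raise ValueError("both raters must score the same non-empty set of items")

    # Observed agreement: share of items on which the two scores coincide.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n_items

    # Mean prevalence of the score 1 across both raters.
    prevalence = (sum(rater_a) + sum(rater_b)) / (2 * n_items)

    # Chance agreement under Gwet's model for two categories (at most 0.5).
    chance = 2 * prevalence * (1 - prevalence)

    return (observed - chance) / (1 - chance)


# Hypothetical scores, for illustration only.
example_ac1 = gwet_ac1_binary([1, 1, 1, 0], [1, 1, 0, 0])
```

Because the chance-agreement term never exceeds 0.5 with two categories, the denominator cannot reach zero, which is one reason AC1 remains stable when almost all scores fall in a single category.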

RESULTS
We collected and analyzed the concepts from the word list. Considering the number of stimuli (91 words) and the number of participants (23 children and adolescents), we analyzed 2,093 productions in terms of the conceptualization of the stimulus words.
Qualitatively, we can see differences in the conceptualization by children and adolescents as they get older. Many children between 2 and 4 years old used gestures, onomatopoeia or pointed to something in order to conceptualize the requested word.

Figure 1. Flowchart of the word selection process
It is possible to observe a change in the complexity of defining meaning as age increases, in all semantic fields. However, there was a decrease in the number of details mentioned in the descriptions of words as the age group increased.

Chart 1. Examples of provided meanings by age group
Animals (dog): "it's an animal that makes woof woof woof"; "it's a thing that has fur, has ears and makes 'woof woof' and sticks its tongue out"; "it's an animal that is a human being's best friend"; "it's an animal that can be of different sizes, it has hair, it can be of different colors and breeds"
Objects (pencil): "to write"; "it's for writing and drawing and you can erase it, but not a pen"; "an object from school supplies that we use to write on papers"; "an object you use to write or paint"
Food (pineapple): "it's white inside and the peel is yellow"; "it's a yellow thing, you use it to eat and it has a pointy skin, it has thorns and the leaves is very big"; "it's a fruit that looks like it has a crown"; "it's a fruit that has thorns on the outside, usually green, yellow inside with holes"
Clothing (hat): "when it's sunny we wear it"; "it's something we put on top of our heads to protect us from the sun"; "we use it to put on our heads and not get sunburned"; "object worn on the head, it can be as an accessory, to protect from the sun"
Means of transportation (car): "we ride in it to go anywhere"; "it's a kind of object that we use to get around"; "it's where we ride to travel, an automobile, a vehicle"; "it's a means of transportation, it can be different colors, made of iron, it's used to transport people and things"
Nature/beings (sun): "it's yellow and makes everything yellow and you can't look at it"; "it lights up our day, it can burn and feel hot"; "it's a round thing in the sky, it shines and people can go blind, it comes during the day"; "star located at the center of the solar system"
Places (house): "for us to come in and play, to make food, there's a roof, a ceiling and a door and a wall, and light"; "we can live inside it so that the rain doesn't come and have food"; "it's a place where we live"; "structure that can be made of concrete or wood, usually people live inside"
Actions (blow): "swooooosh, wind comes out"; "it's the air that comes and pushes the paper and the paper flies"; "it's what comes out of the mouth, pull in the air and blow"; "to draw air out of the mouth"
Fantasy (witch): "uses a broom to fly, wears a hat, very big nose"; "it's an evil thing, a scary person"; "it's a horrifying thing, it's an old lady with a big nose and a wart, old and ugly, hunchbacked, purple or black and it's scary"; "an urban legend, usually illustrated by a woman wearing a hat and carrying a flying broom"
Symbols (zero): "it's a letter, to draw"; "it's a number that is the first"; "it's a number that refers to nothing"; "numeric sign represented by the digit zero"

The words with CVR = 0.7 and 0.4 remained on the list of words for the instrument, since there were no other words on the list to assess the phonemes /z/ in the initial onset; /pl/ in initial and medial onset; /fl/ in the initial onset; /m/ in the medial coda; and /kr/ in the initial onset.

DISCUSSION
The aim of this study was to carry out a semantic analysis of the list of words to compose a Virtual Tool for Speech Assessment. For this assessment, the target words that the children produce need to be within their vocabulary, especially when it comes to spontaneous naming 9.
In general, the analysis showed that most of the items meet the criteria of representativeness and familiarity 17. It is possible to hypothesize that the items that received the most zero (0) scores are complex items that the child may know but cannot define. Therefore, it is always important to check the familiarity and frequency of these stimuli in children's daily lives. A study indicates that there is a significant interaction between the familiarity of words and the ease of describing their meaning, that is, children remembered familiar words better than those from unfamiliar categories 18.
It is possible to observe that the children who used gestures, especially those between 2 and 4 years old, pointed to parts of their own bodies, such as the nose, eye, foot, hair, tooth, navel, and made gestures to show how to use everyday objects such as knives, scissors, brushes, pencils, etc. In the age group of 2 to 7, in 38 productions, the children used gestures combined with onomatopoeia, mainly for conceptualizing animals. In addition to gestures and onomatopoeias, children between 2 and 4 years old use concurrent sentences.
The use of gestures is related to the acquisition of various semantic categories, such as social terms 19-21. This category is defined by names of people, onomatopoeic expressions and words related to routine situations, and makes up a large proportion of children's vocabulary 17. Other studies have also shown that performance on item naming is lower in younger children 9,22. Children's ability to efficiently process linguistic input, such as recognizing words quickly and understanding their meaning, has been strongly associated with their simultaneous knowledge of vocabulary 23. Some words were more difficult for young children to conceptualize. According to the literature, difficulty in understanding the items should not be a complicating factor in individuals' responses 12. Thus, the literature provides ten criteria to follow in order to create the items appropriately: 1) behavioral criterion: the item must express a behavior; 2) objectivity: ease in identifying the answer; 3) simplicity: express a single idea; 4) clarity: be understandable by all strata of the target population; 5) relevance: assess the construct in question; 6) precision: each item has its own defined position in the construct, and is different from the others; 7) variety: vary the language used and the way the items are formulated, such as half in the affirmative and half in the negative; 8) modality: not using expressions such as "very" and "excellent"; 9) typicality: phrases with expressions typical of the attribute; 10) credibility (face validity): the item should not seem purposeless or inappropriate to the age group for which it is intended 24.
Considering the last item, regarding the purpose for each age, the words will be reorganized for the different age groups, without losing the purpose, which is to assess speech through naming and imitation. In addition, it is important to note that only one word on the list is from the word class "verbs", since the recognition and correct production of nouns is more favorable than that of words belonging to other classes, as nouns are easier to represent visually 25.
Semantic analysis and word familiarity testing are essential to compose a speech assessment tool for children of different ages; otherwise, target words may not be elicited because they do not belong to children's vocabulary (unfamiliar) 9. Thus, it is possible to say that this study provided a list of words that were chosen and analyzed by expert judges and validated semantically, thus making it possible to use them to check the speech production of children and adolescents, both by speech therapists and other professionals.
The limitations of this study were the small number of participants and the fact that they were only from the countryside of one southern state. In addition, because the sample was chosen for convenience, the socioeconomic level was limited. Therefore, a large portion of the sample had mothers with a high school or college degree. We suggest carrying out studies with a more significant sample, including children and adolescents from different regions of Brazil and from different socioeconomic levels, in order to better compare the data from this study.

CONCLUSION
The word list consisted of 91 items, which will be divided into sets for children aged up to 4:11 and for those aged 5:0 and over, taking into account the difficulties observed in this study and the phonological acquisition of BP. Children and adolescents from the central region of Rio Grande do Sul analyzed the words semantically, and the words proved to be suitable for assessing the speech production of this population. Thus, this study resulted in a list of semantically validated stimuli for analyzing the speech production of children and adolescents.
Chart 1 (continued). Body parts (finger): [...] finger"; "we have it in our hands and we use it to write and to hold"; "it's what we have in our hands, what we use to grab things"

Table 1. Sociodemographic data of the research participants