Morphosyntactic Evaluation Protocol (MEP): content validation

Hage 3

ABSTRACT
Introduction: the use of language assessment instruments in speech-language therapy is essential for diagnosis and, consequently, for therapeutic planning. In Brazil, there is a shortage of instruments constructed and validated in the area of morphosyntax. The Morphosyntactic Evaluation Protocol (MEP) was constructed based on the main syntactic characteristics of the period of child language acquisition, on Portuguese grammatical structure, and on its application in a pilot study. Objective: to verify the content validity of the MEP. Methods: for the validation process, the instrument was applied and then analyzed through a questionnaire by three judges, a linguist and two speech-language specialists with experience in assisting children with Language Development Disorder. The Index of Judges’ Reliability was used to compare the results of the protocol application, and Cronbach’s Alpha, Spearman-Brown and Content Validity Index (CVI) tests were applied to the questionnaire responses. Results: the statistical tests applied in the content validation supported the reliability of the instrument, with indexes considered substantial for both the Alpha and Spearman-Brown coefficients (higher than 0.80), and the CVI reached the maximum value of 1.0. Conclusion: there was agreement and compatibility in the answers of the experts, which indicates the reliability of the instrument. The results of the statistical tests support the reliability of the instrument, with indexes considered substantial for the Alpha and Spearman-Brown coefficients. In the future, the protocol may help characterize the syntactic profile of children with


INTRODUCTION
Morphosyntactic knowledge is decisive for the child to evolve from the level of isolated words to that of utterances that follow the grammatical rules of the language. Morphosyntax is established when the child starts to relate the words in a sentence to one another and to use inflections, around the age of 18 months (1). In the clinical context, this knowledge is commonly affected in Language Disorders, whether they are primary or not (2). Despite the importance of morphosyntax in the development of child communication, it has been neglected in Brazilian studies because there is no reliable way to measure it in Brazilian Portuguese. Regarding the specific verification of the morphological and syntactic structure of the language, only the MLU (Mean Length of Utterance) has been used as a measurement procedure in recent years (3,4). In this context, the construction or cross-cultural adaptation of syntactic assessment instruments represents a challenge for advances in the evaluation of children's language. To overcome this challenge, the instruments must be validated for Brazilian speakers.
To contribute to the morphosyntactic evaluation of children whose language is Brazilian Portuguese, the Morphosyntactic Evaluation Protocol (MEP) was developed by Brazilian researchers in two stages. In the first stage, categories were listed from the developmental models of morphosyntactic acquisition based on a literature review. The review sought information from different bibliographic sources on the morphosyntactic evaluation of children and on the grammar of the Portuguese language. The implicit rules of language use and the language acquisition phases of children between 2.6 and 5.0, the age group covered by the protocol, were considered. In the second stage, the protocol was applied to a pilot group consisting of children with typical language development and children with primary Language Disorder, that is, a disorder not associated with a biomedical condition. This application was essential to verify whether the categories identified in the literature were compatible with the speech corpus of children with the disorder. Based on the grammatical structure of the Portuguese language and the most frequent characteristics of the morphosyntactic constructions of children with language disorders, the MEP initially comprised eight analysis criteria, to be verified on the orthographic transcript of a 20-minute conversation between the child and an adult interlocutor. The analysis criteria were: 1.

However, for an assessment instrument to be used reliably, it is essential to check its psychometric qualities. The importance of verifying these qualities in language assessment procedures is increasingly emphasized (5), and one of these qualities is validity, that is, whether an instrument measures precisely what it proposes to measure (6). There are several ways to validate an instrument, one of which is content validation.
Content validity refers to the degree to which the content of an instrument reflects the construct that is being measured and implies an evaluation with quantification of judgments, whose main objective is to evaluate and improve the criteria used in an instrument (7).
This study aimed to verify the content validity of an instrument created by Brazilians to assess the morphosyntactic abilities of children in language development. Studies on content validation diverge concerning the number of reviewers. The recommendation can vary from two to twenty (7). Lynn (8) points out that the minimum number of judges can be three. Thus, three expert evaluators with experience in evaluating all aspects covered in the protocol were chosen. Accordingly, we invited three experts in the language field, two speech-language therapists and a linguist, all of whom were willing to apply the protocol to five speech samples and to evaluate the instrument.

METHODS
Content validation was carried out in two stages. First, the judges received transcripts of the speech samples of five children with Language Disorder, since assessing the speech of such children is the ultimate purpose of the protocol, and applied the protocol following the guidelines provided in a "step by step" manual. After the application, the judges evaluated the instrument through a questionnaire with questions regarding the pertinence, relevance, applicability and representativeness of each criterion of the protocol. The questions about each item served to support the judges' evaluation. The use of a questionnaire to obtain expert judgment is a method that minimizes bias and standardizes the information requested on the content of each item (7).
For the statistical analysis of the questionnaire, a Likert scale (9) with four levels of agreement was used: A - Disagree; B - Partially disagree; C - Partially agree; D - Agree.
With the data from the application of the protocol by the judges, a table was generated and the agreement between judges, or reliability index, was calculated. The calculation followed the technique in which the reliability index must be equal to or greater than 70% (10), a level at which agreement is taken not to have been produced at random and therefore to indicate reliability.
Cronbach's alpha (11), a statistical measure of the internal consistency of a test or scale, was applied to each judge's list of questionnaire responses. Cronbach (11) quantified this reliability by proposing a coefficient, α, which varies from 0 to 1. If α is close to 0, the answers are not reliable, and if it is close to 1, the answers are very reliable. If α ≥ 0.8, the answers are considered reliable.
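As an illustration of how the α coefficient is obtained, the following is a minimal sketch using the standard formula α = k/(k-1) × (1 - Σ item variances / variance of total scores). The data layout (one row per rated unit, one column per item), the function name and the sample scores are assumptions made for this example and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table of scores.

    scores: list of rows; each row holds one numeric score per item.
    The row/column layout is an assumption of this sketch.
    """
    k = len(scores[0])  # number of items
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two rows")
    # Sample variance of each item (column)
    item_var_sum = sum(variance(col) for col in zip(*scores))
    # Sample variance of the total score of each row
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical Likert responses coded A=1 ... D=4 (not the study's data)
example = [[4, 3], [3, 4], [4, 4], [2, 2]]
alpha = cronbach_alpha(example)
print(f"alpha = {alpha:.2f}, reliable by the 0.8 criterion: {alpha >= 0.8}")
```

With real questionnaire data, the letters A to D would first need to be coded numerically, as assumed in the example.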
The reliability of the instrument was also verified using the split-half method, that is, the items that make up the MEP were grouped into two halves and the scores obtained on these halves were compared. Precision coefficients were estimated using the Spearman-Brown formula (12).

Table 1 presents the data obtained through the application of the protocol by the judges to previously transcribed speech samples from five children with Language Disorder. The analysis of the responses was based on the eight protocol criteria. Table 2 presents the results of the Cronbach's Alpha (11) and Spearman-Brown (12) tests and of the Content Validity Index/CVI (6).
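The split-half estimate described above can be sketched as follows. The Spearman-Brown correction itself is standard (r_sb = 2r / (1 + r), where r is the correlation between the two half scores), but the odd/even assignment of items to halves, the function names and the sample data are assumptions of this example, since the text does not state how the MEP items were split.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / sqrt(sxx * syy)


def spearman_brown(r):
    """Step up a half-test correlation r to full-test reliability."""
    if r == -1:
        raise ValueError("correction is undefined for r = -1")
    return 2 * r / (1 + r)


def split_half_reliability(scores):
    """scores: list of rows, one numeric score per item in each row.

    Items are split by position (odd vs. even); this split is an
    assumption of the sketch, not the study's documented procedure.
    """
    first_half = [sum(row[0::2]) for row in scores]
    second_half = [sum(row[1::2]) for row in scores]
    return spearman_brown(pearson_r(first_half, second_half))


# Hypothetical scores (not the study's data)
example = [[1, 2, 1, 2], [2, 3, 2, 3], [3, 3, 4, 4], [4, 5, 4, 5]]
print(f"split-half reliability = {split_half_reliability(example):.2f}")
```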

DISCUSSION
Among the attributes most commonly used in the process of evaluating the psychometric properties of an instrument are validity and reliability. Validity concerns whether the instrument accurately measures the phenomenon to be studied, and reliability concerns whether a result can be reproduced consistently even with different observers, representing how stable and consistent the instrument is (6). Table 1 shows the results of the application of the protocol by the judges using the Reliability Index. Some authors argue that agreement of 90% or more is appropriate (13); others adopt an index of 70% or more. The formula used for the calculations in this study treats every index equal to or greater than 70% as reliable (10). Therefore, according to Table 1, all reliability indexes were satisfactory.
In the criterion that deals with the number of utterances (1), an index below 90% was obtained in the analysis of one of the five children. Small differences in counts directly influence the Reliability Index, which, in these cases, tends to yield a low percentage. Thus, if one judge finds two occurrences of an item and the other only one, the index is 50%, even though the counts are very close (11). In this case, however, the values were not considerably low, since a minimum of twenty utterances and a maximum of twenty-five were found, a variation of only five. This question is much discussed and controversial, since the Reliability Index must not be read only as a number: one must know what is being collected and analyzed in order to interpret the presence or absence of agreement between the judges (10,14).
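The arithmetic in this paragraph can be checked with a small sketch. The text does not print the formula, so the ratio used here (smaller count divided by larger count, times 100, for a pair of judges) is an assumption chosen because it reproduces the 50% example; the function name and the way the 70% threshold is applied are likewise illustrative.

```python
def agreement_index(count_a, count_b):
    """Percentage agreement between two judges' counts for one criterion.

    Assumed formula: smaller count / larger count * 100.
    Two identical counts (including 0 and 0) give 100%.
    """
    if count_a < 0 or count_b < 0:
        raise ValueError("counts must be non-negative")
    if count_a == count_b:
        return 100.0
    return 100.0 * min(count_a, count_b) / max(count_a, count_b)


MINIMUM_INDEX = 70.0  # threshold adopted in the study (10)

print(agreement_index(2, 1))    # 50.0
print(agreement_index(20, 25))  # 80.0
print(agreement_index(20, 25) >= MINIMUM_INDEX)  # True
```

Under this assumption, a difference of one occurrence between counts of 1 and 2 costs 50 percentage points, while a difference of five between counts of 20 and 25 costs only 20, which is why criteria with low counts are more sensitive.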
As for the number of ungrammatical phrases (2), it is observed that this criterion reached a percentage below 90% in the analysis of three of the five children because the second judge found a lower number of phrases of this type.
As with criterion 1, the criterion that checks the number of noun phrases (4) reached an index below 90% in the analysis of one of the five children: a minimum of 10 and a maximum of 12 simple periods were found, a variation of only two periods, which also accounts for the decrease in the index.
Regarding the number of compound periods (6), a percentage below 90% was obtained in the analysis of only one of the five children, which was also due to the low number of nominal agreement deviations. Similarly, for the criterion that assesses the number of nominal agreement errors (7), a percentage below 90% was obtained in the analysis of two of the five children, due to the low number of verbal agreement deviations found in children with Language Disorder. Criteria involving low counts directly influence the Reliability Index (10,11). In summary, most of the criteria reached percentage values above 70%, which indicates that they are within the accepted range of agreement between judges and meet the minimum for accepting an item as pertinent.
The review of the protocol criteria also allowed the instrument to be improved. After the analyses, one criterion was added to the others as criterion 9: verification of the number of words in the utterance. This criterion was added because it is pointed out as an important measure in the context of altered language development (3,5), as children with language disorders produce sentences with fewer words (2).
The statistical results described in Table 2 show the acceptability, adequacy and relevance of each protocol criterion. There was agreement and compatibility in the responses of the specialists, which indicates that the MEP is reliable. The results of the statistical tests support this reliability, with indexes considered substantial for both the Alpha and the Spearman-Brown coefficients. The CVI reached the maximum value of 1.0 because all the judges' responses fell into the two highest levels of the Likert scale (partially agree and agree), which supports the adequacy of the instrument's criteria.
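The reason the CVI reaches 1.0 when every response falls in the two highest levels can be seen in a short sketch. It uses the common item-level definition (number of judges choosing one of the two highest levels divided by the total number of judges) and averages the item values for a scale-level figure; the function names and the sample ratings are assumptions of this example, not the study's data.

```python
# Levels of the four-point scale described in the Methods
AGREEMENT_LEVELS = {"C", "D"}  # partially agree, agree


def item_cvi(ratings):
    """Item-level CVI: share of judges who chose level C or D."""
    if not ratings:
        raise ValueError("at least one rating is required")
    return sum(1 for r in ratings if r in AGREEMENT_LEVELS) / len(ratings)


def scale_cvi(ratings_per_item):
    """Scale-level CVI as the mean of the item-level values."""
    values = [item_cvi(ratings) for ratings in ratings_per_item]
    return sum(values) / len(values)


# Hypothetical ratings from three judges for two criteria
all_agree = [["D", "C", "D"], ["C", "D", "D"]]
print(scale_cvi(all_agree))  # 1.0

# A single "partially disagree" lowers the item value to 2/3
print(round(item_cvi(["B", "D", "D"]), 2))  # 0.67
```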
Determining how rigorously the aspects of reliability and validity are addressed in a study is essential to guarantee the quality of an instrument, which helps researchers decide whether or not to apply the results in their clinical practice (15).
Finally, it is worth remembering that reliability and validity are not fixed properties and, therefore, change according to the circumstances, population and purpose of the study. Measuring instruments link clinical practice and research in different areas of knowledge; assessing their quality is therefore essential for choosing instruments that provide valid and reliable measures.
In the clinical field, a systematic review of the psychometric quality of language assessment tools available for school-aged children emphasized that this quality matters for speech-language therapists to make evidence-based decisions about the assessments they select when assessing children's language skills. This study contributes to that premise, namely the use of instruments with psychometric qualities such as validity and reliability.
It is worth mentioning that MEP will not be available until all of its validation steps are completed. It is being applied to a sample of typical children, whose results, in the future, may serve as a parameter for the evaluation of children with a language disorder.

CONCLUSION
Regarding the content of the protocol, there was agreement and compatibility in the responses of the specialists, which indicates reliability. The results of the statistical tests support this reliability, with indexes considered substantial for both the Alpha and Spearman-Brown coefficients. The CVI reached the maximum value of 1.0, which supports the adequacy of the instrument's criteria.