Development of an instrument for collective assessment of fluency and comprehension of reading in school students II

Adolescent Education Educational Measurement Speech-Language Therapy

ABSTRACT
Purpose: The study presents the process of developing an instrument for the collective evaluation of reading fluency and comprehension of secondary elementary school students in grades 6-9 and verifies the effect of schooling on performance in the instrument. Methods: One hundred students regularly enrolled in grades 6-9 of public secondary elementary schools participated in the study. The construction of the instrument involved seven steps, with the participation of two judges. The instrument was composed of a narrative text appropriate for secondary elementary school students and 10 multiple choice questions, of which five were literal and five were inferential. Results: The results showed better performance in fluency and in reading comprehension for the participants with higher schooling. Reading fluency presented positive and moderate correlations with reading comprehension. Conclusion: The instrument is easy to apply and analyze, and can be used in clinical, educational and research contexts to measure the performance of students in grades


INTRODUCTION
Reading is an important skill involved in processes of knowledge acquisition, self-criticism and understanding of reality and the world. However, the act of reading is not simple and involves a complex set of cognitive skills.
Reading comprehension is generally defined as the aptitude for extracting meaning from the text (1), a process that requires the integration of a variety of skills and abilities. These include cognitive skills such as decoding (transformation of written code into information), motivation, comprehension of spoken language, vocabulary, language skills, and also full information processing capabilities, such as working memory, reading accuracy (fluency) and rapid automatized naming (information processing speed) (2). In addition to these variables, it is possible to include higher-level mental processes, that is, the ability to extract information that is implicit in sentences and integrate new information with prior knowledge (3), as well as monitoring of the general meaning of the text. Therefore, developing a reading and comprehension assessment instrument is not an easy task, since comprehension is not measured directly due to its complexity (4).
Reading fluency is a multidimensional skill. It has three main dimensions that favor its connection with comprehension: accuracy (or precision in decoding), automatic processing (automatic word recognition), and prosody (variations of fundamental frequency, duration and intensity that guarantee expressiveness and the impression of attitudes towards reading). It develops gradually over the school years, allowing the reader to gain control over the processing of surface text structures so that he/she can focus on understanding the deeper structures that give access to meaning (5). Reading fluency can be assessed by oral reading or silent reading. Oral reading (aloud) enables observation of the reader's reading routes, accuracy and prosody. Silent reading (visual, without using the voice) favors comprehension, which is enabled by closer dialogue between text and reader. Silent reading is faster than oral reading, because there is a process of mental recovery of the word sound and there is no need to use the vocal tract to access meaning. Nevertheless, both have the same purpose of extracting meaning from the text, involving multiple cognitive processes at different levels of interaction (linguistic, textual and world knowledge) (6).
National and international studies indicate that cases of school failure have increased over the years (7,8). The results of these studies show that students who complete the early years of elementary school have poorer reading and writing performance than expected for their school level, and reading difficulties may be strongly related to these findings.
Being able to measure and identify the aspects of reading in which students present weaknesses is the main way to improve intervention principles for both clinical and educational use. Such evaluations should include texts that require the general knowledge of the reader, as well as clear correction criteria and answers, with no room for error induction (9).
Reading comprehension can be assessed by means of online measures, obtained while the subject is reading, or offline measures, obtained after reading is completed (10). Among online measures, reading time, lexical decision tasks, naming and recognition during reading stand out. Regarding offline measures, the literature highlights retelling, answers to open-ended and closed-ended questions, and problem solving. Online and offline measures have advantages and disadvantages (10). The literature shows that offline measures do not hinder the reading process and are more indicative of the longer-lasting representational outcome of these processes, whereas many online measures are disruptive and may lead the reader to use test-specific strategies that are not normally used (10). However, offline measures tend to be less informative about how certain reading processes operate and are also subject to forgetting or reconstructive processes at the time of testing. The literature suggests that textual comprehension can be better understood with convergent evidence obtained from multiple different measures (10).
The retelling task after reading has long been used to assess reading comprehension (4,11). Nevertheless, as it is applied individually, its use in educational contexts is difficult. One of the main objectives of this type of evaluation may be to measure how much the reader remembers of what he/she read, and what was understood from the text (4). In the Text Processing Model, when reading a text, the reader should combine the meanings of words to form propositions, which, interrelated, form the microstructure of the text (12). The microstructure is organized into a global structure called the macrostructure. The formation of the macrostructure requires recognition of topics and their interrelationships. Together, the macrostructure and the microstructure are called the textual basis, which represents the explicit meaning of the text (12). For deeper text understanding to occur, comprehension should not be restricted to what is explicit in the text (10). In this case, it is necessary to build a situational model (a mental model of the situation described in the text). The reader's mental model can be considered an extended set of propositions, including inferences and propositions extracted from the actual text (13). The instrument proposed here allows extraction of online and offline measures at the same time.
Multiple choice questions can be considered one of the most practical and objective evaluation techniques, since there is no interference from the reader's subjectivity. Besides, they are efficient regarding evaluation speed, since they enable evaluation of a large number of individuals in a single session, contributing to time optimization (9). In addition, they make it possible to control for difficulties in written production.
After defining the method to be used for comprehension evaluation (multiple choice questions), it is necessary to define the text type. According to the literature, text type (narrative, descriptive, etc.), textual genre, linguistic style and the various text configurations can raise new issues that are only solved with reading experience (13). In this regard, simple narratives have been used more often for evaluation. The narrative structure presents a logical relation between events and actions of the characters and a macro-textual organization of each of these events (4). Moreover, narratives present structural characteristics that demand proficiency in distinct skills, since they use a variety of time markers, vocabulary and successions of events involving cause and consequence (14).
For the evaluation of fluency-related skills, in turn, reading speed is usually measured: the number of words read in one minute is calculated for automatic processing, and the number of words read correctly per minute is used for accuracy. There are other ways of assessing these skills (e.g., measurement of articulation rate, utterance rate), as well as prosodic mastery, but the abovementioned ways are the most practical for obtaining objective and accurate information on this aspect of reading (5).
In clinical and educational practice, there is a significant number of students from 11 to 17 years old with complaints of oral language alterations and written language difficulties and disorders, but few instruments are available in the national literature to assess this population's reading comprehension (15). In general, the instruments are designed for students up to 12 years old or for adults and the elderly. Moreover, no national collective assessment instrument that enables assessment of comprehension and textual reading fluency at the same time has been found so far.
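The two rate measures mentioned above can be illustrated with a minimal sketch. The function names and the example figures are hypothetical and serve only to show the arithmetic; they are not taken from the instrument or its data.

```python
def words_per_minute(words_read: int, seconds: float) -> float:
    """Reading rate: words read, scaled to a one-minute window."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return words_read * 60.0 / seconds


def words_correct_per_minute(words_read: int, errors: int, seconds: float) -> float:
    """Accuracy-adjusted rate: only correctly read words are counted."""
    if errors < 0 or errors > words_read:
        raise ValueError("errors must be between 0 and words_read")
    return words_per_minute(words_read - errors, seconds)


# Hypothetical reader: 150 words attempted in 60 seconds, 6 of them misread.
rate = words_per_minute(150, 60)
accuracy_rate = words_correct_per_minute(150, 6, 60)
print(rate, accuracy_rate)
```

Counting errors requires listening to the reader, so the accuracy-adjusted rate applies to oral reading; in silent reading only the first measure can be obtained.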
This scarcity hampers identification of adolescents with reading issues in the school environment, as well as the clinical diagnosis and assessment of effectiveness of the interventions proposed. Thus, this study is aimed at presenting the process of developing an instrument for the collective assessment of fluency and comprehension of textual reading for secondary elementary school students, besides verifying the effect of schooling on their performance.

METHODS
This is a cross-sectional analytical observational study, with convenience sampling, approved by the Institution's Research Ethics Committee under protocol No. 1,722,230.

Participants
The research included 100 Brazilian students of both sexes, aged 11 to 15 years, regularly enrolled at two public secondary elementary schools in the municipality of Belo Horizonte, state of Minas Gerais, divided into four classes: 6th grade (n = 32), 7th grade (n = 24), 8th grade (n = 26), and 9th grade (n = 18).
The study included students who were considered by the teachers as not presenting learning difficulties in relation to academic performance. Such students signed the Informed Assent Form, and their parents/legal guardians signed the Informed Consent Form. Exclusion criteria were students with current or previous history of neurological or psychiatric disorders and oral and written language changes, reported by their guardians, as well as individuals with uncorrected visual or hearing disorders.
The process of developing the instrument also involved the participation of two judges. The judges were invited because of their extensive experience in written language and in the development of neuropsychological tests.

Instrument and procedures
Several steps were performed to develop the collective reading comprehension assessment instrument for students in 6th to 9th grade of elementary school. More specifically, the process involved the seven steps presented below.

Step 1: Initial text selection
Two researchers, both speech-language therapists, selected seven narrative texts from Brazilian Portuguese textbooks used in the municipal secondary elementary school of the municipality where the research was conducted.

Step 2: Text selection by a judge
After the initial selection, the texts were submitted to a judge with extensive experience in the written language area, who chose two texts.

Step 3: Computational analysis of texts selected
Both texts were analyzed regarding complexity using the Coh-Metrix-Port 2.0 computational tool (16). The tool analyzed text ambiguity, phrasal structure, semantics, syntax and length, as well as the Flesch Reading Ease Score (FRES), which measures the complexity of the text, its accessibility in terms of linguistic material and the ease with which the reader interacts with the information made available by that text (16). The Coh-Metrix-Port formula for the Flesch Reading Ease Score is given below, where ASL is the average sentence length (the number of words divided by the number of sentences) and ASW is the average number of syllables per word (the number of syllables divided by the number of words):
FRES = 206.835 - (1.015 x ASL) - (84.6 x ASW)
The results allow identifying four ranges of reading difficulty for Brazilian Portuguese: very easy texts (index between 75-100), suitable for readers at the beginning of primary elementary school; easy texts (index between 50-75), suitable for secondary elementary school students; difficult texts (index between 25-50), suitable for students attending high school or university; and very difficult texts (index between 0-25), generally suitable for specific academic areas.
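As an illustration only, the formula and the four difficulty ranges described above can be sketched as follows. The coefficients are those stated in the text; the function names, the example counts, and the handling of boundary values (a score of exactly 75, 50 or 25 is assigned here to the easier band, since the stated ranges share their endpoints) are assumptions, and the sketch does not reproduce how Coh-Metrix-Port counts words, sentences or syllables.

```python
def flesch_reading_ease(n_words: int, n_sentences: int, n_syllables: int) -> float:
    """FRES computed with the coefficients stated in the text."""
    if n_words <= 0 or n_sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    asl = n_words / n_sentences   # average sentence length
    asw = n_syllables / n_words   # average syllables per word
    return 206.835 - (1.015 * asl) - (84.6 * asw)


def difficulty_band(fres: float) -> str:
    """Map a score to one of the four ranges (boundary handling is an assumption)."""
    if fres >= 75:
        return "very easy"
    if fres >= 50:
        return "easy"
    if fres >= 25:
        return "difficult"
    return "very difficult"


# Hypothetical counts for a short passage: 100 words, 5 sentences, 150 syllables.
score = flesch_reading_ease(100, 5, 150)
print(round(score, 3), difficulty_band(score))
```

With these hypothetical counts, ASL is 20 and ASW is 1.5, so the score falls in the range described as suitable for secondary elementary school students.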

Step 4: Narrative text final selection
Subsequently, the texts were analyzed by another judge with experience in elaborating neuropsychological tests. The judge was informed of the difficulty classification of the texts analyzed and of the language used, and selected the text she deemed most appropriate for the school age.
Based on all analyses, the text "Por que o morcego só voa à noite" was chosen (17). It is a fable (narrative form), an African folktale from juvenile literature.

Step 5: Elaboration of questions
For the elaboration of questions, significant propositions that are explicit or implicit in the text were analyzed. The literal or inferential information was verified, as well as the causal relations between these ideas.
The elaboration of the questions was performed by two researchers, using as reference the reading comprehension model (18), and the questions were classified as literal or inferential. Five literal and five inferential questions were elaborated. To elaborate the multiple choice items, a script with criteria for the elaboration of objective multiple choice questions was used (11). Care was taken to avoid questions that led to obvious answers or answers easily deducible from the reader's world knowledge, that is, without having to access knowledge acquired from reading the text.

Step 6: Analysis of questions by expert judge
After elaboration of the questions and the multiple choice items, the instrument was sent to a judge specialized in reading and writing, who judged the adequacy of the questions regarding quality, content proposed, and classification of the questions as literal or inferential. Adaptations were made according to the judge's considerations, including rewriting questions that had two possible answers and improving the clarity of question stems and multiple choice items that could give rise to doubts.

Step 7: Procedure of instrument application on the sample selected
The application was collective, per school year. Participants were initially informed that the assessment would be performed in two steps: first, silent reading of the text, and then answering the questionnaire related to the text. They were instructed by an examiner as follows: "Read the text very carefully. Start reading the text silently and, when I ask you, stop reading and mark the word you are reading with an X. When I signal again, go back to reading from the marked word." The examiners timed 60 seconds on a digital timer and instructed the adolescents to mark the word they were reading as soon as the examiner gave the signal. After the examiners confirmed the word marking with the students, the students resumed reading the text within 15 seconds after the pause. At the end of the reading task, the text was collected and the sheet with the text interpretation questions was handed out. The students were instructed as follows: "Read the questions related to the text and mark the answer that you think is correct. Only one alternative will answer the question." The classes consisted of approximately 20 students and were monitored by two examiners. The average time to apply the test was 30 minutes; participants were asked to raise their hands for an examiner to collect the evaluation sheet and to leave the application site as soon as they completed the test.

Data analysis
The analysis was performed considering the total of correct answers and the totals of correct answers in literal and in inferential questions. Reading rate was also calculated, considering the number of words read per minute. Statistical analyses of textual reading fluency (measured by the number of words read per minute) and reading comprehension (measured by questionnaire performance) were conducted using IBM SPSS Statistics 20.0, with a significance level of 5% (0.05). For sample characterization, descriptive analyses of mean and standard deviation were carried out for the four groups per school year.
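A minimal sketch of how the three comprehension totals could be tallied for one student is shown below. The answer key and the assignment of item numbers to the literal and inferential sets are hypothetical placeholders, since the actual key is not reproduced in the text; TCA, TLQ and TIQ denote the total of correct answers in the questionnaire, in literal questions and in inferential questions, respectively.

```python
from typing import Dict

# Hypothetical answer key and item classification (placeholders only).
ANSWER_KEY: Dict[int, str] = {
    1: "a", 2: "b", 3: "c", 4: "d", 5: "a",
    6: "b", 7: "c", 8: "d", 9: "a", 10: "b",
}
LITERAL_ITEMS = {1, 2, 3, 4, 5}
INFERENTIAL_ITEMS = {6, 7, 8, 9, 10}


def score_questionnaire(responses: Dict[int, str]) -> Dict[str, int]:
    """Return TCA, TLQ and TIQ for one student's marked alternatives.

    Unanswered items are simply absent from `responses` and count as wrong.
    """
    correct = {q for q, key in ANSWER_KEY.items() if responses.get(q) == key}
    tlq = len(correct & LITERAL_ITEMS)
    tiq = len(correct & INFERENTIAL_ITEMS)
    return {"TCA": tlq + tiq, "TLQ": tlq, "TIQ": tiq}


# Hypothetical student: item 10 left blank, items 3, 7 and 9 answered wrongly.
student = {1: "a", 2: "b", 3: "a", 4: "d", 5: "a", 6: "b", 7: "a", 8: "d", 9: "c"}
totals = score_questionnaire(student)
print(totals)
```

Keeping the literal and inferential item sets separate from the key makes it straightforward to report the two subtotals alongside the overall total.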
One-Way Analysis of Variance (One-Way ANOVA) was performed, with the Bonferroni post hoc test for multiple comparisons, to verify differences in reading performance between the groups per grade level (6th, 7th, 8th and 9th grades). Subsequently, two groups were formed: Group 1 (6th and 7th grades) and Group 2 (8th and 9th grades), since there was no difference between 6th and 7th grades or between 8th and 9th grades. To compare the means of the two groups in each reading component, the t-test for independent samples was used.
Test power and effect size were assessed using Cohen's d. To evaluate the association between performance in the reading fluency and reading comprehension tasks and age, Pearson's correlation coefficient was applied.
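For readers unfamiliar with the procedure, the sketch below computes the one-way ANOVA F statistic by hand and the per-comparison significance threshold implied by a Bonferroni correction. It is an illustration under stated assumptions, not the SPSS routine used in the study: the function names and scores are hypothetical, and p-values are omitted because they require the F distribution.

```python
from statistics import mean
from typing import Sequence


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> float:
    """F = (between-group mean square) / (within-group mean square)."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def bonferroni_alpha(alpha: float, n_groups: int) -> float:
    """Per-comparison threshold when every pair of groups is compared."""
    n_comparisons = n_groups * (n_groups - 1) // 2
    return alpha / n_comparisons


# Hypothetical scores for three small groups.
f_stat = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
threshold = bonferroni_alpha(0.05, 4)  # four grades give six pairwise comparisons
print(f_stat, threshold)
```

With four grade levels there are six pairwise comparisons, so a 5% overall level corresponds to a threshold of about 0.0083 per comparison; statistical packages often present the same correction by multiplying each p-value by the number of comparisons instead.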
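The two statistics named above can be sketched as follows. The text does not state which variant of Cohen's d was computed, so the pooled standard deviation version shown here is an assumption; the function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def cohens_d(group1: Sequence[float], group2: Sequence[float]) -> float:
    """Standardized mean difference using the pooled standard deviation (assumed variant)."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group2) - mean(group1)) / sqrt(pooled_var)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's product-moment correlation coefficient."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for two groups and a perfectly linear pair of variables.
d = cohens_d([1, 2, 3], [3, 4, 5])
r = pearson_r([1, 2, 3], [2, 4, 6])
print(d, r)
```

A positive d here means the second group scored higher; the sign convention is a choice made for this sketch.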

RESULTS
In step 1, analysis of Brazilian Portuguese didactic books used in a secondary elementary level of a public school in the municipality where the study was conducted resulted in the selection of seven narrative texts.
The seven texts selected were sent to a judge, who elected two of them after careful analysis.
Then, in step 3, the two texts selected were analyzed using the Coh-Metrix-Port 2.0 computational tool (16). The tool investigated text ambiguity, phrasal structure, semantics, syntax and length, and the Flesch Reading Ease Score (FRES). The FRES of the texts was rated 50-75 and considered suitable for secondary elementary school students.
In step 4, the texts were analyzed by another judge. Based on all analyses, the text "Por que o morcego só voa à noite" was selected (17), a 448-word narrative text with a FRES of 64.4, equivalent to the level "easy" to read and within the educational level of the instrument.
In step 5, analysis of propositions made it possible to identify literal and inferential information and causal relations between the ideas in the text. From these analyses, five literal and five inferential questions were elaborated.
In step 6, the analysis of a judge was performed, as well as necessary adaptations to the multiple choice questions.
Chart 2 presents the multiple choice questions elaborated and applied.
In step 7, the descriptive analyses of the performances in textual reading fluency (words read per minute) and reading comprehension (total of correct answers in the questionnaire, total of correct answers in literal and inferential questions) are presented in Table 1.
The data in Table 1 show an increase in the number of words read per minute as schooling progresses. The comparative analysis between school years showed that there was a statistically significant difference only between 7th and 8th grades in the total of correct answers in the questionnaire (p = 0.005, d = 0.60), in
No differences were detected between 6th and 7th grades or between 8th and 9th grades in any component of the reading assessment. Based on these results, it was decided to form two groups to analyze the data: Group 1 (6th and 7th grades), n = 56, and Group 2 (8th and 9th grades), n = 44. Table 2 presents the groups' performance, the comparison of means and the effect size for reading fluency and comprehension.
The comparison between Groups 1 and 2 (Table 2) showed a significant difference in the total number of correct answers in the questionnaire, in the total number of correct answers in literal questions and in the total number of correct answers in inferential questions, with better performance for the students with more schooling (Group 2). The effect size was considered strong (d = 2.52). The number of words read per minute did not differ between the groups.
Table 3 shows that there was an association between textual reading fluency (words read per minute) and reading comprehension (grand total, literal and inferential questions). A positive association of moderate intensity (r = 0.302; p ≤ 0.01) was observed between the number of words read per minute and the total of correct answers in the questionnaire.
Caption: WPM: number of words read per minute; TCA: total of correct answers in the questionnaire; TLQ: total of correct answers in literal questions; TIQ: total of correct answers in inferential questions

DISCUSSION
In this study, the development of an instrument for collective assessment of reading fluency and reading comprehension of secondary elementary school students was described, as well as the results of the application of such instrument. The effect of schooling on performance in this instrument was also verified.
To elaborate the instrument proposed by this study, the same text and multiple choice questions were used for all school levels evaluated. The elaboration of the instrument involved seven steps. The text was selected in steps 1 to 4. A narrative text was selected due to the students' greater familiarity with this text type. The literature shows that students may find it easier to interpret a narrative text (19), since narratives allow the reader to recreate his/her knowledge. In addition, they present chronological development: events occur in a certain order, which facilitates reading organization and comprehension (19).
In steps 5 and 6, the questions were elaborated and the significant propositions that were explicit or implicit in the text were analyzed. Literal and inferential information and the causal relations between the ideas were verified. The analysis was performed due to the need for the instrument to contain literal and inferential questions, since the literature proposes that text comprehension occurs at three levels (19). The first would be superficial understanding, called base text, which allows the reader to remember the text, summarize key ideas, and answer questions about the content. The second level is the integration of the information contained in the text with the reader's prior knowledge. The third level would be self-regulation, which allows the reader to identify problems that occur in the text and seek solutions to them (19).
Multiple choice questions were chosen in order to eliminate interference from the linguistic demands of open-ended questions and retelling tasks, for example. The literature shows that multiple choice questions allow evaluating the skills involved in comprehension, enabling investigation of the contextual meaning of words, the author's intention and access to literal and inferential information contained in the text (20).
According to the results of the performance analysis (step 7), there was a difference only between 7th and 8th grades regarding answers to the multiple choice questions (reading comprehension), with better performance for the 8th grade students. No effect of schooling on textual reading fluency was observed. In an attempt to increase the sample size in each group and better analyze performance differences between schooling levels, the students were grouped into 6th and 7th grades (Group 1) and 8th and 9th grades (Group 2). The effect of schooling was maintained for answers to the multiple choice questions, with a medium effect (21). The results indicate that the power of the assessment proposed is moderate for the groups studied (21). The number of words read per minute did not differ between the groups. However, the effect size was strong, which may indicate that a difference between the groups may appear if the number of participants is greater.
Multiple choice tests have many advantages, such as their objectivity, easy administration and scoring, and the possibility of group application, which is very useful for educational practices in the school environment. However, it is important to point out that such instruments, although providing clearer and more direct analysis of the answers, are subject to random guessing among the alternatives. Another problem pointed out by the literature is the possibility of answering some questions without actually reading the passage, since the reader's world knowledge may be used (22). Bearing these issues in mind, the instrument was built to minimize such problems.
With regard to the comparison between the groups, the analysis revealed increased reading fluency (assessed by the number of words per minute) with advancing education, although no statistically significant difference was found. The absence of differences between the groups may point to the stabilization of textual reading fluency in adolescence. The literature shows increased reading fluency with schooling and advancing age (23). Nevertheless, the average reading fluency of adolescents in this study is below the average observed in other studies with readers of American English, European Portuguese, and Brazilian Portuguese (5,24,25). All the studies cited also used texts to evaluate reading fluency; however, the texts differed regarding type, ambiguity, phrasal structure, semantics, syntax, and length. Another methodological difference was the way the assessment was carried out. In this study, adolescents were evaluated collectively through silent reading, in which the main objective was comprehension. In the referenced studies, reading was performed orally in order to evaluate reading fluency and decoding. Cultural, educational and socioeconomic factors may also have interfered with these results, given that the Human Development Index (HDI), which evaluates human development in three dimensions (income, education and health), differs considerably among these three countries: the United States ranks 10th, with an HDI of 0.920; Portugal ranks 41st, with an HDI of 0.843; and Brazil ranks 79th, with an HDI of 0.744 (26).
The Program for International Student Assessment (PISA, 2015) assesses the knowledge and skills of students aged 15 to 16 years in reading, math and science, allowing comparison with the performance of students from other countries. The PISA test assesses students' mastery of three aspects of reading: locating and retrieving information; integrating and interpreting; and reflecting and analyzing. Among the 70 nations participating in the assessment, Brazilian students ranked 59th in reading. The average performance fell for the second time since 2009 (27). Given this reality, the development of validated and standardized instruments that allow teachers to monitor students' progress throughout the year becomes one way of continuously monitoring reading development, besides providing an early indication of difficulties to support referrals, educational strategies and guidance of individuals and groups at risk of learning issues.
When comparing Groups 1 and 2, it is possible to observe better performance of older students when answering literal questions and questions that required the construction of inferences. This indicates that, with increased education, there is improvement in deductive reasoning ability, since students tend to develop and improve reading processes as they develop the techniques and practice this skill. Other studies have also shown improved reading comprehension with increased education (5,25,28).
The results show positive correlations between reading fluency and reading comprehension with a moderate correlation magnitude. This may indicate that there is a relation between textual reading fluency and reading comprehension, as evidenced in the literature (5,29).
Several studies show that as the reader reaches a functional level of decoding, the importance of reading fluency for reading comprehension decreases (23,30). Another study, conducted with students from 1st to 6th grade (current 2nd and 7th grades according to the Ministry of Education, 2005), revealed that reading fluency is not the most significant factor for reading comprehension in 6th grade, and it is more important in early grades (30). Thus, it is relevant to consider age and educational level of the population studied when coming to a conclusion on the processes involving reading and their associations.
The assessment of reading comprehension is paramount, since it detects possible difficulties presented by adolescents, which may generate several negative consequences during school years. This instrument can assist in the assessment of this target audience by various professionals involved in reading learning (teachers, speech therapists, psychopedagogy professionals, psychologists, neuropsychologists, etc.). It can help educators regarding teaching proposals, monitoring the evolution of reading and comprehension skills and identifying possible reading difficulties, associated with assessments of other skills, and then verify the need for making specific referrals; and it can help clinicians by evaluating and detecting difficulties in order to monitor the evolution of the case, allowing the professional to elaborate and direct the interventions to be performed.
The great advantage is the possibility of collective administration of the instrument, which can be conducted in the classroom or in assistance groups, facilitating application and interpretation of results, useful in clinical and educational actions. Moreover, the instrument proposed here allows online and offline measure extraction at the same time, since reading time is monitored and multiple choice questions are presented after reading.
Due to the limitations of the study, such as a small sample drawn from only one type of school, the need for the instrument to be applied to a greater number of adolescents is emphasized. With such data, it can present recognized reliable and valid measures, in order to provide the researcher, the teacher and the clinician with the possibility of detecting adolescents with comprehension issues, aiming to adequately direct the interventions that will be used and thus gather evidence that will support their scientific, clinical and scholarly reasoning. It is also suggested that further studies be conducted with adolescents from public and private schools, with and without learning disabilities, in order to validate, standardize and increase the reliability of the instrument experimentally applied in this study.

CONCLUSION
This article aimed to present the development process and the pilot study of an instrument to evaluate textual reading fluency and reading comprehension in adolescents. The analyses carried out made it possible to investigate the results of the development of an instrument for collective assessment of reading comprehension of students in 6th to 9th grade, verifying the evolution of the performance of adolescents over the school years.
The results evidenced increased reading comprehension performance as schooling progressed. The instrument presented statistically significant results with medium effect, and reading fluency showed positive and moderate correlations with reading comprehension.
Further research using this instrument is needed to increase the number of students evaluated and to make comparisons between public and private institutions, and between typical adolescents and adolescents with oral and written language disorders.