Reliability, Validity and Standardization of the Reading Test: Sentence Comprehension

Porto, Portugal
ABSTRACT – The test called ‘Reading Test: Sentence Comprehension (TELCS)’ has been validated and standardized. Participants (N = 1289, 2nd to 5th grade, 7 to 11 years old) were stratified in 15 state schools in Brazil. The TELCS demonstrated reliability and validity to classify reading performance by both school grade and chronological age. Correlations between the TELCS and a General Reading Composite score were high, as were those with reading accuracy rates of word and pseudoword. Cluster analysis suggested a five-class solution: reading disability, below, average, above, and high reading performance. For individual or collective use, TELCS can quickly screen the sentence reading ability, useful to identify those who might need

The Reading Test: Sentence Comprehension [Teste de Leitura: Compreensão de Sentenças (TELCS)] evaluates reading comprehension ability. The TELCS is an adaptation of Lobrot's Lecture 3 reading test (L3) (Lobrot, 1967, 1980) to the Brazilian Portuguese language and socio-cultural context. The test demands efficient phonological and lexical word recognition, knowledge of individual word meaning, and executive functions, mainly working memory (Medina et al., 2018). The TELCS has been listed as a scientifically based tool to assess the reading comprehension capacity of children (Salles & Paula, 2016).

Background of the TELCS
The L3 test is used to evaluate the silent reading ability of French-speaking children and is part of the writing and reading ORLEC [orthographe (OR) and lecture (LEC)] battery proposed by Lobrot (1967, 1980). It consists of 36 incomplete sentences, each followed by a choice of five words to complete it. Only one of the five words is the correct answer (target word). The remaining four are incorrect alternatives (distractors) that relate to the target word through visual, phonological, or semantic proximity or distance. The sentences are presented in order of increasing difficulty (number of letters and syntactic complexity), with a time limit of five minutes to complete the test.
Since its creation, the ORLEC has been validated and updated norms have become available (Génard et al., 1998; Mousty & Leybaert, 1999; Piérart & Grégoire, 2004). Mousty and Leybaert evaluated 217 monolingual French-speaking children in the 2nd and 4th school years in Belgium. The L3 test demonstrated good sensitivity for these grades, as neither a floor effect in the second year (only 10% completed fewer than 5 items correctly) nor a ceiling effect in the fourth year (only 10% of children completed more than 30 items correctly) was observed. Piérart and Grégoire tested 2989 French-speaking Belgian elementary school children from the 3rd to the 6th grade, provided updated norms for the L3, and demonstrated its high consistency and good reliability. As gender differences in scores were found in the 3rd and 5th grades, specific standardized and percentile norms for boys and girls were generated.
The TECLE (Carrillo & Marín, 2009; Marín & Carrillo, 1999) has been used since 1997 to screen for delayed reading in Castilian-speaking children. Although it has many similarities to the L3, it has a larger number of items (64 against 36), fewer alternative words for completing the sentence (four against five), and at least one pseudoword as a distractor for each item (in the L3 all distractors are words). The TIL (Sucena & Castro, 2010), the European Portuguese adaptation of the L3, is more similar to the L3 than the TECLE, with the same number of items as the original French version and a similar structure. Normative data (score to percentile) were provided for a sample of 614 children (8 to 11 years old) and 185 college students (18 to 48 years old) (Fernandes et al., 2017; Sucena & Castro, 2010).

Adaptation of the L3 Test to Brazilian Portuguese
Reasons for departing mainly from the original French L3 test for use in Brazil, rather than from its European Portuguese adaptation, included: 1) the differences between European and Brazilian Portuguese in syntax and vocabulary; 2) the lack of detail in the available literature about how variables were controlled in the adaptation of the TIL; and 3) the fact that using the original French version as the main reference would facilitate comparison between the three versions (L3, TIL, and TELCS).
The TELCS was adapted to Brazilian Portuguese from the original L3 test in a process that provided evidence of content validity through the following steps. First, the sentences and the target words were translated from French to Brazilian Portuguese by two independent psychologists, proficient in both languages and knowledgeable about the test content. A conceptual rather than a strictly literal translation was done, taking into account the Brazilian cultural-linguistic context. Second, the distractors (incorrect alternatives) of the L3 Test were classified by their visual, phonological, or semantic proximity or distance to the target word, sentence, and other distractors; this classification was necessary because no detailed information was available in the published materials of the L3. Additionally, to prevent a given alternative from guiding the response due to its greater familiarity, the selection of the Brazilian Portuguese distractors took into account the 'frequency of occurrence of words' using the Word Frequency Count in Written Brazilian Portuguese (Pinheiro, 2015).
For comparison purposes, these first two steps also took into consideration the TIL Test (the European Portuguese adaptation of the L3). The proximity between this version and the Brazilian one was maintained as much as possible. In the third step, content equivalence was confirmed by a blind reverse translation procedure, in which the translators (a Brazilian French teacher who is also a psychologist, and a native French speaker highly proficient in Portuguese) had no access to the original version of the L3.
In the fourth step, the level of proximity between the TELCS and L3 was ensured by maintaining in the Brazilian adaptation 26 sentences with the same meaning expressed in the L3's sentences, as well as the L3's length in number of letters (3598 letters in the L3 and 3284 in the TELCS). As for the proximity between TIL and L3, the Portuguese adaptation has 22 sentences with the same meaning as the L3 and it is shorter in length (3118 letters). The alteration of meaning in the remaining 14 items of TELCS in relation to the French original version varied from minor (n = 8) to major (n = 6) and in both cases the changes were due to ethical reasons (e.g. items with violent content) and to a search for precision or contextual adjustment (e.g. 'horse running' was changed to 'car race', as Brazilian children do not encounter horseracing).
The absence of Brazilian tests of reading comprehension poses problems for new test development in this area, because the lack of gold standards limits concurrent validation. Moreover, clinical practice is hampered, since the lack of validated tests encourages intuitive and inadequate procedures that disregard evidence-based practice. Among other purposes, evidence-based practice recommends integrating professional experience with scientifically proven knowledge to make clinical work as objective as possible, granting efficacy and safety to evaluations and therapeutic interventions (El Dib, 2007).
Aside from the theoretical and practical relevance, establishing reliability, validity and standards for tools that measure reading comprehension of elementary school children is especially important in Brazil due to limitations of the currently available tests. To confirm that the TELCS is an accurate measure of sentence reading comprehension, the present study had the following aims: (a) to show evidence of reliability; (b) to show evidence of internal structure validity; (c) to show evidence of external validity; and (d) to provide standardized norms for the 2nd to 5th grades and for ages 7 to 11 years.

METHOD
Participants
All participants provided informed consent, and the Research Ethical Committee of the Universidade Federal de Minas Gerais approved all procedures in the study (identification number CAAE: 17754514.6.0000.5149), which was conducted in full accordance with the Code of Ethics of the World Medical Association (Declaration of Helsinki, 2008) for research involving human subjects.
Fifteen state schools were stratified and randomly selected from a list, provided by the Regional Superintendent of Education, of all institutions registered in the city of Belo Horizonte, Brazil. Participants were children (N = 1289; 48% boys; age range = 7-11 years; 2nd to 5th grade), all native speakers of Brazilian Portuguese (see Table 1). The sample was collected at the end of the school year (November) of 2015 (n = 484, Table 2) and of 2018 (n = 805). A subgroup (n = 484; 49% boys; age range = 7-11 years; 2nd to 5th grade; see Table 2), composed of six children (three boys and three girls) randomly selected from the attendance list of each of the 82 classrooms stratified across the city, completed a battery of six instruments that evaluate reading ability, general cognitive ability, and social behaviour. The teachers (N = 82) completed a behaviour scale for each of the participants (n = 466). As there were no inclusion or exclusion criteria, each school grade contained a range of ages due to natural birthday-determined variation and to grade retention (Tables 1 and 2).
The required sample size was estimated from the following parameters: a margin of error of ±5%; a confidence level of 95%; a population proportion of 0.5; and a target population of 157,875 children enrolled in primary education in the city of Belo Horizonte. The suggested random sample size was 384 children. The sample was then increased to provide more power to the analyses, reaching 1289 participants, more than three times the required sample size.
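The sample size estimate can be reproduced with a short sketch. The text does not name the formula used, so the code below assumes Cochran's formula for a proportion with a finite population correction, rounded up; the function name and the z value of 1.96 for a 95% confidence level are illustrative choices.

```python
import math

def required_sample_size(population, margin=0.05, z=1.96, proportion=0.5):
    """Cochran's sample size for a proportion, with finite population correction.

    Assumed formula (not stated in the text):
      n0 = z^2 * p * (1 - p) / e^2
      n  = n0 / (1 + (n0 - 1) / N), rounded up
    """
    n0 = (z ** 2) * proportion * (1 - proportion) / margin ** 2
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

# Parameters reported in the text: N = 157,875, e = 0.05, p = 0.5, 95% confidence
print(required_sample_size(157_875))  # 384 under these assumptions
```

Under these assumptions the result matches the 384 children reported, and the final sample of 1289 is about 3.4 times that figure.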
The TELCS, printed on both sides of an A4 page, is composed of 36 isolated sentences (varying from 8 to 20 words), with the last word always omitted. Each item offers five words displayed in multiple-choice format, only one of which fits the meaning of the sentence. The alternative words relate to each other in terms of visual similarities (e.g. number of letters, shared letters), phonological similarities (e.g. same alliteration or rhyme), or semantic similarities (e.g. belonging to the same semantic category, such as type of profession). Other studies have provided additional evidence of the TELCS's content validity, internal structure validity (schooling effect), and external validity (concurrent, criterion) (Machado & Maluf, 2019; Medina et al., 2018; Pinheiro, Vilhena & Santos, 2017).
The Word Recognition Test (WRT) and the Pseudoword Recognition Test (PWRT) (Pinheiro, 2013) evaluate orthographic and phonological processing, respectively. Both the WRT and the PWRT consist of 4 training and 88 isolated test items each, which must be read aloud as quickly as possible. The words vary in frequency of occurrence, with equal numbers of high- and low-frequency items. Within each frequency level, the words further vary in grapheme-phoneme correspondence (regular and irregular words) and in length (4 to 8 letters). The pseudowords were constructed with the same orthographic structure and length as the stimuli used in the word test. Cogo-Moreira et al. (2012) demonstrated that the WRT and PWRT correlated highly with each other (r = .92, p < .001) and with a text-reading accuracy measure (r = .87-.92; p < .001). Schooling effects, via Tobit regressions adjusted for the clusters of 10 schools, provided evidence of internal validation for the WRT (third grade β = 6.62, p < .01; fourth grade β = 10.56, p < .01) and the PWRT (third grade β = 4.45, p < .001; fourth grade β = 6.77, p < .001).
The Text Reading Comprehension subtest of the PROLEC Battery (PROLEC-Text) (Capellini et al., 2012) was used to evaluate semantic processes. It consists of four short texts, each followed by four literal questions. Given the type of questions the PROLEC-Text employs to assess comprehension, it is considered here a measure of literal comprehension. This construct is the first and shallowest level of text comprehension, involving little interaction with the text, as it requires only the extraction of information explicitly stated in a passage (Saadatnia et al., 2016).
Raven's Coloured Progressive Matrices Test (CPM) (Angelini et al., 1999) was used to measure general cognitive ability through the evaluation of analogical reasoning, which is the ability to infer relations between objects or elements (Pasquali et al., 2002). The test consists of 36 items, divided into three groups of 12 items (A, AB, B) organized with increasing difficulty. The task is to complete a figure at the top of a sheet with one of the six options printed below, which involves understanding that the images are characterized by their differences, similarities, identity, change, symmetry, and orientation.
Discriminant validity was assessed by the single-sided Brazilian version of the Strengths and Difficulties Questionnaire (SDQ) (Fleitlich et al., 2000), a behavioural screening instrument covering the age range 4 to 16, to be answered by the teachers. The instrument has 25 items divided into five scales: prosocial behaviour (empathy/positive relations), emotional symptoms (anxiety/mood), conduct problems (aggression/delinquency), hyperactivity/inattention, and peer relationship problems (withdrawn/social problems). The instrument has adequate indices of reliability and validity in 21 countries, including Brazil (Saur & Loureiro, 2012).

Procedures
Apart from the SDQ, which the teachers answered over a period of one week, all instruments were administered on the same day, in two sessions, each lasting an average of 15 minutes. In the first session, groups of up to 10 children completed both the TELCS and the CPM; in the second, each child was individually presented with both the WRT and PWRT (administered in sequence, but in random order), followed by the PROLEC-Text. All instruments were administered by a professional psychologist and six undergraduate students of Psychology.
The TELCS was administered with a training phase composed of four items: the first two were answered collectively after being read aloud by the researcher, and the other two individually, via silent reading. The 36 test items were also read in silence by each child, as quickly as possible, within a maximum of five minutes and with no assistance. One point was scored for each correct answer and zero for each incorrect or omitted answer (maximum: 36 points).
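The scoring rule can be expressed as a minimal sketch. The function name, the use of `None` for omitted items, and the answer key in the usage example are hypothetical; the actual TELCS key is not reproduced here.

```python
def score_telcs(responses, answer_key):
    """Return the raw score: one point per correct answer, zero otherwise.

    `responses` holds the chosen alternative per item, or None when the
    item was omitted (for example, not reached within the five minutes).
    """
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer_key must have the same length")
    return sum(1 for given, correct in zip(responses, answer_key) if given == correct)

# Hypothetical key and answer sheet, for illustration only
key = ["a"] * 36
sheet = ["a"] * 20 + ["b"] * 10 + [None] * 6
print(score_telcs(sheet, key))
```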
In both the WRT and PWRT, participants were asked to read aloud each item of each test card (printed on an A4 page, Arial font, size 14), starting from the first row from left to right. The reading time and errors were registered. Time to read WRT ranged from 48 seconds to six minutes (average of 127 seconds) and PWRT from 53 seconds to nine minutes (average of 175 seconds). On both instruments, two measures were used: accuracy, which is the total number of correctly read words or pseudowords, and accuracy rate, which is the total number of correct words or pseudowords read per minute.
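The two measures differ only in whether reading time enters the calculation. A minimal sketch of the accuracy rate as defined above (correct items per minute) follows; the function name and the example values are illustrative and are not data from the study.

```python
def accuracy_rate(correct_items, reading_time_seconds):
    """Correctly read items per minute of reading time."""
    if reading_time_seconds <= 0:
        raise ValueError("reading time must be positive")
    return correct_items / (reading_time_seconds / 60.0)

# Illustrative values: 80 of the 88 items read correctly in 120 seconds
accuracy = 80                      # untimed accuracy measure
rate = accuracy_rate(accuracy, 120)
print(accuracy, rate)
```

Two children with the same accuracy can therefore receive very different accuracy rates, which is why the timed measure is treated separately in the correlations.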
Regarding the PROLEC-Text, the stories were administered in a fixed order, after the following statement: 'I will display a small text for you to read. Read it carefully because after you finish I will ask you some questions about it.' The participant was asked to read each story quietly, without time limit, and to respond orally to open questions (also asked orally) immediately after reading each text. No rereading was allowed.
The CPM was individually administered to 2nd-year participants, and the collective form was used for children from grades 3 to 5. It was presented as a puzzle game: the first two items were introduced collectively and explicitly, with subsequent items answered without assistance. There was no time limit. No child spent more than 12 minutes to complete the test.

Data Analysis
All analyses were performed using SPSS version 21.0 (IBM, Chicago, Illinois). No outliers were detected using the outlier labeling rule with a g value of 2.2. Test reliability was analysed by Cronbach's alpha, Spearman-Brown split-half coefficient and Test-Retest (internal consistency indexes). Internal validity (schooling and age effect) was assessed by hierarchical two-step cluster analysis and by univariate analysis (ANOVA), corrected for family-wise error with Bonferroni. For the investigation of the TELCS' internal validity and population distribution scores (Figures 1 and 2), skewness and kurtosis values were divided by their respective standard errors, with absolute ratios higher than 1.96 considered significant (Cramer & Howitt, 2004).
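Two of the screening steps above can be sketched in a few lines. The code assumes the usual form of the outlier labeling rule (fences at Q1 - g(Q3 - Q1) and Q3 + g(Q3 - Q1)) and the adjusted sample skewness with its standard error for a normal population; the quartile method of Python's `statistics.quantiles` may differ slightly from the one used by SPSS, and all function names are illustrative.

```python
import math
import statistics

def outlier_bounds(values, g=2.2):
    """Lower and upper fences of the outlier labeling rule."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    return q1 - g * spread, q3 + g * spread

def find_outliers(values, g=2.2):
    """Values falling outside the fences."""
    low, high = outlier_bounds(values, g)
    return [v for v in values if v < low or v > high]

def skewness_z(values):
    """Adjusted sample skewness divided by its standard error (needs n >= 3)."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    skew = n / ((n - 1) * (n - 2)) * sum(((v - mean) / sd) ** 3 for v in values)
    se = math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return skew / se

# Illustrative values only, not data from the study
print(find_outliers(list(range(1, 12)) + [100]))
print(abs(skewness_z([1, 1, 1, 2, 2, 3, 10])) > 1.96)
```

The same ratio test applies to kurtosis, using the kurtosis estimate and its own standard error.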
External validity (relations between the TELCS and all reading measures, general cognitive ability, behaviour, and demographic variables) was assessed by Pearson Correlations. Since the TELCS evaluates reading competence as a whole, a dimension reduction by principal components analysis (Carreira-Perpiñán, 1997) was performed to incorporate three reading measures (PROLEC-Text and accuracy rates of the WRT and PWRT) to create a robust reading variable, the General Reading Composite.
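As a rough illustration of how a composite of this kind can be built, the sketch below standardizes three measures and projects them onto the first principal component of their correlation matrix, found here by power iteration. This is a simplified stand-in for the SPSS procedure, not a reproduction of it: SPSS typically rescales component scores to unit variance, and every name and data value below is hypothetical.

```python
import math
import statistics

def zscores(values):
    """Standardize a list of values using the sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def first_principal_component(columns, iterations=200):
    """Return (weights, scores) for the first principal component.

    `columns` is a list of equally long lists, one per measure.
    """
    z = [zscores(col) for col in columns]
    k, n = len(z), len(z[0])
    # Correlation matrix of the standardized measures
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1) for j in range(k)]
            for i in range(k)]
    # Power iteration converges to the leading eigenvector
    weights = [1.0] * k
    for _ in range(iterations):
        nxt = [sum(corr[i][j] * weights[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in nxt))
        weights = [x / norm for x in nxt]
    scores = [sum(weights[j] * z[j][i] for j in range(k)) for i in range(n)]
    return weights, scores

# Hypothetical scores for six children (not data from the study)
prolec_text = [4, 7, 9, 11, 13, 15]
wrt_rate = [12, 25, 33, 41, 50, 62]
pwrt_rate = [10, 18, 22, 30, 35, 40]
weights, composite = first_principal_component([prolec_text, wrt_rate, pwrt_rate])
print(weights)
print(composite)
```

Because the three measures are standardized first, none of them dominates the composite merely because of its measurement scale.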
In order to classify a given participant according to his or her reading ability, several statistically distinct latent groups were identified in the sample via hierarchical two-step cluster analysis. This method assumes that the distance between two clusters is equivalent to the decrease in log-likelihood function as a result of merging. The Bayesian information criterion (BIC) was established to compare the number of latent classes, in which small values correspond to a better fit. Statistical significance was set at p < .05.
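The model comparison step can be illustrated with the generic form of the criterion, BIC = -2 ln L + p ln n, where L is the likelihood, p the number of free parameters and n the number of observations. SPSS computes its two-step clustering criterion internally, so the sketch below only shows the selection logic; the log-likelihoods and parameter counts are made-up placeholders, not results from the study.

```python
import math

def bic(log_likelihood, n_parameters, n_observations):
    """Bayesian information criterion; smaller values indicate better fit."""
    return -2.0 * log_likelihood + n_parameters * math.log(n_observations)

def best_number_of_classes(candidates, n_observations):
    """Pick the class count with the lowest BIC.

    `candidates` maps a number of classes to (log_likelihood, n_parameters).
    """
    return min(candidates, key=lambda k: bic(*candidates[k], n_observations))

# Placeholder fit statistics for illustration only
fits = {
    2: (-3000.0, 4),
    3: (-2900.0, 6),
    5: (-2800.0, 10),
    6: (-2799.0, 12),
}
print(best_number_of_classes(fits, n_observations=1289))
```

The penalty term p ln n is what stops the criterion from always favouring the solution with the most classes: in the placeholder values, a sixth class improves the likelihood too little to offset its extra parameters.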

Evidence of External Validity
For concurrent external validity of the TELCS, bivariate correlations were significant (p = .001) for all reading and general cognitive ability tests (Table 3). There were strong correlations with the accuracy rates of the Word and Pseudoword Recognition Tests (r = .84 and .79, respectively) and with the General Reading Composite (r = .84), and moderate correlations with the untimed accuracy measures of the Word and Pseudoword Recognition Tests (r = .55 and .57), with the PROLEC-Text (r = .58), and with general cognitive ability (r = .50). In contrast, the external discriminant validity of the TELCS was provided by a mild correlation found with psychiatric behaviours (r = -.34). As seen in Table 3, the TELCS was the variable with the highest correlation with school grade and age, even in comparison with the CPM, which has a strong age correspondence due to neural maturation. The analysis of variance showed no difference between genders for the TELCS (F (1, 1288) = 1.32, p = .25).
[Figure 1: distribution of TELCS scores (Figure 1.a) and for each age group (Figure 1.b-f), fitted with an expected normal distribution curve.]
Table 4 presents the norms for the TELCS' standardization study (N = 1289 participants), with raw scores, corresponding percentiles, reading performance classification, and descriptive statistics. Distinct groups delineated by reading performance in the standardization were supported by a two-step cluster analysis, which suggested a five-class solution: reading disability; low; average; above average; and high performance. A univariate analysis of variance (ANOVA), with a Bonferroni correction, confirmed significant differences in scores for all five classes (p < .001).

DISCUSSION
The purpose of this study was to provide evidence of reliability, internal and external validity, as well as standardized norms for the Reading Test: Sentence Comprehension [Teste de Leitura: Compreensão de Sentenças]. Evidence of content validity, with a detailed description of the operational and constitutive definitions of the items, was provided in the adaptation study.

Evidence of Reliability
The TELCS presented strong reliability indices (α = .95; ρ = .97), which were very close to those found for the original L3 (α = .94; ρ = .98) (Piérart & Grégoire, 2004). Further evidence of reliability was provided by the test-retest measure taken in 2015 and 2018, which demonstrated the stability of the TELCS's mean scores between conditions, partially due to the representative stratified random sampling.
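For readers who want to see how the two internal consistency indices are obtained, the following sketch computes Cronbach's alpha and a Spearman-Brown corrected split-half coefficient from a matrix of 0/1 item scores. The odd/even split is an assumption (the text does not state how the halves were formed), and the tiny response matrix is invented for illustration.

```python
import math
import statistics

def cronbach_alpha(item_scores):
    """Cronbach's alpha; `item_scores` has one row per child, one column per item."""
    k = len(item_scores[0])
    item_variances = [statistics.variance([row[i] for row in item_scores])
                      for i in range(k)]
    total_variance = statistics.variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

def pearson_r(x, y):
    """Pearson correlation between two equally long lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_brown_split_half(item_scores):
    """Correlate odd- and even-numbered half scores, then apply 2r / (1 + r)."""
    odd_half = [sum(row[0::2]) for row in item_scores]
    even_half = [sum(row[1::2]) for row in item_scores]
    r = pearson_r(odd_half, even_half)
    return 2 * r / (1 + r)

# Invented responses of five children to four items (1 = correct)
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(cronbach_alpha(responses), spearman_brown_split_half(responses))
```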

Evidence of Internal Validity
Regarding validity evidence based on the internal structure, the instrument significantly distinguishes readers both by school grade (2nd < 3rd < 4th < 5th grade) and by chronological age (7 < 8 < 9 < 10 < 11 years old). The differences were considered significant, as the confidence intervals did not overlap (see also Machado & Maluf, 2019).
The results indicated that the data were normally distributed. The significant levels of skewness shown in the 2nd and 5th grades were due to floor and ceiling effects, respectively. An alarming result was the identification of children who performed poorly within the sample: 31% of the Brazilian 2nd graders completed fewer than 5 items correctly, a much worse performance than the 10% reported by Mousty and Leybaert (1999) for the same level in Belgium. On the other hand, 11.5% of the current sample of 4th graders completed more than 30 items correctly, similar to the 10% found by Mousty and Leybaert in the fourth school year. However, 24.0% of the 5th graders completed more than 30 items correctly, evidence of a ceiling effect. In the future, a reduction of the TELCS's examination time from five to four minutes (or less) should be tested to control its present overestimation of performance for 5th graders.
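The floor and ceiling figures above are simple proportions, which can be sketched as follows. The thresholds (fewer than 5 and more than 30 correct items) are those used in the text; the function name and the score list are illustrative.

```python
def floor_ceiling_rates(scores, floor_below=5, ceiling_above=30):
    """Percentage of scores below the floor and above the ceiling threshold."""
    n = len(scores)
    floor_pct = 100.0 * sum(s < floor_below for s in scores) / n
    ceiling_pct = 100.0 * sum(s > ceiling_above for s in scores) / n
    return floor_pct, ceiling_pct

# Invented raw scores for ten children in one grade
grade_scores = [0, 3, 4, 5, 12, 20, 30, 31, 35, 36]
print(floor_ceiling_rates(grade_scores))
```

Computed per grade, these two percentages give a quick check on whether the five-minute limit is too demanding for the youngest readers or too generous for the oldest.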

Evidence of External Validity
Evidence based on relationships with external variables includes concurrent, discriminant and criterion validity. For concurrent external validity, as expected, the TELCS presented moderate to strong correlations with all the comparison reading instruments. The strongest correlations (r = .84) were with the accuracy rate for word recognition and with the General Reading Composite. A moderate correlation (r = .65) has also been reported between the TELCS and the Scale of Evaluation of Reading Competence by the Teacher (EACOL), an indirect assessment of reading aloud (speed and accuracy in word recognition, prosody, and comprehension) and silent reading (text comprehension and synthesis) of schoolchildren. D'Hondt and Leybaert (2003) likewise reported a significant correlation between the L3 and a timed lexical decision task (r = .65, p < .001). These results agree with the verbal efficiency theory, which states that poor word representations and slow decoding processes (mapping orthographic to phonological representations) consume resources in working memory that would otherwise be dedicated to high-level comprehension processes (Hamilton et al., 2016; Perfetti, 1985). Quick word recognition is especially important in the TELCS, as working memory is already being engaged to complete sentences by selecting a target word among five options.
As Brazil has no gold standard sentence reading comprehension test that could be used to establish the concurrent external validity of the TELCS, it was necessary to create, via a dimension reduction technique, a General Reading Composite score to integrate the three reading measures employed in the present study (WRT accuracy rate; PWRT accuracy rate; and PROLEC-Text) into a single and comprehensive variable. The high correlation between TELCS and General Reading Composite (r = .84) can be considered the most important result of the current study, providing evidence of concurrent external validity.
Nonverbal intelligence has consistently been demonstrated to be a mild-to-moderate predictor of reading comprehension in children (Kershaw & Schatschneider, 2012; Stanovich et al., 1984). High correlations between the CPM and reading are not expected because this would tend to eliminate the causal influence of reading upon reading ability (Carver, 1990). In light of the verbal efficiency theory referred to above (Perfetti, 1985), intelligence mainly influences tests that evaluate reading efficiency, a combination of accuracy and rate measures. Carver found moderate correlations between the CPM and a reading efficiency test in students from grades 2 to 12 (range = .36 to .68, mean = .49). These values agree with the correlations found in the current study, as general cognitive ability played a moderate role in TELCS scores (r = .50).
Further evidence of concurrent external validity was provided by Medina et al. (2018), who demonstrated, in a sample of 20 children with and without a diagnosis of dyslexia, that the TELCS had a strong correlation with the reading measure (Test for School Achievement, r = .93), a strong correlation with phonological awareness (Phonemic Awareness Tasks, r = .79), and moderate to strong correlations with different components of executive functions, such as Cognitive Flexibility (Test of Tracks A: r = .51; Tracks B: r = .74), Working Memory (Working Memory Tasks: r = .82; Visuospatial Task: r = .58; Digits: r = .75; Phonological Span of Nonwords: r = .74), Inhibitory Control (Attention by Cancelling Test: r = .49; Task Go/No Go: r = .55 to .58), and Verbal Fluency (r = .59).
Concerning the psychiatric aspects, signs of behavioural problems, indexed by the SDQ, showed a negative effect on TELCS scores. This agrees with Kristoffersen et al. (2014), who reasoned that a negative association between indicators of externalizing behaviour and school outcomes can be expected. In contrast, prosocial behaviour, also measured by the SDQ, was beneficial to the reading performance of the participants. The weak correlations between reading and these negative and positive social behaviours provide evidence of discriminant validity for the TELCS, as the instruments measure different constructs.
Criterion validity would be provided if the TELCS could identify children with dyslexia. One possibility is to compare a group of children with such a diagnosis to a group of children without complaints of reading and writing difficulties. Medina et al. (2018) verified that the group of dyslexic children (Mean = 1.4, SD = 0.83) had a significantly poorer performance in the TELCS than the control group without reading difficulty (Mean = 21.8; SD = 2.15) (Mann-Whitney = 0.0, p < 0.001). Likewise, Medina and Guimarães (2019) demonstrated that the two groups of dyslexic children had the same score in the TELCS (p = .44), and that both dyslexic groups had poorer scores than two control groups composed of good readers, one matched for age and one younger (p > .05). Medina and Guimarães also verified that the TELCS was able to detect the development of reading (higher means) throughout the school year (p < .05) for the dyslexic children, probably due to the advancement in decoding transferred to sentence comprehension. These results offer evidence that the TELCS has criterion validity, being able to discriminate the diagnostic group, as the dyslexic groups obtained lower scores.

Standardization study
For the standardization study, the TELCS' norms were split by school grade and by chronological age, as years of study reflect formal schooling stimulation, and age is an indicator of neural maturation. The large random stratified sample size (N = 1289) can be regarded as one of the strengths of the study. However, it is important to highlight that it is a regional sample and may not represent the performance of Portuguese-speaking readers across the country. No difference between genders was found in the current study (p = .25). Using the TELCS, Machado and Maluf (2019) also found no gender difference in the 2nd (p = .48), 3rd (p = .92), and 4th grades (p = .75). Thus, the norms of the present study were not split into male and female as was done in both Piérart and Grégoire's (2004) and Sucena and Castro's (2010) standardization studies. A reanalysis of Sucena and Castro's data suggested that the gender difference reported was, in fact, concentrated only in the 4th grade (F (1, 123) = 2.78, p = .006, Cohen's d = .50) and not in all grades.
The large number of items in the test (N = 36) permitted a clear differentiation and classification of participants' reading performance (reading disability, low, average, above average, and high performance), with a standard normal distribution curve. Few participants in the entire sample (1.6%) could finish the test in five minutes with 100% accuracy, which may be an indication of its adequate length. In summary, from the results obtained, it can be asserted that the TELCS is reliable for the assessment of different levels of reading ability in children from the 2nd to the 4th grade. The ceiling effect found in the 5th grade reveals that the TELCS has limitations in discriminating the reading performance of children at advanced levels of schooling.
TELCS grade and age scores were divided according to different parameters of performance, allowing researchers and clinicians to select a lenient or conservative cut-off score according to their purposes (cut-offs at the 25th, 15th, 10th, and 7th percentiles). This range of parameters was based on the literature. Génard et al. (1998) demonstrated that 69 out of 75 dyslexic children scored in the lowest quartile on the L3, and thus considered the 25th percentile to be a good predictor of reading disability, especially for research purposes (higher sensitivity). Rousselle and Noël (2007) asserted that the choice of the 15th percentile not only guarantees the diagnosis of reading disability, but also avoids false positives when used for clinical purposes (higher specificity). Taking a more rigorous step, the Diagnostic and Statistical Manual-5 (DSM-5) (American Psychiatric Association, 2013) recommends a cut-off score at the 7th percentile for a Specific Learning Disorder (in this case, with the specifier for impairment in reading). However, when considering an academic skill well below the average for age, this manual also endorses a more lenient threshold of up to the 25th percentile.
The TELCS meets the standards for reliability and validity. The current standardization study involved a large, representative, stratified, and random sample (N = 1289). The results offered here are limited to Portuguese-speaking children attending local public schools in the city of Belo Horizonte. One implication of the gloomy picture portrayed by the figures reported here is the need to develop norms both for children attending public and private schools in other regions of Brazil and for college students, and to further strengthen the evidence of reliability and validity of the test as a measure of sentence reading comprehension.
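The way these cut-offs operate can be sketched as follows. The published norms in Table 4 remain the reference for real use; the code assumes one common definition of percentile rank (the percentage of the norm group scoring at or below a raw score), and the norm sample and function names are hypothetical.

```python
CUTOFF_PERCENTILES = (25, 15, 10, 7)

def percentile_rank(raw_score, norm_scores):
    """Percentage of the norm group scoring at or below `raw_score`."""
    at_or_below = sum(s <= raw_score for s in norm_scores)
    return 100.0 * at_or_below / len(norm_scores)

def cutoff_flags(percentile, cutoffs=CUTOFF_PERCENTILES):
    """For each cut-off, report whether the percentile falls at or below it."""
    return {cutoff: percentile <= cutoff for cutoff in cutoffs}

# Hypothetical norm group of 20 children with raw scores 1 to 20
norm_group = list(range(1, 21))
rank = percentile_rank(2, norm_group)
print(rank, cutoff_flags(rank))
```

A child flagged at the 25th percentile but not at the 7th would count as at risk under the lenient research criterion and not under the stricter clinical one, which is the trade-off between sensitivity and specificity described above.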

CONCLUSION
The TELCS has demonstrated adequate psychometric properties, with evidence of reliability, internal structure validity, content validity, and external validity (concurrent, discriminant, criterion), and provides standardized scores for measuring sentence reading comprehension in Brazilian elementary school children (2nd through 5th grades; 7 to 11 years old). Due to the psycholinguistic controls introduced in its adaptation, the TELCS can be used to screen children with low to high reading performance, either for collective screening purposes or for individual clinical administration. These features make the TELCS an important psychometrically standardized measure to assess Criterion B for Specific Learning Disorder (specifier for impairment in reading, also referred to as Dyslexia) in the DSM-5, which requires an academic skill substantially and quantifiably below that expected for the individual's chronological age.