Agreement between two screening tests for language evaluation in premature and low-birth-weight children

Objectives: to evaluate the agreement between the results of two child development screening tests, the Denver II and the Early Language Milestone Scale (ELM), in children aged two to three years who were born prematurely and with low birth weight. Methods: in an observational, cross-sectional, descriptive study, two developmental screening instruments, the Denver II and the ELM, were applied. The agreement between the Denver II (overall and in its language sector) and the ELM was assessed with the Kappa coefficient. Results: of the 77 children evaluated, 36.3% had altered overall development on the Denver II and 32.5% had altered language on the ELM. The agreement between the Denver II considering all sectors and the ELM gave a Kappa coefficient of 0.856 (p<0.001); considering only the language sector of the Denver II, the Kappa coefficient was 0.886 (p<0.001). Conclusions: the developmental impairment identified by the Denver II, overall and in its language sector, agreed with the changes in language abilities identified by the ELM.


Introduction
The increased survival of preterm infants in recent decades has created a challenge for the follow-up of these children. Premature and low-birth-weight children are at risk of developmental delay, especially speech and language disorders. Language deviations can be observed in the auditory receptive, auditory expressive and visual functions, and grammar and vocabulary development are significantly lower. Language impairment can cause difficulties in learning to read and write, compromising the learning process and leading to social and affective problems.1,2
Developmental screening tests, whether of overall development or specific to language, aim to identify children who may need a more specific diagnostic evaluation.3 Compared with isolated clinical observation of development, the use of standardized screening tools is more accurate and favors early intervention.1 The choice of tool depends on its limitations and characteristics, on the study population, and on the goals of the professional who will use it.4 Early recognition of developmental changes, together with greater standardization and local validation of developmental assessment tests, can improve the identification and referral of children to specific intervention and stimulation programs.5,6
The Denver II is used and recognized internationally, particularly in our country, and is recommended by the Brazilian Society of Pediatrics for the assessment of child development. It is routinely used in child follow-up: it is easy to apply, inexpensive and quick to learn, and it can be used by professionals from the different fields who care for children.7,8 Although it has not been validated for Brazilian children, it has been used in academic settings with adjustments for our population.5,9
There is disagreement about the ability of the original form of the Denver II, the Denver Developmental Screening Test, to identify problems in expressive language and articulation.10 The Denver II differs from its predecessor by the addition of 20 items, most of which assess expressive language skills.11 Studies also report low specificity (a substantial number of typically developing children classified as delayed) and low sensitivity (a substantial number of children with delays not identified) for this test.8,11 Some studies point to weaknesses in the predictive validity of the Denver II and in its sensitivity to mild delays; among the factors that limit the validity of the test across countries, cultural influence on some individual items stands out.9
In Brazil, the Denver II has been used in studies of the development of children, including premature children,6,12 and in studies focused specifically on language.13 Although the test's authors caution that conclusions should be drawn from its application as a whole rather than from its parts,14 studies in our setting have analyzed the test in separate components, whether by item,8 by sector, or by reporting the results of isolated sectors.13 It is therefore necessary to investigate the adequacy of the items and the response pattern in Brazilian children.9
Like the Denver II, the Early Language Milestone Scale (ELM) was developed as a screening test, specifically to detect speech and language delays.15 It is quick to administer and, like the Denver II, easy to apply; it can be scored either with a point system or with a simple pass/fail method for the assessed items. It allows detailed screening of which language function is at risk, with milestones grouped into three areas: auditory receptive, auditory expressive and visual. The ELM relies largely on parental report, so its results are good when parents are well informed and attentive observers; it covers the age range from zero to 36 months.16
A previous study from our service found a frequency of developmental changes on the Denver II in premature children aged two to three years that was similar to the frequency of language changes on the ELM.17 Given the predominance of language items in the Denver II at this age, these results suggest that the similar frequencies could reflect a similar ability of both tests to screen for language disorders. The objective of this study is to evaluate the agreement between the Denver II, overall and in its language sector, and the ELM in premature children aged two to three years.

Methods
This is an analytical, epidemiological, prospective, cross-sectional study of premature and low-birth-weight children with a chronological age between two and three years.
The children were identified from the ICU hospitalization records of four hospitals in the city of Cuiabá (MT), in the Midwest Region of Brazil. Three of these hospitals serve the Unified Health System (SUS) and the fourth belongs to the supplementary health system. Together, these four hospitals accounted for 95% of the deliveries in the city.
From April 20 to October 20, 2011, 84 premature (gestational age under 37 weeks) and low-birth-weight (birth weight under 2500 g) children were located by active home search or at follow-up in two SUS outpatient clinics for preterm infants.
The inclusion criteria were prematurity and low birth weight, birth in one of the hospitals included in the study, a chronological age between 2 and 3 years (3 years not yet completed), and good health at the time of the assessment. The exclusion criteria were congenital malformations and/or neurological problems affecting speech production, sensory hearing and/or visual impairments, and sequelae of central nervous system injury.
To verify the applicability and feasibility of the tests, a pilot study was conducted with 10 children. Data were collected in a room by a single examiner, and the testing environment was satisfactory in all cases. An initial interview with the child's guardian, following the study protocol, included a review of the child's health booklet and then of the medical records of the neonatal hospitalization. Descriptive data on the sample were collected: birth weight, gestational age in weeks and condition at birth (Apgar score). Other characteristics of the study population are reported in Caldas et al.17 For the Denver II, the age norms of the original test, developed in a population of American children, were used, following the cultural adaptation of the test for the local population of Cuiabá-MT.18 To assess the Denver II sectors, the following items, intercepted by an age line drawn for each participant according to his or her chronological age, were used:
- Personal-social sector: remove garment, feed doll, put on clothing, brush teeth with help, wash and dry hands, name friends, put on T-shirt, dress without help, play card games, brush teeth without help, prepare food;
- Fine motor-adaptive sector: tower of four, six and eight cubes, imitate vertical line, thumb wiggle, copy circles, draw person (three parts), copy cross, pick longer line;
- Language sector: point to two and four pictures, combine words, name one and four pictures, six body parts, speech half understandable and all understandable, know two and four actions, know two and three adjectives, use of two and three objects, name one and four colors, count one block, understand four prepositions;
- Gross motor sector: kick ball forward, jump up, throw ball overhand, broad jump, balance on each foot for one, two and three seconds.
Each item is represented by a bar (histogram) whose ends correspond to the earliest and latest ages at which children acquire the skill. The left end corresponds to the 25th percentile (P25) of the age at which the skill is performed, and the right end to the 90th percentile (P90).
A failure was recorded when the child did not perform the item and the age line fell between the 25th (P25) and 75th (P75) percentiles, and a caution (warning) when the child failed an item intercepted by the age line between the 75th (P75) and 90th (P90) percentiles. P90 is the cutoff used in the Denver II to define a delay, that is, when a child fails an item located entirely to the left of the age line.
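The item-level reading described above can be written as a small function. This is a minimal sketch, assuming each item's P75 and P90 ages (in months) are read from the Denver II score sheet; the enum and function names are hypothetical and the ages in the example are illustrative, not published norms. A failure while the age line is still before P75 is returned as normal here, because the overall classification counts only delays and cautions.

```python
from enum import Enum


class ItemResult(Enum):
    NORMAL = "normal"
    CAUTION = "caution"
    DELAY = "delay"


def classify_item(age_months: float, p75: float, p90: float, passed: bool) -> ItemResult:
    """Classify one Denver II item against the child's age line.

    p75 and p90 are the ages (months) at which 75% and 90% of the
    reference sample performed the item (hypothetical inputs).
    """
    if passed:
        return ItemResult.NORMAL
    if age_months > p90:
        # Failed item lying entirely to the left of the age line.
        return ItemResult.DELAY
    if age_months >= p75:
        # Failed item with the age line between P75 and P90.
        return ItemResult.CAUTION
    # Failure before P75: recorded, but not counted as caution or delay.
    return ItemResult.NORMAL


# Hypothetical item with P75 = 20 and P90 = 24 months, failed at 30 months.
print(classify_item(age_months=30, p75=20, p90=24, passed=False))  # ItemResult.DELAY
```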
Overall performance on the Denver II was classified according to the number of delays and cautions, as follows:
- Abnormal: two or more delays, regardless of area or sector;
- Suspect: one delay and/or two or more cautions;
- Normal: no delays and at most one caution.
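The overall classification, together with the definition of an altered result used in the analysis (suspect or abnormal overall; at least one delay or caution in the language sector), can be sketched as follows. It is an illustrative sketch, not the authors' software: the function names are hypothetical, and the input is assumed to be one label per item intercepted by the age line.

```python
from collections import Counter


def classify_overall(item_results: list[str]) -> str:
    """Overall Denver II result from per-item labels ("normal", "caution", "delay")."""
    counts = Counter(item_results)
    delays, cautions = counts["delay"], counts["caution"]
    if delays >= 2:
        return "abnormal"  # two or more delays, in any sector
    if delays == 1 or cautions >= 2:
        return "suspect"   # one delay and/or two or more cautions
    return "normal"        # no delay and at most one caution


def overall_altered(item_results: list[str]) -> bool:
    """Suspect and abnormal results are both counted as altered."""
    return classify_overall(item_results) in {"suspect", "abnormal"}


def language_sector_altered(language_results: list[str]) -> bool:
    """The language sector is altered with at least one delay or caution."""
    return any(r in {"delay", "caution"} for r in language_results)


# Hypothetical child: one caution in language, one caution in fine motor.
labels = ["normal", "caution", "caution", "normal"]
print(classify_overall(labels), language_sector_altered(labels[:2]))
```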
For the overall analysis in this study, Denver II results classified as suspect or abnormal were considered altered. The language sector was considered altered when it contained at least one delay or caution.18 As the reference for language screening, the Early Language Milestone Scale was applied according to its authors' recommendations,19 using the age norms of the test and the language milestones grouped into the auditory receptive (AR), auditory expressive (AE) and visual (V) functions.
The ELM behaviors are also presented as bars (histograms) on a single sheet spanning 36 months, showing for each function the age at which each item begins to be performed and the age by which it should be performed. The graph marks the 25%, 50%, 75% and 90% points, which represent the percentage of children of a given age who had achieved the tested skill during the validation of the scale.
As in the Denver II, an age line corresponding to the child's chronological age on the day of the evaluation was drawn across the whole scale. All items intercepted by this age line were then evaluated in each area (AR, AE and V) to determine the basal and ceiling levels.
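The age line and the selection of intercepted items can be sketched as below, assuming each item is stored with its P25 and P90 ages in months. The `Item` class, its fields and the example ages are hypothetical, and converting days to months with an average month of 365.25/12 days is a simplifying assumption; the text specifies only that chronological (not corrected) age on the day of evaluation was used.

```python
from dataclasses import dataclass
from datetime import date

DAYS_PER_MONTH = 365.25 / 12  # average month length (approximation)


@dataclass
class Item:
    name: str
    area: str   # e.g. "AR", "AE" or "V"
    p25: float  # age in months at which 25% of the norm sample passed
    p90: float  # age in months at which 90% of the norm sample passed


def chronological_age_months(birth: date, exam: date) -> float:
    """Chronological (uncorrected) age on the day of the evaluation."""
    return (exam - birth).days / DAYS_PER_MONTH


def intercepted_items(items: list[Item], age_months: float) -> list[Item]:
    """Items whose bar (from P25 to P90) is crossed by the age line."""
    return [it for it in items if it.p25 <= age_months <= it.p90]


# Hypothetical items and dates, for illustration only.
items = [
    Item("hypothetical item A", "AE", 24, 32),
    Item("hypothetical item B", "AR", 31, 36),
]
age = chronological_age_months(date(2009, 1, 1), date(2011, 7, 1))
print(round(age, 1), [it.name for it in intercepted_items(items, age)])
```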
The ELM behaviors for ages 2 to 3 years are distributed among the following functions:
- Auditory expressive: speaks four to six words, speaks more than 50 words, uses I/you, uses prepositions, conversation, names objects and their use (glass, ball, spoon, pencil);
- Auditory receptive: follows two-step commands without gesture, points to named objects, points to objects described by their use, commands/spatial concepts;
- Visual: follows one-step commands with gesture, initiates gesture games, points to desired objects (these three items extend only up to 18 months of age).
The ELM result for an area was considered adequate when the child passed up to three consecutive items corresponding to his or her chronological age. It was considered failed (altered) when the child did not perform an item located entirely to the left of the age line, that is, beyond P90, and three more consecutive failures followed in the area evaluated.20 The percentage of failed or altered results was also analyzed separately for each function: auditory expressive, auditory receptive and visual.
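The pass/fail reading of an ELM area can be sketched as follows. This is a simplified interpretation, assuming each assessed item is recorded with its area, its P90 age in months and whether the child passed it: an area is flagged when a failed item lies entirely to the left of the age line, and the ELM screen is assumed to be altered when any area is. The additional condition of three consecutive failures and the basal/ceiling search are not modelled; all names and ages are hypothetical.

```python
from dataclasses import dataclass

AREAS = ("AR", "AE", "V")


@dataclass
class ElmItem:
    area: str     # "AR" (auditory receptive), "AE" (auditory expressive) or "V" (visual)
    p90: float    # age in months by which 90% of the norm sample passed the item
    passed: bool  # child's result (parental report or direct observation)


def area_altered(items: list[ElmItem], area: str, age_months: float) -> bool:
    """Flag an area when a failed item lies entirely left of the age line."""
    return any(
        it.area == area and not it.passed and it.p90 < age_months for it in items
    )


def elm_altered(items: list[ElmItem], age_months: float) -> bool:
    """Assumed rule: the ELM screen is altered if any area is altered."""
    return any(area_altered(items, a, age_months) for a in AREAS)


# Hypothetical record for a child aged 30 months.
child = [
    ElmItem("AE", p90=24.0, passed=False),
    ElmItem("AE", p90=34.0, passed=False),
    ElmItem("AR", p90=28.0, passed=True),
]
print({a: area_altered(child, a, 30.0) for a in AREAS})
```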
When performance on either tool was altered, the child was retested within two weeks. Parental report and direct observation, or behavior observed incidentally, were considered.
The sample size was calculated from the number and proportion of children born premature and with low birth weight in 2009 (the year of birth of the children) in Cuiabá-MT: according to data from the Live Birth Information System (Sinasc), there were 411 such children. Based on the average of the results of previous studies,21,22 the percentage of premature, low-birth-weight children with speech and language development changes was approximately 21%, and this percentage was used to calculate the sample. With a sampling error of 7%, the calculated sample was 75 children.
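The sample-size calculation can be reproduced with the usual formula for estimating a proportion, followed by a finite-population correction. The text does not state the confidence level or the exact formula, so this sketch assumes n0 = z²·p·(1 - p)/e² with the correction n = n0/(1 + n0/N), rounded up; with z = 1.645 (90% confidence, an assumption) it yields the reported 75 children.

```python
import math


def sample_size(population: int, p: float, margin: float, z: float) -> int:
    """Sample size for estimating a proportion in a finite population.

    Assumed formula: n0 = z^2 * p * (1 - p) / margin^2, then
    n = n0 / (1 + n0 / population), rounded up.
    """
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    n = n0 / (1 + n0 / population)
    return math.ceil(n)


# From the text: N = 411 births, expected proportion 21%, sampling error 7%.
# z = 1.645 (90% confidence) is an assumption, not stated in the source.
print(sample_size(411, 0.21, 0.07, 1.645))  # 75
```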
The analyses were performed with SPSS, version 20. Pearson's chi-square test was used to compare the frequencies of altered results between the tests. The strength of agreement between the results of the Denver II and the ELM, and specifically between the language sector of the Denver II and the ELM, was assessed with the Kappa coefficient.23 Results with p<0.05 were considered statistically significant.
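A Pearson chi-square comparison of the proportion of altered results on the two tests can be sketched for a 2×2 table. This assumes the table is arranged as test (Denver II, ELM) by result (altered, normal), a layout the text does not show; the counts 28 and 25 of 77 are the whole numbers closest to the reported 36.3% and 32.5% and are used only for illustration. No continuity correction is applied, and the one-degree-of-freedom p-value uses the identity P(χ² > x) = erfc(√(x/2)).

```python
import math


def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p-value with 1 degree of freedom).
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For 1 df, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: Denver II, ELM; columns: altered, normal (illustrative counts).
chi2, p = pearson_chi2_2x2(28, 49, 25, 52)
print(round(chi2, 3), round(p, 2))
```

With these illustrative counts the statistic is about 0.26 (p of about 0.61).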
This study was approved by the Research Ethics Committee of the Hospital Universitário Júlio Müller da Universidade Federal de Mato Grosso, protocol number 967/2010, and documented under number 1,141,638 on 07/30/2015. Children with altered performance were referred to specialized care.

Results
Eighty-four children born prematurely and with low birth weight were evaluated during the study period. Four of them were excluded: one diagnosed with hydrocephalus and three with cerebral palsy. Another three children did not return for retesting and were considered lost to follow-up, leaving 77 children in the final sample.
Of the children studied, 58% were male, 50% were born at a gestational age of 34 weeks or more, 21% had a birth weight below 1500 g, and 7% had a fifth-minute Apgar score below 7.
Tables 1, 2 and 3 show, respectively, the frequencies of altered results on the Denver II and the ELM, the changes in the Denver II sectors, and the changes in the ELM functions. There was no difference between the percentages of children with altered results on the Denver II and on the ELM (Table 1). On the Denver II, delays were most frequent in the language sector, compared with the personal-social, fine motor-adaptive and gross motor sectors (Table 2). Regarding the ELM functions, all children with language changes on the ELM had alterations in the auditory expressive area (32.5%), which were more frequent than those in the auditory receptive area (18.2%). The visual function, whose items extend only up to 18 months of age, was normal in all children (Table 3).
Table 4 shows the agreement, in the children studied, between the results of the Denver II and the ELM and between the language sector of the Denver II and the ELM. Both Kappa coefficients were statistically significant and fell within the range considered excellent (0.81 to 0.99).
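Cohen's Kappa compares the observed proportion of agreement, p_o, with the agreement expected by chance from each test's marginal totals, p_e: κ = (p_o - p_e)/(1 - p_e). A minimal sketch for a 2×2 table follows. The cell counts are hypothetical, not the study's Table 4; they were chosen only so that the marginal totals match the 28 (Denver II) and 25 (ELM) altered children implied by the reported percentages of 77.

```python
def cohens_kappa(a: int, b: int, c: int, d: int) -> float:
    """Cohen's Kappa for a 2x2 agreement table.

    a: both tests altered             b: Denver II altered, ELM normal
    c: Denver II normal, ELM altered  d: both tests normal
    """
    n = a + b + c + d
    p_observed = (a + d) / n
    # Chance agreement from the marginal totals of each test.
    p_expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical counts (NOT Table 4): 28 altered on Denver II, 25 on ELM, n = 77.
kappa = cohens_kappa(a=24, b=4, c=1, d=48)
print(round(kappa, 3))
```

With these illustrative counts the function returns about 0.856. The sketch omits the standard error and p-value that statistical packages report alongside Kappa.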

Table 1
Frequency of altered performance on the Denver II, in the language sector of the Denver II and on the ELM in the 77 children evaluated.

Table 3
Frequency of altered performance in the ELM functions in the 77 children evaluated.

Discussion
The prevalence of alterations was similar on both tests in the children studied. Likewise, language changes predominated on the Denver II, at a frequency similar to the percentage of children who screened positive for language on the ELM. Expressive language changes were the most frequent, and both the overall agreement between the tests and the agreement between the Denver II language sector (counting cautions and delays) and the ELM were high according to the Kappa coefficient.23 Screening for language delays is the most effective way to identify language disorders. Earlier studies16,22 had already discussed the similarities and characteristics of the two screening methods examined here and found the ELM to be a highly sensitive and specific tool for assessing language development in children at risk.
Few studies have compared the Denver II and the ELM. O'Hara et al.24 assessed children in foster care aged zero to 18 months with both methods and observed that 35% of the children failed the Denver II, with the highest percentage (35%) in the language component. Using the ELM point-scoring method, the failure rate was only 8%. The differences from the present findings could be attributed to the different age range and to the use of the point-scoring version of the ELM.
Reviews of language changes in children born prematurely, up to two years of age, emphasize how frequent these changes are, with a significant prevalence of expressive language impairment. Although not detectable at 12 months, these changes may appear at two years of age, an age similar to that of the children in this study.6,25 The finding of developmental changes on the Denver II in children born prematurely agrees with other publications, as does the involvement of language in the auditory expressive area and, to a lesser extent, in the auditory receptive area on ELM screening.1,4,9 Receptive and expressive communication disorders in extremely premature children have been attributed not to specific neurological deficits or disorders but to a general decline in mental function in these patients.4 Similarly, receptive language findings have been explained by immature attention skills, particularly in tasks that require sustaining and directing the focus of attention. In addition, expressive language changes could be associated with biological factors in these children or with inadequate environmental stimulation.2,26
As mentioned, the Denver II differs from its predecessor by the addition of 20 items, most of which assess expressive language skills.11 Of the 45 language items of the Denver II, 18 fell within the age range of the participants, and 16 of these assessed expressive language. This emphasis on expressive language in the Denver II at this age probably favored the agreement observed, especially since the largest percentage of changes on the ELM was also in expressive language.
At the time of examination, the children studied were between 2 and 3 years of age, so they could not be assessed on the visual component of language, which the ELM covers only up to 18 months of age. This probably did not interfere with the application of the test, because all the children were within the age range of the scale and had achieved the last three visual items. In this study, the ELM was scored by the pass/fail method. This form is quicker to apply and minimizes false negatives, but it is less specific and may slightly increase false positives.15
Screening tests indicate a potential risk that must be confirmed by systematic follow-up and diagnostic tests. In addition, a single assessment cannot definitively establish a developmental delay, but it indicates the need for a more careful and in-depth investigation.8,27 Given the emphasis on detecting developmental delays, including language delays, through screening up to two years of age,1,3 the finding of changes on both screening methods in children born prematurely underscores the importance of maintaining follow-up of these at-risk children.
Neither of the tests used here has been validated in Portuguese. Unfortunately, there are no reproducibility measures or reference norms for these children. The wide use of these tests, including their analysis by separate sectors,9 justifies the analysis carried out in this study, even if partial, and underlines the need for greater standardization to support further research.
Because these findings come from a high-risk population, they cannot be directly generalized to the general population.28 Although the emphasis falls on the language sector, which is the most affected in this group,17 limitations of this study may have influenced the agreement, such as the homogeneity of the sample, the child's sleepiness and fatigue, and the fact that both tests were applied in sequence by the same person.
Because the same person applied both tests (Denver II and ELM), knowledge of the results of one may have influenced the interpretation of the other, probably increasing the agreement between results and distorting the accuracy measurement. This may have introduced review bias, which occurs when a test is not interpreted blind to the results of other tests and outcomes.28 However, the standardized application of both tests and the predominance of expressive language items in both instruments allow us to infer that the agreement observed in the sample probably reflects true agreement between the tests more than bias from application by a single examiner.
The Kappa coefficient is used for categorical data and is an improvement over the simple percentage of agreement because it is adjusted for the agreement expected by chance. Its significance test supports statistical inference but provides no information about the quality of the measurement performed by the observers; it estimates only the agreement between the observations made.23 The accuracy of a test is evaluated against an independent way of knowing whether the disease is actually present or not, the most faithful indicator of the truth, generally referred to as the "gold standard".29 This evaluation was not performed in the present study because neither of the tests compared, despite their wide use in our setting, has been validated for the Brazilian population.
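The accuracy measures that a gold-standard comparison would provide, the sensitivity and specificity referred to in the Introduction when describing the Denver II, can be sketched as follows. The function name is hypothetical and the counts are purely illustrative, since the present study had no reference diagnosis.

```python
def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Accuracy of a screening test against a reference ("gold standard").

    tp: delayed children flagged by the test   fn: delayed children missed
    tn: typical children passed by the test    fp: typical children flagged
    """
    sensitivity = tp / (tp + fn)  # share of truly delayed children detected
    specificity = tn / (tn + fp)  # share of typically developing children passed
    return sensitivity, specificity


# Purely hypothetical counts; the present study had no reference diagnosis.
sens, spec = sensitivity_specificity(tp=18, fn=2, tn=51, fp=6)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Low sensitivity corresponds to a large `fn` (delayed children not identified) and low specificity to a large `fp` (typically developing children classified as delayed).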
This study reinforces the weight of the language component among the screening elements of the methods used. Although screening tools have advantages and disadvantages, and factors related to the cut-off point of screening methods make it harder to distinguish affected from unaffected children,30 this study seeks to contribute to verifying the validity of these tools in our country.
In particular, the findings seem to support the use of the Denver II as a screening method in our setting and the value of its language sector, which showed broad agreement with a specific language screening method in the 2- to 3-year age range.
The data from this study show strong agreement between two developmental screening methods, the Denver II and the ELM, especially for language, and suggest that the Denver II can screen for developmental deviations in a population with a predominance of language changes at two to three years of age. Only further studies with a more rigorous methodological approach can establish the accuracy of the suggestions made here.

Table 2
Frequency of altered performance in the Denver II sectors in the 77 children evaluated.

Table 4
Agreement between performance on the Denver II and the ELM in the 77 children studied.
Agreement between performance in the language sector of the Denver II and the ELM in the 77 children studied.