Interrater reliability of the Saint-Anne Dargassies Scale in assessing the neurological patterns of healthy preterm newborns

Objectives: to assess the interrater reliability of the Saint-Anne Dargassies Scale in assessing neurological patterns of healthy preterm newborns. Methods: twenty preterm newborns met the inclusion criteria for participation in this prospective study. The neurologic examination was performed using the Saint-Anne Dargassies Scale in newborns with normal serial cranial ultrasound examinations. In order to test the reliability, the study was structured as follows: group I (rater 1/physiotherapist; rater 2/neonatologist); group II (rater 3/physiotherapist; rater 4/child neurologist) and the gold standard (expert and professor in pediatric neurology). Results: high interrater agreement was observed between groups I and II and the gold standard in assessing postural pattern (p<0.01). Regarding the assessment of primitive reflexes, greater agreement was observed in the evaluation of palmar grasp reflex and Moro reflex (p<0.01) for group I compared with the gold standard. An analysis of tone demonstrated heterogeneous agreement, without compromising the reliability of the scale. No significant difference was observed between the head circumference measurements of the two groups and those of the gold standard. Conclusions: the Saint-Anne Dargassies Scale demonstrated high reliability and homogeneity with significant power of reproducibility and may be capable of identifying preterm newborns suspected of having neurological deficits.


Introduction
In the last two centuries, the advent of more advanced complementary exams has not reduced the value of the clinical assessment of newborns with and without either clinical or neurological complications. 6 Volpe 7 argued that the identification of isolated neurological signs during the neonatal period should not be considered predictive. However, he underscored that the predictive value increases when a set of altered signs is identified during a neurological examination, as the presence of several abnormalities on such an assessment is suggestive of a severe neurological disorder, which increases the predictive ability of the clinical exam. 7, 10 With respect to scale reliability in assessing neurological patterns in preterm infants, Deschênes et al. 11 underscored the interrater reliability of the Amiel-Tison Neurological Assessment at term, showing its importance and exhibiting excellent reliability, while Simard et al. 12 demonstrated good validity and reliability in preterm and term infants up to 6 years of age. However, Gagnon, in his doctoral thesis, showed that Premie-Neuro raw scores had acceptable reliability and validity for use by clinicians to identify at-risk preterm infants, although its classifications should be interpreted cautiously. 13 On the other hand, Fernández et al. 14 reported acceptable validity and reliability using the Spanish version of the Premie-Neuro scale for preterm children. Leroux et al. 15 showed that the interrater reliability of the Amiel-Tison assessment tool is very good, and when performed by a highly trained examiner, the results correlate with developmental performance at 2 years of corrected age.
Thus, it is important to evaluate the reliability of assessment instruments, determining the skill of examiners in measuring or identifying subject or object differences, thereby decreasing errors inherent to diagnosis, scores or measurements. 16 The Saint-Anne Dargassies Scale (SDS) is a reference for the follow-up of preterm infants, particularly because it defines the maturational evolution of these newborns every two weeks until term. 17, 20 Although the SDS is considered the gold standard in assessing preterm newborns (PTNBs), the scale has been underused in research on PTNBs.
The aims of this study were to evaluate the reliability of the SDS in assessing the neurological patterns of healthy PTNBs, and to analyze the level of agreement between the raters and the rater considered the gold standard.

Methods
This was a prospective study, performed using the SDS 17 in PTNBs who were born and treated in the Intensive Care Unit of Januário Cicco Maternity School (MEJC) in Natal, Brazil. The sample was calculated based on the number of complicated and non-complicated preterm newborns born at the MEJC over a one-year period. The PTNBs were assessed by a neonatologist for clinical, physical and biochemical aspects following birth and were considered suitable for the sample if they exhibited no clinical or neurological complications of prematurity.
This project was approved by the Research Ethics Committee of the Federal University of Rio Grande do Norte, under protocol number 423/2010. Written informed consent was provided by the parents of all PTNBs.
The inclusion criteria for participation in this study were as follows: a) being preterm newborns treated at the Intensive Care Unit of MEJC; b) having a gestational age (GA) between 32 and 37 weeks; c) exhibiting no abnormalities and d) having undergone serial cranial ultrasonography evaluations without abnormalities. PTNBs were not included if they had any of the following: malformations of the central nervous system (myelomeningocele, hydrocephaly, anencephaly, and others), neurological or clinical complications requiring either intubation or sedation, maternal sedation during the first 48 hours of life and abnormal cranial ultrasound, as well as failure to complete the assessments established by the research protocol.
Alves CIS et al.
To test the reliability of the SDS and interrater agreement, the collection was structured as follows: group I consisted of rater/1 (physiotherapist) and rater/2 (neonatologist); group II consisted of rater/3 (physiotherapist) and rater/4 (child neurologist); these groups were compared with the gold standard (GS). The GS was a professor of child neurology who had completed a medical residency in pediatric neurology and had practiced this specialty for at least 35 years. Each group independently assessed 10 PTNBs for a final total of 20 PTNBs. The GS assessed all 20 PTNBs of the sample under the same conditions and on the same days as groups I and II. The PTNBs were well fed, and a 1-hour interval was allowed between each observer's exam, with an application time of approximately 10 minutes. On the SDS, the following items were assessed: the cardinal points reflexes and the palmar grasp, Moro, crossed extension and gait reflexes; passive muscle tone was examined by measuring the popliteal, foot-leg and heel-to-ear articular angles, and active tone was assessed via the observation of spontaneous movements, lower limb straightening, head straightening and an examination of the neck flexors (traction maneuver). The state of awareness was classified, as proposed by the SDS, as sleepiness, provoked wakefulness, spontaneous wakefulness, wakefulness, altered wakefulness or asleep, and sedated. 17 The first neurological examination was conducted within 72 hours following birth and repeated every two weeks until the infant reached term.
To assess the infants' articular angles (A), a goniometer was specially adapted, per Alves and Melo. 21 For each item assessed by the SDS, a score from 0 to 2 was assigned, where 0 was an absent response, 1 an altered response and 2 an expected response.
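The 0 to 2 item scoring lends itself to a simple tabulation of agreement between a rater and the GS. The following is a minimal sketch in Python, not the authors' procedure: the score lists are hypothetical values invented for illustration, and `count_agreements` is a hypothetical helper name.

```python
# Hypothetical illustration: the scores below are invented, not study data.
# Each item is scored 0 (absent), 1 (altered) or 2 (expected response).
VALID_SCORES = {0, 1, 2}


def count_agreements(rater_scores, gs_scores):
    """Return (agreements, total) for paired scores of the same newborns."""
    if len(rater_scores) != len(gs_scores):
        raise ValueError("score lists must have the same length")
    for score in list(rater_scores) + list(gs_scores):
        if score not in VALID_SCORES:
            raise ValueError(f"invalid score: {score}")
    agreements = sum(r == g for r, g in zip(rater_scores, gs_scores))
    return agreements, len(gs_scores)


# One item scored in 10 newborns by a rater and by the gold standard (GS).
rater = [2, 2, 2, 1, 2, 2, 2, 2, 2, 2]
gold = [2, 2, 2, 2, 2, 2, 2, 2, 2, 2]

agree, total = count_agreements(rater, gold)
percent_agreement = 100 * agree / total
print(f"{agree}/{total} agreements ({percent_agreement:.0f}%)")
```

With these invented scores the rater and the GS differ on one newborn, which corresponds to a 90% index of agreement for that item.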
The cranial ultrasound exams were performed and interpreted by an examiner blinded to the neurological patterns of the PTNBs, using a GE-LOGIQ P6 ultrasound machine with a convex transducer of 6 to 10 MHz and diameter of 3 cm, which was applied using the anterior and posterior transfontanellar technique. The exams were conducted in the coronal (anterior, medial and posterior), sagittal (median and paramedian) and axial planes. The cranial ultrasound was performed before the neurological exam and every two weeks thereafter until each infant reached term.
Before the application of the SDS, all four observers attended a preparatory workshop taught by an experienced examiner (GS). The workshop was divided into two stages. Stage 1 addressed the items assessed by the scale, and stage 2 involved practical training to standardize the assessment of the scale's items. During the first 4 meetings, the GS applied the scale to PTNBs hospitalized in the MEJC to evaluate the 4 raters. During the remaining 8 meetings, the four raters applied the SDS under the supervision of the GS. The preparatory workshop concluded after 12 meetings, when the raters were considered qualified to utilize the SDS.
The data were tabulated and stored using Microsoft Excel 2010. The database was exported to SPSS 20.0 software, the primary tool with which the statistical analyses were performed; R software, version 3.3.1, was also used. An analysis of the interrater agreement between the two groups and the GS was completed using the binomial test. The significance level was set at 2%, and a critical value of 3 disagreements was established to reject the disagreement hypothesis; that is, on observing seven interrater agreements, the hypothesis that the observer disagreed with the assessment was rejected. The Wilcoxon test was used to compare continuous measurements, with a significance level of 5%. If the p-value of the interrater test was less than 0.05, there was considerable evidence to reject the equality between the distributions of the values obtained by the observers and the GS.
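The logic of a one-sided exact binomial test for agreement can be sketched as follows. This is an illustration in Python rather than the SPSS or R procedure actually used. The null agreement probability `P0` is an assumption, since the text does not state it, and `binomial_upper_tail` is a hypothetical helper name. The rejection threshold the sketch produces depends on `P0` and need not match the cutoff described above.

```python
from math import comb


def binomial_upper_tail(k, n, p0):
    """P(X >= k) for X ~ Binomial(n, p0): one-sided exact binomial p-value."""
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


N_NEWBORNS = 10  # each group assessed 10 PTNBs (from the text)
ALPHA = 0.02     # significance level stated in the text
P0 = 0.5         # ASSUMPTION: null agreement probability, not given in the source

# Tabulate the p-value for each possible number of agreements with the GS.
for k in range(N_NEWBORNS, 5, -1):
    p_value = binomial_upper_tail(k, N_NEWBORNS, P0)
    decision = "reject H0" if p_value < ALPHA else "do not reject H0"
    print(f"{k}/{N_NEWBORNS} agreements: p = {p_value:.4f} -> {decision}")
```

Changing `P0` shifts the number of agreements needed to reach significance, which is why the null proportion should be reported alongside the critical value.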

Results
A total of 26 PTNBs without either clinical or neurological complications were selected. However, the study sample was ultimately composed of 20 non-complicated PTNBs because six were excluded from the study due to the following: the detection of intracranial hemorrhage during the first ultrasound exam (1 case), signs of alcohol and drug withdrawal (1 case), signs of lethargy (1 case) and failure to complete each of the assessment stages of the study (3 cases). Of the 20 PTNBs selected, 13 (65%) were female, and 7 (35%) were male. With respect to gestational age, 1 (5%) was 32 weeks, 3 (15%) were 33 weeks, 7 (35%) were 34 weeks, and 9 (45%) were 35 weeks. Regarding weight for gestational age, 15 (75%) were considered appropriate for gestational age and 5 (25%) were considered small for gestational age. Regarding resuscitation in the delivery room, 13 (65%) did not require resuscitation, whereas 7 (35%) did. Of these, 6 (30%) required an oxygen hood or helmet and 1 (5%) required continuous positive airway pressure (CPAP).
The 1st- and 5th-minute APGAR scores indicated that the PTNBs were stable, and those that required resuscitation in the delivery room progressed satisfactorily following resuscitation. For the 1-minute APGAR score, the following was observed: 7 (35%) had APGAR scores of 9, 9 (45%) had APGAR scores of 8, 2 (10%) had APGAR scores of 7, and 2 (10%) had no record of their 1-minute APGAR scores. Regarding the 5-minute APGAR score, the following was noted: 16 (80%) had a score of 9, 1 (5%) had a score of 8, 1 (5%) had a score of 6 and 2 (10%) had no record of their 5-minute APGAR scores.
Regarding the type of delivery, 11 (55%) were born by cesarean delivery and 9 (45%) by vaginal delivery. Regarding their presentations at delivery, 15 (75%) were born with a cephalic presentation, 3 (15%) with a pelvic presentation, and 2 (10%) had no record of their presentation at the time of delivery.
Table 1 shows the interrater agreement in the assessment of postural patterns for groups I and II; 100% agreement was noted between rater/1 and the GS (p<0.01) for the four assessments of postural pattern. Between rater/2 and the GS, the first assessment had a lower index of agreement (90%), although the P-value remained significant (p<0.01). Agreement of 100% was noted in the postural assessment between rater/3 and rater/4 compared with the GS (p<0.01).
Table 2 demonstrates the likelihood of significant interrater agreement in the evaluation of head circumference between groups I and II compared with the GS. The group I P-values for the head circumference measurements in the four SDS assessments were 0.944, 1.000, 0.905 and 0.915, and the P-values for group II were 0.634, 0.833, 0.259 and 0.191.
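For continuous measurements such as head circumference, a P-value above 0.05 means the test found no evidence of a difference between a rater and the GS. The sketch below illustrates this with an exact two-sided Wilcoxon signed-rank test written in plain Python. It assumes the paired (signed-rank) variant, which the text does not specify. The measurements, in millimetres, are hypothetical, and `wilcoxon_signed_rank_exact` is a hypothetical helper name rather than the SPSS or R routine used in the study.

```python
from itertools import product


def wilcoxon_signed_rank_exact(x, y):
    """Two-sided exact Wilcoxon signed-rank test for small paired samples.

    Returns (w_plus, p_value). Zero differences are dropped and tied
    absolute differences receive average ranks. All 2**n sign patterns
    are enumerated, so this is only practical for small n.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    # Rank the absolute differences, averaging ranks over ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    expected = sum(ranks) / 2
    observed_dev = abs(w_plus - expected)
    # Under H0 every assignment of signs to the ranks is equally likely.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - expected) >= observed_dev - 1e-12:
            extreme += 1
    return w_plus, extreme / 2**n


# Hypothetical head circumferences (mm) for 10 newborns: rater vs. GS.
rater_mm = [305, 310, 298, 315, 302, 308, 311, 300, 296, 312]
gs_mm = [305, 311, 298, 314, 303, 308, 310, 300, 297, 313]

w_plus, p_value = wilcoxon_signed_rank_exact(rater_mm, gs_mm)
verdict = "no evidence of a difference" if p_value >= 0.05 else "evidence of a difference"
print(f"W+ = {w_plus}, p = {p_value:.4f}: {verdict}")
```

Integer millimetres are used so that tied differences are detected exactly; with decimal centimetres, floating-point rounding could split ties that should share a rank.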
The assessments of the infants' primitive reflexes demonstrated that the highest rate of interrater agreement between group I and the GS was noted in the assessment of the palmar grasp and Moro reflexes, with a P-value of 0.01 for the four assessments (Table 3). Significant agreement was noted between raters 3 and 4 and the GS with respect to the assessment of the infants' primitive reflexes. However, the interrater agreement was not as homogeneous as that observed in the assessment of the palmar grasp and Moro reflexes in group I. Group II exhibited greater heterogeneity between the observers and the GS (Table 4).
Table 5 displays the level of agreement between the group I and group II raters and the GS regarding articular angles and demonstrates that all measurements of the popliteal angle (PA) were significant for group I; however, the strongest agreement was noted between rater/2 and the GS. The measurements of the foot-leg angle (FLA) demonstrated significant agreement for group I only in the second, third and fourth assessments. All measurements of the heel-ear angle (HEA) in group I were considered significant.
In group II, significant interrater agreement in the assessment of the PA was noted only in the fourth assessment, with a P-value of 0.01 for both rater/3 and rater/4 compared with the GS, as demonstrated in Table 5. The measurements of the HEA exhibited significant agreement between rater/3 and the GS only in the third and fourth assessments (p=0.01). Rater/4 and the GS agreed significantly in all four assessments of the foot-leg angle. With respect to the HEA assessment in group II, significant agreement was noted between rater/3 and the GS only in the first and third assessments, and between rater/4 and the GS in the first three assessments.
The assessment of active tone in the PTNBs demonstrated significant homogeneity between the group I and II raters and the GS on the upper limb rebound test, with a P-value of 0.01 for the four group assessments. During the spontaneous movements exam, all group I assessments obtained significant agreement, with a P-value of 0.01. In group II, agreement across all four assessments was significant only between rater/4 and the GS; rater/3 and the GS agreed significantly on the evaluation of spontaneous movements only in the 2nd, 3rd and 4th assessments (p=0.01).
Regarding the traction maneuver, which assessed muscle flexor strength, significant agreement was noted among all of the group I assessments, with p<0.01 for all of the assessments in this group. Significant agreement in all four group II assessments was noted only between rater/3 and the GS. Interrater agreement was considered significant between rater/4 and the GS only for the first, second and fourth assessments, with p<0.01 for the first and second assessments and a P-value of 0.02 for the 4th assessment.

Table 2
Significance between the group I and II raters and the gold standard in measurements of head circumferences, and biauricular and anteroposterior parameters of 20 preterm newborns.

Table 4
Agreement between group II raters and the gold standard regarding the assessment of primitive reflexes of 20 preterm newborns using the Saint-Anne Dargassies Scale.

Table 5
Agreement between the group I and II raters and the gold standard in the assessment of articular angles of 20 preterm newborns using the Saint-Anne Dargassies Scale.

Discussion
In this study, three parameters were important in defining healthy preterm infants: birth weight for gestational age, APGAR score and the etiology of premature birth. 24 According to the literature, when the APGAR score is below seven following the 5th minute, special attention must be paid to these newborn infants, even in the absence of altered laboratory exams. 25 The PTNBs of this study exhibited 1- and 5-minute APGAR scores greater than or equal to 7, confirming that these newborns evolved satisfactorily.
Bittar and Zugab 26 reported that spontaneous prematurity accounts for 75% of cases and results from premature labor. Moreover, the etiology is considered complex, multifactorial or unknown, which hinders the implementation of preventive measures. The data from our study are similar to those obtained by these authors, as 12 of the participants (60% of the sample) went into premature labor without a secondary etiology.
The neurological examination is an important part of the newborn assessment, whether the infant in question is term or preterm, and is a useful tool for identifying newborns that require follow-up due to the risk of neurodevelopmental abnormalities. 27 It is important that the scale and the research method are used correctly to identify newborns suspected of having abnormalities and to differentiate these patients from those with normal development.
According to Noble and Boyd, there is increasing evidence regarding the impact of prematurity on brain development. 28 The authors report that approximately 10 to 15% of extreme PTNBs are diagnosed with cerebral palsy and that there is growing evidence of premature birth effects persisting into school age, adolescence and adulthood, which characterizes premature birth as a severe public health problem.
Sampath et al. 29 report that even preterm newborns with normal cranial ultrasound results are susceptible to neurodevelopmental alterations, reinforcing the importance of performing the neurological exam in conjunction with valid and reliable instruments such as the SDS to identify infants suspected of having developmental disorders. The use of standardized assessments has become increasingly necessary in clinical practice. Studies have demonstrated that this need exists in all areas of medicine. 30 The choice of an adequate instrument to study PTNBs should be a concern for researchers, as these instruments must be both valid and reliable for this population.
The results of interrater agreement in the present [...]
The primitive reflexes of the 20 PTNBs also exhibited strong agreement between the group I and II raters and the GS. This significant agreement in assessing primitive reflexes further strengthened the reliability of this scale for the follow-up of these high-risk newborns.
Although the statistical analysis of the articular angles also demonstrated heterogeneity between the group I and II raters and the GS, this heterogeneity did not compromise the significant level of agreement, which also reinforced the reliability of this assessment instrument.
The interrater agreement observed in the utilization of the SDS may have resulted from differences in both the experience and the performance of the raters, in addition to previous training in the preparatory workshop that preceded the data collection.
The present study had some limitations. The first regards the sample of 20 preterm newborns; however, it is important to note that we studied only preterm newborns with no clinical or neurological complications, a factor that restricted the sample size of the study. Thus, the homogeneous nature of the groups analyzed to determine reliability minimized this problem. Another factor to consider is the number of examiners (n=5), only one of whom was considered the GS. However, in spite of these limitations, the SDS can be considered for future validation studies using more examiners to confirm our previously unpublished findings.
The strength of this study was to demonstrate that the SDS exhibits both high reliability and homogeneity and may be considered a scale with accurate reproducibility. Our results allow us to recommend it as an assessment instrument for PTNBs without either clinical or neurological complications. We also believe that the SDS is a good reference for the follow-up of preterm infants, in addition to demonstrating high reliability.
In conclusion, our data indicated strong agreement among examiners using the SDS to screen preterm newborns at high neurologic risk. We suggest the use of the SDS for the follow-up of PTNBs during the period of prematurity. It should be applied every 15 days by professionals who work with preterm newborns. Given its reliability and practical feasibility, it will be useful to health professionals (pediatric neurologists, neonatologists, pediatricians, physiotherapists and neonatal nurses) in identifying premature newborns suspected of having neurological deficits.

Table 1
Agreement between group I and group II raters and the gold standard in assessing postural patterns of 20 preterm newborns using the Saint-Anne Dargassies Scale.

Table 3
Agreement between the group I raters and the gold standard regarding the assessment of primitive reflexes of 20 preterm newborns using the Saint-Anne Dargassies Scale.
