VALIDITY AND RELIABILITY OF A NEW TRIAGE SYSTEM FOR PEDIATRIC EMERGENCY CARE: CLARIPED

ABSTRACT Objective: To assess the validity and reliability of a triage system for pediatric emergency care (CLARIPED) developed in Brazil. Methods: Validity phase: prospective observational study with children aged 0 to 15 years who consecutively visited the pediatric emergency department (ED) of a tertiary hospital from July 2 to 18, 2013. We evaluated the association of urgency levels with clinical outcomes (resource utilization, ED admission rate, hospitalization rate, and ED length of stay) and compared the performance of CLARIPED with a reference standard. Inter-rater reliability phase: a convenience sample of patients who visited the pediatric ED between April and July 2013 was consecutively and independently double triaged by two nurses, and the quadratic weighted kappa was estimated. Results: In the validity phase, the distribution of urgency levels in 1,416 visits was the following: 0.0% red (emergency); 5.9% orange (high urgency); 40.5% yellow (urgency); 50.6% green (low urgency); and 3.0% blue (no urgency). The percentage of patients who used two or more resources decreased from the orange level to the yellow, green, and blue levels (81%, 49%, 22%, and 2%, respectively, p<0.0001), as did the ED admission rate, ED length of stay, and hospitalization rate. The sensitivity to identify patients with high urgency level was 0.89 (95% confidence interval [95%CI] 0.78-0.95), and the undertriage rate was 7.4%. The inter-rater reliability in 191 patients classified by two nurses was substantial (kw2=0.75; 95%CI 0.74-0.79). Conclusions: The CLARIPED system showed good validity and substantial reliability for triage in a pediatric emergency department.


INTRODUCTION
Triage in the pediatric emergency department (ED) is a challenge. Limited ability to communicate, subclinical presentations in young children, and variations in normal vital signs (VS) according to age group, among other factors, make pediatric triage a complex and difficult task. 1 The triage systems most commonly used worldwide for pediatric emergency care are the Manchester Triage System (MTS), the Canadian Pediatric Triage and Acuity Scale (PedCTAS), the Emergency Severity Index (ESI), and the Australasian Triage Scale (ATS). 2,3 These instruments were originally designed for adults and later adapted for children, who represent 20 to 40% of the population treated in emergency departments. 4 The validity and reliability of these triage systems for children have been assessed predominantly in the countries where they were created or in developed countries with similar cultures. 5-22 These instruments are very extensive or complex, and their performance in countries with distinct sociodemographic and/or cultural characteristics has been lower than in their countries of origin. 23-26 Differences in human and technological resources, professional qualification, and health policies can interfere with their performance. Simpler algorithms, provided they are valid and reliable, could be more appropriate for countries like Brazil.
The South African Triage Scale (SATS) is a simple and objective tool; 27 however, it has only four urgency levels (ULs) and three age groups for VS evaluation. The tool recommended by the World Health Organization (WHO) for less developed countries, the Emergency Triage Assessment and Treatment (ETAT), prioritizes the identification of patients with a high urgency level. 28 This focus does not reflect the 70 to 90% of patients who crowd public and private pediatric EDs in Brazil and have intermediate urgency levels.
To meet these demands, a team of pediatric ED experts in Brazil has developed a five-level triage system for pediatric emergency care (CLARIPED), which is simple and objective, easy to use and to teach, and stratified into five age groups. 29 The purpose of this study was to assess the validity and reliability of this instrument.

METHOD
This is a prospective observational study conducted in the pediatric ED of a private tertiary hospital in the city of Rio de Janeiro (Rio de Janeiro, Brazil), with 3,000 visits per month, daily staff of 3 to 4 doctors, 2 nurses, and 2 nurse technicians, in addition to pediatric residents.
To assess the validity of CLARIPED, all consecutive patients who visited the ED and underwent triage from July 2 to 18, 2013 were included. We excluded patients who did not undergo triage. Data were collected every 24 hours from the medical records of the previous day. Demographic and clinical variables from triage and from the ED stay, such as diagnostic/therapeutic resources, ED length of stay, and destination, were recorded on digital forms. Data were reviewed for consistency.
To evaluate inter-rater reliability, we prospectively selected a convenience sample during daytime shifts (8 to 17h) between April and July 2013. Immediately after the conventional triage performed by the regular triage nurse, consecutive patients and their guardians were invited to voluntarily participate in the study. If they agreed and the guardian signed the Informed Consent Form, they were taken to another room, where a research nurse, blind to the first classification, performed a new complete triage procedure. The research nurses belonged to the triage team, had the same level of training, and participated in the study voluntarily, outside their regular working hours. In this phase, for ethical reasons, we excluded patients who needed immediate treatment, since they could not be subjected to two consecutive triage procedures.
CLARIPED was applied as previously described. 29 The first step was the assessment of four vital signs to calculate the pediatric vital signs score (VIPE score), from 0 to 12, classified into five ULs: 0=blue (no urgency); 1 or 2=green (low urgency); 3 to 5=yellow (urgency); 6 to 9=orange (high urgency); and 10 to 12=red (emergency). The second step was the evaluation for the presence of clinical discriminators consisting of signs, symptoms, and/or complaints, also distributed into five ULs. If an identified discriminator corresponded to a higher UL than the one determined by the VIPE score, the final classification was the higher of the two ULs. 29
Due to the lack of a gold standard for triage in the pediatric ED, we used two methods to evaluate validity: (1) the association between the UL assigned by CLARIPED and four clinical outcomes (diagnostic/therapeutic resource utilization, admission rate to the ED observation room, ED length of stay, and hospitalization), which were considered proxies of urgency, as in other studies; 9-12,14,15 and (2) a comparison between the CLARIPED classification and the one determined by a reference standard.
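The two-step classification rule described above (VIPE score first, then clinical discriminators) can be sketched in a few lines. This is only an illustrative sketch: the names `UrgencyLevel`, `level_from_vipe`, and `final_level` are hypothetical, the enum values are ordered by increasing urgency purely to make comparison easy (they are not the level numbers used in the text), and the calculation of the VIPE score from the four vital signs is not shown because it is not detailed here.

```python
from enum import IntEnum
from typing import Optional


class UrgencyLevel(IntEnum):
    # Assumption: values increase with urgency so that max() picks the
    # more urgent level; this is not the paper's level numbering.
    BLUE = 1    # no urgency
    GREEN = 2   # low urgency
    YELLOW = 3  # urgency
    ORANGE = 4  # high urgency
    RED = 5     # emergency


def level_from_vipe(score: int) -> UrgencyLevel:
    """Map a VIPE score (0 to 12) to an urgency level using the cut-offs in the text."""
    if not 0 <= score <= 12:
        raise ValueError("VIPE score must be between 0 and 12")
    if score == 0:
        return UrgencyLevel.BLUE
    if score <= 2:
        return UrgencyLevel.GREEN
    if score <= 5:
        return UrgencyLevel.YELLOW
    if score <= 9:
        return UrgencyLevel.ORANGE
    return UrgencyLevel.RED


def final_level(vipe_score: int,
                discriminator_level: Optional[UrgencyLevel] = None) -> UrgencyLevel:
    """Return the higher of the VIPE-based level and the discriminator level, if any."""
    vipe_level = level_from_vipe(vipe_score)
    if discriminator_level is None:
        return vipe_level
    return max(vipe_level, discriminator_level)


# Example: a VIPE score of 4 (yellow) plus an orange discriminator yields orange.
print(final_level(4, UrgencyLevel.ORANGE).name)
```

Taking the maximum means a discriminator can only raise the level set by the vital signs, never lower it, which matches the rule as stated.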
The first method was based on the following hypothesis: if CLARIPED adequately identifies the five ULs, a decreasing gradient in the frequency of outcomes should occur, from the highest to the lowest UL. The outcome "resource utilization" included diagnostic tests, therapeutic procedures, and specialty consultations, according to a previously standardized and adapted table from Gilboy et al. 30 This variable was dichotomized (<2 resources and ≥2 resources), similarly to other studies. 12,14,15,19 The admission rate to the ED observation room comprised only children who, after occupying an ED bed, were discharged home. ED length of stay was calculated from the beginning of physician assessment until the patient left the ED. Patients who progressed to hospitalization, even those transferred to other institutions, were included in the hospitalization rate.
In the second method, the reference standard was based on a matrix developed by experts to study the MTS in the pediatric population 19 and adapted for the present study. This matrix used data extracted from medical records (significant vital signs changes, life-threatening clinical conditions, laboratory and imaging tests, therapeutic approach, and patient destination), alone or in various possible combinations, to retrospectively identify the appropriate urgency level and compare it with the one previously assigned by the triage system.
For the validity phase, we estimated a sample of 1,385 ED visits, based on data from the literature regarding the ED length of stay, which was the outcome that demanded the largest sample. Assuming an alpha error of 0.05 and a power of 0.80 (beta error=0.20), we used the difference of 71 minutes between level 2 (high urgency) and level 3 (urgency), reported in a study on the ESI-4 (309 minutes; 95%CI 257-361, SD=225.5 versus 238 minutes; 95%CI 223-251, SD=112.8, respectively). 15 For the inter-rater reliability phase, the sample calculation was based on a pilot study including 61 visits, which generated a quadratic weighted kappa (kw2) of 0.57 (95%CI 0.51-0.68). To reduce the confidence interval range to 0.10, we ran simulations with increasing sample sizes and the same distribution of agreements and disagreements between ULs observed in the pilot study, resulting in an estimated sample of 183 visits.
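As a rough illustration of the length-of-stay calculation above, the usual normal-approximation formula for comparing two means, n per group = (z_alpha/2 + z_power)^2 (SD1^2 + SD2^2) / delta^2, can be evaluated with the figures quoted from the ESI-4 study. This is a sketch under stated assumptions, not the calculation actually used in the study: it takes 0.80 as the statistical power, assumes a two-sided test and equal group sizes, and the function name `n_per_group` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta: float, sd1: float, sd2: float,
                alpha: float = 0.05, power: float = 0.80) -> int:
    """Visits needed in each of two groups to detect a mean difference `delta`.

    Normal approximation, two-sided test, equal group sizes (all assumptions).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # about 0.84 for power = 0.80
    return ceil((z_alpha + z_power) ** 2 * (sd1 ** 2 + sd2 ** 2) / delta ** 2)


# Difference of 71 minutes between levels 2 and 3, with the SDs quoted in the text.
print(n_per_group(delta=71, sd1=225.5, sd2=112.8))
```

Under these assumptions the formula gives about 99 visits in each of the two levels being compared. The total of 1,385 visits reported in the text would also depend on the expected share of visits falling into each level, which is not stated here, and length of stay is typically skewed, so the normal approximation is only a first estimate.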
Associations between ULs and outcomes were assessed using the chi-square test or Fisher's exact test for categorical variables and the Mann-Whitney test or Kruskal-Wallis test for continuous variables. We used logistic regressions to estimate odds ratios (OR) for the association of ULs (independent variables) with hospitalization and use of resources (dependent variables), after adjustments for potential confounding factors (age, service time, and day of the week). Overtriage and undertriage rates, sensitivity, and specificity in diagnosing high urgency cases were calculated by comparing CLARIPED with the reference standard. Stratifications by age group and diagnostic category were performed on an exploratory basis.
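The comparison with the reference standard described above reduces to counting paired classifications. The sketch below shows one way to compute the rates; it is illustrative only. The function name `triage_accuracy` and the example pairs are hypothetical (they are not study data), levels are coded 1=red to 5=blue, and "high urgency" is assumed to mean levels 1 and 2, which is an assumption about the cut-off rather than something stated in the text.

```python
from typing import Dict, Iterable, Tuple


def triage_accuracy(pairs: Iterable[Tuple[int, int]],
                    high_urgency_max_level: int = 2) -> Dict[str, float]:
    """Compare assigned levels with reference levels (1 = red ... 5 = blue).

    Each pair is (assigned_level, reference_level). A lower number means more
    urgent, so assigned < reference is overtriage and assigned > reference is
    undertriage.
    """
    pairs = list(pairs)
    n = len(pairs)
    if n == 0:
        raise ValueError("at least one pair is required")
    h = high_urgency_max_level
    agree = sum(a == r for a, r in pairs)
    over = sum(a < r for a, r in pairs)
    under = sum(a > r for a, r in pairs)
    # 2x2 table for "high urgency" versus the rest (assumed cut-off).
    tp = sum(a <= h and r <= h for a, r in pairs)
    fn = sum(a > h and r <= h for a, r in pairs)
    tn = sum(a > h and r > h for a, r in pairs)
    fp = sum(a <= h and r > h for a, r in pairs)
    return {
        "agreement": agree / n,
        "overtriage": over / n,
        "undertriage": under / n,
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


# Hypothetical pairs for illustration only (not data from this study).
example = [(2, 2), (2, 2), (3, 2), (3, 3), (3, 4),
           (4, 4), (4, 5), (2, 3), (4, 4), (5, 5)]
for name, value in triage_accuracy(example).items():
    print(f"{name}: {value:.3f}")
```

Agreement, overtriage, and undertriage always sum to 1, which is a quick consistency check on any reported set of rates.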
For inter-rater reliability, we chose kw2 because this estimate takes into account the degree of disagreement between categories, in addition to being the most widely used in other studies. The analysis considered a significance level of 0.05 and 95%CI. We used the statistical software packages Stata 12.0 (StataCorp, College Station, Texas, United States) and R 2.15.3 (R Foundation, Vienna, Austria). The Committee for Ethics in Research (CER) of the institution approved this project.
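To make concrete how the quadratic weighted kappa penalizes larger disagreements more heavily, the sketch below computes it from two lists of paired classifications. It is a minimal pure-Python illustration, not the Stata or R routine used in the study: the function name `quadratic_weighted_kappa` is hypothetical, levels are assumed to be coded as integers from 1 to `n_levels`, and no confidence interval is computed.

```python
from typing import Sequence


def quadratic_weighted_kappa(rater_a: Sequence[int],
                             rater_b: Sequence[int],
                             n_levels: int) -> float:
    """Quadratic weighted kappa for two raters using levels 1..n_levels."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must classify the same non-empty set of patients")
    if n_levels < 2:
        raise ValueError("at least two levels are required")
    n = len(rater_a)
    k = n_levels
    # Observed cross-classification of the two raters.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a - 1][b - 1] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            # Disagreement weight grows with the squared distance between levels.
            weight = (i - j) ** 2 / (k - 1) ** 2
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row_totals[i] * col_totals[j] / n
    if weighted_expected == 0:
        raise ValueError("kappa is undefined when all ratings fall in a single level")
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical ratings for illustration only: every pair differs by one level.
print(quadratic_weighted_kappa([1, 2, 3, 4], [2, 1, 4, 3], n_levels=4))
```

With quadratic weights, a disagreement of two levels counts four times as much as a disagreement of one level, which is why the statistic suits ordered triage categories.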

RESULTS
The validity phase included 1,416 consecutive visits (80.2% of those eligible) and excluded 28 cases whose medical records were lost (1.6%), 12 cases who left the ED before triage (0.7%), and 310 cases who did not undergo triage (17.6%). In the reliability phase, 179 patients agreed to participate in the study (93.7% of invitees), 9 refused, and 3 were excluded due to the absence of a legal guardian (Figure 1).
The validity sample had the following distribution: red 0.0%; orange 5.9%; yellow 40.5%; green 50.6%; and blue 3.0% (Table 1). Resource utilization was evaluated in 1,415 visits, while admission to the ED observation room and hospitalization were evaluated in 1,413 visits and ED length of stay in 1,090 visits.
The comparison between CLARIPED and the reference standard showed absolute agreement in 33.5% of cases, overtriage in 59.1%, and undertriage in 7.4%. Most of the disagreements represented assignments one category above the correct classification (49.4%), mainly in the green and blue levels, or below it (7.3%), mainly in the yellow level (Table 3). There were no differences between age groups in overtriage (p=0.20 to 0.98) and undertriage (p=0.13 to 0.52) when compared to general rates. Overtriage rates were lower for lower respiratory diseases (29.6%; p<0.01), and higher for upper respiratory diseases (67.1%; p=0.002) and ear diseases (76.1%; p=0.0002).
No diagnostic category showed an undertriage rate different from the general one of 7.4%.
CLARIPED sensitivity and specificity in identifying high urgency levels were 0.89 (95%CI 0.78-0.95) and 0.98 (95%CI 0.97-0.99), respectively. The stratification of these estimates by diagnostic category and age group was limited by the small number of cases in the subgroups (Table 4).
In the reliability phase, 15 nurses with the same training level on CLARIPED participated in pairs in the double triage: 13 nurses in the first and two nurses in the second triage. The median age of the nurses was 28 years (interquartile range [IQR]: 26.0-29.5). Four nurses had over five years of pediatric ED experience, including the two research nurses, while 11 nurses had less than five years of experience. The UL distribution in the first triage was orange 7.3%, yellow 39.1%, green 41.9%, and blue 11.7%; and in the second triage was orange 7.3%, yellow 41.9%, green 34.6%, and blue 16.2% (Table 1). The absolute agreement was 68.7%, and the kw2 was 0.75 (95%CI 0.73-0.79) (Table 5).

DISCUSSION
In addition to having good validity and reliability, an ideal triage system must be feasible and effective. Ensuring team adherence and a fast procedure is essential. CLARIPED showed good validity, demonstrated by a strong association between ULs and clinical outcomes, in addition to substantial inter-rater reliability. The measures of association with outcomes were comparable to those observed in similar studies with other triage systems. The odds of hospitalization in the orange level were almost 11 times higher than in the yellow level, and greater than the estimate of a multicenter study on PedCTAS (OR=4.93; 95%CI 2.95-8.25). 10 However, in the present study, the hospitalization rate (2.2%) was lower than those reported in other studies (5-10%), 9,10,12,15,26,27 which could be the result of differences in populations or institutional policies.
Given the low hospitalization rate, resource utilization was a more appropriate outcome for the population under study. The frequency of patients who used ≥2 resources decreased from the highest to the lowest UL (81.0, 48.5, 21.8, and 2.4%; p<0.0001). Considering that there were no patients classified as red, these results were similar to those reported with the ESI-4 (100, 70, 45, 17, and 4%) 14 and were more discriminant than those found with the MTS (41.7, 25.4, 30.2, 16.6, and 3.7%). 19 The orange level showed almost 5 times higher odds of using ≥2 resources when compared to the yellow level, while the green level showed 5 times lower odds. This association was also very close to that reported with the PedCTAS, when comparing high urgency level (OR=4.67; 95%CI 2.61-8.34) and low urgency level (OR=0.21; 95%CI 0.17-0.28) to the urgency level cases as reference. 10
Figure 1. Patient selection algorithm for the validity and reliability studies. 1: Reliability study (convenience sample between April and July 2013); 2: validity study (consecutive sample from July 2 to 18, 2013).
The pediatric ED length of stay was calculated from the beginning of the assessment by a doctor, and not from arrival at the ED, as in most studies. The purpose was to avoid distortion in the association between this outcome and the UL, since the triage process determines that the lower the urgency level assigned to the patient, the longer the wait to be seen by a doctor. The distribution of length of stay showed a decreasing gradient from the highest to the lowest UL, corroborating the good validity of the instrument (209, 106, 47, and 27 minutes; p<0.0001). Disregarding level 1 (red), this gradient was also more discriminant than those identified in two studies on the PedCTAS (191, 250, 191, 96, and 66 minutes; p<0.0001, and 309, 238, 186, and 160 minutes), 9,10 and two studies on the ESI-4 (334, 221, 207, 151, and 132 minutes; p<0.001, and 156, 236, 259, 117, and 99 minutes; p<0.0001). 12,14 However, the difference in the definition of this outcome in these studies might have contributed to the less consistent results.
Despite the difference in sample size between this study (n=1,416) and two studies on the MTS (n=13,554 and 11,260), 20,22 the use of a similar reference standard by the three studies allows some comparisons between the performance of the tools. Compared with the MTS, CLARIPED showed similar absolute agreement (33.5 versus 34.0%); a higher overtriage rate (59.1 versus 54.0%), higher sensitivity (89.0 versus 63.0%), and higher specificity (98.0 versus 79.0%); and a lower undertriage rate (7.4 versus 12.0%). 20 After modifications in some MTS pediatric discriminators, specificity increased (87%), overtriage decreased (47%), sensitivity did not change (64%), and undertriage presented a slight increase (15%). 22 Nonetheless, it is important to question whether the reference standard provides the appropriate urgency level in all cases. For example, an infant who arrives at the pediatric ED crying, with irritability and intense pain, would be properly classified as orange or yellow by MTS and CLARIPED. If the final diagnosis is acute otitis media, which is a very common entity in pediatrics, the patient will be medicated with an analgesic and discharged with a prescription, being considered green by the reference standard. In fact, the CLARIPED overtriage rate was particularly high for ear diseases. In the same way, a patient with an extensive cut-contusion wound requiring sutures would be classified as yellow by CLARIPED and MTS, and green by the reference standard. These and other similar cases could explain the low absolute agreement, as well as the high overtriage rate of both tools compared to the reference standard. Moreover, the reference standard itself has not been validated.
[Table note: overtriage (orange) = 0.0%; undertriage (orange) = 11.3%]
This study estimated the reliability by including only patients treated in real time, instead of the hypothetical scenarios commonly used in several studies. 5,12,16,17,21,27 Clinical scenarios do not replicate the difficulties of the actual triage process and are subject to bias. The present study exhibited substantial inter-rater reliability (kw2=0.75; 95%CI 0.74-0.79). This result is better than those obtained in the first studies on other instruments with actual patients: MTS (kw2=0.65; 95%CI 0.56-0.72), 20 PedCTAS (kw2=0.61; 95%CI 0.42-0.80), 8 and ESI-4 (kw2=0.57; 95%CI 0.52-0.62). 14 More recent studies showed better reliability with PedCTAS (kw2=0.74; 95%CI 0.71-0.76) 10 and ESI-4 (linear k=0.92; p<0.001 and k of unspecified type=0.82; 95%CI 0.67-0.84). 15,26 The improvement in reliability over time probably reflects the refinement of these tools and the progressively better qualification of the teams. In this regard, the reliability exhibited by the first version of CLARIPED is promising.
This study has some limitations. It was carried out in a single center, and the researchers could have over-motivated the health team, resulting in an overestimation of the validity and reliability of CLARIPED. However, the easy assimilation and implementation of the tool suggest that it could be appropriate for many similar environments, including non-hospital emergency departments.
Another limitation was that participants of the validity phase represented 80.2% of eligible patients and the hospitalization rate was higher among non-participants (8.6% versus 2.2%) (Table 1). The most plausible reason for this difference is that the pediatric ED of the present study receives patients referred for hospitalization by their attending pediatricians. These children are sent directly to the ED observation room to start treatment, without undergoing triage; however, the characteristics of the study participants did not differ from those of the total pediatric ED population (Table 1). An additional limitation is the lack of patients classified as red in the period studied; however, this fact does not invalidate the results found in the other four urgency levels, which constitute about 99% of emergency pediatric care. Two validity studies on PedCTAS did not include patients requiring immediate care either. 6,10 These patients are very rare in most pediatric EDs around the world, 7,9,11,23,25,26 and, in daily practice, they do not undergo triage, being taken directly to the resuscitation room and classified retrospectively. On the other hand, one of the main challenges of a triage system is discriminating between intermediate UL patients, such as levels 3 (urgency) and 4 (low urgency), which comprise the vast majority of patients who crowd pediatric EDs. Level 3 patients are those whose condition may worsen if they wait a long time for medical care, but who might not be easily identified without an objective assessment. Surgical abdominal pain (appendicitis or intussusception), cases with the risk of severe dehydration (profuse diarrhea or intractable vomiting), or acute bacterial infection (high fever in small children) are some examples of level 3 (urgent) patients.
Lastly, the present study used clinical outcomes as proxies of urgency to determine the convergent construct validity. However, the goal of triage systems is not to predict clinical outcomes, which are good markers of the complexity and severity of diseases but do not always reflect the level of urgency, in addition to being influenced by the quality of treatment and institutional policies. For instance, a patient with seizures (red) or having an asthma crisis (orange) can be discharged from the ED observation room a few hours after treatment, without needing hospitalization. On the other hand, a patient with a serious chronic disease can come to the ED with a low urgency complication (green) and need hospitalization due to the underlying disease.
In conclusion, this is the first study on the validity and reliability of a pediatric triage system in Brazil. CLARIPED proved to be a valid and reliable instrument in the center where it was developed. A multicenter study is necessary to corroborate these preliminary findings, indicate the adjustments needed for different health contexts, and assess the external validity of the instrument.