Intra- and inter-rater reliability in the assessment and classification of the longitudinal plantar arch of children 6 to 10 years of age

Aims: This study aimed to analyze the intra- and inter-rater reliability in the assessment and classification of the longitudinal plantar arch of children from 6 to 10 years old in the eyes-open (EO) testing condition. Methods: A total of two hundred and seventy-eight Brazilian children (556 feet), boys and girls, from 6 to 10 years of age participated in the study. The children's feet were examined on a baropodometric platform, and the Staheli index was used for calculating the plantar arch index. Footprint analyses were performed at two different times, with an interval of 7 to 10 days, by three physical therapists in a single testing condition, resulting in 3,336 footprints. To determine the reliability of the continuous measurements, the Intraclass Correlation Coefficients (ICC) with 95% confidence intervals (CIs), the Standard Error of Measurement (SEM), as an absolute value and as a percentage, and the Minimum Detectable Change (MDC) were calculated. To determine the reliability of the longitudinal arch classification, inter-rater reliability was evaluated by the Weighted Fleiss Kappa Coefficient and test-retest reliability was estimated by the Weighted Cohen Kappa Coefficient. Results: Regarding inter-rater reliability, ICC values ranged from 0.79 to 0.96 (substantial to excellent reliability), with the lowest ICC values occurring for Line B, mainly in the first assessments. SEM ranged from 0.08 to 0.21 (percentage: 3.74 to 28.7), with the lowest SEM values occurring for the Plantar Arch Index assessments, and the MDC varied between 0.22 and 0.59. Regarding intra-rater reliability, the results indicated excellent reliability: ICC values ranged from 0.92 to 0.99, with the lowest ICC values again occurring for the Line B analysis. SEM ranged from 0.03 to 0.20 (percentage: 2.32 to 26.6), with the lowest SEM values occurring for the Plantar Arch Index assessments, and the MDC varied between 0.09 and 0.54.
Regarding inter-rater reliability for the longitudinal arch classifications, the Weighted Fleiss Kappa Coefficient ranged from 0.83 to 0.87, expressing almost perfect agreement among the raters in the before and after evaluations. The test-retest reliability of the longitudinal arch classification resulted in Weighted Cohen Kappa Coefficient values ranging from 0.80 to 0.996, expressing substantial to almost perfect intra-rater agreement. Conclusion: The study showed high reliability in the clinical assessment of the longitudinal plantar arch index of children from 6 to 10 years of age, indicating that the Staheli method is applicable to pressure platform assessments with good intra- and inter-rater reliability.


Introduction
The foot is essential for supporting body weight, and changes in its structure can cause musculoskeletal disorders, unstable postural control, and symptoms in children and adolescents 1. During childhood, the development of the longitudinal arch of the foot occurs within the first six years of life, with variability in the structure and function of the feet; thus, it is very important to monitor this parameter in clinical practice 2.
The feet are anatomical structures that make it easier to perform important tasks (such as maintaining the orthostatic posture) and help in the strategies to maintain balance. In children, balance is an essential component influenced by motor development and skills that are important to movement 3. There are three types of foot arches: normal or neutral foot, low or flat foot (pes planus), and high foot (pes cavus) 4. The longitudinal arch is an essential component responsible for absorbing and dissipating forces in the feet 5.
The plantar arch is modified as children grow. There is no longitudinal arch up to two years of age, and its natural growth accelerates until around five or six years of age. After early childhood, growth slows down and stability occurs at the age of 12 in girls, and around 13 or 14 in boys, depending on the child's motor experiences during childhood 6,7. Range of motion (ROM) and morphology are associated with foot function in children 8 and can be measured through questionnaires 9 and clinical assessments 10,11.
In clinical practice, instrumental analysis can be used to assess and record the longitudinal plantar arch based on the morphological patterns of the feet, obtained through photopodoscopy and photopodometry 11, radiographic measurements 12, plantar ultrasound 13, photogrammetry 14, plantigraphy 15, and plantar pressure measurements obtained with a baropodometric platform 16. Different indices, such as the Cavanagh and Rodgers arch index (AI) 17, the Staheli index (SI) 18, the Chippaux-Smirak index (CSI) 19, and Clarke's alpha angle (AA) 20, can be applied to assess the longitudinal plantar arch index and foot pattern prevalence. In a previous Brazilian study, the longitudinal arch of school-age children from 3 to 10 years old was characterized using four footprint classifications calculated through the AI, SI, CSI, and AA indices 21.
The Staheli method is widely used in Brazil because of its ease of application with children 22, adolescents 2,3, and adults 24, including baropodometric measurements 25, and has a high level of agreement with other indices 26. Experimentally, knowledge about footprint measurement and its classification is necessary since professionals must base their interventions on reliable, evidence-based measurements. This study aimed to analyze intra- and inter-rater reliability in the assessment and classification of the longitudinal plantar arch of children from 6 to 10 years old in the eyes-open (EO) testing condition. The rationale is to add more information to the clinical practice of foot assessment in school-age children.

Study design/sample characteristics
This cross-sectional study was conducted in three full-time public schools in the city of Goiânia, Goiás, Brazil. A total of two hundred and seventy-eight children between the ages of 6 and 10 years participated in the study (133 girls and 145 boys), with authorization from their parents, who signed the informed consent form. The exclusion criteria were musculoskeletal disorders, such as clubfoot, lower limb deficiencies, and leg length discrepancies, that could affect measurements of the plantar arch index. No assessment method was employed for lower limb dysmetria. Instead, the parents/guardians answered a questionnaire on the child's health condition; no complaint of limb length discrepancy was reported for any of the children. The Ethics Committee for Research involving Human Subjects of the Federal University of Goiás approved the study under Protocol No. 71269717.0.0000.5083.

Anthropometric measurements
A G-Tech® digital scale, model Glass 10, in tempered glass, with 100-g divisions and a maximum load of 150 kg, and a portable stadiometer were used to collect the anthropometric measurements (weight, height, and body mass index). A baropodometric platform was used to obtain the measurements of the feet (dimensions 565 x 420 x 25 mm, active surface 490 x 490 mm, with capacitive sensors 4096 / 6x6, frequency of 200 Hz and maximum pressure per sensor of 120 N/cm², Footwork Pro® software) (Figure 1) 16.

Procedures
After height and body mass were measured, the children removed their shoes and socks and assumed an orthostatic position. The children were instructed to place one foot on the platform, according to their preference, and then the other. The assessments were performed under a single testing condition, eyes open (EO condition), in which the children kept their upper limbs along the side of their body and fixed their gaze on a point on the wall at a distance of 1.5 meters. For the test condition, the child remained on the platform for 30 seconds (see Figure 1). The platform data were collected by two previously trained physical therapists, and the footprint analyses were performed by three physical therapy raters. The examiners took their measurements independently.
After the plantar pressure images were printed, the longitudinal arch index was calculated using the Staheli index, which establishes the ratio between the central and posterior regions of the footprint. A line was drawn in reference to the longitudinal arch of the forefoot (Line A) and on the topography of the heel (Line B), and the plantar arch index was obtained by dividing Line A by Line B. The children's feet were classified according to the values of the arch index: neutral (between 0.3 and 1.0), flat (> 1.0), and high (< 0.3) 18. The data were analyzed by three physical therapists, who were experts in the field with clinical experience in foot arch assessment and were previously trained. For the analysis of the longitudinal arch of the foot, 556 feet were evaluated in a single testing condition. Each rater analyzed the footprints and repeated the analysis with new footprints after an interval of 7 to 10 days (test-retest). The total number of footprints analyzed in this study was 3,336 (556 feet x 3 raters x 2 assessments).
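The index and classification steps described above can be sketched as follows. This is a minimal Python illustration and not the authors' software: the function names `staheli_index` and `classify_arch` and the example line lengths are hypothetical, and the sketch assumes that the cut-off values 0.3 and 1.0 themselves count as neutral.

```python
def staheli_index(line_a_cm, line_b_cm):
    """Plantar arch index as described in the text: Line A divided by Line B."""
    if line_b_cm <= 0:
        raise ValueError("Line B (heel) must be a positive length")
    return line_a_cm / line_b_cm


def classify_arch(index):
    """Cut-offs from the text: high < 0.3, flat > 1.0, neutral otherwise.

    Assumption: the boundary values 0.3 and 1.0 are treated as neutral.
    """
    if index < 0.3:
        return "high"
    if index > 1.0:
        return "flat"
    return "neutral"


# Hypothetical line lengths in cm (illustrative, not taken from the study data).
example_index = staheli_index(line_a_cm=2.4, line_b_cm=4.0)
example_class = classify_arch(example_index)

# Footprint count reported in the text: feet x raters x assessment times.
total_footprints = 556 * 3 * 2
```

Because the index is a ratio of two lengths measured on the same footprint, the result does not depend on the unit in which Line A and Line B are measured.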

Data Analysis
For each foot measurement and for the foot type classification, reliability was analyzed between the two assessment times for each of the three raters (intra-rater test-retest reliability) and among the three raters (inter-rater reliability). To determine the reliability of the continuous measurements, the Intraclass Correlation Coefficients (ICC) with 95% confidence intervals (CIs) were calculated. The ICC estimates were calculated using a two-way mixed-effects model with absolute agreement. The level of reliability was established by the Fleiss classification 27,28. According to Fleiss, reliability was considered low for ICC values below 0.40; moderate between 0.40 and 0.75; substantial between 0.75 and 0.90; and excellent above 0.90. To assess measurement variability, the Standard Error of Measurement (SEM) was also calculated. This measurement is defined by $\mathrm{SEM} = \sqrt{\sigma^2 (1 - \mathrm{ICC})}$, with $\sigma^2$ representing the total variance. It can be understood as an estimate of the expected random variation in scores when no real change has taken place, and it was calculated to provide an "absolute index of reliability" in the same units as the original measurement. The ICC is influenced by multiple sources of variation (e.g., subjects, raters, trials) and by error, whereas the SEM is not influenced by variability among patients and is affected by error variation only 29. The 95% confidence interval (CI) for the SEM was calculated as described by Brennan 30. The following criteria were used for absolute measures: good if SEM < 10 and poor if SEM ≥ 10, according to Keogh et al. 31 and McGinley et al. 32. The SEM value was transformed into a percentage (SEM%) for interpretation. The SEM% was defined as $\mathrm{SEM}\% = (\mathrm{SEM} / \bar{x}) \times 100$, where $\bar{x}$ is the mean for all observations. The values were interpreted as follows: ≤ 5%, very good; > 5% and ≤ 10%, good; > 10% and < 20%, doubtful; and > 20%, negative 33.
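As an illustration of the ICC step, the sketch below computes a two-way, absolute-agreement ICC from the ANOVA mean squares. It is a hedged example rather than the analysis actually run in SPSS or R: the text does not state whether single or average measures were reported, so the single-measure form (often written ICC(A,1)) is assumed, the function names are hypothetical, the handling of values that fall exactly on a cut-off is also an assumption, and the small data set is invented.

```python
from statistics import mean


def icc_absolute_agreement(scores):
    """Single-measure, two-way, absolute-agreement ICC.

    scores: list of rows, one row per foot, one column per rater (or session).
    Assumption: single-measure form; the source does not specify this.
    """
    n = len(scores)          # number of feet
    k = len(scores[0])       # number of raters or sessions
    grand = mean(v for row in scores for v in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def fleiss_icc_label(icc):
    """Bands quoted in the text; treatment of exact cut-off values is assumed."""
    if icc < 0.40:
        return "low"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.90:
        return "substantial"
    return "excellent"


# Invented data: the second rater reads every foot exactly 1.0 higher.
offset_icc = icc_absolute_agreement([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]])
# Invented data: both raters agree exactly.
perfect_icc = icc_absolute_agreement([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
```

The first invented data set shows why absolute agreement is the stricter choice: a constant offset between raters leaves the ranking of the feet intact but still lowers the coefficient.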
To determine any clinically important changes, the Minimum Detectable Change (MDC) was calculated using the equation $\mathrm{MDC} = 1.96 \times \sqrt{2} \times \mathrm{SEM}$. The MDC can be defined as the minimal change that falls outside the measurement error in the result of an instrument used to measure a clinical characteristic 34,35.
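The relation between SEM, SEM%, and MDC can be sketched numerically. The example assumes the commonly used definitions SEM = SD x sqrt(1 - ICC), SEM% = 100 x SEM / mean, and MDC at the 95% level = 1.96 x sqrt(2) x SEM; the last of these is consistent with the SEM and MDC pairs reported in this study (for example, SEM 0.08 with MDC 0.22). Function names and the illustrative SD and mean are hypothetical.

```python
import math


def sem_from_icc(sd_total, icc):
    """Standard error of measurement from the total SD and the ICC (assumed form)."""
    return sd_total * math.sqrt(1.0 - icc)


def sem_percent(sem, grand_mean):
    """SEM expressed as a percentage of the mean of all observations."""
    return 100.0 * sem / grand_mean


def mdc95(sem):
    """Minimum detectable change at the 95% level (assumed form: 1.96 * sqrt(2) * SEM)."""
    return 1.96 * math.sqrt(2.0) * sem


# Hypothetical inputs: total SD of 0.5 with ICC of 0.96, and a mean of 2.0.
example_sem = sem_from_icc(sd_total=0.5, icc=0.96)
example_sem_pct = sem_percent(example_sem, grand_mean=2.0)

# SEM value reported in the Results for inter-rater reliability (lower bound).
mdc_for_reported_sem = mdc95(0.08)
```

Rounded to two decimals, `mdc_for_reported_sem` reproduces the lower MDC bound reported in the Results. Small differences at the other bounds are expected, because the published SEM values are themselves rounded.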
To determine the reliability of the foot type classification (ordinal variable), the Weighted Kappa Coefficient with linear weights and its respective 95% confidence intervals (CIs) were calculated. To assess intra-rater test-retest reliability (two classifications compared), the Weighted Cohen's Kappa Coefficient (WCKC) was calculated. To assess inter-rater reliability (three classifications compared), the Weighted Fleiss' Kappa Coefficient (WFKC) was calculated. The p-values of the significance tests for the ICC, WCKC, and WFKC are reported in the analysis. Landis and Koch 36 provide a way to characterize Kappa values. According to their scheme, a Kappa value < 0 indicates no agreement, 0-0.20 slight, 0.21-0.40 fair, 0.41-0.60 moderate, 0.61-0.80 substantial, and 0.81-1 almost perfect agreement.
Statistical analysis was performed with a level of significance of 5% using the Statistical Package for the Social Sciences (SPSS version 23.0, IBM Corp., Armonk, NY, USA) and R (R Core Team, 2020; R Foundation for Statistical Computing, Vienna, Austria).
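For the two-classification (test-retest) case, a linearly weighted Cohen's kappa can be sketched as below. This is an illustrative implementation under stated assumptions and not the routine used by the authors: the function names are hypothetical, the ratings are invented, the category order high, neutral, flat is assumed to be the ordinal scale, and the weighted Fleiss coefficient for three raters is not covered.

```python
def weighted_cohen_kappa(first, second, categories):
    """Linearly weighted Cohen's kappa for two ordinal ratings of the same feet.

    categories must be listed in ordinal order. Raises ZeroDivisionError if
    both ratings use a single category only (expected disagreement is zero).
    """
    k = len(categories)
    position = {c: i for i, c in enumerate(categories)}
    n = len(first)

    # Joint proportions of (first rating, second rating).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(first, second):
        observed[position[a]][position[b]] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Linear disagreement weights: 0 on the diagonal, 1 for the extreme pair.
    disagree_obs = 0.0
    disagree_exp = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)
            disagree_obs += weight * observed[i][j]
            disagree_exp += weight * row[i] * col[j]
    return 1.0 - disagree_obs / disagree_exp


def landis_koch_label(kappa):
    """Agreement bands of Landis and Koch as quoted in the text."""
    if kappa < 0:
        return "no agreement"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


ORDER = ["high", "neutral", "flat"]
# Invented test and retest classifications for four feet.
test_ratings = ["high", "neutral", "neutral", "flat"]
retest_ratings = ["high", "neutral", "flat", "flat"]
example_kappa = weighted_cohen_kappa(test_ratings, retest_ratings, ORDER)
example_label = landis_koch_label(example_kappa)
```

With linear weights, a neutral versus flat disagreement is penalized half as much as a high versus flat disagreement, which is why a weighted coefficient suits an ordinal classification such as this one.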

Results
The majority of the children were male (52%, n=145), and the mean age was 8.36±1.14 years. The mean Body Mass Index was 17.3±3.34 kg/m² (CV=0.19). Tables 1 and 2 contain the inter-rater reliability results, before and after respectively, for the Line A and Line B measurements and the plantar arch index in the test condition for both feet. Regarding inter-rater reliability, ICC values ranged from 0.79 to 0.96 (substantial to excellent reliability), with the lowest ICC values occurring for Line B, mainly in the first assessments. The variability (SEM) ranged from 0.08 to 0.21 (percentage: 3.74 to 28.7), with the lowest SEM values occurring for the Plantar Arch Index assessments, and the MDC varied between 0.22 and 0.59. Table 3 shows the intra-rater reliability obtained from the test condition and the respective measurements in the two assessments (test-retest). Regarding intra-rater reliability, ICC values ranged from 0.92 to 0.99, with the lowest ICC values again occurring for the Line B analysis. The variability (SEM) ranged from 0.03 to 0.20 (percentage: 2.32 to 26.6), with the lowest SEM values occurring for the Plantar Arch Index assessments, and the MDC varied between 0.09 and 0.54. These results indicate excellent intra-rater reliability, which may be related to the 7- to 10-day interval between each rater's two analyses.
Tables 4 and 5 present the frequencies of the foot type classifications by the Staheli method, before and after. Regarding inter-rater reliability for the classifications, the Weighted Fleiss Kappa Coefficient ranged from 0.83 to 0.87, expressing almost perfect agreement among the raters in the before and after evaluations. Table 6 provides the test-retest reliability results of the longitudinal arch classification. Regarding intra-rater reliability for the classifications, the Weighted Cohen Kappa Coefficient ranged from 0.80 to 0.996, expressing substantial to almost perfect intra-rater agreement.

Discussion
This study assessed the intra- and inter-rater reliability of the longitudinal plantar arch measurement in children aged 6 to 10 years. Reliability studies are very important in clinical practice and research because they give greater confidence in the use of treatment and assessment techniques. Our findings indicate that the raters achieved high and very high levels of reliability in the children's footprint analyses using baropodometry.
In clinical practice, professionals must be sure to use reliable and reproducible measurements. Therefore, studies need to demonstrate the reliability of the measurements taken, and both reproducibility and repeatability must be tested, considering the quality of the inter- and intra-rater reliability results, respectively 11,37.
A previous study using the Staheli and Chippaux-Smirak indices revealed that measurements of the plantar arch index yielded excellent inter-rater and test-retest reliability in footprints of school-age children from 6 to 10 years old 26 . Another study investigated intra-day reliability (two different times of the day) and inter-rater reliability of the arch index results for eight-year-old children, assessed by a pressure platform under two different conditions: static and dynamic. The results revealed that, regardless of the test condition, the reliability of the measurements was high 37 . In the present study, the children were assessed in a single testing condition: eyes open (EO). It was verified that in such a testing condition, the repeatability and reproducibility of the arch index ranged from high to very high. The reason for that might be that the index used in the measurements was manually drawn up and calculated by the raters themselves. Even with an interval of 7 to 10 days between the assessments, the arch index remained reliable among the raters. Accordingly, a previous study comparing the data from three different plantar arch indices (Chippaux-Smirak, Staheli, and Clarke) assessed by a manual method and using Photoshop CS5 Software revealed good agreement among the techniques, indicating that the two have clinical applicability for assessing young people's footprints 38 .
In a study of 12-year-old children, there was high test-retest reliability among plantar arch index measurements using Clarke's angle and radiographic measurements. The study suggests that Clarke's angle is reliable and sensitive for quantifying medial arch height in children and is recommended for use in studies and clinical applications 39. In another study, the Staheli and Chippaux-Smirak indices were used to compare the longitudinal plantar arch index of the feet of Brazilian and German children from 3 to 10 years of age, through a pedigraph (Foot Imprinter Harris Mat). There was no difference among the arch index values for any of the age ranges assessed, except for four-year-old children, who had lower values on the Chippaux-Smirak index 40. The application of the Cavanagh and Rodgers, Staheli, Chippaux-Smirak, and Clarke's angle indices resulted in good reliability based on the intraclass correlation coefficient 41. In the present study, the Staheli index proved to be valid and indicated good reliability through pressure platform analysis, as shown by the high ICC values, and can be recommended for studies and clinical applications. The Staheli index is widely used in Brazil 22 and other countries 42 for classifying foot types in different populations. In the present study, with footprints captured by a baropodometric platform, the Staheli index indicated a low prevalence of flat feet in the sample. This finding was consistent among the raters, with high test-retest reliability. The prevalence of flat feet was similar in a study conducted with children in the same age range in Taiwan 26.
Two systematic reviews found a prevalence of 4% to 15% of flat feet in school-age children 43,44. In children's activities of daily living, the feet positively influence the maintenance of postural stability. Normative data indicate that flat foot is a normal finding in children in their early years of life, which changes with their motor experiences as the children grow and develop 7,45. In the present study, taking into consideration intra- and inter-rater reliability in the before and after comparison, the prevalence of flat feet ranged from 4% to 8.3%, which is consistent with the findings of previous studies.
Baropodometric platforms are usually used to assess balance, postural control, and plantar pressure 46. In the present study, the platform was used to generate footprint images. The findings revealed that this method yielded reliable and consistent measurements among the different raters, since there was a high correlation between their measurements during the analyses. Regardless of the test condition assessed, the results were consistent and reproducible. The values for the index and classification of the longitudinal arch of the foot did not change over time. Some limitations should be pointed out, such as the lack of parameters to classify the plantar arch in the baropodometric platform used. Considering this limitation, and that there are no parameters in the literature for the collection method used, it is believed that the present findings will be useful as a reference for future studies. The use of reliable, evidence-based techniques can assist in the assessment, intervention, and prognosis of possible foot alterations.

Conclusion
Our findings confirm good intra- and inter-rater reliability in the assessment and classification of the longitudinal plantar arch of children from 6 to 10 years old. The study could contribute to the clinical practice of both professionals and researchers who work with children's and adolescents' postural assessment and intervention methods.