The relation between gross motor coordination and health-related physical fitness through raw and standardized measures from the KTK and Fitnessgram tests

Aim: The purpose of the current study was to investigate the effects of considering single/combined and raw/standardized measures from the Körperkoordinationstest für Kinder (KTK) and Fitnessgram tests on the relation between gross motor coordination (GMC) and health-related physical fitness (HRPF) in four age groups. Method: Five hundred thirty-one children and adolescents (279 boys) participated. They were divided into four groups: 4 to 7, 7 to 9.5, 9.5 to 12, and 12 to 15 years of age. We used the KTK and Fitnessgram tests to measure GMC and HRPF, respectively. Bootstrap correlations and χ² tests were performed for all individuals and for each group, controlling for sex. Results: For the raw scores, correlations were around (absolute) r = 0.37, except for body composition, with large decreases when controlling for age and sex. For standardized tests, considering either the GMC quotient or GMC categories, correlations were all significant (around r = 0.34). Finally, considering broad categories (apt/non-apt and coordinated/non-coordinated), the association was 0.16. Conclusions: We found clear influences of the measure used on the association between GMC and HRPF.


Introduction
Health-related physical fitness (HRPF) is a theoretical construct composed of a set of anthropometric and physiological components, such as cardiorespiratory or cardiovascular endurance, muscle strength/endurance, flexibility, and body composition 1,2. It is recognized that higher HRPF in the first two decades of life decreases the chance of chronic-degenerative diseases later in life [3][4][5]. Indeed, it has been one of the most studied constructs in the health sciences [6][7][8]. HRPF is associated with several intervening variables 5,9; one of them is gross motor coordination (GMC) 5,9,10.
GMC is also a construct that reflects a general motor capability, being the basis for activities of daily life 11 and a variety of motor skills 12. GMC development involves broad movement experiences as well as typical maturation of neuromotor function 12, and GMC has been recognized as one of the most important components of motor competence 13, a determinant of sustained physical activity in childhood 5,14.
There is growing evidence supporting the positive association of GMC and physical fitness in children and adolescents 3,9,13,15 . Considering the circular model of engagement, children with high levels of motor competence will have higher perceived competence, perceive tasks as less difficult, and engage more frequently in many tasks; this would inevitably lead to higher levels of HRPF 5 .
The Fitnessgram 16 and KTK (Körperkoordinationstest für Kinder) 17 tests are, respectively, the most used tests to assess HRPF and GMC. The Fitnessgram provides a raw score for each of its constituent aspects (aerobic fitness, body composition, and muscle aptitude) and allows children to be classified using criterion-referenced cutoff points, which are based on functionality and healthy fitness zones standardized by age and sex (see 16 for a review). The KTK has been used as the general test for GMC, addressing motor aspects such as balance, rhythm, strength, laterality, and agility through four tests 11,18,19. The KTK raw scores can be standardized by age and sex (in reference to the German population norms upon which the KTK was established) and combined into a global indicator of motor coordination level, the motor quotient, which can be further categorized (e.g., good, high, poor GMC) 17.
Although both batteries were intended to be used as a whole with the standardized measures, some studies investigating the relations between GMC and HRPF employed either the measure of a single test (in each battery) or preferred to use the raw scores (or a mix between these) 9,10,20,21 . This might be problematic as there is no evidence that the relation between GMC and HRPF is the same when employing either raw or standardized scores. Standardization (by age, for instance) eliminates changes in performance that are expected to occur "typically" as one grows older while the raw score still carries its influence. It means that associations observed using raw scores might be confounded with age and sex effects. Indeed, KTK raw scores are found to correlate with age 22 while its standardized measures present moderate stability as individuals grow older 19 . In the same vein, if the categories reflect truly the "meaning" of these constructs, associations found at the level of raw or standardized continuous scores should be disregarded as spurious. Of note, there are studies that employ even broader (and arbitrary) categorizations when analyzing the relation between motor behavior and physical fitness with, probably, the underlying assumption that this would not modify the relation 23,24 .
Considering that these tests are useful for capturing important constructs of physical activity and motor development in applied settings, it is paramount to investigate whether and how such measures are associated. Such an investigation could provide a basis for physicians, physical education teachers, and movement researchers to properly implement these tests and draw inferences from them. For this reason, this correlational study aims to investigate whether the positive association between GMC and HRPF depends on how the measures from each test battery are standardized. To this end, we assessed the association considering the raw, standardized, and categorical variables that can be derived from the tests. We expected the positive association to be observed mainly in terms of the original categories proposed in the batteries, as they capture the essence of these constructs.

Participants
The participants were selected from a larger study on growth, maturation, and motor development in Muzambinho, Brazil 25. All children from the seven schools in the town were invited to participate. The exclusion criterion was any cognitive or physical disability reported by a physician or family member. The inclusion criterion for this study was having completed all required tests in a given data collection wave. In total, five hundred and thirty-one children, without any physical and/or intellectual disability, were selected and divided into 4 groups by age: from 4 to 6.99 years of age (GR6; 21 girls and 30 boys); from 7 to 9.49 years of age (GR8; 84 girls and 78 boys); from 9.5 to 11.99 years of age (GR11; 107 girls and 113 boys); and from 12 to 15 years of age (GR13; 40 girls and 58 boys). Table 1 shows the mean and standard deviation of each group's age, height, and weight.
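The grouping rule above can be written down compactly. The following is a minimal Python sketch, not the authors' code: the function name `age_group` and the dictionary `GROUP_COUNTS` are hypothetical, and the boundaries and counts are copied from the paragraph above. It also checks that the reported group sizes add up to the reported sample.

```python
# Hypothetical helper: maps an age in decimal years to the study's group label.
def age_group(age_years):
    if 4 <= age_years < 7:
        return "GR6"
    if 7 <= age_years < 9.5:
        return "GR8"
    if 9.5 <= age_years < 12:
        return "GR11"
    if 12 <= age_years <= 15:
        return "GR13"
    return None  # outside the 4 to 15 year range studied

# (girls, boys) per group, as reported in the text.
GROUP_COUNTS = {
    "GR6": (21, 30),
    "GR8": (84, 78),
    "GR11": (107, 113),
    "GR13": (40, 58),
}

total_girls = sum(girls for girls, _ in GROUP_COUNTS.values())
total_boys = sum(boys for _, boys in GROUP_COUNTS.values())
total_children = total_girls + total_boys
```

The totals computed this way agree with the 531 participants and 279 boys stated in the abstract.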
Participants and their respective guardians signed the assent form and consent form, respectively, approved by the Ethics Committee from the School of Physical Education and Sport of the University of São Paulo (Protocol number:13832).
Procedure
GMC was measured using the KTK, developed by Kiphard and Schilling 17 and validated for an age range encompassing the present one. The KTK consists of four tests. The first test task is to walk backward three times along each of three different balance beams (3 m long, 5 cm tall, with widths of 6, 4.5, and 3 cm). The score is the number of steps without falling, with a maximum of 8 per attempt (maximum score is 72). The second test task is to jump on one leg over a foam barrier (composed of stacked foam pads, each 5 cm high). The score was composed of points for successful attempts: a success on the first attempt at a given height was worth three points, a success on the second attempt two points, and so on. The maximum score was 39 per leg (ground level plus 12 foam pads). The third test task is to jump sideways, back and forth, over a wooden slat 4 cm high as many times as possible in 15 s. Each participant had two trials, and the numbers of jumps were summed to give the score. The fourth test task is to move as far as possible by stepping from the first wooden plate to the second, then grasping the first plate and placing it ahead to continue (and so on). The score was the sum, over two trials, of the number of changes from one plate to the next (for more details, see 11).
The first categorization is based on the KTK manual and the second is a broader one defined in this study. We did not use the raw score of each test, as a prior evaluation of the data showed a high correlation between the four tests (78% of variance accounted for by the first component of a principal component analysis, with high loadings for all four tests). Therefore, the SS suffices for the current purposes.
HRPF was assessed using the Fitnessgram test 16,26, developed and validated for an age range encompassing the present one.
We considered one measure of aerobic fitness (time to run/walk a mile; Mile), one of body composition (body mass index; BMI), and three measures of muscle aptitude: arm strength (maximum repetitions of push-ups; Pushups), abdominal endurance (maximum repetitions of curl-ups; Curl-ups), and trunk flexibility (distance achieved in a trunk lift; Trunk Lift). The cadence was controlled by the evaluator using a rate of three seconds per repetition. To express HRPF, we used the raw score of each test; the standardized (normalized by sex and age) categorization of each test as fit or unfit; the number of tests categorized as fit (hereafter, FitCount); and, as was done for GMC, a broad category of fit (or unfit) considering those who achieved a FitCount of at least 3 (Fit/Unfit). Note that the Fitnessgram also has a third category for individuals above the recommended range of fitness; we did not consider this category here.
The research team of the larger study was composed of 25 researchers (including three professors). In preparation for data collection, the team was trained in the evaluation protocols through theoretical and practical classes. For all tests performed during training, inter- and intra-rater reliability ranged between 80% and 89%. For reliability in the field (assessed with 5% of the children in the data collection), all values were above 77%.
Participating children from a given school arrived at the data collection site and were distributed across several stations. Thus, the exact order of tests within a test battery (KTK or Fitnessgram) was not controlled. Each station corresponded to a given test of the larger study (see 25). Each child performed one test battery per day, and no more than 3 days separated the GMC and HRPF tests.
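As an illustration of how FitCount and the broad Fit/Unfit category follow from the per-test classifications, here is a minimal Python sketch. It is not the authors' code, and it assumes that the fit/unfit flag for each test has already been obtained from the Fitnessgram age- and sex-specific standards, which are not reproduced here. The names `TESTS`, `fit_count`, and `broad_fit` are hypothetical.

```python
# The five Fitnessgram measures named in the text.
TESTS = ("Mile", "BMI", "Pushups", "Curl-ups", "Trunk Lift")

def fit_count(fit_flags):
    """Number of tests classified as fit.

    fit_flags maps each test name to True (fit) or False (unfit);
    producing those flags from the Fitnessgram standards is assumed
    to have happened upstream.
    """
    return sum(1 for test in TESTS if fit_flags[test])

def broad_fit(fit_flags, threshold=3):
    """Broad category: 'Fit' when FitCount is at least 3, else 'Unfit'."""
    return "Fit" if fit_count(fit_flags) >= threshold else "Unfit"

# Hypothetical child who is fit on three of the five tests.
example_child = {
    "Mile": True,
    "BMI": False,
    "Pushups": True,
    "Curl-ups": True,
    "Trunk Lift": False,
}
example_count = fit_count(example_child)
example_category = broad_fit(example_child)
```

Collapsing five flags into one binary label discards how many tests, and which ones, a child passed; this matters for the broad categories discussed later.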

Data analysis
We assessed the associations between (1) KTK SS and the raw score in each of the Fitnessgram tests; (2) KTK MQ and FitCount; (3) KTK five categories and FitCount; and (4) KTK two categories and the Fitnessgram classification of fitness in each Fitnessgram test. Given the results of the Kolmogorov-Smirnov tests, non-parametric analyses were used. For analyses 1, 2, and 3, we performed a bootstrap procedure using Spearman's ρ with 10000 iterations. For analysis 4, we performed a bootstrap procedure using the ϕ measure of association from the χ² statistic. ϕ is a measure that varies between 0 and 1 (allowing comparison with the absolute ρ statistic) and is derived directly from the χ² statistic. Thus, its confidence interval was considered in terms of a critical ϕ value.
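The bootstrap procedure described above can be sketched in a few lines. The following Python example is a minimal illustration using only the standard library; it is not the Matlab code used in the study. It assumes case resampling with replacement, a percentile interval (2.5 and 97.5 percentiles), ϕ computed for a 2×2 table as the square root of χ²/n (without continuity correction), and a critical ϕ obtained from the χ² critical value for one degree of freedom at α = 0.05 (3.841). All function names are hypothetical.

```python
import math
import random

def average_ranks(values):
    """Ranks from 1 to n; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when one variable is constant
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def phi_2x2(x, y):
    """Phi for two 0/1 variables; equals sqrt(chi2 / n) for a 2x2 table."""
    a = sum(1 for u, v in zip(x, y) if u == 1 and v == 1)
    b = sum(1 for u, v in zip(x, y) if u == 1 and v == 0)
    c = sum(1 for u, v in zip(x, y) if u == 0 and v == 1)
    d = sum(1 for u, v in zip(x, y) if u == 0 and v == 0)
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return float("nan")
    return abs(a * d - b * c) / math.sqrt(denom)

def critical_phi(n, chi2_crit=3.841):
    """Smallest phi significant at alpha = 0.05 with df = 1 (assumed)."""
    return math.sqrt(chi2_crit / n)

def bootstrap_ci(x, y, stat, n_iter=10000, seed=0):
    """Mean and percentile 95% interval of stat over resampled cases."""
    rng = random.Random(seed)
    n = len(x)
    draws = []
    for _ in range(n_iter):
        idx = [rng.randrange(n) for _ in range(n)]
        value = stat([x[i] for i in idx], [y[i] for i in idx])
        if not math.isnan(value):  # skip degenerate resamples
            draws.append(value)
    draws.sort()
    lower = draws[int(0.025 * (len(draws) - 1))]
    upper = draws[int(0.975 * (len(draws) - 1))]
    return sum(draws) / len(draws), lower, upper
```

A call such as `bootstrap_ci(ktk_scores, mile_times, spearman)` would give the mean ρ and its interval, and `bootstrap_ci(fit_flags, coordinated_flags, phi_2x2)` the same for ϕ, where the argument names stand for hypothetical lists of equal length. Under the assumptions above, a ϕ interval lying entirely above `critical_phi(n)` would count as significant.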
Analyses were performed considering the whole sample, at first, and then the sample separated by age groups and controlled by sex. The latter was necessary as the raw scores are expected to still carry the influence of age and sex. To address our questions, the confidence interval was derived from the bootstrap distribution using the 2.5 and 97.5 percentiles. All analyses were performed in Matlab R2020a.

Results
Tables 2 and 3 show the sample characteristics and a summary of the variables of the present study for girls and boys, respectively. In general, the tables exemplify the issue, discussed in the introduction, of standardizing these tests by age. For raw scores, older individuals showed better results than younger ones. For standardized measures (and categories), older individuals are either equal to (GMC) or worse than (HRPF) younger individuals. Additionally, boys show three fit categories as their mode at all ages, while for girls the mode drops to 2 in GR11 and GR13. This is less apparent when considering the raw scores.
Figure 3 shows the ϕ mean and 95% confidence interval from the bootstrap χ² values for the association between Fit/Unfit and KTK Two Categories. Considering the critical values (black lines) and the 95% CI of ϕ, only the Not Controlled association was significantly different from zero. Still, even this correlation is small (ϕ = 0.16; CI 95% = [0.11, 0.21]). All other associations were not different from zero.

Discussion
This study investigated the influence of employing raw and standardized measures to assess the relation between GMC and HRPF, also considering age and sex. Employing the KTK and Fitnessgram test batteries, our results demonstrated that, in line with our expectations, the association between GMC and HRPF depends on the measure.
Raw scores (and individual tests) led to no discernible trends of age (see Figure 1) but showed the need to control for age and sex. We expected such changes in correlation, as standardization procedures consider what is expected for a given age and sex (based on criteria or norms). Changes with age carry a diversity of factors, from biological neuromaturation and strength increase to motor and social experiences 12,27,28. Sex also carries differential motor and social experiences 29, emphasized by physiological and biomechanical changes in the body [30][31][32]. Thus, an increase in push-ups cannot be taken directly from raw scores as an improvement in HRPF, as the number of repetitions might still be below what is required (and expected) for functional interactions.
Theoretically, GMC and HRPF are latent variables that require a range of procedures to be captured. The individual components that compose the overall construct refer to some facets of the latent variable but cannot fully describe it. Nevertheless, it is common to find studies using isolated tests to refer to the construct 3,19,21. If all tests are required to express the latent variable, the most appropriate measure is the one that was validated to refer to the construct: the standardized measure.
Clearly, one could question whether the tests capture the latent variables as supposed and whether the standards are valid (e.g., is it valid to use the German population for KTK?). The former requires a long-needed discussion on what GMC and HRPF are and whether the tests' results capture the concepts. In terms of standards, an advantage of the Fitnessgram 26 is the usage of criterion-referenced (CRS) instead of norm-referenced standards (NRS). This categorization is based on minimal disease risk and adequate functionality (to perform activities of daily living), which eliminates population-specific biases 33. The KTK, on the contrary, is NRS-based, which might be problematic when used in different populations 34,35. In any case, a standardized measure seems warranted in comparison to raw scores of the tests. The standards are necessary to understand whether the current value of a given measurement and its change over time have any behavioral meaning.
Figure caption: [...] (Spearman's ρ) between the sum of tests achieving fit status on the Fitnessgram (FitCount) and the standardized measures of KTK (motor quotient, MQ, and categories derived from MQ) for all ages (not controlling for age and sex) and for each age group (partial correlation controlled by sex). The circle represents the mean and the error bars the 95% confidence interval of the bootstrap distribution after 10000 iterations.
Furthermore, overall variables characterizing either HRPF or GMC achievements (Fit/Unfit and KTK Two Categories) either failed to demonstrate an association or resulted in small relations. This lack of relationship can come from the overly encompassing categories created. Studies often rely on such arbitrary categorization to facilitate data analyses 23,24,36, but it might be the case that information is lost in this procedure. If the relation between GMC and HRPF is complex, encompassing many confounding factors such as perceived competence 5,37 and parental support 38,39, then broad and arbitrary categorizations will hardly yield meaningful results.
Assuming, therefore, that the standardized measures are the ones truly capturing the constructs under discussion, we found a consistent relation between GMC and HRPF for all ages. This occurred in terms of MQ (continuous or KTK five categories) and FitCount measures when controlling for age and sex. This supports our expectations, as we anticipated that neither age nor sex would influence the relation for these variables and that the association between GMC and HRPF would always be positive.
Such results are promising. Despite the need to consider how long participation in physical activities must last before effects on GMC and, subsequently, HRPF can be found 40, the standardized measures were able to demonstrate the relation throughout the age range 14,41. Also, one could expect that, closer to puberty, participation (induced by GMC) would further amplify fitness results 12. However, the standardized measures seem to encompass the effect of puberty and, probably, its interaction with long-term participation 40.
Nonetheless, observing Table 2, we see that the standardized measures are worse for older groups. It means that, instead of a positive effect of those who demonstrate high GMC spreading the distribution by increasing HRPF results and clarifying an underlying relation between GMC and HRPF, we find a negative effect of those who demonstrate low GMC with less HRPF. This result is still in accordance with the findings of Stodden et al. 5 but emphasizes the negative spiral of low GMC, leading to the perception of low competence, leading to disengagement in physical activity, finally lowering HRPF.
A simple comparison with the literature is difficult. Unfortunately, a large part of the studies used different statistical analyses and measures to investigate GMC and HRPF. For instance, Lopes et al. 20 used motor coordination, physical activity, and physical fitness to evaluate subcutaneous adiposity, which, in our categorization, relates both GMC and HRPF measures to a single HRPF aspect. Few studies examined the association between the two constructs directly, and none considered the issue of standardized/non-standardized and grouped/individual-test measures. Pereira et al. 10, for instance, used logistic regression to relate the sum of raw scores in the KTK to the fit/unfit categories of the mile run, curl-up, push-up, and trunk lift. They found that the relationship was significant for all HRPF aspects. They did not, however, include interactions of GMC with sex and age to see whether such changes were dependent on these interactions. It is important to note that this relation was between a non-standardized measure and a standardized measure of separate tests (a common feature) 15.
The only study that performed a direct correlation between these measures using a single "type" of measure (both grouped/non-standardized) for both GMC and HRPF, controlling for age, sex, and physical activity, was Chaves et al. 9. They found that being more physically fit resulted in higher GMC scores (through a hierarchical model analysis). This is consistent with Figure 1, where we show a general relation between raw scores for the majority of measures (even when controlling for age and sex). Thus, it is possible that their results would show different relations when standardization is performed. We invite these and other authors to investigate whether such results remain when interactions between sex and age are considered and when the tests are standardized.
This brief consideration of the association between GMC and HRPF calls for a more theoretical appreciation of the standards. How can such standards capture all these intricate relations in growth, motor development, and HRPF? Despite being good predictive values, they are mainly empirical: few theoretical models directly relate each variable with its causes and effects in development. Researchers investigating the model of engagement based on motor competence must consider how the model explains such a clear relation for children as young as five years old 5. That is, as reviewed in Robinson et al. 13, one should expect the association between GMC and HRPF to get stronger with age. As discussed here, this might occur given the negative spiral predicted in the model. However, age did not modify the association in our study, raising the question of how such measures truly relate to each other. Note that the association between these measures is a necessary but not sufficient condition for GMC and HRPF to be causally related, as implied in the literature 5,13.
Figure 3. Association (ϕ) between the broad Fit/Unfit category from the Fitnessgram and the broad category of Sufficient/Insufficient Coordination from KTK for all ages (not controlling for age and sex) and for each age group (partial correlation controlled by sex). The circle represents the mean and the error bars the 95% confidence interval of the bootstrap distribution after 10000 iterations. The black line represents the minimum value required for significance (the mean and confidence intervals need to be above that line to be significant at p < 0.050).
This study is limited in not providing a measure of maturation to control for confounding effects of age, especially considering that we analyzed an age range encompassing puberty. Clearly, such a procedure would allow a more accurate understanding of the increased relation between HRPF and GMC. Note, however, that the main goal of our study was to demonstrate the influence of the measure standardization in inferring about GMC and HRPF status rather than capturing all possible intervening variables of this relation.
In conclusion, we found that the relation between GMC and HRPF is dependent on the standardization of the measures employed. Given that low levels of physical activity in children and young people are currently a worldwide concern 42, this dependence on the measure speaks directly to professionals who want to employ these tests to track and intervene on children's health-related physical fitness and gross motor coordination. That is, in order to correctly understand the status of an individual and track it over time, one must employ the correct measures. Additionally, the consistent relation found highlights the need for interventions in physical education: the development of either GMC or HRPF in early childhood might have sustained effects on the other, resulting in a healthier lifestyle later in life.