Intra- and inter-rater agreement of the Dynamic Movement Assessment™

Aims: To investigate the inter- and intra-rater agreement of the Dynamic Movement Assessment (DMA™) risk classification. Method: After anthropometric measurements were taken, 17 female soccer athletes were filmed performing the six DMA™ tests (full squat, step-up, single-leg squat, jump test, plank, and side plank). Both major and secondary deviations were observed during the tests. Two experienced health professionals analyzed the videos using Kinovea 8.15.0 (inter-rater agreement). To assess intra-rater agreement, the same video analysis was repeated two months later. Participants were scored from 0 to 21 points and classified as being at minimum, medium, moderate, or high risk of developing musculoskeletal injuries. To assess the reliability of the DMA movement pattern assessment, the intraclass correlation coefficient (ICC) was calculated with a 2-way random-effects model with absolute agreement (inter-rater) and a 2-way mixed-effects model with consistency (intra-rater). A weighted kappa agreement analysis (k w ) with linear weights was performed to assess the level of agreement on the DMA risk classification (high, moderate, medium, or minimum). The analysis was performed with StatsDirect v.3 and SPSS (23.0). Results: For the number of points, the inter- and intra-rater ICCs were 0.91 (95% CI = 0.74-0.97) and 0.84 (95% CI = 0.59-0.94), respectively; for the risk classification, k w = 0.46 (P = 0.02) intra-rater and k w = 0.46 (P = 0.006) inter-rater (Table 9). Conclusion: DMA has excellent inter- and intra-rater reliability to evaluate movement patterns and classify the risk of musculoskeletal injuries.


Introduction
Musculoskeletal injuries have a high incidence among participants in physical activities and are the predominant cause of absence from sports activities and work [1][2][3][4] . In sports, injuries are more common in the lower limbs and trunk 5,6 .
To identify individuals at high risk of injury, several assessment strategies have been developed 7 . Some authors suggest that certain movement patterns, such as dynamic valgus, are the main cause of musculoskeletal injuries in athletes and practitioners of recreational or occupational activities 8,9 . Moreover, most training sessions undertaken by these individuals are associated with an increased risk of musculoskeletal injuries 8,10 .
The Dynamic Movement Assessment™ (DMA™) is a tool that uses two-dimensional video analysis to measure movement patterns and classify the risk of injury in six functional tests: the full squat test (FST) [11][12][13][14][15][16][17][18] , step-up test (SUT) 19 , single-leg squat test (SLST) 20 , single-leg hop test (SLHT) 21,22 , plank test (PT) 23 , side plank right test (SPRT) 23 , and side plank left test (SPLT) 23 . The first four tests are based on movements and gestures present in various sports. The plank tests are designed to assess the strength of the core muscles. The method emerged as an alternative assessment because of the high cost of biomechanics laboratories and the limited public access to them 24 . The use of DMA™ may assist in identifying injury risk, and the subsequent implementation of programs may reduce that risk.
The DMA™ can be carried out by Physical Education professionals and Physiotherapists trained in the method, but the ability of the DMA to predict the risk of injury is unknown. For scientific research, the instrument used to collect data must be valid and reliable 25,26 . Validity refers to the ability of an instrument to measure what it is intended to measure. Reliability, on the other hand, allows the identification of potential sources of measurement error that can compromise the evaluation. Thus, an instrument can be reliable without being valid, but it cannot be valid without being reliable 26 . The initial step in assessing the validity of a method is to establish its reliability, since validity coefficients are limited by the reliability of the measure. When the criteria are not perfectly reliable, the maximum possible validity is smaller than or equal to the square root of the reliability coefficient 27 . The DMA has a subjective nature, and the decision of the evaluator can affect its ability to predict injuries 25 . However, the inter- and intra-rater agreement of the DMA in evaluating movement patterns and classifying the risk of injury is not known. Therefore, this study aimed to analyze the inter- and intra-rater agreement of the DMA™ in evaluating movement patterns and classifying the risk of musculoskeletal injuries.
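As a rough numerical illustration of the ceiling that reliability places on validity, the sketch below takes the square root of a reliability coefficient. The function name `max_validity` is hypothetical, and the two inputs are simply the ICC values reported in this study; the bound itself is the square-root relationship stated in the Introduction, not an additional result.

```python
import math


def max_validity(reliability: float) -> float:
    """Upper bound on a validity coefficient, given a reliability coefficient in [0, 1]."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return math.sqrt(reliability)


# ICC values reported in this study, used here only as example inputs.
for label, icc in [("intra-rater", 0.84), ("inter-rater", 0.91)]:
    print(f"{label}: ICC = {icc:.2f}, validity ceiling = {max_validity(icc):.3f}")
```

The ceiling is always at least as large as the reliability coefficient itself, so a reliable instrument is a necessary but not a sufficient condition for a valid one.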

Methods
This agreement study was approved by the Research Ethics Committee of the Naval Hospital Marcilio Dias.

Sample
Female football athletes of the Physical Education Center Almirante Adalberto Nunes of the Brazilian Navy (CEFAN) were invited to participate in this study. A healthcare professional evaluated the volunteers' musculoskeletal pain and complaints in the previous seven days using the Nordic Musculoskeletal Questionnaire 29 . Only athletes who had been injury-free during the seven days before the assessment were included in the study. Once these criteria were met, height and body mass were measured, allowing the calculation of the body mass index (BMI). The fat percentage was estimated following Pollock's protocol and equations 30 .
The sample size was calculated after conducting a pilot study with six participants. The aim was to calculate the observed agreement and the chance agreement of the DMA 31 . After calculating the scores given by the assessors, a 2x2 table was prepared with the number of concordant and discordant answers associated with the minimum, medium, moderate, or high risk of musculoskeletal injuries of the participants. Afterward, the Kappa Index (k) was calculated according to equation (1) 31 . Considering k = 0 as the null hypothesis 31 , the data were used to obtain the sample size recommended in the literature for agreement studies in which the outcome is an ordinal categorical variable 31 . The result obtained in the pilot was k = 0.70. Considering an accuracy of the test of 80% and an α error of 0.05, the calculations yielded a sample of 17 participants.

Procedure for the Dynamic Movement Assessment™
The individuals were invited to participate in the study during the team's preseason. Before the tests, a health care professional recorded each athlete's date of birth, sports category, history of musculoskeletal injuries or recent surgeries, and musculoskeletal pain complaints in the previous seven days. Athletes who met the criteria were then evaluated using the DMA™.
The DMA™ tests were filmed by two physical therapists trained in the DMA™ method, who had six years of experience and had performed more than a hundred tests. The participants did not do physical activities on the assessment day. As recommended by the DMA™, the athletes were not familiarized with the method. Initially, they were given verbal directions on the execution of each test, with a few repetitions, so that all participants received the same information regarding the movement in each functional test.
All tests were filmed with a digital camera (Nikon Coolpix P600, 16.1 megapixels, 60 images per second, Japan) positioned on a tripod leveled both horizontally and vertically, at a distance of 3.04 meters from the participant. The DMA™ consisted of the following functional tests, performed in this order with no gaps between them, as previously described by Nessler & Haile 24 : FST, SUT, SLST, SLHT, PT, SPRT, and SPLT (Figure 1). The participants were asked to report any pain felt before the test and any pain that arose during the movements. If any pain related to the test was reported, the movement was stopped and given a zero score 24 . As some of the movement patterns analyzed cannot be seen from the frontal plane (PT, SPRT, and SPLT), the two evaluators were positioned perpendicular to the camera during these tests. Participants performed 10 repetitions on each side for the unilateral tests (SUT, SLST, and SLHT), 20 repetitions of the FST (10 facing the camera and 10 with their back to it), and held each plank test for 60 seconds (PT, SPRT, and SPLT).

Video analysis and score calculation
The videos were analyzed using Kinovea software (v. 8.15.0). The image was calibrated using a red piece of wood 30 cm in length, which was positioned next to the participant during all tests (Figure 1). Each of the six tests was rated from zero to three points, three points being the best score a participant could get in each test. The possible ratings were: three points (normal movement); two points (few deviations); one point (many deviations or movement patterns associated with a high risk of injury); or zero (if any pain was associated with the test).
The tests used to assess the endurance of the trunk muscles were the plank and the side planks 23,32 . In these tests, depression of the pelvis greater than 2.54 cm, winged scapula, trunk rotation, lower trunk rotation, hip elevation, increased lordosis, and oscillations were observed 24 . Table 1 details the DMA risk classification criteria. Tests 2, 3, 4, and 6 are unilateral; thus, the final score is the lowest score between the right and left sides. The risk rating increases by one risk category if either of the following factors is present: prior anterior cruciate ligament (ACL) injury or high-risk sports practice (such as basketball, soccer, volleyball, football, cheerleading). For both the inter-rater and intra-rater reliability analyses, movement patterns were observed on video. The only exception was the plank tests, in which, in addition to the video analysis, the evaluator observed the participant directly to detect deviations that cannot be seen from the frontal plane, such as rotations of the pelvis or trunk and a winged scapula.

-High
A collective score of 0-3 points on tests FST, SUT, and SLST + collective score of 3 points on tests PT, SPRT, and SPLT; and/or LOB in the tests FST and SUT;

-Moderate
A collective score of 4-5 points on tests FST, SUT, and SLST + collective score of 4 points or better on tests PT, SPRT, and SPLT; and/or LOB in two of the 3 tests FST, SUT, or SLST;

-Medium
A collective score of 6-7 points on tests FST, SUT, and SLST + collective score of 5 points or better on tests PT, SPRT, and SPLT; and/or LOB in one of the 3 tests FST, SUT, or SLST;

-Minimum
A collective score of 8-9 points on tests FST, SUT, and SLST + collective score of 6 points or better on tests PT, SPRT, and SPLT; and/or LOB in one of the 3 tests FST, SUT, or SLST.

Intra-rater agreement
Evaluator 1 did not attend the screening and was unaware of the participants' anamnesis data. He was responsible for filming the functional tests and analyzing the videos on the same day (phase: DMA1). To assess the intra-rater agreement, the videos were analyzed again after two months (phase: DMA2). The motion analysis was done using the biomechanical analysis software Kinovea version 8.15.0.

Inter-rater agreement
The videos obtained during the first stage of evaluation of the participants (DMA1) were analyzed independently by a second examiner, who also did not know the history of the participants.
The results obtained from the analysis of movement patterns, as well as the risk rating, were compared by an independent statistician. Thus, it was possible to verify the inter-rater agreement. The participants had access to the results, which were delivered to the team's physiotherapist.

Statistical analysis
Initially, 4x4 tables were constructed to investigate the intra- and inter-rater agreement. The movement patterns were analyzed using the seven tests of the DMA and were given a final score ranging from 0 (zero) to 21 (twenty-one) points (interval). As explained in Table 1, the classification was minimum, medium, moderate, or high risk of injury (ordinal).
To assess the reliability of the DMA™, we calculated the intraclass correlation coefficient (ICC), preceded by the confidence intervals of the Bland-Altman limits of agreement for the exclusion of outliers, as well as the typical error of measurement for the number of points obtained. For intra-rater reliability, the ICC (3,1) was calculated with a 2-way mixed-effects model and consistency. To evaluate the inter-rater reliability, the ICC (2,1) was calculated with a 2-way random-effects model with absolute agreement 33 . To assess the agreement of the DMA™ in classifying the risk of injury as minimum, medium, moderate, or high, the Kappa Index was used with linear weights (k w ) 34,35 . The intra- and inter-rater agreement was verified by the number of consistent responses, i.e. the number of cases in which the result is the same between evaluators, excluding those expected by chance 31,34 , according to equation (1):

$$k_w = \frac{P_o - P_a}{1 - P_a} \qquad (1)$$

where k w = Kappa with linear weights; Po = proportion of observed agreement; Pa = proportion of chance agreement.
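The two single-measure ICC forms named above can be sketched from the mean squares of a two-way ANOVA without replication. This is a minimal illustration using the standard single-measure formulas for ICC(2,1) (absolute agreement) and ICC(3,1) (consistency), not the SPSS routine used in the study; the function name `icc_single` and the example scores are hypothetical.

```python
def icc_single(scores):
    """Return (ICC(2,1), ICC(3,1)) for a list of rows, one row per subject,
    one column per rater (or per rating occasion)."""
    n = len(scores)        # number of subjects
    k = len(scores[0])     # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares of the two-way layout.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                # between-subjects mean square
    msc = ss_cols / (k - 1)                # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))     # residual mean square

    icc_2_1 = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    icc_3_1 = (msr - mse) / (msr + (k - 1) * mse)
    return icc_2_1, icc_3_1


# Hypothetical total DMA scores (0-21) for six athletes rated by two evaluators.
example = [[16, 17], [14, 14], [18, 17], [12, 13], [19, 19], [15, 16]]
agreement, consistency = icc_single(example)
print(f"ICC(2,1) = {agreement:.3f}, ICC(3,1) = {consistency:.3f}")
```

A systematic offset between raters lowers ICC(2,1) but leaves ICC(3,1) unchanged, which is why the absolute-agreement form is the stricter of the two.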
To compare the agreement between pairs of DMA scores, we developed a 4x4 table. The Kappa analysis was done by comparing the four DMA risk categories (high, moderate, medium, and minimum risk of injury). Finally, the prevalence and the bias ratio of the study were calculated.
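A linearly weighted kappa for a four-category ordinal scale can be sketched as follows. The sketch uses agreement weights 1 - |i - j| / (k - 1), so that the weighted observed and chance agreement play the roles of Po and Pa in equation (1). The function name, the integer coding of the categories (0 = minimum, 1 = medium, 2 = moderate, 3 = high), and the example ratings are assumptions made for illustration; they are not the study's data.

```python
def linear_weighted_kappa(rater_a, rater_b, n_categories):
    """Weighted kappa with linear weights for two lists of integer category codes."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("both raters must rate the same non-empty set of subjects")
    k = n_categories
    if any(not 0 <= r < k for r in list(rater_a) + list(rater_b)):
        raise ValueError("category codes must lie in 0..n_categories-1")

    # Observed joint proportions (the k x k agreement table divided by n).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0 / n
    marg_a = [sum(observed[i]) for i in range(k)]
    marg_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        # Full credit on the diagonal, partial credit for near misses.
        return 1.0 - abs(i - j) / (k - 1)

    p_o = sum(weight(i, j) * observed[i][j] for i in range(k) for j in range(k))
    p_a = sum(weight(i, j) * marg_a[i] * marg_b[j] for i in range(k) for j in range(k))
    # Undefined (division by zero) when both raters put every subject in one category.
    return (p_o - p_a) / (1.0 - p_a)


# Hypothetical classifications of eight athletes by two evaluators.
evaluator_1 = [3, 2, 2, 1, 3, 2, 1, 2]
evaluator_2 = [3, 2, 1, 1, 3, 3, 1, 2]
print(f"k_w = {linear_weighted_kappa(evaluator_1, evaluator_2, 4):.2f}")
```

With linear weights, a disagreement between adjacent categories (for example medium versus moderate) is penalized less than one between distant categories (minimum versus high).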
Equation (2) was used to calculate the prevalence:

$$\text{Prevalence} = \frac{|a - d|}{n} \qquad (2)$$

where a = number of positive concordant cases between evaluators 1 and 2; d = number of negative concordant cases between evaluators 1 and 2; n = sample size. Values close to 1.0 (one) indicate higher prevalence and values closer to 0 (zero) indicate low or absent prevalence.
The bias was calculated using equation (3):

$$\text{Bias} = \frac{|b - c|}{n} \qquad (3)$$

where b = number of conflicting positive cases between evaluators 1 and 2; c = number of conflicting negative cases between evaluators 1 and 2; n = sample size. The closer to 1.0 (one), the greater the bias, and the closer to 0 (zero), the smaller the bias.
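The two indices can be computed directly from the cells of a 2x2 agreement table. The sketch below assumes the usual definitions, |a - d| / n for prevalence and |b - c| / n for bias; the function names are hypothetical, and the cell counts are illustrative values chosen only so that the output rounds to the indices reported in this study (0.82 and 0.06 with n = 17). They are not taken from the study's tables.

```python
def prevalence_index(a: int, d: int, n: int) -> float:
    """|a - d| / n, where a and d are the two concordant cells of a 2x2 table."""
    return abs(a - d) / n


def bias_index(b: int, c: int, n: int) -> float:
    """|b - c| / n, where b and c are the two discordant cells of a 2x2 table."""
    return abs(b - c) / n


# Illustrative counts only (a + b + c + d = 17); not the study's actual table.
a, b, c, d = 15, 1, 0, 1
n = a + b + c + d
print(f"prevalence = {prevalence_index(a, d, n):.2f}, bias = {bias_index(b, c, n):.2f}")
```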
Values of P ≤ 0.05 were considered significant.

Results
A total of 38 athletes who play women's professional football were invited to participate in the study, but only 17 met the inclusion criteria and volunteered. Sample characteristics are shown in Table 2 and the evaluation results for each participant are detailed in Table 3. The numbers of concordant results for the inter- and intra-rater agreement are shown in Tables 4 and 5, respectively. Figures 2 and 3 show the confidence intervals for the Bland-Altman limits of agreement. The inter- and intra-rater ICC values, obtained by comparing DMA scores ranging from 0 (zero) to 21 (twenty-one), are in Table 8. When the four DMA risk categories (high, moderate, medium, and minimum risk of injury) were compared, k values were low (Table 9). Therefore, a comparative table of the Kappa Index and DMA scores was constructed (Table 10) to verify which scores showed the lowest agreement; none of the participants was classified as minimum risk. There was no significant difference between the "medium" and "moderate" scores (k = 0.34; P = 0.14), but the "high" score was significantly different from the "medium" (k = 1.0; P < 0.001). The prevalence and bias values for the study were 0.82 and 0.06, respectively.
Table notes: C = categorical; I = interval; ICC = intraclass correlation coefficient; TME = typical measurement error. One participant whose result fell outside the Bland-Altman confidence limits was excluded. a The estimator is the same whether the interaction effect is present or not. b This estimate is computed assuming the interaction effect is absent, because it is not estimable otherwise.

Discussion
The present study found scores of 16.6 ± 2.6 and 16.6 ± 2.3 points in phases 1 and 2 (intra-rater agreement), respectively, resulting in a significant ICC of 0.84 (95% CI = 0.59 to 0.94). Evaluator 2 scored 16.6 ± 2.6 points (inter-rater agreement), and the inter-rater ICC was 0.91 (95% CI = 0.74-0.97) (Table 6). Moreover, for the risk classification categories (high, moderate, medium, and minimum risk), the intra-rater k w was 0.46 (P = 0.02) and the inter-rater k w was 0.46 (P = 0.006) (Table 7). Therefore, agreement on the risk classification was moderate. If we consider the total sum of points (0-21), the DMA has relatively high reliability, but reliability shifts to moderate when the risk of injury is classified (Tables 6 and 7).
The DMA was originally developed to classify the risk of an individual developing musculoskeletal injuries 24 . The score is obtained by adding the points given in each of the seven tests and can vary from 0 (zero) to 21 (twenty-one) points. Risk categorization uses the total sum of the points in FST, SUT, and SLST (tests 1, 2, and 3) as the first criterion: this sum yields a classification of high risk (up to 3 points), moderate (4 to 5 points), medium (6 to 7 points), or minimum (8 to 9 points). Moreover, the total sum of points in the three plank tests (PT, SPRT, and SPLT) should be up to 3 points (for high risk), at least 4 points (for moderate), at least 5 points (for medium), and at least 6 points (for minimum). In addition to these classification criteria, if the participant practiced high-risk sports or reported a prior anterior cruciate ligament injury, the risk level was increased. Thus, an athlete could be given the same total score by two different evaluators, or by the same evaluator at two different times, and still be classified into different risk categories.
This can be better exemplified using the results obtained in this study: of the 17 participants, six were classified in different categories between the first and second evaluations (intra-rater agreement), but the total score (between 0 and 21) of two participants was equal; two participants showed a difference of two points and two other participants showed a difference of two points or more. Between evaluators 1 and 2 (inter-rater agreement), seven participants were classified into different injury risk categories, but the total score (between 0 and 21) of two participants was equal; three had a difference of one point and two had a difference of two points. Thus, 5 and 6 participants, respectively, whose points were the same or differed by only one point were classified into different risk categories (Table 2). For example, participant number 1 received 16 points from the first evaluator and 17 points from the second evaluator. This difference has little impact on the ICC. However, the athlete was classified as moderate and high risk by evaluators 1 and 2, respectively. Thus, a difference of one point in tests 1, 2, and 3 can result in a classification into different categories, which is reflected in high ICC values and moderate k w values.
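The sensitivity of the categorical outcome to a single point can be sketched with a deliberately simplified version of the first classification criterion. The sketch uses only the score bands for the sum of FST, SUT, and SLST (0-3 high, 4-5 moderate, 6-7 medium, 8-9 minimum) and the lowest-side rule for unilateral tests; it ignores the plank criteria, LOB, and the one-category increase for prior ACL injury or high-risk sports, so it is not the full DMA algorithm. Function names and example ratings are hypothetical.

```python
# Score bands for the sum of tests 1-3 (FST, SUT, SLST), as described in the text.
RISK_BANDS = [(0, 3, "high"), (4, 5, "moderate"), (6, 7, "medium"), (8, 9, "minimum")]


def movement_sum(fst, sut_right, sut_left, slst_right, slst_left):
    """Sum of FST, SUT, and SLST; unilateral tests take the lower of the two sides."""
    ratings = (fst, sut_right, sut_left, slst_right, slst_left)
    if any(r not in (0, 1, 2, 3) for r in ratings):
        raise ValueError("each test is rated 0, 1, 2, or 3")
    return fst + min(sut_right, sut_left) + min(slst_right, slst_left)


def risk_category(total):
    """Map the 0-9 movement sum to a risk label (simplified: first criterion only)."""
    for low, high, label in RISK_BANDS:
        if low <= total <= high:
            return label
    raise ValueError("movement sum must lie between 0 and 9")


# Two hypothetical raters who differ by one point on one side of one test.
rater_1 = movement_sum(fst=2, sut_right=2, sut_left=2, slst_right=2, slst_left=1)
rater_2 = movement_sum(fst=2, sut_right=2, sut_left=2, slst_right=2, slst_left=2)
print(rater_1, risk_category(rater_1))
print(rater_2, risk_category(rater_2))
```

Because each band is only two points wide, a one-point disagreement near a band edge changes the category while barely moving the total score, which is consistent with a high ICC alongside a moderate weighted kappa.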
In the comparative table between the pairs of scores (Table 8), there is no significant difference between the medium and moderate risk categories (k = 0.34; P = 0.14), whereas the high-risk category is different from the moderate-risk category (k = 1.0; P < 0.001). On the other hand, the interval score showed good intra- and inter-rater reliability (Table 6).
In tests that evaluate movement patterns, low reliability values are due to the subjectivity of the tests 25 . In such cases, evaluators make qualitative decisions to perform the classification. Therefore, discrepancies may occur between evaluators or between multiple tests performed by the same evaluator. Although qualitative, functional tests are often helpful, especially in assessment protocols for individuals with previous injuries 36 .
In the DMA, the main potential sources of disagreement are lateral deviation of the pelvis (FST) 37 , the presence and magnitude of hip adduction (FST, SLST, and SLHT) 25 , and the positioning of the pelvis (PT, SPRT, and SPLT). Reliability values for hip adduction range from poor to substantial 38 . We did not find studies that evaluated the plank using two-dimensional analysis. However, in these tests, a potential source of bias can arise from determining the starting position of the pelvis in two-dimensional evaluation software. Furthermore, some deviations were analyzed live (presence of winged scapula, pelvis rotation, trunk oscillations, etc.) and are, therefore, potential sources of disagreement.
The use of more than two categories in movement evaluation protocols, as proposed by the DMA, has previously been mentioned as a threat to reliability 39 because it can make the assessment more complex. The subjectivity of the tests may have contributed to the moderate agreement. Clear guidance regarding the evaluation protocol is also recommended 40 . The Kappa index was used because the DMA result is an ordinal variable with more than two categories 35 .
The interpretation of the Kappa index is not straightforward, as other factors can influence the magnitude of the coefficient or the interpretation that can be placed on a given magnitude. Non-independent classifiers, prevalence, and bias can influence the magnitude of kappa 31 . The analyses were performed independently, so as not to influence the results. On the other hand, the prevalence of a condition in the population affects the agreement assessed by kappa when the condition is very common or very rare, because it increases the tendency of the professionals responsible for the diagnosis toward a positive or negative diagnosis 31 . In the present study, there was a prevalence of 0.82 (Table 7), which could contribute to a higher final agreement. On the other hand, there are situations where the evaluators disagree on the proportion of positive (or negative) cases, which is reflected in the difference between the cells of the 2x2 table that show the disagreements on positive and negative cases 41 . In this study, a bias ratio of 0.06 (Table 7) does not interfere with kappa values.

Limitations and strengths of the study
It is important to assess whether the reliability of the method varies with the experience of the evaluators. This study was conducted with experienced evaluators, who had undergone formal training in the DMA and had five years of experience at the time of the study. Therefore, these results cannot be extrapolated to inexperienced individuals without formal training in the method. To increase the clinical applicability of the method, a next step could be the evaluation of the DMA performed live, without the 2D evaluation; however, new studies should be conducted to test its reliability. Moreover, it was not possible to assess the internal consistency reliability with a reassessment within two days, because the same evaluator would remember the results of the test. However, an evaluation of the stability of values was conducted.

Conclusion
In this study, the DMA™ displayed excellent inter- and intra-rater reliability to measure movement patterns and classify the risk of musculoskeletal injuries when the number of points obtained in the seven tests is considered. This reliability decreases to moderate when the risk of injury yielded by the method is considered. Caution should therefore be exercised when generalizing the results obtained in this study. Further studies are suggested to compare expert and novice evaluators and to verify the validity of the DMA in classifying the risk of injury.