Intra and inter-rater reliability study of pelvic floor muscle dynamometric measurements

OBJECTIVE: The aim of this study was to evaluate the intra and inter-rater reliability of pelvic floor muscle (PFM) dynamometric measurements for maximum and average strengths, as well as endurance. METHOD: A convenience sample of 18 nulliparous women, without any urogynecological complaints, aged between 19 and 31 (mean age of 25.4±3.9) participated in this study. They were evaluated using a pelvic floor dynamometer based on load cell technology. The dynamometric evaluations were repeated in three successive sessions: two on the same day with a rest period of 30 minutes between them, and the third on the following day. All participants were evaluated twice in each session; first by examiner 1 followed by examiner 2. The vaginal dynamometry data were analyzed using three parameters: maximum strength, average strength, and endurance. The Intraclass Correlation Coefficient (ICC) was applied to estimate the PFM dynamometric measurement reliability, considering a good level as being above 0.75. RESULTS: The intra and inter-raters' analyses showed good reliability for maximum strength (ICCintra-rater1=0.96, ICCintra-rater2=0.95, and ICCinter-rater=0.96), average strength (ICCintra-rater1=0.96, ICCintra-rater2=0.94, and ICCinter-rater=0.97), and endurance (ICCintra-rater1=0.88, ICCintra-rater2=0.86, and ICCinter-rater=0.92) dynamometric measurements. CONCLUSIONS: The PFM dynamometric measurements showed good intra- and inter-rater reliability for maximum strength, average strength and endurance, which demonstrates that this is a reliable device that can be used in clinical practice.


Introduction
Pelvic floor muscle (PFM) evaluation is recommended by the International Continence Society (ICS) and is considered essential for evaluating the effect of a therapeutic intervention 1. Several methods are used by different researchers; among them, vaginal dynamometry has been particularly investigated in the scientific literature [2][3][4][5][6][7][8][9][10][11]. According to Dumoulin et al. 12, vaginal dynamometry can be an efficient tool for the direct investigation of female PFM strength.
However, the main limitation associated with PFM dynamometers is their lack of accessibility: these devices are mostly used by their designers and are not commercially available, which prevents other groups from reproducing the measurements. Thus, this study investigated the intra- and inter-rater reliability of PFM dynamometric measurements for maximum and average strengths, as well as endurance, using locally available equipment.

Method

Study design
This was a test-retest study assessing the intra- and inter-rater reliability of PFM dynamometric measurements.

Participants
A convenience sample of 18 nulliparous women, without any urogynecological complaints, aged between 19 and 31 (mean age of 25.4±3.9) participated in this study. All participants signed an informed consent form, and the study was approved by the research ethics committee of Universidade Federal de Alfenas (UNIFAL-MG), Alfenas, MG, Brazil (CAAE: 06620512.4.0000.5142). The inclusion criteria were: nulliparous women, between 18 and 35 years old, with a normal body mass index (<25 kg/m²), without any urogynecological complaints, and presenting PFM strength equal to or greater than grade 1 according to the Modified Oxford Grading Scale 19. The exclusion criteria were: pregnancy, pelvic organ prolapse or reconstructive pelvic surgery, symptoms of vaginal infection, intolerance to condoms, allergy to the gel used in the procedure, degenerative neurological disorder or any other disease that may interfere with PFM strength measurements, and being in either a premenstrual or current menstrual period 2,5,20.

Assessment tools
A dynamometer designed to measure PFM strength was used in the present study (EMG System do Brasil, model DFV 020101/10). The vaginal dynamometer is cylindrical (9.5 cm in length and 3.3 cm in diameter), with a plastic exterior and an internal steel structure, and is equipped with a load cell 2 cm from its base, which measures unidirectional anteroposterior compressive force in kilogram-force (kgf) units. The vaginal dynamometer was connected to a computer, and both remained unplugged from the mains during data collection to avoid any interference.

Interventions
PFM strength was evaluated for all women and repeated in three successive sessions: two on the same day with a rest period of 30 minutes between them, and the third on the following day. First, an interviewer asked the participants to provide their demographic and clinical data. Then, all participants were evaluated twice in each session, first by examiner 1 followed by examiner 2, in a randomly selected order, as presented in Figure 1. The interviewer remained in the assessment room to ensure that the same procedures were performed by both raters and the raters were blinded to each other's results.
As in Ferreira et al. 20 , both examiners in this study were previously trained to perform the PFM assessment protocol (digital palpation and dynamometric assessment) by a well-experienced physical therapist with 16 years of clinical practice experience. They also had comprehensive knowledge and experience in PFM assessment skills.
The ability to contract and relax the PFM was first evaluated by digital palpation, in the lithotomy position. The participant was asked to perform a maximum contraction of her PFM, lifting it inward and squeezing around the fingers then completely relaxing it 20 . When a correct contraction was verified, the examiner scored it according to the Modified Oxford Grading Scale (0-5 points) 19 , which determined the participant's eligibility.
PFM strength was then assessed with the vaginal dynamometer, which was covered with a condom (Elite) and lubricated with hypoallergenic gel (Johnson & Johnson KY gel), then inserted into the vaginal cavity with the load cell positioned so that it could capture the anteroposterior compression strength. Next, the participant was asked to perform three maximal voluntary PFM contractions, each recorded for 15 seconds and followed by a rest period of three minutes 21, directed by the following verbal command: "When I ask you, please perform a pelvic floor contraction as hard as possible, maintaining it as long as you can, and then relax when you get tired."

Data analysis
The vaginal dynamometry data were analyzed by the main researcher, using three parameters (Figure 2):
- Maximum strength: the researcher calculated the difference between the highest and lowest strength values, which were provided by the equipment software 3, in kgf.
- Average strength: a mean value of the strength curve, provided by the equipment software, in kgf.
- Endurance: equal to the length of time, in seconds (s), during which the participant could maintain a contraction above 60% of her maximum strength 22,23.
For each parameter, the mean of the values from the three contractions was calculated.
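The three parameters can be illustrated with a minimal Python sketch. It is not the equipment software: the function names, the sampling rate argument, and the reading of "above 60% of her maximum strength" as the longest continuous run of samples above the lowest value plus 60% of the highest-minus-lowest difference are all assumptions made here for illustration.

```python
import numpy as np

def contraction_parameters(force_kgf, sampling_hz, threshold=0.60):
    """Return (maximum strength, average strength, endurance) for one recording.

    force_kgf   : sampled strength curve in kgf (hypothetical input format)
    sampling_hz : samples per second (assumed known for the recording)
    """
    force = np.asarray(force_kgf, dtype=float)

    # Maximum strength: difference between the highest and lowest values (kgf).
    max_strength = force.max() - force.min()

    # Average strength: mean value of the strength curve (kgf).
    avg_strength = force.mean()

    # Endurance (assumption): longest continuous time, in seconds, during which
    # the curve stays above the lowest value plus 60% of maximum strength.
    above = force > force.min() + threshold * max_strength
    longest = run = 0
    for flag in above:
        run = run + 1 if flag else 0
        longest = max(longest, run)
    endurance_s = longest / sampling_hz

    return max_strength, avg_strength, endurance_s

def session_means(recordings, sampling_hz):
    """Average each parameter over the contractions recorded in one session."""
    values = [contraction_parameters(r, sampling_hz) for r in recordings]
    return np.mean(np.array(values), axis=0)
```

Calling `session_means` with the three recordings of one session returns one value per parameter, which mirrors the averaging step described above.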

Statistical analysis
Demographic and clinical data were presented as frequencies and percentages. Intra-rater agreement was analyzed using a type 3,3 Intraclass Correlation Coefficient, assessing the consistency of each rater's measurements across the three evaluations. Inter-rater agreement was analyzed using a type 3,1 Intraclass Correlation Coefficient, considering between-rater concordance across the three sessions and using only the average of the three measurements obtained in each session. The following values, suggested by Portney and Watkins 24, were considered: >0.75 = good; 0.5 to 0.75 = moderate; and <0.5 = poor.
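For readers who want to see how these coefficients are obtained, the following Python sketch computes ICC type 3,1 and type 3,k from the two-way ANOVA mean squares commonly given for these forms. It is an illustration only, not the software used in this study; the function names and the layout of the input matrix (one row per participant, one column per evaluation or rater) are assumptions.

```python
import numpy as np

def icc3(scores):
    """Return (ICC type 3,1, ICC type 3,k) for an n-by-k score matrix.

    Rows are participants; columns are repeated evaluations (intra-rater)
    or raters (inter-rater). Hypothetical helper, not from the study.
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()

    # Sums of squares of the two-way layout.
    ss_subjects = k * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_columns = n * ((x.mean(axis=0) - grand) ** 2).sum()
    ss_total = ((x - grand) ** 2).sum()
    ss_error = ss_total - ss_subjects - ss_columns

    bms = ss_subjects / (n - 1)            # between-subjects mean square
    ems = ss_error / ((n - 1) * (k - 1))   # residual mean square

    icc_single = (bms - ems) / (bms + (k - 1) * ems)  # type 3,1
    icc_average = (bms - ems) / bms                   # type 3,k
    return icc_single, icc_average

def interpret(icc):
    """Classify an ICC using the cut-offs stated in the text."""
    if icc > 0.75:
        return "good"
    if icc >= 0.5:
        return "moderate"
    return "poor"
```

With three evaluations per rater, the second returned value corresponds to type 3,3; with two raters compared on session averages, the first corresponds to type 3,1.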
Moreover, the standard error of measurement (SEM) and minimal detectable difference (MDD) were calculated for both the intra- and inter-rater reliability analyses, and an inter-rater measurement dispersion

Results
The digital palpation evaluation showed that all participants presented effective and conscious PFM contractions, which were classified as strength grade 3 (n=9), strength grade 4 (n=8), and strength grade 5 (n=1), using the Modified Oxford Grading Scale. Tables 1 and 2 show the intra- and inter-rater analyses for the dynamometric measurements, respectively. Figure 3 shows the Bland-Altman plots for both raters.

Table 1. Intra-rater reliability of the dynamometric measurements.

Discussion
According to the ICS 25, PFM function can be qualitatively defined by the tone at rest and the strength of a voluntary or reflex contraction as strong, weak or absent, or by a validated grading system. Digital palpation has been used in clinical practice, although many researchers do not consider it reliable, objective or sensitive. Several authors who researched its correlation with other methods considered it objective; nevertheless, its reproducibility remains questionable 21.
Other methods have been used in clinical trials to quantify the subjective findings of digital palpation assessment. These include electromyography, perineometry, dynamometry, ultrasound, and magnetic resonance imaging. However, because there is no gold standard for the assessment of women's PFM function, any comparison among the results becomes difficult and even inaccurate.
Thus, the use of PFM functional assessment is necessary not only to investigate the muscular response, but also to quantify muscle strength 2 , endurance 2,5,22,23 , speed of contraction 5 , as well as the ability to perform, then repeat, fast and slow contractions 4 .
The protocol used for data analysis in this study was based on previous studies that used different PFM evaluation methods 3,22,23, because no other study using a vaginal dynamometer equipped with a load cell was found in the literature. Thus, three different parameters were analyzed: maximum strength (kgf), average strength (kgf), and endurance (s).
The PFM are composed of approximately 70% type I fibers (slow fibers, responsible for pelvic organ support) and 30% type II fibers (fast fibers, responsible for urethral closure during activities that increase intra-abdominal pressure) 26, both equally important for the maintenance of continence mechanisms 27. Given this histological composition, it is believed that the parameters proposed in this study allow a better understanding of muscle function as a whole. Therefore, when clinically evaluating a patient, it is important to assess not only a maximal voluntary contraction but also the ability to sustain one. Before any device is used in clinical research, it is essential to verify its reliability, without which it would be impossible to rely on the collected data 2,28.
The reliability of any PFM evaluation provides basic information about the degree of error within its measurements. Test-retest reliability verifies the stability of repeated measurements performed at separate points in time. Repeated measurements may be obtained from multiple evaluations within the same session (intra-session reliability), from measurements taken over longer periods of time (test-retest reliability), or by comparing the results of different raters (inter-rater reliability) 28,29.
There is also a diversity of protocols used among researchers 2,5,20 while testing the reliability of PFM measurements. Morin et al. 5 tested the test-retest reliability of PFM dynamometric measurements using the Montreal dynamometer 12 , by means of two parameters: speed of contraction and endurance. To calculate the speed of contraction, the authors quantified the force rate in the first contraction and the number of fast contractions performed. To analyze the endurance parameter, the authors calculated the area between 10 and 60 seconds under the force curve of a maximal voluntary contraction.
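The area-based endurance measure described above can be sketched as a trapezoidal integration of the force curve over the 10 to 60 second window. This is only an illustration of the idea, not the procedure used by Morin et al.; the function name, the input format, and the choice of the trapezoidal rule are assumptions.

```python
import numpy as np

def area_under_force_curve(time_s, force, start_s=10.0, end_s=60.0):
    """Integrate a sampled force curve between start_s and end_s.

    time_s : sample times in seconds (hypothetical input format)
    force  : force values at those times, in the device's own unit
    Returns the area in force-unit x seconds.
    """
    t = np.asarray(time_s, dtype=float)
    f = np.asarray(force, dtype=float)

    # Keep only the samples inside the analysis window.
    window = (t >= start_s) & (t <= end_s)
    t_win, f_win = t[window], f[window]

    # Trapezoidal rule: mean of adjacent samples times the time step.
    return float(np.sum((f_win[1:] + f_win[:-1]) / 2.0 * np.diff(t_win)))
```

Unlike the time-above-threshold measure used in the present study, this quantity depends on both the duration and the magnitude of the contraction, so the two endurance definitions are not directly comparable.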
In the present study, as well as in Quartly et al. 22 , the endurance parameter was analyzed considering the time factor (in seconds), measuring the time during which the participant could maintain a contraction above 60% of her maximum strength.
It is common as well as important to verify the time of a sustained contraction in clinical practice. While Quartly et al. 22 found an average of 5.5 (range 4 to 12) seconds for women under 40 years using a perineometer, the present study found an average of 4.08 (range 1.5 to 9.67) seconds using a vaginal dynamometer.
Two other parameters were also used to quantify PFM strength: maximum strength, also used by Morin et al. 3 in their study, and average strength, which was proposed as an additional parameter to equalize the findings of fast and sustained PFM contractions.
Another methodological feature to be considered is the time interval between assessments, because of the influence of the menstrual cycle and because participants may learn and train PFM contractions from one evaluation to the next, which could compromise the comparison 20. Sigurdardottir et al. 30 reported that the interval for test-retest reliability assessment should be at most seven days. Thus, in this study, an interval of one day between assessments was used.
A limitation of the study was that the equipment is cylindrical and 3.3 cm in diameter, which can cause some vaginal discomfort and thus interfere with the measurements, a fact also reported by other authors 2,7. Other limitations of this equipment are the difficulty of using it in different positions and with women who have vaginal stiffness.
The vaginal dynamometer has the advantage of quantifying the clinical findings observed during PFM contraction evaluation, and it can be used in both scientific research and clinical practice, although its high cost may be a limiting factor. In addition, this model can be covered with a condom and then disinfected, which facilitates the clinical routine: unlike the endovaginal probes used in electromyography, it does not need to be reserved for a single patient or sterilized.
It is known that the larger the sample size, the greater the consistency and agreement among the findings, which supports a study's reliability 28. Accordingly, a larger number of participants would have strengthened the present study's findings. Nevertheless, the PFM dynamometric measurements showed good intra- and inter-rater reliability for maximum strength, average strength, and endurance, indicating that this is a reliable device that can be used in clinical practice.