Reproducibility assessment of an instrument for measuring the axial force of the tongue

Purpose: To evaluate the reproducibility of Forling, a portable instrument for measuring axial tongue force. Methods: Axial force of the tongue was measured in 49 individuals (30 women and 19 men) aged 18-25 years using the Forling portable instrument. Measurements were performed on three days at intervals of 7±2 days. On each day, three 7-second measurements were performed with one-minute intervals between them. The coefficient of variation, Wilcoxon paired test, and intraclass correlation coefficient were used in the statistical analysis of the data. Maximum and mean tongue force values were analyzed, and comparison between them was performed using three approaches: the mean of the three values; the mean of the two highest values; and the highest value of each measurement. Results: In the analysis of mean tongue force, the coefficient of variation was considered desirable and the intraclass correlation coefficient was acceptable. Significant differences were observed for the maximum value between the second and third days, and for the mean of the two highest values and the mean of the three values between the first and second days and between the second and third days. In the analysis of maximum tongue force, the coefficient of variation and the intraclass correlation coefficient were acceptable. A significant difference was found only in the comparison between the second and third days. Conclusion: Good reproducibility of the data obtained with the use of the Forling portable instrument was observed.


INTRODUCTION
It is essential that force and mobility of the tongue be adequate so that its functions of chewing, swallowing, suction, breathing, and phonoarticulation can be performed harmoniously (1).
Axial force is defined as the force along the axis on which it is exerted. It is thus characterized as a longitudinal force that, in the case of the tongue, refers to protrusion force (2). The protrusion force of the tongue against resistance requires activation of the genioglossus muscle and the intrinsic lingual muscles, and the genioglossus muscle is responsible for providing a stable platform for the intrinsic muscles to develop this force (3). Therefore, assessment of the axial force of the tongue enables verification of a muscle group of extreme relevance in the clinical practice of speech-language pathology: the intrinsic musculature.
In clinical practice, assessment of tongue force is performed perceptually by speech-language therapists, considering that there are few instruments available for this purpose. The Iowa Oral Performance Instrument (IOPI), which provides numerical data on tongue pressure against an air bulb, is the most cited tool in the literature (4). Another instrument, based on the same principle, uses an arrangement of bulbs positioned on the hard palate (5). There are also tools composed of sensors positioned on palatal plates (6,7) or placed directly on the palate (8) or the teeth (9). The bulbs are disposable and soft, thus hygienic and comfortable, but when they are not fitted to the palate, it is difficult to maintain the reproducibility of their positioning in the oral cavity, because they slip on the surface of the tongue and the tube connecting the bulb to the instrument has no markings indicating its position after the lips are closed (10). Tools that use palatal plates need to be constructed individually, because the size and shape of the palate differ from person to person (10). In addition, they only enable assessment of force in the cranial direction. However, they are effective in measuring tongue force during its functions, because they enable patients to close their mouths and perform functions almost normally (10). Fitting the sensors directly into the oral cavity is the best way to measure force during functions, but it presents two difficulties: reproducing the sensor fixation point so that reassessments can be compared, and disinfecting the sensors for use in different patients (10).
Perceptual evaluation is subjective and depends on the examiner's good judgment and experience, whereas instrumental assessment is performed using instruments that provide numerical data on tongue force or pressure, but most tools are not commercially available in Brazil. For this reason, the Biomechanical Engineering Group of the Universidade Federal de Minas Gerais (UFMG) developed an instrument to measure the axial force of the tongue that has already been used in several studies (1,2,11-17). The portable version of the Forling instrument was designed later (18), and is currently undergoing testing.
Analysis of the reproducibility of an instrument should always be considered, and precedes its use in research and clinical practice, because data obtained with methods without assured reproducibility may lead to results that are not representative of the phenomenon and, therefore, not generalizable, threatening the effectiveness and soundness of clinical decision-making (5). Moreover, knowledge about the accuracy or variability of the data provided by each method enables their consideration during selection of the instrument to be used and interpretation of results (19).
Reproducibility is conceptualized as the degree of agreement between the results of measurements of the same measurand performed under varied conditions. It can be expressed quantitatively depending on the characteristics of the dispersion of results. For an expression of reproducibility to be valid, it is necessary to specify the altered conditions (20), i.e., different locations, days, operators, and measurement systems.
In view of the above, the objective of this study was to assess the reproducibility of the Forling portable instrument for measuring the axial force of the tongue. The hypothesis of this research is that the measurements obtained using the Forling instrument present good reproducibility.

METHODS
This study was approved by the Research Ethics Committee of the Universidade Federal de Minas Gerais (UFMG) under process no. 008/10. This is a prospective, observational, analytical study conducted with a non-probabilistic sample composed of 49 undergraduate students (30 women and 19 men), aged 18-25 years, enrolled in the Medicine and Speech-Language Pathology courses of the UFMG. All participants signed an Informed Consent Form prior to study commencement.
For recruitment of the sample, an active search was conducted by means of posters and direct invitations at the academic units of the UFMG. Inclusion criteria were as follows: minimum age of 18 years; absence of glossectomy and/or pelvectomy, tongue paralysis or paresis, and cognitive impairment, identified from an interview conducted prior to data collection. Individuals whose clinical classification of tongue force diverged between the two examiners and those classified with increased or severely reduced tongue force were excluded from the study.
Initially, a clinical assessment of the tongue of each participant was performed independently by two previously trained examiners. To evaluate tongue force, participants were requested to protrude the tongue and push it against a wooden spatula positioned vertically a few centimeters away from the lips for seven seconds. Tongue force of the participants was classified as follows: normal, when the musculature was able to protrude the tongue against the firm resistance made by the spatula and maintain the force without trembling and/or deformation; slightly reduced, when the musculature was able to protrude the tongue against the firm resistance made by the spatula and maintain the force, but with slight trembling and upward or downward bending of the apex of the tongue; moderately reduced, when the musculature was able to protrude the tongue against the firm resistance made by the spatula and exert moderate force with trembling and upward or downward bending of the apex of the tongue; severely reduced, when the musculature was weak and able to withstand the firm resistance made by the spatula only slightly, with shaking and deformation, whether or not the tongue could be protruded out of the oral cavity; increased, when the musculature was able to protrude the tongue and exert excessive force against the firm resistance made by the spatula (18). Only the participants who had their tongue force classified as normal or slightly reduced by the two examiners underwent the instrumental evaluation.
Participants who fit the study profile were assessed using a portable version of the Forling instrument (Figure 1). The instrument is composed of a mouthguard made of thermo-moldable material that adapts to the dental arches of each participant; three pieces made of epoxy, namely, a base part, a fastener, and a drive shaft; and a force-sensing resistor. The base part fits into the center of the mouthguard and provides support for the sensor. The fastener attaches the sensor to the base part. The drive shaft is in contact with the tongue and is where the tongue exerts the force. During measurements, the participant pushes the drive shaft with the tongue with maximum force, pressing the sensor (18). Figure 2 shows the mouthguard with the parts connected and Figure 3 presents the scheme of these fitting parts.
Participants were seated with their backs well supported and their feet flat on the ground. They were instructed about the operation of the instrument and requested to insert the mouthguard into their oral cavity comfortably. The examiner then waited 15 seconds for participants to get used to the mouthguard and instructed them to protrude the tongue with the greatest possible force upon request, triggering the device that measures the tongue force for seven seconds (1,21); participants had no access to the test results. All instrumental assessments were conducted by the same previously trained examiner.
Clinical evaluation and the first instrumental assessment were performed on the same day (Day 1), the second instrumental assessment was conducted 7±2 days later (Day 2), and the third instrumental assessment was performed 7±2 days after the second test (Day 3). On each day, three measurements were taken using the Forling instrument with one-minute intervals between them (1,13,21-24). Software developed in the LabVIEW platform showed the force values in newtons (N) in real time and recorded the force graphs over time. The same examiner that instructed the participants conducted all the instrumental assessments. Values of maximum and mean force of the tongue were obtained in each measurement. Maximum tongue force refers to the maximum value obtained in the measurement, whereas mean tongue force refers to the arithmetic mean of all the values generated in the measurement.
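The two per-measurement quantities defined above can be illustrated with a minimal Python sketch. The function name `summarize_trace` and the sample values are hypothetical, introduced only for illustration; the actual acquisition software and its sampling rate are not described here.

```python
from statistics import fmean


def summarize_trace(force_newtons):
    """Return (maximum, mean) force in newtons for one recorded measurement.

    `force_newtons` is the sequence of force samples recorded during the
    7-second trial (hypothetical input format).
    """
    if not force_newtons:
        raise ValueError("empty force trace")
    # Maximum tongue force: the largest sample in the trial.
    # Mean tongue force: arithmetic mean of all samples in the trial.
    return max(force_newtons), fmean(force_newtons)


# Illustrative samples only, not data from the study.
trace = [0.0, 4.0, 10.0, 12.0, 10.0, 6.0]
max_force, mean_force = summarize_trace(trace)
```

Because the mean is taken over every sample, it is sensitive to how quickly force rises and how well it is sustained, whereas the maximum depends on a single peak.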
Random and systematic errors were evaluated by means of differences in the mean value between the days, which may reflect sampling errors and the effect of training. Intra-individual variation was determined by typical error expressed by the coefficient of variation (CV). The CV was assessed for each step, from Day 1 to Day 2 and Day 2 to Day 3, and it was considered acceptable for values <10% and desirable for values <5%. The CV represents sources of technical and biological errors. Also, the Wilcoxon paired test was applied for each pair. The repeatability of the Forling instrument was investigated by intraclass correlation coefficients (ICC), which were considered acceptable for values >0.600 and desirable for values >0.800.
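As a rough illustration of these indices, the following Python sketch computes a typical error expressed as a CV and an ICC for one pair of days, then grades them with the thresholds stated above. The exact formulas used in the study are not given in the text, so two assumptions are made and should be read as such: typical error is taken as the standard deviation of the paired differences divided by √2 and expressed as a percentage of the grand mean, and the ICC is the two-way, consistency, single-measure form often written ICC(3,1). All function names and data are hypothetical. The paired Wilcoxon test is not reproduced here.

```python
from math import sqrt
from statistics import fmean, stdev


def typical_error_cv(day_a, day_b):
    """Typical error between two days as a percentage of the grand mean.

    Assumption: typical error = SD of paired differences / sqrt(2).
    """
    diffs = [b - a for a, b in zip(day_a, day_b)]
    typical_error = stdev(diffs) / sqrt(2)
    grand_mean = fmean(list(day_a) + list(day_b))
    return 100.0 * typical_error / grand_mean


def icc_consistency(day_a, day_b):
    """ICC(3,1) from two-way ANOVA mean squares (assumed ICC form)."""
    n, k = len(day_a), 2
    rows = list(zip(day_a, day_b))
    grand = fmean(list(day_a) + list(day_b))
    row_means = [fmean(r) for r in rows]
    col_means = [fmean(day_a), fmean(day_b)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between days
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


def grade_cv(cv_percent):
    """Thresholds from the text: <5% desirable, <10% acceptable."""
    if cv_percent < 5:
        return "desirable"
    return "acceptable" if cv_percent < 10 else "not acceptable"


def grade_icc(icc):
    """Thresholds from the text: >0.800 desirable, >0.600 acceptable."""
    if icc > 0.8:
        return "desirable"
    return "acceptable" if icc > 0.6 else "not acceptable"


# Illustrative values only (N), one entry per participant.
day1 = [10.0, 12.0, 14.0, 16.0]
day2 = [11.0, 12.0, 15.0, 16.0]
cv = typical_error_cv(day1, day2)
icc = icc_consistency(day1, day2)
```

The consistency form of the ICC ignores a uniform shift between days, which is why a systematic difference between days (for example, a training effect) can coexist with a high ICC and is examined separately through the comparison of means.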
Two statistical analyses were performed: the first considered only the values of the mean tongue force of each measurement and the second considered the values of the maximum tongue force of each measurement. For each of these analyses, comparison between the values (maximum or mean tongue force) was performed using three approaches, as in a previous study that analyzed the Iowa Oral Performance Instrument (IOPI) (19): (1) the mean of the two highest values of instrumental assessment; (2) the highest value of each instrumental assessment; (3) the mean of the three values of each instrumental assessment.
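The three summarization approaches can be sketched as follows. This is a minimal Python illustration with a hypothetical function name and made-up values; it applies equally to the per-measurement mean force or maximum force.

```python
from statistics import fmean


def summarize_day(values):
    """Collapse the three same-day measurements into the three summary values."""
    if len(values) != 3:
        raise ValueError("expected exactly three measurements per day")
    ordered = sorted(values, reverse=True)
    return {
        "mean_two_highest": fmean(ordered[:2]),  # approach (1)
        "highest": ordered[0],                   # approach (2)
        "mean_of_three": fmean(ordered),         # approach (3)
    }


# Illustrative values only (N), not data from the study.
day1_summary = summarize_day([10.0, 14.0, 12.0])
```

Each approach yields one value per participant per day, and it is these daily values that are compared between Day 1, Day 2, and Day 3.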

RESULTS
Instrumental assessment: analysis of mean tongue force of participants in each measurement
Initially, the data obtained from the participants in each measurement were analyzed according to their mean value (Table 1). Three measures, one from each measurement, were obtained on each day. These measures were summarized in a single value in three ways: maximum value, mean of the two highest values, and mean of the three values.
In the descriptive analysis, considering the maximum value between the measurements on each day or the mean of the three values, progressive increase in tongue force was observed on the measurement days. When the measurement values were summarized according to the mean of the two highest values found between them, it was verified that the highest value of the measurement occurred on Day 3.
The Wilcoxon test did not show a statistically significant difference between the maximum values measured for Days 1 and 2 (Table 2). The typical error was <5% and the intraclass correlation coefficients (ICC) were acceptable for both pairs of days.
When the mean of the two highest values of the measurement on each day was considered, it was verified that the largest mean difference between the measurements occurred from Day 2 to Day 3. The Wilcoxon test showed statistically significant difference between the values obtained. For all pairs of the measurement days, typical error values were <5%, which is considered desirable. The ICC results for all pairs of days were considered acceptable.
The value for each day was also summarized according to the mean value of the measurements performed. In the reproducibility analysis, statistically significant difference was observed by the Wilcoxon test between the values obtained. The typical error was considered desirable and the ICC results were acceptable for both pairs of days.

Instrumental assessment: analysis of maximum tongue force of participants in each measurement
Next, the data obtained from the participants in each measurement were analyzed according to the maximum value found. Once again, three measures, one from each measurement, were obtained on each day. These measures were summarized in a single value in three ways, as described above, and reproducibility analysis of the test instrument was conducted.
In the descriptive analysis (Table 3), with the data of maximum value of each day, as well as with the data summarized by the mean value of the measurements on each day, an increase in the values was observed throughout the days. When the values were summarized by the mean of the two highest measures obtained, it was verified that the highest mean value occurred on Day 3.
In the analysis of the maximum value on each day, the p-value of the Wilcoxon test was low, close to zero, only between Days 2 and 3, indicating difference between the values for these two days. The typical error for Days 1 and 2 was acceptable, and so were the ICC results (Table 4).
Caption: CI = confidence interval; CV = coefficient of variation; ICC = intraclass correlation coefficient
In the analysis of the mean of the two highest values of the measurements on each day, the largest difference was verified between Day 2 and Day 3. The typical error was considered desirable between Days 2 and 3 and acceptable between Days 1 and 2. ICC results were acceptable for both comparisons.
Regarding analysis of the mean of the three measurements on each day, no statistically significant difference was observed from Day 1 to Day 2 by the Wilcoxon test. It was also verified that the typical error was desirable and both ICC results were acceptable, thus showing good reproducibility.

DISCUSSION
Reliability of an instrument or test can be determined by the agreement of the results provided by different examiners (interrater reliability) or at different moments (intra-examiner or test-retest) (25). Intra-examiner reproducibility was analyzed in the present study. Time was chosen as the varied condition, and three measurements were taken at different times at intervals of 7±2 days, with no changes in the test conditions, considering that this would be the greatest applicability of an instrument of this nature in the area of Orofacial Motricity. A two-day variation was allowed in the instrumental assessments in order to avoid study withdrawal by the participants because of scheduling difficulties. However, assessment conditions were maintained on the three days, avoiding interference with the results.
All descriptive analyses show an increase in tongue force values throughout the measurement days. As a consequence, differences in the mean values observed were large, especially between Day 2 and Day 3, which was reflected in the p-values (Wilcoxon test) indicating statistically significant differences between the measures. This might have occurred because of the training effect, that is, the individuals evaluated were improving their results due to familiarity with the measurement process.
Analysis of the mean tongue force data shows that, regardless of the type of analysis, all typical error values were considered desirable (<5%), which indicates that distribution of the differences in the measurement values of the pairs of days was homogeneous. In addition, the ICC results remained within acceptable range in all analyses, showing good reliability. Nevertheless, the Wilcoxon test indicated statistically significant differences in the maximum value analysis (in the Day 2-3 pair), in the analysis of the mean of the two highest values (in the pairs Day 1-2 and Day 2-3), and in the analysis of the mean of the three values (in the pairs Day 1-2 and Day 2-3). Therefore, the mean tongue force values presented good reproducibility in two of the three assessment methods employed.
In the analysis of maximum tongue force, all typical error values were considered acceptable, and the Day 2-3 pair showed typical error values considered desirable by the three methods employed. With respect to ICC, the results obtained remained within acceptable range in the three types of analysis, also demonstrating good reliability.
Regarding the Wilcoxon test, unlike what was observed in the analysis of the mean tongue force, statistically significant difference was found only in the Day 2-3 pair, and the Day 1-2 pair indicated no significant difference by the three methods used. Thus, just like the mean tongue force values, the maximum tongue force values showed good reproducibility in two of the three assessment methods conducted.
These results indicate that maximum tongue force seems to be a pattern with smaller variation than mean tongue force, as reported in a previous study (12), and the best way to summarize it was using the mean of the three measures. Results of this study are in agreement with those obtained in other surveys that addressed assessment methods for measuring tongue force and found coefficient of variation values between 0.056 and 0.093 (19) and between 0.014 and 0.070 (26), although they employed a different instrument, the Iowa Oral Performance Instrument (IOPI), as well as different methodologies for data analysis. Both instruments, Forling and IOPI, propose the assessment of tongue force, but the Forling uses a force sensor and returns data in units of force (N), and the IOPI uses a pressure sensor and provides data in units of pressure (kPa); consequently, the measures generated by the instruments cannot be compared. Furthermore, the Forling requires a protrusion movement, whereas the IOPI requires participants to compress the tongue against the hard palate.
All the coefficients of variation found were considered desirable or acceptable, demonstrating homogeneity of the values and confirming the good reproducibility of the data generated by the Forling portable instrument.
However, typical error is the most important analysis for the reliability of the measures, because it indicates the variability of the data from individuals between different assessments (19). Therefore, if this parameter is considered, it can be stated that the instrument is reproducible.
In this research, a tongue force application time of seven seconds was used, considering that individuals with normal tongue force reach maximum force within the first seconds of contraction, but individuals with reduced tongue force need a longer time (27). Some previous surveys have also used the time of 7 seconds (1,21), whereas others have used a shorter time, between two and three seconds (19,22,26,28), but these studies that used shorter contraction times evaluated only maximum tongue force, whereas the present study also assessed mean tongue force. In order not to obtain results with variations caused by muscle fatigue, a rest period was observed between measurements; as in previous studies, this time was one minute (1,13,21-24).
In addition to the good reproducibility of the data, it is worth noting that the software that accompanies the Forling instrument is easy to use, enabling modification of the data recording time and data visualization during measurement, allowing the instrument to be used in different ways and without need of extensive training. The ranges of mean and maximum tongue force values found in this study are consistent with those reported by other authors during assessment of protrusion tasks in healthy adults. A study conducted with adults aged 20-44 years found maximum tongue force of 16.1±4.7 N and mean tongue force of 11.6±3.4 N (16), whereas a survey conducted with men aged 22-39 years obtained maximum tongue force of 24.3±6.7 N (29).
Due to the training effect observed in this study, it is advisable that examiners provide participants with longer familiarization time with the instrument. If participants are assessed in a single session prior to treatment, it should be considered that there will be a gain in tongue force attributable to training, preceding or concomitant with the gain from treatment. Adams et al. (19) suggested that several measurements be performed in the first session so that patients can become familiarized with the instrument.
Differences in axial tongue force are expected throughout life, but significant variations are not observed in groups of young adults, such as the participants of this study, only in individuals over 60 years of age (2). Some studies have also shown differences in tongue force between men and women, and the values found for males were generally higher than those observed for females (22,30). Despite this difference, reproducibility results are not affected, because comparisons for this type of analysis are conducted with the individuals themselves. For these reasons, analyses stratified by age group or gender were not performed in this study.
This study was conducted with a small sample and did not include major orofacial myofunctional impairments. Clinical evaluation showed that most participants had normal tongue force, but 32.7% presented slightly reduced tongue force. A previous study (16) performed clinical evaluation of the tongue in young adults and found a larger number of individuals (62.5%) with some impairment in tongue force in the task of protrusion against resistance; however, most of them presented weakness in the apex or anterior third of the tongue, and only 4.2% had slightly reduced tongue force. As clinical evaluation is subjective, differences between surveys are expected. The authors justified that minimal changes in tongue force were considered during the qualitative classification, thus a large number of impaired individuals were observed (16). The clinical evaluation of tongue force in the present study was conducted as an inclusion criterion, so that it was possible to initially understand the behavior of the instrument, with respect to reproducibility, in individuals without significant tongue force impairments.
It is believed that better results could be obtained if the study sample were larger and there were more days of measurement in order to mitigate the training effect. It is therefore suggested that further longitudinal studies be conducted with larger samples, longer measurement periods, and with interrater reproducibility assessment.
It was also possible to observe that the type of tongue force chosen (maximum or mean) and the different types of analysis led to different results; therefore, it is important to verify and carefully choose the methodology and the statistical analysis model in order to avoid any bias.
Therefore, it is expected that the Forling portable instrument be used in the research and clinical practice of speech-language pathologists and other professionals who work in the interface of the Orofacial Motricity area, considering that quantitative assessment enables observation of patient evolution throughout therapy in a simple and reliable way.

CONCLUSION
In the present study, very good reproducibility of the data obtained using the Forling portable instrument was observed, with most parameters assessed (typical error, mean coefficient of variation, confidence interval, and intraclass correlation coefficient) presenting acceptable and desirable results, enabling the use of this instrument both in clinical practice and in other studies.
Maximum tongue force presented smaller variation than mean tongue force, and the best way to summarize it was by the mean of three measurements. It is recommended that

Figure 1. Individual with a Forling portable instrument fitted in the oral cavity
Figure 2.
Figure 3. Scheme of the fitting parts of the mouthguard

Table 1. Descriptive analysis of the mean force of the tongue

Table 2. Reproducibility analysis of the mean force of the tongue

Table 3. Descriptive analysis of the maximum force of the tongue

Table 4. Reproducibility analysis of the maximum force of the tongue