Severity of temporomandibular disorders in women: validity and reliability of the Fonseca Anamnestic Index

The aim of this study was to assess the validity and reliability of the Fonseca Anamnestic Index (IAF), used to assess the severity of temporomandibular disorders, applied to Brazilian women. We used a probabilistic sampling design. The participants were 700 women over 18 years of age, living in the city of Araraquara (SP). The IAF questionnaire was applied by telephone interviews. We conducted Confirmatory Factor Analysis (CFA) using Chi-Square Over Degrees of Freedom (χ²/df), Comparative Fit Index (CFI), Tucker-Lewis Index (TLI), and Root Mean Square Error of Approximation (RMSEA) as goodness of fit indices. We calculated the convergent validity, the average variance extracted (AVE) and the composite reliability (CR). Internal consistency was assessed by Cronbach’s alpha coefficient (α). The factorial weights of questions 8 and 10 were below the adequate values. Thus, we refined the original model and these questions were excluded. The resulting factorial model showed appropriate goodness of fit to the sample (χ²/df = 3.319, CFI = 0.978, TLI = 0.967, RMSEA = 0.058). The convergent validity (AVE = 0.513, CR = 0.878) and internal consistency (α = 0.745) were adequate. The reduced IAF version showed adequate validity and reliability in a sample of Brazilian women.

Descriptors: Reproducibility of Results; Validity of Tests; Scales; Temporomandibular Joint Disorders.

Declaration of Interests: The authors certify that they have no commercial or associative interest that represents a conflict of interest in connection with the manuscript.

Submitted: May 03, 2013 Accepted for publication: Jul 31, 2013 Last revision: Sep 30, 2013
http://dx.doi.org/10.1590/S1806-83242013005000026


Introduction
Temporomandibular disorder (TMD) is a collective term embracing all the problems regarding the temporomandibular joint and related musculoskeletal structures.1 The literature shows wide-ranging variation in TMD prevalence and TMD symptoms in different populations. This may be attributed to the variety of study designs, sampling methods, measurement instruments, and different diagnostic criteria for TMD. In a systematic literature review, Manfredini et al.2 found a TMD prevalence from 2.6% to 11.4% in the normative population, in different countries. Moreover, Gonçalves et al.3 found a 39.2% prevalence of TMD symptoms in the Brazilian population, in which women were significantly more likely to have TMD than men (RR > 1.0; p < 0.001).

Braz Oral Res., (São Paulo) 2014;28(1):1-6
Although screening for TMD in the population is the concern of researchers, different facets of TMD and different measuring instruments have been used in different studies (e.g., TMD-type diagnosis, symptoms, severity and/or mandibular limitation).
The most common and important instrument in assessing TMD is the Research Diagnostic Criteria for Temporomandibular Disorders (RDC/TMD). RDC/TMD has been used in several clinical and epidemiological studies. It is composed of two evaluation axes. Axis I conducts a clinical evaluation requiring the presence of the patient. Thus, its use in epidemiological studies, where data is gathered by telephone, mail or internet, is difficult or may even be impossible. With this in mind, other instruments have been developed for evaluating TMD. Fonseca's Anamnestic Index (IAF)4 has been proposed as a low-cost and easy-to-apply alternative and has been used in screening for TMD in a non-patient population.
In Brazil, Fonseca's Anamnestic Index (IAF) has frequently been used5-8 to classify individuals according to TMD severity (mild, moderate, severe and no TMD), and also to screen patients for further developments in diagnosing TMD.9 The IAF is a scale proposed in the Portuguese language, consisting of 10 questions whose answers are arranged in a three-point scale format (no, sometimes, yes). Despite its frequent use in Brazil, the IAF has not been applied in other countries. However, the simplicity of its application, and the fact that it does not require a physical examination of the patient, make it suitable for fast epidemiological screening by telephone, mail or internet surveys.
Measurement theory states that the use of psychometric and/or clinimetric scales, addressing latent variables, requires the assessment of the metric qualities of the data gathered with these scales in the study sample, so that the researcher may be confident about the validity and reliability of the data gathered. Despite the widespread use of the IAF, we did not find studies in the literature that addressed its validity to measure the construct of "severity of temporomandibular disorders." Moreover, studies of IAF's metric properties have been limited to the evaluation of its reliability10 and predictive validity.4 Thus, our study was conducted to evaluate the validity (construct and concurrent), and reliability of Fonseca's Anamnestic Index (IAF), applied to Brazilian women.

Methodology

Study and sample design
This is a cross-sectional validation study. The sampling design was probabilistic, and conducted in two stages. In the first stage, we stratified the participants according to their census segment. In the second stage, we randomly and systematically selected each participant from the phonebook.
The participants lived in the city of Araraquara (SP, Brazil), were female, and over 18 years of age.
The sample size was established from an expected prevalence of TMD symptoms of approximately 40% in the female population, as presented by Gonçalves et al.3 We estimated the population of adult women in Araraquara municipality at about 70,000. The significance level was 5%, the test power was 80%, and the sampling error was set at 10%. Assuming a non-response rate of 20%, we estimated a minimum sample size of 715.
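The text does not state which formula was used, so the following is only one possible reconstruction of the calculation. It assumes that the 10% sampling error is relative to the expected prevalence (an absolute error of 0.04), that z = 1.96 corresponds to the 5% significance level, that a finite population correction is applied for about 70,000 women, and that the result is inflated by 1/(1 - 0.20) for non-response. The function name `sample_size` is hypothetical, and the 80% test power does not enter this particular formula.

```python
import math


def sample_size(p, rel_error, population, non_response, z=1.96):
    """Sample size for estimating a proportion (assumed formula, see lead-in)."""
    abs_error = rel_error * p                      # 10% of 0.40 -> 0.04
    n0 = z ** 2 * p * (1 - p) / abs_error ** 2     # infinite-population size
    n_fpc = n0 / (1 + (n0 - 1) / population)       # finite population correction
    return math.ceil(n_fpc / (1 - non_response))   # inflate for non-response


n = sample_size(p=0.40, rel_error=0.10, population=70_000, non_response=0.20)
print(n)
```

Under these assumptions the sketch returns 715, which agrees with the minimum sample size reported above.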

Instrument
The instrument used was the questionnaire proposed in Brazilian Portuguese by Fonseca et al.4 to estimate Fonseca's Anamnestic Index (IAF). The IAF is a one-dimension questionnaire, consisting of 10 questions with a three-point scale (0 = no, 5 = sometimes and 10 = yes). The IAF has been widely used in Brazilian studies to estimate the severity of temporomandibular disorders.
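As a minimal sketch of how the index can be computed from the three-point answers described above, the snippet below sums the item points for the 10 questions. The severity cutoffs (0-15 no TMD, 20-40 mild, 45-65 moderate, 70-100 severe) are an assumption: they are the values commonly cited for the original index and are not stated in this text. The function names are hypothetical.

```python
# Points per answer, as described for the IAF: 0 = no, 5 = sometimes, 10 = yes.
ANSWER_POINTS = {"no": 0, "sometimes": 5, "yes": 10}


def iaf_score(answers):
    """Sum the points of the 10 answers of the original questionnaire."""
    if len(answers) != 10:
        raise ValueError("the original IAF has exactly 10 questions")
    return sum(ANSWER_POINTS[a] for a in answers)


def iaf_severity(score):
    """Map a total score to a severity class (cutoffs are assumed, see lead-in)."""
    if score <= 15:
        return "no TMD"
    if score <= 40:
        return "mild"
    if score <= 65:
        return "moderate"
    return "severe"


answers = ["no"] * 6 + ["sometimes"] * 2 + ["yes"] * 2
score = iaf_score(answers)          # 2 * 5 + 2 * 10 = 30
print(score, iaf_severity(score))
```

Note that these cutoffs apply to the original 10-item version; the text does not define cutoffs for the reduced version proposed in this study.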

Procedures
Supported by evidence from the literature and considering the simplicity of the instrument's application, we chose to conduct the data collection through telephone interviews, not to exceed 5 minutes.
When the respondents answered the call, the researcher identified herself and read the free and informed consent statement for the research to be undertaken. Only those subjects who agreed to its terms participated in the study. The calls were always made by the same interviewer, calibrated in a pilot study. The intra-reproducibility of the researcher doing the calls, regarding the classification of TMD (mild, moderate, severe and no TMD), was estimated at two distinct time periods, one week apart. The reproducibility was considered good (n = 62; κ = 0.89).

Analysis of the psychometric characteristics
The IAF construct validity was estimated by the factorial and convergent validity. For the purpose of estimating the factorial validity, we used Confirmatory Factor Analysis (CFA), applied to the polychoric correlation matrix, in Mplus 6.0 (Muthén & Muthén, Los Angeles, USA). In general, CFA is an appropriate method for evaluating whether the items are good indicators of the factors proposed a priori for the scale, in a given sample. If the original model fits the variance-covariance matrix of the items suitably, it is indicative of factorial validity. Goodness of fit for individual items was evaluated through their factorial weights (λ). Standardized lambda values greater than 0.5 were considered indicative of good fit of the items to the scale. The global goodness of fit was evaluated using the standard goodness of fit indices for CFA:
• Chi-Square over Degrees of Freedom (χ²/df),
• Comparative Fit Index (CFI),
• Tucker-Lewis Index (TLI), and
• Root Mean Square Error of Approximation (RMSEA).
The fit of the model to the variance and covariance data of the items was considered adequate when χ²/df ≤ 4.0, CFI ≥ 0.90, TLI ≥ 0.90 and RMSEA ≤ 0.10. We refined the model by also taking into consideration the modification indices. Accordingly, the items that presented Lagrange Multipliers (LM) > 11 (p < 0.001), suggesting correlation between errors of measurement, were excluded.
The convergent validity was evaluated using the Average Variance Extracted (AVE) and the Composite Reliability (CR), as proposed by Fornell and Larcker.11 These were considered adequate if AVE ≥ 0.50 and CR ≥ 0.70.
Internal consistency was assessed by Cronbach's alpha coefficient (α) and considered adequate if α ≥ 0.70.12 Since α underestimates the true reliability for congeneric items, as is the case of the IAF, Composite Reliability (CR) was also evaluated to substantiate the reliability of the IAF in the present sample.
Concurrent validity was assessed by correlation analysis with the mandibular function impairment questionnaire (MFIQ), duly validated for the population under study, according to Campos et al.13

Ethical aspects
The present study was approved by the Ethics Committee of the School of Pharmaceutical Sciences - UNESP (protocol 52/2009).

Results
The response rate (RR) obtained was 97.9%. A total of 700 women, with a mean age of 44.3 (SD = 16.3) years, participated in this study. Of the participants, only 39.0% used dental prosthetics and 55.4% reported taking some type of chronic medication. Regarding marital status, 22.9% were single, 58.1% were married, 11.2% were widowed, and 7.9% were divorced. As for average monthly household income, 1.1% of participants reported having an income of USD 5676.00, 36.9% reported USD 1857.00, 55.3% reported USD 682.50, and 6.7% reported USD 317.00.
The distribution of the participants, according to the answers to each question of Fonseca's Anamnestic Index (IAF), is shown in Table 1.
There was a high prevalence of "No" answers to the IAF questions. This is because the sample was composed of a normative population.
Figure 1 shows the IAF's factorial model after its refinement, with the standardized factorial weights and the explained variance for each question.
In the complete factorial design, questions 8 and 10 had factorial weights below the recommended values (λ < 0.50), and question 4 proved to be strongly correlated with other items, according to the modification indices (LM > 11). After these items were removed, there was a better fit of the factorial structure to the data.
The adjusted model showed good convergent validity (AVE = 0.513; CR = 0.878) and internal consistency (α = 0.745).
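The AVE and CR criteria attributed to Fornell and Larcker can be sketched from standardized factorial weights as follows. The formulas assume standardized loadings with error variance 1 - λ² per item. The loadings in the example are hypothetical, chosen only for illustration, and are not the values shown in Figure 1. The function names are also hypothetical.

```python
def average_variance_extracted(loadings):
    """AVE: mean of the squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)


def composite_reliability(loadings):
    """CR: (sum of loadings)^2 over itself plus the summed error variances."""
    sum_l = sum(loadings)
    error_var = sum(1 - l ** 2 for l in loadings)
    return sum_l ** 2 / (sum_l ** 2 + error_var)


# Hypothetical loadings for a 7-item scale (NOT the study's estimates).
example_loadings = [0.70] * 7

ave = average_variance_extracted(example_loadings)
cr = composite_reliability(example_loadings)
print(f"AVE = {ave:.3f} (adequate: {ave >= 0.50})")
print(f"CR  = {cr:.3f} (adequate: {cr >= 0.70})")
```

With these hypothetical loadings, AVE is about 0.49, just below the 0.50 cutoff, while CR is about 0.87, above the 0.70 cutoff, which illustrates that the two criteria need not agree.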

Discussion
Knowledge of the metric characteristics of the data collected with a given instrument, when applied to different samples and different formats (telephone, mail, internet and personal interview), is an essential prerequisite for using this data, given that this procedure is the only way to check the quality of the data collected. However, this methodology has been widely neglected in the literature. This situation can be explained through the training given to health professionals,14-16 who have technical difficulties in determining the validity and reliability of statistical analyses, or who lack knowledge about the importance of performing this procedure when using scales. It is important to highlight that the metric properties of the data, collected with a given instrument, depend on the characteristics of the sample;13 therefore, those properties should always be tested and presented along with the study results.
RDC/TMD is one of the most widely used instruments in TMD diagnosis. However, one of its dimensions (Axis I) requires the presence of the patient, thus limiting the use of RDC/TMD for some studies. Conversely, the IAF is a questionnaire composed of 10 items that may be answered by the patient by telephone, mail, or internet surveys. For large initial epidemiological studies, this feature is a great advantage, because it makes the IAF quick to apply and cost-effective on a large scale.
Fonseca's Anamnestic Index (IAF) is a scale proposed to measure the "severity of temporomandibular disorders" construct. Therefore, it was built using a three-point ordinal scale, which calls for a polychoric correlation matrix to estimate the model's fit to the sample. However, we should stress that, for this analytical model to have adequate statistical power, a large sample size must be used;17 on the other hand, this can be a limiting factor for testing the metric characteristics when using IAF in clinical samples. Nevertheless, this limitation does not invalidate the professional's responsibility of providing data on the reliability and validity of the IAF, when using the index in different contexts.
The original IAF structure, when applied to normative women, was not adequate; the fit of the model to the sample had to be improved (Figure 1); this implied the removal of three items. The inadequacy of these questions had already been reported by Bevilaqua-Grossi et al.5 and by Campos et al.,10 who reported the low internal consistency and low predictive ability of these items. The contribution of these items to the "severity of temporomandibular disorders" construct had not yet been assessed. Thus, it was difficult to establish a direct comparison between the results gathered in this study and other studies that have used the IAF. However, it is important to note, along with the results shown in Figure 1, that the different metric characteristics observed in the studies mentioned above point to the inadequacy of questions 4, 8 and 10, and reinforce the need to remove them in order to obtain more reliable information and valid diagnosis of TMD severity. The lack of fit of these 3 questions to the IAF may be explained by the fact that these questions do not evaluate the structural-anatomic alterations related to the temporomandibular joint function, as the other questions do.
After the number of items was reduced, adequate factorial and convergent validity and internal consistency of the IAF were observed. The reduced scale was, therefore, able to capture the "severity of temporomandibular disorders" construct with higher psychometric sensitivity, validity and reliability.
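As an illustrative check, the adequacy thresholds stated in the methodology can be applied to the values reported in the abstract for the reduced model. The sketch also recomputes RMSEA from χ²/df and the sample size using the common point-estimate formula sqrt((χ²/df - 1)/(N - 1)); this formula is an assumption here, since software using robust estimators for ordinal data may compute the index somewhat differently. All names in the code are hypothetical.

```python
import math
import operator

# Values reported in the abstract for the reduced IAF model.
REPORTED = {"chi2_df": 3.319, "CFI": 0.978, "TLI": 0.967, "RMSEA": 0.058,
            "AVE": 0.513, "CR": 0.878, "alpha": 0.745}

# Adequacy thresholds stated in the methodology.
CRITERIA = {"chi2_df": (operator.le, 4.0), "CFI": (operator.ge, 0.90),
            "TLI": (operator.ge, 0.90), "RMSEA": (operator.le, 0.10),
            "AVE": (operator.ge, 0.50), "CR": (operator.ge, 0.70),
            "alpha": (operator.ge, 0.70)}


def check_criteria(reported, criteria):
    """Return, per index, whether the reported value meets its threshold."""
    return {name: compare(reported[name], cutoff)
            for name, (compare, cutoff) in criteria.items()}


def rmsea_from_ratio(chi2_df, n):
    """RMSEA point estimate from chi-square/df and sample size (assumed formula)."""
    return math.sqrt(max(chi2_df - 1.0, 0.0) / (n - 1))


results = check_criteria(REPORTED, CRITERIA)
rmsea_check = rmsea_from_ratio(REPORTED["chi2_df"], n=700)
print(results)
print(f"RMSEA recomputed from chi2/df: {rmsea_check:.3f}")
```

Under this formula the recomputed RMSEA rounds to 0.058, consistent with the reported value, and every reported index meets its stated threshold.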
It must be pointed out that the simpler IAF version, proposed in this study, requires further external validation.Since the items are ordinal with 3 points, requiring that an analysis of polychoric correlations be made, future studies will require large sample sizes.The lack of external validation of the proposed refined IAF is a limitation, insofar as only females participated in this study.However, data gathered here is a first contribution to the study of the psychometric properties of data gathered with the IAF, in terms of both construct-related validity and reliability for the probabilistic sample used.

Conclusion
A reduced version of Fonseca's Anamnestic Index (IAF) showed adequate validity and reliability for the sample of women studied.

Table 1 - Distribution of participants according to the responses to each question of Fonseca's Anamnestic Index (IAF).