On-line version ISSN 2176-9451
Dental Press J. Orthod. vol.17 no.1 Maringá Jan./Feb. 2012
Maria Christina de Souza Galvão (I); João Ricardo Sato (II); Edvaldo Capobiango Coelho (III)
(I) MSc in Dentistry, Methodist University of São Paulo (UMESP). Student, Specialization Course in Applied Statistics, UMESP
(II) Professor, Lato-Sensu Specialization Course in Applied Statistics, UMESP
(III) MSc in Statistics, University of São Paulo. Head of the Lato-Sensu Specialization Course in Applied Statistics, UMESP
INTRODUCTION: Accurate evaluation of the error of measurement (EM) is extremely important in both growth studies and clinical research, since the changes involved are usually quantitatively small. In any study it is important to evaluate the EM to validate the results and, consequently, the conclusions. Because of its extreme simplicity, the Dahlberg formula is widely used worldwide, mainly in cephalometric studies.
OBJECTIVES: (I) To elucidate the formula proposed by Dahlberg in 1940, evaluating it by comparison with linear regression analysis; (II) To propose a simple methodology to analyze the results, which provides statistical elements to assist researchers in obtaining a consistent evaluation of the EM.
METHODS: We applied linear regression analysis, hypothesis tests on its parameters and a formula involving the standard deviation of error of measurement and the measured values.
RESULTS AND CONCLUSION: We introduced an error coefficient, which is a proportion related to the scale of the observed values. This provides new parameters to facilitate the evaluation of the impact of random errors on the final results of the research.
Keywords: Biostatistics. Dahlberg error. Method error. Linear regression analysis.
In biological research, it is often not possible to take quantitative measurements directly from living beings. Therefore, indirect methods are used, and it is necessary to evaluate their effectiveness in comparison with other methods. It is not possible to state which one is more accurate, but it is feasible to compare their levels of agreement. The standard method is usually called the "Gold Standard"; however, this does not mean that it is free of error.1 Random sampling is one of the most important approaches to reduce bias. In addition, replicated measurements can be a good method to quantify and control random errors. The results of a trial might not be reliable if the error of measurement was not satisfactorily controlled.6
In dentistry, in order to interpret the results of a study, the author has to consider how imprecise the tracing of landmarks is. In both growth studies and clinical trials the changes are subtle, which makes the error of the method quite important.7 In order to evaluate the variance of the error between studies, several authors4,6,7,8,10 have suggested the formula proposed by Dahlberg6 in 1940. This method assumes that the sample has a normal distribution and, above all, that there is no bias (systematic error).6
Furthermore, the connections between error of measurement (EM) and misinterpretation of the results were not mentioned. In any research project, it is important to reduce the EM as much as possible, especially when the changes in the measures are small compared with the original scale of the data. This care allows the results, and consequently the conclusions, to be validated.7
There is almost no reference criterion for deciding whether a given amount of error is acceptable. In several papers the interpretation of the result is empirical or based on the personal experience of the investigator. According to Midtgard, Bjork, Linder-Aronson9 and Battagel,3 the error variance should ideally be less than 3% of the total variance. However, Midtgard, Bjork and Linder-Aronson9 stated that it is almost impossible to have an error variance of less than 10% of the total variance. Baumrind and Frantz2 reported that differences between measures derived from a patient should be at least twice the standard deviation of the error of measurement. Only then can they be considered treatment results.
Although several studies reported small changes during treatment, it is important to evaluate these changes properly. EM can be reduced but not totally eliminated. If the therapeutic changes are small, EM can significantly influence the inference about the evaluated differences. Therefore, there is a need to develop methodologies to analyze and interpret the effects of the EM on the changes observed during treatment. Currently, there is no agreement about this subject in the literature.7
Regression analysis allows the assessment of systematic and random errors. It also permits a very intuitive visual evaluation of the results through a scatter plot and a best-fit line through these points. Further information about regression analysis can be found in Wackerly, Mendenhall III and Scheaffer.11
Because of its simplicity, the Dahlberg formula is frequently applied in dental research, despite the existence of other methods and approaches to analyze the error, such as the one proposed by Martelli Filho et al.8 The aims of this paper were to interpret the meaning of the formula proposed by Dahlberg6 in 1940, to compare it with linear regression analysis, and to propose a simple method to analyze the results of this formula.
MATERIAL AND METHODS
For the EM study, data sets from master's theses in dentistry were kindly provided by their authors.
1) From an initial sample of 20 orthodontic dental casts, the lingual shape of the arch was evaluated using X and Y coordinates, resulting in 40 coordinates. Ten were re-measured to evaluate the EM.12
2) This study was based on 17 adult patients under orthodontic treatment, who had magnetic resonance imaging taken on three different occasions. The tipping of the anterior teeth (canine to canine) was measured using the author's own method.4 From this sample, 4 patients had these inclinations re-measured by the same researcher, generating a set of 12 samples with duplicate measures for 5 different teeth.
Methods for error analysis
The Dahlberg6 formula is defined as:
$$D = \sqrt{\frac{\sum_{i=1}^{N} d_i^2}{2N}}$$
where $d_i$ is the difference between the first and the second measure of the i-th sample, and $N$ is the number of samples that were re-measured.
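As a minimal sketch of how this quantity can be computed, the Python function below takes two equal-length sequences of duplicate measurements and returns the square root of the sum of squared differences divided by 2N. The function name `dahlberg_error` and the sample values are illustrative assumptions, not data from the study.

```python
import math

def dahlberg_error(first, second):
    """Dahlberg error: sqrt(sum(d_i^2) / (2N)) for N duplicate measurements."""
    if len(first) != len(second) or len(first) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    # d_i is the difference between the two measures of sample i.
    squared_diffs = [(a - b) ** 2 for a, b in zip(first, second)]
    return math.sqrt(sum(squared_diffs) / (2 * len(first)))

# Hypothetical duplicate measurements (illustrative values only).
first_measure = [10.0, 12.0, 9.0, 11.0]
second_measure = [10.5, 11.5, 9.5, 10.5]
print(dahlberg_error(first_measure, second_measure))
```

The result is expressed in the same unit as the measurements themselves, so on its own it says nothing about how large the error is relative to the measured values.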
Assume the model $Z_{ij} = \mu_i + \varepsilon_{ij}$, where $i$ is the sample index ($i = 1, 2, 3, \ldots, N$), $j$ is the measure index (1st measure, 2nd measure), $Z_{ij}$ is the observed measure, $\mu_i$ is the actual measure and $\varepsilon_{ij}$ is the EM.
Regarding the EM, it is assumed that the expectation is $E(\varepsilon_{ij}) = 0$ and the variance is $VAR(\varepsilon_{ij}) = \delta_\varepsilon^2$. Thus, one natural quantification of the EM is the standard deviation of $\varepsilon_{ij}$, that is, $\delta_\varepsilon$. In other words, the smaller the standard deviation, the smaller the error of method.
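Observing the difference between the second and the first measure, we have:
$$d_i = Z_{i2} - Z_{i1} = (\mu_i + \varepsilon_{i2}) - (\mu_i + \varepsilon_{i1}) = \varepsilon_{i2} - \varepsilon_{i1}$$
so that, if the two errors are independent, $E(d_i) = 0$ and $VAR(d_i) = 2\delta_\varepsilon^2$.
In this way, if we assume that there is no bias (systematic error), one intuitive estimator for $2\delta_\varepsilon^2$ could be:
$$\frac{1}{N}\sum_{i=1}^{N} d_i^2$$
Thus, the quantity
$$\hat{\delta}_\varepsilon = \sqrt{\frac{\sum_{i=1}^{N} d_i^2}{2N}}$$
is exactly the formula Dahlberg proposed in 1940. This estimator of the standard deviation of the EM is widely used in orthodontic research and gives the root mean squared error (RMSE), which is equivalent to the standard deviation of this error when there is no bias (systematic error),6 i.e., when the mean of the errors is equal to zero.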
Note that the Dahlberg error is extremely sensitive to bias, since any mean deviation between the two measures will be incorporated into it. Moreover, the Dahlberg formula presumes equality not only between the means of the first and second measures but also between their variances. This second kind of systematic error, named "bias of slope", is described in more detail in the next section. In summary, the Dahlberg error does not distinguish between systematic and random errors, which makes the results difficult to interpret.
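The sensitivity to mean bias can be illustrated with a short numerical sketch. It relies on the algebraic identity that the mean of the squared differences equals the squared mean difference plus the (population) variance of the differences, so the squared Dahlberg error splits into a bias part and a random part. The function name `dahlberg_decomposition` and the data are hypothetical, chosen only to show the effect of adding a constant offset of 1.0 to the second measure.

```python
def dahlberg_decomposition(first, second):
    """Split the squared Dahlberg error into a mean-bias part and a random part."""
    n = len(first)
    diffs = [b - a for a, b in zip(first, second)]
    mean_diff = sum(diffs) / n
    # Population variance of the differences (divides by n, not n - 1).
    spread = sum((d - mean_diff) ** 2 for d in diffs) / n
    total = sum(d * d for d in diffs) / (2 * n)  # squared Dahlberg error
    bias_part = mean_diff ** 2 / 2
    random_part = spread / 2
    return total, bias_part, random_part

# Hypothetical data: the biased series adds a constant offset of 1.0.
first = [10.0, 12.0, 9.0, 11.0]
second_unbiased = [10.5, 11.5, 9.5, 10.5]
second_biased = [value + 1.0 for value in second_unbiased]

print(dahlberg_decomposition(first, second_unbiased))
print(dahlberg_decomposition(first, second_biased))
```

In both calls the random part is the same, while the total grows with the offset, which is why the Dahlberg value alone cannot tell the two kinds of error apart.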
Several biological phenomena can be explained by mathematical models. Regression analysis describes the relation between two variables in a data set as a straight line. It is necessary to find the line that best fits this relation. The equation of the line is:
$$Y = \beta_0 + \beta_1 X$$
where $\beta_0$ is the intercept (the value of $Y$ where the line crosses the Y axis, i.e., at $X = 0$) and $\beta_1$ is the slope coefficient of the line.
The linear regression model also includes a random error, and thus we have:
$$Y_i = \beta_0 + \beta_1 X_i + n_i$$
where $n_i$ has mean zero and variance $\delta_n^2$. Details about the estimation process and the hypothesis tests for the parameters $\beta_0$ and $\beta_1$ can be found in Wackerly, Mendenhall III and Scheaffer.11
For dentistry, considering $Y_i$ and $X_i$ as the values for the i-th sample at the first and second measures (i.e., $Y_i = Z_{i1}$ and $X_i = Z_{i2}$), the case without any bias or EM occurs if $\beta_0 = 0$, $\beta_1 = 1$ and $\delta_n^2 = 0$. One illustrative example of this case is in Figure 1A, in which a straight line fits the points perfectly.
In Figure 1B, the points do not fit the straight line perfectly: they are scattered along the line, indicating the presence of EM. In this case, the best fit under the least-squares criterion minimizes the sum of squared residuals, where a residual is the vertical distance between an observed point and the fitted line.11 Note also that the fitted line does not pass through the origin, i.e., $\beta_0$ differs from zero, showing a bias in the mean of the error. It can also be noticed in Figure 1B that the inclination of the line is not 45º, which means that there is a slope bias, i.e., $\beta_1$ differs from 1. These two types of systematic error may or may not occur at the same time; this figure is for illustrative purposes only.
In practice, due to random fluctuations, the estimates of $\beta_0$ and $\beta_1$ will hardly ever be exactly equal to zero and one, respectively. Thus, it is necessary to perform statistical tests to detect systematic errors. For this purpose, the t-statistics of the estimators5 of $\beta_0$ and $\beta_1$ can be used to test the null hypotheses $\beta_0 = 0$ (no mean bias) and $\beta_1 = 1$ (no slope bias), respectively. These statistics come from regression analysis and can be found in Wackerly, Mendenhall III and Scheaffer.11
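The two tests can be sketched in plain Python using the standard least-squares formulas for simple linear regression. This is an illustrative sketch, not the authors' software: the function name `regression_bias_tests`, the dictionary keys and the data are assumptions. The first measure is regressed on the second, and each t-statistic has N - 2 degrees of freedom.

```python
import math

def regression_bias_tests(second, first):
    """Fit first = b0 + b1 * second and return t-statistics for b0 = 0 and b1 = 1.

    Assumes the second measures are not all equal and the fit is not perfect
    (otherwise a standard error is zero and the statistics are undefined).
    """
    n = len(second)
    if n != len(first) or n < 3:
        raise ValueError("need at least 3 paired measurements")
    mean_x = sum(second) / n
    mean_y = sum(first) / n
    sxx = sum((x - mean_x) ** 2 for x in second)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(second, first))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    # Residual variance with n - 2 degrees of freedom.
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(second, first))
    resid_var = sse / (n - 2)
    se_b1 = math.sqrt(resid_var / sxx)
    se_b0 = math.sqrt(resid_var * (1.0 / n + mean_x ** 2 / sxx))
    return {
        "b0": b0,
        "b1": b1,
        "resid_var": resid_var,
        "df": n - 2,
        "t_mean_bias": b0 / se_b0,            # H0: beta0 = 0
        "t_slope_bias": (b1 - 1.0) / se_b1,   # H0: beta1 = 1
    }

# Hypothetical duplicate measurements (illustrative values only).
first_measure = [10.0, 12.0, 9.0, 11.0, 13.0]
second_measure = [10.2, 11.9, 9.1, 10.8, 13.1]
print(regression_bias_tests(second_measure, first_measure))
```

Each statistic is then compared with the critical value of Student's t distribution with N - 2 degrees of freedom at the chosen significance level; a statistics package can supply the corresponding p-values.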
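If there is no systematic error, then $Z_{i1}$ can be written as $Z_{i1} = Z_{i2} + n_i$.
Moreover, as $Z_{i2} = \mu_i + \varepsilon_{i2}$ and $Z_{i1} = \mu_i + \varepsilon_{i1}$, we have $\mu_i + \varepsilon_{i2} + n_i = \mu_i + \varepsilon_{i1}$, so $\varepsilon_{i1} = \varepsilon_{i2} + n_i$ and therefore $n_i = \varepsilon_{i1} - \varepsilon_{i2}$, where $\varepsilon_{i1}$ is the error of the first measure and $\varepsilon_{i2}$ is the error of the second measure. Hence, for independent errors,
$$\delta_n^2 = VAR(\varepsilon_{i1} - \varepsilon_{i2}) = 2\delta_\varepsilon^2, \quad \text{so that} \quad \delta_\varepsilon^2 = \frac{\delta_n^2}{2}$$
where $\delta_n^2$ is the error variance of the regression analysis, whose estimator is denoted by $\hat{\delta}_n^2$.
By calculating the square root of this coefficient, we have a measure equivalent to the Dahlberg error, i.e.,
$$\hat{\delta}_\varepsilon = \sqrt{\frac{\hat{\delta}_n^2}{2}}$$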
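The regression-based estimate described here can be sketched as follows: fit the first measure on the second by least squares, take the residual variance with N - 2 degrees of freedom, halve it and take the square root. The function names `regression_em` and `dahlberg_error` and the data are illustrative assumptions; the Dahlberg function is included only so that the two values can be printed side by side.

```python
import math

def regression_em(second, first):
    """EM standard deviation estimated as sqrt(residual variance / 2)."""
    n = len(second)
    if n != len(first) or n < 3:
        raise ValueError("need at least 3 paired measurements")
    mean_x = sum(second) / n
    mean_y = sum(first) / n
    sxx = sum((x - mean_x) ** 2 for x in second)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(second, first))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(second, first))
    resid_var = sse / (n - 2)  # estimator of the regression error variance
    return math.sqrt(resid_var / 2.0)

def dahlberg_error(first, second):
    """Dahlberg error: sqrt(sum(d_i^2) / (2N))."""
    n = len(first)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(first, second)) / (2 * n))

# Hypothetical duplicate measurements (illustrative values only).
first_measure = [10.0, 12.0, 9.0, 11.0, 13.0]
second_measure = [10.2, 11.9, 9.1, 10.8, 13.1]
print(regression_em(second_measure, first_measure))
print(dahlberg_error(first_measure, second_measure))
```

With few duplicate measures the two values will generally differ somewhat even without bias, because the regression estimate divides by N - 2 after fitting two parameters, whereas the Dahlberg formula divides by N.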
However, this formula is valid only when there is no slope bias. A general formula for cases with systematic errors is given by:
Evaluation of found errors
If we assume that any distance can be described as the true distance plus an error that has a Gaussian distribution, we have:
$$Z_{ij} = \mu_i + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N(0, \delta_\varepsilon^2)$$
From statistical theory, we know that approximately 95% of the observations from a normally distributed random sample lie between $\mu_i - 1.96\,\delta_\varepsilon$ and $\mu_i + 1.96\,\delta_\varepsilon$. Thus,
where $N$ is the number of re-measured data,
$\hat{\delta}_\varepsilon$ is the estimator of the standard deviation of the EM,
$Z_{i1}$ is the first measure,
$Z_{i2}$ is the second measure.
This indicates that the ratio between the EM and the observed measures is smaller than P in approximately 95% of the sample. Note that P is the proportion of the error relative to the measured value, i.e., a percentage. This property makes the results easier to interpret, since the reliability can be expressed as a proportion (e.g., 10%, 25%, etc.). Note that the original Dahlberg formula lacks this feature, as it does not consider the total value of the original measures but only the difference between them. The same absolute error has a greater influence in a sample with small measures than in one with large measures.
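A rough sketch of such an error coefficient is given below. The exact expression for P is not reproduced in this text, so the form used here is an assumption: 1.96 times the estimated standard deviation of the EM, divided by the mean magnitude of all 2N duplicate measures. The function name `error_coefficient` and the numbers are hypothetical.

```python
def error_coefficient(sd_error, first, second):
    """Assumed form of P: 1.96 * sd_error / mean absolute value of all measures.

    The exact formula is not given in the surrounding text; this ratio is an
    illustrative assumption consistent with its verbal description.
    """
    values = list(first) + list(second)
    mean_magnitude = sum(abs(v) for v in values) / len(values)
    if mean_magnitude == 0:
        raise ValueError("measures are all zero; the ratio is undefined")
    return 1.96 * sd_error / mean_magnitude

# Same absolute error, two hypothetical scales of measurement.
small_scale = error_coefficient(0.5, [2.0, 3.0], [2.5, 2.5])
large_scale = error_coefficient(0.5, [20.0, 30.0], [25.0, 25.0])
print(f"small measures: {small_scale:.1%}, large measures: {large_scale:.1%}")
```

The comparison shows the point made in the text: an identical absolute error yields a much larger proportion when the measures themselves are small.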
For data set 1, in approximately 95% of the cases the EM is on average less than 0.69% of the value of the observed measures. Note that at a significance level of 5% (Type I error) there is no evidence of mean bias (p-value=0.820) or slope bias (p-value=0.775).
For data set 2, the error ratio (P) ranged from 9.38% to 125.88%. The largest error ratio was found in tooth #11 and the smallest in tooth #22. From the tests for systematic error at a significance level of 5% (null hypotheses β0=0 and β1=1), it can be concluded that there were mean (p-value=0.029) and slope (p-value=0.043) biases for tooth #23.
The formula proposed by Dahlberg6 in 1940 assumes no systematic error in the mean (β0) or in the slope (β1). In contrast, the results obtained through regression analysis do not require these assumptions. Moreover, regression also allows an intuitive analysis of the error using a scatter plot and a fitted line, as shown in Figure 2. This scatter plot can easily be built with the Microsoft Excel® chart builder, which permits a visual analysis of the measures before the data are submitted to a more elaborate statistical analysis. In this chart, we can see that the straight line fitted data set 1 almost perfectly, which means that the errors are very small. The results confirm that the EM is extremely small in comparison with the size of the observations (P=0.69%). Moreover, using regression analysis we observed that β0 is not statistically different from zero (p-value=0.820) and β1 is not statistically different from 1 (p-value=0.775). Therefore, we can conclude that there is no evidence of mean or slope biases and, when this happens, the EM estimates given by the Dahlberg formula and by regression analysis are very close, as shown in Table 2.
In sample 2, we analyzed the teeth with small measurement error (Fig 4) and those with the largest error (Fig 5). By comparing the two figures, we can see that the points in Figure 5 did not fit the line as well as those in Figure 4. The results of Trpkova et al10 demonstrated that there are a systematic error and a random error involved in the tracing of landmarks. They reported a standard mean error and a 95% confidence interval for the repetition and reproduction of 15 landmarks regularly used in facial growth analysis. An average error of 0.59 mm on the X axis and 0.56 mm on the Y axis was considered acceptable accuracy, even though this criterion did not take into consideration the total scale of the measure. Other researchers proposed evaluating the variance of the error. Midtgard, Bjork, Linder-Aronson9 and Battagel3 stated that the error variance should ideally be less than 3% of the total variance.
However, in a later paper, Midtgard, Bjork and Linder-Aronson9 reported that it was almost impossible to have an error variance of less than 10% of the total variance. The use of analysis of variance is quite complicated for researchers who are not familiar with the intricacies of statistics. Maybe this is why the Dahlberg formula is so widely used, mainly in studies using cephalometric measures. The Dahlberg formula lacks bias analysis, which makes it hard for researchers to be sure whether the error is acceptable or not.
When we evaluated the values in Table 5, we could clearly see that the absolute value given by the Dahlberg formula did not provide the parameters needed to evaluate the amount of error made. In tooth #21, the value provided by the formula was 0.856, which corresponded to an error percentage of 29% (Fig 3). Meanwhile, in tooth #23, we had a very similar value of 0.880, which corresponded to an error percentage of 81% (Fig 3). Thus, it is necessary to analyze the Dahlberg formula using parameters that permit an evaluation of the amount of error made. By calculating the percentage of the error in relation to the magnitude of the original measure, we established a probabilistic limit for the error with 95% confidence.
In the second data set, using regression analysis, we could evaluate the existence of mean and slope biases. From the analysis of Table 5, we can see that there was no bias either in the measure with the lowest error or in the one with the highest error. In this case, we can also verify that the result of the Dahlberg formula was very similar to that of regression analysis in the lowest error case (tooth #22), with a difference of 0.008 between these values. It is interesting to mention that there was a tendency toward slope bias in tooth #11 (p-value=0.065) and there were mean and slope biases in tooth #23 (Fig 6). These two teeth were the ones with the largest differences between the EM given by the Dahlberg formula and by the regression method, showing that systematic error substantially influenced the Dahlberg formula. In these cases the Dahlberg formula differed from the regression analysis mainly because it does not distinguish between random and systematic error.
As the tests for β0 and β1 refer to systematic error, these biases can be numerically corrected, depending on the objective of the study. In several studies, the mean bias can be explained by equipment calibration problems or operator subjectivity. The cause of slope bias, however, is not very clear and may be related to errors in the data collection process. It is important to highlight that the estimator and the hypothesis test for β1 have some limitations that are intrinsic to the estimation process. These procedures are suitable for small sets of observations with a reasonably low EM. In fact, this is what usually occurs, as only a few samples are re-measured, and a high EM can be easily detected and the corresponding samples discarded.
Moreover, the paired t-test, commonly used to detect mean bias, proved ineffective. In tooth #23, the paired t-test did not reveal a statistically significant difference between the means of the first and second measures, whereas a bias was detected by regression analysis, as we can see in Table 5.
Baumrind and Frantz2 mentioned that the differences observed in a patient must be at least twice the standard deviation of the estimated error to be considered therapeutic results. The present proposal of evaluating the ratio of the error to the measure leads to a less subjective parameter, which helps researchers to determine what can be considered an acceptable error. However, the boundary value will depend on the level of accuracy demanded by the research.
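The criterion attributed to Baumrind and Frantz can be written as a one-line check. The helper below is a hypothetical sketch: the function name, the `factor` parameter and the observed changes are assumptions for illustration, while 0.856 is the Dahlberg value reported above for tooth #21.

```python
def exceeds_error_threshold(observed_change, sd_error, factor=2.0):
    """True if the change is at least `factor` times the SD of the measurement error."""
    return abs(observed_change) >= factor * sd_error

# Hypothetical observed changes checked against an error SD of 0.856.
for change in (1.5, 1.8):
    print(change, exceeds_error_threshold(change, 0.856))
```

A change that fails this check cannot be separated from measurement error under this rule, although the text notes that the acceptable boundary depends on the accuracy the research demands.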
The use of regression models to evaluate EM has some advantages: 1) It distinguishes systematic error (mean and slope biases) from random error; 2) The results can be interpreted in more objective and intuitive terms, and a visual analysis is possible using scatter plots and a fitted line; 3) It supplies an estimate of the EM integrated into the mathematical model.
The Dahlberg error is an estimator of the standard deviation of the EM and corresponds to the error estimated by regression analysis when there is no systematic error. This fact is not often highlighted in the literature. However, when there were biases, the Dahlberg error differed from the one obtained by regression analysis, as demonstrated by our results.
Although the Dahlberg formula can be a simple and efficient way to evaluate the EM, assessing the quality of a measure from the standard deviation alone is quite hard. Transforming the value given by this formula into a percentage of the amount measured provides parameters that make it easier to evaluate the impact of the random error on the final result of the research.
Considering the aspects previously discussed, we concluded that the use of the regression model in association with the Dahlberg formula can be very useful to identify measurement and calibration errors. These techniques, which can be used after a pilot study to evaluate systematic and random error, increase confidence in the results and avoid unexpected biases arising from procedural errors in data gathering.
1. Bland JM, Altman DG. Measuring agreement in method comparison studies. Stat Methods Med Res. 1999;8(2):135-60.
2. Baumrind S, Frantz RC. The reliability of head film measurements. Conventional angular and linear measurements. Am J Orthod. 1971;60(5):505-17.
3. Battagel JM. A comparative assessment of cephalometric errors. Eur J Orthod. 1993;15(4):305-14.
4. Capelozza Filho L, Fattori L, Maltagliati LA. A new method to evaluate teeth tipping using computerized tomography. Rev Dental Press Ortod Ortop Facial. 2005;10(5):23-9.
5. Callegari-Jacques SM. Bioestatística. Princípios e aplicações. Porto Alegre: Artmed; 2004.
6. Houston WJB. The analysis of errors in orthodontic measurements. Am J Orthod. 1983 May;83(5):382-90.
7. Kamoen A, Dermaut L, Verbeeck R. The clinical significance of error measurement in the interpretation of treatment results. Eur J Orthod. 2001 Oct;23(5):569-78.
8. Martelli Filho JA, Maltagliati LA, Trevisan F, Gil CTLA. Novo método estatístico para análise da reprodutibilidade. Rev Dental Press Ortod Ortop Facial. 2005;10(5):122-9.
9. Midtgård J, Björk G, Linder-Aronson S. Reproducibility of cephalometric landmarks and errors of measurements of cephalometric cranial distances. Angle Orthod. 1974 Jan;44(1):56-61.
10. Trpkova B, Major P, Prasad N, Nebbe B. Cephalometric landmarks identification and reproducibility: a meta analysis. Am J Orthod Dentofacial Orthop. 1997 Aug;112(2):165-70.
11. Wackerly DD, Mendenhall III W, Scheaffer RL. Mathematical statistics with applications. 7th ed. California: Thomson Brooks/Cole; 2008.
12. Yasushi IM. Estudo das formas e dimensões linguais das arcadas dentárias em indivíduos brasileiros com oclusão normal [dissertação]. São Paulo (SP): Universidade Metodista de São Paulo; 2007.
João Ricardo Sato
Rua Santa Adélia, 166 - Bangu
Zip code: 05.056-020 - Santo André/SP, Brazil
Submitted: July 7, 2008
Revised and accepted: February 5, 2009
The authors report no commercial, proprietary, or financial interest in the products or companies described in this article.