Biostatistics: essential concepts for the clinician

ABSTRACT Introduction: The efficiency of clinical procedures depends on practical and theoretical knowledge. A vast amount of information reaches the orthodontist every day, and it is up to this professional to select what really has an impact on clinical practice. Evidence-based orthodontics therefore requires the clinician to know the basics of biostatistics in order to understand the results of scientific publications. These concepts are also important for researchers, for correct study planning and data analysis. Objective: This article aims to present clearly some essential concepts of biostatistics that help the clinical orthodontist understand scientific research, supporting an evidence-based clinical practice. In addition, an updated version of a tutorial to assist in choosing the appropriate statistical test is presented. This PowerPoint® tool can help the user answer common questions about biostatistics, such as the most appropriate statistical test for comparing groups, the choice of graphs, how to perform correlations and regressions, and how to analyze random or systematic errors. Conclusion: Researchers and clinicians must acquire or recall essential concepts in order to understand and apply an appropriate statistical analysis. It is also important that journal readers and reviewers can identify when statistical analyses are being used inappropriately.


INTRODUCTION
Every professional, regardless of field, makes decisions based on theoretical and practical knowledge. For health professionals, whose role is to maintain or promote the patient's health, an inappropriate decision may cause irreversible biological damage. Orthodontics is currently flooded with new and easily accessible information, technologies and experiences. It is up to the orthodontist to distinguish reliable scientific knowledge from knowledge that contains errors or bias, adopting in clinical practice what will, for example, reduce error rates, waste, unsuccessful therapies and unnecessary exams. 1,2 Evidence-based Orthodontics can be a challenge for clinicians, because published papers often present information in a way that makes understanding the scientific knowledge a complex task. 3,4 A substantial level of statistical understanding is needed to critically read a study, its methodology, data analysis and interpretation of results, and to draw conclusions that reduce uncertainty in decision-making given the variety of available options. 2,[5][6][7] Statistics is closely connected to mathematics, and the culture of fear and anxiety that surrounds it makes statistical concepts and methods harder to assimilate. 8 Some studies show that graduate students, despite recognizing the importance of biostatistics, lack the skills to apply it correctly in scientific research, and that their attitudes, successes and failures when facing statistical challenges are linked to basic knowledge. 6,[9][10][11] This has an impact on scientific publications: studies have shown that errors such as incompatible study design, inadequate analysis and inconsistent interpretation are common. [12][13][14]

The basic concepts that are fundamental to avoiding errors are easily forgotten, which affects the choice of the statistical tests used in data analysis. In addition, most statistical software does not guide the user in choosing the most appropriate statistical test, leading to publications that, because of incorrect data analysis, do not contribute to solving clinical problems. 15 Therefore, the objective of the present article is to clearly review some essential concepts of biostatistics that will help clinical orthodontists understand scientific research for an evidence-based clinical practice, and to point out the main errors observed in published articles. An updated version of a PowerPoint® guide, originally published in 2010, to assist in choosing the appropriate statistical test is then presented. 16 This guide is useful for readers, authors and reviewers of scientific articles.

BASIC CONCEPTS
Biostatistics is a method used to describe or analyze data obtained from a sample that represents a population. It is used in studies in which variables are related to living beings. 17,18

WHAT IS A VARIABLE AND HOW IS IT MEASURED?
A variable is a characteristic or condition that can be measured or observed (Table 1). Some variables, for example, come from questions whose answers can only be "yes" or "no".

DEPENDENT VARIABLE (also called "response variable")
Concept: the event or characteristic that you want to discover or explain. It represents a quantity whose appearance, disappearance, increase, decrease, etc. depends on how the independent variable is handled by the researcher.

INDEPENDENT VARIABLE (also known as "explanatory" or "predictor" variable)
Concept: the determining factor, condition or cause that makes it possible to predict a response, effect or consequence. It can vary during the study or be controlled, but it is not affected by any other variable within the experiment.

Example: a study intends to ascertain the need for orthodontic treatment based on gender, age, education, socioeconomic level and perception of oral health. The response (dependent) variable of the study is the "need for orthodontic treatment", while the others are explanatory (independent) variables.

Figure 1: Graphical illustration of normal and abnormal distributions.

THE IMPORTANCE OF NORMAL DISTRIBUTION
A distribution in biostatistics is a mathematical model that relates the values of a variable to the probability of occurrence of each value. Whenever a quantitative variable is to be analyzed, the normality of its distribution should first be checked, by a statistical test and/or a histogram as needed, because some statistical tests require normally distributed data. In a normal distribution, the data are concentrated around the mean and spread symmetrically from it, producing the characteristic bell-shaped curve. When the distribution departs from normal, the median and the interquartile deviation should be preferred. [17][18][19] Normal and abnormal data distributions are illustrated in Figure 1.
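As a rough illustration of this check, the Python sketch below prints a text histogram, computes the sample skewness (about zero for a symmetric, bell-shaped distribution) and then reports either the mean and SD or the median with the first and third quartiles. It is a sketch under stated assumptions: the data are hypothetical, and the |skewness| < 1 cut-off is an arbitrary screening rule, not a normality test. In practice the decision should rest on a formal test (for example Shapiro-Wilk, available in SciPy as `scipy.stats.shapiro`) together with the histogram.

```python
import statistics


def text_histogram(values, bins=5):
    """Return one text line per bin, with one '#' per observation."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; nothing to plot")
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in values:
        # The maximum value would fall just past the last bin, so clamp it.
        counts[min(int((x - lo) / width), bins - 1)] += 1
    return [f"{lo + i * width:6.1f} | {'#' * c}" for i, c in enumerate(counts)]


def sample_skewness(values):
    """Moment-based skewness g1: about 0 for symmetric data, > 0 for a long right tail."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


def summarize(values, skew_cutoff=1.0):
    """Mean/SD for roughly symmetric data, otherwise median with Q1 and Q3.

    skew_cutoff is an illustrative assumption, not a statistical standard.
    """
    if abs(sample_skewness(values)) < skew_cutoff:
        return {"mean": statistics.fmean(values), "sd": statistics.stdev(values)}
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {"median": statistics.median(values), "q1": q1, "q3": q3, "iqr": q3 - q1}


# Hypothetical measurements (mm): one symmetric sample, one right-skewed sample.
symmetric = [10, 11, 12, 12, 13, 13, 13, 14, 14, 15, 16]
skewed = [1, 1, 2, 2, 2, 3, 3, 4, 6, 9, 15, 30]

for name, data in (("symmetric", symmetric), ("skewed", skewed)):
    print(name, round(sample_skewness(data), 2))
    print("\n".join(text_histogram(data)))
    print(summarize(data))
```

For the skewed sample, the long right tail shows up both in the histogram and in a skewness well above zero, so the sketch falls back to the median and quartiles, as the text recommends.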

HOW SHOULD THE DATA BE PRESENTED? (DESCRIPTIVE STATISTICS)
Organizing, presenting and summarizing data by appropriate methods is known as descriptive statistics. It is the first step toward the appropriate selection and use of statistical tests. Descriptive statistics can be divided into frequencies and/or summary measures of central tendency and of dispersion (Table 3). 8 Dispersion measures reveal how the data vary, or are distributed, around their midpoint:
Amplitude (range): the difference between the highest and lowest values in a data set.

Standard deviation (SD): the value that represents the average dispersion of the data, symmetrically around the mean of a data set. It is used for quantitative data with a normal distribution.
Variance: the square of the standard deviation.

Currently, journals and reviewers request that the results report not only the p-value but also the corresponding confidence interval (CI).
Some years ago, only a few studies with multivariate analyses reported the CI. 21 A systematic review 22 showed that interpreting the CI is important, but that this rarely occurs in randomized clinical trials in which the treatment effects were not statistically significant. This can lead to the abandonment of future research or to clinical practice based on invalid conclusions.
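The dispersion measures defined in this section and a confidence interval for the mean can be computed with Python's standard `statistics` module, as in the sketch below. The overjet values are hypothetical. The interval uses the normal approximation (z of about 1.96 for 95%), an assumption that suits large samples; for small samples like this one, a t-based interval (built, for example, with `scipy.stats.t.ppf`) is wider and more appropriate.

```python
import math
import statistics


def describe(values, confidence=0.95):
    """Amplitude, SD, variance and an approximate CI for the mean."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    variance = statistics.variance(values)  # equals sd ** 2
    # Normal-approximation CI: mean +/- z * SD / sqrt(n).
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sd / math.sqrt(n)
    return {
        "amplitude": max(values) - min(values),
        "mean": mean,
        "sd": sd,
        "variance": variance,
        "ci": (mean - half_width, mean + half_width),
    }


overjet_mm = [2.0, 3.5, 4.0, 4.5, 5.0, 3.0, 4.0, 6.0]  # hypothetical sample
summary = describe(overjet_mm)
print(summary)
```

Reporting the interval alongside the p-value shows the range of effect sizes compatible with the data, which is what the reviews cited above found to be missing.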

TYPES OF STUDIES
The execution of a study must always be planned; this plan for conducting the research is called the research design. It must follow specific standards and techniques, according to the nature of the study. 6,[17][18][19] The quality of a research design is related to the strength of recommendation and to its applicability to the patient. 18 The differing strength of evidence among the types of studies is shown in Figure 2 as a pyramid of evidence.
This pyramid incorporates the suggestion of Murad et al., 23 which considers not only the study design but also the certainty of the evidence, as assessed by the GRADE tool.
» Cross-sectional: considered a "portrait study". It assesses the exposure of interest and the outcome at a single point in time, estimating the prevalence and the relationship between variables by comparing exposed and unexposed subjects, or subjects with and without disease.
Example: analyzing the association between gingival inflammation (present/absent) and the use of an orthodontic appliance (exposure) at a single point during treatment, in comparison with patients without orthodontic treatment (control).
» Cohort: a longitudinal study, considered a "film study". It goes from the exposure to the outcome (disease), following over time individuals exposed and not exposed (control group) to a [...]
» Case-control: although generally retrospective, these studies can also be carried out prospectively. Example: a comparative analysis of orthodontic patients with and without gingival inflammation (disease, or outcome) according to whether or not they used a daily mouthwash. In this case, the study goes from the disease to the exposure.
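For a cross-sectional design such as the gingival inflammation example, the prevalence of the outcome in each group, and the ratio between them, summarize the association. The Python sketch below uses hypothetical counts, and the prevalence ratio is one common effect measure chosen here for illustration; a significance test (for instance, chi-square) would still be needed and is not shown.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """Counts observed at a single point in time (cross-sectional)."""

    with_outcome: int  # e.g. patients with gingival inflammation
    total: int  # all patients in the group

    def prevalence(self) -> float:
        return self.with_outcome / self.total


def prevalence_ratio(exposed: Group, unexposed: Group) -> float:
    """How many times more prevalent the outcome is among the exposed."""
    return exposed.prevalence() / unexposed.prevalence()


# Hypothetical counts: appliance users (exposed) vs untreated patients (control).
appliance = Group(with_outcome=40, total=100)
control = Group(with_outcome=20, total=100)
print(appliance.prevalence(), control.prevalence(), prevalence_ratio(appliance, control))
```

A ratio of 1 would mean no difference in prevalence between groups; because exposure and outcome are measured at the same moment, the ratio describes an association, not a cause.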

Randomized Clinical Trial: an experimental study that simulates reality, in which an exposure or intervention is applied to the experimental sample and compared with a control group. Its main feature is that the research subjects are allocated to the groups by randomization. It is a highly controlled study; however, randomization can fail to produce comparable groups, especially when samples are small.
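The point about small samples can be illustrated with a short simulation. The Python sketch below randomly allocates hypothetical participants to two equal arms (shuffling and splitting the list, one simple allocation scheme among several) and measures how far apart the mean baseline ages of the two arms end up, on average, over many repetitions. The age distribution (mean 14, SD 2 years) is an assumption. With 10 participants the gap is typically around a year; with 200 it is several times smaller.

```python
import random
import statistics


def randomize(participants, rng):
    """Shuffle a copy of the list and split it in half: (experimental, control)."""
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def mean_baseline_gap(n, trials=1000, seed=1):
    """Average absolute difference in mean baseline age between the two arms."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(trials):
        ages = [rng.gauss(14, 2) for _ in range(n)]  # hypothetical ages (years)
        experimental, control = randomize(ages, rng)
        gaps.append(abs(statistics.fmean(experimental) - statistics.fmean(control)))
    return statistics.fmean(gaps)


for n in (10, 40, 200):
    print(n, round(mean_baseline_gap(n), 2))
```

Randomization removes systematic bias in allocation, but it only balances the groups on average; in any single small trial, chance imbalances in baseline characteristics remain likely.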
3. Synthesis: this category includes the secondary study called a "systematic review", which uses primary studies as its source of data to answer a key question. It is a scientific investigation carried out under a rigorous methodology for both the data search and the analysis, with the consequent determination of the certainty of the available evidence. When possible, a "meta-analysis" is carried out: the statistical analysis that combines the results of the included primary studies. Systematic reviews are at the top of the evidence pyramid for clinical decision-making (Fig 2).
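One common way to combine study results in a meta-analysis is the fixed-effect, inverse-variance method: each study's estimate is weighted by 1/SE², so more precise studies count more. The Python sketch below assumes hypothetical effect estimates (e.g. mean differences in mm) and standard errors. When studies are heterogeneous a random-effects model is usually preferred; that model is not shown here.

```python
import math
from statistics import NormalDist


def fixed_effect_pool(effects, standard_errors, confidence=0.95):
    """Inverse-variance weighted pooled estimate with its SE and CI."""
    if not effects or len(effects) != len(standard_errors):
        raise ValueError("need one standard error per effect estimate")
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return pooled, pooled_se, (pooled - z * pooled_se, pooled + z * pooled_se)


# Hypothetical primary studies: mean differences (mm) and their standard errors.
effects = [1.2, 0.8, 1.5]
ses = [0.40, 0.25, 0.60]
print(fixed_effect_pool(effects, ses))
```

The pooled standard error is smaller than that of any single study, which is why a well-conducted meta-analysis can give a more precise answer than its individual primary studies.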

MAIN ERRORS IN THE STATISTICAL METHODOLOGY OBSERVED IN THE PUBLISHED ARTICLES
Articles submitted to journals must be well written and well designed. This requires that the study be conducted reliably, so that all the steps performed can be described correctly, which makes the article easier to read and more likely to be accepted by reviewers. 25 The most common errors regarding statistical methodology found in published articles are described below.

USE OF COLUMN / BAR GRAPH FOR QUANTITATIVE VARIABLES
Column (bar) graphs should be used for frequencies, since each column represents a category. For numerical variables, the box-plot (Fig 3) should be used for independent samples and the line graph for data collected over time. 8,18 Unlike the column graph, the box-plot shows both the summary measure (mean or median) and the dispersion of the observed values.
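A box-plot is drawn from a handful of summary values. The Python sketch below computes them for a hypothetical sample: the median, the first and third quartiles (the box), the whiskers and any outliers, using Tukey's common convention that whiskers extend to the most extreme values within 1.5 × IQR of the box. That convention, and the quartile method of `statistics.quantiles`, are assumptions; software packages differ slightly in both.

```python
import statistics


def box_plot_summary(values):
    """Values needed to draw a Tukey-style box-plot."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in values if low_fence <= x <= high_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(x for x in values if x < low_fence or x > high_fence),
    }


# Hypothetical measurements with one extreme value.
print(box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 40]))
```

A column graph of the mean of these data would hide both the spread and the extreme value of 40, which the box-plot summary keeps visible.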

USE OF PARAMETRIC TESTS WHEN NORMALITY IS NOT ACHIEVED
Parametric tests are more powerful than non-parametric tests, but they assume that the data are normally distributed. Numerical data with an abnormal distribution should be analyzed as if they were ordinal qualitative data, that is, with non-parametric tests based on ranks. Using a parametric test in this situation may make it easier to reject the null hypothesis, a result that may not reflect the reality of the population. 8,19
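To show what a rank-based (non-parametric) test does, the Python sketch below computes the Mann-Whitney U statistic for two independent samples, which works on ranks rather than on the raw values, with a two-sided p-value from the normal approximation. It is a simplified illustration: it omits the tie correction to the variance and the exact p-values that statistical software (for example `scipy.stats.mannwhitneyu`) provides, and the data are hypothetical.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def mann_whitney_u(a, b):
    """U statistics for both samples and a two-sided normal-approximation p-value."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    r1 = sum(ranks[:n1])  # rank sum of the first sample
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u1 - mean_u) / sd_u
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, u2, p


# Hypothetical, clearly skewed data for two independent groups.
group_a = [1, 2, 2, 3, 4, 5, 40]
group_b = [6, 7, 8, 9, 12, 15, 60]
print(mann_whitney_u(group_a, group_b))
```

Because only the ranks enter the statistic, the extreme values 40 and 60 weigh no more than any other observation, which is why such tests remain valid when normality is not achieved.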

ABNORMAL DISTRIBUTION
Although some researchers correctly use non-parametric tests when the assumption of normality is violated, they sometimes incorrectly present the data as mean and standard deviation. These measures should not be used precisely because the distribution is asymmetric. In this case, use the non-parametric reference that divides the data in half, the median, together with its measure of dispersion (interquartile deviation). 1

The guide is simple to use and must be run in "presentation mode".
The INITIAL MENU represents the objective intended by the researcher. There are six possible options (Fig 4), from which the desired answer is reached through a sequence of clicks on the "drops".
Clicking on the option "Examine the type of distribution" directly shows which tests can be used to examine the distribution of a quantitative variable. The same occurs when clicking on the "Survival analysis" drop, which directly shows the available survival tests. For the other options, the desired answer is reached after a sequence of clicks. Figure 5 shows an example of a submenu, in this case the comparison submenu.
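The comparison branch of such a guide essentially maps three answers (how many groups are compared, whether the samples are paired, and whether the data are normally distributed) to a test. The Python sketch below encodes the usual textbook mapping for a quantitative outcome. It is an illustration, not a reproduction of the guide's menus; it ignores further assumptions such as homogeneity of variances, and the function and argument names are hypothetical.

```python
def choose_comparison_test(n_groups: int, paired: bool, normal: bool) -> str:
    """Suggest a test for comparing a quantitative variable between groups."""
    if n_groups < 2:
        raise ValueError("a comparison needs at least two groups")
    two_groups = n_groups == 2
    table = {
        # (two groups?, paired?, normal?) -> test
        (True, False, True): "Student's t-test for independent samples",
        (True, False, False): "Mann-Whitney U test",
        (True, True, True): "paired t-test",
        (True, True, False): "Wilcoxon signed-rank test",
        (False, False, True): "one-way ANOVA",
        (False, False, False): "Kruskal-Wallis test",
        (False, True, True): "repeated-measures ANOVA",
        (False, True, False): "Friedman test",
    }
    return table[(two_groups, paired, normal)]


# Example: three independent groups, data not normally distributed.
print(choose_comparison_test(3, paired=False, normal=False))  # Kruskal-Wallis test
```

Each parametric test in the table has a rank-based counterpart, so getting the normality answer wrong leads directly to the error described earlier of using a parametric test on non-normal data.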
This sequence of clicks requires basic knowledge of the types of variables (Table 1) and of data distribution (Fig 1). It is also necessary to understand the difference between dependent (paired) and independent (unpaired) samples. Paired samples are those in which the comparison is dependent on [...] and, in some cases, in VassarStats, by clicking on the corresponding icon that will appear (Fig 6).

Overall responsibility: DN.
The authors report no commercial, proprietary or financial interest in the products or companies described in this article.