Randomized clinical trials of dental bleaching – Compliance with the CONSORT Statement: a systematic review

We reviewed the literature to evaluate: a) the compliance of randomized clinical trials (RCTs) on bleaching with the CONSORT Statement; and b) the risk of bias of these studies using the Cochrane Collaboration risk of bias tool (CCRT). We searched the Cochrane Library, PubMed and other electronic databases to find RCTs focused on bleaching (or whitening). The articles were evaluated for compliance with CONSORT on a scale: 0 = no description, 1 = poor description and 2 = adequate description. Descriptive analyses of the number of studies by journal, follow-up period and country were performed, and quality was assessed with the CCRT for assessing risk of bias in RCTs. A total of 185 RCTs were included for assessment. More than 30% of the studies received score 0 or 1. Protocol, flow chart, allocation concealment and sample size were the most critical items, as 80% of the studies scored 0. The overall CONSORT score for the included studies was 16.7 ± 5.4 points, which represents 52.2% of the maximum CONSORT score. A significant difference among journals, countries and periods of time was observed (p < 0.02). Only 7.6% of the studies were judged at “low” risk; 62.1% were classified as “unclear”; and 30.3% as “high” risk of bias. The adherence of RCTs evaluating bleaching materials and techniques to the CONSORT Statement is still low, with unclear/high risk of bias.


Introduction
Dental bleaching (or whitening) has become the treatment most sought after by patients in search of esthetics. According to the study of Al-Zaera, 1 which investigated research subjects' satisfaction with their dental appearance, nearly 66% of the individuals were dissatisfied with the color of their teeth. Another survey conducted in Ankara, Turkey, 2 focused on patients who were unhappy with their smile and asked which treatment these patients would like to receive. About half of the patients suggested dental bleaching (49.9%), followed by esthetic restorations (25.4%), orthodontic treatment (24.5%), and prosthetic restorations (16.9%).
Linked to growing demand, the effectiveness of various protocols and materials used by dental professionals has been extensively studied in the last decades, including the longevity of the bleaching outcome. 3,4,5,6 Researchers have used clinical or in vitro studies to obtain data that can predict clinical performance, as some subjective factors related to the bleaching protocol, such as postoperative sensitivity and other adverse reactions, cannot be evaluated directly. 7,8,9 While laboratory testing is a very useful method to study the diffusion of the components of bleaching gels, such as H2O2, into the dental pulp, 10,11 clinical trials can provide reliable and direct evidence to guide clinicians in their choice of materials for in-office and at-home bleaching. 12,13,14,15
Hence, randomized controlled trials (RCTs) are considered the standard research design for the evaluation of health interventions. In fact, RCTs and systematic reviews are at the top level of the evidence hierarchy. 16 RCTs, however, may incur risk of spurious results if their design is flawed or if the respective methodology lacks accuracy. 17 Several problems with the design and execution of RCTs raise questions regarding the validity and reliability of the respective findings. This situation may lead to an underestimation or overestimation of the true intervention effect. 18 Therefore, readers should appraise any RCT before a clinical decision is made. This evaluation depends on good reporting of the methods and results of RCTs. A group of experts joined efforts in 1996 and proposed several items that should be described in an RCT (CONsolidated Standards Of Reporting Trials [CONSORT] Statement), with the objective of standardizing the reporting of RCTs. The CONSORT Statement was revised in 2001 19 and the most recent version was published in 2010. 20,21
Given the importance of RCTs in dental bleaching for making decisions regarding protocols, application time, and commercial brand, the aim of this study was to systematically review the literature in peer-reviewed journals to evaluate a) the compliance of RCTs with the CONSORT Statement and b) the risk of bias in these RCTs using the Cochrane Collaboration risk of bias tool (CCRT).

Declaration of Interests: The authors certify that they have no commercial or associative interest that represents a conflict of interest in connection with the manuscript.

Methodology
This study was not registered, as there are no currently known systematic review registries of methodologies.

Search methods
The following databases were consulted: MEDLINE via PubMed, Cochrane Library, Brazilian Library in Dentistry (BBO) and Latin American and Caribbean Literature in Health Sciences database (LILACS), together with the citation databases Scopus and Web of Science (Table 1). The reference lists of all primary studies, as well as the related articles link from the PubMed database for each primary study, were manually searched. Articles in Korean, Japanese, Chinese, Arabic and related languages were not included because of the difficulty of translation.
For the MEDLINE database, a search strategy was defined according to the controlled vocabulary for indexing biomedical information (MeSH, Medical Subject Headings, U.S. National Library of Medicine, Bethesda, MD, USA) along with free keywords. For each of the other databases, the search strategy was adapted accordingly. In order to standardize the articles evaluated, only studies published since the CONSORT Statement was issued in 1996 were included.

Eligibility criteria
We included parallel and split-mouth RCTs that evaluated the effectiveness of different types of bleaching systems and techniques on color change, toxicity, postoperative sensitivity and application technique. We did not restrict studies with patients of different age groups or populations (Table 2).
Laboratory studies were excluded, as well as those presented as conference abstracts, theses and reports published in any media other than peer-reviewed journals. Additionally, all studies that were published before 1996 were excluded (Table 2).
Three reviewers (A.P., B.M.M. and T.H.) catalogued articles that met the inclusion criteria. Article selection was carried out by first reading the titles and abstracts; then the full text of the paper was read in case of doubts.

Adherence to CONSORT statement
An evaluation tool based on the items related to the methods and results from the 2010 CONSORT Statement 20 was developed to evaluate the reporting completeness of RCTs (Table 3). 22 The items related to the title and abstract, introduction and discussion were not evaluated since the evaluation would have
A total of 12 items of the CONSORT Statement were included in this CONSORT evaluation tool. As some of these items were subdivided, a total of 16 items were evaluated. The given score per item ranged from 0 to 2. In general terms, 0 = no description, 1 = poor description and 2 = adequate description. More details regarding the scoring process for each score of each item are displayed in Table 3. Each item was given an equal weighting.
Prior to evaluation, the instrument was discussed between two authors experienced in clinical trials (A.D.L. and A.R.), pilot-tested in 15 articles and checked for accuracy and reproducibility by three evaluators. This process led to modification of the instrument, as new possibilities for each score were observed and discussed during pilot testing.
Three reviewers (A.P., B.M.M. and T.H.) performed the round of scoring using the CONSORT evaluation tool as a guide (Table 3). In case of disagreement, a discussion followed and the consensus was used to determine the final score. Evaluators were not blinded to the study authors. This was not feasible, as reviewers were familiar with the studies and could easily guess the researchers' affiliation by reading the paper.

Scoring system and statistical analysis
The number of studies by journal, follow-up period and country were analyzed descriptively. Compliance with individual items of the CONSORT Statement was analyzed to identify areas in which authors could improve the description. A chart with the percentage of studies per score in each item was provided.
To achieve an overall compliance score, the scores for the 16 items were added in each article. A trial with adequate descriptions (score 2) for all CONSORT items would have received a maximum score of 32. A mean score was calculated by period of time, journal and country. Comparison within each factor was performed with the Kruskal-Wallis and Mann-Whitney tests at a level of confidence of 95%. Linear correlation analysis between the 2015 ISI journal impact factor and the average CONSORT score was also performed.
These additional analyses aimed at offering information about whether improvements in the average CONSORT score occurred over time and whether these improvements were related to the journal and respective impact factor, as well as to the country of residence of the first author.
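As a quick illustration of the scoring arithmetic described above, the following minimal Python sketch adds the 16 item scores of one article and expresses the total as a percentage of the 32-point maximum. The function name `overall_consort_score` and the example score list are hypothetical and are not part of the original evaluation tool.

```python
# Minimal sketch of the overall compliance score (assumed helper, not the authors' code).
N_ITEMS = 16          # number of evaluated CONSORT items
MAX_PER_ITEM = 2      # 0 = no description, 1 = poor, 2 = adequate


def overall_consort_score(item_scores):
    """Return (total score, percentage of the maximum) for one article."""
    if len(item_scores) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item scores, got {len(item_scores)}")
    if any(score not in (0, 1, 2) for score in item_scores):
        raise ValueError("each item score must be 0, 1 or 2")
    total = sum(item_scores)
    return total, 100 * total / (N_ITEMS * MAX_PER_ITEM)


# Hypothetical article: 6 items adequately reported, 5 poorly, 5 not reported.
example_scores = [2] * 6 + [1] * 5 + [0] * 5
total, percent = overall_consort_score(example_scores)
print(total, round(percent, 1))
```

Applied to the mean score reported in this review, 16.7 points correspond to 100 × 16.7 / 32 ≈ 52.2% of the maximum.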
Negative [0]  This information is not reported.
Poor [1]  1. Information can be obtained by reading the manuscript, although it is not explicitly reported by the authors. 2. There is a lack of consistency between sections of the article (examples: abstract does not match the material and methods section; the presentation of the results does not match the description of the trial design; flow diagram presents different information, etc.).

Participants Eligibility criteria
Positive [2]  The inclusion and exclusion criteria are clear, so that readers can know exactly which population the data can be extrapolated to.
Negative [0]  The information is not reported.
Poor [1]  1. Incomplete information on eligibility criteria compared to most of the studies in the field. 2. Presence of inconsistencies in the inclusion/exclusion criteria that prevent readers from knowing the population in which the intervention/control groups were performed.

Settings and location
Positive [2]  Clear description of the setting (academic, practice-based research, university, private clinics, etc.) as well as the date at which the intervention was implemented.
Negative [0]  The setting and/or the location is not reported in the text.
Poor [1]  1. Authors describe either the setting or the date but never both. 2. This information can be obtained indirectly in the text.

Interventions
Positive [2]  The interventions for each group are described with sufficient details to allow replication, including how they were actually administered.
Negative [0]  There is no description.
Poor [1]  There is missing information that prevents the replication of the interventions/comparators.

Outcomes
Positive [2]  At least the primary outcomes were defined in detail, including how and when they were assessed. Consider it as clear when the details are clear, even if the authors did not use the term "primary outcome" or related synonyms.
Negative [0]  There is no definition of the primary outcome and/or secondary outcomes.
Poor [1]  1. The authors only report that they have used specific criteria without detailing the most important outcomes of such criteria. 2. The description of the primary outcome and/or secondary outcomes is very superficial and does not allow replication of the method.

Sample size
Positive [2]  Method of sample size calculation is described in a way that allows replication. The primary outcome for which the sample size was calculated should be identified. Elements of the sample size calculation are (1) the estimated outcomes in each group (which implies the clinically important target difference between the intervention groups); (2) the α (type I) error level; (3) the statistical power (or the β (type II) error level); and (4), for continuous outcomes, the standard deviation of the measurements. For equivalence trials, the equivalence limit, instead of the effect size, should be reported.
Negative [0]  There is no description in the article.
Poor [1]  The sample size is described but some parameters are missing, which prevents replication.

Sequence generation
Positive [2]  1. Clear description of the random sequence generation; or 2. clear description of a non-random sequence method.
Negative [0]  There is no information in the text.
Poor [1]  The authors only provide a very superficial description (such as "groups were randomly allocated") or do not provide sufficient information to allow replication of the randomization process.

Allocation concealment
Positive [2]  Clear description of the allocation concealment. See next columns for evaluation of the Risk of Bias.
Negative [0]  There is no information in the text.

Blinding
Positive [2]  1) The authors describe who is blinded in the study. In single-blind studies (when this is clearly reported by the authors), the description of the participant or evaluator (the one blinded) is enough; however, when the study is double-blind or triple-blind, all blinded people should be described. 2) The study describes just the participant or examiner as blinded, but one of these people cannot be blinded because of intrinsic features of the study design.
Negative [0]  There is no description of the blinding.
Poor [1]  Insufficient/partial information. For instance, (1) the authors describe examiners' blinding or participants' blinding, but never both. (2) The authors state that the study was blind or double-blind but do not specify who was blinded.

Hypothesis testing
Positive [2]  Statistical methods are described with enough detail to enable a knowledgeable reader with access to the original data to verify the reported results. Additionally, statistical tests employed by the authors seem to be adequate for the type of trial design and nature of the data collected.
Negative [0]  Statistical methods are not described.
Poor [1]  1) There is not enough information to evaluate the statistical method used by the authors and/or the statistical tests employed by the authors are inadequate for the trial design and/or nature of the data (for instance, tests that do not take into account the paired nature of the data when this is the case). 2) The authors describe several statistical tests but do not specify to which outcome each was applied.

Estimated effect size
Positive [2]  Authors report, at least for the primary outcome, the effect size (odds ratio, risk ratio, risk difference, mean difference, etc.) and its precision (such as 95% confidence interval).
Negative [0]  There is no description of the effect size and 95% confidence interval.
Poor [1]  There is incomplete information.

Flow diagram
Positive [2]  For each group, the numbers of participants who were randomly assigned, received intended treatment and were analyzed for the primary outcome are described in the CONSORT flow diagram.
Negative [0]  The flow chart is not presented in the article.
Poor [1]  1. There are inconsistencies between the numbers described in the flow chart and other parts of the manuscript. 2. Incomplete diagram with missing information.

Losses/Exclusions
Positive [2]  1. For each group, losses and exclusions after randomization are described with reasons. 2. During reading, reviewer observes that there is no loss to follow-up.
Negative [0]  There is no description of losses and exclusions.
Poor [1]  Incomplete information. For instance, 1. the authors describe the overall percentage of losses but this information is not specified per group. 2. The authors describe the losses and exclusions but do not specify the reasons.

Baseline data
Positive [2]  A table/text description containing baseline demographic and clinical characteristics of each group is presented in the article.
Negative [0]  There is no table/text description with baseline data or description in the body of the text.
Poor [1]  1. A table/text description with baseline data is presented but the data are not distributed between the study groups and/or are given in percentages instead of raw numbers. 2. Insufficient information about participants is provided. 3. Inconsistencies in the data presented can be observed.

Numbers analysed
Positive [2]  For each group and for each outcome, the number of participants (denominator) included in the analysis is clear.
Negative [0]  Authors do not report the numbers analyzed.
Poor [1]  1. There is no clear description of the number of participants (denominator) included in the analysis of at least one of the outcomes. 2. Instead of reporting the raw number of participants, the authors report their data in percentages. 3. The authors fail to report the baseline number of patients included in each analysis. 4. Data can be obtained indirectly in the study.

Registration and protocol
Positive [2]  The study was registered in a trial registry and the protocol number is provided.
Negative [0]  This information is not available in the manuscript. Registration in an Ethics Committee is valid as trial registry.
Poor [1]  The authors describe that the study was registered but do not provide the registration number and/or the number provided does not link to the study.

The Cochrane Collaboration risk of bias tool contains six domains: sequence generation, allocation concealment, blinding of the outcome assessors, incomplete outcome data, selective outcome reporting, and other possible sources of bias. Each domain of the Cochrane risk of bias tool was evaluated as being at low, high or unclear risk of bias. After assessment of the domains, each study was judged to be at low risk of bias if all domains were at low risk. The study was judged to be at high risk of bias if at least one of the key domains was evaluated as being at high risk of bias. Finally, the study was considered to be at unclear risk if at least one domain was judged to be at unclear risk of bias.

Characteristics of the included studies
From the 1925 articles that were originally screened, after removal of duplicates, 1756 were excluded for not complying with the inclusion criteria. The full texts of 234 papers were assessed and 49 papers were excluded for the following reasons: 1) 15 studies were not randomized clinical trials; 2) 7 studies were case reports; 3) 3 studies were duplicates; 4) 2 studies were abstracts; 5) 1 study was published in Korean language; 6) 4 studies were in vitro; 7) 2 studies were case series; 8) 1 study was a literature review; 9) 1 study was an ex vivo study; 10) 1 study is currently in the recruitment phase and evaluation of tooth color (results not yet available); 11) 1 study evaluated the color change of the composite resin after bleaching; 12) 11 studies were not accessible. After these exclusions, 185 RCTs remained for assessment (Figure 1). The included RCTs investigated several topics, such as the comparison of 1) at-home dental bleaching techniques; 2) in-office dental bleaching techniques; 3) patient-related factors; 4) in-office vs. at-home and 5) combined bleaching techniques.
Table 4 displays the 185 RCTs tabulated by their collected characteristics. The journals contributing the most RCTs were Oper Dent (17.8%), followed by Comp Cont Educ Dent (11.4%), Am J Dent (7.6%) and Quintessence Int (7.0%). Approximately 29.2% of the publications were published in 37 different journals. The countries with the most publications were USA (40.5%) and Brazil (28.1%), representing together about 70% of all publications in the field. The most frequent follow-up period (days) reported in the articles occurred between 14 (22.7%) and 28 (10.3%) days.
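The full-text screening counts reported under "Characteristics of the included studies" can be cross-checked with a short tally. The Python sketch below is only a bookkeeping aid: the dictionary keys are shorthand labels chosen here for illustration, while the counts are those given in the text.

```python
# Bookkeeping sketch for the full-text screening step (labels are shorthand, counts are from the text).
full_text_assessed = 234

exclusion_reasons = {
    "not randomized clinical trials": 15,
    "case reports": 7,
    "duplicates": 3,
    "abstracts": 2,
    "published in Korean": 1,
    "in vitro studies": 4,
    "case series": 2,
    "literature review": 1,
    "ex vivo study": 1,
    "still recruiting, no results": 1,
    "colour change of composite resin": 1,
    "not accessible": 11,
}

excluded = sum(exclusion_reasons.values())
included = full_text_assessed - excluded

print(f"excluded after full-text reading: {excluded}")
print(f"RCTs remaining for assessment: {included}")
```

The twelve listed reasons add up to the 49 excluded papers, which leaves the 185 RCTs assessed in this review.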

Study compliance with each of the CONSORT instrument tool items
Figure 2 displays the percentage of studies per score for each item of the CONSORT Statement. For the items intervention and outcomes, more than 80% of the studies were scored as 2 (adequate reporting). For the items eligibility, hypothesis testing, losses/exclusions and numbers analyzed, more than 50% of the studies were scored as 2.
More than 50% of the studies received score 1 (poor reporting) or score 0 (no reporting) for all other items. This was more critical for the items protocol, flow chart, allocation concealment and sample size, where more than 80% of the studies were scored as 0 (no reporting).
In order to help future randomized clinical trials of bleaching, some examples of adequate description of each CONSORT item of the material and methods and results sections were added in Tables 5 to 9.

Average CONSORT score per study characteristics
The overall CONSORT score for the included studies in this review was 16.7 ± 5.4 points, which represents 52.2% of the maximum CONSORT score of 32 points. We observed a significant influence of journal, country, and period of time on the average CONSORT score (Table 10). Significant differences among journals were observed (p < 0.0001; Table 10), with the average CONSORT scores of J Dent (highest score), Oper Dent, Clin Oral Investig and JADA being higher than those of the remaining journals. 'Other journals' comprise 37 different journals, which published 54 different papers (29.2% of total). A significant but weak correlation between average CONSORT score and journal impact factor was observed (r = 0.16; p < 0.0001, Figure 3).
Regarding country, a significant difference was also observed (p = 0.02; Table 10). Brazil showed the highest average CONSORT score, which was statistically higher than those of UK, Italy and USA. Along the same lines, the period of time in years had a significant influence on the average CONSORT score (p = 0.004; Table 10). We observed an increase in the average CONSORT score in the 2011-2016 interval (19.0 ± 6.8) in comparison with the 1996-2000 period (13.4 ± 4.0). The individual CONSORT score for each one of the included studies can be seen in Table 11.

Risk of bias of the included studies
Except for the selective outcome reporting and incomplete outcome data, most of the studies were judged to be at "unclear" or "high" risk of bias in the Cochrane Collaboration tool domains (Figure 4). Table 11 reports the individual risk of bias in each domain for all included studies. This table facilitates the analysis of the risk of bias within each study. Only 14 included studies (7.6%) were judged to be at "low" risk of bias in all domains; 115 studies were classified as at "unclear" risk of bias in at least one domain, resulting in 62.2% of the studies being classified at "unclear" risk of bias at the study level. The remaining 56 studies were classified as at "high" risk of bias in at least one domain, representing 30.3% of studies judged as at "high" risk of bias.
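The study-level judgement used in this review (low if all domains are low, high if at least one key domain is high, unclear otherwise) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the function names are hypothetical, and because the text does not list which domains were treated as "key", the sketch assumes by default that all six domains are key.

```python
# Sketch of the study-level risk-of-bias rule (hypothetical helper names).
DOMAINS = (
    "sequence generation",
    "allocation concealment",
    "blinding of outcome assessors",
    "incomplete outcome data",
    "selective outcome reporting",
    "other sources of bias",
)
VALID = {"low", "unclear", "high"}


def study_level_risk(judgements, key_domains=DOMAINS):
    """Combine per-domain judgements into one study-level judgement.

    Assumption: every domain is a key domain unless `key_domains` says otherwise.
    """
    for domain in DOMAINS:
        if judgements.get(domain) not in VALID:
            raise ValueError(f"missing or invalid judgement for {domain!r}")
    if all(judgements[d] == "low" for d in DOMAINS):
        return "low"
    if any(judgements[d] == "high" for d in key_domains):
        return "high"
    # With the default key domains, reaching this point means at least one
    # domain is "unclear" and none is "high".
    return "unclear"


def percent(count, total):
    """Share of studies, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Counts reported in this review: 14 low, 115 unclear, 56 high out of 185.
print(percent(14, 185), percent(115, 185), percent(56, 185))
```

With the counts reported above, the three shares are 7.6%, 62.2% and 30.3% of the 185 included studies.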

Discussion
Study compliance with the CONSORT
Although the CONSORT Statement has been misleadingly used as an instrument to evaluate the quality of the RCTs available in the literature, 24 the aim of the CONSORT Statement is to guide authors to describe details of their studies so as to enable the evaluation of the risk of bias of RCTs. 25 This is why adherence to the CONSORT Statement is of utmost importance, so that readers can appraise the available literature and translate this literature into clinical knowledge pertinent to evidence-based practice.
In the present study, we assessed the adherence of RCTs of bleaching materials and techniques to the CONSORT Statement. 26,27 In order to provide a better analysis of the compliance of the studies with each item of the CONSORT score, a 0-2 scale was developed in which 0 means no reporting, 1 poor reporting, and 2 adequate reporting. 22 This is different from what had been done in other papers, which have reported the adherence of RCTs in other dental areas, such as orthodontics, prosthodontics, oral implants, periodontics and pediatric dentistry. 28,29,30,31,32,33 These earlier studies were more focused on the journal's compliance than on the compliance of articles on a specific subject. Subsequently, few of these earlier studies performed a comprehensive search of the articles published in a specific research area, as we have tried to do in the present study. To the best of the authors' knowledge, this is the first study that has attempted to evaluate the adherence of RCTs of bleaching materials and techniques to the CONSORT Statement, which was one of the aims of the present study.
To evaluate the risk of bias of the RCTs it is imperative that we concentrate on the design and the results of any study report. CONSORT adherence in the introduction or discussion sections increases the quality of the article reporting but does not affect the risk of bias of the studies. This is the reason behind our decision to only evaluate each study's compliance with the items related to methodology and results.
Earlier studies with the same aim, conducted on different specialties of dentistry, evaluated additional items, including the subjective items of the introduction and discussion sections. 28,29,30,31,32,33 In the present study we observed that the overall CONSORT score for the included studies was 16.7 ± 5.4 points, which represents only 52.2% of the maximum CONSORT score a study could have reached. This reduced compliance with the CONSORT Statement was also observed in an earlier study from our research group evaluating the compliance of RCTs in non-carious cervical lesions with the CONSORT. 22 Other dental specialties, such as periodontics and pediatric dentistry, yielded similar results. For instance, a CONSORT compliance of approximately 60% was observed for RCTs in prosthodontics and implant dentistry. In orthodontics, this compliance ranged from 40 to 70%. 28,29,30,34,35 Although these variations are small, they may reflect the inclusion criteria of the RCTs, the method of compliance evaluation, the number of CONSORT items evaluated, and also the period of publication. Our previous study of RCTs in non-carious cervical lesions demonstrated that the adherence of the study increases when the study is more recent. 22

Example 2: "This was a randomized, parallel, placebo-controlled, triple-masked clinical trial, in which the patient, operator, and evaluator were masked to the group assignment. A third researcher, not involved in the evaluation process, was responsible for the randomization process, and delivery and guidance on the administration of the drugs." 57

Eligibility criteria
The authors judged that it was not necessary to add some examples, because this item showed an adequate reporting as seen in Figure 2.

Settings and locations
Example

Interventions
The authors judged that it was not necessary to add some examples, because this item showed an adequate reporting as seen in Figure 2.

Outcomes
The authors judged that it was not necessary to add some examples, because this item showed an adequate reporting as seen in Figure 2.

Sample size
For tooth sensitivity
For superiority trial: "The primary outcome of this study was the absolute risk of TS. The absolute risk of TS (that is, the number of patients [percent] who reported pain at some point during dental bleaching) was reported to be approximately 87% (4,8) for the bleaching product Whiteness HP Maxx (FGM Dental Products). Thus, a minimum sample size of 56 participants was required to have a 90% chance of detecting, as significant at the 2-sided 5% level, a decrease in the primary outcome measure from 86% in the control group to 50% in the experimental group." 57
For equivalence trial: "We selected the absolute risk of TS as the primary study outcome. Considering the absolute risk of TS to be approximately 90% (19, 40) participants were required to be 90% (study power) sure that the limits of a two-sided 90% confidence interval will exclude a difference between the standard and experimental group of more than 30% (equivalence limit)." 59

For color evaluation
For superiority trial: "The primary outcome of this study was color change of the participants' teeth. A previous study (34) reported that two bleaching sessions with the product Whiteness HP Maxx 35% (FGM Dental Products, Joinville, SC, Brazil) without light activation produced a whitening effect of about 7 ± 2 SGUs. To detect a difference of 2 SGUs between the means of any pair of the study groups, with a power of 80% and an alpha of 5%, a minimum sample size of 17 patients per group was required. This threshold of perceptibility was based on the fact that 'untrained' people, such as the patients, do not detect easily changes of one shade guide unit at the lighter end of the vita classical guide." 58
For equivalence trial: "We based the sample size calculation on the color change measured with the spectrophotometer (ΔE), the primary outcome of the study. One hundred eighteen participants were required to exclude a difference of means of 2.0 units of ΔE at 1 week and 1 month (equivalence limit) with a power of 90% and α of 5%. With these calculations, we took into consideration a standard deviation of 3.3 in the ΔE. The equivalence limit we chose was lower than the ΔE threshold of 3.0, above which color differences become clinically perceptible (24-26)." 60

Sequence generation, allocation concealment and implementation
Example 1: "The randomization process was performed by coin toss immediately before the bleaching procedure to provide adequate allocation concealment." 61
Example 2: "Participants were randomly divided into four groups according to the combination of the main factors: HP (20% or 35%) and light activation (with or without). A third person who was not involved in the research protocol performed the randomization procedure by using computer-generated tables. We used blocked randomization (block sizes of 2 and 4) with an equal allocation ratio (www.sealedenvelope.com). Opaque and sealed envelopes containing the identification of the groups were prepared by a third party not involved in the study intervention." 58

Blinding
Example 1: "The participant and the operator could not be blinded to the procedure, as the application of bleaching gel for different times could not be masked. However, the examiners who evaluated the color changes were not aware of which group the participant was assigned to." 62
Example 2: "Neither the participant nor the operator knew the group allocation, both being blinded to the protocol." "The two examiners, blinded to the allocation assignment, scheduled these patients for bleaching and evaluated their teeth against the shade guide at baseline and 30 days after the procedure." 63

Hypothesis testing
The authors judged that it was not necessary to add some examples, because this item showed an adequate reporting as seen in Figure 2.

Estimated effect size
Two examples of how to report an effect size can be seen in Tables 6 and 7.

Flow diagram
Templates of the CONSORT flow diagram in MS Word are available at the following link: http://www.consort-statement.org/consort-statement/flow-diagram

Losses and exclusions
The authors judged that it was not necessary to add some examples, because this item showed an adequate reporting as seen in Figure 2.

Baseline data
Two examples of how to report baseline data can be seen in Tables 8 and 9.

Numbers analysed
The authors judged that it was not necessary to add some examples, because this item showed an adequate reporting as seen in Figure 2.

Registration and protocol
Example   The results of the present study confirmed that the journal endorsement of the CONSORT Statement might positively influence the completeness of reporting of RCTs, mainly because three out of four journals with high average CONSORT score (J Dent, Clin Oral Investig, and JADA) have adopted this policy within the last decade.The same tendency has been observed for medical journals 36 and for orthodontics journals, 28,37 but not for RCTs conducted in non-carious cervical lesions. 22Braz Oral Res is another journal that clearly endorses the CONSORT Statement.Although there is an increasing number of journals endorsing the CONSORT Statement in medical journals as well as dental journals, the CONSORT compliance is still considered suboptimal even in these journals. 38   Table 11.List of the scored papers along with their average CONSORT score and evaluation of the risk of bias in each domain.Theoretically, one should expect that journals with high impact factor would publish studies with better reporting standards.Indeed, a significant correlation between journal impact factor and journal average CONSORT score was observed in the present and in earlier investigations, 39,40 but this correlation is usually weak.In the present study the correlation coefficient (R 2 = 0.1602) was also very weak, which means that the great variation observed in the average CONSORT score is not explained by the journal impact factor.
We hypothesize that not all members of the editorial boards of these journals check the submitted articles for compliance with the CONSORT Statement, which prevents the journals from reaching an improved reporting score of RCTs. More attention to these items during the peer-review process may be required. Apart from that, the ambiguous language of what is meant by CONSORT endorsement 25,41,42 in journals may prevent better CONSORT adherence. In fact, instructions on how CONSORT should be used by authors are inconsistent across journals and publishers. For instance, J Dent recommends the use of CONSORT and submission of the checklist and flow diagram in the instructions for authors, while Clin Oral Investig does not recommend the use of reporting guidelines in the instructions. 38 Publishers and journals should encourage authors to use CONSORT and set clear instructions for authors regarding full compliance with CONSORT. Braz Oral Res, for example, clearly indicates that authors must fully comply with the CONSORT Statement.
In regard to the period of time, better compliance was observed in more recent studies (2011-2016; mean CONSORT score of 19.0 ± 6.8) than in earlier periods (1996-2000; mean CONSORT score of 13.4 ± 4.0). This finding had been reported by other authors 28,35 and in an earlier RCT study of adhesive materials applied onto non-carious cervical lesions. 22 However, this increase is still small and substandard, as it reached slightly more than 50% of the maximum CONSORT score (32 points). Had all trials described the evaluated items correctly, the score might have been closer to 32.
Regarding the country, there is no clear explanation of why papers published by Brazilian researchers reached a higher average CONSORT score than those by authors from more developed countries, such as the USA, UK and Italy. We believe that the policies and efforts of Brazilian government agencies in supporting the training of specialized researchers in Science and Technology, implemented by Periódicos Capes Theses databases (www.capes.gov.br [Coordination of Personal Formation for Higher Education]) in the last 40 years, have led to an increasing number and quality of Brazilian articles in all science fields. Based on data from the SCImago database (www.scimagojr.com), the number of published papers in Dentistry is higher than those in other areas. 43
As reported in the results section, the item sample size was reported poorly. This is also problematic in the medical field. For instance, Chan and Altman 44 reported that 73% of the 519 medical trials indexed in PubMed in December 2000 did not report a sample size calculation. Although sample size does not affect the validity of the study and its risk of bias, if it is not calculated properly and based on a clinically important effect, it may result in underpowered studies, whose findings are usually misinterpreted as showing that the groups are statistically similar. However, the lack of evidence to reject the null hypothesis does not mean that the groups are similar to one another. It may also mean that the study did not have a sample size big enough to detect a smaller difference if it really existed.
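To illustrate why the sample size has to be tied to a clinically important effect, the following is a minimal sketch of a sample size calculation for comparing two proportions (for example, the risk of tooth sensitivity in two groups). It assumes the usual normal-approximation formula, a two-sided test and 1:1 allocation; the function name, the default alpha and power, and the risks used in the usage note are illustrative assumptions, not values taken from the reviewed trials.

```python
from math import ceil
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Participants needed PER GROUP to detect a difference between two
    proportions p1 and p2 (two-sided test, 1:1 allocation).

    Uses the standard normal-approximation formula; this is an
    illustrative sketch, not the method of any specific trial.
    """
    if p1 == p2:
        raise ValueError("p1 and p2 must differ to define an effect size")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, two-sided
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Hypothetical risks of tooth sensitivity, chosen only for illustration.
n_large_effect = sample_size_two_proportions(0.60, 0.30)
n_small_effect = sample_size_two_proportions(0.60, 0.50)
print(n_large_effect, n_small_effect)
```

Under these assumptions, detecting the smaller difference requires many more participants per group than detecting the larger one, which is why a trial sized for a large effect cannot be read as evidence that the groups are similar.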
Based on the same premise, with an infinite sample size any small and clinically irrelevant difference can be shown to be statistically significant, which may lead readers to mistakenly replace the standard protocol or technique with others that may be more costly or have more side effects. 45 This is why authors of RCTs should describe the effect size rather than only the results of hypothesis testing. Effect sizes and confidence intervals make the interpretation of the results easier. If a protocol has a fictitious relative risk for tooth sensitivity of 0.75 (95% CI 0.5 to 0.8), this means that the experimental group has a 25% lower chance (from 50% to 20% lower) of developing tooth sensitivity. This response carries much more information than only stating that two groups were statistically different based on a probability value of 0.1%, for instance. Unfortunately, in the present study 88.1% of the studies did not report well, or did not report at all, the effect sizes, which is also a problem in medical journals. 46 Based on these ideas, researchers are advised to move away from significance tests and to display, instead, an estimate of effect size delimited by confidence intervals. This method incorporates all the information normally included in a hypothesis test, but in a way that emphasizes what is really important (clinical significance rather than statistical significance). 46,47,48
Another concern in the included bleaching studies is related to randomization. Ideally, such a description should include details about both the method used to generate the random sequence and the method used to conceal this random sequence. Inadequately and unclearly concealed trials have been shown to result in exaggerated effect sizes in favor of the experimental group. 49 This problem also occurs in other areas: poor reporting of allocation concealment was observed in 78% of the RCTs among dental journals 50 and 93% in the specialty of periodontology. 31 In the present study problems in random sequence generation and allocation concealment (scores 0 and 1) were seen in 53.5% and 84.8% of the trials, respectively.
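The recommendation to report an effect size with its confidence interval, such as the relative risk for tooth sensitivity discussed earlier, can be sketched as follows. The code assumes the common log-transformation (Katz) method for the confidence interval of a risk ratio; the function name and the event counts in the usage note are hypothetical and serve only as an illustration.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def risk_ratio_ci(events_exp, n_exp, events_ctrl, n_ctrl, level=0.95):
    """Risk ratio (experimental vs. control) with a confidence interval.

    Assumes the log-transformation (Katz) method, which requires at
    least one event in each group. Returns (rr, lower, upper).
    """
    risk_exp = events_exp / n_exp
    risk_ctrl = events_ctrl / n_ctrl
    rr = risk_exp / risk_ctrl
    # Standard error of log(RR)
    se_log_rr = sqrt(1 / events_exp - 1 / n_exp + 1 / events_ctrl - 1 / n_ctrl)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower = exp(log(rr) - z * se_log_rr)
    upper = exp(log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts: 30 of 50 participants with sensitivity in the
# experimental group versus 40 of 50 in the control group.
rr, lower, upper = risk_ratio_ci(30, 50, 40, 50)
print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Reporting the three numbers together lets readers judge both the size of the effect and its precision, which a p-value alone does not convey.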
These two items (random sequence and allocation concealment) allow readers to evaluate if the study is free of selection bias. A well-done random sequence generation is worthless if not well concealed. The objective of the randomization process is to balance the participants in terms of known and unknown factors so that no other variable apart from the one under investigation can account for the differences observed among participants from distinct groups.
Usually, authors use terms such as "random allocation" or "the groups were randomized", without further elaboration. Authors should specify the method of sequence generation (such as a random-number table, a computerized random number generator, coin toss, dice throwing, etc.) as well as restrictions to the process such as stratification, block randomization, etc. 45 Blinding is also a key element in RCT reporting and should not be confused with allocation concealment, as blinding prevents performance and detection bias 45 instead of selection bias. In some research questions of bleaching studies, operator and patient blinding may not be possible, for instance when light-activated systems are being tested. However, evaluator blinding is usually possible and could be implemented in the study design, mainly if the primary outcome, color change, is being checked against a shade guide unit. In such a case, lack of evaluator blinding would put the study at a higher risk of bias. However, for objective outcomes, such as color measurements with a spectrophotometer, the lack of examiner blinding is not that important. When the primary outcome is tooth sensitivity, which is a patient-centered subjective outcome, it is the lack of participants' blinding, and not evaluators', that downgrades the level of confidence in the research findings.
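As an illustration of a sequence generation method that could be described explicitly in a report, the following is a minimal sketch of block randomization with a computerized random number generator. The function name, group labels, block size and seed are assumptions made for this example; it covers sequence generation only and does not address allocation concealment, which needs a separate mechanism.

```python
import random


def block_randomization(n_participants, groups=("A", "B"), block_size=4, seed=None):
    """Generate an allocation sequence using permuted blocks.

    Each block contains every group equally often, so group sizes stay
    balanced throughout recruitment. Illustrative sketch only.
    """
    if block_size % len(groups) != 0:
        raise ValueError("block_size must be a multiple of the number of groups")
    rng = random.Random(seed)  # a recorded seed makes the sequence reproducible
    sequence = []
    while len(sequence) < n_participants:
        block = list(groups) * (block_size // len(groups))
        rng.shuffle(block)  # random order within the block
        sequence.extend(block)
    return sequence[:n_participants]


# Hypothetical trial with 20 participants and two bleaching protocols.
allocation = block_randomization(20, groups=("A", "B"), block_size=4, seed=2024)
print(allocation)
```

A report based on such a procedure could then state the generator, the block size and who prepared the sequence, rather than only saying that "the groups were randomized".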
Failure to describe who is blinded in the study is the most common problem observed in the eligible studies. Reports like "this study was single-blind" or "this was a double-blind study" are useless, as they do not inform readers of who was in fact blinded. In agreement with these ideas, Pandis et al. 50 reported that inadequate description of blinding in RCTs published in leading dental journals ranged from 74 to 100%. In implant dentistry, the lack of adequate blinding reporting was reported to be 58%. 51
The design and conduct of some RCTs may not be straightforward, particularly when there are losses to follow-up, or exclusions. This precludes the description of the numbers of participants through each phase of the study in a few sentences. 52 This can be simply described by introducing a flow chart with the number of participants in each phase of the trial. Although the CONSORT Statement recommends the inclusion of a flow chart, we observed that only 48.1% of the clinical trials followed this recommendation.
Another type of bias commonly found in RCTs is selective outcome reporting. In general, there is more enthusiasm about the publication of RCTs that show either a large effect of a new treatment (positive trials) or equivalence of two approaches. Consequently, articles with negative findings are less often submitted or accepted for publication by journals. This may be even more relevant in sponsored RCTs if the results of the trial place financial interests at risk. 53 To manage such problems, the International Committee of Medical Journal Editors (ICMJE) has proposed comprehensive trial registration. Trials must register at or before the onset of patient enrollment. 53 For the ICMJE, this policy applies to any clinical trial that started enrollment after July 1, 2005. However, only 12 out of 120 included studies of this review published in 2005 or later performed trial registration (Table 5). Such early registration prevents selective reporting and reduces publication bias, two important issues that may downgrade the level of evidence of a randomized clinical trial. 54 Some dental journals, such as J Dent, Oper Dent, and Braz Oral Res, have made registration mandatory in their instructions for authors.
In regard to numbers analyzed, the number of participants per group in all analyses should be clear. Reporting summary statistics without their measure of spread, or only percentages, relative risks or odds ratios, is not enough, as it does not allow assessment of whether or not some of the randomly assigned participants were excluded from the analysis. The same applies to losses and exclusions. Along with the description of these figures per group, reasons for the losses and exclusions should be given as they may be related to the intervention. For instance, when a patient quits the treatment because another disease is requiring his/her attention, this is unlikely to be related to the intervention; but if a patient does not attend the recalls because he/she wants to be withdrawn from the trial, the reason may be related to side effects or lack of efficacy of the treatments under evaluation.
Baseline information was adequately reported in only 34% of the papers, although it is important to check comparability at baseline. Any differences in baseline characteristics are, however, the result of chance rather than bias, which is why there is no need to perform hypothesis testing for these characteristics. 55 For any item, authors should be careful when reporting data. They should not display percentages instead of raw figures, as this is risky. Rounded percentages may be compatible with more than one numerator and, if the authors fail to provide the total number of participants, the number of participants with the event under evaluation will be unclear. For instance, 10% may represent 1 out of 10 but also 100 out of 1000, which has a profound influence on the precision of the data. Merging data of groups can be done as long as their individual data are also reported. Finally, summary statistics for continuous variables should be presented with their measure of spread; for dichotomous variables authors should describe the number of counts vs. the total number of observations. 22
The trial design involves the description of the type of trial (parallel, cross-over, factorial, split-mouth and/or multiple restorations); the conceptual framework (superiority, non-inferiority or equivalence trial) and also the allocation ratio (for example, 1:1 or 1:2). 20 The settings (where and when the study was performed) are also essential to place the study in historical context and to evaluate its external validity (generalization of the findings to other populations).
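The remark above that the same percentage can hide very different precision can be made concrete with a short sketch. It assumes the Wilson score interval for a proportion, which is one of several accepted methods; the function name and the counts (1 out of 10 and 100 out of 1000, both equal to 10%) are used only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(events, total, level=0.95):
    """Wilson score confidence interval for a proportion (illustrative sketch)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = events / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half_width = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half_width, centre + half_width


# Both counts correspond to 10%, but the precision differs markedly.
for events, total in [(1, 10), (100, 1000)]:
    low, high = wilson_interval(events, total)
    print(f"{events}/{total}: 95% CI {low:.3f} to {high:.3f}")
```

The interval for the small sample is far wider than the one for the large sample, which is why raw counts and denominators should accompany any percentage.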

Risk of bias
Although incomplete outcome data and selective reporting were poorly described, this occurred in a small percentage of the studies. In all other domains of the Cochrane Collaboration risk of bias tool, most of the RCTs were judged to be at "unclear" or at "high" risk of bias. The implications of inadequate sequence generation, allocation concealment and examiner blinding were already discussed in detail.
At the study level, only 7.57% of the studies were considered to be at low risk of bias, which means being at low risk of bias in all domains. The remaining studies were at unclear or high risk of bias. This is worrying since our treatment decisions are being based on studies that do not have a rigorous methodology and may therefore lead to biased results.
Although CONSORT guidelines have been included in the instructions for authors of some journals, active compliance is far from being achieved. Perhaps the inclusion of additional subheadings, as suggested by Kloukos et al., 29 might result in better compliance with the CONSORT Statement. The results of the present study indicate that adherence of RCTs of bleaching systems to the CONSORT Statement requires improvement. Adherence to the CONSORT Statement will also make readers rethink their methodology and ultimately reduce the high risk of bias of studies in the field.
There are some limitations in the present study. Although a very comprehensive search in terms of different databases with specific vocabulary and keywords was performed, we may have missed some articles in the search.
Nevertheless, looking at Table 4, most of the papers were produced in the USA and Brazil and the majority of them were published in English-language journals. Only a few papers were published in Portuguese or Spanish (10 in total). Also, as mentioned in the results section, only one paper was excluded due to language. These details make us confident in the results herein presented. Although other studies in the field may not be cited here, they are unlikely to change the results herein presented.

Figure 2. Percentage of studies per CONSORT score for each CONSORT item analyzed.
Example 1: "The study took place in the clinics of the dentistry schools at the State University of Ponta Grossa, Paraná, and the University of São Paulo, São Paulo, from June 2010 to June 2012." 58
Example 2: "This study was performed from February 2011 to March 2012 in the city of Guarapuava (Paraná, Brazil)." 12
64 SD: standard deviation; L*: luminosity; b*: color along the yellow-blue axis; a*: color along the red-green axis.

Figure 3. Linear regression between Impact Factor and CONSORT score.

Table 2. Inclusion and exclusion criteria.

Table 3. Instrument tool developed from the 2010 CONSORT Statement to evaluate the compliance of the studies to the CONSORT Statement.

Table 4. Characteristics of the included studies by categories.

Table 5. Examples of adequate description of the evaluated parameters of the Instrument tool developed from the 2010 CONSORT Statement for bleaching studies.
"This study was a randomized, single-blind, controlled trial with a parallel group and an allocation rate of 1:1." 56

Table 6. Baseline characteristics of the participants.
Adapted from DeGeus et al.

Table 7. Demographic features of the participants of each study group.

Table 8. Means (standard deviations) of the change in shade guide units obtained with the VITA Classical and VITA Bleachedguide* and the color change measured by spectrophotometer at baseline versus 1-month postbleaching.

Table 9. Absolute risk of tooth sensitivity, along with the risk ratio, for both groups at the different assessment points.

Table 10. Average CONSORT score per journal, country and period of time.