Dental flossing as a diagnostic method for proximal gingivitis: a validation study

This study evaluated the clinical diagnosis of proximal gingivitis by comparing two methods: dental flossing and the gingival bleeding index (GBI). One hundred subjects (aged at least 18 years, with 15% of positive proximal sites for GBI, without proximal attachment loss) were randomized into five evaluation protocols. Each protocol consisted of two assessments with a 10-minute interval between them: first GBI/second floss, first floss/second GBI, first GBI/second GBI, first tooth floss/second floss, and first gum floss/second floss. The dental floss was slid against the tooth surface (TF) and the gingival tissue (GF). The evaluated proximal sites had to present teeth with an established point of contact and a probing depth ≤ 3 mm. One trained and calibrated examiner performed all the assessments. The mean percentages of agreement and disagreement were calculated for the sites with gingival bleeding in both evaluation methods (GBI and flossing). The primary outcome was the percentage of disagreement between the assessments in the different protocols. The data were analyzed by one-way ANOVA, McNemar, chi-square and Tukey's post hoc tests, with a 5% significance level. When gingivitis was absent in the first assessment (negative GBI), bleeding was detected in the second assessment by TF and GF in 41.7% (p < 0.001) and 50.7% (p < 0.001) of the sites, respectively. In the absence of gingivitis in the second assessment (negative GBI), TF and GF detected bleeding in the first assessment in 38.9% (p = 0.004) and 58.3% (p < 0.001) of the sites, respectively. TF and GF appear to be better diagnostic indicators of proximal gingivitis than GBI.


Introduction
Plaque-induced gingivitis is an inflammation caused by the accumulation of microorganisms around the gingival margins. 1 As gingivitis precedes periodontitis, the diagnosis of marginal inflammation allows good quality monitoring of plaque control at home. 2 Both prevention and treatment of gingivitis are important as it affects almost 100% of the general population. 3,4 There is a large abundance of supragingival biofilms in the general population, indicating poor oral hygiene, particularly in proximal areas. 5,6 Bleeding upon probing of the marginal sulcus 7 is an indication of gingivitis and is universally applicable in clinical and epidemiological studies owing to the ease and speed of the technique. 8,9 However, variations in the depth of insertion and angulation of the probe can influence results. Probing can stimulate bleeding in deeper regions of the pocket and cause injury and, therefore, potentially compromise the diagnostic value of the technique. 10,11 Evidence suggests that gingival inflammation at proximal sites initiates in the central area of the papilla. 12,13,14 As this area is not fully stimulated by probing, the presence of gingivitis in proximal regions with an established point of contact might be underestimated. Flossing was first proposed as a diagnostic tool in gingivitis by Carter and Barnes. 9 In comparison with periodontal probing, flossing stimulates a larger area of the papilla and it is therefore hypothesized that it can be more accurate for the diagnosis of proximal gingivitis.
Previous studies 15,16 have compared the use of dental floss with the gingival bleeding index (GBI) 7 and found that dental floss is a suitable tool for the diagnosis of proximal gingivitis in children and that it is particularly useful as an indicator of gingival health. However, there is no evidence comparing flossing with GBI in adults with no history of periodontitis.
Therefore, the aim of this study was to compare the diagnostic ability of flossing in the detection of proximal gingivitis with the GBI in adults with gingivitis but with no history of periodontitis. It was hypothesized that flossing detects more bleeding sites than the GBI.
Declaration of Interests: The authors certify that they have no commercial or associative interest that represents a conflict of interest in connection with the manuscript.

Methodology

Study design and eligibility criteria
This validation study evaluated 100 subjects recruited from the School of Dentistry of the Universidade Federal de Santa Maria (UFSM). Subjects were clinically screened between July 2013 and April 2014. Volunteers were asked to complete an interview about their health and habits to verify study eligibility.
Eligible subjects were ≥18 years old, with at least 20 present teeth and 15% of positive proximal sites for GBI, 16 in addition to having 12 papillae (teeth without giroversion and/or diastema). Subjects were not included if they presented proximal attachment loss, had a probing depth >3 mm at proximal sites, had diabetes mellitus, or used cyclosporine, phenytoin, or nifedipine. Smokers or former smokers, pregnant or lactating women, wearers of fixed orthodontic devices or fixed retainers, and those who had used antibiotics/anti-inflammatory drugs in the 3 months prior to screening were also not included.

Sample size
The difference in the mean percentage of disagreement for sites with gingival bleeding between dental floss use (33.7 ± 18.3) and the gingival bleeding index (GBI) (18.2 ± 10.7) was considered for estimation of the minimum sample size. 16 The parameters used were: 10% difference in the mean percentages for sites with gingival bleeding between the assessments, standard deviation of 15%, an 80% statistical power, α = 0.05, and a paired design. Twenty subjects were necessary for each evaluation protocol. The final sample included a total of 100 subjects.
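The article does not state which formula produced the figure of 20 subjects per protocol. As a rough check only, the sketch below applies the common normal-approximation formula for a paired comparison of means, assuming (this is an assumption, not a detail from the study) that the 15% standard deviation refers to the within-subject differences. The function name `paired_sample_size` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def paired_sample_size(delta, sd, alpha=0.05, power=0.80):
    """Normal-approximation sample size for a paired comparison of means.

    delta: difference to detect (percentage points)
    sd:    assumed standard deviation of the paired differences
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    return ceil(((z_alpha + z_beta) * sd / delta) ** 2)


# Parameters reported in the text: 10% difference, SD 15%, 80% power, alpha 0.05.
n = paired_sample_size(delta=10.0, sd=15.0)
print(n)
```

Under these assumptions the approximation gives a minimum slightly below the 20 subjects per protocol used in the study; a t-based calculation or an allowance for dropouts would move the figure upward.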

Evaluation methods
The assessments were conducted at the outpatient dental clinic of UFSM. Prior to the assessments, the teeth were air-dried and isolated with cotton rolls. Proximal spaces between the central incisors and between the second and third molars were not evaluated.
The following assessments were performed:
a. Gingival bleeding index: exam method was adapted from Ainamo and Bay. 7 A periodontal probe (Williams, Neumar, São Paulo, Brazil) was positioned at the transition angle between the free and the proximal surfaces parallel to the long axis of the tooth. 8 The probe was inserted into the gingival sulcus at around 2 mm 17 and advanced along the sulcus. The probe was extended as close as possible to the central region of the papilla. This procedure was performed once. The presence of marginal bleeding was assessed within 10 seconds. Bleeding in the buccal and/or lingual aspects of the papilla indicated gingivitis.
b. Bleeding index with flossing: exam method was adapted from Carter and Barnes. 9 The insertion of the waxed floss (Sanifill, São Paulo, Brazil) was performed with two gentle rubs. One examiner (A.P.G.) performed this procedure. The dental floss was slid against the tooth surface (Tooth Flossing, TF) and against the gingival tissue (Gingival Flossing, GF). The presence of bleeding within 10 seconds originating from the buccal or lingual/palatal niches was recorded. Bleeding in the buccal and/or lingual aspects of the papilla indicated gingivitis.
The choice of TF was based on the methodology of Carter and Barnes. 9 We also considered GF in order to provide the closest possible contact with the sulcular epithelium for its stimulation. GF was therefore added as a stimulator, similar to the GBI method, 7 in which the periodontal probe is rubbed against the gingival tissue in order to establish close contact with the sulcular epithelium during the assessment.

Evaluation protocols
Five evaluation protocols were used. Each protocol consisted of two assessments with a 10-minute interval between them.
a. GBI-Flossing: Assessment of GBI followed by dental flossing. The GBI was recorded at all proximal sites. TF was assessed in two randomly selected contralateral quadrants. GF was performed in the other two quadrants.
b. Flossing-GBI: Flossing assessment followed by GBI. TF was performed in two randomly selected contralateral quadrants, while GF was performed in the other two quadrants. After that, the GBI was evaluated.
c. GBI-GBI: Assessment of GBI followed by repetition of GBI assessment.
d. TF-Flossing: TF was performed followed by TF and GF. TF was recorded in two randomly selected contralateral quadrants, while GF was performed in the other two quadrants.
e. GF-Flossing: GF was performed followed by TF and GF. TF was recorded in two randomly selected contralateral quadrants, while GF was performed in the other two quadrants.

Examiner's training and calibration
One trained and calibrated examiner (A.P.G.) performed all the assessments. Intra-examiner calibration was performed in 10 individuals not included in the study (188, 114, and 108 sites evaluated for GBI, TF, and GF, respectively) after training by a gold standard examiner. The calibration was performed using the same protocols as those chosen for the assessments. Assessments were performed in duplicate, 10 minutes apart, for the GBI and dental flossing. Reproducibility was assessed using the Kappa (K) coefficient. K coefficients were 0.78, 0.89, and 0.83 for GBI, TF, and GF, respectively.
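For readers unfamiliar with the Kappa coefficient, the following minimal sketch shows how Cohen's kappa can be computed from duplicate bleeding/no-bleeding readings of the same sites. The function name `cohen_kappa` and the counts in the example are hypothetical illustrations; they are not the calibration data of this study.

```python
def cohen_kappa(both_pos, first_only, second_only, both_neg):
    """Cohen's kappa for two binary readings of the same sites.

    both_pos:    sites bleeding in both readings
    first_only:  sites bleeding only in the first reading
    second_only: sites bleeding only in the second reading
    both_neg:    sites not bleeding in either reading
    """
    n = both_pos + first_only + second_only + both_neg
    observed = (both_pos + both_neg) / n        # observed agreement
    p_first = (both_pos + first_only) / n       # bleeding rate, first reading
    p_second = (both_pos + second_only) / n     # bleeding rate, second reading
    # Agreement expected by chance alone
    expected = p_first * p_second + (1 - p_first) * (1 - p_second)
    if expected == 1:
        return 1.0  # both readings constant and identical
    return (observed - expected) / (1 - expected)


# Hypothetical example: 100 sites read twice.
kappa = cohen_kappa(both_pos=40, first_only=5, second_only=5, both_neg=50)
print(round(kappa, 2))
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is preferred over a simple percentage when the bleeding prevalence is far from 50%.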

Randomization
The subjects were randomized into the evaluation protocols (n = 20). Block randomization, with the aid of a brown paper (four blocks of 25 patients, with five possible groups), was performed by another individual, i.e., not the examiner.
The quadrants and the way that the floss should be used were chosen by a coin toss when TF or GF was used in the same assessment but in different quadrants. One coin was used for when the floss was used against the teeth or gums and another coin was used for when flossing was applied to quadrants 1 and 3 or 2 and 4. The examiner conducted the randomization of the quadrants.
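The allocation scheme described above (four blocks of 25 subjects, five protocols, two coin tosses for the quadrants) can be illustrated with a short simulation. This is only a sketch: the study used paper-based randomization and physical coins, and the function names `block_randomize` and `toss_quadrants`, as well as the seed, are hypothetical.

```python
import random
from collections import Counter

PROTOCOLS = ["GBI-Flossing", "Flossing-GBI", "GBI-GBI", "TF-Flossing", "GF-Flossing"]


def block_randomize(n_blocks=4, block_size=25, seed=None):
    """Return an allocation list with equal protocol counts inside each block."""
    rng = random.Random(seed)
    per_protocol = block_size // len(PROTOCOLS)  # 5 subjects per protocol per block
    allocation = []
    for _ in range(n_blocks):
        block = PROTOCOLS * per_protocol
        rng.shuffle(block)                       # random order within the block
        allocation.extend(block)
    return allocation


def toss_quadrants(rng):
    """Simulate the two coin tosses: one for the quadrant pair, one for the technique."""
    pair = rng.choice([(1, 3), (2, 4)])
    other = (2, 4) if pair == (1, 3) else (1, 3)
    technique = rng.choice(["TF", "GF"])
    other_technique = "GF" if technique == "TF" else "TF"
    return {technique: pair, other_technique: other}


allocation = block_randomize(seed=2013)
counts = Counter(allocation)
quadrants = toss_quadrants(random.Random(2013))
print(counts)
print(quadrants)
```

Because each block contains every protocol exactly five times, the four blocks always yield 20 subjects per protocol, whatever the shuffling order.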

Outcomes
The primary outcome was the difference in the mean percentage of disagreement for sites with gingival bleeding between the evaluation methods (GBI and flossing). Secondary outcomes included sensitivity (S), specificity (SP), positive predictive values (PPV), negative predictive values (NPV), and accuracy (A).
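The secondary outcomes follow the standard definitions derived from a 2×2 table of the index test (flossing) against the reference (GBI). The sketch below spells these definitions out; the function name `diagnostic_metrics` and the site counts are hypothetical and are not taken from Table 2.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard diagnostic measures from a 2x2 table.

    tp: bleeding with flossing and with GBI (true positives)
    fp: bleeding with flossing only (false positives)
    fn: bleeding with GBI only (false negatives)
    tn: no bleeding with either method (true negatives)
    """
    total = tp + fp + fn + tn
    return {
        "S": tp / (tp + fn),      # sensitivity
        "SP": tn / (tn + fp),     # specificity
        "PPV": tp / (tp + fp),    # positive predictive value
        "NPV": tn / (tn + fn),    # negative predictive value
        "A": (tp + tn) / total,   # accuracy
    }


# Hypothetical counts for 200 proximal sites.
metrics = diagnostic_metrics(tp=60, fp=40, fn=10, tn=90)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

In this illustrative table, many flossing-positive sites are GBI-negative, which lowers SP and PPV while leaving S and NPV high.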

Ethical considerations
Eligible subjects were required to sign a consent form. This study was conducted in accordance with the Declaration of Helsinki and was approved by the Ethics Committee of the Universidade Federal de Santa Maria (UFSM), RS, Brazil (CAAE: 15139513.8.0000.5346).

Statistical analyses
The site was the unit of analysis. The mean percentages of agreement and disagreement were calculated for the sites with gingival bleeding between the evaluation methods (GBI and flossing). These data were computed for anterior sites (from mesial surface of canine to the distal surface of central incisor) and posterior sites (distal surface of canine to the mesial surface of second molar). The S, SP, PPV, NPV, and A values were also calculated when considering the GBI as the gold standard. Statistical analyses were performed after testing the data for normal distribution using the Kolmogorov-Smirnov test. The frequencies of the presence/absence of bleeding for each sequence were compared using McNemar's test at a 5% significance level. The distribution of gender and race for each group was compared by the chi-square test. The average number of teeth present, evaluated papillae, and the percentage of total bleeding were compared using one-way ANOVA and Tukey's post hoc test. The analyses were performed using the SPSS Statistics 18 software (Statistical Package for Social Sciences, Chicago, USA).
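McNemar's test compares paired binary outcomes using only the discordant sites, that is, sites whose bleeding status changed between the first and second assessment. The sketch below implements the exact (binomial) form of the test; whether SPSS applied the exact or the chi-square version in this study is not stated, so this is an illustration under that assumption. The function name `mcnemar_exact` and the counts are hypothetical.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b: sites positive in the first assessment and negative in the second
    c: sites negative in the first assessment and positive in the second
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant sites, no evidence of a difference
    k = min(b, c)
    # Probability of a split at least this extreme under a fair 50/50 split
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical example: 3 sites stopped bleeding, 12 sites started bleeding.
p_value = mcnemar_exact(b=3, c=12)
print(round(p_value, 4))
```

Concordant sites (bleeding in both assessments or in neither) do not enter the calculation, and the resulting p-value is capped at 1.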

Results
Table 1 shows the characteristics of the subjects in the five evaluation protocols. There were no statistically significant differences between the subjects in the five protocols, except for gender (p = 0.016). Of the total 174 subjects screened, 74 were not included (Figure 1).
Figure 2 presents the mean percentage of non-bleeding sites in the first assessment that presented gingival bleeding in the second assessment, according to the evaluation methods and to site location. In the absence of gingivitis in the first assessment (negative GBI), bleeding was detected in 19.5%, 41.7%, and 50.7% of the sites in the second assessment using the GBI, TF, and GF methods, respectively. Except for TF-TF (p = 0.392) and GF-GF (p > 0.999), all the other evaluation protocols presented a statistically significant difference in the mean percentage of bleeding sites between the first and second assessments (p < 0.05, McNemar's test).
Figure 3 presents the mean percentage of bleeding sites in the first assessment that did not present gingival bleeding in the second assessment, according to the evaluation methods and to site location. In the presence of gingivitis in the first assessment (positive GBI), bleeding was not detected in 6.7%, 8.7%, and 3.4% of the sites in the second assessment using the GBI, TF, and GF methods, respectively. These values were 38.9%, 6%, and 1.9% when TF was performed in the first assessment and 58.3%, 43.6%, and 8.3% when GF was conducted in the first assessment, respectively (Figure 3). Except for the TF-TF (p = 0.392) and GF-GF (p > 0.999) sequences, there was a statistically significant difference in bleeding frequency between the first and second assessments (p < 0.05, McNemar's test).

The stratified analysis of anterior and posterior sites in Figure 2 showed that, in the absence of bleeding in the first assessment, there was a statistically significant greater percentage of bleeding at posterior sites (molars and premolars) compared with anterior sites (incisors and canines) in the second assessment for all of the evaluated sequences. Figure 3 shows that in the presence of bleeding in the first assessment, there was a statistically significant higher percentage of non-bleeding at posterior sites (molars and premolars) compared with anterior sites (incisors and canines) in the second assessment for all the sequences, except for TF-GBI (p = 0.678).
The S, SP, PPV, NPV and A values are presented in Table 2. For NPV, the chance of obtaining true results in the absence of disease (negative GBI) was 0.88 and 0.90 for TF performed before and after the GBI, respectively; the corresponding GF values were 0.84 and 0.95. In relation to the PPV, the chance of obtaining true results in the presence of disease (positive GBI) was 0.61 and 0.59 for TF performed before and after the GBI, respectively, and 0.41 and 0.55 for GF before and after the GBI, respectively.
Table 3 shows the percentage of agreement and disagreement for the sites in all protocols. More than 50% of the sites are in disagreement when compared with the dental flossing and GBI protocols.

Discussion
The findings of this study indicate that flossing against both the teeth and gums can diagnose proximal gingivitis (bleeding) at a higher frequency in adults compared with the GBI. These findings were more apparent at posterior sites than at anterior ones.
Approximately half of GBI non-bleeding sites in the first assessment presented bleeding in the second TF and GF assessments. These findings can be explained by two hypotheses. First, the GBI is not as adequate as dental flossing (tooth flossing or gingival flossing) for the diagnosis of proximal gingival bleeding. The subgingival area stimulated during probing and flossing can be different. Waerhaug 18 demonstrated that the floss is normally inserted from 2.0 to 3.5 mm subgingivally. In the GBI assessment, the probe was inserted around 2 mm into the gingival sulcus and run along the sulcus once. Although our study did not include subjects with a probing depth and clinical attachment loss over 3 mm, the disagreements between flossing and the GBI may also have been influenced by the greater depth of subgingival flossing compared to that of probing. Another consideration is that, at proximal sites with established points of contact, flossing reaches an area that does not seem to be fully accessed by the probe during the GBI. At the bleeding sites, the papillary central regions presented a percentage of inflammatory infiltrate approximately 3.5 times greater than that of the buccal and lingual regions. 12,13 Therefore, flossing appears to have a higher sensitivity for the diagnosis of proximal gingival inflammation than the GBI and it is possibly better than the latter for the diagnosis of the early inflammatory stages of gingivitis. 14 This hypothesis is supported by the findings from the stratified analysis, which show that, in the absence of bleeding with GBI in the first assessment, posterior sites had a significantly higher frequency of bleeding in the second assessment than anterior sites, that is, more discordant results between flossing and GBI. In posterior teeth, the greater extent of the papilla in the buccolingual direction, 19 where the probe is less likely to reach the central papilla, may explain the greater discordant results between posterior and anterior teeth. These results corroborate those reported by Caton et al., 20 who demonstrated more discrepancies between assessments at posterior sites than between anterior ones. Another interesting result was that 38.9% and 58.3% of the sites that showed bleeding in the TF and GF protocols, respectively, in the first assessment, did not bleed with the GBI in the second assessment. These results reinforce the hypothesis that flossing may be more accurate than the GBI for the diagnosis of proximal gingivitis.

Table 3. Percentage of sites with agreement and disagreement for each of the nine protocols. GBI: gingival bleeding index; TF: tooth surface; GF: gingival tissue. GBI-TF (n = 452); TF-GBI (n = 456); GBI-GF (n = 450); GF-GBI (n = 448); GBI-GBI (n = 950); TF-TF (n = 446); TF-GF (n = 450); GF-TF (n = 458); GF-GF (n = 470). All comparisons between the 1st and 2nd exam showed significant differences (p < 0.05, McNemar test), except TF-TF (p = 0.392) and GF-GF (p > 0.999).

The second hypothesis is that the use of dental floss may be more traumatic than the GBI. While the floss was inserted by rubbing twice, the periodontal probe was only applied once. Therefore, the flossing results may have been overestimated. However, according to our data, in the absence of bleeding in the first assessment, when the same evaluation method was repeated (GBI-GBI, TF-TF, or GF-GF), more bleeding was observed in the second assessment. This tendency was expected owing to the trauma caused by the double consecutive mechanical stimulation. 15 Nevertheless, the GBI in the second assessment appeared to have caused more trauma to the gingival tissue than both the tooth and gingival flossing protocols. These data, however, refute the explanation that the differences in the results are due to trauma and reinforce the hypothesis that flossing may be the most accurate method for the diagnosis of proximal gingivitis. Nonetheless, these hypotheses are inconclusive, and caution should be exercised in the interpretation of our results, due to the lack of histological evaluation comparing these two methods.
It is essential that the measurements made in clinical trials and epidemiological studies be accurate and reproducible. Our study indicates a high degree of intra-examiner reliability in the assessment of interdental inflammation using bleeding after stimulation; the reliability obtained with the dental flossing protocol was higher than that with periodontal probing. One reason for this reliability may be the method of stimulation used for bleeding. The use of a periodontal probe is, by nature, highly prone to intra- and inter-examiner error. 10,11 When probing is performed, the force used to probe, probe size, and probe position are factors that can lead to variation in the results. On the other hand, when dental flossing is performed, there may be fewer variables that could be susceptible to an assessment bias.
Previous studies have also evaluated TF as a diagnostic tool for proximal gingivitis. 9,15,16 In the present study, flossing against the teeth was compared with that against the gums. In the absence of bleeding with TF in the first assessment, 19% of the sites bled following the second assessment with GF. When this sequence was reversed, 43.6% of the sites that bled in the first assessment with GF did not bleed in the second assessment with TF. These results can be explained by the fact that TF probably does not have close contact with the sulcular epithelium in the central region of the papilla.
High S and NPV values were recorded in the first GBI-second flossing group; this can be explained by the low numbers of false negative results. When the gold standard detected the presence of gingivitis, so did the flossing, and when flossing failed to detect gingivitis, the likelihood of the assessment being correct increased. By contrast, low SP and PPV values were recorded for flossing when compared with the GBI. This can be explained by the numerous false positives; there was a higher percentage of bleeding sites with flossing which did not bleed with the GBI (false positives) compared to the percentage of non-bleeding sites with flossing that bled with the GBI (false negatives). By reversing the order of the assessments, in the first flossing-second GBI group, lower S values and higher SP values were observed; however, there was a larger number of false positives than false negatives. These findings are not in agreement with those of Mariath et al., 16 who found low S values and high SP values for the first GBI-second flossing and first floss-second GBI groups. These differences may be explained, in part, by the fact that children were evaluated in the study of Mariath et al. 16 Gingivitis is less severe in children compared to adults, when similar amounts of dental plaque are found. 21 Furthermore, in children, the junctional epithelium in the primary dentition tends to be thicker than in the permanent dentition, which potentially reduces the permeability of the gingival structures to the bacterial endotoxins that trigger the inflammatory response. 22 The lower SP values for GF (0.66) compared to TF (0.80) may be explained by the fact that GF detected more bleeding sites than TF when GBI did not detect bleeding in the second assessment. Therefore, GF appears to be a more appropriate method for the diagnosis of gingivitis.
The lack of consensus on a gold standard for the evaluation of gingival inflammation is an issue in validation studies. In this study, the GBI was selected as the gold standard owing to the ease and simplicity of its application in clinical and epidemiological studies and due to the fact that its diagnosis of gingivitis, conducted through the detection of bleeding after stimulation, is similar to that of the flossing evaluation method. The conceptual hypothesis of this study considered the GBI to be a less valid method for the diagnosis of proximal gingivitis at sites with an established point of contact compared with flossing. Therefore, it could be suggested that the sensitivity and specificity results are relative and indicate co-positivity and co-negativity, respectively, because the diagnosis of proximal gingivitis through the GBI may not adequately diagnose the disease.
Another limitation of the study is the lack of blinding in the first assessment. However, the fact that at least 40 sites were assessed suggests that measurement bias due to memorization of previous results by the examiner was probably reduced.
The results of this study may have low external validity as they do not directly apply to individuals with periodontal attachment loss and proximal periodontal pockets over 3 mm, which are highly prevalent in the general population. However, the presence of more severe periodontal conditions can be diagnosed by well-established clinical signs. 23 In population surveys, highly sensitive tests should be used when the disease prevalence in the population is low. In this regard, the use of dental floss may not reflect major changes in the disease prevalence since the prevalence of gingivitis is high in different populations. 3,4 Nevertheless, the present findings are particularly relevant to individuals who do not have periodontal attachment loss, in whom changes in gingival health can be detected as early as possible so that preventive measures can be taken early as well.

Conclusion
Our results demonstrate that flossing detected more bleeding at proximal sites than the GBI in subjects without periodontal attachment loss and periodontal pockets. Flossing rubbed against the gingival tissue appears to be a more appropriate method for the diagnosis of gingivitis. The differences in the evaluation methods were largest at posterior sites.

Figure 2. Percentage of non-bleeding sites in the first assessment and bleeding sites in the second assessment according to location in the anterior (incisors and canines) or posterior (premolars and molars) regions. There were statistically significant differences in the frequency of scores between posterior and anterior sites for all sequences (p < 0.05, McNemar's test).

Figure 3. Percentage of bleeding sites in the first assessment and non-bleeding sites in the second assessment according to location in the anterior (incisors and canines) or posterior (premolars and molars) regions. There were statistically significant differences in the frequency of scores between posterior and anterior sites for all sequences (p < 0.05, McNemar's test), except for TF-GBI (p = 0.678).

Table 1. Subject characteristics across the five evaluation protocols.

Table 2. Values of sensitivity (S), specificity (SP), positive predictive value (PPV), negative predictive value (NPV), and accuracy (A) when considering GBI as the gold standard.