Reliability of radiographic parameters in adenoid evaluation

The assessment of adenoids by x-ray imaging has been the topic of heated debate, but few studies have looked into the reliability of most existing radiographic parameters. Objective: This study aims to verify the intra-examiner and inter-examiner reproducibility of adenoid radiographic assessment methods. Materials and Methods: This is a cross-sectional case series study. Forty children of both genders aged between 4 and 14 years were enrolled. They were selected based on complaints of nasal obstruction or mouth breathing and suspicion of pharyngeal tonsil hypertrophy. Cavum x-rays and orthodontic teleradiographs were assessed by two examiners in quantitative and categorical terms. Results: All quantitative parameters in both x-ray modes showed excellent intra-examiner and inter-examiner reproducibility. Among the categorical parameters used in cavum x-ray assessment, C-Kurien, C-Wang, C-Fujioka, and C-Elwany performed better than C-Cohen and C-Ysunza. As for orthodontic teleradiograph grading systems, C-McNamara proved more reliable than C-Holmberg. Conclusion: Most instruments showed adequate reproducibility levels. However, more research is needed to properly determine the accuracy and viability of each method.


INTRODUCTION
The assessment of pharyngeal tonsil hypertrophy by lateral x-ray images of the skull has been the target of debate for years 1-4. Nevertheless, opinions on the usefulness of these images still vary significantly.
These differences of opinion stem, among other factors, from the lack of studies looking simultaneously into a considerable number of parameters, from the diversity of the studied samples, and from the application of various methods, some of which are questionable 5. Among these shortcomings is the frequent absence of reliability tests for most radiographic parameters 5.
Reproducibility is an essential requirement to determine the quality of any assessment parameter. Therefore, this study was developed with the purpose of verifying the intra-examiner and inter-examiner reproducibility of a series of radiographic parameters used to assess the pharyngeal tonsil and the nasopharyngeal airway.

MATERIALS AND METHODS
This cross-sectional study was approved by the Research Ethics Committee of the institution in which it was carried out (permit no. 0181/08).

The sample
Forty children (n = 40) of both genders with ages ranging between 4 and 14 years were selected at the Pediatric ENT Ward of the institution in which the study was carried out. The enrolled patients shared complaints of nasal obstruction and/or mouth breathing, and were suspected of having pharyngeal tonsil hypertrophy. Syndromic children, patients with malformations, individuals with acute respiratory tract infection at the time of examination, and subjects with a history of adenoidectomy were excluded. The guardians of the children enrolled in the study formalized their participation by signing an informed consent form as per the requirements of the Research Ethics Committee of the institution in which the study was carried out.

Cavum x-rays
One radiologist took cavum x-rays of the selected children at a specialized center. All x-ray images were made on the same apparatus at a focus-film distance of 140 cm and exposure factors of 70 kV and 12 mA for 0.40 to 0.64 seconds. Patients stood with the Frankfurt horizontal plane parallel to the floor, and the central x-ray beam was directed to the nasopharynx. The children were advised to breathe through their noses, keeping their mouths closed and teeth occluded, as x-ray images were taken. The x-ray film used was Kodak® 20 cm x 25 cm, which was developed automatically after exposure according to the standard method. Images showing elevated soft palates or significant rotation of the head were discarded and the respective subjects removed from the sample.

Lateral orthodontic teleradiography (TR)
TR images were captured by the same operator. The same exposure, patient positioning, and patient orientation used in cavum x-rays were used in TR. This time, however, a cephalostat was used to ensure proper, reproducible positioning of the patient's head as x-ray images were produced. The central x-ray beam was directed towards the external acoustic meatus. Film, development method, and other exclusion criteria were the same as those used in cavum x-rays.
Each radiographic image (cavum x-rays and TR) was given a number to mask patient identity and to prevent examiners from knowing the subjects' respiratory symptoms and initial complaints. Two independent examiners traced the anatomic structures and assessed the images. The independent examiners were not involved in patient enrollment or patient examination. The main examiner (Examiner 1) performed the radiographic measurements (Charts 1 and 2; Figures 1 and 2) twice, with a 30-day interval between sessions, to allow for truly independent assessment.
Tracings and subsequent measurements were made on sheets of Ultraphan paper with the aid of a negatoscope, ruler, square, and a Starrett™ (model 799A-8/200) digital caliper with 0.01 mm resolution. Area calculations (Npaa 6; Ad/Nf 7) were carried out with the ImageJ software program, available for download at http://rsbweb.nih.gov/ij/download.html, after the cephalometric tracings had been scanned.
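As a rough illustration of the area calculations mentioned above, the sketch below computes the area enclosed by a traced outline from its vertex coordinates using the shoelace formula. It does not describe the procedure followed in ImageJ; the function name, the vertex list, and the millimetre-per-pixel calibration factor are hypothetical and serve only to show the geometry involved.

```python
def polygon_area_mm2(vertices_px, mm_per_px):
    """Area (mm^2) of a closed outline given as (x, y) pixel vertices.

    Hypothetical helper: applies the shoelace formula and a user-supplied
    calibration factor (mm per pixel) taken from the scanned tracing.
    """
    if len(vertices_px) < 3:
        raise ValueError("an outline needs at least three vertices")
    n = len(vertices_px)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices_px[i]
        x2, y2 = vertices_px[(i + 1) % n]  # wrap around to close the outline
        twice_area += x1 * y2 - x2 * y1
    area_px2 = abs(twice_area) / 2.0
    return area_px2 * mm_per_px ** 2


# Made-up example: a 100 x 50 pixel rectangle scanned at 0.1 mm per pixel.
outline = [(0, 0), (100, 0), (100, 50), (0, 50)]
print(polygon_area_mm2(outline, 0.1))
```

The absolute value makes the result independent of whether the outline was traced clockwise or counter-clockwise.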

Analysis methods
The reliability of radiographic methods was determined by the analysis of intra-examiner and inter-examiner reproducibility. Reproducibility of quantitative radiographic variables was measured in terms of the intraclass correlation coefficient (ICC) and the mean differences between pairs of observations. Reliability analysis of categorical radiographic variables was performed using the kappa coefficient and the percentage of agreement.
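The two quantitative reproducibility measures named above can be sketched as follows. The study does not state which form of the ICC was computed in SPSS, so the two-way random-effects, absolute-agreement, single-measure form, ICC(2,1), is assumed here purely for illustration. The function names and the sample values are hypothetical.

```python
from statistics import mean, stdev


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    `ratings` holds one row per radiograph and one column per examiner
    (or per measurement occasion). The choice of ICC form is an assumption.
    """
    n = len(ratings)        # number of radiographs
    k = len(ratings[0])     # number of examiners or occasions
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def paired_difference_summary(first, second):
    """Mean and sample standard deviation of the differences first - second."""
    diffs = [a - b for a, b in zip(first, second)]
    return mean(diffs), stdev(diffs)


# Made-up A/N ratios for five radiographs, measured on two occasions.
first_pass = [0.62, 0.71, 0.55, 0.80, 0.67]
second_pass = [0.60, 0.73, 0.56, 0.79, 0.66]
print(icc_2_1([list(pair) for pair in zip(first_pass, second_pass)]))
print(paired_difference_summary(first_pass, second_pass))
```

Reporting the paired differences alongside the ICC is useful because a high ICC can coexist with a systematic offset between sessions or examiners that only the mean difference reveals.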

Reference Study
Assessment Method

Jóhannesson 8
Pharyngeal tonsil thickness (PT) (mm): distance measured along a line perpendicular to the superior bone border of the nasopharynx, from the pharyngeal tubercle to the convexity of the pharyngeal tonsil (Figure 1A).

Fujioka et al. 9
Adenoid/Nasopharynx ratio (A/N): ratio between the thicknesses of the adenoid (A) and the nasopharynx (N), A being the distance, along a line perpendicular to the straight portion of the anterior border of the basioccipital bone, between that border and the point of greatest convexity of the pharyngeal tonsil; and N being the distance between the posterosuperior portion of the hard palate and the anterior border of the spheno-occipital synchondrosis (Figure 1B).

Crepeau et al. 10
Antral adenoid (AA) (mm): shortest distance between the most anterior portion of the pharyngeal border and the posterior wall of the maxillary antrum located on the same plane as the choana (Figure 1C).

Maw et al. 11
Passage of air (PA) (mm): shortest distance between the pharyngeal tonsil convexity and the soft palate (Figure 1C).

Cohen & Konak 12
Air column (AC) (mm): distance between the posterior border of the soft palate, 10 mm away from the posterior nasal spine, and the anterior curvature of the pharyngeal tonsil border (Figure 1D).
Air column/soft palate ratio (AC/SfP): ratio between AC (see description above) and SfP, the latter being the thickness of the soft palate measured 10 mm away from the posterior nasal spine (Figure 1D).

Mlynarek et al. 2
Airway occlusion (AWO) (%): percent relationship between PT (see description above) and NF, the latter being the distance measured along a line perpendicular to the superior bone border of the nasopharynx, from the pharyngeal tubercle to the soft palate (Figure 1A).
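The derived parameters in the chart above (A/N, AC/SfP, and AWO) are simple ratios of the linear measurements. The sketch below restates those definitions as functions; the function names and the sample values are hypothetical and not taken from the study data.

```python
def _ratio(numerator_mm, denominator_mm):
    """Shared helper: ratio of two linear measurements given in millimetres."""
    if denominator_mm <= 0:
        raise ValueError("the reference distance must be positive")
    return numerator_mm / denominator_mm


def an_ratio(adenoid_mm, nasopharynx_mm):
    """Adenoid/Nasopharynx ratio (A/N): adenoid thickness over nasopharynx depth."""
    return _ratio(adenoid_mm, nasopharynx_mm)


def ac_sfp_ratio(air_column_mm, soft_palate_mm):
    """Air column/soft palate ratio (AC/SfP)."""
    return _ratio(air_column_mm, soft_palate_mm)


def airway_occlusion_percent(pt_mm, nf_mm):
    """Airway occlusion (AWO): PT expressed as a percentage of NF."""
    return 100.0 * _ratio(pt_mm, nf_mm)


# Made-up caliper readings in millimetres.
print(an_ratio(12.0, 20.0))
print(ac_sfp_ratio(6.0, 8.0))
print(airway_occlusion_percent(15.0, 20.0))
```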

Chart 2.
Teleradiography assessment methods and their respective references.

Schulhof 14
PtV-Ad (mm): the shortest distance between the adenoid border and the PtV (5 mm above the posterior nasal spine) (Figure 2C).

McNamara Jr. 16
Superior pharynx (SP) (mm): shortest distance between a point on the superior border of the soft palate and a point on the border of the pharyngeal tonsil (Figure 1D).

RESULTS
Eleven patients refused to participate in the study. One patient was excluded for inconclusive x-ray images.
The level of statistical significance established for statistical tests was 5% (α = 0.05). Statistical analysis was done using the software program SPSS 10.0 for Windows.

Table 1. Intraclass correlation coefficient (ICC) of the quantitative cavum x-ray parameters in relation to the first and second measurements done by Examiner 1 (intra-examiner analysis) and to the measurements done by examiners 1 and 2 (inter-examiner analysis).

Clinically insignificant variations were also observed when comparing measurements done by the same examiner on two occasions or by two examiners (Tables 3 and 4).

Table 3. Differences between paired observations for quantitative cavum x-ray parameters in relation to the first and second measurements done by Examiner 1 (intra-examiner analysis) and to the measurements done by examiners 1 and 2 (inter-examiner analysis).

Table 4. Differences between paired observations for quantitative teleradiography parameters in relation to the first and second measurements done by Examiner 1 (intra-examiner analysis) and to the measurements done by examiners 1 and 2 (inter-examiner analysis).
C-Wang had "nearly perfect" agreement in the intra-examiner analysis and "substantial" agreement in the inter-examiner analysis. Agreement percentages were 95.0% and 90.0%, respectively (Table 5).
C-Cohen had "moderate" performance based on the obtained kappa indices. Agreement rates amounted to 75.0% for both intra-examiner and inter-examiner analyses (Table 5).
In addition to "moderate" agreement in the intra-examiner analysis, C-Ysunza was rated "fair" in the inter-examiner analysis. Agreement rates were 65.0% in the intra-examiner analysis and 42.5% in the inter-examiner analysis (Table 5).
C-McNamara had "nearly perfect" kappa agreement in both the intra-examiner and inter-examiner analyses (Table 6). The rate of agreement was 97.5% between the two observations by Examiner 1 and 95.0% between the different examiners.
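The kappa values and agreement percentages reported above can be related through a short sketch. It assumes unweighted Cohen's kappa for two raters and the Landis and Koch descriptive scale, whose labels match the terms used in the text ("nearly perfect" corresponding to that scale's top band). Neither assumption is confirmed by the study, and the function names and sample grades are hypothetical.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Return (kappa, observed_agreement) for two equal-length rating lists."""
    if not ratings_a or len(ratings_a) != len(ratings_b):
        raise ValueError("both raters must grade the same non-empty set of images")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement: product of the two raters' marginal proportions,
    # summed over the categories.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one category only")
    return (observed - expected) / (1.0 - expected), observed


def landis_koch_label(kappa):
    """Descriptive label for a kappa value (assumed Landis and Koch bands)."""
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "nearly perfect"


# Made-up grades given by two examiners to the same eight radiographs.
examiner_1 = ["small", "large", "large", "small", "large", "small", "small", "large"]
examiner_2 = ["small", "large", "small", "small", "large", "small", "large", "large"]
kappa, agreement = cohen_kappa(examiner_1, examiner_2)
print(kappa, agreement, landis_koch_label(kappa))
```

Because kappa discounts the agreement expected by chance, two instruments with similar raw agreement percentages can receive different labels when their category distributions differ.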

DISCUSSION

Cavum x-rays
Quantitative variables had excellent reproducibility among examiners. Previous studies reported similar results for A/N 13,19, PA 19, and AA 19. Other quantitative parameters (PT, AC, AC/SfP, AWO), although not investigated previously, also presented excellent inter-examiner reliability in this study. The intra-examiner results showed, for the first time, excellent rates of reproducibility for all investigated instruments. Therefore, quantitative parameters may be reliably used by researchers and physicians specialized in this area.

Table 5. Kappa (k) coefficient of categorical cavum x-ray parameters in relation to the first and second measurements done by Examiner 1 (intra-examiner analysis) and to the measurements done by examiners 1 and 2 (inter-examiner analysis). Agreements in bold type.

Table 6. Kappa (k) coefficient of categorical teleradiography parameters in relation to the first and second measurements done by Examiner 1 (intra-examiner analysis) and to the measurements done by examiners 1 and 2 (inter-examiner analysis).
However, less consistency was observed in relation to categorical cavum x-ray variables. In this case, various reproducibility rates were observed, ranging from fair to nearly perfect.
Instrument C-Kurien outperformed all other tested categorization systems. The excellent rates of reproducibility connected to the presence of reliable objective categorization criteria (PA) grant this instrument outstanding levels of reliability.
C-Wang also had satisfactory levels of reproducibility, even when submitted to the subjective impressions of examiners. Its performance may be related to the fact that examiners tend to systematically categorize doubtful cases as "non-obvious" hypertrophy. Therefore, albeit reliable, this assessment instrument should be used carefully by examiners.
Satisfactory levels of reproducibility were also observed for C-Fujioka and C-Elwany, whose categorization criteria are based on the A/N value. These instruments should be used in cases in which the nasopharyngeal airway needs to be characterized in a simplified (dichotomous categories) and objective manner.
Despite the moderate levels of intra-examiner reliability, C-Cohen was rated as a reproducible system by Souki 20 and by Kolo et al. 21; the latter reported high agreement rates between an ENT physician and a radiologist (k = 0.8182; agreement rate of 82.35%). However, when agreement was verified between two ENT physicians, more modest performance was observed (k = 0.6696; agreement rate of 74.51%) 21, closer to the reproducibility rates observed in our study.
Lower levels of performance among the categorical parameters were observed for instrument C-Ysunza. Other studies reported inter-examiner agreement rates ranging between 77.5% 11 and 90.0% 4 of the assessments; the agreement rates seen in our study were lower. According to Maw et al. 11, this type of assessment is highly dependent on examiner experience, and the assessments in Ysunza et al. 4 were performed by experienced personnel. Training is therefore needed before the C-Ysunza instrument is used, despite the substantial levels of agreement seen in intra-examiner analysis.

Teleradiography
According to the data collected, all investigated quantitative parameters had excellent intra-examiner reproducibility. These findings are in agreement with other studies 20,22-24 in which statistically significant intra-examiner variations and clinically insignificant differences were found. While the orthodontic literature has found parameters Npaa 20, Pm-ad1 21,24, Pm-ad2 22,23, ad1-Ba 22,24, ad2-S0 22,23, Pm-Ba 22,24, and SP 20,24 to have satisfactory intra-examiner reliability, this study showed that other variables, such as PtV-Ad and Ad/NP, also offer sufficient intra-examiner reproducibility.
No studies in the literature have verified the inter-examiner reproducibility of these radiological variables. However, the results of this study suggest they offer satisfactory agreement between examiners. Our findings have confirmed the reliability of quantitative methods, and their appropriateness for practical use.
When looking at the reproducibility of categorization systems, this study found excellent intra-examiner and inter-examiner agreement rates for C-McNamara. However, C-Holmberg, a system based on subjective examiner impressions, was not rated as well as C-McNamara, specifically on inter-examiner reproducibility.
Paradise et al. 25 , using a categorization system similar to C-Holmberg, found excellent rates of reproducibility (intra-examiner: k = 0.89; inter-examiner: k = 0.81). Souki et al. 20 studied the intra-examiner reproducibility rates for the same parameter and did not find statistically significant differences between the intra-examiner paired mean values. Our study also revealed a considerable agreement rate for intra-examiner analyses. Even so, the authors of this study recommend that C-McNamara be given preference. The absence of defined criteria and objectives in C-Holmberg, the excessive number of categories, and the lower rates of inter-examiner agreement should be enough justification to use C-McNamara, a simpler, more objective and more reliable categorization system.
Requirements other than reproducibility, such as viability and accuracy, should also be considered when picking a diagnostic method. Further research is therefore required to determine how well each parameter analyzed in this study represents what it is intended to measure. The ideal instrument should be reliable, accurate, and practical.

CONCLUSION
Every quantitative parameter measured on cavum x-rays or teleradiography presented excellent reproducibility and clinically irrelevant variation.
Among the categorical parameters used in cavum x-rays, C-Kurien, C-Wang, C-Fujioka, and C-Elwany outperformed C-Cohen and C-Ysunza.
C-McNamara outperformed C-Holmberg in reproducibility among teleradiography-based categorization systems.