Reliability of radiographic parameters in adenoid evaluation

Feres, Murilo Fernando Neuppmann; Sousa, Helder Inocêncio Paulo de; Francisco, Sheila Márcia; Pignatari, Shirley Shizue Nagata

doi:10.1590/S1808-86942012000400016

Abstracts

The assessment of adenoids by x-ray imaging has been the topic of heated debate, but few studies have looked into the reliability of most existing radiographic parameters. OBJECTIVE: This study aims to verify the intra-examiner and inter-examiner reproducibility of the adenoid radiographic assessment methods. MATERIALS AND METHODS: This is a cross-sectional case series study. Forty children of both genders aged between 4 and 14 were enrolled. They were selected based on complaints of nasal obstruction or mouth breathing and suspicion of pharyngeal tonsil hypertrophy. Cavum x-rays and orthodontic teleradiographs were assessed by two examiners in quantitative and categorical terms. RESULTS: All quantitative parameters in both x-ray modes showed excellent intra and inter-examiner reproducibility. Relatively better performance was observed in categorical parameters used in cavum x-ray assessment by C-Kurien, C-Wang, C-Fujioka, and C-Elwany over C-Cohen and C-Ysunza. As for orthodontic teleradiograph grading systems, C-McNamara has been proven to be more reliable than C-Holmberg. CONCLUSION: Most instruments showed adequate reproducibility levels. However, more research is needed to properly determine the accuracy and viability of each method.

adenoids; reproducibility of results; x-rays

ORIGINAL ARTICLE

I. MSc (Doctoral student in the Otorhinolaryngology and Head and Neck Surgery Graduate Program of the University of São Paulo)

II. Orthodontist (private practice)

III. PhD (Professor and Head of the Pediatric Otorhinolaryngology Course at the Federal University of São Paulo)

INTRODUCTION

The assessment of pharyngeal tonsil hypertrophy by lateral x-ray images of the skull has been the target of debate for years¹⁻⁴. Nevertheless, opinions on the usefulness of these images still vary significantly.

These differences of opinion result, among other factors, from the lack of studies that simultaneously examine a considerable number of parameters, from the diversity of the studied samples, and from the use of various methods, some of which are questionable⁵. Among these shortcomings is the frequent absence of reliability tests for most radiographic parameters⁵.

Reproducibility is an essential requirement for determining the quality of any assessment parameter. Therefore, this study was developed to verify the intra- and inter-examiner reproducibility of a series of radiographic parameters used to assess the pharyngeal tonsil and the nasopharyngeal airway.

MATERIALS AND METHODS

This cross-sectional study was approved by the Research Ethics Committee of the institution in which it was carried out (permit no. 0181/08).

The sample

Forty children (n = 40) of both genders with ages ranging between 4 and 14 years were selected at the Pediatric ENT Ward of the institution in which the study was carried out. The enrolled patients shared complaints of nasal obstruction and/or mouth breathing, and were suspected of having pharyngeal tonsil hypertrophy. Syndromic children, patients with malformations, individuals with acute respiratory tract infection at the time of examination, and subjects with a history of adenoidectomy were excluded. The guardians of the enrolled children formalized their participation by signing an informed consent form, as required by the Research Ethics Committee of the institution.

Methods

Cavum x-rays

One radiologist took cavum x-rays of the selected children at a specialized center. All x-ray images were made on the same apparatus at a focus-film distance of 140 cm and exposure factors of 70 kV, 12 mA for 0.40 to 0.64 seconds. Patients stood with the Frankfurt horizontal plane parallel to the floor, and the central x-ray beam was directed at the nasopharynx. The children were advised to breathe through their noses, keeping their mouths closed and teeth occluded, as the x-ray images were taken. The x-ray film used was Kodak® 20 cm × 25 cm, which was developed automatically after exposure according to the standard method. Images showing elevated soft palates or significant rotation of the head were discarded and the respective subjects removed from the sample.

Lateral orthodontic teleradiography (TR)

TR images were captured by the same operator. The same exposure, patient positioning, and patient orientation used in cavum x-rays were used in TR. This time, however, a cephalostat was used to ensure proper, reproducible head positioning as the x-ray images were produced. The central x-ray beam was directed towards the external acoustic meatus. Film, development method, and exclusion criteria were the same as those used in cavum x-rays.

Each radiographic image (cavum x-rays and TR) was given a number to mask patient identity and to prevent examiners from knowing the subjects' respiratory symptoms and initial complaints. Two independent examiners looked at the tracings of anatomic structures and assessed the images. The independent examiners were not involved in patient enrollment or patient examination. The main examiner (Examiner 1) performed the radiographic measurements (Charts 1 and 2; Figures 1 and 2) twice, with a 30-day interval between sessions, to allow for truly independent assessment.

Tracings and further measurements were made on Ultraphan paper sheets with the aid of a negatoscope, ruler, square, and a Starrett™ (model 799A-8/200) digital caliper with 0.01 mm divisions. Area calculations (Npaa⁶; Ad/Nf⁷) were carried out with the software program ImageJ, available for download at http://rsbweb.nih.gov/ij/download.html, after the cephalometric tracings had been scanned.

Analysis methods

The reliability of radiographic methods was determined by analyzing intra- and inter-examiner reproducibility. Reproducibility of quantitative radiographic variables was measured in terms of the intraclass correlation coefficient (ICC) and the mean differences between pairs of observations. Reliability of categorical radiographic variables was assessed by calculating the kappa (k) coefficient and the overall agreement percentage between paired observations, including the occurrence of random agreement. ICC was interpreted according to Weir¹⁷, wherein reliability was categorized as "low" (ICC < 0.20), "fair" (0.20 < ICC < 0.40), "good" (0.40 < ICC < 0.60), "very good" (0.60 < ICC < 0.80) or "excellent" (0.80 < ICC < 1.00). The value of the kappa coefficient was interpreted based on the criteria designed by Landis & Koch¹⁸, in which reliability was rated "low" (k < 0.20), "fair" (0.20 < k < 0.40), "moderate" (0.40 < k < 0.60), "substantial" (0.60 < k < 0.80) or "nearly perfect" (0.80 < k < 1.00).
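To make the categorical analysis concrete, the following is a minimal Python sketch of an unweighted Cohen's kappa and the overall agreement percentage for two sets of paired gradings, together with the interpretation bands quoted above. It is an illustration only: the study used SPSS, the function names are hypothetical, the choice of the unweighted form of kappa is an assumption, and so is the handling of values that fall exactly on a band boundary (assigned here to the higher band), since the text does not specify either.

```python
from collections import Counter


def cohen_kappa(grades_a, grades_b):
    """Return (kappa, observed_agreement) for two equal-length grade lists.

    Unweighted Cohen's kappa; grades may be any hashable category labels.
    """
    if len(grades_a) != len(grades_b) or not grades_a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(grades_a)
    # Proportion of cases in which both gradings coincide.
    observed = sum(a == b for a, b in zip(grades_a, grades_b)) / n
    # Agreement expected by chance, from each grader's marginal frequencies.
    count_a, count_b = Counter(grades_a), Counter(grades_b)
    categories = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        # Both graders used a single identical category throughout.
        return 1.0, observed
    return (observed - expected) / (1.0 - expected), observed


def kappa_band(kappa):
    """Map kappa to the bands quoted in the text (boundary handling assumed)."""
    if kappa < 0.20:
        return "low"
    if kappa < 0.40:
        return "fair"
    if kappa < 0.60:
        return "moderate"
    if kappa < 0.80:
        return "substantial"
    return "nearly perfect"
```

Calling `cohen_kappa(first_reading, second_reading)` on two readings by the same examiner gives the intra-examiner figures; passing one reading from each examiner gives the inter-examiner figures. Multiplying the second returned value by 100 yields the agreement percentage.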

The level of statistical significance was set at 5% (α = 0.05). Statistical analysis was done using the software program SPSS 10.0 for Windows.
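The quantitative reliability analysis can be illustrated in the same spirit. The text does not state which form of the intraclass correlation coefficient was computed, so the sketch below assumes the two-way random-effects, absolute-agreement, single-measurement form, often written ICC(2,1). The function names are hypothetical, and the band boundaries follow the categories quoted in the analysis methods, with boundary values assigned to the higher band as an assumption.

```python
def icc_2_1(measurements):
    """ICC(2,1) for a table with one row per subject and one column per rating.

    Assumed form: two-way random effects, absolute agreement, single measure.
    """
    n = len(measurements)          # number of subjects
    k = len(measurements[0])       # number of ratings per subject
    if n < 2 or k < 2:
        raise ValueError("need at least two subjects and two ratings")
    grand_mean = sum(sum(row) for row in measurements) / (n * k)
    row_means = [sum(row) / k for row in measurements]
    col_means = [sum(row[j] for row in measurements) / n for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication.
    ss_subjects = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in measurements for x in row)
    ss_error = ss_total - ss_subjects - ss_raters

    ms_subjects = ss_subjects / (n - 1)
    ms_raters = ss_raters / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_raters - ms_error) / n
    )


def icc_band(icc):
    """Map an ICC value to the categories quoted in the text."""
    if icc < 0.20:
        return "low"
    if icc < 0.40:
        return "fair"
    if icc < 0.60:
        return "good"
    if icc < 0.80:
        return "very good"
    return "excellent"
```

For intra-examiner analysis, each row would hold the two measurements made by Examiner 1 on the same image; for inter-examiner analysis, each row would hold one measurement from each examiner.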

RESULTS

Eleven patients refused to participate in the study. One patient was excluded for inconclusive x-ray images.

Forty subjects were enrolled in this study, twenty (50.0%) females and twenty (50.0%) males. Patient mean age was 9.5 years (4.1-14.3 years; standard deviation of 2.4 years). All included patients were suspected of having pharyngeal tonsil hypertrophy (40/40, 100.0%). Most of them complained of mixed breathing (19/40; 47.5%) or mouth breathing alone (17/40; 42.5%).

Every cavum x-ray (Table 1) and teleradiography (Table 2) quantitative parameter was rated as excellent for both intra and inter-examiner reproducibility.

Clinically insignificant variations were also observed when comparing measurements done by the same examiner on two occasions or by two examiners (Tables 3 and 4).
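As an illustration of how mean differences between paired observations can be summarized, the sketch below computes the mean and standard deviation of the paired differences and, as an assumption, a paired t statistic. The text does not name the statistical test applied to these differences, so the t statistic and the function name are hypothetical and shown only for orientation.

```python
import math
import statistics


def paired_difference_summary(first, second):
    """Summarize differences between two paired series of measurements.

    Returns (mean_difference, sd_of_differences, t_statistic).
    t_statistic is None when all differences are identical (zero spread).
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two series of equal length with at least 2 pairs")
    diffs = [a - b for a, b in zip(first, second)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    if sd_diff == 0:
        return mean_diff, sd_diff, None
    # Paired t statistic (assumed test; not stated in the text).
    t_stat = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return mean_diff, sd_diff, t_stat
```

Whether a mean difference is clinically relevant is a judgment about the size of the difference in millimetres or ratio units, which a test statistic alone does not settle.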

Among the cavum x-ray categorical variables, C-Kurien had "nearly perfect" agreement in both intra- and inter-examiner analysis. High agreement percentages were also found in intra-examiner (90.0%) and inter-examiner (92.5%) analysis (Table 5).

C-Wang had "nearly perfect" agreement levels in intra-examiner agreement and "substantial" agreement in inter-examiner analysis. Agreement percentages were 95.0% and 90.0% respectively (Table 5).

C-Fujioka and C-Elwany had "substantial" kappa agreement for both analyses. Agreement was reached in most assessments, both between repeated measurements (C-Fujioka: 95.0%; C-Elwany: 90.0%) and between examiners (C-Fujioka: 95.0%; C-Elwany: 92.5%) (Table 5).

C-Cohen had "moderate" performance based on the obtained kappa indices. Agreement rates amounted to 75.0% for both intra- and inter-examiner analyses (Table 5).

In addition to "moderate" agreement in the intra-examiner analysis, C-Ysunza was rated "fair" when comparing different examiners. Agreement percentages were 65.0% in the intra-examiner analysis and 42.5% in the inter-examiner analysis (Table 5).

C-McNamara had "nearly perfect" kappa agreement for both intra- and inter-examiner performance (Table 6). The rate of agreement was 97.5% between repeated observations by the same examiner and 95.0% between different examiners.

C-Holmberg had "substantial" agreement in intra-examiner performance and "moderate" agreement in inter-examiner performance (Table 6). The agreement percentages for this parameter were 80.0% (intra-examiner) and 57.5% (inter-examiner).

DISCUSSION

Cavum x-rays

Quantitative variables had excellent reproducibility among examiners. Previous studies reported similar results for A/N¹³,¹⁹, PA¹⁹ and AA¹⁹. Other quantitative parameters (PT, AC, AC/SfP, AWO), although not investigated previously, also presented excellent inter-examiner reliability in this study. The intra-examiner results of this study showed, for the first time, excellent rates of reproducibility for all investigated instruments. Therefore, quantitative parameters may be reliably used by researchers and physicians specialized in this area.

However, less consistency was observed in relation to categorical cavum x-ray variables. In this case, various reproducibility rates were observed, ranging from fair to nearly perfect.

Instrument C-Kurien outperformed all other tested categorization systems. Its excellent reproducibility is likely related to its reliance on a reliable, objective categorization criterion (PA).

C-Wang also had satisfactory levels of reproducibility, even when submitted to the subjective impressions of examiners. Its performance may be related to the fact that examiners tend to systematically categorize doubtful cases as "non-obvious" hypertrophy. Therefore, albeit reliable, this assessment instrument should be used carefully by examiners.

Satisfactory levels of reproducibility were also observed for C-Fujioka and C-Elwany, whose categorization criteria are based on the A/N value. These instruments should be used in cases in which the nasopharyngeal airway needs to be characterized in a simplified (dichotomous categories) and objective manner.

Despite its moderate intra-examiner reliability in this study, C-Cohen was rated as a reproducible system by Souki²⁰ and by Kolo et al.²¹; the latter reported high agreement rates between an ENT physician and a radiologist (k = 0.8182; agreement rate of 82.35%). However, when agreement was verified between two ENT physicians, performance was more modest (k = 0.6696; agreement rate of 74.51%)²¹ and closer to the reproducibility rates observed in our study.

Among the categorical parameters, the lowest performance was observed for C-Ysunza. Other studies reported inter-examiner agreement rates ranging between 77.5%¹¹ and 90.0%⁴ of the assessments; the agreement rates seen in our study were lower. According to Maw et al.¹¹, this type of assessment is highly dependent on examiner experience, and the assessments in Ysunza et al.⁴ were performed by experienced personnel. Examiners should therefore be trained before using the C-Ysunza instrument, despite the moderate level of agreement seen in the intra-examiner analysis.

Teleradiography

According to the data collected, all investigated quantitative parameters had excellent intra-examiner reproducibility. These findings are in agreement with other studies²⁰,²²⁻²⁴ in which statistically significant intra-examiner variations and clinically insignificant differences were found. In addition to the parameters that the orthodontic literature had already found to have satisfactory intra-examiner reliability (Npaa²⁰, Pm-ad₁²¹,²⁴, Pm-ad₂²²,²³, ad₁-Ba²²,²⁴, ad₂-S₀²²,²³, Pm-Ba²²,²⁴, and SP²⁰,²⁴), other variables such as PtV-Ad and Ad/NP also showed sufficient intra-examiner reproducibility.

No studies in the literature have verified the inter-examiner reproducibility of these radiological variables. However, the results of this study suggest they offer satisfactory agreement between examiners. Our findings have confirmed the reliability of quantitative methods, and their appropriateness for practical use.

As for the reproducibility of categorization systems, this study found excellent intra- and inter-examiner agreement rates for C-McNamara. However, C-Holmberg, a system based on subjective examiner impressions, was not rated as well as C-McNamara, specifically on inter-examiner reproducibility.

Paradise et al.²⁵, using a categorization system similar to C-Holmberg, found excellent rates of reproducibility (intra-examiner: k = 0.89; inter-examiner: k = 0.81). Souki²⁰ studied the intra-examiner reproducibility rates for the same parameter and did not find statistically significant differences between the intra-examiner paired mean values. Our study also revealed a considerable agreement rate for intra-examiner analyses. Even so, the authors of this study recommend that C-McNamara be given preference. The absence of defined, objective criteria in C-Holmberg, its excessive number of categories, and its lower rates of inter-examiner agreement are sufficient justification for using C-McNamara, a simpler, more objective and more reliable categorization system.

Requirements other than reproducibility, such as viability and accuracy, should also be considered when choosing a diagnostic method. Further research is therefore required to determine how well each parameter analyzed in this study represents what it is intended to measure. The ideal instrument should be reliable, accurate, and practical.

CONCLUSION

Every quantitative parameter measured on cavum x-rays or teleradiography presented excellent reproducibility and clinically irrelevant variation.

Among the categorical parameters used with cavum x-rays, C-Kurien, C-Wang, C-Fujioka, and C-Elwany performed better than C-Cohen and C-Ysunza.

C-McNamara outperformed C-Holmberg in reproducibility among teleradiography-based categorization systems.

REFERENCES

1. Wang DY, Bernheim N, Kaufman L, Clement P. Assessment of adenoid size in children by fibreoptic examination. Clin Otolaryngol Allied Sci. 1997;22(2):172-7.
2. Mlynarek A, Tewfik MA, Hagr A, Manoukian JJ, Schloss MD, Tewfik TL, et al. Lateral neck radiography versus direct video rhinoscopy in assessing adenoid size. J Otolaryngol. 2004;33(6):360-5.
3. Kurien M, Lepcha A, Mathew J, Ali A, Jeyaseelan L. X-rays in the evaluation of adenoid hypertrophy: It's role in the endoscopic era. Indian J Otolaryngol Head Neck Surg. 2005;57(1):45-7.
4. Ysunza A, Pamplona MC, Ortega JM, Prado H. Video fluoroscopy for evaluating adenoid hypertrophy in children. Int J Pediatr Otorhinolaryngol. 2008;72(8):1159-65.
5. Feres MF, Hermann JS, Cappellette M Jr, Pignatari SS. Lateral X-ray view of the skull for the diagnosis of adenoid hypertrophy: a systematic review. Int J Pediatr Otorhinolaryngol. 2011;75(1):1-11.
6. Handelman CS, Osborne G. Growth of the nasopharynx and adenoid development from one to eighteen years. Angle Orthod. 1976;46(3):243-59.
7. Linder-Aronson S, Leighton BC. A longitudinal study of the development of the posterior nasopharyngeal wall between 3 and 16 years of age. Eur J Orthod. 1983;5(1):47-58.
8. Jóhannesson S. Roentgenologic investigation of the nasopharyngeal tonsil in children of different ages. Acta Radiol Diagn (Stockh). 1968;7(4):299-304.
9. Fujioka M, Young LW, Girdany BR. Radiographic evaluation of adenoidal size in children: adenoidal-nasopharyngeal ratio. AJR Am J Roentgenol. 1979;133(3):401-4.
10. Crepeau J, Patriquin HB, Poliquin JF, Tetreault L. Radiographic evaluation of the symptom-producing adenoid. Otolaryngol Head Neck Surg. 1982;90(5):548-54.
11. Maw AR, Jeans WD, Fernando DC. Inter-observer variability in the clinical and radiological assessment of adenoid size, and the correlation with adenoid volume. Clin Otolaryngol Allied Sci. 1981;6(5):317-22.
12. Cohen D, Konak S. The evaluation of radiographs of the nasopharynx. Clin Otolaryngol Allied Sci. 1985;10(2):73-8.
13. Elwany S. The adenoidal-nasopharyngeal ratio (AN ratio). Its validity in selecting children for adenoidectomy. J Laryngol Otol. 1987;101(6):569-73.
14. Schulhof RJ. Consideration of airway in orthodontics. J Clin Orthod. 1978;12(6):440-4.
15. Holmberg H, Linder-Aronson S. Cephalometric radiographs as a means of evaluating the capacity of the nasal and nasopharyngeal airway. Am J Orthod. 1979;76(5):479-90.
16. McNamara JA Jr. A method of cephalometric evaluation. Am J Orthod. 1984;86(6):449-69.
17. Weir JP. Quantifying test-retest reliability using the intraclass correlation coefficient and the SEM. J Strength Cond Res. 2005;19(1):231-40.
18. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33(1):159-74.
19. Jeans WD, Fernando DC, Maw AR. How should adenoidal enlargement be measured? A radiological study based on interobserver agreement. Clin Radiol. 1981;32(3):337-40.
20. Souki MQ. Estudo comparativo da telerradiografia em norma lateral da face e da fibronasoendoscopia na avaliação dos níveis de obstrução adenoidiana em pacientes respiradores bucais. [dissertação de mestrado]. Belo Horizonte: Pontifícia Universidade Católica de Minas Gerais; 2006.
21. Kolo ES, Salisu AD, Tabari AM, Dahilo EA, Aluko AA. Plain radiographic evaluation of the nasopharynx: do raters agree? Int J Pediatr Otorhinolaryngol. 2010;74(5):532-4.
22. Imamura N, Ono T, Hiyama S, Ishiwata Y, Kuroda T. Comparison of the sizes of adenoidal tissues and upper airways of subjects with and without cleft lip and palate. Am J Orthod Dentofacial Orthop. 2002;122(2):189-94.
23. Vilella Bde S, Vilella Ode V, Koch HA. Growth of the nasopharynx and adenoidal development in Brazilian subjects. Braz Oral Res. 2006;20(1):70-5.
24. Martin O, Muelas L, Viñas MJ. Nasopharyngeal cephalometric study of ideal occlusions. Am J Orthod Dentofacial Orthop. 2006;130(4):436e1-9.
25. Paradise JL, Bernard BS, Colborn DK, Janosky JE. Assessment of adenoidal obstruction in children: clinical signs versus roentgenographic findings. Pediatrics. 1998;101(6):979-86.