User influence on fully automated cephalometric tracing using an artificial intelligence-driven online platform: pilot study

ABSTRACT

Objective:  The aim of this study was to evaluate the diagnostic reliability of automated tracings compared with the modifications made by the user.

Methods:  This observational analytical cross-sectional study used 44 lateral cephalograms, analyzed in the WebCeph™ software by a trained observer. The angles evaluated were those specified by the Brazilian Board of Orthodontics. The first evaluation was performed by the software’s artificial intelligence, and the results were recorded. In the second evaluation, the user checked the position of the landmarks and modified them if necessary. The results were subjected to statistical analysis.

Results:  There was a statistically significant correlation between the user and the AI (p<0.05). The user modified 88.63% of the sample. The most frequently modified landmarks were nasion, point B, and the incisal edges of the upper (U1) and lower (L1) incisors.

Conclusion:  There was technical agreement between AI and user, with the values obtained being equivalent. The AI diagnosis is reliable, but user experience remains fundamental for the final diagnosis.

Indexing terms
Artificial intelligence; Cephalometry; Orthodontics; Radiography; Supervised machine learning

INTRODUCTION

Cephalometric analysis, a tool for studying craniofacial growth and the changes produced by treatment, has evolved from manual tracing to fully automated analysis. In this process, the operator is only required to prepare and import the radiograph into the program, and the artificial intelligence (AI) performs the rest of the procedure [1-3]. AI used in dentistry is based on a subfield known as Machine Learning (ML), utilizing Convolutional Neural Networks and Artificial Neural Networks. These algorithms can function similarly to the human brain and can also be trained to identify patterns [4,5].

This digitization and automation bring advantages in terms of precision, archiving, transmission, layout simultaneity, and optimized work time [1,6,7]. However, there are associated sources of error, such as the quality of the radiograph, the definition of the cephalometric landmark, the tracing modality, and the user. The user's experience is essential in determining whether the landmark locations and values proposed by the program align with their clinical interpretation and whether, if modifications are made, the results remain reliable [7,8].

The first step in demonstrating the reliability of any cephalometric technique is precision in the location of the landmarks, where AI has achieved differences of less than 1.46 mm compared with human tracings, remaining within the clinical tolerance range of 2 mm [9-12]. The second step is to assess the agreement between the AI tracing and the user-modified tracing, which is produced when the user adjusts any of the landmarks originally proposed by the AI.

If a high level of agreement is found, it can then be evaluated whether the tracings are equivalent, meaning there are no differences beyond predetermined limits between the automated analysis and the one modified by the user. If these three steps are satisfied, it can be affirmed that the automated tracing produces results equivalent to those made by the human user [13]. For this purpose, the WebCeph™ platform was used; it is a website that functions as cloud storage and allows for cephalometric analysis using AI-based algorithms to trace automatically in seconds, while also enabling the user to modify the cephalometric landmarks [14].

It has been reported that there are differences between fully automated and semi-automated cephalometric analyses, which can affect the final diagnosis. These differences may be influenced by the experience of the observer [15].

The aim of this study was to determine the diagnostic reliability of automated cephalometric tracings compared to those modified by the user. The following question was posed: “Are there differences in the cephalometric value results according to the manual modification made by the user?”.

METHODS

Study design

This was an observational analytical cross-sectional study. The aim was to investigate the differences between the fully automated and semi-automated (user-influenced) cephalometric analyses. The study adhered to the guidelines of the STROBE initiative (Strengthening the Reporting of Observational Studies in Epidemiology).

Setting

The images were acquired from the medical records file of José Antonio Páez University [16] after the informed consent form was signed by the guardians. These images were selected from February 2022 to May 2022.

Participants

The radiographs belonged to Venezuelan patients aged between 7 and 12 years (average age of 8.79 years). The inclusion criteria comprised all patients with mixed dentition, regardless of gender, and without a history of orthodontic treatment. Patients with craniofacial growth deformities, poor image quality, or images with motion artifacts were excluded from the study.

Sample size

The required sample size was determined using G*Power 3.1 software, with a target ICC of 0.075 and a 0.25 error margin. The significance level (α) was set at 0.05, with a desired power of 80% to detect meaningful differences. Initially, it was estimated that a sample of 34 lateral cephalometric radiographs would be necessary to detect variations; however, 48 images were selected for analysis. Due to recognition issues by the platform, 4 images were not included, resulting in a final working sample of 44.

Variables

The variables encompassed a comprehensive set of angles and distances essential for documenting orthodontic cases according to the Brazilian Board of Orthodontics and Facial Orthopedics (https://bbo.org.br/index) [17]. Specifically, these included SNA, SNB, ANB, FMA, U1-SN, SN-GoGn, Y-axis, U1-NA (distance), U1.NA (angle), L1-NB (distance), and L1.NB (angle) (figure 1). Each of these measures provides information about craniofacial morphology and dental relationships.

Figure 1
Cephalometric traces.

Data sources/measurements

The previously calibrated lateral cephalometric radiographs were uploaded to the WebCeph™ AI-driven online orthodontic and orthognathic platform (free version). An Excel data sheet was provided for the evaluator to record the values generated for the angles and linear measurements by the AI through the platform. Subsequently, the examiner modified the location of the pertinent landmarks based on their judgment and recorded these adjustments in the spreadsheet, along with the resulting angles. The entire procedure was conducted on the same laptop, during daily one-hour sessions, until all tracings were completed. The observer (EG) is a student in the Oral Radiology master's degree program at the School of Dentistry of the University of São Paulo, with one year of experience in computer-assisted cephalometric analysis. In this context, the examiner was considered the gold standard, as the reliability of the AI depends on its training and the quality of the data on which it was trained.

Bias

A potential bias of this study could be the lack of comparisons among different users, as this is a pilot study that will include such comparisons in future research. The presence of a single observer may also be considered a bias; however, since the objective is to compare the analyses between the user and AI to determine whether there are differences in the results, there was no need to involve additional observers at this stage.

Quantitative variables

Diagnostic reliability: Existence of agreement and equivalence between the values generated by the automated trace and those modified by the user.

Diagnostic agreement: Degree of agreement between two or more evaluations of the variable studied, resulting in a ratio or an average of the differences, along with the construction of limits of agreement. To achieve clinical agreement, a minimum ratio of 0.75 is proposed, indicating that most values fall within the established limits.
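
As a rough illustration of this criterion, the computation can be sketched as follows. The study's analyses were run in R, but the sketch below is in Python for illustration; the function name and the conventional 1.96 multiplier for 95% limits of agreement are our assumptions, not details taken from this article.

```python
import numpy as np

def limits_of_agreement(x, y):
    """Mean difference, 95% limits of agreement, and the proportion of
    paired differences that fall inside those limits (to be compared
    against the 0.75 minimum ratio proposed above)."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    mean_d = d.mean()
    sd_d = d.std(ddof=1)
    lower, upper = mean_d - 1.96 * sd_d, mean_d + 1.96 * sd_d
    ratio = np.mean((d >= lower) & (d <= upper))
    return mean_d, (lower, upper), ratio
```

Here `x` and `y` would be the automated and user-modified values of one angle across the sample; a ratio at or above 0.75 would meet the clinical-agreement criterion stated above.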

Diagnostic equivalence: Verification that the differences between the values are close to zero and, furthermore, that they fall within the predetermined equivalence limits, making them interchangeable. For this investigation, limits of ±2 mm/±2° were predetermined [18].
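
The equivalence check under these ±2 mm/±2° limits can be sketched as a paired two one-sided tests (TOST) procedure. This is an illustrative Python version, not the article's actual R code; the function name and the paired-TOST formulation are assumptions on our part.

```python
import numpy as np
from scipy import stats

def tost_paired(x, y, delta=2.0, alpha=0.05):
    """Two one-sided tests (TOST) for equivalence of paired measurements.

    x, y  : paired values (e.g. AI vs. user-modified readings of one angle)
    delta : equivalence margin, here +/-2 mm or +/-2 degrees
    Declares equivalence when both one-sided p-values fall below alpha,
    which matches checking that the (1 - 2*alpha) = 90% confidence
    interval of the mean difference lies entirely inside +/-delta.
    """
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    n = d.size
    se = d.std(ddof=1) / np.sqrt(n)
    p_lower = 1 - stats.t.cdf((d.mean() + delta) / se, n - 1)  # H0: diff <= -delta
    p_upper = stats.t.cdf((d.mean() - delta) / se, n - 1)      # H0: diff >= +delta
    tcrit = stats.t.ppf(1 - alpha, n - 1)
    ci = (d.mean() - tcrit * se, d.mean() + tcrit * se)        # 90% CI
    return max(p_lower, p_upper) < alpha, d.mean(), ci
```

The 90% confidence-interval form of the check corresponds to the interval verification reported in the Results.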

Statistical methods

Initially, the total number of radiographs in which the user modified the landmarks identified by the AI was recorded. These radiographs were then described statistically, including tests for normal distribution. Subsequently, the Intraclass Correlation Coefficient (ICC) and equivalence via a two-tailed test were computed to compare the values of the angles and measurements obtained. The analyses were performed in the R programming language within RStudio 1.1.463.
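
For readers unfamiliar with the ICC, the two-rater case (AI versus user) can be sketched as below. The article does not state which ICC model was used, so this illustrative Python version assumes a two-way, absolute-agreement ICC(2,1); the study's R analysis may have differed.

```python
import numpy as np

def icc_two_way(ratings):
    """Two-way random-effects, absolute-agreement ICC(2,1).

    ratings: (n_subjects, k_raters) array, e.g. column 0 holding the AI
    values and column 1 the user-modified values for the same radiographs.
    """
    Y = np.asarray(ratings, dtype=float)
    n, k = Y.shape
    grand = Y.mean()
    row_means = Y.mean(axis=1)   # per-subject means
    col_means = Y.mean(axis=0)   # per-rater means
    # Mean squares from the two-way ANOVA decomposition
    msr = k * ((row_means - grand) ** 2).sum() / (n - 1)   # subjects
    msc = n * ((col_means - grand) ** 2).sum() / (k - 1)   # raters
    sse = ((Y - row_means[:, None] - col_means[None, :] + grand) ** 2).sum()
    mse = sse / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

Values above the 0.75 cut-off proposed in the variables section would indicate clinically acceptable agreement.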

RESULTS

The sample consisted of 48 radiographs, but 4 files were not recognized by the platform, resulting in a total of 44 usable radiographs. The study participants were Venezuelan patients aged between 7 and 12 years, with an average age of 8.79 years. The images were selected based on specific inclusion and exclusion criteria.

Of the 44 radiographs traced, the user modified 39 of them, representing 88.63% of the sample, at an average speed of 8 tracings per hour. The frequency of the modified landmarks is presented in table 1. Consequently, the rest of the analysis was based on this subgroup, with the averages and normality of the produced values shown in table 2.

Table 1
Distribution of landmarks modified by the user.
Table 2
Descriptive characteristics of user-modified radiographs (n=39).

Of all the angles and distances evaluated, only the Y-axis showed a non-normal distribution of values for both the automated analysis and the user-modified analysis. Consequently, it was excluded from the evaluation of both agreement and equivalence.

The ICC indicates agreement for all the evaluated values, except for the U1-NA dental aspects, both in distance and angle, as well as the L1-NB angle. However, verification using the 95% confidence interval would only accept SNB, ANB, and SN-GoGn (table 3).

Table 3
Intraclass Correlation Coefficient and Two-Tailed Equivalence Test.

Finally, the equivalence test (figure 2) reported average differences of less than ±2 mm/° for all values. Nevertheless, as with the ICC, a second verification using a 90% confidence interval would only accept ANB and the distances of the incisors to the NA and NB planes, respectively.

Figure 2
Equivalence of the angles and measures obtained.

DISCUSSION

Based on the findings of this study, only the Y-axis showed a non-normal distribution of values, so it was excluded from the evaluation. A total of 93 landmark modifications were recorded.

The most frequently modified landmarks in this tracing modality were Na, B, U1 border, and L1 border. These findings may be related to the ICC results, which were lower than 0.75 for the U1-NA dental aspects, both in distance and angle, as well as for the L1-NB angle. Mahto et al. [19] studied the differences between manual and WebCeph™ analyses, reporting all ICC values above this number. Although they did not focus on landmark identification, they emphasize that errors in landmark identification could affect the results of cephalometric measurements.

Based on the averages, all the evaluated values are reliable; however, the confidence intervals substantially reduce the number of interchangeable values. The ICC values for the SNA angle reported by Kunz et al. [20], Alqahtani et al. [21] and Prince et al. [22] showed almost perfect agreement (0.915, 0.992, and 0.97, respectively), while Mahto et al. [19] also reported good agreement (0.87). In contrast, this study found substantial agreement for the SNA angle (0.76), similar to Duran et al. [23] when comparing Dolphin® and WebCeph™. It is important to consider the influence of operator manipulation, as landmark N was modified repeatedly, which could introduce variability in the obtained results.

The equivalence in linear dental measurements compared to angular values confirms the classical findings that angles are more susceptible to error. Regarding the working time using WebCeph™, the average duration was 7.5 minutes per trace, including manual correction by the evaluator when it was necessary to modify the landmark location. Meriç et al. [24] report in their research that CephX analysis is 13 times faster than manual tracing and 3 times faster than Dolphin® and CephNinja tracing. Nevertheless, Kunz et al. [25] concluded that it should be used only under supervision.

The results of this study determined that there is diagnostic reliability of the AI tracings compared to those modified by the user, consistent with other studies [12,19,26] that evaluated orthodontists versus a trained AI and showed considerably similar accuracies for landmark location. This suggests that, in the future, this method could potentially be more reliable. However, Rodrigues et al. [5] suggest that users should understand how AI works and question its validity to avoid unfavorable outcomes.

Meanwhile, AI-based tracing is still considered a complementary tool for clinicians due to its limited accuracy [23,27,28]. Nevertheless, the reliability findings of this study suggest that future research should expand the sample size and involve additional observers, such as orthodontists and oral radiologists, to determine the validity of the instrument, promoting its use within academic training programs. This approach aims to simplify the diagnostic process, encourage comprehensive training, adapt to changes, and facilitate the creation of related population databases. The free version of WebCeph™ offers basic tools for cephalometric analyses, while its paid version includes additional features and is free of ads. The decision to pay for a membership rests with each user, which can be seen as an advantage of this platform. Future research could include a module to detect spatial location using coordinates, facilitating the identification of dispersion patterns in the precision of cephalometric landmarks.

A potential limitation of the study arises from the inability to complete all tracings because certain radiographs were not recognized by the platform. This underscores the significant impact of image quality on the ability of artificial intelligence to accurately identify anatomical structures. Consequently, the reliability and effectiveness of automated cephalometric analysis could be compromised when dealing with radiographs of inadequate quality. This limitation highlights the importance of ensuring high-quality images to optimize the performance of AI-based diagnostic tools in orthodontics.

CONCLUSION

This study revealed strong technical agreement across most variables, except for U1-NA (distance and angle) and L1-NB (angle). All values were found to be equivalent. It can be concluded that there is diagnostic reliability of the automated cephalometric tracings when compared to those modified by the user in this study.

Acknowledgements

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) – Finance Code 001. It is important to acknowledge CAPES for their support, as their funding contributed to the execution of the study and the dissemination of its findings.

  • How to cite this article
    Lopez EAG, Salgado DMRA, Moreira LMYA, Rotundo OM, Costa C. User influence on fully automated cephalometric tracing using an artificial intelligence-driven online platform: pilot study. RGO, Rev Gaúch Odontol. 2025;73:e20250018. http://dx.doi.org/10.1590/1981-86372025001820230111

REFERENCES

  • 1 Albarakati SF, Kula KS, Ghoneima AA. The reliability and reproducibility of cephalometric measurements: a comparison of conventional and digital methods. Dentomaxillofac Radiol. 2012;41(1):11-7. https://doi.org/10.1259/dmfr/37010910
    » https://doi.org/10.1259/dmfr/37010910
  • 2 Goracci C, Ferrari M. Reproducibility of measurements in tablet-assisted, PC-aided, and manual cephalometric analysis. Angle Orthod. 2014;84(3):437-42. https://doi.org/10.2319/061513-451.1
    » https://doi.org/10.2319/061513-451.1
  • 3 Akdeniz S, Tosun ME. A review of the use of artificial intelligence in orthodontics. J Exp Clin Med (Turk.). 2021;38(3s):157-62. https://doi.org/10.52142/omujecm.38.si.dent.13
    » https://doi.org/10.52142/omujecm.38.si.dent.13
  • 4 Khanagar SB, Al-Ehaideb A, Maganur PC, Vishwanathaiah S, Patil S, Baeshen HA, et al. Developments, application, and performance of artificial intelligence in dentistry: a systematic review. J Dent Sci. 2021;16(1):508-22. https://doi.org/10.1016/j.jds.2020.06.019
    » https://doi.org/10.1016/j.jds.2020.06.019
  • 5 Rodrigues JA, Krois J, Schwendicke F. Demystifying artificial intelligence and deep learning in dentistry. Braz Oral Res. 2021;35:e094. https://doi.org/10.1590/1807-3107bor-2021.vol35.0094
    » https://doi.org/10.1590/1807-3107bor-2021.vol35.0094
  • 6 İzgi E, Pekiner FN. Comparative evaluation of conventional and OnyxCeph™ dental software measurements on cephalometric radiography. Turk J Orthod. 2019;32(2):87-95. https://doi.org/10.5152/TurkJOrthod.2019.18038
    » https://doi.org/10.5152/TurkJOrthod.2019.18038
  • 7 Sayinsu K, Isik F, Trakyali G, Arun T. An evaluation of the errors in cephalometric measurements on scanned cephalometric images and conventional tracings. Eur J Orthod. 2007;29(1):105-8. https://doi.org/10.1093/ejo/cjl065
    » https://doi.org/10.1093/ejo/cjl065
  • 8 Kamoen A, Dermaut L, Verbeeck R. The clinical significance of error measurement in the interpretation of treatment results. Eur J Orthod. 2001;23(5):569-78. https://doi.org/10.1093/ejo/23.5.569
    » https://doi.org/10.1093/ejo/23.5.569
  • 9 Lindner C, Wang CW, Huang CT, Li CH, Chang SW, Cootes TF. Fully automatic system for accurate localisation and analysis of cephalometric landmarks in lateral cephalograms. Sci Rep. 2016;6:33581. https://doi.org/10.1038/srep33581
    » https://doi.org/10.1038/srep33581
  • 10 Durão AR, Pittayapat P, Rockenbach MI, Olszewski R, Ng S, Ferreira AP, et al. Validity of 2D lateral cephalometry in orthodontics: a systematic review. Prog Orthod. 2013;14(1):31. https://doi.org/10.1186/2196-1042-14-31
    » https://doi.org/10.1186/2196-1042-14-31
  • 11 Mahto RK, Kharbanda OP, Duggal R, Sardana HK. A comparison of cephalometric measurements obtained from two computerized cephalometric softwares with manual tracings. J Indian Orthod Soc. 2016;50(3):162-70. https://doi.org/10.4103/0301-5742.186359
    » https://doi.org/10.4103/0301-5742.186359
  • 12 Hwang HW, Park JH, Moon JH, Yu Y, Kim H, Her SB, et al. Automated identification of cephalometric landmarks: part 2-might it be better than human? Angle Orthod. 2020;90(1):69-76. https://doi.org/10.2319/022019-129.1
    » https://doi.org/10.2319/022019-129.1
  • 13 Lakens D. Equivalence tests: a practical primer for t tests, correlations, and meta-analyses. Soc Psychol Personal Sci. 2017;8(4):355-62. https://doi.org/10.1177/1948550617697177
    » https://doi.org/10.1177/1948550617697177
  • 14 AssembleCircle. WebCeph™ A.I. Web-based Orthodontic & Orthognathic Platform. Republic of Korea: AssembleCircle; 2020 [cited 2021 Mar 21]. Available from: https://webceph.com/en/about/
    » https://webceph.com/en/about/
  • 15 Panesar S, Zhao A, Hollensbe E, Wong A, Bhamidipalli SS, Eckert G, et al. Precision and accuracy assessment of cephalometric analyses performed by deep learning artificial intelligence with and without human augmentation. Appl Sci. 2023;13(12):6921. https://doi.org/10.3390/app13126921
    » https://doi.org/10.3390/app13126921
  • 16 Ruiz D, Pérez C. Implementación del protocolo de digitalización radiográfica en la clínica de ortodoncia y ortopedia dentofacial Universidad José Antonio Páez hasta el periodo 2016-2 [dataset]. 2024 Feb 22 [cited 2022 Mar 21]. In: Riujap Repositorio Institucional. Valencia: Universidad José Antonio Páez; 2019. 1 file: 1.38 MB. Available from: https://riujap.ujap.edu.ve/entities/publication/8e89bd99-3c90-4109-82f8-95e96923a0a7/full
    » https://riujap.ujap.edu.ve/entities/publication/8e89bd99-3c90-4109-82f8-95e96923a0a7/full
  • 17 Brazilian Board of Orthodontics and Facial Orthopedics. Case documentation 2021. Rio de Janeiro: BBO; 2021 [cited 2022 Jan 22]. Available from: https://bbo.org.br/index
    » https://bbo.org.br/index
  • 18 De Riu G, Virdis PI, Meloni SM, Lumbau A, Vaira LA. Accuracy of computer-assisted orthognathic surgery. J Craniomaxillofac Surg. 2018;46(2):293-8. https://doi.org/10.1016/j.jcms.2017.11.023
    » https://doi.org/10.1016/j.jcms.2017.11.023
  • 19 Mahto RK, Kafle D, Giri A, Luintel S, Karki A. Evaluation of fully automated cephalometric measurements obtained from web-based artificial intelligence driven platform. BMC Oral Health. 2022;22(1):132. https://doi.org/10.1186/s12903-022-02170-w
    » https://doi.org/10.1186/s12903-022-02170-w
  • 20 Kunz F, Stellzig-Eisenhauer A, Zeman F, Boldt J. Artificial intelligence in orthodontics: Evaluation of a fully automated cephalometric analysis using a customized convolutional neural network. J Orofac Orthop. 2020;81(1):52-68. https://doi.org/10.1007/s00056-019-00203-8
    » https://doi.org/10.1007/s00056-019-00203-8
  • 21 Alqahtani H. Evaluation of an online website-based platform for cephalometric analysis. J Stomatol Oral Maxillofac Surg. 2020;121(1):53-7. https://doi.org/10.1016/j.jormas.2019.04.017
    » https://doi.org/10.1016/j.jormas.2019.04.017
  • 22 Tito Prince ST, Srinivasan D, Duraisamy S, Kannan R, Rajaram K. Reproducibility of linear and angular cephalometric measurements obtained by an artificial-intelligence assisted software (WebCeph) in comparison with digital software (AutoCEPH) and manual tracing method. Dental Press J Orthod. 2023;28(1):1-21. https://doi.org/10.1590/2177-6709.28.1.e2321214.oar
    » https://doi.org/10.1590/2177-6709.28.1.e2321214.oar
  • 23 Duran GS, Topsakal KG, Görgülü S, Gökmen Ş. Evaluation of the accuracy of fully automatic cephalometric analysis software with artificial intelligence algorithm. Orthod Craniofac Res. 2023;26(3):481-90. https://doi.org/10.1111/ocr.12633
    » https://doi.org/10.1111/ocr.12633
  • 24 Meriç P, Naoumova J. Web-based fully automated cephalometric analysis: comparisons between app-aided, computerized, and manual tracings. Turk J Orthod. 2020;33(3):142-9. https://doi.org/10.5152/TurkJOrthod.2020.20062
    » https://doi.org/10.5152/TurkJOrthod.2020.20062
  • 25 Kunz F, Stellzig-Eisenhauer A, Widmaier LM, Zeman F, Boldt J. Assessment of the quality of different commercial providers using artificial intelligence for automated cephalometric analysis compared to human orthodontic experts. J Orofac Orthop. 2025;86:145-60. https://doi.org/10.1007/s00056-023-00491-1
    » https://doi.org/10.1007/s00056-023-00491-1
  • 26 Kim IH, Kim YG, Kim S, Park JW, Kim N. Comparing intra-observer variation and external variations of a fully automated cephalometric analysis with a cascade convolutional neural net. Sci Rep. 2021;11(1):7925. https://doi.org/10.1038/s41598-021-87261-4
    » https://doi.org/10.1038/s41598-021-87261-4
  • 27 Choi YJ, Lee K-J, editors. Possibilities of artificial intelligence use in orthodontic diagnosis and treatment planning: Image recognition and three-dimensional VTO. Semin Orthod. 2021;27(2):121-27. https://doi.org/10.1053/j.sodo.2021.05.008
    » https://doi.org/10.1053/j.sodo.2021.05.008
  • 28 Queiroz Tavares Borges Mesquita G, Brito Júnior RB, Vieira WA, Vidigal MTC, Travençolo BAN, Beaini TL, et al. Artificial Intelligence for Detecting Cephalometric Landmarks: a systematic review and meta-analysis. J Digit Imaging. 2023;36(3):1158-79. https://doi.org/10.1007/s10278-022-00766-w
    » https://doi.org/10.1007/s10278-022-00766-w

Edited by

  • Assistant editor:
    Luciana Butini Oliveira

Publication Dates

  • Publication in this collection
    19 Sept 2025
  • Date of issue
    2025

History

  • Received
    22 Jan 2024
  • Reviewed
    18 Oct 2024
  • Accepted
    26 Dec 2024