ABSTRACT
Objective: To analyze the agreement between responses generated by AI tools to questions about dental trauma, derived from a validated questionnaire for laypeople, and the latest International Association of Dental Traumatology (IADT) guideline recommendations.
Materials and Methods: Eleven questions were entered into the AI tools ChatGPT-3.5, Microsoft Bing-Copilot, and Google Gemini on the same date and in sequence. The answers were collected and coded as "correct" (1 point) or "incorrect" (0 points). The difference in correct responses among the AI tools was evaluated using Fisher's exact test (p<0.05). The agreement between the AI tools and IADT was assessed using the Kappa test.
Results: None of the AI tools answered all the questions correctly, and there were no significant differences between the three AI tools (p>0.05). Regarding agreement with IADT recommendations, ChatGPT-3.5 demonstrated weak agreement (k = 0.46; p = 1.122), Google Gemini showed moderate agreement (k = 0.63; p = 0.036), and Microsoft Bing-Copilot exhibited strong agreement (k = 0.81; p = 0.006). The frequency of correct answers among ChatGPT-3.5, Google Gemini, and Microsoft Bing-Copilot was 73%, 82%, and 91%, respectively.
Conclusion: Although there were no differences regarding the number of correct answers, the level of agreement with IADT recommendations varied among the AI tools, and their use by the public should be cautiously approached.
Keywords:
Artificial Intelligence; Health Information Exchange; Tooth Avulsion
Introduction
Dental traumatic injuries are listed among public health problems due to their prevalence, frequent assessments by dental professionals, and their long-term effects, which significantly impact an individual's quality of life [1]. Given their prevalence, dental traumatic injuries have become a common dental issue in many countries [2]. It is believed that in the last 20 years, over 1 billion people have experienced an event related to dental trauma, and despite this, this condition remains underestimated [3].
According to the International Association of Dental Traumatology (IADT) [4] and the Academy for Sports Dentistry [5], immediate actions taken following dental trauma are crucial as they influence the prognosis of these injuries. Additionally, they emphasize the importance of educating everyone potentially involved with dental traumatic injuries regarding their prevention and first aid. Despite this, recent systematic reviews have highlighted the lack of knowledge among parents, schoolteachers, and non-dental health professionals regarding appropriate immediate care after dental trauma [6,7,8].
Over the years, the internet and, more recently, Artificial Intelligence (AI) tools have become essential means of disseminating knowledge. The so-called chatbots are AI systems capable of replicating conversations in a human-like manner through their natural language processing (NLP) methods and Deep Machine Learning (DML) capabilities [9]. This digital apparatus can reproduce human intellect, responding to dichotomous and complex questions and making predictions and decisions [10]. In contrast to standard search engines, which provide general content from diverse sources, these AI platforms prioritize conversational systems, which enhances people's understanding [9].
In healthcare, patients often face challenges establishing immediate contact with the responsible professional, whether to address questions or in an emergency. Thus, the use of these AI tools becomes an important mechanism that can be employed by individuals seeking to clarify questions in which they lack expertise [9] and to enhance their knowledge regarding health-related issues [11].
Given the easy access to AI chatbots for people seeking dental trauma information and guidance on emergencies, evaluating the content provided by free AI tools on this subject is highly relevant. Therefore, this cross-sectional study aimed to analyze the agreement between the responses generated by three AI tools (ChatGPT-3.5, Microsoft Bing-Copilot, and Google Gemini) to binary questions regarding dental trauma, derived from a validated questionnaire for the lay public, and the latest recommendations from the International Association of Dental Traumatology (IADT) [4,12,13].
Material and Methods
Ethical Considerations
Ethical approval was not required as this study did not involve human or animal participants.
Study Design
This comparative study evaluated the content generated by three AI tools using questions related to traumatic dental injuries (TDI) derived from a validated questionnaire for laypeople. The accuracy of the answers and the percentage of correct responses provided by the AI tools were measured based on the research conducted by Ozden et al. [9].
Questionnaire
The 11 questions are derived from the TDI-Q (Traumatic Dental Injuries-Questionnaire) [14], a validated tool used to assess laypeople’s knowledge regarding traumatic dental injuries. It comprises binary responses (yes or no) based on the latest recommendations from the IADT. TDI-Q validation was conducted in six stages to ensure proper elaboration and accurate psychometric properties, which were evaluated for the Brazilian Portuguese language. The questionnaire demonstrated high quality, distinctive properties, positive and strong convergent validity, satisfactory reliability, and temporal stability, indicating that the TDI-Q is easy to understand, with clearly defined questions and answer alternatives [14].
Insertion of the Questions into the AI Tools
The TDI-Q questions were entered into the free AI tools ChatGPT-3.5, Microsoft Bing-Copilot, and Google Gemini on the same date and in sequence. Before the questions, an initial prompt was entered to prepare each tool and to prevent vague, lengthy responses and additional content: "Hello, how are you? Could you please answer the following questions with only 'yes' or 'no'?" To ensure consistency at this stage, all questions were entered by the same operator (BC), using a single Google account, in the same order for each tool: first ChatGPT, then Gemini, and finally Microsoft Bing.
Answers Collection and Scoring
The 11 answers were coded as "correct" or "incorrect" by an endodontist (APPZ), according to the IADT recommendations. The data were entered into an Excel spreadsheet. A score of 1 was assigned for correct answers, and a score of 0 was assigned for incorrect answers. Therefore, each tool's final score ranged from 0 to 11 points.
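The coding scheme above can be illustrated with a minimal Python sketch. The function names and the three-item answer lists are hypothetical placeholders and are not the TDI-Q answer key or the tools' actual responses; only the counts of 8, 9, and 10 correct answers out of 11 are taken from the results reported in this study.

```python
def score_answers(tool_answers, answer_key):
    """Code each answer as 1 (matches the key) or 0 (does not match)."""
    if len(tool_answers) != len(answer_key):
        raise ValueError("Both lists must contain one answer per question.")
    return [
        1 if given.strip().lower() == expected.strip().lower() else 0
        for given, expected in zip(tool_answers, answer_key)
    ]


def percent_correct(n_correct, n_questions):
    """Frequency of correct answers, rounded to the nearest whole percent."""
    return round(100 * n_correct / n_questions)


# Hypothetical placeholder data (NOT the real questionnaire key).
answer_key = ["yes", "no", "yes"]
tool_answers = ["Yes", "yes", "yes"]

scores = score_answers(tool_answers, answer_key)
total = sum(scores)  # final score for this tool

# Counts reported in this study: 8, 9, and 10 correct answers out of 11.
reported = [percent_correct(c, 11) for c in (8, 9, 10)]
```

With 11 questions, the final score of a tool is simply the sum of its 0/1 codes, and the frequency of correct answers is that sum divided by 11.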
Statistical Analysis
The agreement between the AI tools and the recommendations from the IADT was assessed using the Kappa test and classified as weak, moderate, or strong according to McHugh [15] (0.40-0.59: weak; 0.60-0.79: moderate; 0.80-0.90: strong; above 0.90: almost perfect). The difference in correct responses among the AI tools (ChatGPT 3.5 vs. Google Gemini, ChatGPT 3.5 vs. Microsoft Bing Copilot, and Gemini vs. Microsoft Bing Copilot) was evaluated using Fisher's exact test with a significance level of 5% (p < 0.05). For statistical analysis, SPSS (Statistical Product and Service Solutions) version 2.9 (IBM Corp., Armonk, NY, USA) was used.
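As an illustration of the agreement analysis, the following Python sketch computes Cohen's kappa for two sets of categorical answers and maps a kappa value to the categories listed above. It assumes that agreement was computed between each tool's answers and a guideline-based answer key, which the text does not describe in detail. The function names and the four-item example are hypothetical, and the analysis in this study was run in SPSS, not with this code.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equally long lists of categorical ratings."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters need the same, non-zero number of items.")
    n = len(rater_a)
    # Observed proportion of items on which the two raters agree.
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[cat] * freq_b[cat] for cat in freq_a) / (n * n)
    if expected == 1:
        raise ValueError("Kappa is undefined when chance agreement equals 1.")
    return (observed - expected) / (1 - expected)


def agreement_level(kappa):
    """Map kappa to the categories quoted in the text (after McHugh [15])."""
    if kappa > 0.90:
        return "almost perfect"
    if kappa >= 0.80:
        return "strong"
    if kappa >= 0.60:
        return "moderate"
    if kappa >= 0.40:
        return "weak"
    return "below weak (category not listed in the text)"


# Hypothetical placeholder answers (NOT the real questionnaire data).
reference = ["yes", "no", "yes", "no"]
tool = ["yes", "no", "no", "no"]
kappa = cohen_kappa(reference, tool)

# Kappa values reported in this study, mapped to their categories.
levels = [agreement_level(k) for k in (0.46, 0.63, 0.81)]
```

Because kappa corrects the observed agreement for agreement expected by chance, it can be considerably lower than the raw percentage of correct answers.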
Results
Table 1 shows the accuracy rates of the tools. Microsoft Bing-Copilot answered 10 out of 11 questions correctly, Google Gemini answered 9 out of 11, and ChatGPT 3.5 answered 8 out of 11 questions correctly. None of the three AI tools tested answered the tenth question correctly. For the comparison analysis between the tools regarding the frequency of correct answers, no statistically significant differences were found among the AI tools in any of the comparisons (p>0.05).
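The pairwise comparison can be re-computed from the counts above with a short, standard-library-only Python sketch of the two-sided Fisher's exact test. It assumes that each comparison uses a 2x2 table of correct versus incorrect answers for two tools. The function name is hypothetical, and this is an independent recomputation, not the SPSS procedure used in this study.

```python
from itertools import combinations
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    # Unnormalised hypergeometric weight of the table whose top-left cell
    # is x, with all margins fixed. Integers avoid floating-point ties.
    def weight(x):
        return comb(row1, x) * comb(row2, col1 - x)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = weight(a)
    # Sum the weights of all tables at least as unlikely as the observed one.
    extreme = sum(
        weight(x) for x in range(lowest, highest + 1) if weight(x) <= observed
    )
    return extreme / comb(n, col1)


TOTAL = 11  # number of questions
correct = {"ChatGPT-3.5": 8, "Google Gemini": 9, "Microsoft Bing-Copilot": 10}

p_values = {}
for (tool_1, c1), (tool_2, c2) in combinations(correct.items(), 2):
    p = fisher_exact_two_sided(c1, TOTAL - c1, c2, TOTAL - c2)
    p_values[(tool_1, tool_2)] = p
    print(f"{tool_1} vs. {tool_2}: p = {p:.3f}")
```

Under these assumptions, all three pairwise p-values are above 0.05, which is consistent with the absence of significant differences reported above.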
The analysis of agreement between the responses from the AI tools and the recommendations from IADT revealed the following results: ChatGPT 3.5 showed weak agreement (k = 0.46, p=1.122), while Google Gemini exhibited moderate agreement (k = 0.63, p=0.036). In contrast, Microsoft Bing-Copilot demonstrated strong agreement (k = 0.81, p=0.006) (Table 2).
Discussion
Regarding strategies to increase public awareness about traumatic dental injuries (TDI), the use of informational resources provided periodically and repetitively has been shown to improve knowledge levels [7]. Authors argue that digital technology and quality social media content may be crucial in assisting the general population in a traumatic dental incident [16,17]. In this context, AI chatbot tools emerge as accessible and constantly available sources of knowledge, as they can, unlike traditional search engines, understand and generate human-like text, providing personalized feedback and responses conversationally, making them more attractive [18,19,20].
Given the potential for AI tools to be used by the public as reliable sources of health information, several studies have focused on the validity and accuracy of such content [9,19]. Ozden et al. assessed the consistency and accuracy of responses provided by Google Bard (now Google Gemini) and ChatGPT regarding 25 dichotomous questions (Yes/No) about dental trauma generated by experts [9]. As in the current study, their results showed that Google Bard (Google Gemini) demonstrated better accuracy and agreement with IADT recommendations than ChatGPT. Portilla et al. assessed the accuracy and consistency of 99 responses provided by Google Gemini to TDI dichotomous questions, comparing them to the recommendations from the European Society of Endodontology's position statement on the management of traumatized permanent teeth (TPT). Google Gemini achieved an overall accuracy rate of 80.8%, similar to the 82% achieved by the same tool in our study [21].
This study offers the unique aspect of accessing such content based on a validated instrument suitable for the lay public, thereby enhancing the robustness of the data obtained [14]. Furthermore, considering the increasing number of available tools, this study evaluated and compared three free AI tools.
The IADT recommendations for properly managing dental trauma are a valuable source for increasing knowledge regarding prevention, management, and treatment, thereby improving the chances of more favorable outcomes, and should be regarded as the best current evidence [4]. Disseminating such recommendations leads to better prognosis and improved patient outcomes [22]. The results of this study indicated that there were no significant differences in correctness rates among the three AI tools (p > 0.05), with only Microsoft Bing-Copilot showing strong agreement (Kappa = 0.81) with the IADT. However, none of the AI tools assessed in this study showed perfect agreement with the IADT recommendations, which is concerning, as these tools should provide valid information based on evidence-based care practices.
Interestingly, none of the AI tools answered question 10 correctly. This error is likely due to the recent update in the latest IADT guidelines regarding the best medium for transporting an avulsed tooth. As chatbots become increasingly common for guidance and health information exchange, the information they generate needs constant monitoring, evaluation, and regular updating. Errors or false medical information provided by AI can lead to wide-ranging and harmful consequences [23]. According to the World Health Organization (WHO), in its recent regulatory considerations on the use of AI in healthcare, regulatory policies should be developed in collaboration with AI developers and regulators to ensure the safe use of AI in health [24]. Given the results obtained in this study, the use of AI tools by the general population seeking guidance on dental trauma should be approached with caution.
Regarding the language used, the questions were entered into the AI tools in Brazilian Portuguese, since the questionnaire's psychometric properties were evaluated for the Brazilian population. Using another language could affect the questionnaire's reliability. It is recommended that the questionnaire be validated in other languages and that further studies be conducted to assess AI performance across different languages.
This study has the limitation of not analyzing the consistency of responses over time, as the questions were not posed at different intervals, thus lacking data on response stability and patterns of updates based on their type and learning. On the other hand, it benefits from using a validated questionnaire for laypeople, designed to mimic the questions that this audience would pose to AI tools regarding dental trauma.
Conclusion
There was no significant difference between the AI tools in the number of correct answers. However, the level of agreement between the AI tools' responses and the IADT recommendations varied. The use of AI tools by the general public should be approached with caution, and individuals should seek guidance from qualified professionals. Additional studies are needed to assess the accuracy of these tools regarding traumatic dental injuries.
Financial Support
This work was partially financed by Fundação de Amparo à Pesquisa do Estado do Rio de Janeiro – FAPERJ (E-26/210.933/2024 and E-26/202.584/2024) and Conselho Nacional de Desenvolvimento Científico e Tecnológico – CNPq (CNPq 310225/2020-5 and CNPq 307901/2025-4).
Data Availability
The data supporting the findings of this study can be made available upon request to the corresponding author.
References
1 Glendor U. Epidemiology of traumatic dental injuries: A 12-year review of the literature. Dent Traumatol 2008; 24(6):603-611. https://doi.org/10.1111/j.1600-9657.2008.00696.x
2 Bezerra EFN, Herkrath FJ, Vettore MV, Rebelo MAB, de Queiroz AC, Rebelo Vieira JM, et al. Contextual and individual factors associated with traumatic dental injuries in deprived 12-year-old schoolchildren: A cohort study. Dent Traumatol 2024; 40(5):546-556. https://doi.org/10.1111/edt.12955
3 Petti S, Glendor U, Andersson L. World traumatic dental injury prevalence and incidence, a meta-analysis-One billion living people have had traumatic dental injuries. Dent Traumatol 2018; 34(2):71-86. https://doi.org/10.1111/edt.12389
4 Levin L, O'Connell AC, Tewari N, Mills SC, Stasiuk H, Roettger M, et al. The International Association of Dental Traumatology (IADT) and the Academy for Sports Dentistry (ASD) guidelines for prevention of traumatic dental injuries: Part 1: General introduction. Dent Traumatol 2024; 40(Suppl 1):1-3. https://doi.org/10.1111/edt.12923
5 Tewari N, Abbott PV, O'Connell AC, Mills SC, Stasiuk H, Roettger M, et al. The International Association of Dental Traumatology (IADT) and the Academy for Sports Dentistry (ASD) guidelines for prevention of traumatic dental injuries: Part 10: First aid education. Dent Traumatol 2024; 40(Suppl 1):22-24. https://doi.org/10.1111/edt.12931
6 Tewari N, Jonna I, Mathur VP, Goel S, Ritwik P, Rahul M, et al. Global status of knowledge for the prevention and emergency management of traumatic dental injuries among non-dental healthcare professionals: A systematic review and meta-analysis. Injury 2021; 52(8):2025-2037. https://doi.org/10.1016/j.injury.2021.06.006
7 Cantile T, Lombardi S, Quaraniello M, Riccitiello F, Leuci S, Riccitiello A. Parental knowledge, attitude and practice regarding paediatric dental trauma. A systematic review. Eur J Paediatr Dent 2025; 26(1):8-18. https://doi.org/10.23804/ejpd.2023.2050
8 Tewari N, Johnson RM, Mathur VP, Rahul M, Goel S, Ritwik P, et al. Global status of knowledge for prevention and emergency management of traumatic dental injuries in sports persons and coaches: A systematic review. Dent Traumatol 2021; 37(2):196-207. https://doi.org/10.1111/edt.12629
9 Ozden I, Gokyar M, Ozden ME, Sazak Ovecoglu H. Assessment of artificial intelligence applications in responding to dental trauma. Dent Traumatol 2024; 40(6):722-729. https://doi.org/10.1111/edt.12965
10 Alhaidry HM, Fatani B, Alrayes JO, Almana AM, Alfhaed NK. ChatGPT in dentistry: A comprehensive review. Cureus 2023; 15(4):e38317. https://doi.org/10.7759/cureus.38317
11 Kirchner GJ, Kim RY, Weddle JB, Bible JE. Can artificial intelligence improve the readability of patient education materials? Clin Orthop Relat Res 2023; 481(11):2260-2267. https://doi.org/10.1097/CORR.0000000000002668
12 Bourguignon C, Cohenca N, Lauridsen E, Flores MT, O'Connell AC, Day PF, et al. International Association of Dental Traumatology guidelines for the management of traumatic dental injuries: 1. Fractures and luxations. Dent Traumatol 2020; 36(4):314-330. https://doi.org/10.1111/edt.12578
13 Day PF, Flores MT, O'Connell AC, Abbott PV, Tsilingaridis G, Fouad AF, et al. International Association of Dental Traumatology guidelines for the management of traumatic dental injuries: 3. Injuries in the primary dentition. Dent Traumatol 2020; 36(4):343-359. https://doi.org/10.1111/edt.12576
14 Magno MB, Jural LA, Ribeiro-Lages MB, Silva K, Coqueiro RS, Pithon MM, et al. Development and psychometric properties of a questionnaire about knowledge of lay people about traumatic dental injury. Dent Traumatol 2024; 40(2):171-177. https://doi.org/10.1111/edt.12892
15 McHugh ML. Interrater reliability: The kappa statistic. Biochem Med 2012; 22(3):276-282.
16 Budak L, Levin L. The importance of immediate dental trauma care: Comprehensive education, treatment approaches, and their profound impact on patients' quality of life. Dent Traumatol 2024; 40(5):477-481. https://doi.org/10.1111/edt.12987
17 Nangia D, Saini A, Krishnan A, Sharma S, Kumar V, Chawla A, et al. Quality and accuracy of patient-oriented Web-based information regarding tooth avulsion. Dent Traumatol 2022; 38(4):299-308. https://doi.org/10.1111/edt.12741
18 Rizvi DS. Health education and global health: Practices, applications, and future research. J Educ Health Promot 2022; 11:262. https://doi.org/10.4103/jehp.jehp_218_22
19 Johnson D, Goodman R, Patrinely J, Stone C, Zimmerman E, Donald R, et al. Assessing the accuracy and reliability of AI-generated medical responses: An evaluation of the Chat-GPT model. Res Sq 2023; rs.3.rs-2566942. https://doi.org/10.21203/rs.3.rs-2566942/v1
20 Ghanem YK, Rouhi AD, Al-Houssan A, Saleh Z, Moccia MC, Joshi H, et al. Dr. Google to Dr. ChatGPT: Assessing the content and quality of artificial intelligence-generated medical information on appendicitis. Surg Endosc 2024; 38(5):2887-2893. https://doi.org/10.1007/s00464-024-10739-5
21 Portilla ND, Garcia-Font M, Nagendrababu V, Abbott PV, Sanchez JAG, Abella F. Accuracy and consistency of Gemini responses regarding the management of traumatized permanent teeth. Dent Traumatol 2025; 41(2):171-177. https://doi.org/10.1111/edt.13004
22 Budak L, Levin L. The importance of immediate dental trauma care: Comprehensive education, treatment approaches, and their profound impact on patients' quality of life. Dent Traumatol 2024; 40(5):477-481. https://doi.org/10.1111/edt.12987
23 Walker HL, Ghani S, Kuemmerli C, Nebiker CA, Müller BP, Raptis DA, et al. Reliability of medical information provided by ChatGPT: Assessment against clinical guidelines and patient information quality instrument. J Med Internet Res 2023; 25:e47479. https://doi.org/10.2196/47479
24 World Health Organization. Regulatory considerations on artificial intelligence for health. World Health Organization; 2023. Available from: https://iris.who.int/handle/10665/373421 [Accessed on December 5, 2024].
Edited by
Academic Editor: Alessandro Leite Cavalcanti
Publication Dates
Publication in this collection: 28 Nov 2025
Date of issue: 2026
History
Received: 20 Jan 2025
Accepted: 06 Feb 2025
