Between the algorithm and clinical reasoning

Generative artificial intelligence (AI), powered by large language models (LLMs), has emerged as one of the most transformative innovations of our era. These systems, trained on vast volumes of text and data, can process and generate human language with unprecedented fluency. Although powerful models such as GPT-3 had existed since 2020, it was the release of the public ChatGPT interface (OpenAI) in November 2022 that became the catalyst of a revolution1. By broadening access to AI, ChatGPT accelerated its adoption across different sectors at an unprecedented pace, culminating in the launch of GPT-5 in August 2025. In this context of rapid transformation, medicine, and specifically nephrology, has begun to explore the real impact of this tool. Feitosa et al.2 recently published in this journal a pioneering national study evaluating the performance of GPT-3.5 and GPT-4 in solving 411 nephrology residency examination questions administered between 2010 and 2024.

This pilot study directly compared the two versions of the model across four key domains: fluid, electrolyte, and acid-base disorders; tubulointerstitial diseases; chronic kidney disease (CKD); and glomerular diseases. GPT-4 achieved an overall accuracy of 79.8%, significantly higher than GPT-3.5’s 56.3% (p < 0.001), with consistent superiority across all subtopics. GPT-4 also performed well on text-based questions (81.5%) but showed limitations on image-based questions (54.5%). The authors’ findings suggest that GPT-4 may be a promising tool for medical education, simulating clinical scenarios and assisting in the development of diagnostic reasoning2.

These results echo findings in the international literature. Studies comparing the performance of LLMs with that of physicians on certification examinations and multiple international licensing exams have shown that GPT-4 can reach accuracy levels above 90%3,4,5. Interestingly, its performance varied between exams, which may reflect not only differences in test design but also cultural and curricular particularities that shape how medical knowledge is assessed in different countries5. This underscores the importance of local validation studies, such as that by Feitosa et al., which ensure the contextual relevance of technology assessment.

The educational implications are vast. The true value of LLMs lies not in their role as repositories of correct answers, but in their potential as interactive learning partners. More broadly, AI can support educational efforts by enhancing preclinical and clinical learning, assisting diagnostic processes, making assessments more efficient, reducing faculty workload, providing feedback, and generating insights to guide curricula and evidence-based policies6. In specialties with a high cognitive load, such as nephrology, AI can act as a catalyst of knowledge, structuring and personalizing the educational experience. Instead of focusing on memorization, learners can use AI platforms to mitigate this load through simulations and adaptive scenarios7. This type of interaction, simultaneously scalable and personal, allows learners to test hypotheses and refine clinical reasoning in a safe, controlled environment, and has been associated among medical students with improvements in diagnostic accuracy and problem-solving skills8. In this sense, LLMs can serve as a supportive layer in the educational infrastructure of the specialty, assisting both learners and teachers. If used carefully, AI therefore does not replace the educator but complements them, allowing medical education to shift its emphasis from “what” to “how”.

However, human–AI collaboration is not without paradoxes. While AI can increase the rate of correctly solved complex cases, it can also induce error. A recent study revealed that although AI improved overall performance, some specialists missed questions they had previously answered correctly, precisely after being influenced by the algorithm’s incorrect suggestion9. Furthermore, excessive technological dependence can lead to “deskilling” (loss of acquired skills) and “never-skilling” (failure to develop essential competencies)10. These aspects reinforce the central thesis: AI is a powerful tool, but one that requires constant critical oversight, acting as a copilot rather than as an autopilot.

The reflection goes beyond education. In clinical practice, AI systems can already identify patients with CKD and stratify its progression, predict the risk of acute kidney injury, and optimize compatibility in kidney transplantation11,12. The increasing accuracy of these models on factual knowledge tasks, especially those based on clinical guidelines, suggests that they are approaching expert-level performance. As intrinsic errors decline, failures in responses will likely reflect ambiguities in question formulation more than the system’s limitations. In this scenario, a fundamental question arises: what will be the physician’s role when algorithms assume responsibility for memory and knowledge retrieval? The future challenges us to cultivate essentially human competencies such as critical thinking, decision-making under uncertainty, empathy, and leadership.

The integration of LLMs into health professions education and medical practice is no longer a speculative possibility, but an inevitable trajectory. The question is not “if”, but “how” we will implement this integration in an ethical, safe, regulated, and ultimately patient-centered manner. The challenge ahead is to ensure that AI incorporation occurs critically and strategically, maximizing benefits for education and clinical care while minimizing risks. Only then will we be able to transform the potential of the algorithm into concrete benefits for the community.

Data Availability

No new data were generated or analyzed in this study.

References

  • 1. Stokel-Walker C, Van Noorden R. What ChatGPT and generative AI mean for Science. Nature. 2023;614(7947): 214–6. doi: http://doi.org/10.1038/d41586-023-00340-6. PubMed PMID: 36747115.
    » https://doi.org/10.1038/d41586-023-00340-6
  • 2. Feitosa Fo H, Furtado JF, Eulálio EC, Ribeiro PV, Paiva LM, Correia MM, et al. ChatGPT performance in answering medical residency questions in nephrology: a pilot study in Brazil. Braz J Nephrol. 2025;47(4):e20240254. doi: http://doi.org/10.1590/2175-8239-jbn-2024-0254pt. PubMed PMID: 40623208.
    » https://doi.org/10.1590/2175-8239-jbn-2024-0254pt
  • 3. Katz U, Cohen E, Shachar E, Somer J, Fink A, Morse E, et al. GPT versus resident physicians: a benchmark based on official board scores. NEJM AI. 2024;1(5):1. doi: http://doi.org/10.1056/AIdbp2300192.
    » https://doi.org/10.1056/AIdbp2300192
  • 4. Balu A, Prvulovic ST, Fernandez Perez C, Kim A, Donoho DA, Keating G. Evaluating the value of AI-generated questions for USMLE step 1 preparation: a study using ChatGPT-3.5. Med Teach. 2025;47(10):1645–53. doi: http://doi.org/10.1080/0142159X.2025.2478872. PubMed PMID: 40146672.
    » https://doi.org/10.1080/0142159X.2025.2478872
  • 5. Chen Y, Huang X, Yang F, Lin H, Lin H, Zheng Z, et al. Performance of ChatGPT and Bard on the medical licensing examinations varies across different cultures: a comparison study. BMC Med Educ. 2024;24(1):1372. doi: http://doi.org/10.1186/s12909-024-06309-x. PubMed PMID: 39593041.
    » https://doi.org/10.1186/s12909-024-06309-x
  • 6. Boscardin CK, Abdulnour RE, Gin BC. Macy foundation innovation report part I: current landscape of Artificial Intelligence in medical education. Acad Med. 2025;100(9S, Suppl 1):S15–21. doi: http://doi.org/10.1097/ACM.0000000000006107. PubMed PMID: 40456178.
    » https://doi.org/10.1097/ACM.0000000000006107
  • 7. Miao J, Thongprayoon C, Craici IM, Cheungpasitporn W. How to incorporate generative artificial intelligence in nephrology fellowship education. J Nephrol. 2024;37(9):2491–7. doi: http://doi.org/10.1007/s40620-024-02165-6. PubMed PMID: 39621255.
    » https://doi.org/10.1007/s40620-024-02165-6
  • 8. Turner L, Kelleher M, Overla S, Zheng W, Gregath A, Gharib M, et al. Harnessing the generative power of AI to move closer to personalized medical education. Acad Med. 2025. doi: http://doi.org/10.1097/ACM.0000000000006185. PubMed PMID: 40795133.
    » https://doi.org/10.1097/ACM.0000000000006185
  • 9. Noda R, Tanabe K, Ichikawa D, Shibagaki Y. GPT-4’s performance in supporting physician decision-making in nephrology multiple-choice questions. Sci Rep. 2025;15(1):15439. doi: http://doi.org/10.1038/s41598-025-99774-3. PubMed PMID: 40316716.
    » https://doi.org/10.1038/s41598-025-99774-3
  • 10. Abdulnour RE, Gin B, Boscardin CK. Educational strategies for clinical supervision of Artificial Intelligence use. N Engl J Med. 2025;393(8):786–97. doi: http://doi.org/10.1056/NEJMra2503232. PubMed PMID: 40834302.
    » https://doi.org/10.1056/NEJMra2503232
  • 11. Koirala P, Thongprayoon C, Miao J, Garcia Valencia AO, Sheikh MS, Suppadungsuk S, et al. Evaluating AI performance in nephrology triage and subspecialty referrals. Sci Rep. 2025;15(1):3455. doi: http://doi.org/10.1038/s41598-025-88074-5. PubMed PMID: 39870788.
    » https://doi.org/10.1038/s41598-025-88074-5
  • 12. Singh P, Goyal L, Mallick DC, Surani SR, Kaushik N, Chandramohan D, et al. Artificial Intelligence in nephrology: clinical applications and challenges. Kidney Med. 2025;7(1):100927. doi: http://doi.org/10.1016/j.xkme.2024.100927. PubMed PMID: 39803417.
    » https://doi.org/10.1016/j.xkme.2024.100927

Publication Dates

  • Publication in this collection
    28 Nov 2025
  • Date of issue
    Oct-Dec 2025

History

  • Received
    04 Sept 2025
  • Accepted
    16 Sept 2025
Sociedade Brasileira de Nefrologia, Rua Machado Bittencourt, 205, 5º andar, conj. 53, Vila Clementino, CEP 04044-000, São Paulo, SP, Brazil. Telephones: (11) 5579-1242/5579-6937, Fax: (11) 5573-6000.
E-mail: bjnephrology@gmail.com