Assessing the accuracy and reproducibility of artificial intelligence-generated medical responses by ChatGPT on Scheuermann's kyphosis

Esra Giray¹, Özge Gülsüm Illeez¹, Merve Damla Korkmaz², Nalan Capan³, Evrim Karadag Saygi⁴, Resa Aydın³

¹ Department of Physical Medicine and Rehabilitation, University of Health Sciences, Fatih Sultan Mehmet Training and Research Hospital, İstanbul, Türkiye
² Department of Physical Medicine and Rehabilitation, Biruni University Faculty of Medicine, İstanbul, Türkiye
³ Department of Physical Medicine and Rehabilitation, İstanbul University, İstanbul Faculty of Medicine, İstanbul, Türkiye
⁴ Department of Physical Medicine and Rehabilitation, Marmara University School of Medicine, İstanbul, Türkiye

DOI: 10.5606/tftrd.2025.15876

Objectives: The study aimed to measure the performance and reproducibility of artificial intelligence in answering frequently asked questions about Scheuermann's kyphosis and to compare its answers to case-based questions with the SOSORT (International Scientific Society on Scoliosis Orthopaedic and Rehabilitation Treatment) consensus.

Materials and methods: In this cross-sectional study, 75 questions adapted from frequently asked questions about Scheuermann's kyphosis were each submitted to ChatGPT twice. The similarity of the two responses was assessed to investigate reproducibility. The accuracy of the responses was scored on a rating scale. Four case studies from the end of the 7th SOSORT consensus paper on the conservative treatment of idiopathic and Scheuermann's kyphosis were also presented to ChatGPT.

Results: ChatGPT provided correct and comprehensive answers to 43 (57.33%) questions, correct but not comprehensive answers to 29 (38.67%) questions, and partially incorrect answers to 3 (4%) questions. ChatGPT performed best in the quality-of-life category, where 18/19 (94.74%) answers received a score of 1 (correct). It performed worst in the diagnosis category, with 3/8 (37.5%) correct and comprehensive answers, and in the treatment and follow-up category, with 9/24 (37.5%) correct and comprehensive answers. ChatGPT provided reproducible answers to 92% of the questions. Its treatment recommendations for all four case studies were incorrect.
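
The proportions in the Results can be rechecked from the reported counts. The following is a minimal sketch, not part of the study's analysis; the variable names and the `pct` helper are hypothetical and introduced only for illustration, and the counts are those stated in the Results.

```python
# Counts as reported in the Results (75 questions in total).
TOTAL_QUESTIONS = 75
counts = {
    "correct and comprehensive": 43,
    "correct but not comprehensive": 29,
    "partially incorrect": 3,
}


def pct(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# Overall distribution of accuracy ratings.
overall = {label: pct(n, TOTAL_QUESTIONS) for label, n in counts.items()}

# Category-level proportions, using the numerators and denominators
# given in the Results.
by_category = {
    "quality of life": pct(18, 19),
    "diagnosis": pct(3, 8),
    "treatment and follow-up": pct(9, 24),
}

for label, value in {**overall, **by_category}.items():
    print(f"{label}: {value}%")
```

The three rating counts sum to the 75 questions, so the overall percentages cover every question exactly once.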

Conclusion: While ChatGPT can provide valuable general information regarding Scheuermann's kyphosis, its ability to offer accurate treatment-related advice is limited.

Keywords: Artificial intelligence, ChatGPT, large language model, machine learning, Scheuermann's disease, Scheuermann's kyphosis