Evaluation of ChatGPT's performance in Türkiye's first emergency medicine sub-specialization exam
Abstract
This study aims to evaluate ChatGPT's performance in Türkiye's Emergency Medicine Sub-Specialization Exam by assessing its success in answering both standalone and scenario-based questions through repeated testing. Materials and Methods: This study utilized 60 multiple-choice questions from the Emergency Medicine Sub-Specialization Exam, comprising 30 standalone questions (50%) and 30 scenario-based questions (50%). Each question was presented to ChatGPT five times on different days, with all tests being conducted by researchers using the same computer. The latest version of ChatGPT, based on the GPT-4 architecture and extensively trained on medical texts and journals as of October 2023, was employed to ensure the highest available level of medical knowledge. Results: ChatGPT achieved an overall accuracy rate of 85%, correctly answering 255 out of 300 questions across five trials. The accuracy rates for the five trials were 85% (51/60), 86.7% (52/60), 86.7% (52/60), 85% (51/60), and 81.7% (49/60), respectively, with no statistically significant difference between trials (p=0.94). ChatGPT demonstrated significantly higher accuracy in standalone questions compared to scenario-based questions [91.3% (137/150) vs. 78.7% (118/150), p=0.002]. Notably, ChatGPT exhibited consistent accuracy in interpreting visual data and correctly answering the two radiology-related questions across all five trials. Conclusion: ChatGPT demonstrated high performance and consistency in Türkiye's first Emergency Medicine Sub-Specialization Exam, particularly excelling in standalone questions and radiological image interpretation. While the system is generally promising, its lower performance on scenario-based questions highlights the need for further development of clinical reasoning skills. These findings suggest potential applications of artificial intelligence systems in medical education and assessment, while emphasizing the necessity for improvements in clinical decision-making abilities.
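The abstract does not name the statistical test behind the two reported p-values. As a rough check, the sketch below assumes a Pearson chi-square test without continuity correction, applied to the counts given in the Results (incorrect counts are derived by subtracting correct answers from the totals). The function names `pearson_chi2` and `chi2_sf` are illustrative and are not taken from the study.

```python
import math


def pearson_chi2(table):
    """Return the Pearson chi-square statistic and degrees of freedom
    for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df


def chi2_sf(x, df):
    """Upper-tail chi-square probability, using closed forms that
    exist only for df == 1 and for even df."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df % 2 == 0:
        half = x / 2
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    raise ValueError("closed form implemented only for df == 1 or even df")


# Rows: standalone, scenario-based. Columns: correct, incorrect.
question_type = [[137, 13], [118, 32]]
# Rows: trials 1 to 5. Columns: correct, incorrect (out of 60 each).
trials = [[51, 9], [52, 8], [52, 8], [51, 9], [49, 11]]

stat_type, df_type = pearson_chi2(question_type)
p_type = chi2_sf(stat_type, df_type)

stat_trials, df_trials = pearson_chi2(trials)
p_trials = chi2_sf(stat_trials, df_trials)

print(f"question type: chi2={stat_type:.3f}, df={df_type}, p={p_type:.4f}")
print(f"between trials: chi2={stat_trials:.3f}, df={df_trials}, p={p_trials:.4f}")
```

Under this assumption, both p-values round to the figures reported in the abstract (0.002 and 0.94), which suggests, but does not confirm, that an uncorrected chi-square test was used. Note that the test treats the 300 responses as independent, although each question was answered five times.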