A comparative analysis of the performance of large language models in the basic life support exam: comprehensive evaluation of ChatGPT-4o, Gemini 2.0, Claude 3.5, and DeepSeek R1
Abstract
Considering the growing role artificial intelligence technologies play in medical education, this study provides a comparative evaluation of the performance of the large language models ChatGPT-4o, Gemini 2.0, Claude 3.5, and DeepSeek R1 on the Basic Life Support (BLS) Exam. Materials and Methods: In this observational study, we presented the four large language models with 25 multiple-choice questions based on the American Heart Association (AHA) guidelines. Questions were divided into two categories: knowledge-based (n = 14, 56%) and case-based (n = 11, 44%). To assess response consistency, each question was presented to all models on three separate days. The models' accuracy was evaluated against overall accuracy, strict accuracy, and ideal accuracy criteria. Results: In the overall accuracy assessment, ChatGPT-4o and DeepSeek R1 answered 100% of the questions correctly, while Gemini 2.0 and Claude 3.5 each achieved a 96% success rate. All models performed perfectly on the case-based questions. On the knowledge-based questions, ChatGPT-4o and DeepSeek R1 scored full marks, while Gemini 2.0 and Claude 3.5 achieved 90.9% success. Statistical analysis showed no significant difference between the models' results (p = 0.368). Discussion: Large language models show high accuracy rates on BLS material. These technologies can serve supportive roles in medical education, but human supervision remains critical in clinical decision-making.
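The abstract names three accuracy criteria and a non-significant between-model comparison, but does not spell out the scoring definitions or the statistical test. The Python sketch below illustrates one plausible scheme under stated assumptions: "overall" accuracy is taken as correct in at least one of the three runs, "strict" as correct in all three runs, and the between-model comparison is run as a chi-square test on a models-by-outcome contingency table. The run data are illustrative toy values, not the study's responses, and the definitions and test choice are assumptions rather than the authors' method.

```python
from scipy.stats import chi2_contingency

# Three runs (one per day) for each of 25 questions per model.
# True = correct answer. Toy data for illustration only -- these are
# NOT the study's actual responses.
runs = {
    "ChatGPT-4o":  [[True, True, True]] * 25,
    "DeepSeek R1": [[True, True, True]] * 25,
    "Gemini 2.0":  [[True, True, True]] * 24 + [[False, False, False]],
    "Claude 3.5":  [[True, True, True]] * 24 + [[False, True, False]],
}

def overall_accuracy(questions):
    # Assumed definition: correct in at least one of the three runs.
    return sum(any(r) for r in questions) / len(questions)

def strict_accuracy(questions):
    # Assumed definition: correct in all three runs.
    return sum(all(r) for r in questions) / len(questions)

for model, qs in runs.items():
    print(f"{model}: overall={overall_accuracy(qs):.0%}, "
          f"strict={strict_accuracy(qs):.0%}")

# One way to test for a difference between models: a chi-square test of
# independence on a (model x correct/incorrect) table of strict results.
table = [[sum(all(r) for r in qs), sum(not all(r) for r in qs)]
         for qs in runs.values()]
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2={chi2:.2f}, dof={dof}, p={p:.3f}")
```

In this toy data, Claude 3.5's mixed-run question shows why the two criteria can diverge: it counts as correct under the assumed "overall" rule but not under the "strict" rule. The p-value printed here comes from the toy table and should not be expected to reproduce the paper's reported p = 0.368.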