Authors: Özdemir Kaçer, Emine; Tuşat, Mustafa; Kılıçaslan, Murat; Memiş, Sebahattin
Dates: 2025-10-01; 2025-10-01; 2025
URI: https://hdl.handle.net/20.500.12451/14581
Abstract:
Aim: This study aimed to evaluate the accuracy and completeness of ChatGPT-4 and Google Gemini in answering questions about undescended testis (UDT), as these AI tools can sometimes provide seemingly accurate but incorrect information, warranting caution in medical applications.
Methods: The researchers independently prepared 20 questions and submitted the same set to both ChatGPT-4 and Google Gemini. A pediatrician and a pediatric surgeon evaluated the responses using the Johnson et al. scale, rating accuracy from 1 to 6 and completeness from 1 to 3; responses that lacked content received a score of 0. Statistical analyses were performed in R software (version 4.3.1) to assess differences in accuracy and consistency between the tools.
Results: Both chatbots answered all questions. ChatGPT achieved a median accuracy score of 5.5 and a mean score of 5.35, while Google Gemini had a median score of 6 and a mean of 5.5. Completeness was similar, with ChatGPT scoring a median of 3 and Google Gemini showing comparable performance.
Conclusion: ChatGPT and Google Gemini showed comparable accuracy and completeness; however, inconsistencies between accuracy and completeness suggest that these AI tools require refinement. Regular updates are essential to improve the reliability of AI-generated medical information on UDT and to ensure up-to-date, accurate responses.
Language: en
Access: info:eu-repo/semantics/openAccess
Keywords: ChatGPT; Gemini; Children; Undescended Testicle
Title: AI-Assisted knowledge assessment: comparison of ChatGPT and Gemini on undescended testicle in children
Type: Article
539397
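
Note: the abstract states only that analyses were run in R 4.3.1 and does not name the specific tests. The following minimal sketch (written in Python rather than the study's R) shows one plausible way such paired per-question scores could be compared, assuming a Wilcoxon signed-rank test; the score lists are placeholders for illustration, not study data, and the variable names are hypothetical.

# Illustrative sketch only, not the authors' analysis code.
from scipy.stats import wilcoxon

# Hypothetical accuracy scores on the Johnson et al. scale (1-6; 0 = no content)
# for the same 20 questions answered by each chatbot (placeholder values).
chatgpt_accuracy = [6, 5, 6, 5, 6, 5, 6, 6, 5, 6, 5, 6, 5, 5, 6, 5, 6, 5, 6, 5]
gemini_accuracy  = [6, 6, 5, 6, 6, 5, 6, 6, 6, 5, 6, 6, 5, 6, 6, 5, 6, 6, 5, 6]

# Paired nonparametric comparison across the 20 matched questions.
stat, p = wilcoxon(chatgpt_accuracy, gemini_accuracy)
print(f"Wilcoxon signed-rank: W={stat:.1f}, p={p:.3f}")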