Abstract
Recently, ChatGPT and GPT-4 have emerged and gained immense global attention due to their unparalleled performance in language processing. Despite demonstrating impressive capability in various open-domain tasks, their adequacy in highly specific fields like radiology remains untested. Because of its specificity and complexity, radiology presents unique linguistic phenomena distinct from open-domain data. Assessing the performance of large language models (LLMs) in such specific domains is crucial not only for a thorough evaluation of their overall performance but also for providing valuable insights into future model design, in particular whether models should be generic or domain-specific. To this end, we evaluate the performance of ChatGPT/GPT-4 on a radiology natural language inference (NLI) task and compare it to other models fine-tuned specifically on task-related data samples. We also conduct a comprehensive investigation of ChatGPT/GPT-4's reasoning ability by introducing varying levels of inference difficulty. Our results show that 1) ChatGPT and GPT-4 outperform other LLMs in the radiology NLI task and 2) BERT-based models fine-tuned specifically for the task require substantial amounts of training samples to achieve performance comparable to ChatGPT/GPT-4. These findings not only demonstrate the feasibility and promise of constructing a generic model capable of addressing various tasks across different domains but also highlight several key factors crucial for developing a unified model, particularly in a medical context, paving the way for future artificial general intelligence (AGI) systems. We release our code and data to the research community.
Recommended Citation
Z. Wu, L. Zhang, C. Cao, X. Yu, Z. Liu, L. Zhao, Y. Li, H. Dai, C. Ma, G. Li, W. Liu, Q. Li, D. Shen, X. Li, D. Zhu, and T. Liu, "Exploring the Trade-Offs: Unified Large Language Models vs Local Fine-Tuned Models for Highly-Specific Radiology NLI Task," IEEE Transactions on Big Data, vol. 11, no. 3, pp. 1027-1041, Institute of Electrical and Electronics Engineers, Jan 2025.
The definitive version is available at https://doi.org/10.1109/TBDATA.2025.3536928
Department(s)
Computer Science
Keywords and Phrases
Large language models; natural language inference; natural language processing; radiology report
International Standard Serial Number (ISSN)
2332-7790
Document Type
Article - Journal
Document Version
Citation
File Type
text
Language(s)
English
Rights
© 2025 Institute of Electrical and Electronics Engineers, All rights reserved.
Publication Date
01 Jan 2025
