Exploring The Trade-Offs: Unified Large Language Models Vs Local Fine-Tuned Models For Highly-Specific Radiology NLI Task

Zihao Wu
Lu Zhang
Chao Cao
Xiaowei Yu, Missouri University of Science and Technology
Zhengliang Liu
Lin Zhao
Yiwei Li
Haixing Dai
Chong Ma
Gang Li
Wei Liu
Quanzheng Li
Dinggang Shen
Xiang Li
Dajiang Zhu
Tianming Liu

Abstract

Recently, ChatGPT and GPT-4 have emerged and gained immense global attention due to their unparalleled performance in language processing. Despite demonstrating impressive capability in various open-domain tasks, their adequacy in highly specific fields like radiology remains untested. Radiology presents unique linguistic phenomena distinct from open-domain data due to its specificity and complexity. Assessing the performance of large language models (LLMs) in such specific domains is crucial not only for a thorough evaluation of their overall performance but also for providing valuable insights into future model design directions: whether model design should be generic or domain-specific. To this end, in this study, we evaluate the performance of ChatGPT/GPT-4 on a radiology natural language inference (NLI) task and compare it to other models fine-tuned specifically on task-related data samples. We also conduct a comprehensive investigation of ChatGPT/GPT-4's reasoning ability by introducing varying levels of inference difficulty. Our results show that 1) ChatGPT and GPT-4 outperform other LLMs in the radiology NLI task and 2) other specifically fine-tuned BERT-based models require a significant number of data samples to achieve performance comparable to ChatGPT/GPT-4. These findings not only demonstrate the feasibility and promise of constructing a generic model capable of addressing various tasks across different domains but also highlight several key factors crucial for developing a unified model, particularly in a medical context, paving the way for future artificial general intelligence (AGI) systems. We release our code and data to the research community.

Recommended Citation

Z. Wu, L. Zhang, C. Cao, X. Yu, Z. Liu, L. Zhao, Y. Li, H. Dai, C. Ma, G. Li, W. Liu, Q. Li, D. Shen, X. Li, D. Zhu, and T. Liu, "Exploring The Trade-Offs: Unified Large Language Models Vs Local Fine-Tuned Models For Highly-Specific Radiology NLI Task," IEEE Transactions on Big Data, vol. 11, no. 3, pp. 1027-1041, Institute of Electrical and Electronics Engineers, Jan 2025.

The definitive version is available at https://doi.org/10.1109/TBDATA.2025.3536928

Department(s)

Computer Science

Keywords and Phrases

Large language models; natural language inference; natural language processing; radiology report

International Standard Serial Number (ISSN)

2332-7790

Document Type

Article - Journal

Document Version

Citation

File Type

text

Language(s)

English

Publication Date

01 Jan 2025

Computer Science Faculty Research & Creative Works
