Masters Theses
Abstract
This work presents a structured benchmarking study of multimodal large language models (MLLMs) applied to electrocardiogram (ECG) interpretation tasks. We evaluate three representative architectures (MedGemma, HuatuoGPT-Vision, and LLaVA-Med) across progressive experimental stages involving text-only structured prompt normalization, text–image fusion with ECG plots, and full multimodal fusion incorporating time-series signals. A standardized five-section cardiology prompt was designed to enforce consistent output structure and SCP-code alignment, enabling reproducible metric computation across models. Quantitative evaluation using BERTScore, token-level F1, and diagnostic accuracy demonstrates that HuatuoGPT-Vision achieves the highest semantic and diagnostic alignment, while MedGemma exhibits superior formatting stability and reproducibility. In contrast, LLaVA-Med shows limited ability to handle extended clinical prompts, yielding a high invalid-response rate. Preliminary multimodal results suggest that augmenting textual and visual prompts with ECG time-series data does not enhance diagnostic precision or semantic coherence, which points to image-forward training practices. Overall, the findings highlight the critical role of structured reasoning and modality fusion in improving the interpretability and reliability of medical MLLMs, providing a reproducible framework for future ECG-centric language–vision model evaluation.
Advisor(s)
Yang, Huiyuan
Committee Member(s)
Maity, Suman
Yu, Xiaowei
Department(s)
Computer Science
Degree Name
M.S. in Computer Science
Publisher
Missouri University of Science and Technology
Publication Date
Fall 2025
Pagination
ix, 57 pages
Note about bibliography
Includes bibliographical references (pages 54-56)
Rights
© 2026 Prisha Anil, All Rights Reserved
Document Type
Thesis - Open Access
File Type
text
Language
English
Thesis Number
T 12555
Recommended Citation
Anil, Prisha, "Performance of Standard Medical MLLMs on ECG Image Data" (2025). Masters Theses. 8270.
https://scholarsmine.mst.edu/masters_theses/8270
