Abstract
Multi-modality learning, exemplified by the CLIP model pre-trained on language-image pairs, has demonstrated remarkable zero-shot performance and has gained significant attention recently. However, directly applying the language-image pre-trained CLIP to medical image analysis encounters a substantial domain shift, and performance degrades severely because natural (non-medical) and medical images differ in their inherent characteristics. To address this challenge and preserve or even enhance CLIP's zero-shot capability in medical image analysis, we develop a novel approach, Core-Periphery feature alignment for CLIP (CP-CLIP), to model medical images and the corresponding clinical text jointly. To achieve this, we design an auxiliary neural network whose structure is organized by the core-periphery (CP) principle. This auxiliary CP network not only aligns medical image and text features into a unified latent space more efficiently but also ensures that the alignment is driven by principles of brain network organization. In this way, our approach effectively mitigates the domain shift and further enhances CLIP's zero-shot performance in medical image analysis. More importantly, the proposed CP-CLIP exhibits excellent explanatory capability, enabling the automatic identification of critical disease-related regions in clinical analysis. Extensive experiments and evaluation across five public datasets covering different diseases underscore the superiority of our CP-CLIP in zero-shot medical image prediction and critical feature detection, showing its promising utility for multimodal feature alignment in current medical applications.
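The two ideas named in the abstract, a core-periphery connectivity pattern and zero-shot prediction in a shared image-text latent space, can be illustrated with a minimal sketch. This is not the CP-CLIP implementation. It assumes the idealized core-periphery pattern in which every connection involves at least one core node, and it assumes image and text features have already been projected into the same space. The function names, the toy feature vectors, and the class labels are all hypothetical.

```python
import math


def core_periphery_mask(n_nodes, n_core):
    """Idealized core-periphery connectivity (an assumption, not the paper's graph).

    Nodes 0..n_core-1 are core nodes. Entry [i][j] is 1 when at least one of
    the two nodes is a core node, so periphery-periphery links are absent.
    """
    return [
        [1 if (i < n_core or j < n_core) else 0 for j in range(n_nodes)]
        for i in range(n_nodes)
    ]


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_predict(image_feature, text_features):
    """Pick the label whose text feature is most similar to the image feature.

    Both inputs are assumed to live in the same aligned latent space.
    """
    scores = {label: cosine(image_feature, feat)
              for label, feat in text_features.items()}
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Toy usage with hypothetical, hand-written feature vectors.
mask = core_periphery_mask(n_nodes=4, n_core=2)
label, scores = zero_shot_predict(
    [0.9, 0.1, 0.0],
    {"pneumonia": [1.0, 0.0, 0.0], "normal": [0.0, 1.0, 0.0]},
)
```

In this sketch the mask could gate which units of an auxiliary layer are allowed to interact, while the prediction step mirrors the usual CLIP-style zero-shot rule of choosing the class prompt closest to the image embedding.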
Recommended Citation
X. Yu et al., "Core-Periphery Multi-Modality Feature Alignment For Zero-Shot Medical Image Analysis," IEEE Transactions on Medical Imaging, Institute of Electrical and Electronics Engineers, Jan 2024.
The definitive version is available at https://doi.org/10.1109/TMI.2024.3482228
Department(s)
Computer Science
Publication Status
Early Access
Keywords and Phrases
Brain-inspired AI; CLIP; Core-Periphery; Feature Alignment; Multi-Modality; Zero-Shot
International Standard Serial Number (ISSN)
1558-254X; 0278-0062
Document Type
Article - Journal
Document Version
Citation
File Type
text
Language(s)
English
Rights
© 2025 Institute of Electrical and Electronics Engineers, All rights reserved.
Publication Date
01 Jan 2024
PubMed ID
39418140

Comments
National Institutes of Health, Grant R01AG075582