Abstract

Multi-modality learning, exemplified by the language-image pre-trained CLIP model, has demonstrated remarkable performance in enhancing zero-shot capabilities and has gained significant attention recently. However, simply applying language-image pre-trained CLIP to medical image analysis encounters substantial domain shifts, resulting in severe performance degradation due to inherent disparities between natural (non-medical) and medical image characteristics. To address this challenge and preserve or even enhance CLIP's zero-shot capability in medical image analysis, we develop a novel approach, Core-Periphery feature alignment for CLIP (CP-CLIP), to model medical images and corresponding clinical text jointly. To achieve this, we design an auxiliary neural network whose structure is organized by the core-periphery (CP) principle. This auxiliary CP network not only aligns medical image and text features into a unified latent space more efficiently but also ensures that the alignment is driven by principles of brain network organization. In this way, our approach effectively mitigates the domain shift and further enhances CLIP's zero-shot performance in medical image analysis. More importantly, the proposed CP-CLIP exhibits excellent explanatory capability, enabling the automatic identification of critical disease-related regions in clinical analysis. Extensive experiments and evaluation across five public datasets covering different diseases underscore the superiority of our CP-CLIP in zero-shot medical image prediction and critical feature detection, showing its promising utility for multimodal feature alignment in current medical applications.

Department(s)

Computer Science

Publication Status

Early Access

Comments

National Institutes of Health, Grant R01AG075582

Keywords and Phrases

Brain-inspired AI; CLIP; Core-Periphery; Feature Alignment; Multi-Modality; Zero-Shot

International Standard Serial Number (ISSN)

1558-254X; 0278-0062

Document Type

Article - Journal

Document Version

Citation

File Type

text

Language(s)

English

Rights

© 2025 Institute of Electrical and Electronics Engineers, All rights reserved.

Publication Date

01 Jan 2024

PubMed ID

39418140
