OFF-CLIP: Improving Normal Detection Confidence in Radiology CLIP with Simple Off-Diagonal Term Auto-adjustment
Issued Date
2025-09-24
Citation
International Conference on Medical Image Computing and Computer Assisted Intervention, pp.379 - 389
Type
Conference Paper
ISBN
9783032049810
ISSN
1611-3349
Abstract
Contrastive Language-Image Pre-Training (CLIP) based models enable zero-shot classification in radiology but often struggle to detect normal cases because of rigid intra-sample alignment, which leads to poor feature clustering and higher false positive and false negative rates. We propose OFF-CLIP, a simple refinement that adds an off-diagonal loss term to explicitly promote the clustering of normal samples and applies sentence-level filtering to remove typical normal phrases embedded within abnormal reports. OFF-CLIP requires no architectural changes and does not compromise abnormal classification performance. On the VinDr-CXR dataset, normal classification improves by 0.61 AUC over the state-of-the-art baseline CARZero. OFF-CLIP also improves zero-shot grounding, increasing pointing game accuracy and providing more reliable and precise anomaly localization. These results indicate that OFF-CLIP is an efficient plug-and-play enhancement to existing medical vision-language models. The code and pre-trained models are publicly available at https://github.com/Junhyun-Park01/OFF-CLIP.
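The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not give the form of the loss or of the filter, so everything here is an assumption made for illustration: the choice of binary cross-entropy over similarity logits, the rule that off-diagonal targets are set to 1 when both samples are normal, the phrase-matching filter, and all function names (`off_diagonal_targets`, `off_diagonal_loss`, `filter_normal_sentences`). The released code at the URL above is the authoritative reference.

```python
import math


def off_diagonal_targets(is_normal):
    """Build a B x B target matrix for a batch of (image, report) pairs.

    Diagonal entries are 1 (each image matches its own report), as in
    standard CLIP. ASSUMPTION: an off-diagonal entry (i, j) is also set
    to 1 when both sample i and sample j are normal, so that normal
    images and normal reports are pulled together across samples.
    """
    n = len(is_normal)
    return [
        [1.0 if i == j or (is_normal[i] and is_normal[j]) else 0.0
         for j in range(n)]
        for i in range(n)
    ]


def bce_with_logits(logit, target):
    # Numerically stable binary cross-entropy for a single logit.
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def off_diagonal_loss(logits, is_normal):
    """Mean BCE between similarity logits and the adjusted targets.

    ASSUMPTION: BCE is used here for illustration only; the abstract
    does not state which loss the paper uses.
    """
    targets = off_diagonal_targets(is_normal)
    n = len(logits)
    total = sum(
        bce_with_logits(logits[i][j], targets[i][j])
        for i in range(n)
        for j in range(n)
    )
    return total / (n * n)


def filter_normal_sentences(report, normal_phrases):
    """Drop sentences that contain any of the given normal phrases.

    ASSUMPTION: simple lowercase substring matching on a hypothetical
    phrase list; the paper's filtering step may work differently.
    """
    sentences = [s.strip() for s in report.split(".") if s.strip()]
    kept = [
        s for s in sentences
        if not any(p in s.lower() for p in normal_phrases)
    ]
    return ". ".join(kept) + ("." if kept else "")


# Toy batch: samples 0 and 2 are normal, sample 1 is abnormal.
is_normal = [True, False, True]

# Logits that also align normal samples with each other.
clustered = [[4.0, -3.0, 4.0], [-3.0, 4.0, -3.0], [4.0, -3.0, 4.0]]
# Logits that only align each image with its own report.
diagonal_only = [[4.0, -3.0, -3.0], [-3.0, 4.0, -3.0], [-3.0, -3.0, 4.0]]

loss_clustered = off_diagonal_loss(clustered, is_normal)
loss_diagonal_only = off_diagonal_loss(diagonal_only, is_normal)

filtered = filter_normal_sentences(
    "The heart size is normal. There is a right pleural effusion.",
    ["heart size is normal"],
)
```

Under these assumed targets, the logits that cluster the two normal samples receive a lower loss than the diagonal-only logits, which is the behaviour the off-diagonal term is meant to encourage. The filter keeps only the abnormal finding from the toy report.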