Reliable Representation Learning for Multi-Institutional Medical Image Analysis
- Title
- Reliable Representation Learning for Multi-Institutional Medical Image Analysis
- DGIST Authors
- Myeongkyun Kang ; Sang Hyun Park ; Kyong Hwan Jin
- Advisor
- Sang Hyun Park
- Co-Advisor(s)
- Kyong Hwan Jin
- Issued Date
- 2025
- Awarded Date
- 2025-08-01
- Type
- Thesis
- Description
- Debiasing, Domain adaptation, Federated learning, One-shot federated learning, Vision language model
- Abstract
-
Advancements in artificial intelligence and computer vision have brought significant innovations to the field of medical imaging. Despite these promising developments, however, such technologies often struggle to generalize across institutions. In the medical domain, multi-institutional representation learning is therefore crucial for building reliable models. To this end, our study develops reliable models by exploring debiasing, domain adaptation, federated learning, and vision-language models. First, to enable reliable representation learning on data from one or two institutions, we propose an image translation model that transfers texture information from a target image while preserving the original content; by using the generated data for training, we obtain a classifier that is robust to bias. In addition, to achieve accurate segmentation across various imaging modalities, we propose an image translation model that leverages mutual information to preserve structural consistency. This strategy facilitates effective domain adaptation, ultimately yielding robust segmentation performance across diverse domains. For training on multi-institutional medical data, federated learning offers a viable solution to privacy concerns; however, a major challenge in federated learning is texture heterogeneity across institutions. To mitigate this issue, we apply normalization at both the parameter and feature levels, significantly improving the model's accuracy and convergence speed. We further propose a one-shot federated learning method designed to reduce participation costs in collaborative training; it leverages synthetic images containing structural noise, effectively reducing communication overhead while preserving high accuracy. Lastly, we introduce a novel framework that combines a visual encoder with a large language model to simultaneously perform disease diagnosis and radiology report generation. By jointly learning the two tasks, our method achieves higher accuracy than existing approaches and facilitates large-scale multi-institutional training by leveraging textual data. The diverse methodologies proposed in this study are expected to significantly enhance the practical applicability and reliability of artificial intelligence in medical image analysis.
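As a rough illustration of the parameter- and feature-level normalization idea described in the abstract, the sketch below builds a small PyTorch classifier that applies weight normalization to its layers and group normalization to its features. This is not the thesis implementation: the class name, layer sizes, and input shape are illustrative assumptions, and standard torch.nn.GroupNorm stands in for the Adaptive Group Normalization developed in Chapter IV.

# Minimal sketch (assumed, not the thesis code): parameter-level weight
# normalization plus feature-level group normalization in a toy classifier.
import torch
import torch.nn as nn
from torch.nn.utils import weight_norm


class NormalizedClient(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            # Parameter level: reparameterize conv weights into direction
            # and magnitude (weight normalization).
            weight_norm(nn.Conv2d(1, 32, kernel_size=3, padding=1)),
            # Feature level: group statistics are computed per sample, so
            # they are not distorted by cross-client texture/intensity shifts.
            nn.GroupNorm(num_groups=8, num_channels=32),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = weight_norm(nn.Linear(32, num_classes))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.features(x).flatten(1)
        return self.classifier(h)


if __name__ == "__main__":
    model = NormalizedClient(num_classes=2)
    logits = model(torch.randn(4, 1, 64, 64))  # e.g. grayscale image patches
    print(logits.shape)  # torch.Size([4, 2])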
- Table Of Contents
-
I. Introduction 1
1 Background and Motivations 1
2 Contributions and Outline 3
3 Publications 4
3.1 Excluded Research 5
II. Debiasing for single-institutional data 8
1 Introduction 8
2 Related Works 11
2.1 Image Translation and Style Transfer 11
2.2 Texture Bias and Debiasing Methods 12
2.3 Unsupervised Domain Adaptation 13
3 Methods 13
3.1 Image Generation using Content and Texture 15
3.2 Texture Co-occurrence Loss 15
3.3 Spatial Self-Similarity Loss 16
3.4 Full Objective and Implementation Details 16
3.5 Extension to Multi-domain 17
4 Experiments 17
4.1 Image Manipulation 18
4.2 Debiasing Classification 18
4.3 Unsupervised Domain Adaptation 22
5 Results 25
5.1 Image Manipulation 25
5.2 Debiasing Classification 27
5.3 Unsupervised Domain Adaptation 34
6 Conclusion 38
III. Domain adaptation for two-institutional data 39
1 Introduction 39
2 Related Works 41
2.1 Domain Adaptation 41
2.2 Image Translation Models 42
2.3 Domain Generalization 43
2.4 Mutual Information 43
3 Method 43
3.1 Image Translation using Structure and Texture 44
3.2 Mutual Information Loss 45
3.3 Texture Co-occurrence Loss 46
3.4 Full Objective and Implementation Details 46
4 Experimental Results 47
4.1 Datasets 47
4.2 Image Translation 48
4.3 Domain Adaptation 51
4.4 Domain Generalization 54
4.5 Ablation Study 56
5 Conclusion 61
IV. Federated learning for multi-institutional data 63
1 Introduction 63
2 Related Work 66
2.1 Federated Learning 66
2.2 Normalization 67
3 Method 68
3.1 Weight Normalization (WN) 69
3.2 Adaptive Group Normalization (AGN) 69
4 Experiments 72
4.1 Evaluation scenarios 72
4.2 Datasets 72
4.3 Baseline 74
4.4 Models 74
4.5 Training Details 74
4.6 Implementation Details 75
5 Results 75
5.1 Comparison against previous FL methods 75
5.2 Fast convergence of FedNN 79
5.3 t-SNE visualization analysis 80
5.4 Experiments on prior probability shift 80
5.5 Comparison against personalized FL 81
5.6 Comparison against a larger number of clients 83
5.7 Comparison against other feature normalization 83
5.8 FedNN on centralized learning 84
6 Conclusion 84
V. One-shot federated learning for multi-institutional data with minimal cost 86
1 Introduction 86
2 Related Work 89
2.1 One-shot FL 89
2.2 Data-free KD 90
3 Method 90
3.1 Backgrounds 90
3.2 FedISCA: Federated Learning using Image Synthesis and Client Model Adaptation 93
3.3 E-FedISCA: Efficient Federated Learning using Image Synthesis and Client Model Adaptation 95
4 Experiments 97
4.1 Datasets 97
4.2 Experimental Scenarios 99
4.3 Implementation Details 101
5 Results 101
5.1 Comparison against previous one-shot FL methods 101
5.2 Ablation studies 103
5.3 Experiments on non-IID data heterogeneity 104
5.4 Experiments on model heterogeneity 105
5.5 Quantitative analysis on computation speed 106
5.6 Experiments on multi-shot 106
5.7 Experiments on larger clients 107
5.8 Impact on natural dataset 108
5.9 Limitations 109
6 Conclusion 109
VI. Language-based multi-institutional medical image analysis 111
1 Introduction 111
2 Method 113
2.1 Overview 113
2.2 Pre-training 114
2.3 Fine-tuning 114
2.4 Uniqueness 115
2.5 Evaluation 115
3 Experiments 115
3.1 Datasets 115
3.2 Preprocessing 115
3.3 Training Details 116
4 Results 116
4.1 IDH Mutation Classification 116
4.2 Zero-Shot Evaluation of Pre-trained Representations 117
4.3 Report Generation 118
5 Conclusion 119
VII. Concluding Remarks 121
1 Conclusion 121
2 Future Work 122
VIII. Acknowledgement 124
References 125
IX. Summary (in Korean) 147
- URI
-
https://scholar.dgist.ac.kr/handle/20.500.11750/59767
http://dgist.dcollection.net/common/orgView/200000888676
- Degree
- Doctor
- Publisher
- DGIST
File Downloads
- There are no files associated with this item.
