Detail View

DC Field  Value
dc.contributor.author  Arsalane, Wafa
dc.contributor.author  Chikontwe, Philip
dc.contributor.author  Luna, Acevedo Miguel Andres
dc.contributor.author  Kang, Myeongkyun
dc.contributor.author  Park, Sang Hyun
dc.date.accessioned  2025-06-12T10:40:18Z
dc.date.available  2025-06-12T10:40:18Z
dc.date.created  2025-03-13
dc.date.issued  2024-10-06
dc.identifier.isbn  9783031791031
dc.identifier.issn  1865-0929
dc.identifier.uri  https://scholar.dgist.ac.kr/handle/20.500.11750/58416
dc.description.abstract  Given a medical image and a question in natural language, medical VQA systems are required to predict clinically relevant answers. Integrating information from visual and textual modalities requires complex fusion techniques due to the semantic gap between images and text, as well as the diversity of medical question types. To address this challenge, we propose aligning image and text features in VQA models by using text from medical reports to provide additional context during training. Specifically, we introduce a transformer-based alignment module that learns to align the image with the textual context, thereby incorporating supplementary medical features that can enhance the VQA model’s predictive capabilities. During the inference stage, VQA operates robustly without requiring any medical report. Our experiments on the Rad-Restruct dataset demonstrate a significant impact of the proposed strategy and show promising improvements, positioning our approach as competitive with state-of-the-art methods in this task. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
dc.language  English
dc.publisher  Medical Image Computing and Computer Assisted Intervention Society
dc.relation.ispartof  Communications in Computer and Information Science
dc.title  Context-Guided Medical Visual Question Answering
dc.type  Conference Paper
dc.identifier.doi  10.1007/978-3-031-79103-1_25
dc.identifier.wosid  001455743400025
dc.identifier.scopusid  2-s2.0-85219214441
dc.identifier.bibliographicCitation  Arsalane, Wafa. (2024-10-06). Context-Guided Medical Visual Question Answering. 1st MICCAI Student Board Workshop on Empowering Medical Information Computing and Research through Early-Career Expertise, EMERGE 2024, Held in Conjunction with MICCAI 2024, 245–255. doi: 10.1007/978-3-031-79103-1_25
dc.identifier.url  https://miccaimsb.github.io/emerge/2024.html#schedule
dc.citation.conferenceDate  2024-10-06
dc.citation.conferencePlace  MR
dc.citation.conferencePlace  Marrakesh
dc.citation.endPage  255
dc.citation.startPage  245
dc.citation.title  1st MICCAI Student Board Workshop on Empowering Medical Information Computing and Research through Early-Career Expertise, EMERGE 2024, Held in Conjunction with MICCAI 2024
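The alignment strategy summarized in dc.description.abstract can be pictured with a minimal sketch (PyTorch, not the authors' released code): image tokens cross-attend to report-text tokens during training, and the report branch is simply omitted at inference. The module name, dimensions, and residual design below are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class ContextAlignment(nn.Module):
    """Hypothetical sketch of a transformer-based alignment module:
    image tokens attend to report-text context (training only)."""

    def __init__(self, dim: int = 768, heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, img_tokens, report_tokens=None):
        # At inference no medical report is available; the visual
        # features pass through unchanged.
        if report_tokens is None:
            return img_tokens
        # Image tokens query the report text; a residual connection
        # keeps the original visual features intact.
        aligned, _ = self.cross_attn(img_tokens, report_tokens, report_tokens)
        return self.norm(img_tokens + aligned)

# Usage: supply report embeddings during training; omit them at inference.
img = torch.randn(2, 196, 768)   # (batch, image patches, dim) - assumed shapes
rep = torch.randn(2, 64, 768)    # (batch, report tokens, dim)
module = ContextAlignment()
print(module(img, rep).shape)    # training path:  torch.Size([2, 196, 768])
print(module(img).shape)         # inference path: torch.Size([2, 196, 768])
```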

File Downloads

  • There are no files associated with this item.


Related Researcher

Park, Sang Hyun (박상현)

Department of Robotics and Mechatronics Engineering
