Vision Transformer 기반 Trimap 분할 기법 (Trimap Segmentation Method Based on Vision Transformer)

Title
Vision Transformer 기반 Trimap 분할 기법
Alternative Title
Trimap Segmentation Method Based on Vision Transformer
Issued Date
2025-06-26
Citation
The Institute of Electronics and Information Engineers (IEIE) 2025 Summer Annual Conference, pp. 2219 - 2222
Type
Conference Paper
Abstract

In this study, we propose a trimap prediction model aimed at achieving more accurate foreground extraction compared to conventional background removal techniques. The proposed model is based on a click-free structure that does not rely on user clicks or point-based interactions and is constructed by modifying a Vision Transformer (ViT) backbone to accept both an image and a segmentation mask as inputs. The output consists of a three-class trimap comprising background, unknown, and foreground regions. For training, we constructed a large-scale trimap dataset by integrating real-world datasets such as AM-2K, AIM-500, P3M-10K, and Composition-431K [1]. In addition, a composite loss function combining Normalized Focal Loss and Unknown Region Distance Transform Loss was applied to encourage the model to focus more effectively on the unknown regions. Experimental results demonstrate that the proposed model accurately predicts trimap boundaries without any user interaction and effectively enhances segmentation performance.
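The abstract names a composite loss combining a Normalized Focal Loss with an Unknown Region Distance Transform Loss, without giving its exact form. The sketch below shows one plausible NumPy formulation under stated assumptions: a per-pixel 3-class softmax output, a precomputed per-pixel distance map to the unknown-region boundary, and illustrative weighting and mixing coefficients (`alpha`, `beta`, the `1 + dist` weight, and all function names are hypothetical, not the paper's actual definitions):

```python
import numpy as np

def normalized_focal_loss(probs, labels, gamma=2.0, eps=1e-8):
    """Focal loss normalized by the total focal weight over all pixels.

    probs:  (H, W, C) softmax probabilities over {background, unknown, foreground}
    labels: (H, W) integer class map
    """
    h, w = labels.shape
    # Probability assigned to the true class at each pixel (fancy indexing)
    p_t = probs[np.arange(h)[:, None], np.arange(w)[None, :], labels]
    focal_w = (1.0 - p_t) ** gamma
    # Dividing by the summed focal weight keeps the loss scale stable
    # as easy pixels stop contributing during training.
    return np.sum(focal_w * -np.log(p_t + eps)) / (np.sum(focal_w) + eps)

def unknown_region_dt_loss(probs, labels, dist_map, eps=1e-8):
    """Cross-entropy weighted by a distance transform, so that pixels
    farther from the unknown-region boundary carry a larger penalty.

    dist_map: (H, W) precomputed distance to the unknown-region boundary
              (hypothetical input; e.g. a Euclidean distance transform).
    """
    h, w = labels.shape
    p_t = probs[np.arange(h)[:, None], np.arange(w)[None, :], labels]
    weights = 1.0 + dist_map  # illustrative weighting scheme
    return np.mean(weights * -np.log(p_t + eps))

def composite_loss(probs, labels, dist_map, alpha=1.0, beta=0.5):
    """Weighted sum of the two terms; alpha and beta are assumed mixing weights."""
    return (alpha * normalized_focal_loss(probs, labels)
            + beta * unknown_region_dt_loss(probs, labels, dist_map))
```

In practice a loss of this shape would be implemented on the framework's tensors (e.g. over a batch of logits) and the distance map computed once per ground-truth trimap; the sketch only illustrates how the two terms could combine to concentrate gradient on the unknown band.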

URI
https://scholar.dgist.ac.kr/handle/20.500.11750/60073
Publisher
The Institute of Electronics and Information Engineers (대한전자공학회)

File Downloads

  • There are no files associated with this item.


Related Researcher

Lee, Sang-Heon (이상헌)

Division of Mobility Technology

