Video Instance Segmentation with Context-Aware Representations

Title
Video Instance Segmentation with Context-Aware Representations
DGIST Authors
Jiwan Seo; Sunghoon Im
Advisor
Sunghoon Im
Issued Date
2026
Awarded Date
2026-02-01
Type
Thesis
Description
Video instance segmentation, Instance tracking, Instance segmentation, Representation learning
Abstract

We introduce Context-Aware Video Instance Segmentation (CAVIS), a framework that improves instance association by integrating contextual information adjacent to each object. To extract and use this information efficiently, we propose the Context-Aware Instance Tracker (CAIT), which merges the context surrounding each instance with its core instance features to improve tracking accuracy. We also design the Prototypical Cross-frame Contrastive (PCC) loss, which enforces consistency of object-level features across frames and thereby significantly improves matching accuracy. CAVIS outperforms state-of-the-art methods on all benchmark datasets in video instance segmentation (VIS) and video panoptic segmentation (VPS). Notably, it excels on the OVIS dataset, known for its particularly challenging videos.

Keywords: Video instance segmentation, Instance segmentation, Representation learning, Instance tracking
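The abstract describes the PCC loss only at a high level, so the following is a minimal illustrative sketch of the general idea of a prototype-based cross-frame contrastive objective, not the thesis's actual formulation. It assumes that an instance prototype is the mask-weighted average of per-pixel features, and that prototypes of the same instance in two frames form the positive pair in an InfoNCE-style loss. The function names, the `temperature` value, and the array layouts are all hypothetical choices made for this example.

```python
import numpy as np


def instance_prototypes(features, masks):
    """Masked average pooling (assumed prototype definition).

    features: (H, W, C) per-pixel embeddings for one frame.
    masks:    (N, H, W) binary masks, one per instance.
    Returns L2-normalised prototypes of shape (N, C).
    """
    masks = masks.astype(features.dtype)
    area = masks.sum(axis=(1, 2))[:, None]  # (N, 1) pixel count per instance
    protos = np.einsum("nhw,hwc->nc", masks, features) / np.maximum(area, 1e-6)
    norms = np.linalg.norm(protos, axis=1, keepdims=True)
    return protos / np.maximum(norms, 1e-12)


def cross_frame_contrastive_loss(protos_a, protos_b, temperature=0.1):
    """InfoNCE-style loss between two frames (illustrative assumption).

    Row i of protos_a and row i of protos_b are taken to be the same
    instance (the positive pair); every other row acts as a negative.
    """
    logits = protos_a @ protos_b.T / temperature  # (N, N) scaled similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))
```

Under these assumptions, the loss is small when each instance's prototype stays consistent from one frame to the next, and large when prototypes of different instances are confused, which is the behaviour the abstract attributes to enforcing cross-frame consistency of object-level features.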

Table Of Contents
I. INTRODUCTION 1
II. RELATED WORK 3
2.1 Video Instance Segmentation 3
2.2 Advancements in Query-based Networks 3
2.3 Object Tracking with Additional Cues 3
III. METHOD 5
3.1 Preliminary 5
3.2 Context-aware Instance Tracker 6
3.3 Prototypical Cross-frame Contrastive Loss 9
3.4 Training Loss 10
IV. EXPERIMENTS 11
4.1 Implementation details 11
4.2 Datasets 12
4.3 Comparison to State-of-the-Art Methods 13
4.4 Ablation Study 15
4.5 Further Studies 18
4.6 Limitations 19
V. CONCLUSION 22
VI. REFERENCES 23
VII. SUMMARY (IN KOREAN) 28
URI
https://scholar.dgist.ac.kr/handle/20.500.11750/59730
http://dgist.dcollection.net/common/orgView/200000943230
DOI
10.22677/THESIS.200000943230
Degree
Master
Department
Artificial Intelligence Major
Publisher
DGIST