Detail View

DC Field Value
dc.contributor.author Jeong, Jaewoo
dc.contributor.author Lee, Seohee
dc.contributor.author Park, Daehee
dc.contributor.author Lee, Giwon
dc.contributor.author Yoon, Kuk-Jin
dc.date.accessioned 2026-01-14T16:40:10Z
dc.date.available 2026-01-14T16:40:10Z
dc.date.created 2026-01-05
dc.date.issued 2025-06-15
dc.identifier.isbn 9798331543648
dc.identifier.issn 2575-7075
dc.identifier.uri https://scholar.dgist.ac.kr/handle/20.500.11750/59363
dc.description.abstract Pedestrian trajectory forecasting is crucial in applications such as autonomous driving and mobile robot navigation. In such applications, camera-based perception enables the extraction of additional modalities (human pose, text) to enhance prediction accuracy. Indeed, we find that textual descriptions play a crucial role in integrating additional modalities into a unified understanding. However, extracting text online requires a VLM, which may not be feasible for resource-constrained systems. To address this challenge, we propose a multimodal knowledge distillation framework: a student model with limited modalities is distilled from a teacher model trained with the full range of modalities. The comprehensive knowledge of a teacher model trained with trajectory, human pose, and text is distilled into a student model that uses only trajectory or only human pose as a supplement. In doing so, we separately distill the core locomotion insights from intra-agent multi-modality and inter-agent interaction. Our generalizable framework is validated with two state-of-the-art models across three datasets in both ego-view (JRDB, SIT) and BEV-view (ETH/UCY) setups, using both annotated and VLM-generated text captions. Distilled student models show consistent improvement in all prediction metrics for both full and instantaneous observations, improving by up to ∼13%. The code is available at github.com/Jaewoo97/KDTF.
dc.language English
dc.publisher IEEE Computer Society, Computer Vision Foundation
dc.relation.ispartof Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
dc.title Multi-modal Knowledge Distillation-based Human Trajectory Forecasting
dc.type Conference Paper
dc.identifier.doi 10.1109/CVPR52734.2025.02256
dc.identifier.bibliographicCitation Conference on Computer Vision and Pattern Recognition, pp.24222 - 24233
dc.identifier.url https://cvpr.thecvf.com/virtual/2025/poster/33379
dc.citation.conferenceDate 2025-06-11
dc.citation.conferencePlace Nashville, US
dc.citation.endPage 24233
dc.citation.startPage 24222
dc.citation.title Conference on Computer Vision and Pattern Recognition
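The abstract above describes distilling a teacher trained on trajectory, human pose, and text into a student that observes fewer modalities. This record does not include the paper's actual architectures or losses, so the following is a purely illustrative sketch of the general idea: all names, dimensions, and the MSE distillation loss are assumptions for illustration, not the authors' method (see the linked repository for the real implementation).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy per-agent features: trajectory is always available;
# pose and text embeddings are teacher-only modalities.
traj = rng.normal(size=(4, 8))   # 4 agents, 8-dim trajectory features
pose = rng.normal(size=(4, 8))   # 8-dim pose features (teacher only)
text = rng.normal(size=(4, 8))   # 8-dim text features (teacher only)

W_teacher = rng.normal(size=(24, 2))  # head over all three modalities
W_student = rng.normal(size=(8, 2))   # head over trajectory alone

def teacher_predict(traj, pose, text):
    """Teacher fuses all modalities before predicting a 2-D displacement."""
    fused = np.concatenate([traj, pose, text], axis=1)
    return fused @ W_teacher

def student_predict(traj):
    """Student sees only the trajectory modality."""
    return traj @ W_student

def distill_loss(student_out, teacher_out):
    """MSE between student and (frozen) teacher predictions; minimizing
    this pushes the limited-modality student toward the teacher's output."""
    return float(np.mean((student_out - teacher_out) ** 2))

t_out = teacher_predict(traj, pose, text)
s_out = student_predict(traj)
loss = distill_loss(s_out, t_out)
```

In this toy setup the teacher is frozen and only the student head would be updated to minimize `distill_loss`, which is the general shape of output-level knowledge distillation.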

File Downloads

  • There are no files associated with this item.


Related Researcher

Park, Daehee (박대희)

Department of Electrical Engineering and Computer Science

