Detail View

DC Field Value Language
dc.contributor.author Hwang, Kyumin -
dc.contributor.author Choi, Wonhyeok -
dc.contributor.author Han, Kiljoon -
dc.contributor.author Choi, Wonjoon -
dc.contributor.author Choi, Minwoo -
dc.contributor.author Na, Yongcheon -
dc.contributor.author Park, Minwoo -
dc.contributor.author Im, Sunghoon -
dc.date.accessioned 2026-02-10T00:40:16Z -
dc.date.available 2026-02-10T00:40:16Z -
dc.date.created 2025-12-04 -
dc.date.issued 2026-01 -
dc.identifier.uri https://scholar.dgist.ac.kr/handle/20.500.11750/60002 -
dc.description.abstract Recent foundation models demonstrate strong generalization capabilities in monocular depth estimation. However, directly applying these models to Full Surround Monocular Depth Estimation (FSMDE) presents two major challenges: (1) high computational cost, which limits real-time performance, and (2) difficulty in estimating metric-scale depth, as these models are typically trained to predict only relative depth. To address these limitations, we propose a novel knowledge distillation strategy that transfers robust depth knowledge from a foundation model to a lightweight FSMDE network. Our approach leverages a hybrid regression framework that combines a knowledge distillation scheme (traditionally used in classification) with a depth binning module to enhance scale consistency. Specifically, we introduce a cross-interaction knowledge distillation scheme that distills the scale-invariant depth bin probabilities of a foundation model into the student network while guiding it to infer metric-scale depth bin centers from ground-truth depth. Furthermore, we propose view-relational knowledge distillation, which encodes structural relationships among adjacent camera views and transfers them to enhance cross-view depth consistency. Experiments on DDAD and nuScenes demonstrate the effectiveness of our method compared to conventional supervised methods and existing knowledge distillation approaches. Moreover, our method achieves a favorable trade-off between performance and efficiency, meeting real-time requirements. © 2016 IEEE. -
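The abstract's hybrid scheme (distilling scale-invariant bin probabilities while regressing metric-scale bin centers, plus a view-relational term) can be illustrated with a minimal NumPy sketch. This is an assumption-heavy illustration, not the paper's implementation: the function names, the KL-plus-L1 combination, and the cosine-similarity view relation are all hypothetical stand-ins for the losses the abstract only names.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over depth bins.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def distill_depth_bins(teacher_logits, student_logits, student_bin_centers, gt_depth):
    """Hypothetical hybrid loss in the spirit of the abstract:
    - KL divergence aligns the student's scale-invariant bin probabilities
      with the foundation-model teacher's.
    - An L1 term supervises the metric-scale depth decoded from the
      student's bin centers against ground truth."""
    p_t = softmax(teacher_logits)
    p_s = softmax(student_logits)
    kl = np.sum(p_t * (np.log(p_t + 1e-8) - np.log(p_s + 1e-8)), axis=-1).mean()
    # Expected depth per pixel: probability-weighted sum of bin centers.
    pred_depth = np.sum(p_s * student_bin_centers, axis=-1)
    l1 = np.abs(pred_depth - gt_depth).mean()
    return kl + l1

def view_relation_loss(teacher_feats, student_feats):
    """Hypothetical view-relational term: distill the pairwise cosine
    similarities among per-camera-view feature vectors, so the student
    mirrors the teacher's cross-view structure."""
    def rel(f):
        f = f / np.linalg.norm(f, axis=-1, keepdims=True)
        return f @ f.T
    return np.abs(rel(teacher_feats) - rel(student_feats)).mean()
```

With identical teacher and student logits and ground truth equal to the decoded depth, both terms vanish, which is the sanity check one would expect of such a distillation objective.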
dc.language English -
dc.publisher Institute of Electrical and Electronics Engineers -
dc.title Scale-Invariant and View-Relational Representation Learning for Full Surround Monocular Depth -
dc.type Article -
dc.identifier.doi 10.1109/LRA.2025.3635451 -
dc.identifier.wosid 001631846400001 -
dc.identifier.scopusid 2-s2.0-105022656899 -
dc.identifier.bibliographicCitation IEEE Robotics and Automation Letters, v.11, no.1, pp.1002 - 1009 -
dc.description.isOpenAccess FALSE -
dc.subject.keywordAuthor Full surround depth -
dc.subject.keywordAuthor knowledge distillation -
dc.subject.keywordAuthor lightweight -
dc.subject.keywordAuthor monocular depth -
dc.subject.keywordAuthor representation learning -
dc.citation.endPage 1009 -
dc.citation.number 1 -
dc.citation.startPage 1002 -
dc.citation.title IEEE Robotics and Automation Letters -
dc.citation.volume 11 -
dc.description.journalRegisteredClass scie -
dc.description.journalRegisteredClass scopus -
dc.relation.journalResearchArea Robotics -
dc.relation.journalWebOfScienceCategory Robotics -
dc.type.docType Article -



Related Researcher

Im, Sunghoon (임성훈)

Department of Electrical Engineering and Computer Science
