Full metadata record

DC Field Value Language
dc.contributor.author Lee, Seokju -
dc.contributor.author Rameau, Francois -
dc.contributor.author Im, Sunghoon -
dc.contributor.author Kweon, In So -
dc.date.accessioned 2022-11-17T11:40:16Z -
dc.date.available 2022-11-17T11:40:16Z -
dc.date.created 2022-07-28 -
dc.date.issued 2022-09 -
dc.identifier.issn 0920-5691 -
dc.identifier.uri http://hdl.handle.net/20.500.11750/17172 -
dc.description.abstract We introduce an end-to-end joint training framework that explicitly models 6-DoF motion of multiple dynamic objects, ego-motion, and depth in a monocular camera setup without geometric supervision. Our technical contributions are three-fold. First, we highlight the fundamental difference between inverse and forward projection while modeling the individual motion of each rigid object, and propose a geometrically correct projection pipeline using a neural forward projection module. Second, we propose two types of residual motion learning frameworks to explicitly disentangle camera and object motions in dynamic driving scenes with different levels of semantic prior knowledge: video instance segmentation as a strong prior, and object detection as a weak prior. Third, we design a unified photometric and geometric consistency loss that holistically imposes self-supervisory signals for every background and object region. Lastly, we present an unsupervised method of 3D motion field regularization for semantically plausible object motion representation. Our proposed elements are validated in a detailed ablation study. Through extensive experiments conducted on the KITTI, Cityscapes, and Waymo Open datasets, our framework is shown to outperform the state-of-the-art depth and motion estimation methods. Our code, dataset, and models are publicly available. © 2023 Springer Nature Switzerland AG. Part of Springer Nature. -
dc.language English -
dc.publisher Springer -
dc.title Self-Supervised Monocular Depth and Motion Learning in Dynamic Scenes: Semantic Prior to Rescue -
dc.type Article -
dc.identifier.doi 10.1007/s11263-022-01641-5 -
dc.identifier.wosid 000827403200001 -
dc.identifier.scopusid 2-s2.0-85134520935 -
dc.identifier.bibliographicCitation International Journal of Computer Vision, v.130, no.9, pp.2265 - 2285 -
dc.description.isOpenAccess FALSE -
dc.subject.keywordAuthor 3D visual perception -
dc.subject.keywordAuthor Monocular depth estimation -
dc.subject.keywordAuthor Motion estimation -
dc.subject.keywordAuthor Self-supervised learning -
dc.citation.endPage 2285 -
dc.citation.number 9 -
dc.citation.startPage 2265 -
dc.citation.title International Journal of Computer Vision -
dc.citation.volume 130 -
dc.description.journalRegisteredClass scie -
dc.description.journalRegisteredClass scopus -
dc.relation.journalResearchArea Computer Science -
dc.relation.journalWebOfScienceCategory Computer Science, Artificial Intelligence -
dc.type.docType Article -
Files in This Item:

There are no files associated with this item.

Appears in Collections:
Department of Electrical Engineering and Computer Science > Computer Vision Lab. > 1. Journal Articles

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.