Full metadata record

DC Field Value Language
dc.contributor.author Bae, Jinwoo -
dc.contributor.author Moon, Sungho -
dc.contributor.author Im, Sunghoon -
dc.date.accessioned 2023-12-26T18:12:00Z -
dc.date.available 2023-12-26T18:12:00Z -
dc.date.created 2023-01-21 -
dc.date.issued 2023-02-12 -
dc.identifier.isbn 9781577358800 -
dc.identifier.issn 2374-3468 -
dc.identifier.uri http://hdl.handle.net/20.500.11750/46776 -
dc.description.abstract Self-supervised monocular depth estimation has been widely studied in recent years. Most of this work has focused on improving performance on benchmark datasets such as KITTI, but has offered few experiments on generalization performance. In this paper, we investigate how backbone networks (e.g., CNNs, Transformers, and CNN-Transformer hybrid models) affect the generalization of monocular depth estimation. We first evaluate state-of-the-art models on diverse public datasets that were never seen during network training. Next, we investigate the effects of texture-biased and shape-biased representations using various texture-shifted datasets that we generated. We observe that Transformers exhibit a strong shape bias, whereas CNNs exhibit a strong texture bias. We also find that shape-biased models generalize better for monocular depth estimation than texture-biased models. Based on these observations, we design a new CNN-Transformer hybrid network with a multi-level adaptive feature fusion module, called MonoFormer. The design intuition behind MonoFormer is to increase shape bias by employing Transformers while compensating for the weak locality bias of Transformers by adaptively fusing multi-level representations. Extensive experiments show that the proposed method achieves state-of-the-art performance on various public datasets. Our method also shows the best generalization ability among competitive methods. Copyright © 2023, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. -
dc.language English -
dc.publisher Association for the Advancement of Artificial Intelligence (AAAI) -
dc.title Deep Digging into the Generalization of Self-Supervised Monocular Depth Estimation -
dc.type Conference Paper -
dc.identifier.doi 10.1609/aaai.v37i1.25090 -
dc.identifier.scopusid 2-s2.0-85146267200 -
dc.identifier.bibliographicCitation AAAI Conference on Artificial Intelligence, pp.187 - 196 -
dc.identifier.url https://aaai-23.aaai.org/wp-content/uploads/2023/01/Sunday-February-12-Poster-Session-Schedule-1.pdf -
dc.citation.conferencePlace US -
dc.citation.conferencePlace Washington, -
dc.citation.endPage 196 -
dc.citation.startPage 187 -
dc.citation.title AAAI Conference on Artificial Intelligence -
Files in This Item:

There are no files associated with this item.

Appears in Collections:
Department of Electrical Engineering and Computer Science Computer Vision Lab. 2. Conference Papers
