Full metadata record

DC Field Value Language
dc.contributor.author Bae, Jinwoo -
dc.contributor.author Hwang, Kyumin -
dc.contributor.author Im, Sunghoon -
dc.date.accessioned 2024-01-10T17:10:11Z -
dc.date.available 2024-01-10T17:10:11Z -
dc.date.created 2024-01-02 -
dc.date.issued 2024-04 -
dc.identifier.issn 0162-8828 -
dc.identifier.uri http://hdl.handle.net/20.500.11750/47602 -
dc.description.abstract Monocular depth estimation has been widely studied, and significant performance improvements have recently been reported. However, most previous works are evaluated on only a few benchmark datasets, such as KITTI, and none provides an in-depth analysis of the generalization performance of monocular depth estimation. In this paper, we investigate in depth how various backbone networks (e.g., CNN and Transformer models) affect the generalization of monocular depth estimation. First, we evaluate state-of-the-art models on both in-distribution datasets and out-of-distribution datasets that were never seen during network training. Then, we investigate the internal properties of the representations from the intermediate layers of CNN- and Transformer-based models using synthetic texture-shifted datasets. Through extensive experiments, we observe that Transformers exhibit a strong shape bias, whereas CNNs exhibit a strong texture bias. We also find that texture-biased models generalize worse for monocular depth estimation than shape-biased models. We demonstrate that similar trends are observed in real-world driving datasets captured in diverse environments. Lastly, we conduct a dense ablation study with various backbone networks used in modern strategies. The experiments demonstrate that the intrinsic locality of CNNs and the self-attention of Transformers induce texture bias and shape bias, respectively. -
dc.language English -
dc.publisher Institute of Electrical and Electronics Engineers -
dc.title A Study on the Generality of Neural Network Structures for Monocular Depth Estimation -
dc.type Article -
dc.identifier.doi 10.1109/TPAMI.2023.3332407 -
dc.identifier.wosid 001180891600019 -
dc.identifier.scopusid 2-s2.0-85177094424 -
dc.identifier.bibliographicCitation IEEE Transactions on Pattern Analysis and Machine Intelligence, v.46, no.4, pp.2224 - 2238 -
dc.description.isOpenAccess FALSE -
dc.subject.keywordAuthor Monocular depth estimation -
dc.subject.keywordAuthor Out-of-Distribution -
dc.subject.keywordAuthor Generalization -
dc.subject.keywordAuthor Transformer -
dc.citation.endPage 2238 -
dc.citation.number 4 -
dc.citation.startPage 2224 -
dc.citation.title IEEE Transactions on Pattern Analysis and Machine Intelligence -
dc.citation.volume 46 -
dc.description.journalRegisteredClass scie -
dc.description.journalRegisteredClass scopus -
dc.relation.journalResearchArea Computer Science; Engineering -
dc.relation.journalWebOfScienceCategory Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic -
dc.type.docType Article -
Files in This Item:

There are no files associated with this item.

Appears in Collections:
Department of Electrical Engineering and Computer Science Computer Vision Lab. 1. Journal Articles

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.