Recurrent Multiscale Feature Modulation for Geometry Consistent Depth Learning


Abstract The U-Net-like coarse-to-fine network design is currently the dominant choice for dense prediction tasks. Although this design often achieves competitive performance, it suffers from inherent limitations, such as the propagation of training errors from low to high resolution and a dependency on deeper, heavier backbones. To address these limitations, we propose Recurrent Multiscale Feature Modulation (R-MSFM), a new lightweight network design for self-supervised monocular depth estimation. R-MSFM extracts per-pixel features, builds a multiscale feature modulation module, and performs recurrent depth refinement through a parameter-shared decoder at a fixed resolution. This design keeps R-MSFM lightweight and fundamentally avoids the error propagation caused by the coarse-to-fine design. Furthermore, we introduce the mask geometry consistency loss to help R-MSFM learn geometry-consistent depth. This loss penalizes inconsistency between the estimated depths of adjacent views within the nonoccluded and nonstationary regions. Experimental results demonstrate the superiority of the proposed R-MSFM in both model size and inference speed, and show state-of-the-art results on two datasets: KITTI and Make3D.
Authors Zhongkai Zhou, Xinnan Fan, Pengfei Shi, Yuanxue Xin, Dongliang Duan (University of Wyoming), Liuqing Yang
Journal Info IEEE Computer Society | IEEE Transactions on Pattern Analysis and Machine Intelligence, vol: 46, iss: 12, pages: 9551-9566
Publication Date 6/27/2024
ISSN 0162-8828
Type article
Open Access closed
DOI https://doi.org/10.1109/tpami.2024.3420165
Keywords Feature (linguistics) (Score: 0.54216856), Computational Geometry (Score: 0.42911488)