Journal of Guangdong University of Technology, 2023, Vol. 40, Issue (02): 22-29. DOI: 10.12052/gdutxb.210139


Channel Attentive Self-supervised Network for Monocular Depth Estimation

Wu Jun-xian, He Yuan-lie   

  1. School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510006, China
Received: 2021-09-17; Online: 2023-03-25; Published: 2023-04-07

Abstract: A new method is proposed for self-supervised monocular depth estimation. Although recent methods can produce high-precision depth maps, previous work has ignored channel-wise information in the image. To address this problem, channel attention is introduced and the network structure is improved in two aspects: (a) a Squeeze-and-Excitation (SE) block is injected into the corresponding networks to capture channel-wise relationships in the feature maps; (b) a Channel Attention Dense Connection (CADC) block is applied to combine multi-scale features and recalibrate channel-wise features. Experiments on the KITTI dataset demonstrate the effectiveness of the proposed approach, which outperforms state-of-the-art self-supervised depth estimation methods both quantitatively and qualitatively.
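For readers unfamiliar with the two mechanisms the abstract builds on, the sketch below gives a minimal PyTorch implementation of a standard SE block, together with a purely hypothetical CADC-style fusion module that upsamples and concatenates multi-scale features before recalibrating them channel-wise. This is an illustrative sketch under stated assumptions, not the authors' code: the names SEBlock and CADCFusion, the reduction ratio, and the bilinear fusion strategy are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SEBlock(nn.Module):
    """Squeeze-and-Excitation: gate each channel by a learned weight."""

    def __init__(self, channels: int, reduction: int = 16):  # reduction ratio is an assumption
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = x.mean(dim=(2, 3))           # squeeze: global average pooling -> (b, c)
        w = self.fc(w).view(b, c, 1, 1)  # excitation: per-channel gates in (0, 1)
        return x * w                     # recalibrate the feature map


class CADCFusion(nn.Module):
    """Hypothetical CADC-style block: densely concatenate multi-scale
    features at a common resolution, then recalibrate channels with SE."""

    def __init__(self, channels_per_scale: list[int]):
        super().__init__()
        self.se = SEBlock(sum(channels_per_scale))

    def forward(self, features: list[torch.Tensor]) -> torch.Tensor:
        target = features[0].shape[2:]  # fuse at the finest spatial resolution
        up = [F.interpolate(f, size=target, mode="bilinear", align_corners=False)
              for f in features]
        return self.se(torch.cat(up, dim=1))


# Quick shape check on dummy encoder features at three scales.
if __name__ == "__main__":
    feats = [torch.randn(2, 64, 96, 320),
             torch.randn(2, 128, 48, 160),
             torch.randn(2, 256, 24, 80)]
    out = CADCFusion([64, 128, 256])(feats)
    print(out.shape)  # torch.Size([2, 448, 96, 320])
```

The SE block follows Hu et al.'s published design; the fusion module only illustrates the general idea of combining multi-scale features with channel recalibration that the abstract describes, and the paper's actual CADC block may differ in structure and placement.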

Key words: monocular depth estimation, attention mechanism, self-supervised deep learning

CLC Number: TP249