广东工业大学学报 ›› 2023, Vol. 40 ›› Issue (02): 22-29. doi: 10.12052/gdutxb.210139

• 综合研究 •

基于通道注意力的自监督深度估计方法

吴俊贤, 何元烈   

  1. 广东工业大学 计算机学院,广东 广州 510006
  • 收稿日期:2021-09-17 出版日期:2023-03-25 发布日期:2023-04-07
  • Corresponding author: He Yuan-lie (b. 1976), male, associate professor, Ph.D.; his research interests include visual SLAM, computer vision, and deep learning. E-mail: heyuanlie@163.com
  • About the author: Wu Jun-xian (b. 1995), male, master's student; his research interests include depth estimation, deep learning, and visual SLAM
  • Funding:
    National Natural Science Foundation of China (61876043)

Channel Attentive Self-supervised Network for Monocular Depth Estimation

Wu Jun-xian, He Yuan-lie   

  1. School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510006, China
  • Received:2021-09-17 Online:2023-03-25 Published:2023-04-07

摘要: 提出了一种基于自监督深度学习和通道注意力的深度估计方法。虽然以往的方法已经能够生成高精度的深度图,但是它们忽略了图像中的通道信息。对通道之间的依赖关系进行显式建模,并根据建模结果重新校准通道权重能有效地提高网络性能,从而提高深度估计的精度。本文从两个方面引入通道注意力机制以增强网络模型的能力:在网络中插入SE (Squeeze-and-Excitation) 模块以提高网络模型获得特征图中通道间关系的能力;设计了一个多尺度融合通道注意力模块,实现融合多尺度像素特征和重新校准通道权重的功能。通过在KITTI数据集上的实验验证,所提方法在精准度、误差和深度图的具体效果上都优于现有的基于自监督深度学习的深度估计方法。
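The SE module mentioned in the abstract follows the standard Squeeze-and-Excitation design: global average pooling squeezes each feature map to a per-channel statistic, and a small bottleneck produces weights that rescale the channels. Below is a minimal PyTorch sketch of such a block; the reduction ratio and the places where it is inserted into the depth network are assumptions, as the paper's exact configuration is not given on this page.

import torch
import torch.nn as nn

class SEBlock(nn.Module):
    # Squeeze-and-Excitation: explicitly model inter-channel dependencies and
    # recalibrate channel weights (squeeze -> excitation -> rescale).
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze: H x W -> 1 x 1 per channel
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid(),                            # excitation: channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                 # reweight (recalibrate) the channels

A block like this can be attached after a convolutional stage of the depth encoder or decoder, e.g. out = SEBlock(256)(features).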

关键词: 单目深度估计, 注意力机制, 自监督深度学习

Abstract: A new method is proposed for self-supervised monocular depth estimation. Although recent methods can produce high-precision depth maps, previous work has ignored channel-wise information in the image. To address this, channel attention is introduced and the network structure is improved in two respects: (a) a Squeeze-and-Excitation (SE) block is inserted into the corresponding networks to capture inter-channel relationships in the feature maps; (b) a Channel Attention Dense Connection (CADC) block is applied to fuse multi-scale features and recalibrate channel-wise features. Experiments on the KITTI dataset show the effectiveness of the proposed approach, which outperforms state-of-the-art self-supervised depth estimation methods both quantitatively and qualitatively.
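The Channel Attention Dense Connection (CADC) block named above fuses multi-scale features and recalibrates channel-wise features, but its internal structure is not described on this page. The sketch below is only an illustration of that general idea under explicit assumptions: features are fused by bilinear upsampling and concatenation, and an SE-style gate reweights the fused channels; the class name MultiScaleChannelFusion and all layer sizes are hypothetical, not the paper's actual design.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleChannelFusion(nn.Module):
    # Illustrative only: bring decoder features from several scales to one
    # resolution, concatenate them, reweight the fused channels, then project.
    def __init__(self, in_channels, out_channels, reduction=16):
        super().__init__()
        fused = sum(in_channels)
        self.gate = nn.Sequential(                   # SE-style channel gate on the fused map
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(fused, fused // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(fused // reduction, fused, kernel_size=1),
            nn.Sigmoid(),
        )
        self.project = nn.Conv2d(fused, out_channels, kernel_size=3, padding=1)

    def forward(self, feats):
        target = feats[0].shape[-2:]                 # spatial size of the finest scale
        up = [F.interpolate(f, size=target, mode="bilinear", align_corners=False)
              for f in feats]
        x = torch.cat(up, dim=1)                     # fuse multi-scale features channel-wise
        return self.project(x * self.gate(x))        # recalibrate, then reduce channels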

Key words: monocular depth estimation, attention mechanism, self-supervised deep learning

中图分类号: TP249