广东工业大学学报 ›› 2023, Vol. 40 ›› Issue (02): 22-29. doi: 10.12052/gdutxb.210139
吴俊贤, 何元烈
Wu Jun-xian, He Yuan-lie
Abstract: This paper proposes a depth estimation method based on self-supervised deep learning and channel attention. Although previous methods can already produce high-accuracy depth maps, they ignore the channel information in images. Explicitly modeling the dependencies between channels and recalibrating the channel weights from that model effectively improves network performance and, in turn, the accuracy of depth estimation. This paper introduces channel attention in two ways to strengthen the network model: an SE (Squeeze-and-Excitation) module is inserted into the network to better capture inter-channel relationships in feature maps, and a multi-scale fusion channel attention module is designed to fuse multi-scale pixel features and recalibrate channel weights. Experiments on the KITTI dataset show that the proposed method outperforms existing self-supervised depth estimation methods in accuracy, error metrics, and the visual quality of the resulting depth maps.
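The SE module named in the abstract is the published Squeeze-and-Excitation design of Hu et al. (CVPR 2018): global-average-pool each channel to a scalar, model inter-channel dependencies with a small bottleneck MLP, and rescale the input by the resulting per-channel weights. The PyTorch sketch below renders only that standard block; it is not the authors' exact network (where they insert the block, and the design of their multi-scale fusion channel attention module, are given in the full text), and the reduction ratio of 16 is the SE paper's default, assumed here.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Standard Squeeze-and-Excitation block (Hu et al., CVPR 2018).
    A generic sketch; not necessarily the exact variant used in this paper."""

    def __init__(self, channels: int, reduction: int = 16):
        # reduction=16 is the original SE paper's default (an assumption here)
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: (B, C, H, W) -> (B, C, 1, 1)
        self.fc = nn.Sequential(             # excitation: bottleneck MLP over channels
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid(),                    # per-channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.pool(x).view(b, c)          # one descriptor per channel
        w = self.fc(w).view(b, c, 1, 1)      # recalibrated channel weights
        return x * w                         # reweight the original feature map

# Usage: applied to a feature map, the block returns a same-shaped tensor
# whose channels have been reweighted.
se = SEBlock(channels=256)
y = se(torch.randn(2, 256, 32, 32))          # y.shape == (2, 256, 32, 32)
```

The squeeze-then-excite structure is what the abstract calls "explicitly modeling the dependencies between channels and recalibrating the channel weights": the bottleneck MLP is the explicit dependency model, and the sigmoid-weighted multiplication is the recalibration.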