Journal of Guangdong University of Technology, 2020, Vol. 37, Issue (04): 35-41. doi: 10.12052/gdutxb.190140


A Monocular Depth Estimation Combined with Attention and Unsupervised Deep Learning

Cen Shi-jie, He Yuan-lie, Chen Xiao-cong

  1. School of Computers, Guangdong University of Technology, Guangzhou 510006, China
  • Received: 2019-11-18 Online: 2020-07-11 Published: 2020-07-02
  • Corresponding author: He Yuan-lie (b. 1976), male, associate professor. Main research interests: computer vision, deep learning, and intelligent robots. E-mail: heyuanlie@163.com
  • About the author: Cen Shi-jie (b. 1992), male, master's student. Main research interests: computer vision, deep learning, and depth estimation
  • Funding:
    National Natural Science Foundation of China (61876043)

Abstract: To address the blurred boundaries produced by current unsupervised monocular depth estimation methods, a network architecture based on a dual attention module is proposed. The architecture exploits the long-range contextual information of image features to resolve boundary blurring in depth estimation. The whole framework is trained with an unsupervised method based on view synthesis. It comprises a depth estimation network and a pose estimation network, which estimate depth and the camera pose transformation simultaneously. The dual attention module is embedded in the depth estimation network and consists of a position attention module and a channel attention module. It represents contextual information across distant spatial positions and across different feature maps, so that the network can estimate depth with finer detail. Experimental results on the KITTI and Make3D datasets show that the proposed method effectively improves the accuracy of monocular depth estimation and addresses the boundary-blurring problem.

Key words: depth estimation, unsupervised learning, deep learning, attention, robotics

CLC Number: TP249
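
The abstract describes the dual attention module only at a high level, so the following is a minimal sketch of one common way to formulate position and channel self-attention on a feature map, not the authors' implementation. Several things here are assumptions made for illustration: the function names, the residual weights `alpha` and `beta` (which would normally be learned), the omission of the learned convolutional projections a real network would use, and the fusion of the two branches by simple addition.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def position_attention(feat, alpha=1.0):
    """Mix features across spatial positions (long-range spatial context).

    feat has shape (C, H, W). alpha is a hypothetical residual weight.
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)        # (C, N) with N = H * W positions
    energy = x.T @ x                  # (N, N) similarity between positions
    attn = softmax(energy, axis=-1)   # row i: weights of position i over all j
    out = x @ attn.T                  # out[:, i] = sum_j attn[i, j] * x[:, j]
    return (alpha * out + x).reshape(c, h, w)


def channel_attention(feat, beta=1.0):
    """Mix features across channels (context between feature maps).

    feat has shape (C, H, W). beta is a hypothetical residual weight.
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)        # (C, N)
    energy = x @ x.T                  # (C, C) similarity between channels
    attn = softmax(energy, axis=-1)   # row i: weights of channel i over all j
    out = attn @ x                    # out[i] = sum_j attn[i, j] * x[j]
    return (beta * out + x).reshape(c, h, w)


def dual_attention(feat, alpha=1.0, beta=1.0):
    """Sum the two branches; fusion by addition is an assumption."""
    return position_attention(feat, alpha) + channel_attention(feat, beta)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.normal(size=(4, 3, 5))   # toy feature map: 4 channels, 3x5 grid
    print(dual_attention(feat).shape)   # same shape as the input
```

Both branches preserve the input shape, so a block like this could in principle be placed between encoder and decoder stages of a depth network. Where the module sits in the authors' network is not stated in the abstract.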