Journal of Guangdong University of Technology ›› 2024, Vol. 41 ›› Issue (03): 71-80. DOI: 10.12052/gdutxb.230044

• Computer Science and Technology •

Small Target Detection Algorithm for Lightweight UAV Aerial Photography Based on YOLOv5

Li Xue-sen1, Tan Bei-hai2, Yu Rong1, Xue Xian-bin1

  1. School of Automation, Guangdong University of Technology, Guangzhou 510006, China;
  2. School of Integrated Circuits, Guangdong University of Technology, Guangzhou 510006, China
  • Received: 2023-03-04 Online: 2024-05-25 Published: 2024-06-14
  • Corresponding author: Tan Bei-hai (b. 1980), male, associate professor, Ph.D., master's supervisor; his research interests include AI algorithms and chip design, artificial intelligence, and deep learning. E-mail: bhtan@gdut.edu.cn
  • About the author: Li Xue-sen (b. 1997), male, master's student; his main research interest is deep learning. E-mail: 18325945913@163.com
  • Funding:
    National Natural Science Foundation of China (61971148); National Natural Science Foundation of China (U22A2054); Key Project of the Joint Fund of the Guangdong Basic and Applied Basic Research Foundation (2019B1515120036); Key Project of the Guangxi Natural Science Foundation (2018GXNSFDA281013)

Abstract: To address the small feature size of targets, complex backgrounds, and dense object distributions in images captured from the aerial perspective of unmanned aerial vehicles (UAVs), a lightweight small target detection algorithm based on YOLOv5, named GA-YOLO, is proposed. The algorithm improves the Mosaic data augmentation method and the overall network structure, adds a tiny-object detection head, and designs both a lightweight global attention module and a parallel spatial-channel attention module, strengthening the network's global feature extraction and the competition and cooperation among convolutional channels during training. With YOLOv5s version 4.0 as the baseline, experiments on the public UAV aerial photography dataset VisDrone2019-DET show that, compared with the original model, the improved model reduces the number of parameters by 48% and the computational cost by 26%, while improving mAP@0.5 by 4.9 percentage points and mAP@0.5:0.95 by 3.3 percentage points, effectively enhancing the detection of dense small targets from the UAV aerial perspective.
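
The abstract names a parallel spatial-channel attention module but does not give its formulation. As a minimal sketch only, the following PyTorch snippet shows one plausible reading: CBAM-style channel and spatial attention branches (Woo et al., ECCV 2018) applied in parallel over the same feature map rather than in sequence. The class name, the additive fusion, and all hyperparameters here are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: the paper's exact module is not specified in the
# abstract. Channel and spatial attention branches run in parallel and both
# reweight the same input feature map.
import torch
import torch.nn as nn

class ParallelSpatialChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16, spatial_kernel: int = 7):
        super().__init__()
        # Channel branch: squeeze spatial dims, then a bottleneck MLP produces
        # one weight per channel.
        self.channel_mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        # Spatial branch: pool over channels, then a large-kernel conv produces
        # one weight per pixel.
        self.spatial_conv = nn.Conv2d(
            2, 1, spatial_kernel, padding=spatial_kernel // 2, bias=False
        )
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Channel attention weights: (B, C, 1, 1)
        ca = self.sigmoid(self.channel_mlp(x))
        # Spatial attention input: channel-wise mean and max -> (B, 2, H, W)
        sa_in = torch.cat(
            [x.mean(dim=1, keepdim=True), x.max(dim=1, keepdim=True).values], dim=1
        )
        sa = self.sigmoid(self.spatial_conv(sa_in))  # (B, 1, H, W)
        # Parallel fusion: both branches modulate the original features.
        return x * ca + x * sa

# Usage example on a YOLOv5s-scale feature map:
# attn = ParallelSpatialChannelAttention(256)
# y = attn(torch.randn(1, 256, 40, 40))  # same shape out: (1, 256, 40, 40)
```

Running the two branches in parallel, rather than chaining them as CBAM does, lets the channel and spatial weights compete and cooperate on the unmodified input, which is consistent with the abstract's stated goal of balancing competition and cooperation among convolutional channels.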

Key words: UAV aerial photography, YOLOv5s, small target detection, data augmentation, attention mechanism

CLC number: TP391.41