Small Target Detection Algorithm for Lightweight UAV Aerial Photography Based on YOLOv5

doi:10.12052/gdutxb.230044

Abstract

To address the small feature size, complex backgrounds, and dense distribution of targets in unmanned aerial vehicle (UAV) aerial images, this paper proposes GA-YOLO, a lightweight small target detection algorithm based on YOLOv5. The algorithm improves the Mosaic data augmentation method and the overall network structure, and adds a detection head for small objects. It also introduces a lightweight global attention module and a parallel spatial-channel attention module, which strengthen the network's global feature extraction and the competition and cooperation between convolutional channels during training. Experiments built on YOLOv5s version 4.0 were run on the public UAV aerial photography dataset VisDrone2019-DET. Compared with the original model, the improved model has 48% fewer parameters and 26% lower computational cost, while mAP@0.5 rises by 4.9 percentage points and mAP@0.5:0.95 rises by 3.3 percentage points, which effectively improves the detection of dense small targets from an aerial viewpoint.
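The two reported metrics differ only in the intersection-over-union (IoU) thresholds at which a prediction counts as correct: mAP@0.5 uses a single threshold of 0.5, while mAP@0.5:0.95 averages over thresholds from 0.5 to 0.95 in steps of 0.05. The following is a minimal sketch of that matching criterion, not the authors' evaluation code; the function names `iou` and `thresholds_passed` and the example boxes are hypothetical and chosen only for illustration. It shows why small targets are demanding: the same 2-pixel localization error that barely affects a large box pushes a 10×10 box below every threshold.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so non-overlapping boxes yield an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# IoU thresholds 0.50, 0.55, ..., 0.95 used by the mAP@0.5:0.95 metric.
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred_box, gt_box):
    """Return the IoU thresholds at which pred_box would match gt_box."""
    value = iou(pred_box, gt_box)
    return [t for t in IOU_THRESHOLDS if value >= t]


# Hypothetical boxes: the same 2-pixel shift applied to a small and a large target.
small_gt, small_pred = (0, 0, 10, 10), (2, 2, 12, 12)
large_gt, large_pred = (0, 0, 100, 100), (2, 2, 102, 102)

print(iou(small_pred, small_gt), thresholds_passed(small_pred, small_gt))
print(iou(large_pred, large_gt), thresholds_passed(large_pred, large_gt))
```

In this sketch the small box reaches an IoU of 64/136 (about 0.47) and matches at no threshold, while the large box reaches about 0.92 and matches at nine of the ten thresholds.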

Key words: UAV aerial photography, YOLOv5s, small target detection, data augmentation, attention mechanism

CLC Number: TP391.41

Li Xue-sen, Tan Bei-hai, Yu Rong, Xue Xian-bin. Small Target Detection Algorithm for Lightweight UAV Aerial Photography Based on YOLOv5 [J]. Journal of Guangdong University of Technology, 2024, 41(03): 71-80.

URL: https://xbzrb.gdut.edu.cn/EN/10.12052/gdutxb.230044

https://xbzrb.gdut.edu.cn/EN/Y2024/V41/I03/71
