Semantics-guided Adaptive Topology Inference Graph Convolutional Networks for Skeleton-based Action Recognition

Journal of Guangdong University of Technology, 2023, Vol. 40, Issue (04): 45-52. doi: 10.12052/gdutxb.220107

Lin Zhe-huang, Li Dong

School of Automation, Guangdong University of Technology, Guangzhou 510006, China

Received: 2022-06-13 Online: 2023-07-25 Published: 2023-08-02
Corresponding author: Li Dong (1983–), male, associate professor, Ph.D.; main research interests: pattern recognition, machine learning, face recognition, and machine vision. E-mail: dong.li@gdut.edu.cn
About the author: Lin Zhe-huang (1996–), male, master's student; main research interests: machine learning, deep learning, and action recognition.
Funding: Natural Science Foundation of Guangdong Province (2021A1515011867)

Abstract

Abstract: Graph convolutional networks (GCNs) are naturally suited to skeleton-based human action recognition and have attracted growing attention. Their performance depends on two things: how rich the extracted feature information is, and how reasonable the skeleton topology is. This paper improves the way skeleton joints are fused with their semantics (joint type and frame index) and integrates the fusion into a Semantics Coding Module (SCM) that is better suited to complex multi-layer networks. Guided by the SCM, the network obtains richer joint features. In addition, a skeleton Topology Inference Network (TIN) is proposed. It draws on the efficient feature learning ability of convolutional neural networks (CNNs) to adaptively learn a different adjacency matrix from the context features of each action sample, which helps the network escape the limitation of a fixed topology. Applying the SCM and TIN to the two-stream adaptive graph convolutional network (2s-AGCN) yields a semantics-guided multi-stream adaptive topology inference graph convolutional network. Experiments show that the proposed methods clearly improve the recognition accuracy of the graph convolutional network, and the model reaches state-of-the-art performance on the large-scale skeleton-based action recognition datasets NTU RGB+D and NTU RGB+D 120.

Key words: action recognition, graph convolutional network, skeleton, adjacency matrix
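
The two ideas in the abstract, fusing joint features with semantics and learning a per-sample adjacency matrix, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not specify the SCM or TIN architectures, so the additive fusion, the dot-product similarity with softmax (used here in place of the CNN-based TIN), and every function name, weight matrix, and dimension below are assumptions chosen only for illustration.

```python
import numpy as np


def one_hot(indices, depth):
    """Return a (len(indices), depth) one-hot matrix."""
    return np.eye(depth)[indices]


def encode_semantics(joints, w_joint, w_type, w_frame):
    """Fuse joint coordinates with joint-type and frame-index embeddings.

    joints: (T, V, C) array of T frames, V joints, C coordinates.
    Returns a (T, V, D) feature array.
    """
    T, V, _ = joints.shape
    type_emb = one_hot(np.arange(V), V) @ w_type     # (V, D)
    frame_emb = one_hot(np.arange(T), T) @ w_frame   # (T, D)
    coord_emb = joints @ w_joint                     # (T, V, D)
    # Additive fusion is an assumption made for this sketch.
    return coord_emb + type_emb[None, :, :] + frame_emb[:, None, :]


def infer_adjacency(features, w_query, w_key):
    """Infer a sample-dependent (V, V) adjacency matrix from the features.

    Assumption: dot-product similarity plus a row-wise softmax stands in
    for the CNN-based topology inference described in the abstract.
    """
    pooled = features.mean(axis=0)                   # (V, D), averaged over time
    q = pooled @ w_query
    k = pooled @ w_key
    scores = q @ k.T / np.sqrt(q.shape[-1])          # (V, V)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)  # each row sums to 1


def graph_conv(features, adjacency, w_out):
    """Aggregate neighbour features in each frame, then linear map and ReLU."""
    aggregated = np.einsum("uv,tvd->tud", adjacency, features)
    return np.maximum(aggregated @ w_out, 0.0)


# Hypothetical sizes: 4 frames, 25 joints (as in NTU RGB+D skeletons), 3-D coordinates.
rng = np.random.default_rng(0)
T, V, C, D = 4, 25, 3, 8
w_joint = rng.standard_normal((C, D))
w_type = rng.standard_normal((V, D))
w_frame = rng.standard_normal((T, D))
w_query = rng.standard_normal((D, D))
w_key = rng.standard_normal((D, D))
w_out = rng.standard_normal((D, D))

sample_a = rng.standard_normal((T, V, C))
sample_b = rng.standard_normal((T, V, C))

feat_a = encode_semantics(sample_a, w_joint, w_type, w_frame)
feat_b = encode_semantics(sample_b, w_joint, w_type, w_frame)
adj_a = infer_adjacency(feat_a, w_query, w_key)
adj_b = infer_adjacency(feat_b, w_query, w_key)
out_a = graph_conv(feat_a, adj_a, w_out)
```

With the same weights, `adj_a` and `adj_b` differ because each is computed from its own sample's features. That is the contrast with a fixed topology, where every sample would share one adjacency matrix.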

CLC number: TP391.4

Lin Zhe-huang, Li Dong. Semantics-guided Adaptive Topology Inference Graph Convolutional Networks for Skeleton-based Action Recognition[J]. Journal of Guangdong University of Technology, 2023, 40(04): 45-52.

