Journal of Guangdong University of Technology


Point Cloud Classification Based on Separable Transformer

Liu Cheng-hui, Li Guang-ping

  1. School of Information Engineering, Guangdong University of Technology, Guangzhou 510006, China
  • Received: 2024-01-03  Online: 2024-09-27  Published: 2024-09-27
  • Corresponding author: LI Guang-ping (b. 1981), male, associate professor, Ph.D.; research interests: computer vision, artificial intelligence, and radar signal processing. E-mail: gpli@gdut.edu.cn
  • About the first author: LIU Cheng-hui (b. 1997), male, master's student; research interests: 3D point cloud processing, feature extraction, and point cloud classification. E-mail: 1152278768@qq.com
  • Funding: National Natural Science Foundation of China (61601130); Daya Bay Science and Technology Planning Project (2020010203)


Abstract: Transformers typically exploit their strength in capturing long-range dependencies to model relational interactions among distant points of a point cloud, ignoring important local structural details and relying on a large computational cost to achieve high performance. To alleviate this problem, we draw on the idea of the separable vision Transformer and propose a separable Transformer method for point cloud classification, named Sep-point. Through depthwise separable self-attention, Sep-point performs local-global relational interactions sequentially within and between groups of points. A new position token embedding computes inter-group attention relationships at negligible cost, while grouped self-attention establishes long-range information interaction across multiple regions. In this way, local-global features are extracted while the computational burden is greatly reduced. Experimental results show that the proposed Sep-point improves classification accuracy over the existing PCT (Point Cloud Transformer) by 0.2% on the ModelNet40 dataset and by 6.3% on the real-world ScanObjectNN dataset, while reducing the number of network parameters by 0.72M and FLOPs by 0.18G. These results clearly demonstrate the effectiveness of the proposed method.

Key words: point cloud classification, separable Transformer, position token embedding, local-global relational interaction
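To make the separable attention described in the abstract concrete, the following is a minimal PyTorch sketch of the two-step idea: self-attention is first applied within each local point group (the depthwise step), then a learnable per-group token carries each group's summary into an attention pass across groups (the pointwise step), playing a role analogous to the position token embedding. The module name SepPointAttention, the zero-initialized group token, and the residual broadcast back onto the points are illustrative assumptions, not the authors' released Sep-point implementation.

import torch
import torch.nn as nn

class SepPointAttention(nn.Module):
    """Sketch of depthwise separable self-attention over point groups.

    Stage 1 (depthwise): attention restricted to the points inside each
    local group, capturing local structural detail at low cost.
    Stage 2 (pointwise): attention among per-group summary tokens,
    establishing long-range interaction across groups cheaply.
    Names and design details here are illustrative assumptions.
    """

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.local_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.group_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Learnable token prepended to every group; after the local pass it
        # summarizes its group (analogous to the position token embedding).
        self.group_token = nn.Parameter(torch.zeros(1, 1, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, G, K, C) = batch, groups, points per group, channels
        B, G, K, C = x.shape
        tok = self.group_token.expand(B * G, 1, C)
        h = torch.cat([tok, x.reshape(B * G, K, C)], dim=1)  # (B*G, 1+K, C)

        # Depthwise step: self-attention within each group (token included).
        h, _ = self.local_attn(h, h, h)
        tokens, points = h[:, :1, :], h[:, 1:, :]

        # Pointwise step: group tokens attend to each other across the cloud.
        tokens = tokens.reshape(B, G, C)
        tokens, _ = self.group_attn(tokens, tokens, tokens)

        # Broadcast the globally mixed group context back onto local points.
        return points.reshape(B, G, K, C) + tokens.unsqueeze(2)

# Example: 8 clouds, 64 groups of 32 points, 128-dim features.
# feats = torch.randn(8, 64, 32, 128)
# out = SepPointAttention(128)(feats)  # -> (8, 64, 32, 128)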

CLC number: TP302.7

References
[1] SHI S, WANG X, LI H. PointRCNN: 3D object proposal generation and detection from point cloud[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 770-779.
[2] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach: ACM, 2017: 6000-6010.
[3] RAO Y, LU J, ZHOU J. Spherical fractal convolutional neural networks for point cloud recognition[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 452-460.
[4] YI L, SU H, GUO X, et al. SyncSpecCNN: synchronized spectral CNN for 3D shape segmentation[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 6584-6592.
[5] ZHANG Z, LI K, YIN X, et al. Point cloud semantic scene segmentation based on coordinate convolution [J]. Computer Animation and Virtual Worlds, 2020, 31(4-5): e1948.
[6] SHI S, WANG Z, SHI J, et al. From points to parts: 3D object detection from point cloud with part-aware and part-aggregation network [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 43(8): 2647-2664.
[7] PENG L, LIU F, YU Z, et al. Lidar point cloud guided monocular 3D object detection[C]//2022 European Conference on Computer Vision. Tel Aviv: Springer, 2022: 123-139.
[8] YANG H, SHI J, CARLONE L. TEASER: fast and certifiable point cloud registration [J]. IEEE Transactions on Robotics, 2021, 37(2): 314-333.
[9] GUO M, CAI J, LIU Z, et al. PCT: point cloud transformer [J]. Computational Visual Media, 2021, 7(2): 187-199.
[10] ZHAO H, JIANG L, JIA J, et al. Point transformer[C]//2021 IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 16259-16268.
[11] LI W, WANG X, XIA X, et al. Sepvit: separable vision transformer[EB/OL]. arXiv: 2203.15380(2022-06-15) [2024-04-16]. https://doi.org/10.48550/arXiv.2203.15380.
[12] WANG Z, LU F. VoxSegNet: volumetric CNNs for semantic part segmentation of 3D shapes [J]. IEEE Transactions on Visualization and Computer Graphics, 2019, 26(9): 2919-2930.
[13] SHI S, GUO C, JIANG L, et al. PV-RCNN: point-voxel feature set abstraction for 3D object detection[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 10526-10535.
[14] QI C, SU H, MO K, et al. PointNet: deep learning on point sets for 3D classification and segmentation[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 77-85.
[15] QI C, YI L, SU H, et al. PointNet++: deep hierarchical feature learning on point sets in a metric space[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach: ACM, 2017: 5105-5114.
[16] LI Y, BU R, SUN M, et al. PointCNN: convolution on X-transformed points[C]//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montréal: ACM, 2018: 828-838.
[17] LIU X, HAN Z, LIU Y, et al. Point2Sequence: learning the shape representation of 3D point clouds with an attention-based sequence to sequence network[C]//Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence. Honolulu: AAAI, 2019: 8778-8785.
[18] WANG Y, SUN Y, LIU Z, et al. Dynamic graph CNN for learning on point clouds [J]. ACM Transactions on Graphics, 2019, 38(5): 1-12.
[19] WU W, QI Z, LI F. PointConv: deep convolutional networks on 3D point clouds[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 9613-9622.
[20] THOMAS H, QI C, DESCHAUD J, et al. KPConv: flexible and deformable convolution for point clouds[C]//2019 IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 6410-6419.
[21] XU M, DING R, ZHAO H, et al. PAConv: position adaptive convolution with dynamic kernel assembling on point clouds[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 3172-3181.
[22] HOWARD A, ZHU M, CHEN B, et al. Mobilenets: efficient convolutional neural networks for mobile vision applications[EB/OL]. arXiv: 1704.04861(2017-04-17) [2024-04-16]. https://doi.org/10.48550/arXiv.1704.04861.
[23] LANDRIEU L, BOUSSAHA M. Point cloud oversegmentation with graph-structured deep metric learning[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 7432-7441.
[24] WU Z, SONG S, KHOSLA A, et al. 3D ShapeNets: a deep representation for volumetric shapes[C]//2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE, 2015: 1912-1920.
[25] UY M, PHAM Q, HUA B, et al. Revisiting point cloud classification: a new benchmark dataset and classification model on real-world data[C]//2019 IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 1588-1597.
[26] XU Y, FAN T, XU M, et al. SpiderCNN: deep learning on point sets with parameterized convolutional filters[C]//Computer Vision-ECCV 2018. Munich: Springer, 2018: 99-105.
[27] LIU Y, FAN B, XIANG S, et al. Relation-shape convolutional neural network for point cloud analysis[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 8887-8896.
[28] YAN X, ZHENG C, LI Z, et al. PointASNL: robust point clouds processing using nonlocal neural networks with adaptive sampling[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 5588-5597.
[29] BERG A, OSKARSSON M, CONNOR M. Points to patches: enabling the use of self-attention for 3D shape recognition[EB/OL]. arXiv: 2204.03957(2022-04-08) [2024-04-16]. https://doi.org/10.48550/arXiv.2204.03957.
[30] WIJAYA K, PAEK D, KONG S. Advanced feature learning on point clouds using multi-resolution features and learnable pooling[EB/OL]. arXiv: 2205.09962(2022-05-20) [2024-04-16]. https://doi.org/10.48550/arXiv.2205.09962.
[31] QIU S, ANWAR S, BARNES N. Dense-resolution network for point cloud classification and segmentation[C]//2021 IEEE Winter Conference on Applications of Computer Vision. Waikoloa: IEEE, 2021: 3812-3821.
[32] QIU S, ANWAR S, BARNES N. Geometric back-projection network for point cloud classification [J]. IEEE Transactions on Multimedia, 2021, 24(3): 1943-1955.
[33] GOYAL A, LAW H, LIU B, et al. Revisiting point cloud shape classification with a simple and effective baseline[C]//Proceedings of the 38th International Conference on Machine Learning. Vienna: IMLS, 2021: 3809-3820.
[34] HAMDI A, GIANCOLA S, GHANEM B. MVTN: multi-view transformation network for 3D shape recognition[C]//2021 IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 1-11.
[35] YU X, TANG L, RAO Y, et al. Point-BERT: pre-training 3D point cloud transformers with masked point modeling[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 19291-19300.
[36] CHENG S, CHEN X, HE X, et al. PRA-Net: point relation-aware network for 3D point cloud analysis [J]. IEEE Transactions on Image Processing, 2021, 30: 4436-4448.