Abstract:
Existing six-degree-of-freedom (6-DoF) grasp detection methods focus predominantly on improving detection accuracy while paying little attention to network inference efficiency, making it difficult to meet the real-time grasping demands of robots in cluttered environments. To address this issue, a 6-DoF grasp detection method based on heatmap guidance and an attention mechanism is proposed. The method uses multi-channel heatmaps to guide the network in rapidly locating high-potential grasping regions, substantially reducing redundant point-cloud processing and improving computational efficiency. In parallel, a lightweight dual attention gate module is designed to enhance the extraction of key features while suppressing background noise. Furthermore, a lightweight local feature extraction and fusion module aligns and deeply integrates 2D image features with 3D point-cloud features, strengthening the robustness of the feature representation. Finally, a grasp pose generator equipped with an anchor dynamic offset algorithm adaptively adjusts anchor distributions to better fit the non-uniform distribution of ground-truth grasp poses, thereby producing dense and accurate 6-DoF grasp poses. Experimental results on the GraspNet-1Billion dataset demonstrate that the proposed method improves grasping accuracy by 4.28 percentage points over the baseline GSNet while reducing the average inference time to 39 ms (only 20% of GSNet's), effectively enhancing inference efficiency while maintaining high accuracy. In real-world robot experiments, the grasping success rate exceeds that of GSNet by 8.57 percentage points, validating the method's effectiveness in practical scenarios.