A Multi-Portrait Semantic Segmentation Method Based on Semantic Fusion Features

Feng Guang; Tang Chong

doi:10.12052/gdutxb.230211

Abstract

Portrait semantic segmentation is an important research topic in computer vision, but existing methods tend to miss small portraits in multi-person images. Segmentation results also often show adjacent portraits adhering to one another, and mutual occlusion between portraits degrades segmentation accuracy along portrait edges. To address these problems, this paper proposes a multi-portrait semantic segmentation method that fuses label semantics. Multiple labels are assigned to the portraits in an image, and the semantic label embeddings are fed to the encoder together with the image. A cross-modal cross-attention module models the correlation between the semantic labels and the image feature representations, and the resulting semantically fused representation serves as the output of each encoder layer. An HRF attention module is also proposed, which extracts features separately from each of the multiple hypotheses that an object detection algorithm generates for the image. The network is trained and tested on an augmented Supervisely dataset. Experimental results show that the model reaches 95.94%, 94.60%, and 96.02% on the three evaluation metrics PA, MIoU, and Dice, respectively, a higher segmentation accuracy than the semantic segmentation models U-net, PSPNet, Deeplab v3+, PortraitNet, and Swin Unet.
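For readers unfamiliar with the three evaluation metrics named in the abstract, the following is a minimal sketch of how PA (pixel accuracy), MIoU (mean intersection over union), and Dice are commonly computed from a predicted and a ground-truth label map. It uses the standard textbook definitions, which are an assumption here: the abstract does not state the exact variants used (for example, whether Dice is computed per class or for the portrait class only). The function names and the flattened-list input format are hypothetical and chosen only for illustration.

```python
def confusion_counts(pred, target, cls):
    """Return (tp, fp, fn) for one class over two flat label sequences."""
    tp = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
    fp = sum(1 for p, t in zip(pred, target) if p == cls and t != cls)
    fn = sum(1 for p, t in zip(pred, target) if p != cls and t == cls)
    return tp, fp, fn


def pixel_accuracy(pred, target):
    """PA: fraction of pixels whose predicted label equals the ground truth."""
    correct = sum(1 for p, t in zip(pred, target) if p == t)
    return correct / len(target)


def mean_iou(pred, target, classes):
    """MIoU: per-class TP / (TP + FP + FN), averaged over the classes present."""
    ious = []
    for cls in classes:
        tp, fp, fn = confusion_counts(pred, target, cls)
        union = tp + fp + fn
        if union:  # skip classes absent from both prediction and ground truth
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


def dice(pred, target, cls=1):
    """Dice for one class: 2*TP / (2*TP + FP + FN).

    Assumption: class 1 is the portrait (foreground) class.
    """
    tp, fp, fn = confusion_counts(pred, target, cls)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


if __name__ == "__main__":
    # Toy 8-pixel example: 0 = background, 1 = portrait.
    pred = [0, 0, 1, 1, 1, 0, 1, 0]
    target = [0, 0, 1, 1, 0, 0, 1, 1]
    print(pixel_accuracy(pred, target))
    print(mean_iou(pred, target, classes=[0, 1]))
    print(dice(pred, target, cls=1))
```

In practice the label maps would be 2-D arrays flattened to one dimension, and the scores would be averaged over all test images. The plain-Python loops are kept for readability rather than speed.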
