Active Mining Sample Pair Semantics for Image-text Matching

陈永锋; 刘劲; 杨志景; 陈锐涵; 谭俊鹏

doi:10.12052/gdutxb.230122

Abstract

Existing image-text matching algorithms based on consensus learning cannot effectively match the hard negative samples (hard-to-distinguish non-matching pairs) in image-text sample pairs. The resulting models also generalize poorly and perform weakly on large-scale datasets. To address these shortcomings, this paper proposes an image-text matching model based on Active Mining Sample Pair Semantics. First, the proposed Adaptive Hierarchical Reinforcement Loss has diversified learning modes. On top of the conventional triplet loss, it adds predictive candidate instances (hard-to-distinguish sample pairs) to assist training. Its active learning mode uses a penalty mechanism that makes the model focus on hard negative samples, strengthening its discriminative ability. In addition, the proposed model adaptively mines more hidden relevant semantic representations from samples that are not ground-truth matches (unannotated items), which improves both its performance and its generalization ability. Finally, experimental results on the public Flickr30K and MSCOCO datasets show that the method is effective and outperforms the advanced methods it is compared against, reaching state-of-the-art performance. Because it effectively combines the image and text modalities, the method can improve the performance of applications such as natural-language search and visual question answering.
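The abstract builds the Adaptive Hierarchical Reinforcement Loss on top of the conventional triplet loss and stresses hard negatives. The proposed loss itself is not specified here, so it is not reproduced. As a reference point only, the following is a minimal Python sketch of the standard bidirectional hinge-based triplet ranking loss used in image-text matching, in both the sum-of-hinges form and the hardest-negative (max-of-hinges) form popularised by VSE++. The function name `triplet_ranking_loss`, the default margin of 0.2, and the list-of-lists similarity matrix with matching pairs on the diagonal are illustrative assumptions.

```python
from typing import List

Matrix = List[List[float]]


def triplet_ranking_loss(sim: Matrix, margin: float = 0.2, hardest: bool = True) -> float:
    """Bidirectional hinge-based triplet ranking loss over one batch.

    Illustrative sketch (hypothetical helper, not the paper's loss).
    sim[i][j] is the similarity between image i and caption j; the matching
    (positive) pairs are assumed to lie on the diagonal.
    hardest=True keeps only the hardest negative per anchor (max of hinges);
    hardest=False sums the hinge over every negative (sum of hinges).
    """
    n = len(sim)
    if any(len(row) != n for row in sim):
        raise ValueError("sim must be a square n x n matrix")

    def hinge(pos: float, neg: float) -> float:
        # Penalise a negative whose score comes within `margin` of the positive.
        return max(0.0, margin - pos + neg)

    total = 0.0
    for k in range(n):
        pos = sim[k][k]
        # Image k as anchor: negatives are the non-matching captions in row k.
        row_negs = [sim[k][j] for j in range(n) if j != k]
        # Caption k as anchor: negatives are the non-matching images in column k.
        col_negs = [sim[i][k] for i in range(n) if i != k]
        for negs in (row_negs, col_negs):
            if not negs:
                continue  # a batch of one pair has no negatives
            if hardest:
                # hinge is increasing in neg, so the max score gives the max hinge
                total += hinge(pos, max(negs))
            else:
                total += sum(hinge(pos, s) for s in negs)
    return total


if __name__ == "__main__":
    # Toy 3x3 batch: rows are images, columns are captions.
    sim = [[0.9, 0.3, 0.1], [0.4, 0.8, 0.7], [0.45, 0.6, 0.5]]
    print(round(triplet_ranking_loss(sim, hardest=False), 2))  # 0.95
    print(round(triplet_ranking_loss(sim, hardest=True), 2))   # 0.8
```

With `hardest=True`, each anchor contributes at most one hinge term, so training concentrates on the most confusable non-matching pair instead of averaging over easy negatives. This is the generic hard-negative emphasis that the abstract's penalty mechanism targets; the weighting actually used by the proposed loss may differ.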
