A Research on Multimodal Method for Cooler Shelf Inventory Recognition

Zhang Weiwei; Zhu Yujie; Wu Bowen; Yang Zhijing; Chen Tianshui

doi:10.12052/gdutxb.250176

Zhang Weiwei, Zhu Yujie, Wu Bowen, et al. A research on multimodal method for cooler shelf inventory recognition[J]. Journal of Guangdong University of Technology. DOI: 10.12052/gdutxb.250176

Abstract

To address the low efficiency of manual inventory counting and the weak generalization of existing vision-based methods in retail inventory management, this paper proposes the Cooler Shelf Inventory Recognition (CSIR) framework. The framework takes multimodal inputs: it encodes multi-angle images with a Vision Transformer and aligns the resulting features to the latent space of the LLaMA (Large Language Model Meta AI) decoder through a linear projection. A decoder-only language model then generates the inventory information end to end. A domain-specific tokenizer serializes shelf positions, product types, and inventory levels into discrete tokens to support autoregressive generation. The paper also constructs a real-scenario dataset of 17,000 samples covering complex conditions such as multiple views, reflective surfaces, and dense arrangements, together with multidimensional evaluation metrics. Experimental results show that the proposed method achieves a tolerance-free overall accuracy of 70.17%, an improvement of approximately 10% over the detection baseline, along with a 5.5-fold increase in inference efficiency. These gains can reduce labor costs and inventory discrepancies, providing a scalable and reproducible reference solution for automated inventory management.
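The serialization step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual token scheme, so everything below is an assumption made for illustration: the `Slot` record, the `<row_i>`, `<col_j>`, `<prod_x>`, and `<cnt_n>` token names, and the fixed four-tokens-per-slot layout are all hypothetical. The sketch only shows how shelf positions, product types, and inventory levels could be mapped to discrete token IDs that an autoregressive decoder can emit one at a time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    """One shelf slot (hypothetical record, not taken from the paper)."""
    row: int       # shelf row index
    col: int       # position within the row
    product: str   # product type label
    count: int     # inventory level at this slot


def build_vocab(rows, cols, products, max_count):
    """Build an assumed domain vocabulary: one token per discrete value."""
    tokens = ["<bos>", "<eos>"]
    tokens += [f"<row_{r}>" for r in range(rows)]
    tokens += [f"<col_{c}>" for c in range(cols)]
    tokens += [f"<prod_{p}>" for p in products]
    tokens += [f"<cnt_{n}>" for n in range(max_count + 1)]
    return {tok: i for i, tok in enumerate(tokens)}


def encode(slots, vocab):
    """Serialize slots into token IDs in a fixed row-major order."""
    ids = [vocab["<bos>"]]
    for s in sorted(slots, key=lambda s: (s.row, s.col)):
        ids += [
            vocab[f"<row_{s.row}>"],
            vocab[f"<col_{s.col}>"],
            vocab[f"<prod_{s.product}>"],
            vocab[f"<cnt_{s.count}>"],
        ]
    ids.append(vocab["<eos>"])
    return ids


def decode(ids, vocab):
    """Invert encode(): read four tokens per slot between <bos> and <eos>."""
    inverse = {i: tok for tok, i in vocab.items()}
    body = [inverse[i] for i in ids][1:-1]
    slots = []
    for k in range(0, len(body), 4):
        row, col, prod, cnt = body[k:k + 4]
        # Strip the "<row_", "<col_", "<prod_", "<cnt_" prefixes and the ">".
        slots.append(Slot(int(row[5:-1]), int(col[5:-1]), prod[6:-1], int(cnt[5:-1])))
    return slots


if __name__ == "__main__":
    vocab = build_vocab(rows=2, cols=3, products=["cola", "water"], max_count=5)
    shelf = [Slot(1, 0, "water", 2), Slot(0, 2, "cola", 5)]
    ids = encode(shelf, vocab)
    print(ids)
    print(decode(ids, vocab))
```

A fixed ordering (here row-major) gives every shelf state exactly one target sequence, which is a common choice when training a sequence model on structured outputs; whether CSIR orders its tokens this way is not stated in the abstract.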
