Journal of Guangdong University of Technology ›› 2020, Vol. 37 ›› Issue (03): 23-35.doi: 10.12052/gdutxb.190123

Previous Articles     Next Articles

Hashing for Approximate Nearest Neighbor Search on Big Data: A Survey

Fei Lun-ke, Qin Jian-yang, Teng Shao-hua, Zhang Wei, Liu Dong-ning, Hou Yan   

  1. School of Computers, Guangdong University of Technology, Guangzhou 510006, China
  • Received:2019-09-24 Online:2020-05-12 Published:2020-05-12

Abstract: Approximate Nearest Neighbor (ANN) search has served as one of the most important technologies for efficient retrieval of large-scale data in the era of artificial intelligence. As a promising solution to the ANN, hashing has received a lot of attention due to its high efficiency and extensive works have been presented in the literature. However, so far, there is no work with attempt to comprehensively analyze and overview the state-of-the-art hashing methods. To address this, the basics of hashing, including distance calculation, loss function, discrete constraint and out-of-sample learning, are first systematically introduced. Then, the state-of-the-art hashing based methods are comparatively studied and experiments on the widely used databases are conducted to evaluate their performance. Finally, the key problems of hashing methods are summarized and some potential research directions are pointed out. It is believed that this endeavor could provide other researches with a useful guideline in designing effective and efficient hashing methods.

Key words: approximate nearest neighbor search, hashing learning, hashing, data retrieval

CLC Number: 

  • TP391
[1] RUI Y, HUANG T S, CHANG S F. Image retrieval: current techniques, promising directions, and open issues [J]. Journal of Visual Communication and Image Representation, 1999, 10(1): 39-62
[2] HAR-PELED S, INDYK P, MOTWANI R. Approximate nearest neighbor: towards removing the curse of dimensionality [J]. Theory of Computing, 2012, 8(14): 321-350
[3] WEBER R, SCHEK H J, BLOTT S. A quantitative analysis and performance study for similarity- search methods in high-dimensional spaces[C]// International Conference on Very Large Data Bases. San Francisco: VLDB, 1998: 194-205.
[4] FRIEDMAN J H, BENTLEY J L, FINKEL R A. An algorithm for finding best matches in logarithmic expected time [J]. ACM Transactions on Mathematical Software, 1977, 3(3): 209-226
[5] UHLMANN J K. Satisfying general proximity / similarity queries with metric trees [J]. Information Processing Letters, 1991, 40(4): 175-179
[6] 蒋凯, 武港山. 基于Web的信息检索技术综述[J]. 计算机工程, 2005, 31(24): 7-9 JIANG K, WU G S. Overview of information retrieval technology for Web [J]. Computer Engineering, 2005, 31(24): 7-9
[7] JÉGOU H, DOUZE M, SCHMID C. Product quantization for nearest neighbor search [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010, 33(1): 117-128
[8] KULIS B, DARRELL T. Learning to hash with binary reconstructive embeddings[C]// Advances in Neural Information Processing Systems. Vancouver: NIPS, 2009: 1042-1050
[9] LIU W, WANG J, JI R R, et al. Supervised hashing with kernels[C]// Computer Vision & Pattern Recognition. Providence: IEEE, 2012: 2074-2081.
[10] WANG J, KUMAR S, CHANG S F. Sequential projection learning for hashing with compact codes[C]// International Conference on International Conference on Machine Learning. Haifa: ICML, 2010: 1127-1134.
[11] XU B, BU J, LIN Y, et al. Harmonious hashing[C]// International Joint Conference on Artificial Intelligence. Beijing: IJCAI, 2014: 1820-1826.
[12] HE J, LIU W, CHANG S F. Scalable similarity search with optimized kernel hashing[C]// International Conference on Knowledge Discovery & Data Mining. Boston: ACM SIGKDD, 2010: 1129-1138.
[13] WEISS Y, TORRALBA A, FERGUS R. Spectral hashing[C]// Advances in Neural Information Processing Systems. Vancouver: NIPS, 2009: 1753–1760.
[14] CARREIRA P M. The elastic embedding algorithm for dimensionality reduction[C]// International Conference on Machine Learning. Haifa: ICML, 2010: 167-174.
[15] SHAKHNAROVICH G, VIOLA P, DARRELL T. Fast pose estimation with parameter-sensitive hashing[C]// International Conference on Computer Vision. Nice: IEEE, 2003: 750.
[16] GONG Y, LAZEBNIK S, GORDO A, et al. Iterative quantization: a procrustean approach to learning binary codes for large-scale image retrieval [J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2013, 35(12): 2916-2929
[17] WOLD S, ESBENSEN K, GELADI P. Principal component analysis [J]. Chemometrics & Intelligent Laboratory Systems, 1987, 2(1): 37-52
[18] LIU W, MU C, KUMAR S, et al. Discrete graph hashing[C]// Advances in Neural Information Processing Systems. Montreal: NIPS, 2014: 3419-3427.
[19] LIU W, WANG J, KUMAR S, et al. Hashing with graphs[C]// International Conference on Machine Learning. Bellevue: ICML, 2011: 1-8.
[20] ZHU X, HUANG Z, CHENG H, et al. Sparse hashing for fast multimedia search [J]. ACM Transactions on Information Systems, 2013, 31(2): 1-24
[21] HARDOON D R, SZEDMAK S, SHAWE T J. Canonical correlation analysis: an overview with application to learning methods [J]. Neural Computation, 2004, 16(12): 2639-2664
[22] GIONIS A, INDYK P, MOTWANI R. Similarity search in high dimensions via hashing[C]// International Conference on Very Large Data Bases. San Francisco: Morgan Kaufmann Publishers, 1999: 518-529.
[23] DATAR M, IMMORLICA N, INDYK P, et al. Locality-sensitive hashing scheme based on p-stable distributions[C]// Symposium on Computational Geometry. Brooklyn: SCG, 2004: 253-262.
[24] KULIS B, GRAUMAN K. Kernelized locality-sensitive hashing for scalable image search[C]// International Conference on Computer Vision. Kyoto: IEEE, 2009: 2130- 2137.
[25] 林悦. 基于哈希算法的高维数据的最近邻检索[D]. 杭州: 浙江大学, 2013.
[26] 苏毅娟, 余浩, 雷聪, 等. 基于PCA的哈希图像检索算法[J]. 计算机应用研究, 2018, 35(10): 3147-3150 SU Y J, YU H, LEI C, et al. PCA hashing for image data retrieval [J]. Application Research of Computers, 2018, 35(10): 3147-3150
[27] BODÓ Z, CSATÓ L. Linear spectral hashing [J]. Neurocomputing, 2014, 141: 117-123
[28] 夏立超, 蒋建国, 齐美彬. 基于改进谱哈希的大规模图像检索[J]. 合肥工业大学学报(自然科学版), 2016, 39(8): 1049-1054 XIA L C, JIANG J G, QI M B. A large-scale image retrieval method based on improved spectral hashing [J]. Journal of Hefei University of Technology (Natural Science), 2016, 39(8): 1049-1054
[29] KIM S, CHOI S. Multi-view anchor graph hashing[C]// International Conference on Acoustics, Speech and Signal Processing. Vancouver: IEEE, 2013: 3123-3127.
[30] 吕成钢. 面向大规模图片搜索的基于锚图哈希的半监督度量学习算法[D]. 南京: 南京邮电大学, 2018.
[31] TAKEBE H, UEHARA Y, UCHIDA S. Efficient anchor graph hashing with data-dependent anchor selection [J]. IEICE Transactions on Information and Systems, 2015, E98-(11): 2030-2033
[32] 赵珊, 李永思. 基于随机旋转局部保持哈希的图像检索技术[J]. 工程科学与技术, 2019, 51(2): 144-150 ZHAO S, LI Y S. Image retrieval based on random rotation locality preserving hashing [J]. Advanced Engineering Sciences, 2019, 51(2): 144-150
[33] CARREIRA P M A, RAZIPERCHIKOLAEI R. Hashing with binary autoencoders[C]// Computer Vision & Pattern Recognition. Boston: IEEE, 2015: 557-566.
[34] KONG W, LI W J. Isotropic hashing[C]// International Conference on Neural Information Processing Systems. Vancouver: NIPS, 2012: 1646-1654.
[35] XIA Y, HE K, KOHLI P, et al. Sparse projections for high-dimensional binary codes[C]// Computer Vision & Pattern Recognition. Boston: IEEE, 2015: 3332-3339.
[36] 徐盼盼. 基于哈希学习的图像检索方法研究[D].沈阳: 沈阳工业大学, 2019.
[37] ZHENG M, BU J, CHEN C, et al. Graph regularized sparse coding for image representation [J]. IEEE Transactions on Image Processing, 2011, 20(5): 1327-1336
[38] DING G, ZHOU J, GUO Y, et al. Large-scale image retrieval with sparse embedded hashing [J]. Neurocomputing, 2017, 257: 24-36
[39] LAI Z H, CHEN Y D, WU J, et al. Jointly sparse hashing for image retrieval [J]. IEEE Transactions on Image Processing, 2018, 27(12): 6147-6158
[40] HU M, YANG Y, SHEN F M, et al. Hashing with angular reconstructive embeddings [J]. IEEE Transactions on Image Processing, 2018, 27(2): 545-555
[41] HU D, NIE F P, LI X L. Discrete spectral hashing for efficient similarity retrieval [J]. IEEE Transactions on Image Processing, 2019, 28(3): 1080-1091
[42] LECUN Y, BOTTOU L, BENGIO Y, et al. Gradient-based learning applied to document recognition [J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324
[43] LECUN Y, BOTTOU L, BENGIO Y, et al. MNIST database[DB/OL]. (1998)[2019-12-3]. http://yann.lecun.com/exdb/mnist/.
[44] KRIZHEVSKY A, HINTON G. Learning multiple layers of features from tiny images[R].[S.l.: s.n.], 2009.
[45] KRIZHEVSKY A, HINTON G. CIFAR-10 database[DB/OL].(2009)[2019-12-3]. http://www.cs.toronto.edu/~kriz/cifar.html.
[46] KINNUNEN T, KAMARAINEN J, LENSU L, et al. Making visual object categorization more challenging: randomized Caltech-101 data set[C]// International Conference on Pattern Recognition. Istanbul: IEEE, 2010: 476- 479.
[47] KINNUNEN T, KAMARAINEN J, LENSU L, et al. Cal-tech-101 database[DB/OL]. (2010)[2019-12-3]. http://www.vision.caltech.edu/Image_Datasets/Caltech101/.
[48] XIAO J, HAYS J, EHINGER K A, et al. SUN database: large-scale scene recognition from abbey to zoo[C]// Computer Vision & Pattern Recognition. San Francisco: IEEE, 2010: 3485-3492.
[49] XIAO J, HAYS J, EHINGER K A, et al. SUN397 database[DB/OL]. (2010)[2019-12-3]. https://vision.princeton.edu/projects/2010/SUN/.
[50] WOLF L, HASSNER T, MAOZ I. Face recognition in unconstrained videos with matched background similarity[C]// Computer Vision & Pattern Recognition. Colorado: IEEE, 2011: 529-534.
[51] WOLF L, HASSNER T, MAOZ I. YoutubeFaces database[DB/OL]. (2011)[2019-12-3]. http://www.cs.tau.ac.il/~wolf/ytfaces/index.html.
[52] CHUA T S, TANG J, HONG R, et al. NUS-WIDE: a real-world web image database from National University of Singapore[C]// International Conference on Image & Video Retrieval. Santorini: CIVR, 2009: 48.
[53] CHUA T S, TANG J, HONG R, et al. NUS-WIDE database[DB/OL]. (2009)[2019-12-3]. https://lms.comp.nus.edu.sg/wp-content/uploads/2019/research/nuswide/NUS-WIDE.html.
[54] TORRALBA A, MURPHY K P, FREEMAN W T, et al. Context-based vision system for place and object recognition[C]// International Conference on Computer Vision. Nice: IEEE, 2008: 273-280.
[55] LOWE D G. Object recognition from local scale-invariant features[C]// International Conference on Computer Vision. Kerkyra: IEEE, 1999: 1150-1157.
[56] GONG Y, WANG L, GUO R, et al. Multi-scale orderless pooling of deep convolutional activation features[C]// European Conference on Computer Vision. Zurich: ECCV, 2014: 392-407.
[57] OJALA T, PIETIKAINEN M, MAENPAA T. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, 24(7): 971-987
[58] WEISS Y, TORRALBA A, FERGUS R. Spectral Hash-ing[CP/OL]. (2009)[2019-12-3]. https://www.cse.huji.ac.il/~yweiss/SpectralHashing/.
[59] LIU W, WANG J, KUMAR S, et al. Anchor Graph Hashing[CP/OL].(2011)[2019-12-3]. http://www.ee.columbia.edu/~wliu/.
[60] GONG Y, LAZEBNIK S, GORDO A, et al. Iterative Quantization[CP/OL].(2013)[2019-12-3]. http://slazebni.cs.illinois.edu/publications.html.
[61] LAI Z H, CHEN Y D, WU J, et al. Jointly Sparse Hashing[CP/OL]. (2018)[2019-12-3]. http://www.scholat.com/laizhihui.
[62] LI X, HU D, NIE F P. Large graph hashing with spectral rotation[C]//Thirty-First AAAI Conference on Artificial Intelligence. San Francisco: AAAI, 2017: 2203-2209.
[63] LI X, HU D, NIE F P. Large Graph Hashing with Spectral Rotation[CP/OL]. (2017)[2019-12-3]. http://www.escience.cn/people/fpnie/index.html.
[1] Xie Guo-bo, Lin Li, Lin Zhi-yi, He Di-xuan, Wen Gang. An Insulator Burst Defect Detection Method Based on YOLOv4-MP [J]. Journal of Guangdong University of Technology, 2023, 40(02): 15-21.
[2] Chen Jing-yu, Lyu Yi. Frost Detection Method of Cold Chain Refrigerating Machine Based on Spiking Neural Network [J]. Journal of Guangdong University of Technology, 2023, 40(01): 29-38.
[3] Ye Wen-quan, Li Si, Ling Jie. Sparse-view SPECT Image Reconstruction Based on Multilevel-residual U-Net [J]. Journal of Guangdong University of Technology, 2023, 40(01): 61-67.
[4] Zou Heng, Gao Jun-li, Zhang Shu-wen, Song Hai-tao. Design and Implementation of a Dropping Guidance Device for Go Robot [J]. Journal of Guangdong University of Technology, 2023, 40(01): 77-82,91.
[5] Xie Guang-qiang, Xu Hao-ran, Li Yang, Chen Guang-fu. Consensus Opinion Enhancement in Social Network with Multi-agent Reinforcement Learning [J]. Journal of Guangdong University of Technology, 2022, 39(06): 36-43.
[6] Liu Xin-hong, Su Cheng-yue, Chen Jing, Xu Sheng, Luo Wen-jun, Li Yi-hong, Liu Ba. Real Time Detection of High Resolution Bridge Crack Image [J]. Journal of Guangdong University of Technology, 2022, 39(06): 73-79.
[7] Xiong Wu, Liu Yi. Application of Particle Filter Algorithm in Static Deformation Monitoring of BDS High-Speed Rail [J]. Journal of Guangdong University of Technology, 2022, 39(04): 66-72.
[8] Yi Min-qi, Liu Hong-wei, Gao Hong-ming. Research on the Factors Influencing the Co-purchase Network of Products on E-commerce Platforms [J]. Journal of Guangdong University of Technology, 2022, 39(03): 16-24.
[9] Qiu Zhan-chun, Fei Lun-ke, Teng Shao-hua, Zhang Wei. Palmprint Recognition Based on Cosine Similarity [J]. Journal of Guangdong University of Technology, 2022, 39(03): 55-62.
[10] Zheng Jia-bi, Yang Zhen-guo, Liu Wen-yin. Marketing-Effect Estimation Based on Fine-grained Confounder Balancing [J]. Journal of Guangdong University of Technology, 2022, 39(02): 55-61.
[11] Gary Yen, Li Bo, Xie Sheng-li. An Evolutionary Optimization of LSTM for Model Recovery of Geophysical Fluid Dynamics [J]. Journal of Guangdong University of Technology, 2021, 38(06): 1-8.
[12] Li Guang-cheng, Zhao Qing-lin, Xie Kan. A Design of Decentralized Data Processing Scheme [J]. Journal of Guangdong University of Technology, 2021, 38(06): 77-83.
[13] Xie Guang-qiang, Zhao Jun-wei, Li Yang, Xu Hao-ran. Cooperative Lane-changing Based on Multi-cluster System [J]. Journal of Guangdong University of Technology, 2021, 38(05): 1-9.
[14] Zhang Wei, Zhang Zhen-bin. Joint Graph Embedding and Feature Weighting for Unsupervised Feature Selection [J]. Journal of Guangdong University of Technology, 2021, 38(05): 16-23.
[15] Deng Jie-hang, Yuan Zhong-ming, Lin Hao-run, Gu Guo-sheng. Superpixel and Visual Saliency Synergetic Image Quality Assessment [J]. Journal of Guangdong University of Technology, 2021, 38(05): 33-39.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!