Journal of Guangdong University of Technology ›› 2018, Vol. 35 ›› Issue (03): 29-36. doi: 10.12052/gdutxb.180043


An Incremental Learning Approach in Voice Compression via Sparse Dictionary Learning

Teng Shao-hua, Song Huan, Huo Ying-xiang, Zhang Wei   

  1. School of Computers, Guangdong University of Technology, Guangzhou, 510006, China
  • Received: 2018-03-07 Online: 2018-05-09 Published: 2018-04-26

Abstract: The explosive growth of audio streams makes storage and transmission difficult, yet many existing methods cannot achieve a high compression ratio while preserving quality. To address this problem, the proposed method compresses the amplitude spectrum of voice by constructing a dynamic sparse voice dictionary based on incremental learning. It first calculates amplitude envelope spectra via the Short-Time Fourier Transform (STFT), and then uses a dictionary to fit each envelope by projecting high-dimensional vectors onto several 2D planes. It also minimizes the number of dictionary items, so that it can store the parameters of linear interpolation instead of the spectra themselves. If the fitting step fails, the method stores that window of the spectrum directly. Using the dictionary and the linear interpolation parameters, it can reconstruct the spectrum efficiently during decompression. Experimental results show that, compared with other methods, the proposed method gives a higher compression ratio as well as better decompression accuracy, and is suited to encoding live voice streams with a high sampling rate.
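The encode and decode flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several parts are assumptions made only for illustration: the function names (`amplitude_spectrum`, `affine_fit`, `encode`, `decode`), the error tolerance `tol`, and the use of a plain least-squares affine fit (target ≈ a · atom + b) as a stand-in for the paper's projection onto 2D planes. The sketch also assumes that a window whose fit fails is both stored directly and added to the dictionary as a new item.

```python
import cmath
import math


def amplitude_spectrum(window):
    """Magnitude of the DFT of one window (naive O(n^2), non-negative bins only)."""
    n = len(window)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(window)))
            for k in range(n // 2 + 1)]


def affine_fit(atom, target):
    """Least-squares a, b with target ~= a * atom + b, plus the max abs error.

    Assumption: this affine fit stands in for the paper's fitting step.
    """
    n = len(atom)
    mean_x = sum(atom) / n
    mean_y = sum(target) / n
    var_x = sum((x - mean_x) ** 2 for x in atom)
    if var_x == 0.0:
        a = 0.0
    else:
        a = sum((x - mean_x) * (y - mean_y)
                for x, y in zip(atom, target)) / var_x
    b = mean_y - a * mean_x
    err = max(abs(a * x + b - y) for x, y in zip(atom, target))
    return a, b, err


def encode(frames, tol):
    """Return (dictionary, codes).

    Each code is ("ref", index, a, b) when a dictionary item fits the frame
    within tol, or ("raw", index) when the frame had to be stored directly.
    """
    dictionary, codes = [], []
    for frame in frames:
        best = None
        for i, atom in enumerate(dictionary):
            a, b, err = affine_fit(atom, frame)
            if err <= tol and (best is None or err < best[3]):
                best = (i, a, b, err)
        if best is not None:
            codes.append(("ref", best[0], best[1], best[2]))
        else:
            # Fit failed: keep the spectrum itself and grow the dictionary
            # (assumed incremental-learning step).
            dictionary.append(list(frame))
            codes.append(("raw", len(dictionary) - 1))
    return dictionary, codes


def decode(dictionary, codes):
    """Rebuild each spectrum from the dictionary and the stored parameters."""
    frames = []
    for code in codes:
        atom = dictionary[code[1]]
        if code[0] == "raw":
            frames.append(list(atom))
        else:
            a, b = code[2], code[3]
            frames.append([a * x + b for x in atom])
    return frames
```

For example, if a second window is simply a louder copy of the first, `encode` stores it as a reference to the first dictionary item with a scale close to 2, while a window with a different dominant frequency fails the fit and becomes a new dictionary item.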

Key words: voice compression, voice decompression, real-time processing, streaming data, incremental learning, sparse dictionary learning

CLC Number: 

  • TP301