Journal of Guangdong University of Technology ›› 2018, Vol. 35 ›› Issue (03): 29-36. doi: 10.12052/gdutxb.180043

• Comprehensive Studies •

An Incremental-Learning Method for Constructing a Voice Dictionary

Teng Shao-hua (滕少华), Song Huan (宋欢), Huo Ying-xiang (霍颖翔), Zhang Wei (张巍)

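  1. School of Computers, Guangdong University of Technology, Guangzhou 510006, Guangdong, China
  • Received: 2018-03-07 Online: 2018-05-09 Published: 2018-04-26
  • Corresponding author: Zhang Wei (1964-), female, associate professor; main research interests: collaborative computing, data mining, network security, and big data. E-mail: weizhang@gdut.edu.cn
  • About the author: Teng Shao-hua (1962-), male, professor, Ph.D.; main research interests: data mining, network security, collaborative computing, and big data.
  • Supported by:
    National Natural Science Foundation of China (61402118, 61673123, 61603100, 61702110); Guangdong Provincial Science and Technology Plan Projects (2015B090901016, 2016B010108007); Guangdong Provincial Department of Education Projects (粤教高函〔2018〕1号, 粤教高函〔2015〕113号, 粤教高函〔2014〕97号); Guangzhou Science and Technology Plan Projects (201604020145, 2016201604030034, 201508010067, 201604046017)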

An Incremental Learning Approach in Voice Compression via Sparse Dictionary Learning

Teng Shao-hua, Song Huan, Huo Ying-xiang, Zhang Wei   

  1. School of Computers, Guangdong University of Technology, Guangzhou, 510006, China
  • Received: 2018-03-07 Online: 2018-05-09 Published: 2018-04-26

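Abstract (translated from the Chinese): The explosive growth of voice data creates great difficulties for storage and transmission, and existing methods struggle to handle massive volumes of frequency-domain voice data in real time. This paper therefore proposes an incremental-learning method for constructing a voice dictionary. The method first applies the short-time Fourier transform (STFT) to the time-domain voice signal to obtain the spectral magnitudes of each window, then projects the high-dimensional vectors into a low-dimensional space and fits the current window vector as a linear combination of a few basis vectors from the dictionary. The current window vector is then stored as the identifiers of those basis vectors together with the fitting coefficients. Window vectors that cannot be fitted are processed and added to the dictionary, which is what makes the learning incremental. Decompression, performed on user request, reconstructs the data by linearly combining the specified dictionary entries. Experimental results show that the method substantially compresses the voice spectral envelope and suits real-time, high-sampling-rate streaming voice data under bandwidth constraints. Compared with similar algorithms, it greatly reduces the required storage space and transmission bandwidth while preserving reconstruction quality.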

Key words: voice compression, voice decompression, real-time processing, streaming data, incremental learning, sparse dictionary learning

Abstract: The explosive growth of audio streams makes storage and transmission difficult, and many existing methods cannot deliver a high compression ratio while preserving quality. To address this problem, the proposed method compresses the amplitude spectrum of voice by constructing a dynamic sparse voice dictionary based on incremental learning. It first computes amplitude spectrum envelopes via the Short-Time Fourier Transform (STFT), and then uses a dictionary to fit each envelope by projecting high-dimensional vectors onto several 2D planes. It also minimizes the number of dictionary items used per fit, so that it can store the parameters of linear interpolation instead of the spectra themselves. If the fitting step fails, it stores that window of the spectrum directly. Using the dictionary and the linear interpolation parameters, it reconstructs the spectrum efficiently during decompression. Experimental results show that, compared with other methods, the proposed method achieves a high compression ratio with better decompression accuracy, and is suited to encoding live voice streams with a high sampling rate.

Key words: voice compression, voice decompression, real-time processing, streaming data, incremental learning, sparse dictionary learning
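
The pipeline described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: greedy orthogonal matching pursuit stands in for the paper's projection-based fitting, whose details the abstract does not give, and the function names, window size, atom limit, and 5% error tolerance are all hypothetical.

```python
import numpy as np

def stft_magnitudes(signal, win=256, hop=128):
    """Magnitude spectrum of each Hann-windowed frame, one row per window."""
    window = np.hanning(win)
    starts = range(0, len(signal) - win + 1, hop)
    frames = np.stack([signal[s:s + win] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1))

def fit_window(vec, dictionary, max_atoms=2, tol=0.05):
    """Try to fit vec with at most max_atoms dictionary entries.

    Assumption: greedy orthogonal matching pursuit, not the paper's method.
    Returns (atom ids, coefficients), or None if the relative error stays
    above tol.
    """
    if len(dictionary) == 0:
        return None
    D = np.asarray(dictionary)          # shape: (atoms, bins)
    residual = vec.copy()
    ids = []
    for _ in range(min(max_atoms, len(D))):
        scores = np.abs(D @ residual)
        if ids:
            scores[ids] = -1.0          # never pick the same atom twice
        ids.append(int(np.argmax(scores)))
        coeffs, *_ = np.linalg.lstsq(D[ids].T, vec, rcond=None)
        residual = vec - D[ids].T @ coeffs
        if np.linalg.norm(residual) <= tol * np.linalg.norm(vec):
            return ids, coeffs
    return None

def compress(windows, dictionary, **fit_kwargs):
    """Encode each window as (ids, coeffs); grow the dictionary on failure."""
    encoded = []
    for vec in windows:
        norm = np.linalg.norm(vec)
        if norm == 0:                   # silent window: nothing to store
            encoded.append(([], np.zeros(0)))
            continue
        fit = fit_window(vec, dictionary, **fit_kwargs)
        if fit is None:                 # incremental step: add a new unit-norm atom
            dictionary.append(vec / norm)
            fit = ([len(dictionary) - 1], np.array([norm]))
        encoded.append(fit)
    return encoded

def decompress(encoded, dictionary, n_bins):
    """Rebuild each window as a linear combination of dictionary entries."""
    out = np.zeros((len(encoded), n_bins))
    for k, (ids, coeffs) in enumerate(encoded):
        for i, c in zip(ids, coeffs):
            out[k] += c * dictionary[i]
    return out

if __name__ == "__main__":
    t = np.arange(8000) / 8000.0
    tone = np.where(t < 0.5, np.sin(2 * np.pi * 440 * t), np.sin(2 * np.pi * 1000 * t))
    mags = stft_magnitudes(tone)
    atoms = []
    codes = compress(mags, atoms)
    restored = decompress(codes, atoms, mags.shape[1])
    print(len(mags), "windows,", len(atoms), "dictionary entries")
    print("relative error:", np.linalg.norm(restored - mags) / np.linalg.norm(mags))
```

Each window is stored either as a handful of (identifier, coefficient) pairs or, when no fit is good enough, as a new dictionary entry, so the dictionary grows only when the stream presents a spectrum shape it has not seen before.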

CLC number:

  • TP301