Journal of Guangdong University of Technology, 2023, Vol. 40, Issue (02): 45-54. doi: 10.12052/gdutxb.210149
Zhang Rui, Lyu Jun
[1] Xie Guo-bo, Lin Li, Lin Zhi-yi, He Di-xuan, Wen Gang. An Insulator Burst Defect Detection Method Based on YOLOv4-MP [J]. Journal of Guangdong University of Technology, 2023, 40(02): 15-21.
[2] Qiu Jun-hao, Cheng Zhi-jian, Lin Guo-huai, Ren Hong-ru, Lu Ren-quan. Prescribed Performance Control for a Class of Nonlinear Pure-feedback Systems with Actuator Faults [J]. Journal of Guangdong University of Technology, 2023, 40(02): 55-63.
[3] Chen Jing-yu, Lyu Yi. Frost Detection Method of Cold Chain Refrigerating Machine Based on Spiking Neural Network [J]. Journal of Guangdong University of Technology, 2023, 40(01): 29-38.
[4] Ye Wen-quan, Li Si, Ling Jie. Sparse-view SPECT Image Reconstruction Based on Multilevel-residual U-Net [J]. Journal of Guangdong University of Technology, 2023, 40(01): 61-67.
[5] Peng Mei-chun, Yang Chen, Li Jun-ping, Ye Wei-bin, Huang Wen-wei. A Research on Vehicle Carbon Emission Calculating Method Based on BP Neural Network [J]. Journal of Guangdong University of Technology, 2023, 40(01): 107-112.
[6] Liu Hong-wei, Lin Wei-zhen, Wen Zhan-ming, Chen Yan-jun, Yi Min-qi. A MABM-based Model for Identifying Consumers' Sentiment Polarity: Taking Movie Reviews as an Example [J]. Journal of Guangdong University of Technology, 2022, 39(06): 1-9.
[7] Zhang Yun, Wang Xiao-dong. A Review and Thinking of Deep Learning with a Restricted Number of Samples [J]. Journal of Guangdong University of Technology, 2022, 39(05): 1-8.
[8] Peng Ji-guang, Xiao Han-zhen. Tracking and Obstacle Avoidance of Multi-mobile Robots Under Model Predictive Control [J]. Journal of Guangdong University of Technology, 2022, 39(05): 93-101.
[9] Li Yao-dong, Ren Zhi-gang, Wu Zong-ze. Deep Neural Network Based Predictive Control for Injection Molding Process [J]. Journal of Guangdong University of Technology, 2022, 39(05): 120-126,136.
[10] Zeng Jiang-yi, Li Zhi-sheng, Ou Yao-chun, Jin Yu-kai. PM2.5 Concentration Improving Prediction Modeling of Seasonal Index [J]. Journal of Guangdong University of Technology, 2022, 39(03): 89-94.
[11] Gary Yen, Li Bo, Xie Sheng-li. An Evolutionary Optimization of LSTM for Model Recovery of Geophysical Fluid Dynamics [J]. Journal of Guangdong University of Technology, 2021, 38(06): 1-8.
[12] Guo Xin-de, Chris Hong-qiang Ding. An AGV Path Planning Method for Discrete Manufacturing Smart Factory [J]. Journal of Guangdong University of Technology, 2021, 38(06): 70-76.
[13] Huang Jian-hang, Wang Zhen-you. A Research on Deep Learning Object Detection Algorithm Based on Feature Fusion [J]. Journal of Guangdong University of Technology, 2021, 38(04): 52-58.
[14] Ma Shao-peng, Liang Lu, Teng Shao-hua. A Lightweight Hyperspectral Remote Sensing Image Classification Method [J]. Journal of Guangdong University of Technology, 2021, 38(03): 29-35.
[15] Xia Hao, Cai Nian, Wang Ping, Wang Han. Magnetic Resonance Image Super-Resolution via Multi-Resolution Learning [J]. Journal of Guangdong University of Technology, 2020, 37(06): 26-31.