Journal of Guangdong University of Technology ›› 2024, Vol. 41 ›› Issue (03): 91-101.doi: 10.12052/gdutxb.230037
• Computer Science and Technology •
Li Zhuo-zhang1, Xu Bo-yan1, Cai Rui-chu1, Hao Zhi-feng1,2