Journal of Guangdong University of Technology, 2023, Vol. 40, Issue (04): 9-17, 23. DOI: 10.12052/gdutxb.220122
• Computer Science and Technology •
Dai Bin¹, Zeng Bi¹, Wei Peng-fei¹, Huang Yong-jian²