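Ma Chengqian (马骋乾, 1996-), male, from Linfen, Shanxi; master's student. Research interests: data link systems and reinforcement learning.
Xie Wei (谢伟, 1974-), male, associate professor.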
Research on Reinforcement Learning Technology: A Review
Received date: 2018-06-26
Revised date: 2018-07-23
Online published: 2022-05-10
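Ma Chengqian (马骋乾), Xie Wei (谢伟), Sun Weijie (孙伟杰). Research on Reinforcement Learning Technology: A Review (强化学习研究综述)[J]. Command Control & Simulation (指挥控制与仿真), 2018, 40(6): 68-72. DOI: 10.3969/j.issn.1673-3819.2018.06.015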
Reinforcement learning is an active research area in machine learning that aims to solve decision-making and optimization problems. This paper systematically introduces the basic principles of reinforcement learning and its classical algorithms, covering both value-function-based methods and direct policy search methods. It then describes three research directions: deep reinforcement learning, meta reinforcement learning, and inverse reinforcement learning. Finally, it summarizes existing applications and future development directions of reinforcement learning.