A Fast Centralized Multiagent Reinforcement Learning Method with Automated Reward Setting
佐々木 薫, 飯間 等
pp. 39-47
DOI: 10.5687/iscie.35.39
Abstract
In multiagent environments, a centralized reinforcement learner can find optimal policies, but learning is time-consuming. A method has been proposed that finds optimal policies faster by combining the centralized learner with supplemental independent learners. To prevent learning from failing, the independent learners must be stopped at the right time, which is achieved by finely tuning a reward. This tuning, however, requires additional time and effort. This paper proposes a reinforcement learning method in which the reward is set automatically.