Analysis of the possibilities for using machine learning algorithms in the Unity environment

Karina Litwynenko

karina.litwynenko@pollub.edu.pl
Lublin University of Technology (Poland)

Małgorzata Plechawska-Wójcik


Lublin University of Technology (Poland)
https://orcid.org/0000-0003-1055-5344

Abstract

Reinforcement learning algorithms are gaining popularity, and their progress depends on the availability of tools for evaluating them. This paper examines the applicability of machine learning algorithms on the Unity platform using the Unity ML-Agents Toolkit library. The study compared two algorithms: Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC). It also tested whether combining these algorithms with Generative Adversarial Imitation Learning (GAIL) improves learning results. The results showed that PPO can perform better in uncomplicated environments with non-immediate rewards, and that additionally using GAIL can improve learning performance.
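
As background for readers unfamiliar with Proximal Policy Optimization, the clipped surrogate objective at the core of the algorithm can be sketched in a few lines. This is a minimal illustration, not the ML-Agents implementation or the setup used in the study: the function name `clipped_surrogate`, the plain-list inputs, and the default `epsilon=0.2` are assumptions chosen for the example.

```python
def clipped_surrogate(ratios, advantages, epsilon=0.2):
    """Mean PPO clipped surrogate objective over a batch.

    ratios     -- probability ratios pi_new(a|s) / pi_old(a|s), one per sample
    advantages -- advantage estimates, one per sample
    epsilon    -- clipping range (0.2 is an illustrative assumption)
    """
    if len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must have the same length")
    if not ratios:
        raise ValueError("batch must not be empty")

    total = 0.0
    for ratio, advantage in zip(ratios, advantages):
        # Restrict the ratio to the interval [1 - epsilon, 1 + epsilon].
        clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
        # Take the more pessimistic of the unclipped and clipped terms,
        # which removes the incentive for overly large policy updates.
        total += min(ratio * advantage, clipped_ratio * advantage)
    return total / len(ratios)


if __name__ == "__main__":
    # Hypothetical batch of two samples, for illustration only.
    print(clipped_surrogate([1.5, 0.5], [1.0, -1.0]))
```

In training, this quantity is maximized with respect to the policy parameters; the clipping keeps each update close to the policy that collected the data.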


Keywords:

reinforcement learning, imitation learning, Unity



Published
2021-09-30

Litwynenko, K., & Plechawska-Wójcik, M. (2021). Analysis of the possibilities for using machine learning algorithms in the Unity environment. Journal of Computer Sciences Institute, 20, 197–204. https://doi.org/10.35784/jcsi.2680
