Knowledge sharing in Independent Deep Q-Network
Issue Vol. 16 No. 1 (2026)
Viacheslav Bochok, Nataliia Fedorova (pp. 104–108)
Abstract
This paper investigates knowledge-sharing mechanisms in weakly coupled multi-agent reinforcement learning systems based on Independent Deep Q-Networks (IDQN). Although parallel agents can accelerate data collection, their learning processes typically remain isolated, resulting in suboptimal use of collective experience. To address this limitation, the study proposes two complementary methods: (1) a teacher-selection mechanism that identifies the most efficient agent based on episodic performance, and (2) a dynamic control mechanism that adjusts the intensity of knowledge transfer according to the performance gap between teacher and student. Experiments were conducted in the OpenAI Gym CartPole-v1 and LunarLander-v3 environments using three independent agents to validate effectiveness across tasks with different reward structures, dynamics, and difficulty levels. All agents were trained with Batch TD(0) at the end of each episode, using a replay buffer. Knowledge transfer was implemented through policy distillation on pseudo-labeled transitions sampled from the teacher's experience buffer. The number of distillation epochs was determined dynamically by a nonlinear scaling function bounded by predefined minimum and maximum values. Results demonstrate that the proposed mechanisms consistently accelerate learning and improve stability compared with baseline DQN configurations without knowledge sharing. Systems employing teacher selection outperform both random teacher choice and all-to-all sharing, and dynamic intensity adjustment proves more effective than constant-intensity distillation. Normalized AUC analysis further confirms statistically significant improvements in both maximum and average episodic returns, indicating faster convergence of the best agent as well as more uniform progress across all agents.
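The distillation step described above can be sketched as follows. The abstract does not fix the exact labeling rule, so the helper name `pseudo_label_batch` and the greedy (argmax-Q) pseudo-labeling are assumptions for illustration:

```python
import numpy as np

def pseudo_label_batch(teacher_q, states):
    """Pseudo-label transitions sampled from the teacher's replay buffer:
    each sampled state is labeled with the teacher's greedy action
    (argmax over its Q-values). `teacher_q` is any callable mapping a
    state to a vector of Q-values."""
    return np.array([int(np.argmax(teacher_q(s))) for s in states])
```

The student network would then be trained on these (state, action) pairs with a cross-entropy distillation loss for the dynamically chosen number of epochs.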
The findings show that knowledge sharing with informed teacher selection and adaptive transfer strength provides a robust and scalable approach for improving the efficiency of independent agents in stationary environments. These mechanisms are compatible with common DQN extensions and can serve as a foundation for future research on adaptive multi-agent knowledge exchange strategies.