Knowledge sharing in Independent Deep Q-Network
Issue Vol. 16 No. 1 (2026)
Viacheslav Bochok, Nataliia Fedorova (pp. 104–108)
Abstract
This paper investigates knowledge-sharing mechanisms in weakly coupled multi-agent reinforcement learning systems based on Independent Deep Q-Networks (IDQN). Although parallel agents can accelerate data collection, their learning processes typically remain isolated, resulting in suboptimal use of collective experience. To address this limitation, the study proposes two complementary methods: (1) a teacher-selection mechanism that identifies the most efficient agent based on episodic performance, and (2) a dynamic control mechanism that adjusts the intensity of knowledge transfer according to the performance gap between teacher and student. Experiments were conducted in the OpenAI Gym CartPole-v1 and LunarLander-v3 environments using three independent agents to validate effectiveness across tasks with different reward structures, dynamics, and difficulty levels. All agents were trained with Batch TD(0) at the end of each episode, using a replay buffer. Knowledge transfer was implemented through policy distillation on pseudo-labeled transitions sampled from the teacher's experience buffer. The number of distillation epochs was determined dynamically by a nonlinear scaling function bounded by predefined minimum and maximum values. Results demonstrate that the proposed mechanisms consistently accelerate learning and improve stability compared with baseline DQN configurations without knowledge sharing. Systems employing teacher selection outperform both random teacher choice and all-to-all sharing, and dynamic intensity adjustment proves more effective than constant-intensity distillation. Normalized AUC analysis further confirms statistically significant improvements in both maximum and average episodic returns, indicating faster convergence of the best agent as well as more uniform progress across all agents.
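The distillation step described above can be sketched as follows. The abstract does not fix the exact labeling rule, so the helper name `pseudo_label_batch` and the greedy (argmax-Q) pseudo-labeling are assumptions for illustration:

```python
import numpy as np

def pseudo_label_batch(teacher_q, states):
    """Pseudo-label transitions sampled from the teacher's replay buffer:
    each sampled state is labeled with the teacher's greedy action
    (argmax over its Q-values). `teacher_q` is any callable mapping a
    state to a vector of Q-values."""
    return np.array([int(np.argmax(teacher_q(s))) for s in states])
```

The student network would then be trained on these (state, action) pairs with a cross-entropy distillation loss for the dynamically chosen number of epochs.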
The findings show that knowledge sharing with informed teacher selection and adaptive transfer strength provides a robust and scalable approach for improving the efficiency of independent agents in stationary environments. These mechanisms are compatible with common DQN extensions and can serve as a foundation for future research on adaptive multi-agent knowledge exchange strategies.