Reinforcement learning for solving optimization problems: Opportunities and limitations on the example of the assignment problem
Issue Vol. 22 No. 1 (2026)
Wojciech MISZTAL, Sybilla NAZAREWICZ, pp. 47-62
Authors
sybilla.nazarewicz@up.lublin.pl
Abstract
Reinforcement learning is attracting growing attention as a tool for optimization problems because of its adaptability, generalization potential, and capacity to handle complex decision-making processes. This study explores the opportunities and limitations of Q-learning in the context of the classical Assignment Problem, which plays an important role in transportation logistics and resource allocation. Four variants of the algorithm were developed and evaluated: a basic version, a version with min-max normalization of cost values, a long-term profitability strategy, and a backward optimization approach. For each variant, hyperparameters were tuned with the Optuna library, and tests were run on randomly generated cost matrices of dimensions 5, 10, 50, 100, and 200. Solution quality was measured as degradation relative to the optimal objective function value; solution generation time was also recorded. The results show significant differences between the variants. The basic Q-learning version shows limited effectiveness and high variability, particularly on larger problem instances. Normalization improved computational efficiency and reduced variance but did not substantially improve solution quality for more complex cases. In contrast, the long-term profitability variant notably improved both solution quality and stability, especially for small and medium-sized problems. The backward optimization variant yielded the highest overall solution quality.
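To make the setup concrete, the following is a minimal sketch of how tabular Q-learning could be applied to a small assignment problem, with optional min-max normalization of the cost matrix. It is not the authors' implementation: the state encoding (only the index of the agent currently being assigned), the reward (negative cost), the fixed hyperparameters (no Optuna tuning), the brute-force optimum, and the relative-gap definition of degradation are all assumptions made for illustration, as are all function names.

```python
import itertools
import random


def min_max_normalize(cost):
    """Scale every cost into [0, 1] using the matrix-wide min and max."""
    flat = [c for row in cost for c in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # guard against a constant matrix
    return [[(c - lo) / span for c in row] for row in cost]


def q_learning_assignment(cost, episodes=2000, alpha=0.1, gamma=0.9,
                          epsilon=0.2, seed=0):
    """Learn q[agent][task] and return a greedy assignment (agent i -> task).

    Assumption: the state is only the agent index, so the table ignores
    which tasks were already taken; this is a deliberate simplification.
    """
    rng = random.Random(seed)
    n = len(cost)
    q = [[0.0] * n for _ in range(n)]
    for _ in range(episodes):
        free = list(range(n))
        for i in range(n):
            # Epsilon-greedy choice among the tasks that are still free.
            if rng.random() < epsilon:
                j = rng.choice(free)
            else:
                j = max(free, key=lambda t: q[i][t])
            free.remove(j)
            reward = -cost[i][j]  # lower cost means higher reward
            future = max(q[i + 1][t] for t in free) if free else 0.0
            q[i][j] += alpha * (reward + gamma * future - q[i][j])
    # Greedy rollout with the learned table.
    free, assignment = list(range(n)), []
    for i in range(n):
        j = max(free, key=lambda t: q[i][t])
        free.remove(j)
        assignment.append(j)
    return assignment


def total_cost(cost, assignment):
    return sum(cost[i][j] for i, j in enumerate(assignment))


def brute_force_optimum(cost):
    """Exact optimum by enumeration; only feasible for very small n."""
    n = len(cost)
    return min(total_cost(cost, p) for p in itertools.permutations(range(n)))


def degradation(cost, assignment):
    """Assumed metric: relative gap to the optimal objective value."""
    best = brute_force_optimum(cost)
    return (total_cost(cost, assignment) - best) / best


if __name__ == "__main__":
    rng = random.Random(1)
    cost = [[rng.randint(1, 100) for _ in range(5)] for _ in range(5)]
    plan = q_learning_assignment(min_max_normalize(cost))
    print(plan, degradation(cost, plan))
```

The agent is trained on the normalized matrix, but degradation is computed on the original costs so that the gap stays in the problem's own units. For the larger sizes named in the abstract, enumeration is infeasible and an exact solver such as the Hungarian method would be needed for the reference optimum.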
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
All articles published in Applied Computer Science are open-access and distributed under the terms of the Creative Commons Attribution 4.0 International License.
