Reinforcement learning for solving optimization problems: Opportunities and limitations on the example of the assignment problem

Wojciech MISZTAL; Sybilla NAZAREWICZ

doi:10.35784/acs_8031

Published: Mar 31, 2026

Issue Vol. 22 No. 1 (2026)

Authors

Wojciech MISZTAL

wojciech.misztal@up.lublin.pl

University of Life Sciences in Lublin, Poland

https://orcid.org/0000-0001-6214-506X

Sybilla NAZAREWICZ

sybilla.nazarewicz@up.lublin.pl

University of Life Sciences in Lublin, Poland

https://orcid.org/0000-0003-1192-1262

Abstract

The application of reinforcement learning techniques to optimization problems has gained increasing attention due to their adaptability, generalization potential, and capacity to handle complex decision-making processes. This study explores the opportunities and limitations of Q-learning in the context of the classical assignment problem, which plays an important role in transportation logistics and resource allocation. Four variants of the algorithm were developed and evaluated: a basic version, a version incorporating min-max normalization of cost values, a long-term profitability strategy, and a backward optimization approach. For each algorithm, the hyperparameters were optimized using the Optuna library, and tests were performed on randomly generated cost matrices of varying dimensions (5, 10, 50, 100, and 200). Solution quality was evaluated as the degradation relative to the optimal objective function value, and the time required to generate each solution was also measured. The results indicate significant differences in the capabilities of the algorithm variants. The basic Q-learning version shows limited effectiveness and high variability, particularly for larger problem instances. Normalization improved computational efficiency and reduced variance but did not substantially improve solution quality for the more complex cases. In contrast, the long-term profitability variant notably improved both solution quality and stability, especially for small and medium-sized problems. The backward optimization variant yielded the highest overall solution quality.
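The abstract does not specify the state, action, or reward design, so the following is only a minimal sketch under stated assumptions. It assumes a common tabular formulation (state = index of the worker currently being assigned, action = one of the still-free tasks, reward = negative cost) and illustrates the basic and min-max-normalized variants together with a degradation measure, assumed here to be the relative gap (f - f*) / f* in percent. All function names, the enumeration-based optimum, and the hyperparameter values (`alpha`, `gamma`, `epsilon`, `episodes`) are hypothetical placeholders; in the study the hyperparameters were tuned with Optuna, and the long-term profitability and backward optimization variants are not reproduced because their details are not given here.

```python
import itertools
import random


def min_max_normalize(cost):
    """Rescale all entries of a cost matrix to [0, 1] (min-max normalization)."""
    flat = [c for row in cost for c in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # guard against a constant matrix
    return [[(c - lo) / span for c in row] for row in cost]


def total_cost(cost, assignment):
    """Objective value: sum of cost[i][assignment[i]] over all workers i."""
    return sum(cost[i][j] for i, j in enumerate(assignment))


def brute_force_optimum(cost):
    """Exact optimum by enumerating permutations; only feasible for small n."""
    n = len(cost)
    return min(total_cost(cost, p) for p in itertools.permutations(range(n)))


def degradation(value, optimum):
    """Assumed metric: relative gap to the optimal objective value, in percent."""
    return 100.0 * (value - optimum) / optimum


def q_learning_assignment(cost, episodes=2000, alpha=0.1, gamma=0.9,
                          epsilon=0.2, normalize=True, seed=0):
    """Hypothetical tabular Q-learning sketch for the assignment problem.

    State = worker index, action = task index. Workers are assigned in order
    0..n-1 and tasks already taken in the episode are masked out. The reward
    is the negative cost, min-max normalized when normalize=True.
    """
    rng = random.Random(seed)
    n = len(cost)
    r = min_max_normalize(cost) if normalize else cost
    q = [[0.0] * n for _ in range(n)]

    for _ in range(episodes):
        free = set(range(n))
        for i in range(n):
            # Epsilon-greedy choice among the tasks still available
            if rng.random() < epsilon:
                j = rng.choice(sorted(free))
            else:
                j = max(free, key=lambda t: q[i][t])
            free.remove(j)
            # Bootstrap from the best remaining task of the next worker
            future = max(q[i + 1][t] for t in free) if free else 0.0
            q[i][j] += alpha * (-r[i][j] + gamma * future - q[i][j])

    # Greedy rollout of the learned table always yields a feasible assignment
    free, assignment = set(range(n)), []
    for i in range(n):
        j = max(free, key=lambda t: q[i][t])
        free.remove(j)
        assignment.append(j)
    return assignment


if __name__ == "__main__":
    # Random 5x5 instance with positive integer costs (illustrative only)
    gen = random.Random(42)
    cost = [[gen.randint(1, 100) for _ in range(5)] for _ in range(5)]
    sol = q_learning_assignment(cost)
    opt = brute_force_optimum(cost)
    print(sol, round(degradation(total_cost(cost, sol), opt), 2))
```

Masking already-used tasks keeps every episode feasible, so the Q-table only has to learn preferences, not feasibility. Enumeration is used for the optimum only to keep the sketch dependency-free; for instance sizes such as 50, 100, or 200 an exact solver like the Hungarian algorithm (for example, SciPy's `linear_sum_assignment`) would be needed.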

Keywords:

Q-learning, machine learning, optimization, assignment problem

How to cite

MISZTAL, W., & NAZAREWICZ, S. (2026). Reinforcement learning for solving optimization problems: Opportunities and limitations on the example of the assignment problem. Applied Computer Science, 22(1), 47–62. https://doi.org/10.35784/acs_8031