Evaluation of the performance of LLM deployments in selected cloud-based container services

DOI: https://doi.org/10.35784/iapgos.8206

Mateusz Stęgierski
s97221@pollub.edu.pl

Piotr Szpak
s95585@pollub.edu.pl
https://orcid.org/0009-0007-0517-7250

Sławomir Przyłucki
s.przylucki@pollub.pl
https://orcid.org/0000-0001-9565-3802

Abstract

The growing adoption of serverless container services has created challenges in selecting optimal cloud platforms for production LLM deployments, yet comparative performance evaluations remain limited. This study evaluates AWS Fargate and Azure Container Apps for LLM deployments, investigating whether architectural differences cause substantial performance variations under diverse load patterns. We conducted systematic experiments using a containerized Llama 3.2:1b model across multiple scenarios: baseline measurements, inference tests with varying prompt lengths, streaming API performance, and concurrent load testing with progressive scaling. Each scenario was executed on both standard and auto-scaled infrastructure, with 10 runs per configuration to ensure statistical reliability. Key findings reveal distinct platform characteristics: AWS Fargate demonstrates superior baseline API response times and time-to-first-token performance, while Azure Container Apps consistently outperforms AWS in inference processing for short and medium prompts, with better consistency across test runs. Streaming performance shows platform-specific trade-offs: AWS achieves lower initial latency, but Azure provides superior token-generation consistency. Under concurrent loads, both platforms maintain full capacity at lower concurrency levels, but AWS exhibits exponential response-time degradation at higher loads, whereas Azure shows more linear, predictable scaling behavior. Statistical analysis confirms significant performance differences across all metrics, validating that platform architecture fundamentally impacts LLM deployment performance. These findings indicate that platform selection should align with specific workload requirements: AWS Fargate for latency-critical applications with steady loads, and Azure Container Apps for inference-intensive workloads requiring robust scaling and consistency. This study offers valuable benchmarking data for organizations deploying production-grade AI services on serverless container platforms.
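
The abstract describes the measurement protocol but not the test harness itself. Below is a minimal illustrative sketch of the core measurement it summarizes: time-to-first-token, total latency, and inter-token consistency for a streaming request, repeated ten times per configuration and then driven at increasing concurrency. It assumes the containers expose an Ollama-style streaming /api/generate endpoint (the model tag llama3.2:1b suggests Ollama, but the paper's serving stack is not stated here); the endpoint URL, prompt, and concurrency ramp are placeholders, not details confirmed by the study.

```python
import json
import statistics
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Placeholder URL: each platform (AWS Fargate, Azure Container Apps)
# would expose its own public endpoint for the container.
ENDPOINT = "http://container-host:11434/api/generate"

def measure_streaming(prompt: str) -> dict:
    """One streaming request: record time-to-first-token (TTFT),
    total latency, and inter-token gaps (generation consistency)."""
    payload = json.dumps({
        "model": "llama3.2:1b",  # model named in the abstract
        "prompt": prompt,
        "stream": True,
    }).encode()
    req = urllib.request.Request(
        ENDPOINT, data=payload,
        headers={"Content-Type": "application/json"},
    )
    start = time.perf_counter()
    ttft, gaps, last = None, [], start
    with urllib.request.urlopen(req) as resp:
        # Ollama-style APIs stream newline-delimited JSON chunks.
        for line in resp:
            chunk = json.loads(line)
            now = time.perf_counter()
            if chunk.get("response"):
                if ttft is None:
                    ttft = now - start       # time-to-first-token
                else:
                    gaps.append(now - last)  # inter-token interval
                last = now
            if chunk.get("done"):
                break
    return {
        "ttft_s": ttft,
        "total_s": time.perf_counter() - start,
        "gap_stdev_s": statistics.stdev(gaps) if len(gaps) > 1 else 0.0,
    }

# Ten runs per configuration, mirroring the study's protocol.
runs = [measure_streaming("Explain serverless containers briefly.")
        for _ in range(10)]
print("median TTFT [s]:", statistics.median(r["ttft_s"] for r in runs))

# Progressive concurrent load: the ramp below is illustrative; the
# abstract does not list the exact concurrency levels used.
for workers in (1, 2, 4, 8, 16):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        batch = list(pool.map(
            measure_streaming,
            ["Explain serverless containers briefly."] * workers))
    print(workers, "concurrent, median total [s]:",
          statistics.median(r["total_s"] for r in batch))
```

Comparing the distributions of ttft_s, total_s, and gap_stdev_s across the two platforms, run counts, and concurrency levels corresponds to the platform comparisons summarized in the abstract.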

Keywords:

serverless containers, performance comparison, language models, auto-scaling, load testing

Article Details

Stęgierski, M., Szpak, P., & Przyłucki, S. (2025). Evaluation of the performance of LLM deployments in selected cloud-based container services. Informatyka, Automatyka, Pomiary w Gospodarce i Ochronie Środowiska, 15(4), 142–150. https://doi.org/10.35784/iapgos.8206