Evaluation of the performance of LLMs deployments in selected cloud-based container services
Issue Vol. 15 No. 4 (2025)
Evaluation of the performance of LLMs deployments in selected cloud-based container services
Mateusz Stęgierski, Piotr Szpak, Sławomir Przyłucki, pp. 142-150
Abstract
The growing adoption of serverless container services has created challenges in selecting optimal cloud platforms for production LLM deployments, yet comparative performance evaluations remain limited. This study evaluates AWS Fargate and Azure Container Apps for LLM deployments, investigating whether architectural differences cause substantial performance variations under diverse load patterns. We conducted systematic experiments using containerized Llama 3.2:1b across multiple scenarios: baseline measurements, inference tests with varying prompt lengths, streaming API performance, and concurrent load testing with progressive scaling. Each scenario was executed on both standard and auto-scaled infrastructure with 10 runs per configuration to ensure statistical reliability. Key findings reveal distinct platform characteristics: AWS Fargate demonstrates superior baseline API response times and time-to-first-token performance, while Azure Container Apps consistently outperforms AWS in inference processing for short and medium prompts with better consistency across test runs. Streaming performance shows platform-specific trade-offs, with AWS achieving lower initial latency but Azure providing superior token generation consistency. Under concurrent loads, both platforms maintain full capacity at lower concurrency levels, but AWS exhibits exponential response time degradation at higher loads while Azure shows more linear, predictable scaling behavior. Statistical analysis confirms significant performance differences across all metrics, validating that platform architecture fundamentally impacts LLM deployment performance. These findings indicate platform selection should align with specific workload requirements: AWS Fargate for latency-critical applications with steady loads, and Azure Container Apps for inference-intensive workloads requiring robust scaling and consistency. 
This study offers crucial benchmarking data for businesses deploying production-grade AI services on serverless container platforms.
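The abstract names time-to-first-token, token generation consistency, and 10 runs per configuration, but does not describe the measurement harness. The following is a minimal sketch of how such metrics could be computed from a streamed response, not the authors' implementation. The chunk shape (`response` and `done` fields in newline-delimited JSON) is an assumption modelled on Ollama's streaming output, and `measure_stream` and `summarize` are hypothetical helper names.

```python
import json
import statistics
import time
from typing import Callable, Iterable, Optional


def measure_stream(
    lines: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> dict:
    """Time one streamed response given as newline-delimited JSON chunks.

    Assumed chunk shape (hypothetical for this sketch):
    {"response": "<text>", "done": <bool>}
    """
    start = clock()
    first_token_at: Optional[float] = None
    chunk_count = 0

    for line in lines:
        if not line.strip():
            continue  # skip keep-alive or blank lines
        chunk = json.loads(line)
        if chunk.get("response"):
            chunk_count += 1
            if first_token_at is None:
                # First non-empty chunk marks time-to-first-token.
                first_token_at = clock()
        if chunk.get("done"):
            break

    end = clock()
    ttft = None if first_token_at is None else first_token_at - start
    gen_time = None if first_token_at is None else end - first_token_at
    rate = chunk_count / gen_time if gen_time else None
    return {
        "ttft_s": ttft,
        "total_s": end - start,
        "chunks": chunk_count,
        "chunks_per_s": rate,
    }


def summarize(samples: list) -> dict:
    """Mean and sample standard deviation over repeated runs."""
    spread = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return {"mean": statistics.mean(samples), "stdev": spread}
```

In a real run, `lines` would come from iterating over the HTTP response body of a streaming request, and `summarize` would be applied to the `ttft_s` values collected across the repeated runs of one configuration. The injectable `clock` only exists so the timing logic can be checked without a live endpoint.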