Evaluation of the performance of LLM deployments in selected cloud-based container services

DOI: https://doi.org/10.35784/iapgos.8206

Mateusz Stęgierski
s97221@pollub.edu.pl

Piotr Szpak
s95585@pollub.edu.pl
https://orcid.org/0009-0007-0517-7250

Sławomir Przyłucki
s.przylucki@pollub.pl
https://orcid.org/0000-0001-9565-3802

Abstract

The growing adoption of serverless container services has created challenges in selecting optimal cloud platforms for production LLM deployments, yet comparative performance evaluations remain limited. This study evaluates AWS Fargate and Azure Container Apps for LLM deployments, investigating whether architectural differences cause substantial performance variations under diverse load patterns. We conducted systematic experiments using a containerized Llama 3.2:1b model across multiple scenarios: baseline measurements, inference tests with varying prompt lengths, streaming API performance, and concurrent load testing with progressive scaling. Each scenario was executed on both standard and auto-scaled infrastructure, with 10 runs per configuration to ensure statistical reliability. Key findings reveal distinct platform characteristics: AWS Fargate demonstrates superior baseline API response times and time-to-first-token performance, while Azure Container Apps consistently outperforms AWS in inference processing for short and medium prompts, with better consistency across test runs. Streaming performance shows platform-specific trade-offs: AWS achieves lower initial latency, but Azure provides superior token-generation consistency. Under concurrent loads, both platforms maintain full capacity at lower concurrency levels, but AWS exhibits exponential response-time degradation at higher loads, whereas Azure shows more linear, predictable scaling behavior. Statistical analysis confirms significant performance differences across all metrics, validating that platform architecture fundamentally impacts LLM deployment performance. These findings indicate that platform selection should align with specific workload requirements: AWS Fargate for latency-critical applications with steady loads, and Azure Container Apps for inference-intensive workloads requiring robust scaling and consistency. This study offers valuable benchmarking data for organizations deploying production-grade AI services on serverless container platforms.
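
The abstract describes the measurement protocol but not the test harness itself. Below is a minimal illustrative sketch of the core measurement it summarizes: time-to-first-token, total latency, and inter-token consistency for a streaming request, repeated ten times per configuration and then driven at increasing concurrency. It assumes the containers expose an Ollama-style streaming /api/generate endpoint (the model tag llama3.2:1b suggests Ollama, but the paper's serving stack is not stated here); the endpoint URL, prompt, and concurrency ramp are placeholders, not details confirmed by the study.

```python
import json
import statistics
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Placeholder URL: each platform (AWS Fargate, Azure Container Apps)
# would expose its own public endpoint for the container.
ENDPOINT = "http://container-host:11434/api/generate"

def measure_streaming(prompt: str) -> dict:
    """One streaming request: record time-to-first-token (TTFT),
    total latency, and inter-token gaps (generation consistency)."""
    payload = json.dumps({
        "model": "llama3.2:1b",  # model named in the abstract
        "prompt": prompt,
        "stream": True,
    }).encode()
    req = urllib.request.Request(
        ENDPOINT, data=payload,
        headers={"Content-Type": "application/json"},
    )
    start = time.perf_counter()
    ttft, gaps, last = None, [], start
    with urllib.request.urlopen(req) as resp:
        # Ollama-style APIs stream newline-delimited JSON chunks.
        for line in resp:
            chunk = json.loads(line)
            now = time.perf_counter()
            if chunk.get("response"):
                if ttft is None:
                    ttft = now - start       # time-to-first-token
                else:
                    gaps.append(now - last)  # inter-token interval
                last = now
            if chunk.get("done"):
                break
    return {
        "ttft_s": ttft,
        "total_s": time.perf_counter() - start,
        "gap_stdev_s": statistics.stdev(gaps) if len(gaps) > 1 else 0.0,
    }

# Ten runs per configuration, mirroring the study's protocol.
runs = [measure_streaming("Explain serverless containers briefly.")
        for _ in range(10)]
print("median TTFT [s]:", statistics.median(r["ttft_s"] for r in runs))

# Progressive concurrent load: the ramp below is illustrative; the
# abstract does not list the exact concurrency levels used.
for workers in (1, 2, 4, 8, 16):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        batch = list(pool.map(
            measure_streaming,
            ["Explain serverless containers briefly."] * workers))
    print(workers, "concurrent, median total [s]:",
          statistics.median(r["total_s"] for r in batch))
```

Comparing the distributions of ttft_s, total_s, and gap_stdev_s across the two platforms, run counts, and concurrency levels corresponds to the platform comparisons summarized in the abstract.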

Keywords:

serverless containers, performance comparison, language models, auto-scaling, load testing

Article Details

Stęgierski, M., Szpak, P., & Przyłucki, S. (2025). Evaluation of the performance of LLM deployments in selected cloud-based container services. Informatyka, Automatyka, Pomiary w Gospodarce i Ochronie Środowiska, 15(4), 142–150. https://doi.org/10.35784/iapgos.8206