Deep learning architectures for multiclass clothing recognition as the semantic core of automated virtual try-on systems

Roman Chekhmestruk; Olena Voitsekhovska; Svitlana Kyrylashchuk

doi:10.35784/iapgos.7957

Published: Jun 30, 2026

Issue Vol. 16 No. 2 (2026)

Authors

Roman Chekhmestruk

chekhroma@gmail.com

Vinnytsia National Technical University, Ukraine

https://orcid.org/0000-0002-5362-8796

Olena Voitsekhovska

vojcexovska.o.v@vntu.edu.ua

Vinnytsia National Technical University, Ukraine

https://orcid.org/0000-0001-8755-1574

Svitlana Kyrylashchuk

kyrylashchuk@vntu.edu.ua

Vinnytsia National Technical University, Ukraine

https://orcid.org/0000-0002-8972-3541

Abstract

This article examines and substantiates the choice of deep learning architectures for multiclass clothing classification in virtual try-on (VTO) systems. ResNet-50, EfficientNet-B4, and Vision Transformer (ViT-B/16) are systematically compared on the DeepFashion2 and ModaNet datasets. ViT-B/16 achieved the highest Top-1 accuracy, 92.4% on DeepFashion2 and 88.9% on ModaNet, with an average cross-dataset accuracy drop of 3.9 percentage points, the smallest among the evaluated models. Applying U²-Net segmentation as a preprocessing step produced a statistically significant improvement in macro-F₁ for all architectures (p < 0.001), with an average gain of 3.2 percentage points, and reduced the studio-to-street domain gap from 11 to 6 percentage points. EfficientNet-B4 offered the best accuracy-to-latency ratio, reaching 87% Top-1 accuracy at 60 FPS on consumer hardware (RTX 3060), whereas ViT-B/16 required optimization to sustain 45 FPS. The recommended strategy for industrial VTO systems combines U²-Net segmentation with an architecture chosen according to the capabilities of the target platform, balancing visual fidelity and computational efficiency.
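
The abstract reports results in terms of macro-F₁, which is the unweighted mean of per-class F₁ scores, so rare garment classes count as much as frequent ones. The following is a minimal sketch of that metric in plain Python. It is not the authors' evaluation code, and the class names and labels in the usage note are hypothetical toy values chosen only for illustration.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores.

    Classes are taken from the union of labels seen in y_true and y_pred.
    A class with no true positives, false positives, or false negatives
    cannot occur in that union, but the zero-denominator guard is kept
    for safety.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()

    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class was wrong
            fn[truth] += 1   # true class was missed

    per_class = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN)
        denom = 2 * tp[label] + fp[label] + fn[label]
        per_class.append(2 * tp[label] / denom if denom else 0.0)

    return sum(per_class) / len(per_class)


# Hypothetical toy labels (not data from the article).
y_true = ["shirt", "shirt", "dress", "dress", "skirt", "skirt"]
y_pred = ["shirt", "dress", "dress", "dress", "skirt", "shirt"]
score = macro_f1(y_true, y_pred)
```

In this toy case the per-class F₁ scores are 0.5 for "shirt", 0.8 for "dress", and 2/3 for "skirt", so `score` is their plain average. A gain "in percentage points", as quoted in the abstract, is the difference between two such scores multiplied by 100.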

Keywords:

VTO-systems, segmentation, CNN, DeepFashion, vision transformer, online shopping

Chekhmestruk, R., Voitsekhovska, O., & Kyrylashchuk, S. (2026). Deep learning architectures for multiclass clothing recognition as the semantic core of automated virtual try-on systems. Informatyka, Automatyka, Pomiary W Gospodarce I Ochronie Środowiska, 16(2), 162–172. https://doi.org/10.35784/iapgos.7957
