Deep learning architectures for multiclass clothing recognition as the semantic core of automated virtual try-on systems
Issue Vol. 16 No. 2 (2026)
Roman Chekhmestruk, Olena Voitsekhovska, Svitlana Kyrylashchuk, pp. 162-172
Abstract
This article examines and justifies the choice of deep learning architectures for multiclass clothing classification in virtual try-on (VTO) systems. ResNet-50, EfficientNet-B4, and Vision Transformer (ViT-B/16) were systematically compared on the DeepFashion2 and ModaNet datasets. ViT-B/16 achieved the highest Top-1 accuracy, 92.4% on DeepFashion2 and 88.9% on ModaNet, and showed the smallest average cross-dataset accuracy drop among the evaluated models, 3.9 percentage points. Applying U2-Net segmentation as a preprocessing step produced a statistically significant improvement in macro-F1 for all architectures (p < 0.001), with an average gain of 3.2 percentage points, and reduced the studio-to-street domain gap from 11 to 6 percentage points. EfficientNet-B4 offered the best accuracy-to-latency trade-off, reaching 87% Top-1 accuracy at 60 FPS on consumer hardware (RTX 3060), whereas ViT-B/16 required optimization to sustain 45 FPS. The recommended strategy for industrial VTO systems combines U2-Net segmentation with an architecture chosen according to the capabilities of the target platform, balancing visual fidelity against computational efficiency.
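The two quantities the abstract reports, macro-F1 and a gap expressed in percentage points, can be illustrated with a minimal Python sketch. The function names (`macro_f1`, `gap_in_points`) and the toy labels are hypothetical and serve only as an illustration; they are not taken from the article, whose evaluation code is not shown here.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[t] += 1          # t was the true class but was missed
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); guard against an empty denominator
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def gap_in_points(source_score, target_score):
    """Difference between two scores in [0, 1], in percentage points."""
    return 100.0 * (source_score - target_score)


if __name__ == "__main__":
    # Toy labels, purely illustrative (not data from the article).
    y_true = ["shirt", "shirt", "dress", "skirt"]
    y_pred = ["shirt", "dress", "dress", "skirt"]
    print(f"macro-F1: {macro_f1(y_true, y_pred):.3f}")
    # Hypothetical studio vs. street scores, to show the unit only.
    print(f"gap: {gap_in_points(0.90, 0.79):.1f} percentage points")
```

Macro averaging weights every garment class equally, so rare classes influence the score as much as frequent ones; this is presumably why it is reported alongside Top-1 accuracy for a multiclass problem.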

