Improving image retrieval using CNN with PCA and Optimized K-Means clustering
Issue Vol. 22 No. 2 (2026)
Mohsin Hasan HUSSEIN, Ali Mohsin Ahmed AL-SABAAWI, Zakaria A. Hamed ALNAISH (pp. 67-84)
Authors
zakriahamoalnaish@uomosul.edu.iq
Abstract
Content-based image retrieval (CBIR) systems play an important role in many applications, including object recognition, digital forensics, and biomedical research. The large volumes of digital images produced in daily life call for efficient CBIR systems. This paper introduces a comprehensive and unified approach to image retrieval that brings together deep learning, feature reduction, and clustering optimization in a way that has not been explored before. The pre-trained VGG16 model is used to extract rich, high-level features from images. To make the process more efficient and focused, Principal Component Analysis (PCA) is applied to reduce the number of features while retaining the most important information at minimal runtime cost. What sets this work apart is a hybrid of K-means clustering and Particle Swarm Optimization (PSO): instead of relying on random initialization, PSO finds better starting points and improves the overall clustering by guiding K-means away from local minima. These well-known techniques are combined into a single, coherent framework called VGG16-PCA-PSO-K-means. The proposed method is evaluated on the Corel 1K and UC Merced Land Use datasets using mean average precision (mAP), clustering purity, recall, F-score, normalized discounted cumulative gain (NDCG), and runtime. The experimental results indicate that the proposed system achieves higher precision and clustering purity than state-of-the-art methods. The improvement ranges from 5% to 18% across different feature set sizes, with an mAP@10 of 97.5% for 10-class retrieval on Corel 1K and an mAP@10 of 96.7% for 21-class retrieval on UC Merced Land Use, using only 30 features.
Furthermore, Kendall's W and Friedman tests confirmed that the VGG16-PCA-PSO-K-means model ranked higher than other methods in the literature, with a statistically significant difference, highlighting its effectiveness and robustness.
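The clustering step described in the abstract, PSO choosing the starting centroids that K-means then refines, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The 2-D synthetic points stand in for PCA-reduced VGG16 features, and the function names, the swarm size, the inertia and acceleration coefficients (`w`, `c1`, `c2`), and the sum-of-squared-errors fitness are all assumptions made for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse(points, centroids):
    """Clustering cost: each point's squared distance to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def unflatten(position, k, dim):
    """Split a flat particle position into k centroids of length dim."""
    return [position[i * dim:(i + 1) * dim] for i in range(k)]


def pso_init(points, k, n_particles=15, n_iters=40, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for k starting centroids with a basic global-best PSO (assumed settings)."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Each particle encodes k candidate centroids, seeded from random data points.
    pos = [[x for p in rng.sample(points, k) for x in p] for _ in range(n_particles)]
    vel = [[0.0] * (k * dim) for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [sse(points, unflatten(p, k, dim)) for p in pos]
    best = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[best][:], pbest_cost[best]
    for _ in range(n_iters):
        for i in range(n_particles):
            for j in range(k * dim):
                r1, r2 = rng.random(), rng.random()
                # Standard velocity update: inertia + personal pull + global pull.
                vel[i][j] = (w * vel[i][j]
                             + c1 * r1 * (pbest[i][j] - pos[i][j])
                             + c2 * r2 * (gbest[j] - pos[i][j]))
                pos[i][j] += vel[i][j]
            cost = sse(points, unflatten(pos[i], k, dim))
            if cost < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], cost
            if cost < gbest_cost:
                gbest, gbest_cost = pos[i][:], cost
    return unflatten(gbest, k, dim)


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points]


def kmeans(points, centroids, n_iters=50):
    """Lloyd's K-means refinement starting from the given centroids."""
    centroids = [list(c) for c in centroids]
    for _ in range(n_iters):
        labels = assign(points, centroids)
        updated = []
        for c, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old centroid if a cluster ends up empty.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else old)
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


def demo_points(seed=1):
    """Three synthetic 2-D blobs (hypothetical stand-in for reduced image features)."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
    return [[rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)]
            for cx, cy in centers for _ in range(30)]


if __name__ == "__main__":
    points = demo_points()
    seeds = pso_init(points, k=3)
    centroids, labels = kmeans(points, seeds)
    print("SSE with PSO seeds:  ", round(sse(points, seeds), 3))
    print("SSE after K-means:   ", round(sse(points, centroids), 3))
```

In this sketch the K-means pass can only lower or preserve the cost of the PSO seeds, so the swarm's job is to hand it a starting point close to a good minimum. In a real pipeline the `points` list would be replaced by the reduced feature vectors of the image collection.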
Keywords:
Sustainable Development Goals (SDG)
- 9 - Industry, Innovation, Technology and Infrastructure
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
All articles published in Applied Computer Science are open-access and distributed under the terms of the Creative Commons Attribution 4.0 International License.
