Improving image retrieval using CNN with PCA and Optimized K-Means clustering

Mohsin Hasan HUSSEIN

mohsin.h@uokerbala.edu.iq

Ali Mohsin Ahmed AL-SABAAWI

ali.mohsin@uoninevah.edu.iq

Zakaria A. Hamed ALNAISH

zakriahamoalnaish@uomosul.edu.iq

Abstract

Content-based image retrieval (CBIR) systems play an important role in many applications, including object recognition, digital forensics, and biomedical research. The large volume of digital images produced in daily life calls for efficient CBIR systems. This paper introduces a unified approach to image retrieval that brings together deep learning, feature reduction, and clustering optimization, a combination that, to our knowledge, has not been explored before. A pre-trained VGG16 model extracts rich, high-level features from the images. Principal Component Analysis (PCA) then reduces the number of features while retaining the most important information, which keeps runtime low. What sets our work apart is a hybrid of K-means clustering and Particle Swarm Optimization (PSO): instead of relying on random initialization, PSO finds better starting points and improves the overall clustering by guiding K-means away from local minima. These well-known techniques are combined into a single, coherent framework called VGG16-PCA-PSO-K-means. The method is evaluated on the Corel 1K and UC Merced Land Use datasets using mean average precision (mAP), clustering purity, recall, F-score, NDCG, and runtime. The experimental results indicate that the proposed system achieves higher precision and clustering purity than state-of-the-art methods. The improvement ranged from 5% to 18% across different feature set sizes, with an mAP@10 of 97.5% for 10-class retrieval on Corel 1K and an mAP@10 of 96.7% for 21-class retrieval on UC Merced Land Use, using only 30 features. Furthermore, Kendall's W and Friedman tests confirmed that the VGG16-PCA-PSO-K-means model ranked higher than other methods in the literature, with a statistically significant difference, highlighting its effectiveness and robustness.
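
The pipeline described in the abstract (deep features, PCA reduction, PSO-seeded K-means, purity evaluation) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. Several things here are assumptions made for illustration: synthetic vectors stand in for VGG16 features so that the example runs without model weights, the PSO fitness is the within-cluster sum of squared errors, and the swarm settings (n_particles, n_iter, w, c1, c2) and all function names are hypothetical. Only the choice of 30 components is taken from the abstract.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def nearest(X, centroids):
    """Return nearest-centroid labels and the total squared error (SSE)."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), d2.min(axis=1).sum()

def sse(X, centroids):
    return nearest(X, centroids)[1]

def pso_init(X, k, n_particles=15, n_iter=40, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for starting centroids with a basic global-best PSO.
    Assumption: fitness is the SSE; hyperparameters are illustrative."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Each particle encodes one candidate set of k centroids: shape (P, k, d).
    pos = X[rng.integers(0, n, size=(n_particles, k))]
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_cost = np.array([sse(X, p) for p in pos])
    g = pbest_cost.argmin()
    gbest, gbest_cost = pbest[g].copy(), pbest_cost[g]
    for _ in range(n_iter):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        cost = np.array([sse(X, p) for p in pos])
        better = cost < pbest_cost
        pbest[better] = pos[better]
        pbest_cost[better] = cost[better]
        g = pbest_cost.argmin()
        if pbest_cost[g] < gbest_cost:
            gbest, gbest_cost = pbest[g].copy(), pbest_cost[g]
    return gbest

def kmeans(X, centroids, n_iter=50):
    """Plain Lloyd iterations starting from the given centroids."""
    centroids = centroids.copy()
    for _ in range(n_iter):
        labels, _ = nearest(X, centroids)
        new = centroids.copy()
        for j in range(len(centroids)):
            members = X[labels == j]
            if len(members):          # keep the old centroid if a cluster is empty
                new[j] = members.mean(axis=0)
        if np.allclose(new, centroids):
            break
        centroids = new
    labels, _ = nearest(X, centroids)
    return labels, centroids

def purity(labels, truth):
    """Fraction of points that belong to the majority class of their cluster."""
    hits = sum(np.bincount(truth[labels == j]).max() for j in np.unique(labels))
    return hits / len(truth)

# Synthetic stand-in for deep features: 4 classes, 60 images each, 128 dims.
rng = np.random.default_rng(42)
truth = np.repeat(np.arange(4), 60)
class_means = rng.normal(scale=5.0, size=(4, 128))
features = class_means[truth] + rng.normal(size=(240, 128))

Z = pca_reduce(features, 30)          # 30 features, as in the abstract
init = pso_init(Z, k=4)               # PSO replaces random initialization
labels, centroids = kmeans(Z, init)
score = purity(labels, truth)
print(f"clustering purity: {score:.3f}")
```

In a real system the `features` array would come from a pre-trained VGG16 forward pass, and retrieval would rank images within the query's cluster. The sketch only shows how the three stages fit together.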

Keywords:

content-based image retrieval (CBIR) systems, particle swarm optimization (PSO) algorithms, VGG16, K-means

Sustainable Development Goals (SDG)

  • 9 - Industry, Innovation, Technology and Infrastructure

Article Details

HUSSEIN, M. H., AL-SABAAWI, A. M. A., & ALNAISH, Z. A. H. (2026). Improving image retrieval using CNN with PCA and Optimized K-Means clustering. Applied Computer Science, 22(2), 67–84. https://doi.org/10.35784/acs_8124