CKSD: Comprehensive Kurdish-Sorani database

Jihad Anwar Qadir

jihad.qadir@uor.edu.krd

https://orcid.org/0000-0003-3958-814X
Samer Kais Jameel

samer.kais@uor.edu.krd

https://orcid.org/0000-0003-2236-9303
Wshyar Omar Khudhur

wshyar.khudhur@epu.edu.iq

Kamaran H. Manguri

kamaran@uor.edu.krd

https://orcid.org/0000-0001-8567-3367

Abstract

Every individual communicates through a particular language, and each language has its own letters and features that distinguish it from other languages. Ideas, cultures, and sciences are exchanged through language-related tasks such as the retrieval, translation, and classification of texts from journals, books, research papers, and the internet, and these tasks depend on the availability of databases. Unfortunately, for various reasons, Kurdish-language databases are scarce or may not exist at all. In the present study, a Comprehensive Kurdish-Sorani Database (CKSD) is generated. It contains datasets of dates, letters, and common words in the Kurdish language, as well as the documents from which these datasets were extracted. The elements of these collections were extracted from written documents in 27 different fonts, which makes the CKSD database comprehensive and useful to researchers. To determine how well classifiers can categorize such data, the data were used in classification experiments in this study. The results demonstrate the reliability of the data and its suitability for machine learning and other artificial intelligence applications.

Keywords:

CKSD, OCR, font recognition, character recognition, font style

Qadir, J. A., Jameel, S. K., Khudhur, W. O., & Manguri, K. H. (2025). CKSD: Comprehensive Kurdish-Sorani database. Informatyka, Automatyka, Pomiary W Gospodarce I Ochronie Środowiska, 15(1), 153–156. https://doi.org/10.35784/iapgos.6521