Examination of text's lexis using a Polish dictionary

Roman Voitovych; Edyta Łukasik

doi:10.35784/jcsi.2731

Published: Dec 30, 2021

Issue Vol. 21 (2021)

Articles

Comparison of selected mathematical functions for the analysis of growth behavior of items and physical interpretation of Avrami-Weibull function
Keshra Sangwal

259-278
Comparison of classical machine learning algorithms in the task of handwritten digits classification
Oleksandr Voloshchenko, Małgorzata Plechawska-Wójcik

279-286
The comparative analysis of Java frameworks: Spring Boot, Micronaut and Quarkus
Maciej Jeleń, Mariusz Dzieńkowski

287-294
Usability analysis taking into consideration the aspects of accessibility of selected university websites
Karol Kałan, Damian Karpiuk, Mariusz Dzieńkowski

295-302
A comparison of conventional and deep learning methods of image classification
Maryna Dovbnych, Małgorzata Plechawska-Wójcik

303-308
Comparative analysis of connection performance with databases via JDBC interface and ORM programming frameworks
Mateusz Żuchnik, Piotr Kopniak

309-315
Examination of text's lexis using a Polish dictionary
Roman Voitovych, Edyta Łukasik

316-323
Comparison of capabilities of the Unity environment and LibGDX in terms of computer game development
Piotr Kosidło, Karol Kowalczyk, Marcin Badurowicz

324-329
Performance analysis of the TensorFlow library with different optimisation algorithms
Maciej Wadas, Jakub Smołka

330-335
Analysis of user experience during interaction with selected CMS platforms
Michał Miszczak, Mariusz Dzieńkowski

336-343
Analysis of polish community on streaming platform twitch.tv during COVID-19 epidemy
Kamil Jeżowski, Marcin Badurowicz

344-348
A study of the user experience when interacting with applications that work with sports armbands to monitor human activity
Mateusz Kiryczuk, Paweł Kocyła, Mariusz Dzieńkowski

349-355
Performance comparison of programming interfaces on the example of REST API, GraphQL and gRPC
Mariusz Śliwa, Beata Pańczyk

356-361
Digital entertainment in the face of COVID-19
Adam Jarszak

362-366
Symfony and Laravel – a comparative analysis of PHP programming frameworks
Krzysztof Kuflewski, Mariusz Dzieńkowski

367-372
A comparative analysis of cryptocurrency wallet management tools
Kamil Biernacki, Małgorzata Plechawska-Wójcik

373-377
Analysis of data storage methods available in the Android SDK
Dominika Kornaś

378-382
An analysis of the possibility of realization steganography in C#
Piotr Pawlak, Jakub Podgórniak, Grzegorz Kozieł

383-390

Authors

Roman Voitovych

roman.voitovych@pollub.edu.pl

Lublin University of Technology, Poland

Edyta Łukasik

e.lukasik@pollub.pl

Lublin University of Technology, Poland

https://orcid.org/0000-0003-3644-9769

Abstract

This paper presents an approach to comparing and classifying books written in Polish on the basis of their lexical fields. Books can be classified by features such as literature type, literary genre, style and author. Using a preassembled dictionary and the Jaccard index, we confirmed a compactness hypothesis concerning similar books. Further analysis with the PAM (Partitioning Around Medoids) clustering algorithm revealed a lexical connection between books of the same type or by the same author. The generally stable behaviour of similarities within a given field, together with anomalous tendencies in some other cases, suggests that other features could also be recognised. The method presented in this article allows conclusions to be drawn about the connection between arbitrary books based solely on their vocabulary.
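
The two techniques named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy dictionary, the four short "books", the naive tokenizer and the simplified swap-only variant of PAM (no BUILD phase, medoids initialised to the first k items) are all assumptions made for illustration. In particular, no lemmatization of inflected Polish forms is attempted here.

```python
def vocabulary(text, dictionary):
    """Return the set of lowercase tokens from text that occur in the dictionary."""
    tokens = (word.strip(".,;:!?\"'()").lower() for word in text.split())
    return {t for t in tokens if t in dictionary}


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B|; defined here as 1.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def total_cost(dist, medoids):
    """Sum of distances from every item to its nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def pam(dist, k):
    """Simplified PAM: start from the first k items, then apply improving swaps."""
    n = len(dist)
    medoids = list(range(k))
    best = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for candidate in range(n):
                if candidate in medoids:
                    continue
                # Try replacing one medoid with a non-medoid item.
                trial = medoids[:pos] + [candidate] + medoids[pos + 1:]
                cost = total_cost(dist, trial)
                if cost < best:  # accept only strict improvements
                    medoids, best, improved = trial, cost, True
    # Assign each item to the cluster of its nearest medoid.
    labels = [min(range(k), key=lambda j: dist[i][medoids[j]]) for i in range(n)]
    return medoids, labels


# Hypothetical toy data: a tiny word list stands in for a full Polish dictionary.
dictionary = {"kot", "pies", "dom", "las", "rzeka", "statek", "morze"}
books = {
    "A": "Kot i pies w dom",
    "B": "pies, kot, las",
    "C": "statek morze rzeka",
    "D": "morze statek",
}

names = list(books)
vocab = {name: vocabulary(text, dictionary) for name, text in books.items()}

# Jaccard distance (1 - similarity) between every pair of books.
dist = [[1.0 - jaccard(vocab[a], vocab[b]) for b in names] for a in names]

medoids, labels = pam(dist, k=2)
for name, label in zip(names, labels):
    print(name, "-> cluster", label)
```

Working with the distance 1 - J rather than the similarity J lets the clustering step minimise a cost, which is the form in which PAM is usually stated. On real books the vocabulary sets would be far larger and the choice of k would need its own justification.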

Keywords:

natural language processing, lexis analysis, Jaccard similarity coefficient, Partitioning Around Medoids

Voitovych, R., & Łukasik, E. (2021). Examination of text’s lexis using a Polish dictionary. Journal of Computer Sciences Institute, 21, 316–323. https://doi.org/10.35784/jcsi.2731
