Examination of text's lexis using a Polish dictionary
Issue Vol. 21 (2021)
Roman Voitovych, Edyta Łukasik, pp. 316-323
Abstract
This paper presents an approach to comparing and classifying books written in Polish by comparing their lexical fields. Books can be classified by features such as literature type, literary genre, style, and author. Using a preassembled dictionary and the Jaccard index, we confirmed the compactness hypothesis for similar books. Further analysis with the PAM clustering algorithm revealed a lexical connection between books of the same type or by the same author. Similarities within a given field behave in an overall stable way, while some other cases show anomalous tendencies; together these suggest that other features could also be recognised. The method presented in this article makes it possible to draw conclusions about the connection between arbitrary books based solely on their vocabulary.
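The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy dictionary, the sample texts, the regular-expression tokeniser, the use of 1 - Jaccard as a distance, and the function names `vocabulary`, `jaccard`, `total_cost` and `pam` are all assumptions made for illustration. The clustering step uses only the greedy SWAP phase of PAM, starting from the first k items rather than from the BUILD phase.

```python
import re

def vocabulary(text, dictionary):
    """Set of lowercase word forms in `text` that also occur in `dictionary`."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())  # runs of Unicode letters
    return {t for t in tokens if t in dictionary}

def jaccard(a, b):
    """Jaccard index |A & B| / |A | B|; taken as 1.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def total_cost(dist, medoids):
    """Sum over all items of the distance to the nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))

def pam(dist, k):
    """Greedy PAM SWAP phase on a precomputed distance matrix.

    Assumption: medoids start as the first k items (no BUILD phase).
    """
    n = len(dist)
    medoids = list(range(k))
    best = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                # Try replacing one medoid with a non-medoid item.
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = total_cost(dist, trial)
                if cost < best - 1e-12:  # accept strict improvements only
                    medoids, best, improved = trial, cost, True
    labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
    return medoids, labels

# Hypothetical toy data standing in for a real Polish dictionary and real books.
dictionary = {"kot", "pies", "dom", "las", "rzeka", "góra"}
books = {
    "A": "Kot i pies, dom.",
    "B": "Pies, kot; dom las",
    "C": "Rzeka góra las",
    "D": "Góra, rzeka!",
}
names = list(books)
vocab = [vocabulary(books[name], dictionary) for name in names]

# Assumed distance: 1 - Jaccard index of the two vocabularies.
dist = [[1.0 - jaccard(u, v) for v in vocab] for u in vocab]
medoids, labels = pam(dist, k=2)
for name, label in zip(names, labels):
    print(name, "-> cluster", label)
```

In this toy example, books A and B share most of their dictionary words, as do C and D, so the two pairs end up in separate clusters. With real books, the choice of dictionary and of k would need to follow the full paper.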
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
