Model of the text classification system using fuzzy sets

Dmytro Salahor

s97218@pollub.edu.pl
Politechnika Lubelska (Poland)

Jakub Smołka


Politechnika Lubelska (Poland)

Abstract

Classifying the subject area of a work by its keywords is a relevant and important task. This article describes algorithms for classifying keywords by subject area. A model was developed using both algorithms and evaluated on test data. Its results were compared with those of other existing algorithms suitable for this task, and the results obtained from the model were analysed. The proposed approach can be used in real-life tasks.


Keywords:

text classification; fuzzy sets; classification; fuzzy rules; fuzzy logic



Published
2021-06-30

Salahor, D., & Smołka, J. (2021). Model of the text classification system using fuzzy sets. Journal of Computer Sciences Institute, 19, 144–150. https://doi.org/10.35784/jcsi.2634
