Model of the text classification system using fuzzy sets


Abstract

Classifying a work’s subject area by its keywords is a relevant and important task. This article describes two algorithms for classifying keywords by subject area. A model using both algorithms was developed and evaluated on test data, and its results were compared with those of other existing algorithms suitable for this task. The results obtained by the model were analysed. The proposed approach can be used in real-life tasks.


Keywords

text classification; fuzzy sets; classification; fuzzy rules; fuzzy logic

Published: 2021-06-30


Salahor, D., & Smołka, J. (2021). Model of the text classification system using fuzzy sets. Journal of Computer Sciences Institute, 19, 144-150. https://doi.org/10.35784/jcsi.2634

Dmytro Salahor (s97218@pollub.edu.pl), Poland
Jakub Smołka, Politechnika Lubelska, Poland