UNBALANCED MULTICLASS CLASSIFICATION WITH ADAPTIVE SYNTHETIC MULTINOMIAL NAIVE BAYES APPROACH

Fatkhurokhman Fauzi

fatkhurokhmanf@unimus.ac.id
Universitas Muhammadiyah Semarang, Department of Statistics (Indonesia)
https://orcid.org/0000-0002-8277-8638

Ismatullah


Universitas Muhammadiyah Semarang, Department of Statistics (Indonesia)
http://orcid.org/0009-0005-7472-1761

Indah Manfaati Nur


Universitas Muhammadiyah Semarang, Department of Statistics (Indonesia)
http://orcid.org/0000-0002-1017-7323

Abstract

Public opinion on rising fuel prices needs to be observed and analysed, because it is closely tied to future public policy in Indonesia. Twitter is one of the media people use to express such opinions. This study applies sentiment analysis to this phenomenon, dividing sentiment into three categories: positive, neutral, and negative. Three methods are compared: Adaptive Synthetic Multinomial Naive Bayes, Adaptive Synthetic k-nearest neighbours, and Adaptive Synthetic Random Forest. In each, the Adaptive Synthetic step is used to handle the unbalanced data. The data are public opinions gathered per province in Indonesia. The results show that negative sentiment dominates in all provinces, and that negative sentiment is related to the level of education, internet use, and the human development index. Adaptive Synthetic Multinomial Naive Bayes performed better than the other methods, with an accuracy of 0.882; its highest accuracy, 0.990, was obtained in Papua Barat Province.
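
The abstract names Adaptive Synthetic sampling (ADASYN) and Multinomial Naive Bayes without stating how they work. As a hedged sketch, the block below writes out the standard two-class ADASYN steps and the usual Multinomial Naive Bayes decision rule. The symbols (m_s, m_l, beta, K, lambda, alpha, N_cw, N_c, V) are notation chosen here for illustration, not taken from the paper, and the use of additive smoothing is an assumption.

```latex
% ADASYN for one minority class (m_s samples) against a majority class (m_l samples).
% \beta \in [0,1] sets the desired balance level; \beta = 1 means fully balanced.
\[ G = (m_l - m_s)\,\beta \]

% For each minority sample x_i, \Delta_i counts the majority-class points
% among its K nearest neighbours; g_i is the number of synthetic points to create for x_i.
\[ r_i = \frac{\Delta_i}{K}, \qquad
   \hat{r}_i = \frac{r_i}{\sum_{j=1}^{m_s} r_j}, \qquad
   g_i = \hat{r}_i\, G \]

% Each synthetic point s_i interpolates from x_i towards a randomly chosen
% minority-class neighbour x_{zi}.
\[ s_i = x_i + \lambda\,(x_{zi} - x_i), \qquad \lambda \sim U[0,1] \]

% Multinomial Naive Bayes on term counts f_w of a document.
% N_{cw}: count of term w in class c; N_c: total term count in class c;
% |V|: vocabulary size; \alpha: smoothing constant (assumed, e.g. \alpha = 1).
\[ \hat{\theta}_{cw} = \frac{N_{cw} + \alpha}{N_c + \alpha\,|V|}, \qquad
   \hat{c} = \arg\max_{c} \Big[ \log P(c) + \sum_{w \in V} f_w \log \hat{\theta}_{cw} \Big] \]
```

Because g_i grows with the share of majority-class neighbours, more synthetic points are placed near minority samples that are harder to learn. With three sentiment classes, one plausible reading is that each smaller class is oversampled in turn against the largest class; the abstract does not state the exact procedure used.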


Keywords:

adaptive synthetic, classification, imbalanced data, accuracy

Published
2023-09-30

Fauzi, F., Ismatullah, & Manfaati Nur, I. (2023). UNBALANCED MULTICLASS CLASSIFICATION WITH ADAPTIVE SYNTHETIC MULTINOMIAL NAIVE BAYES APPROACH. Informatyka, Automatyka, Pomiary W Gospodarce I Ochronie Środowiska, 13(3), 64–70. https://doi.org/10.35784/iapgos.3740
