UNBALANCED MULTICLASS CLASSIFICATION WITH ADAPTIVE SYNTHETIC MULTINOMIAL NAIVE BAYES APPROACH
Fatkhurokhman Fauzi
fatkhurokhmanf@unimus.ac.idUniversitas Muhammadiyah Semarang, Department of Statistics (Indonesia)
https://orcid.org/0000-0002-8277-8638
. Ismatullah
Universitas Muhammadiyah Semarang, Department of Statistics (Indonesia)
http://orcid.org/0009-0005-7472-1761
Indah Manfaati Nur
Universitas Muhammadiyah Semarang, Department of Statistics (Indonesia)
http://orcid.org/0000-0002-1017-7323
Abstract
Opinions related to rising fuel prices need to be seen and analysed. Public opinion is closely related to public policy in Indonesia in the future. Twitter is one of the media that people use to convey their opinions. This study uses sentiment analysis to look at this phenomenon. Sentiment is divided into three categories: positive, neutral, and negative. The methods used in this research are Adaptive Synthetic Multinomial Naive Bayes, Adaptive Synthetic k-nearest neighbours, and Adaptive Synthetic Random Forest. The Adaptive Synthetic method is used to handle unbalanced data. The data used in this study are public arguments per province in Indonesia. The results obtained in this study are negative sentiments that dominate all provinces in Indonesia. There is a relationship between negative sentiment and the level of education, internet use, and the human development index. Adaptive Synthetic Multinomial Naive Bayes performed better than other methods, with an accuracy of 0.882. The highest accuracy of the Adaptive Synthetic Multinomial Naive Bayes method is 0.990 in Papua Barat Province.
Keywords:
adaptive synthetic, classification, imbalance data, accuracyReferences
Ahuja R. et al.: The Impact of Features Extraction on the Sentiment Analysis. Procedia Computer Science 152, 2019, 341–348 [http://doi.org/10.1016/j.procs.2019.05.008].
DOI: https://doi.org/10.1016/j.procs.2019.05.008
Google Scholar
Ali H. et al.: Deep Learning-Based Election Results Prediction Using Twitter Activity. Soft Computing 26(16), 2022, 7535–43 [http://doi.org/10.1007/s00500-021-06569-5].
DOI: https://doi.org/10.1007/s00500-021-06569-5
Google Scholar
Amity U. et al.: Abstract Proceedings of International Conference on Automation, Computational and Technology Management (ICACTM-2019), 2019.
Google Scholar
Andrian R. et al.: K-Nearest Neighbor (k-NN) Classification for Recognition of the Batik Lampung Motifs. Journal of Physics: Conference Series 1338(1), 2019 [http://doi.org/10.1088/1742-6596/1338/1/012061].
DOI: https://doi.org/10.1088/1742-6596/1338/1/012061
Google Scholar
Asian J. et al.: Sentiment Analysis for the Brazilian Anesthesiologist Using Multi-Layer Perceptron Classifier and Random Forest Methods. Journal Online Informatika 7(1), 2022, 132 [http://doi.org/10.15575/join.v7i1.900].
DOI: https://doi.org/10.15575/join.v7i1.900
Google Scholar
Balaram A., Vasundra S.: Prediction of Software Fault-Prone Classes Using Ensemble Random Forest with Adaptive Synthetic Sampling Algorithm. Automated Software Engineering 29(1), 2021, 6 [http://doi.org/10.1007/s10515-021-00311-z].
DOI: https://doi.org/10.1007/s10515-021-00311-z
Google Scholar
Budiawan Zulfikar W. et al.: Sentiment Analysis on Social Media Against Public Policy Using Multinomial Naive Bayes. Scientific Journal of Informatics 10(1), 2023 [http://doi.org/10.15294/sji.v10i1.39952].
DOI: https://doi.org/10.15294/sji.v10i1.39952
Google Scholar
Bustillos A. et al.: Approaching Dehumanizing Interactions: Joint Consideration of Other-, Meta-, and Self-Dehumanization. Current Opinion in Behavioral Sciences 49, 2023, 101233 [http://doi.org/10.1016/j.cobeha.2022.101233].
DOI: https://doi.org/10.1016/j.cobeha.2022.101233
Google Scholar
Eberwein T.: ‘Trolls’ or ‘Warriors of Faith’?: Differentiating Dysfunctional Forms of Media Criticism in Online Comments. Journal of Information, Communication and Ethics in Society 18(1), 2020, 131–143 [http://doi.org/10.1108/JICES-08-2019-0090].
DOI: https://doi.org/10.1108/JICES-08-2019-0090
Google Scholar
Farisi A. A. et al.: Sentiment Analysis on Hotel Reviews Using Multinomial Naive Bayes Classifier. Journal of Physics: Conference Series 1192(1), 2019 [http://doi.org/10.1088/1742-6596/1192/1/012024].
DOI: https://doi.org/10.1088/1742-6596/1192/1/012024
Google Scholar
Gazali Mahmud F. et al.: Implementation Of K-Nearest Neighbor Algorithm With SMOTE For Hotel Reviews Sentiment Analysis. Sinkron: Jurnal Dan Penelitian Teknik Informatika 8(2), 2023, 595–602 [http://doi.org/10.33395/sinkron.v8i2.12214].
DOI: https://doi.org/10.33395/sinkron.v8i2.12214
Google Scholar
Ghosh D., Cabrera J.: Enriched Random Forest for High Dimensional Genomic Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 19(5), 2022, 2817–2828 [http://doi.org/10.1109/TCBB.2021.3089417].
DOI: https://doi.org/10.1109/TCBB.2021.3089417
Google Scholar
Hasdyna N. et al.: Improving the Performance of K-Nearest Neighbor Algorithm by Reducing the Attributes of Dataset Using Gain Ratio. Journal of Physics: Conference Series 1566(1), 2020 [http://doi.org/10.1088/1742-6596/1566/1/012090].
DOI: https://doi.org/10.1088/1742-6596/1566/1/012090
Google Scholar
He H. et al.: ADASYN: Adaptive Synthetic Sampling Approach for Imbalanced Learning. IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), 2008, 1322–1328 [http://doi.org/10.1109/IJCNN.2008.4633969].
DOI: https://doi.org/10.1109/IJCNN.2008.4633969
Google Scholar
Herhianto A.: Sentiment Analysis Menggunakan Naive Bayes Classifier (Nbc) Pada Tweet Tentang Zakat. 2020.
Google Scholar
Hossain E. et al.: Sentiment Polarity Detection on Bengali Book Reviews Using Multinomial Naive Bayes. Progress in Advanced Computing and Intelligent Engineering (ed.Chhabi Rani Panigrahi et al.), Springer Singapore, 2021, 281–292.
DOI: https://doi.org/10.1007/978-981-33-4299-6_23
Google Scholar
Hu Z. et al.: A Novel Wireless Network Intrusion Detection Method Based on Adaptive Synthetic Sampling and an Improved Convolutional Neural Network. IEEE Access 8, 2020, 195741–195751 [http://doi.org/10.1109/ACCESS.2020.3034015].
DOI: https://doi.org/10.1109/ACCESS.2020.3034015
Google Scholar
Jalilifard A. et al.: Semantic Sensitive TF-IDF to Determine Word Relevance in Documents, 2020 [http://doi.org/10.1007/978-981-33-6977-1].
DOI: https://doi.org/10.1007/978-981-33-6987-0_27
Google Scholar
Jiang C. et al.: Benchmarking State-of-the-Art Imbalanced Data Learning Approaches for Credit Scoring. Expert Systems with Applications 213, 2023, 118878 [http://doi.org/10.1016/j.eswa.2022.118878].
DOI: https://doi.org/10.1016/j.eswa.2022.118878
Google Scholar
Koh J. E. W. et al: Automated Classification of Attention Deficit Hyperactivity Disorder and Conduct Disorder Using Entropy Features with ECG Signals. Computers in Biology and Medicine 140, 2022, 105120 [http://doi.org/10.1016/j.compbiomed.2021.105120].
DOI: https://doi.org/10.1016/j.compbiomed.2021.105120
Google Scholar
Kurniasih A., Lindung P. M.: On the Role of Text Preprocessing in BERT Embedding-Based DNNs for Classifying Informal Texts. International Journal of Advanced Computer Science and Applications 13(6), 2022, 927–934 [http://doi.org/10.14569/IJACSA.2022.01306109].
DOI: https://doi.org/10.14569/IJACSA.2022.01306109
Google Scholar
Kurniawati Y. E. et al.: Adaptive Synthetic-Nominal (ADASYN-N) and Adaptive Synthetic-KNN (ADASYN-KNN) for Multiclass Imbalance Learning on Laboratory Test Data. 2018 4th International Conference on Science and Technology (ICST), 2018, 1–6 [http://doi.org/10.1109/ICSTC.2018.8528679].
DOI: https://doi.org/10.1109/ICSTC.2018.8528679
Google Scholar
Leelawat N. et al.: Twitter Data Sentiment Analysis of Tourism in Thailand during the COVID-19 Pandemic Using Machine Learning. Heliyon 8(10), 2022, e10894 [http://doi.org/10.1016/j.heliyon.2022.e10894].
DOI: https://doi.org/10.1016/j.heliyon.2022.e10894
Google Scholar
Liu J. et al.: A Fast Network Intrusion Detection System Using Adaptive Synthetic Oversampling and LightGBM. Computers & Security 106, 2021, 102289 [http://doi.org/10.1016/j.cose.2021.102289].
DOI: https://doi.org/10.1016/j.cose.2021.102289
Google Scholar
Liu Y., Wu H.: Prediction of Road Traffic Congestion Based on Random Forest. 2017 10th International Symposium on Computational Intelligence and Design (ISCID) 2, 2017, 361–364 [http://doi.org/10.1109/ISCID.2017.216].
DOI: https://doi.org/10.1109/ISCID.2017.216
Google Scholar
Lytvyn V. et al.: Identifying Textual Content Based on Thematic Analysis of Similar Texts in Big Data. 2019 IEEE 14th International Conference on Computer Sciences and Information Technologies (CSIT) 2, 2019, 84–91 [http://doi.org/10.1109/STC-CSIT.2019.8929808].
DOI: https://doi.org/10.1109/STC-CSIT.2019.8929808
Google Scholar
Mayo M.: A General Approach to Preprocessing Text Data, 2017.
Google Scholar
Moosavian A. et al.: Comparison of Two Classifiers; K-Nearest Neighbor and Artificial Neural Network, for Fault Diagnosis on a Main Engine Journal-Bearing. Shock and Vibration 20(2), 2013, 263–272 [http://doi.org/10.3233/SAV-2012-00742].
DOI: https://doi.org/10.1155/2013/360236
Google Scholar
Nadhifah D. et al.: Analysis of the Impact of the Increase in Fuel Oil (BBM) on Household Economic Activities. Journal of Contemporary Gender and Child Studies (JCGCS) 1(1), 2022 [https://zia-research.com/index.php/jcgcs].
DOI: https://doi.org/10.61253/jcgcs.v1i1.54
Google Scholar
Nazrul Syed S.: Multinomial Naive Bayes Classifier for Text Analysis (Python). Towards Data Science, 2018.
Google Scholar
Patel A. et al.: Sentiment Analysis of Customer Feedback and Reviews for Airline Services Using Language Representation Model. Procedia Computer Science 218, 2023, 2459–2467 [http://doi.org/10.1016/j.procs.2023.01.221].
DOI: https://doi.org/10.1016/j.procs.2023.01.221
Google Scholar
Rahman R. et al.: Sentiment Analysis on Bengali Movie Reviews Using Multinomial Naive Bayes. 2021 24th International Conference on Computer and Information Technology (ICCIT), 2021, 1–6 [http://doi.org/10.1109/ICCIT54785.2021.9689787].
DOI: https://doi.org/10.1109/ICCIT54785.2021.9689787
Google Scholar
Rennie J. D. M. et al.: Tackling the Poor Assumptions of Naive Bayes Text Classifiers, 2003.
Google Scholar
Ridho Lubis A. et al.: The Effect of the TF-IDF Algorithm in Times Series in Forecasting Word on Social Media. Indonesian Journal of Electrical Engineering and Computer Science 22(2), 2021, 976 [http://doi.org/10.11591/ijeecs.v22.i2.pp976-984].
DOI: https://doi.org/10.11591/ijeecs.v22.i2.pp976-984
Google Scholar
Sahib N. G. et al.: Sentiment Analysis of Social Media Comments in Mauritius. IEEE 13th Annual Computing and Communication Workshop and Conference (CCWC), 2023, 860–865 [http://doi.org/10.1109/CCWC57344.2023.10099291].
DOI: https://doi.org/10.1109/CCWC57344.2023.10099291
Google Scholar
Salauddin Khan M. et al.: Comparison of Multiclass Classification Techniques Using Dry Bean Dataset. International Journal of Cognitive Computing in Engineering 4, 2023, 6–20 [http://doi.org/10.1016/j.ijcce.2023.01.002].
DOI: https://doi.org/10.1016/j.ijcce.2023.01.002
Google Scholar
Solikah M., Dian N.: The Effectiveness of the Guided Inquiries Learning Model on the Critical Thinking Ability of Students. Jurnal Pijar Mipa 17(2), 2022, 184–191 [http://doi.org/10.29303/jpm.v17i2.3276].
DOI: https://doi.org/10.29303/jpm.v17i2.3276
Google Scholar
Surya P. P. et al.: Analysis of User Emotions and Opinion Using Multinomial Naive Bayes Classifier. 2019 3rd International Conference on Electronics, Communication and Aerospace Technology (ICECA), 2019, 410–415 [http://doi.org/10.1109/ICECA.2019.8822096].
DOI: https://doi.org/10.1109/ICECA.2019.8822096
Google Scholar
Yang J. et al.: Delineation of Urban Growth Boundaries Using a Patch-Based Cellular Automata Model under Multiple Spatial and Socio-Economic Scenarios. Sustainability (Switzerland) 11(21), 2019 [http://doi.org/10.3390/su11216159].
DOI: https://doi.org/10.3390/su11216159
Google Scholar
Yu B. et al.: Classification Method for Failure Modes of RC Columns Based on Class-Imbalanced Datasets. Structures 48, 2023, 694–705 [http://doi.org/10.1016/j.istruc.2022.12.063].
DOI: https://doi.org/10.1016/j.istruc.2022.12.063
Google Scholar
Zamsuri A. et al.: Classification of Multiple Emotions in Indonesian Text Using The K-Nearest Neighbor Method. Journal of Applied Engineering and Technological Science (JAETS) 4(2), 2023, 1012–1021 [http://doi.org/10.37385/jaets.v4i2.1964].
DOI: https://doi.org/10.37385/jaets.v4i2.1964
Google Scholar
Zhai J. et al.: Binary Imbalanced Data Classification Based on Diversity Oversampling by Generative Models. Information Sciences 585, 2022, 313–43 [http://doi.org/10.1016/j.ins.2021.11.058].
DOI: https://doi.org/10.1016/j.ins.2021.11.058
Google Scholar
Authors
Fatkhurokhman Fauzifatkhurokhmanf@unimus.ac.id
Universitas Muhammadiyah Semarang, Department of Statistics Indonesia
https://orcid.org/0000-0002-8277-8638
Authors
. IsmatullahUniversitas Muhammadiyah Semarang, Department of Statistics Indonesia
http://orcid.org/0009-0005-7472-1761
Authors
Indah Manfaati NurUniversitas Muhammadiyah Semarang, Department of Statistics Indonesia
http://orcid.org/0000-0002-1017-7323
Statistics
Abstract views: 176PDF downloads: 116
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.