A COMPARATIVE STUDY ON PERFORMANCE OF BASIC AND ENSEMBLE CLASSIFIERS WITH VARIOUS DATASETS
Archana Gunakala
archu.gunakala@gmail.com
Research Scholar (India)
https://orcid.org/0000-0002-3375-1893
Afzal Hussain Shahid
Assistant Professor, Senior (Grade-I) (India)
https://orcid.org/0009-0001-9815-108X
Abstract
Classification plays a critical role in machine learning (ML) systems that process images, text, and high-dimensional data. Its primary goal is to predict class labels using a model learned from training data. The most suitable model for a given classification problem is chosen on the basis of its performance and execution time. This paper compares and analyses the performance of basic and ensemble classifiers using 10-fold cross-validation, and discusses their essential concepts, advantages, and disadvantages. Five basic classifiers, namely Naïve Bayes (NB), Multi-layer Perceptron (MLP), Support Vector Machine (SVM), Decision Tree (DT), and Random Forest (RF), together with the ensemble of all five and a few further combinations, are compared on five datasets from the University of California Irvine (UCI) ML Repository and a Diabetes Health Indicators dataset from the Kaggle repository. The classifiers are evaluated with Accuracy, Recall, Precision, Area Under Curve (AUC), and F-Score. Experimental results show that SVM performs best on two of the six datasets (Diabetes Health Indicators and Waveform), RF performs best on the Arrhythmia, Sonar, and Tic-tac-toe datasets, and the ensemble combination DT+SVM+RF performs best on the Ionosphere dataset, with respective accuracies of 72.58%, 90.38%, 81.63%, 73.59%, 94.78%, and 94.01%. The proposed ensemble combinations outperformed the conventional models on a few datasets.
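The evaluation setup described in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation: the function names (`k_fold_indices`, `majority_vote`, `binary_metrics`) and the toy labels are hypothetical, the combination rule is assumed to be hard majority voting (the abstract does not say how the ensembles combine their members), and AUC is omitted because it needs class scores rather than hard labels.

```python
from collections import Counter


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    # The first (n_samples % k) folds get one extra sample.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def majority_vote(predictions_per_model):
    """Combine per-model label lists by hard (majority) voting.

    Assumption: ties go to the label seen first among the votes,
    which is how Counter.most_common orders equal counts.
    """
    combined = []
    for votes in zip(*predictions_per_model):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F-score for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


# Toy labels (hypothetical, not taken from the paper's datasets).
y_true = [1, 0, 1, 1, 0, 1]
dt_pred = [1, 0, 0, 1, 0, 1]
svm_pred = [1, 1, 1, 1, 0, 0]
rf_pred = [0, 0, 1, 1, 0, 1]

ensemble_pred = majority_vote([dt_pred, svm_pred, rf_pred])
for name, pred in [("DT", dt_pred), ("SVM", svm_pred), ("RF", rf_pred),
                   ("DT+SVM+RF", ensemble_pred)]:
    print(name, binary_metrics(y_true, pred))
```

In a full experiment, each of the ten folds from `k_fold_indices` would be used once as the test set while the classifiers are trained on the remaining nine, and the metrics would be averaged over the folds.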
Keywords:
Classification, Naïve Bayes, Neural Network, Support Vector Machine, Decision Tree, Ensemble Learning, Random Forest
License
This work is licensed under a Creative Commons Attribution 4.0 International License.
All articles published in Applied Computer Science are open-access and distributed under the terms of the Creative Commons Attribution 4.0 International License.