A COMPARATIVE STUDY ON PERFORMANCE OF BASIC AND ENSEMBLE CLASSIFIERS WITH VARIOUS DATASETS

Archana Gunakala

archu.gunakala@gmail.com
Research Scholar (India)
https://orcid.org/0000-0002-3375-1893

Afzal Hussain Shahid


Assistant Professor, Senior (Grade-I) (India)
https://orcid.org/0009-0001-9815-108X

Abstract

Classification plays a critical role in machine learning (ML) systems that process images, text and high-dimensional data. The primary goal of classification is to predict the class labels of new instances using a model learned from training data. The optimal model for a particular classification problem is chosen on the basis of its performance and execution time. This paper compares and analyses the performance of basic and ensemble classifiers using 10-fold cross validation, and also discusses their essential concepts, advantages and disadvantages. Five basic classifiers, namely Naïve Bayes (NB), Multi-layer Perceptron (MLP), Support Vector Machine (SVM), Decision Tree (DT) and Random Forest (RF), together with an ensemble of all five and a few other combinations, are compared on five University of California Irvine (UCI) ML Repository datasets and the Diabetes Health Indicators dataset from the Kaggle repository. Accuracy, Recall, Precision, Area Under the Curve (AUC) and F-Score are used to evaluate and compare the classifiers. Experimental results showed that SVM performs best on two of the six datasets, Diabetes Health Indicators and Waveform (accuracies of 72.58% and 90.38%); RF performs best on the Arrhythmia, Sonar and Tic-tac-toe datasets (81.63%, 73.59% and 94.78%); and on the Ionosphere dataset the best performer is the ensemble combination DT+SVM+RF (94.01%). The proposed ensemble combinations outperformed the conventional models on a few datasets.
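The abstract names the evaluation protocol (10-fold cross validation, an ensemble built from the base classifiers, and the metrics Accuracy, Precision, Recall, F-Score and AUC) but does not say how the ensemble combines its members, whether the folds are stratified, or how per-class metrics are averaged for multi-class datasets such as Arrhythmia and Waveform. The following standard-library Python sketch illustrates these steps under loudly stated assumptions: plain shuffled (non-stratified) folds, hard majority voting, a single positive class, and a rank-based AUC. All function names are hypothetical, the base classifiers themselves are omitted (their predictions are taken as given), and this is not the authors' implementation.

```python
"""Sketch of the 10-fold cross-validation, voting-ensemble and metric steps.

Hypothetical helper names. Assumes binary (one-vs-rest) labels, hard majority
voting and shuffled non-stratified folds; the abstract specifies none of these.
"""
import random
from collections import Counter


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them round-robin into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def train_test_splits(n, k=10, seed=0):
    """Yield (train, test) index lists; every fold is the test set exactly once."""
    folds = k_fold_indices(n, k, seed)
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def majority_vote(per_model_preds):
    """Hard voting over a list of per-model label lists (one list per classifier).

    On a tie, Counter.most_common favours the label encountered first, i.e. the
    one predicted by the earliest model in the list.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*per_model_preds)]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F-score (F1) with respect to one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f_score": f_score}


def roc_auc(y_true, scores, positive=1):
    """AUC as the chance a random positive outscores a random negative (ties count 1/2).

    Requires at least one positive and one negative sample.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and predictions from three hypothetical base classifiers (not the paper's data).
    y_true = [1, 0, 1, 1, 0, 0]
    preds = [[1, 0, 1, 0, 0, 1],
             [1, 1, 1, 1, 0, 0],
             [0, 0, 1, 1, 1, 0]]
    ensemble = majority_vote(preds)
    print(ensemble)  # -> [1, 0, 1, 1, 0, 0]
    print(binary_metrics(y_true, ensemble))  # every metric is 1.0 here
    for train, test in train_test_splits(len(y_true), k=3):
        print(sorted(train), sorted(test))
```

With these toy inputs each base model makes one or two errors, yet the majority vote recovers every label, which is the usual intuition for combining diverse classifiers. On real data an ensemble need not beat its best member, which fits the abstract's observation that the proposed combinations won only on a few datasets.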


Keywords:

Classification, Naïve Bayes, Neural Network, Support Vector Machine, Decision Tree, Ensemble Learning, Random Forest


Published
2023-03-31

How to cite

Gunakala, A., & Shahid, A. H. (2023). A COMPARATIVE STUDY ON PERFORMANCE OF BASIC AND ENSEMBLE CLASSIFIERS WITH VARIOUS DATASETS. Applied Computer Science, 19(1), 107–132. https://doi.org/10.35784/acs-2023-08

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

All articles published in Applied Computer Science are open-access and distributed under the terms of the Creative Commons Attribution 4.0 International License.

