IMPROVING CORONARY HEART DISEASE PREDICTION BY OUTLIER ELIMINATION
Lubna RIYAZ
lubna.riyaz122@gmail.com
PG Department of Computer Sciences, University of Kashmir, Srinagar (India)
Muheet Ahmed BUTT
PG Department of Computer Sciences, University of Kashmir, Srinagar (India)
Majid ZAMAN
Directorate of IT & SS, University of Kashmir, Srinagar (India)
Abstract
Heart disease is the leading cause of death globally. According to a survey conducted by the World Health Organization, almost 18 million people die of heart diseases (cardiovascular diseases) every year, so a system for early detection and prevention of heart disease is needed. Detection of heart disease depends largely on large volumes of complex pathological and clinical data, which is why researchers and medical professionals are showing keen interest in its accurate prediction. Heart disease is a general term for a large number of medical conditions related to the heart, one of which is coronary heart disease (CHD). Coronary heart disease is caused by the build-up of plaque on the artery walls. In this paper, several base and ensemble machine learning classifiers are applied to a heart disease dataset for efficient prediction of coronary heart disease. The machine learning classifiers employed are k-nearest neighbor, multilayer perceptron, multinomial naïve Bayes, logistic regression, decision tree, random forest and support vector machine. The ensemble classifiers used are majority voting, weighted average, bagging and boosting. The dataset is obtained from the Framingham Heart Study, a long-term, ongoing cardiovascular study of residents of Framingham, Massachusetts, USA. Classifier performance is evaluated using accuracy, precision, recall and F1 score. According to our results, the logistic regression, random forest, majority voting, weighted average and bagging classifiers achieved the best accuracy, and the weighted average ensemble classifier achieved the highest accuracy of all.
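The pipeline summarized in the abstract (outlier elimination, a weighted average ensemble, and evaluation by accuracy, precision, recall and F1 score) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which outlier test was used, so the interquartile range (IQR) rule stands in for it here; the function names, the toy blood pressure records, the classifier probabilities and the weights are all hypothetical.

```python
from statistics import quantiles


def iqr_filter(rows, col, k=1.5):
    """Keep rows whose value in column `col` lies within the Tukey fences.

    Assumption: the IQR rule stands in for the paper's unspecified outlier test.
    """
    values = [row[col] for row in rows]
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [row for row in rows if low <= row[col] <= high]


def weighted_average_vote(probas, weights, threshold=0.5):
    """Combine per-classifier P(CHD) estimates into one 0/1 label per sample.

    probas[i][j] is classifier i's probability for sample j.
    """
    total = sum(weights)
    n_samples = len(probas[0])
    combined = [
        sum(w * p[j] for w, p in zip(weights, probas)) / total
        for j in range(n_samples)
    ]
    return [1 if c >= threshold else 0 for c in combined]


def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for the positive class (1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical toy records: (systolic blood pressure, CHD label).
# The reading of 400 is implausible and should be dropped as an outlier.
rows = [(110, 0), (118, 0), (125, 1), (130, 0), (142, 1), (150, 1), (400, 1)]
clean = iqr_filter(rows, col=0)
y_true = [label for _, label in clean]

# Made-up probabilities from three already-trained classifiers.
probas = [
    [0.2, 0.4, 0.6, 0.8, 0.8, 0.4],  # e.g. logistic regression
    [0.1, 0.3, 0.7, 0.5, 0.9, 0.6],  # e.g. random forest
    [0.3, 0.6, 0.4, 0.3, 0.6, 0.7],  # e.g. k-nearest neighbor
]
weights = [2, 2, 1]  # assumed weights; in practice tuned on validation data

y_pred = weighted_average_vote(probas, weights)
scores = classification_metrics(y_true, y_pred)
print(len(clean), y_pred, scores)
```

Giving the stronger classifiers a larger weight lets them dominate the averaged probability, which is the main difference from majority voting, where every classifier's vote counts equally.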
Keywords:
coronary heart disease, machine learning, ensembles, outlier detection, Framingham
License
This work is licensed under a Creative Commons Attribution 4.0 International License.
All articles published in Applied Computer Science are open-access and distributed under the terms of the Creative Commons Attribution 4.0 International License.