IMPROVING CORONARY HEART DISEASE PREDICTION BY OUTLIER ELIMINATION

Lubna RIYAZ

lubna.riyaz122@gmail.com
PG Department of Computer Sciences, University of Kashmir, Srinagar (India)

Muheet Ahmed BUTT


PG Department of Computer Sciences, University of Kashmir, Srinagar (India)

Majid ZAMAN


Directorate of IT & SS, University of Kashmir, Srinagar (India)

Abstract

Heart disease is the leading cause of death globally. According to a survey conducted by the World Health Organization, almost 18 million people die of heart diseases (cardiovascular diseases) every year, so a system for early detection and prevention of heart disease is needed. Detection of heart disease depends largely on large volumes of complex pathological and clinical data, and researchers and medical professionals are therefore showing keen interest in its accurate prediction. Heart disease is a general term for a large number of medical conditions related to the heart, one of which is coronary heart disease (CHD). Coronary heart disease is caused by the build-up of plaque on the artery walls. In this paper, various machine learning base and ensemble classifiers are applied to a heart disease dataset for efficient prediction of coronary heart disease. The base classifiers employed are k-nearest neighbor, multilayer perceptron, multinomial naïve Bayes, logistic regression, decision tree, random forest and support vector machine. The ensemble classifiers used are majority voting, weighted average, bagging and boosting. The dataset used in this study is obtained from the Framingham Heart Study, a long-term, ongoing cardiovascular study of people from the town of Framingham in Massachusetts, USA. The performance of the classifiers is evaluated using accuracy, precision, recall and F1 score. In our results, the best accuracies were achieved by the logistic regression, random forest, majority voting, weighted average and bagging classifiers, and the highest among these was achieved by the weighted average ensemble classifier.
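The following minimal Python sketch illustrates three ideas named in the title and abstract: eliminating outliers, combining classifiers by weighted average, and scoring with accuracy, precision, recall and F1. It is not the authors' implementation. The abstract does not state which outlier rule was used, so the interquartile-range (Tukey fence) rule here is an assumption, as are the function names, the `sysBP` column name, the weights and the 0.5 threshold.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return (low, high) Tukey fences; the IQR rule is an assumed choice."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(rows, column, k=1.5):
    """Keep only the rows whose value in `column` lies inside the fences."""
    low, high = iqr_bounds([row[column] for row in rows], k)
    return [row for row in rows if low <= row[column] <= high]


def weighted_average_vote(prob_lists, weights, threshold=0.5):
    """Average per-classifier positive-class probabilities, then threshold."""
    total = sum(weights)
    n_samples = len(prob_lists[0])
    combined = [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n_samples)
    ]
    return [1 if p >= threshold else 0 for p in combined]


def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Illustrative data only; these values are not taken from the Framingham dataset.
rows = [{"sysBP": v} for v in (110, 120, 125, 130, 135, 140, 300)]
cleaned = drop_outliers(rows, "sysBP")

# Hypothetical positive-class probabilities from three base classifiers.
probabilities = [[0.9, 0.2, 0.6], [0.8, 0.4, 0.3], [0.3, 0.1, 0.7]]
predictions = weighted_average_vote(probabilities, weights=[2, 1, 1])
print(len(cleaned), predictions, classification_metrics([1, 0, 0], predictions))
```

In practice the weights might be set from each base classifier's validation accuracy, which is one common way of building a weighted average ensemble; the abstract does not say how the authors chose theirs.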


Keywords:

coronary heart disease, machine learning, ensembles, outlier detection, Framingham




Published
2022-03-30

Cited by

RIYAZ, L., BUTT, M. A., & ZAMAN, M. (2022). IMPROVING CORONARY HEART DISEASE PREDICTION BY OUTLIER ELIMINATION. Applied Computer Science, 18(1), 70–88. https://doi.org/10.23743/acs-2022-06




License

This work is licensed under a Creative Commons Attribution 4.0 International License.

All articles published in Applied Computer Science are open-access and distributed under the terms of the Creative Commons Attribution 4.0 International License.