AUTOMATIC IDENTIFICATION OF DYSPHONIAS USING MACHINE LEARNING ALGORITHMS
Miguel Angel BELLO RIVERA
podriaservirte@gmail.com
Tecnológico Nacional de México (Mexico)
https://orcid.org/0009-0003-6641-3094
Carlos Alberto REYES GARCÍA
National Institute of Astrophysics, Optics, and Electronics (INAOE) (Mexico)
Tania Cristal TALAVERA ROJAS
La Universidad Autónoma de Asunción (UAA) (Paraguay)
https://orcid.org/0000-0001-7656-3115
Perfecto Malaquías QUINTERO FLORES
El Tecnológico Nacional de México/Instituto Tecnológico de Apizaco (Mexico)
https://orcid.org/0000-0001-7651-4364
Rodolfo Eleazar PÉREZ LOAIZA
El Tecnológico Nacional de México/Instituto Tecnológico de Apizaco (Mexico)
https://orcid.org/0000-0002-6500-258X
Abstract
Dysphonia is a prevalent symptom of some respiratory diseases that impairs voice quality, sometimes for prolonged periods. To diagnose it, speech-language pathologists use acoustic parameters to evaluate patients objectively and determine the type of dysphonia affecting them, such as hyperfunctional or hypofunctional dysphonia. This distinction matters because each type requires a different treatment. In artificial intelligence, the problem has been addressed by using acoustic parameters as input data to train machine learning and deep learning models. However, such models usually aim only to identify whether a patient is ill, performing binary classification between healthy voices and voices with dysphonia rather than distinguishing between types of dysphonia. In this paper, harmonic-to-noise ratio, smoothed cepstral peak prominence, zero-crossing rate, and the means of Mel-frequency cepstral coefficients 2 to 19 are used for multiclass classification of voices with euphony, hyperfunction, and hypofunction using six machine learning algorithms: random forest, k-nearest neighbors, logistic regression, decision trees, support vector machines, and naive Bayes. The .632 bootstrap was used to evaluate which algorithm best identifies the three voice classes. The best result was obtained by the k-nearest neighbors model, with a confidence interval for accuracy ranging from 87% to 92%. The results can be applied in the development of a complementary application for clinical diagnosis or patient monitoring under the supervision of a specialist.
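The evaluation idea named in the abstract can be illustrated with a minimal sketch. It assumes a per-replicate form of the .632 bootstrap (each replicate weights in-bag accuracy by 0.368 and out-of-bag accuracy by 0.632) and a percentile interval over the replicates; the abstract does not state the exact procedure, so this may differ from what the authors did. The data are synthetic three-class points standing in for the acoustic features, and every function name and parameter below (`knn_predict`, `bootstrap_632`, `k=5`, `n_rounds=50`) is hypothetical.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k training points closest to x (squared Euclidean)."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    nearest = sorted(range(len(train_X)), key=lambda i: sq_dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k=5):
    """Fraction of test points whose k-NN prediction matches the true label."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y))
    return hits / len(test_y)


def bootstrap_632(X, y, n_rounds=50, k=5, seed=0):
    """Return one .632-weighted accuracy per bootstrap replicate (assumed variant)."""
    rng = random.Random(seed)
    n = len(X)
    scores = []
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]   # sample n indices with replacement
        in_bag = set(idx)
        oob = [i for i in range(n) if i not in in_bag]  # points never drawn
        if not oob:                                   # practically never happens
            continue
        bag_X = [X[i] for i in idx]
        bag_y = [y[i] for i in idx]
        acc_train = accuracy(bag_X, bag_y, bag_X, bag_y, k)
        acc_oob = accuracy(bag_X, bag_y, [X[i] for i in oob], [y[i] for i in oob], k)
        scores.append(0.368 * acc_train + 0.632 * acc_oob)
    return scores


def percentile_interval(scores, level=0.95):
    """Simple percentile interval over the replicate scores."""
    s = sorted(scores)
    lo = s[int((1 - level) / 2 * (len(s) - 1))]
    hi = s[int((1 + level) / 2 * (len(s) - 1))]
    return lo, hi


def make_toy_data(n_per_class=40, n_features=4, seed=1):
    """Synthetic stand-in: three Gaussian clusters labelled 0, 1, 2 (not real voice data)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in enumerate((0.0, 2.0, 4.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


X, y = make_toy_data()
scores = bootstrap_632(X, y)
low, high = percentile_interval(scores)
mean_score = sum(scores) / len(scores)
print(f"mean .632 accuracy: {mean_score:.3f}, 95% interval: [{low:.3f}, {high:.3f}]")
```

In a real setting, the rows of `X` would hold the extracted acoustic features per recording and `y` the three voice classes; the printed numbers here describe only the toy data and say nothing about the paper's results.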
Keywords:
dysphonia, machine learning, multiclass classification, voice signal
License
This work is licensed under a Creative Commons Attribution 4.0 International License.
All articles published in Applied Computer Science are open-access and distributed under the terms of the Creative Commons Attribution 4.0 International License.