Applying of machine learning in the construction of a voice-controlled interface on the example of a music player


This paper presents the results of research on the impact of machine learning on the construction of a voice-controlled interface. Two models were analyzed: a feedforward neural network with a single hidden layer and a more complex convolutional neural network. The two models are also compared in terms of quality and the course of training.


machine learning; neural network; speech recognition

Published: 2019-12-30

Basiakowski, J. (2019). Applying of machine learning in the construction of a voice-controlled interface on the example of a music player. Journal of Computer Sciences Institute, 13, 302-309.

Jakub Basiakowski
Lublin University of Technology, Poland