COMPARISON OF OPTIMIZATION ALGORITHMS OF A TEMPORAL CLASSIFIER FOR A SPEECH RECOGNITION SYSTEM

Yedilkhan Amirgaliyev

amir_ed@mail.ru
1 Institute of Information and Computational Technologies CS MES RK, Almaty, Kazakhstan; 2 Suleyman Demirel University, Almaty, Kazakhstan
http://orcid.org/0000-0002-6528-0619

Kuanyshbay Kuanyshbay


1 Institute of Information and Computational Technologies CS MES RK, Almaty, Kazakhstan; 2 Suleyman Demirel University, Almaty, Kazakhstan
http://orcid.org/0000-0001-5952-8609

Aisultan Shoiynbek


1 Institute of Information and Computational Technologies CS MES RK, Almaty, Kazakhstan; 2 Suleyman Demirel University, Almaty, Kazakhstan
http://orcid.org/0000-0002-9328-8300

Abstract

This article evaluates and compares the performance of three well-known optimization algorithms (Adagrad, Adam, Momentum) used to accelerate the training of the neural network in the CTC (Connectionist Temporal Classification) algorithm for speech recognition. For the CTC algorithm, a recurrent neural network was used, specifically an LSTM, which is an effective and widely used model. The data were taken from the VCTK corpus of the University of Edinburgh. The results of the optimization algorithms were evaluated using the label error and CTC loss metrics.
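
The training setup described in the abstract can be sketched as follows. This is a minimal, hypothetical PyTorch illustration, not the authors' implementation: a bidirectional LSTM acoustic model trained with CTC loss, with the three optimizers compared in the article (Adagrad, Adam, Momentum). Layer sizes, learning rates, the number of output classes and the train_step helper are assumptions made for illustration only; feature extraction from the VCTK recordings and the label error computation are omitted.

import torch
import torch.nn as nn

class LstmCtcModel(nn.Module):
    """Bidirectional LSTM over acoustic feature frames, projected to
    character classes (including the CTC blank symbol)."""
    def __init__(self, n_features=13, n_hidden=128, n_classes=29):
        super().__init__()
        self.lstm = nn.LSTM(n_features, n_hidden, batch_first=True,
                            bidirectional=True)
        self.fc = nn.Linear(2 * n_hidden, n_classes)

    def forward(self, x):                       # x: (batch, time, n_features)
        out, _ = self.lstm(x)
        return self.fc(out).log_softmax(-1)     # (batch, time, n_classes)

model = LstmCtcModel()
ctc_loss = nn.CTCLoss(blank=0)

# The three optimizers compared in the article; in a real comparison each one
# would train its own fresh copy of the model. Hyperparameters are illustrative.
optimizers = {
    "Adagrad":  torch.optim.Adagrad(model.parameters(), lr=1e-2),
    "Adam":     torch.optim.Adam(model.parameters(), lr=1e-3),
    "Momentum": torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9),
}

def train_step(optimizer, features, targets, input_lengths, target_lengths):
    # One CTC training step; returns the CTC loss value for monitoring.
    optimizer.zero_grad()
    log_probs = model(features).transpose(0, 1)  # CTCLoss expects (time, batch, classes)
    loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
    loss.backward()
    optimizer.step()
    return loss.item()

In such a comparison, each optimizer's training curve would then be judged by the label error and CTC loss values, as in the article.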


Keywords:

recurrent neural network, search methods, acoustics, system modeling language



Published
2019-09-26

Citation

Amirgaliyev, Y., Kuanyshbay, K., & Shoiynbek, A. (2019). Comparison of optimization algorithms of a temporal classifier for a speech recognition system. Informatyka, Automatyka, Pomiary w Gospodarce i Ochronie Środowiska, 9(3), 54–57. https://doi.org/10.35784/iapgos.234

