A ROBUST ENSEMBLE MODEL FOR SPOKEN LANGUAGE RECOGNITION

Nancy WOODS

chyn.woods@gmail.com
University of Ibadan, Faculty of Science, Department of Computer Science, Oyo State Ibadan (Nigeria)

Gideon BABATUNDE


University of Ibadan, Faculty of Science, Department of Computer Science, Oyo State Ibadan (Nigeria)

Abstract

Effective decision-making in industrial conditions requires access to, and proper presentation of, data on the realised manufacturing process. Although the frequently applied ERP systems allow economic events to be recorded, their potential for decision support is limited. The article presents an original system for reporting manufacturing data, based on Business Intelligence technology, as a support for junior and middle management. As an example, it shows how data from ERP systems can be used to support decision-making in the field of purchases and logistics in small and medium enterprises.


Keywords:

Spoken Language Recognition, Computer Vision, Image Recognition, CNN


Published
2020-09-30

WOODS, N., & BABATUNDE, G. (2020). A ROBUST ENSEMBLE MODEL FOR SPOKEN LANGUAGE RECOGNITION. Applied Computer Science, 16(3), 56–68. https://doi.org/10.23743/acs-2020-21


License

This work is licensed under a Creative Commons Attribution 4.0 International License.


