A ROBUST ENSEMBLE MODEL FOR SPOKEN LANGUAGE RECOGNITION

Nancy  WOODS; Gideon  BABATUNDE

doi:10.23743/acs-2020-21

A ROBUST ENSEMBLE MODEL FOR SPOKEN LANGUAGE RECOGNITION

Nancy WOODS

chyn.woods@gmail.com
University of Ibadan, Faculty of Science, Department of Computer Science, Oyo State Ibadan (Nigeria)

Gideon BABATUNDE

* University of Ibadan, Faculty of Science, Department of Computer Science, Oyo State Ibadan (Nigeria)

DOI: https://doi.org/10.23743/acs-2020-21

Abstract

Effective decision-making in industry conditions requires access and proper presentation of manufacturing data on the realised manufacturing process. Although the frequently applied ERP systems allow for recording economic events, their potential for decision support is limited. The article presents an original system for reporting manufacturing data based on Business Intelligence technology as a support for junior and middle management. As an example a possibility of utilising data from ERP systems to support decision-making in the field of purchases and logistics in small and medium enterprises.

Keywords:

Spoken Language Recognition, Computer Vision, Image Recognition, CNN

References

Abdel-Hamid, O., Mohamed, A. R., Jiang, H., Deng, L., Penn, G., & Yu, D. (2014). Convolutional neural networks for speech recognition. IEEE Transactions on Audio, Speech and Language Processing, 22(10), 1533–1545. https://doi.org/10.1109/taslp.2014.2339736
DOI: https://doi.org/10.1109/TASLP.2014.2339736 Google Scholar

Adami, A., & Hermansky, H. (2003). Segmentation of speech for speaker and language recognition. EUROSPEECH-2003 (pp. 841–844). Geneva. Retrieved from https://www.academia.edu/32317887/Segmentation_of_speech_for_speaker_and_language_recognition
DOI: https://doi.org/10.21437/Eurospeech.2003-189 Google Scholar

Amodei, D., Anubhai, R., Battenberg, E., Case, C., Casper, J., Catanzaro, B., ... Narang, S. (2015). Deep Speech 2: End-to-End Speech Recognition in English and Mandarin. CoRR, abs/1512.02595. Retrieved from https://arxiv.org/abs/1512.02595v1
Google Scholar

Ashby, M., & Maidment, J. (2005). Introducing phonetic science. Cambridge University Press.
DOI: https://doi.org/10.1017/CBO9780511808852 Google Scholar

Bartz, C., Herold, T., Yang, H., & Meinel, C. (2017). Language Identification Using Deep Convolutional Recurrent Neural Networks. In D. Liu, S. Xie, Y. Li, D. Zhao, & E. El-Alfy (Eds.), Neural Information Processing ICONIP 2017. Lecture Notes in Computer Science (vol. 10639). Springer. https://doi.org/10.1007/978-3-319-70136-3_93
DOI: https://doi.org/10.1007/978-3-319-70136-3_93 Google Scholar

Boussard, J., Deveau, A., & Pyron, J. (2017). Methods for Spoken Language Identification. Retrieved from http://cs229.stanford.edu/proj2017/final-reports/5239784.pdf
Google Scholar

Eberhard, D. M., Simons, G. F., & Fennig, C. D. (Eds.). (2020). Ethnologue: Languages of the World. Retrieved from http://www.ethnologue.com
Google Scholar

Kirchhoff, K. (2006). Language characteristics. In T. Schultz, & K. Kirchhoff (Eds.), Multilingual Speech Processing (pp. 5–33). Elsevier.
DOI: https://doi.org/10.1016/B978-012088501-5/50005-6 Google Scholar

Li, H., Ma, B., & Lee, K. A. (2013). Spoken Language Recognition: From Fundamentals to Practice. Proceedings of the IEEE, 101(5), 1136–1159. https://doi.org/10.1109/JPROC.2012.2237151
DOI: https://doi.org/10.1109/JPROC.2012.2237151 Google Scholar

Muthusamy, Y. K., Cole, R., & Oshika, B. (1992). The OGI multi-language telephone speech corpus. Int. Conf. Spoken Lang. Process, 895-898. Retrieved from https://pdfs.semanticscholar.org/aad7/274fdd57191e89f9df2880a50ec14581d671.pdf
DOI: https://doi.org/10.21437/ICSLP.1992-276 Google Scholar

Navratil, J. (2001). Spoken language recognition A step toward multilinguality in speech processing. IEEE Trans. Speech Audio Process, 9(6), 678–685. https://doi.org/10.1109/89.943345
DOI: https://doi.org/10.1109/89.943345 Google Scholar

Park, D. S., Chan, W., Zhang, Y., Chiu, C.-C., Zoph, B., Cubuk, E. D., & Le, Q. V. (2019). SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition. Proc. Interspeech 2019 (pp. 2613–2617). https://doi.org/10.21437/interspeech.2019-2680
DOI: https://doi.org/10.21437/Interspeech.2019-2680 Google Scholar

Ramus, F., & Mehler, J. (1999). Language identification with suprasegmental cues: A study based on speech re-synthesis. Journal of Acoustical Society of America, 105(1), 512–521. https://doi.org/10.1121/1.424522
DOI: https://doi.org/10.1121/1.424522 Google Scholar

Safitri, N. E., Zahra, A., & Adriani, M. (2016). Spoken Language Identification with Phonotactics Methods on Minangkabau, Sundanese, and Javanese Languages. Procedia Computer Science 81 (pp. 182–187). Elsevier. https://doi.org/10.1016/j.procs.2016.04.047
DOI: https://doi.org/10.1016/j.procs.2016.04.047 Google Scholar

Sugiyama, M. (1991). Automatic language recognition using acoustic features. International Conference on Acoustics, Speech, and Signal Processing (pp. 813–816). Toronto. https://doi.org/10.1109/icassp.1991.150461
DOI: https://doi.org/10.1109/ICASSP.1991.150461 Google Scholar

Torres-Carrasquillo, P., Singer, E., Kohler, M., Greene, R., Reynolds, D., & Deller, J. (2002). Approaches to language identification using Gaussian mixture models and shifted delta cepstral features. In ICSLP-2002 (pp. 89–92). Denver. https://doi.org/10.1109/icassp.2002.5743828
DOI: https://doi.org/10.21437/ICSLP.2002-74 Google Scholar

Zhao, J., Shu, H., Zhang, L., Wang, X., Gong, Q., & Li, P. (2008). Cortical competition during language discrimination. NeuroImage, 43(3), 624–633. https://doi.org/10.1016/j.neuroimage.2008.07.025
DOI: https://doi.org/10.1016/j.neuroimage.2008.07.025 Google Scholar

Zissman, M. (1996). Comparison of four approaches to automatic language identification of telephone speech. IEEE Transactions on Speech and Audio Processing, 4(1), 31–44. https://doi.org/10.1109/icassp.1993.319323
DOI: https://doi.org/10.1109/TSA.1996.481450 Google Scholar

Zissman, M. A. (1993). Automatic language identification using Gaussian mixture and hidden Markov models. IEEE International Conference on Acoustics, Speech and Signal Processing (Vol. 2, pp. 399–402). IEEE. https://doi.org/10.1109/tsa.1996.481450
DOI: https://doi.org/10.1109/ICASSP.1993.319323 Google Scholar

Download

Published

2020-09-30

Cited by

WOODS, N. ., & BABATUNDE, G. . (2020). A ROBUST ENSEMBLE MODEL FOR SPOKEN LANGUAGE RECOGNITION. Applied Computer Science, 16(3), 56–68. https://doi.org/10.23743/acs-2020-21

Authors

Nancy WOODS
chyn.woods@gmail.com
University of Ibadan, Faculty of Science, Department of Computer Science, Oyo State Ibadan Nigeria

Authors

Gideon BABATUNDE

* University of Ibadan, Faculty of Science, Department of Computer Science, Oyo State Ibadan Nigeria

Statistics

Abstract views: 216
PDF downloads: 29

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

All articles published in Applied Computer Science are open-access and distributed under the terms of the Creative Commons Attribution 4.0 International License.

A ROBUST ENSEMBLE MODEL FOR SPOKEN LANGUAGE RECOGNITION

Nancy WOODS

Gideon BABATUNDE

Abstract

Keywords:

References

Authors

Authors

Statistics

License

Most read articles by the same author(s)

Similar Articles

CURRENT ISSUE

Make a Submission

Lublin University of Technology Publishing House

Copyright