K4F-Net: Lightweight multi-view speech emotion recognition with Kronecker convolution and cross-language robustness

Paweł POWROŹNIK

p.powroznik@pollub.pl

Maria SKUBLEWSKA-PASZKOWSKA

maria.paszkowska@pollub.pl

Abstract

Speech emotion recognition has been gaining importance for years, but most existing models rely on a single signal representation or on conventional convolutional layers with a large number of parameters. In this study, we propose a compact multi-representation architecture that combines four images of the speech signal: spectrogram, MFCC features, wavelet scalogram, and fuzzy transform maps. We also apply Kronecker convolution for efficient feature extraction with an extended receptive field. A further novelty is cross-fusion, a mechanism that models interactions between branches without significantly increasing complexity. The core of the network is complemented by a transformer-based block and language-independent adversarial learning. The model is evaluated in a four-fold cross-lingual scenario covering four corpora in four languages: English, German, Polish, and Danish. It is trained on three languages and tested on the fourth, achieving a weighted accuracy of 96.3%. In addition, the influence of selected activation functions on classification quality is investigated. Ablation analysis shows that removing the Kronecker convolution reduces the efficiency by 5.6%, and removing the fuzzy transform representation reduces it by 4.7%. The results indicate that the combination of Kronecker convolution, multi-channel fusion, and adversarial learning is a promising direction for building universal, language-independent emotion recognition systems.
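
The cross-lingual protocol described in the abstract (train on three languages, test on the fourth) can be illustrated with a minimal sketch. This shows only how the train/test folds are formed; it is not the authors' code, and the function name `leave_one_language_out` and the list `LANGUAGES` are hypothetical names introduced for the example.

```python
from typing import Iterator, List, Tuple

# The four languages named in the abstract (hypothetical variable name).
LANGUAGES: List[str] = ["English", "German", "Polish", "Danish"]


def leave_one_language_out(
    languages: List[str],
) -> Iterator[Tuple[List[str], str]]:
    """Yield (training languages, held-out test language) for each fold."""
    for held_out in languages:
        train = [lang for lang in languages if lang != held_out]
        yield train, held_out


# Four languages give four folds, each with a different test language.
folds = list(leave_one_language_out(LANGUAGES))
for train, test in folds:
    print(f"train on {', '.join(train)}; test on {test}")
```

Each fold keeps the test language completely out of training, which is what makes the reported accuracy a measure of cross-language robustness rather than within-corpus performance.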

Keywords:

feature fusion, speech emotion recognition, Kronecker convolution, speech signal processing

Article Details

POWROŹNIK, P., & SKUBLEWSKA-PASZKOWSKA, M. (2025). K4F-Net: Lightweight multi-view speech emotion recognition with Kronecker convolution and cross-language robustness. Applied Computer Science, 21(4), 110–126. https://doi.org/10.35784/acs_8130