ELSHAMY, Ghada, et al. “A Multi-Modal Transformer-Based Model for Generative Visual Dialog System”. Applied Computer Science, vol. 21, no. 1, Mar. 2025, pp. 1-17, doi:10.35784/acs_6856.