ELSHAMY, Ghada; ALFONSE, Marco; HEGAZY, Islam; AREF, Mostafa. A multi-modal transformer-based model for generative visual dialog system. Applied Computer Science, [S. l.], v. 21, n. 1, p. 1–17, 2025. DOI: 10.35784/acs_6856. Available at: https://ph.pollub.pl/index.php/acs/article/view/6856. Accessed: 15 Jul. 2026.