Selected Publications

CVPR 2025
Davide Caffagni*, Sara Sarto*, M. Cornia, L. Baraldi, R. Cucchiara · CVPR 2025
An approach that enables multimodal queries (image plus text) to search multimodal document collections, using a novel Transformer-based recurrent cell that integrates textual and visual features across layers.
ICCV Workshop 2025
Federico Cocchi*, Nicholas Moratelli*, Davide Caffagni*, Sara Sarto*, M. Cornia, L. Baraldi, R. Cucchiara · ICCV Workshop 2025
A new family of MLLMs integrating modern language models with diverse visual backbones.
IJCAI 2025
Sara Sarto, M. Cornia, R. Cucchiara · IJCAI 2025
An overview of image captioning evaluation that discusses how metrics have evolved, their limitations, the challenges posed by the longer captions produced by MLLMs, and how well metrics can adapt to them.
CVPR Workshop 2024
D. Caffagni*, F. Cocchi*, N. Moratelli*, Sara Sarto*, M. Cornia, L. Baraldi, R. Cucchiara · CVPR Workshop 2024
Integration of external document knowledge into an MLLM through hierarchical retrieval.
ACL 2024
D. Caffagni*, F. Cocchi*, L. Barsellotti*, N. Moratelli*, Sara Sarto*, L. Baraldi*, M. Cornia, L. Baraldi, R. Cucchiara · ACL Findings 2024
A comprehensive review of recent visual-based MLLMs, analyzing architectures, alignment strategies, and training techniques.
CBMI 2022
Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara · CBMI 2022
An image captioning approach that uses a kNN memory to retrieve information from an external corpus and aid the generation process.