I completed my Ph.D. at the University of Modena and Reggio Emilia, conducting my research at AImageLab, co-advised by Prof. Rita Cucchiara, Prof. Lorenzo Baraldi, and Prof. Marcella Cornia.

My current research focuses on cutting-edge multimodal architectures and their integration with advanced retrieval techniques. I have extensive experience with NLP tasks and foundation models such as CLIP, and I have worked with vision-and-language architectures, primarily focusing on their evaluation and on addressing the problem of hallucination. Recently, I have been working on the training and development of multimodal large language models, including LLaVA and its derivatives.

During my Ph.D., I also spent six months as a research intern at Amazon in London.


Some Important Milestones

  • 2026

    Ph.D. Dissertation

    AImageLab, University of Modena and Reggio Emilia

  • 2025

    Doctoral Consortium at CVPR 2025

  • 2025

    Amazon Research Internship

    Six-month research internship at Amazon in London

  • 2022

    Best Student Paper Award (CBMI)

    "Retrieval-augmented Transformer for Image Captioning"

  • 2022

    Started Ph.D.

    AImageLab, University of Modena and Reggio Emilia

  • 2022

    Master Thesis Award

    Premio alla Memoria Davide Rabotti (Davide Rabotti Memorial Prize)

  • 2022

    M.S. in Artificial Intelligence

    University of Modena and Reggio Emilia

All News

Paper Accepted @CVPR 2026

Publication date: 17/03/2025

🎉 Happy to share that our paper "ReAG: Reasoning-Augmented Generation for Knowledge-based Visual Question Answering" has been accepted at CVPR 2026 in Denver! 🎉

Paper Accepted @CVPR 2025

Publication date: 22/11/2025

🎉 Happy to share that our paper "Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval" has been accepted at CVPR 2025 in Nashville! 🎉

EuroHPC Extreme Scale grant

Publication date: 10/04/2025

Our project "VISTA – Versatile Intelligent Systems for Tailored and Adaptive Next-Generation Multimodal AI" was accepted for the EuroHPC Extreme Scale grant, with an allocation of almost 1M GPU hours. Read the news on the UNIMORE website.

Introducing LLaVA-MORE

Publication date: 08/03/2024

🎉 We are introducing LLaVA-MORE, a family of models that enhances LLaVA by integrating LLaMA 3.1 as the language model. Check out our GitHub repo! 🎉

Participation in National and European Projects

ELLIOT Project

2025 – Ongoing

Contributing to the European initiative for developing multimodal AI systems.

PRIN 2022: Vision-Language Reasoning

2023 – Ongoing

Participating in the Italian National Research Project on multimodal reasoning and vision-language alignment.