Yuxuan Luo

E-mail: 2401112141 (at) stu (dot) pku (dot) edu (dot) cn

I am Yuxuan Luo (罗宇轩), a second-year Ph.D. student at the Wangxuan Institute of Computer Technology (WICT), Peking University, advised by Prof. Zhouhui Lian. I received my Bachelor’s degree in Artificial Intelligence from Yuanpei College, Peking University.

My research focuses on understanding and generating knowledge-intensive visual media. I study how images carry factual, disciplinary, and cultural information:

how to interpret dense knowledge images,
how to make generative models convey that knowledge precisely, and
how to measure the faithfulness and clarity of generated visuals.

This work connects unified multimodal understanding and generation with model reasoning, and with the broader question of whether generative systems function as world models.

Technically, I work with multimodal LLMs (mLLMs) and with diffusion and autoregressive image models, focusing on post-training (fine-tuning, LoRA, and instruction tuning) and on evaluation pipelines. I am actively seeking collaborations. If you’re interested in working with me on knowledge images, please contact me by e-mail.

news

Sep 19, 2025	MMMG was accepted to NeurIPS 2025! MMMG is a large-scale discipline-image benchmark designed to assess whether text-to-image (T2I) models can generate faithful and readable visuals. Our cases span 10 disciplines and 6 educational levels.
Jun 25, 2025	CalliReader: Contextualizing Chinese Calligraphy via an Embedding-Aligned Vision-Language Model was accepted to ICCV 2025! This paper proposes CalliReader, a novel VLM that addresses Chinese calligraphy contextualization. We also release the first page-level calligraphy dataset and CalliBench.
Apr 06, 2024	CalliRewrite: Recovering Handwriting Behaviors from Calligraphy Images without Supervision has been selected as a finalist for the IEEE ICRA 2024 Best Paper Award in Service Robotics! This research was conducted during my undergraduate studies under the guidance of Prof. Zhouhui Lian.

selected publications

NeurIPS 2025

MMMG: A Massive, Multidisciplinary, Multi-Tier Generation Benchmark for Text-to-Image Reasoning

Yuxuan Luo, Yuhui Yuan, Junwen Chen, and 6 more authors

arXiv preprint arXiv:2506.10963, 2025

PDF Code Website

ICCV 2025

CalliReader: Contextualizing Chinese Calligraphy via an Embedding-Aligned Vision-Language Model

Yuxuan Luo, Jiaqi Tang, Chenyi Huang, and 2 more authors

arXiv preprint arXiv:2503.06472, 2025

PDF Code

ICRA 2024 Best Paper Candidate

CalliRewrite: Recovering Handwriting Behaviors from Calligraphy Images without Supervision

Yuxuan Luo, Zekun Wu, and Zhouhui Lian

In 2024 IEEE International Conference on Robotics and Automation (ICRA), 2024

PDF Code Website