I am currently working at ByteDance, collaborating with Prof. Si Liu, Prof. Hongsheng Li, and Prof. Cihang Xie. My research focuses on efficient multimodal large language models (MLLMs), exploring architectures such as Mixture of Experts (MoE) to improve model scalability and efficiency, as well as training paradigms such as knowledge distillation to optimize performance while reducing computational cost. Ultimately, I aim to build powerful yet resource-efficient models that push the boundaries of multimodal intelligence.
Experience
Publications
Full Publications: Google Scholar
-
LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation
Fangxun Shu, Yue Liao, Le Zhuo, Chenning Xu, Lei Zhang, Guanghao Zhang, Haonan Shi,
Long Chen, Tao Zhong, Wanggui He, Siming Fu, Haoyuan Li, Si Liu, Hongsheng Li, Hao Jiang
arXiv / code
International Conference on Learning Representations (ICLR), 2025
-
MAC: Masked Contrastive Pre-Training for Efficient Video-Text Retrieval
Fangxun Shu, Biaolong Chen, Yue Liao, Jinqiao Wang, Si Liu
camera-ready
IEEE Transactions on Multimedia (TMM), 2024
-
Audio-Visual LLM for Video Understanding
Fangxun Shu, Lei Zhang, Hao Jiang, Cihang Xie
arXiv
ICCV What is Next in Multimodal Foundation Models Workshop, 2025
-
Filter & Align: Leveraging Human Knowledge to Curate Image-Text Data
Lei Zhang*, Fangxun Shu*, Tianyang Liu, Sucheng Ren, Hao Jiang, Cihang Xie
arXiv
Tech report, 2024
-
MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis
Wanggui He*, Siming Fu*, Mushui Liu*, Xierui Wang*, Wenyi Xiao*, Fangxun Shu*, Yi Wang,
Lei Zhang, Zhelun Yu, Haoyuan Li, Ziwei Huang, LeiLei Gan, Hao Jiang
arXiv
AAAI Conference on Artificial Intelligence (AAAI), 2025
Interests
Multimodal Large Language Models, including: (1) efficient architecture design and training paradigms; (2) effective alignment and reasoning.
Services
Reviewer for CVPR, ICLR, NeurIPS, and ICML.