I am currently working at Alibaba, collaborating with Prof. Si Liu, Prof. Hongsheng Li, and Prof. Cihang Xie. I had enjoyable research experiences at ByteDance AI Lab (2020) and Megvii (2019). I obtained my master's degree from Shanghai Jiao Tong University (2021) and my bachelor's degree from Nanjing University of Posts and Telecommunications (2017).
My research primarily focuses on efficient multimodal large language models (MLLMs), exploring innovative architectures such as Mixture of Experts (MoE) to enhance model scalability and efficiency, as well as advanced training paradigms such as knowledge distillation to optimize performance while reducing computational costs. Ultimately, I aim to create powerful yet resource-efficient models that push the boundaries of multimodal intelligence.
Publications
Full Publications: Google Scholar
- LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation
Fangxun Shu, Yue Liao, Le Zhuo, Chenning Xu, Lei Zhang, Guanghao Zhang, Haonan Shi,
Long Chen, Tao Zhong, Wanggui He, Siming Fu, Haoyuan Li, Si Liu, Hongsheng Li, Hao Jiang
arXiv / code
International Conference on Learning Representations (ICLR), 2025
- MAC: Masked Contrastive Pre-Training for Efficient Video-Text Retrieval
Fangxun Shu, Biaolong Chen, Yue Liao, Jinqiao Wang, Si Liu
camera-ready
IEEE Transactions on Multimedia (TMM), 2024
- Audio-Visual LLM for Video Understanding
Fangxun Shu, Lei Zhang, Hao Jiang, Cihang Xie
arXiv
Tech report, 2023
- MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis
Wanggui He*, Siming Fu*, Mushui Liu*, Xierui Wang*, Wenyi Xiao*, Fangxun Shu*, Yi Wang,
Lei Zhang, Zhelun Yu, Haoyuan Li, Ziwei Huang, LeiLei Gan, Hao Jiang
arXiv
AAAI Conference on Artificial Intelligence (AAAI), 2025
- Filter & Align: Leveraging Human Knowledge to Curate Image-Text Data
Lei Zhang*, Fangxun Shu*, Tianyang Liu, Sucheng Ren, Hao Jiang, Cihang Xie
arXiv
Tech report, 2024
- Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback
Wenyi Xiao, Ziwei Huang, Leilei Gan, Wanggui He, Haoyuan Li, Zhelun Yu, Fangxun Shu,
Hao Jiang, Linchao Zhu
arXiv
AAAI Conference on Artificial Intelligence (AAAI), 2025
- Streaming Video Question-Answering with In-context Video KV-Cache Retrieval
Shangzhe Di, Zhelun Yu, Guanghao Zhang, Haoyuan Li, Zhong Tao, Hao Cheng, Bolin Li,
Wanggui He, Fangxun Shu, Hao Jiang
arXiv
International Conference on Learning Representations (ICLR), 2025
Interests
Multimodal Large Language Models, including: (1) efficient architecture design and training paradigms; (2) effective alignment and reasoning.
Services
Reviewer for CVPR, ICLR, NeurIPS, and ICML.