Fangxun Shu

Email: shufangxun@gmail.com

I am currently working at Alibaba, collaborating with Prof. Si Liu, Prof. Hongsheng Li, and Prof. Cihang Xie. I had fun research experiences at ByteDance AI Lab (2020) and Megvii (2019). I obtained my master's degree from Shanghai Jiao Tong University (2021) and my bachelor's degree from Nanjing University of Posts and Telecommunications (2017).

My research primarily focuses on efficient multimodal large language models (MLLMs). I explore architectures such as Mixture of Experts (MoE) that improve model scalability and efficiency, and training paradigms such as knowledge distillation that maintain performance while reducing computational cost. Ultimately, I aim to build powerful yet resource-efficient models that push the boundaries of multimodal intelligence.

News

  • Dec. 2024 - 2 papers accepted to AAAI'25. (MARS, HSA-DPO)
  • May 2024 - 1 paper accepted to TMM'24. (MAC)

Experiences

  • Research Intern
    Dec. 2019 - Mar. 2020

Publications

Full Publications: Google Scholar

  • LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation

    Fangxun Shu, Yue Liao, Le Zhuo, Chenning Xu, Lei Zhang, Guanghao Zhang, Haonan Shi, Long Chen, Tao Zhong, Wanggui He, Siming Fu, Haoyuan Li, Si Liu, Hongsheng Li, Hao Jiang
    International Conference on Learning Representations (ICLR), 2025

    arXiv / code
  • MAC: Masked Contrastive Pre-Training for Efficient Video-Text Retrieval

    Fangxun Shu, Biaolong Chen, Yue Liao, Jinqiao Wang, Si Liu
    IEEE Transactions on Multimedia (TMM), 2024

    camera-ready
  • Audio-Visual LLM for Video Understanding

    Fangxun Shu, Lei Zhang, Hao Jiang, Cihang Xie
    Tech report, 2023

    arXiv
  • MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis

    Wanggui He*, Siming Fu*, Mushui Liu*, Xierui Wang*, Wenyi Xiao*, Fangxun Shu*, Yi Wang, Lei Zhang, Zhelun Yu, Haoyuan Li, Ziwei Huang, Leilei Gan, Hao Jiang
    AAAI Conference on Artificial Intelligence (AAAI), 2025

    arXiv
  • Filter & Align: Leveraging Human Knowledge to Curate Image-Text Data

    Lei Zhang*, Fangxun Shu*, Tianyang Liu, Sucheng Ren, Hao Jiang, Cihang Xie
    Tech report, 2024

    arXiv
  • Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback

    Wenyi Xiao, Ziwei Huang, Leilei Gan, Wanggui He, Haoyuan Li, Zhelun Yu, Fangxun Shu, Hao Jiang, Linchao Zhu
    AAAI Conference on Artificial Intelligence (AAAI), 2025

    arXiv
  • Streaming Video Question-Answering with In-context Video KV-Cache Retrieval

    Shangzhe Di, Zhelun Yu, Guanghao Zhang, Haoyuan Li, Zhong Tao, Hao Cheng, Bolin Li, Wanggui He, Fangxun Shu, Hao Jiang
    International Conference on Learning Representations (ICLR), 2025

    arXiv

Interests

Multimodal Large Language Models, including: (1) efficient architecture design and training paradigms; (2) effective alignment and reasoning.

Services

Reviewer for CVPR, ICLR, NeurIPS, and ICML.