👨‍🎓 About Me
I am a third-year Ph.D. student at Peking University, advised by Prof. Xuejun Yang and Prof. Wenjing Yang. I received my B.S. degree from China University of Geosciences in 2023. Prior to that, I served for two years in the People’s Liberation Army.
My primary research interest is Foundation Models for Multimodal Learning. I am also interested in Causal Inference and Reinforcement Learning. My overarching goal is to build reliable and generalizable multimodal intelligence by developing principled methods that integrate vision, language, and structured reasoning under real-world conditions.
Currently, I am working on Efficient Pretraining and Fine-tuning of Multimodal Large Language Models.
I am actively seeking research discussions and collaboration opportunities, so feel free to contact me! 😄
📝 Publications
* Equal Contribution, † Corresponding Author, ‡ Project Leader, # Core Contributor
- MorphoBench: A Benchmark with Difficulty Adaptive to Model Reasoning
Xukai Wang*, Xuanbo Liu*, Mingrui Chen*, Haitian Zhong*, Xuanlin Yang*, Bohan Zeng*, Jinbo Hu*, Hao Liang, Junbo Niu, Xuchen Li, Ruitao Wu, Ruichuan An, Yang Shi, Liu Liu, Xu-Yao Zhang, Qiang Liu, Zhouchen Lin, Wentao Zhang†, Bin Dong†
- AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration
Xinlong Chen, Yue Ding, Weihong Lin, Jingyun Hua, Linli Yao, Yang Shi, Bozhou Li, Yuanxing Zhang, Qiang Liu†, Pengfei Wan, Liang Wang, Tieniu Tan
- RealUnify: Do Unified Models Truly Benefit from Unification? A Comprehensive Benchmark
Yang Shi#, Yuhao Dong#‡, Yue Ding#, Yuran Wang#, Xuanyu Zhu#, Sheng Zhou#, Wenting Liu#, Haochen Tian#, Rundong Wang#, Huanqian Wang, Zuyan Liu, Bohan Zeng, Ruizhe Chen, Qixun Wang, Zhuoran Zhang, Xinlong Chen, Chengzhuo Tong, Bozhou Li, Chaoyou Fu, Qiang Liu, Haotian Wang†, Wenjing Yang, Yuanxing Zhang†, Pengfei Wan, Yi-Fan Zhang†, Ziwei Liu†
- OpenGPT-4o-Image: A Comprehensive Dataset for Advanced Image Generation and Editing
Zhihong Chen*, Xuehai Bai*, Yang Shi*, Chaoyou Fu, Huanyu Zhang, Haotian Wang, Xiaoyan Sun, Zhang Zhang, Liang Wang, Yuanxing Zhang†, Pengfei Wan, Yi-Fan Zhang†‡
- BaseReward: A Strong Baseline for Multimodal Reward Model
Yi-Fan Zhang*, Haihua Yang*‡, Huanyu Zhang, Yang Shi, Zezhou Chen, Haochen Tian, Chaoyou Fu†, Kai Wu, Bo Cui, Xu Wang, Jianfei Pan, Haotian Wang, Zhang Zhang†, Liang Wang
- VersaVid-R1: A Versatile Video Understanding and Reasoning Model from Question Answering to Captioning Tasks
Xinlong Chen, Yuanxing Zhang, Yushuo Guan, Bohan Zeng, Yang Shi, Sihan Yang, Pengfei Wan, Qiang Liu†, Liang Wang, Tieniu Tan
- MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios [NeurIPS 2025]
Yang Shi#, Huanqian Wang#, Wulin Xie#, Huanyao Zhang#, Lijie Zhao#, YiFan Zhang#†, Xinfeng Li, Chaoyou Fu, Zhuoer Wen, Wenting Liu, Zhuoran Zhang, Xinlong Chen, Bohan Zeng, Sihan Yang, Yuanxing Zhang‡, Pengfei Wan, Haotian Wang†, Wenjing Yang†
- Mavors: Multi-granularity Video Representation for Multimodal Large Language Model [ACM MM 2025]
Yang Shi*, Jiaheng Liu*, Yushuo Guan*, Zhenhua Wu, Yuanxing Zhang†, Zihao Wang, Weihong Lin, Jingyun Hua, Zekun Wang, Xinlong Chen, Bohan Zeng, Wentao Zhang, Fuzheng Zhang, Wenjing Yang, Di Zhang
- MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models
Wulin Xie*, Yi-Fan Zhang*‡, Chaoyou Fu, Yang Shi, Bingyan Nie, Hongkai Chen, Zhang Zhang, Liang Wang, Tieniu Tan
- MM-RLHF: The Next Step Forward in Multimodal LLM Alignment [ICML 2025]
Yi-Fan Zhang‡, Tao Yu, Haochen Tian, Chaoyou Fu†, Peiyan Li, Jianshu Zeng, Wulin Xie, Yang Shi, Huanyu Zhang, Junkang Wu, Xue Wang, Yibo Hu, Bin Wen†, Fan Yang, Zhang Zhang†, Tingting Gao, Di Zhang, Liang Wang, Rong Jin, Tieniu Tan
- EmbodiedEval: Evaluate Multimodal LLMs as Embodied Agents
Zhili Cheng‡, Yuge Tu#, Ran Li#, Shiqi Dai#, Jinyi Hu#‡, Shengding Hu, Jiahao Li, Yang Shi, Tianyu Yu, Weize Chen, Lei Shi, Maosong Sun†
- Debiasing Multimodal Large Language Models via Penalization of Language Priors [ACM MM 2025]
YiFan Zhang*, Yang Shi*, Weichen Yu, Qingsong Wen†, Xue Wang, Wenjing Yang, Zhang Zhang, Liang Wang, Rong Jin
👨‍💻 Work Experience
- Research Intern at Kling AI, Kuaishou Technology, 2025.02 - Present
- Research Intern at THUNLP, Tsinghua University, 2023.11 - 2025.02
📚 Education
- Ph.D. School of Computer Science, Peking University, 2023 - Present
- B.S. School of Computer Science, China University of Geosciences, 2019 - 2023
🌟 Honors & Awards
- Ruiming Alumni Scholarship, 1‰, 2021
- China National Scholarship, 0.2%, 2020
