Pengrui Lu

lupengrui@sjtu.edu.cn

Shanghai, China

I am a senior undergraduate student in Computer Science at Shanghai Jiao Tong University (SJTU), where I am a member of the ACM Honors Class — an elite program for the top 5% of students with a spirit of innovation and challenge.

I am currently a research intern at SII and GAIR Lab, advised by Prof. Pengfei Liu. I was a research intern at UC Merced, advised by Prof. Ming-Hsuan Yang, from August to December 2025.

My research focuses on LLM/AI evaluation and benchmarking, including building robust evaluation frameworks for large language models, multimodal systems, and AI coding agents.

news

Feb 08, 2026	New preprint: ProjDevBench: Benchmarking AI Coding Agents on End-to-End Project Development is now available on arXiv!
Feb 08, 2026	Our paper ProjDevBench was featured on 量子位 (QbitAI), a leading Chinese tech media outlet!
Jul 21, 2025	New preprint: ResearcherBench: Evaluating Deep AI Research Systems on the Frontiers of Scientific Inquiry is now available on arXiv!

selected publications

(* denotes equal contribution)

arXiv

ProjDevBench: Benchmarking AI Coding Agents on End-to-End Project Development

Pengrui Lu*, Shiqi Zhang*, Yunzhong Hou*, and 8 more authors

arXiv preprint arXiv:2602.01655, 2026

@article{lu2026projdevbench,
  title = {ProjDevBench: Benchmarking AI Coding Agents on End-to-End Project Development},
  author = {Lu, Pengrui and Zhang, Shiqi and Hou, Yunzhong and Ye, Lyumanshan and Huang, Chaoyi and Chen, Zixi and Zeng, Ji and Jiang, Hantao and Liu, Pengfei and Wang, Yiwei and Yang, Ming-Hsuan},
  year = {2026},
  journal = {arXiv preprint arXiv:2602.01655},
}

arXiv

ResearcherBench: Evaluating Deep AI Research Systems on the Frontiers of Scientific Inquiry

Tianze Xu*, Pengrui Lu*, Lyumanshan Ye, and 2 more authors

arXiv preprint arXiv:2507.16280, 2025

The first benchmark focused on evaluating the capabilities of Deep AI Research Systems (DARS) on frontier AI scientific questions, featuring 65 expertly curated research questions across 35 distinct AI research subjects with a dual assessment framework.

@article{lu2025researcherbench,
  title = {ResearcherBench: Evaluating Deep AI Research Systems on the Frontiers of Scientific Inquiry},
  author = {Xu, Tianze and Lu, Pengrui and Ye, Lyumanshan and Hu, Xiangkun and Liu, Pengfei},
  year = {2025},
  journal = {arXiv preprint arXiv:2507.16280},
}

arXiv

DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments

Yuxiang Zheng, Dayuan Fu, Xiangkun Hu, and 4 more authors

arXiv preprint arXiv:2504.03160, 2025

The first comprehensive framework for end-to-end training of LLM-based deep research agents through scaling reinforcement learning in real-world environments with authentic web search interactions.

@article{zheng2025deepresearcher,
  title = {DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments},
  author = {Zheng, Yuxiang and Fu, Dayuan and Hu, Xiangkun and Cai, Xiaojie and Ye, Lyumanshan and Lu, Pengrui and Liu, Pengfei},
  year = {2025},
  journal = {arXiv preprint arXiv:2504.03160},
}