Jason Klein Liu

Research on code large language models and RLHF. This page summarizes selected publications and projects.

Publications

OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models.

Siming Huang, Tianhao Cheng, Jason Klein Liu, Jiaran Hao, Jie Fu, Qian Liu, Zili Wang.

AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence.

Yuliang Liu, Junjie Lu, Zhaoling Chen, Chaofeng Qu, Jason Klein Liu, Chuheng Zhang, Wei Shen, Zhouhan Lin.

ICML 2025 arXiv Preprint ↗

Reinforce++: An Efficient RLHF Algorithm with Robustness to Both Prompt and Reward.

Jian Hu, Jason Klein Liu, Wei Shen.

Under review arXiv Preprint ↗

Rethinking KL Regularization in RLHF: From Value Estimation to Gradient Optimization.

KeZhao Liu, Jason Klein Liu, YiMing Liu.

Under review arXiv Preprint ↗

LeetCodeDataset: A Temporal Dataset for Robust Evaluation and Efficient Training of Code LLMs.

Yunhui Xia, Wei Shen, Yan Wang, Jason Klein Liu, Huifeng Sun, Siyue Wu, Jian Hu, Xiaolong Xu.

Under review arXiv Preprint ↗

OpenRLHF: A Ray-based Easy-to-use, Scalable and High-performance RLHF Framework.

Jian Hu, Xibin Wu, Wei Shen, Jason Klein Liu, Weixun Wang, Songlin Jiang, Haoran Wang, Hao Chen, Bin Chen, Wenkai Fang, Xianyu, Yu Cao, Haotian Xu, Yiming Liu.

Under review arXiv Preprint ↗

Projects

OpenCoder-LLM Project page ↗
  • An open and reproducible code LLM family that matches the performance of top-tier code LLMs.
  • Reproducible training data and detailed training protocols.
  • Core developer: data pipeline, model evaluation.
INF-o1: π₀ Project page ↗
  • Initiate the journey to the infinity of LLM reasoning.
  • Provide a solid starting point for developing a robust policy for subsequent RL.
  • Designed a data pipeline for NL2SQL: synthesis, filtering, augmentation.
INF-RL-Coder-32B Hugging Face ↗
  • Ranked #1 among open-source models on the BIRD single-model leaderboard.
  • Explored rejection sampling fine-tuning based on in-context distillation.
  • Used reinforcement learning to improve the selection of appropriate chain-of-thought (CoT) paths.
LeetCodeDataset GitHub ↗
  • Dataset of algorithmic problems for training and evaluating LLMs.
  • Comprises LeetCode problems in Python.
  • Core developer: data pipeline.
OpenRLHF GitHub ↗
  • High-performance RLHF framework built on Ray, DeepSpeed, and HF Transformers.
  • Participate in framework design and community discussions.
  • Contributions: Reference Offload, GRPO, KL estimator.