Publications
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models.
ACL 2025
arXiv Preprint ↗
AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence.
ICML 2025
arXiv Preprint ↗
Reinforce++: An Efficient RLHF Algorithm with Robustness to Both Prompt and Reward.
Under review
arXiv Preprint ↗
Rethinking KL Regularization in RLHF: From Value Estimation to Gradient Optimization.
Under review
arXiv Preprint ↗
LeetCodeDataset: A Temporal Dataset for Robust Evaluation and Efficient Training of Code LLMs.
Under review
arXiv Preprint ↗
OpenRLHF: A Ray-based Easy-to-use, Scalable and High-performance RLHF Framework.
Under review
arXiv Preprint ↗
Projects
OpenCoder-LLM
Project page ↗
- An open and reproducible code LLM family that matches the performance of top-tier code LLMs.
- Reproducible training data and detailed training protocols.
- Core developer: data pipeline, model evaluation.
INF-o1: π₀
Project page ↗
- Initiate the journey to the infinity of LLM reasoning.
- Provide a solid starting point for developing a robust policy for subsequent RL.
- Designed a data pipeline for NLP2SQL: synthesis, filtering, augmentation.
INF-RL-Coder-32B
Hugging Face ↗
- Ranked #1 among open-source models on the BIRD single-model leaderboard.
- Explored rejection sampling fine-tuning based on in-context distillation.
- Used reinforcement learning to improve the selection of appropriate CoT paths.
LeetCodeDataset
GitHub ↗
- Dataset of algorithmic problems for training and evaluating large language models.
- Comprises LeetCode problems in Python.
- Core developer: data pipeline.
OpenRLHF
GitHub ↗
- High-performance RLHF framework built on Ray, DeepSpeed, and HF Transformers.
- Active participant in framework design and community discussions.
- Contributions: Reference Offload, GRPO, KL estimator.