Han Zhong
Verified email at stu.pku.edu.cn - Homepage
Title · Cited by · Year
Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint
W Xiong, H Dong, C Ye, Z Wang, H Zhong, H Ji, N Jiang, T Zhang
Forty-first International Conference on Machine Learning, 2024
Cited by 53* · 2024
GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP, and Beyond
H Zhong, W Xiong, S Zheng, L Wang, Z Wang, Z Yang, T Zhang
arXiv preprint arXiv:2211.01962, 2022
Cited by 53* · 2022
Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation
X Chen, H Zhong, Z Yang, Z Wang, L Wang
International Conference on Machine Learning, 3773-3793, 2022
Cited by 48 · 2022
Can Reinforcement Learning Find Stackelberg-Nash Equilibria in General-Sum Markov Games with Myopically Rational Followers?
H Zhong, Z Yang, Z Wang, MI Jordan
Journal of Machine Learning Research 24 (35), 1-52, 2023
Cited by 44* · 2023
Nearly Minimax Optimal Offline Reinforcement Learning with Linear Function Approximation: Single-Agent MDP and Markov Game
W Xiong, H Zhong, C Shi, C Shen, L Wang, T Zhang
arXiv preprint arXiv:2205.15512, 2022
Cited by 44 · 2022
Pessimistic minimax value iteration: Provably efficient equilibrium learning from offline datasets
H Zhong, W Xiong, J Tan, L Wang, T Zhang, Z Wang, Z Yang
International Conference on Machine Learning, 27117-27142, 2022
Cited by 43 · 2022
Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration
Z Liu, M Lu, W Xiong, H Zhong, H Hu, S Zhang, S Zheng, Z Yang, Z Wang
Thirty-seventh Conference on Neural Information Processing Systems, 2023
Cited by 27* · 2023
A Self-Play Posterior Sampling Algorithm for Zero-Sum Markov Games
W Xiong, H Zhong, C Shi, C Shen, T Zhang
International Conference on Machine Learning, 24496-24523, 2022
Cited by 25 · 2022
A Theoretical Analysis of Optimistic Proximal Policy Optimization in Linear Markov Decision Processes
H Zhong, T Zhang
Advances in Neural Information Processing Systems 36, 2024
Cited by 24 · 2024
Why Robust Generalization in Deep Learning Is Difficult: Perspective of Expressive Power
B Li, J Jin, H Zhong, J Hopcroft, L Wang
Advances in Neural Information Processing Systems 35, 4370-4384, 2022
Cited by 22 · 2022
Double Pessimism Is Provably Efficient for Distributionally Robust Offline Reinforcement Learning: Generic Algorithm and Robust Partial Coverage
J Blanchet, M Lu, T Zhang, H Zhong
Advances in Neural Information Processing Systems 36, 2024
Cited by 19 · 2024
Optimistic Policy Optimization is Provably Efficient in Non-stationary MDPs
H Zhong, Z Yang, Z Wang, C Szepesvári
arXiv preprint arXiv:2110.08984, 2021
Cited by 18 · 2021
Nearly Optimal Policy Optimization with Stable at Any Time Guarantee
T Wu, Y Yang, H Zhong, L Wang, S Du, J Jiao
International Conference on Machine Learning, 24243-24265, 2022
Cited by 15 · 2022
DPO Meets PPO: Reinforced Token Optimization for RLHF
H Zhong, G Feng, W Xiong, L Zhao, D He, J Bian, L Wang
arXiv preprint arXiv:2404.18922, 2024
Cited by 14 · 2024
Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment
R Yang, X Pan, F Luo, S Qiu, H Zhong, D Yu, J Chen
arXiv preprint arXiv:2402.10207, 2024
Cited by 12 · 2024
Towards Robust Offline Reinforcement Learning under Diverse Data Corruption
R Yang, H Zhong, J Xu, A Zhang, C Zhang, L Han, T Zhang
arXiv preprint arXiv:2310.12955, 2023
Cited by 8 · 2023
Tackling Heavy-Tailed Rewards in Reinforcement Learning with Function Approximation: Minimax Optimal and Instance-Dependent Regret Bounds
J Huang, H Zhong, L Wang, L Yang
Advances in Neural Information Processing Systems 36, 2024
Cited by 6 · 2024
Provable Sim-to-Real Transfer in Continuous Domain with Partial Observations
J Hu, H Zhong, C Jin, L Wang
arXiv preprint arXiv:2210.15598, 2022
Cited by 6 · 2022
Breaking the Moments Condition Barrier: No-Regret Algorithm for Bandits with Super Heavy-Tailed Payoffs
H Zhong, J Huang, L Yang, L Wang
Advances in Neural Information Processing Systems 34, 2021
Cited by 6 · 2021
A Reduction-Based Framework for Conservative Bandits and Reinforcement Learning
Y Yang, T Wu, H Zhong, E Garcelon, M Pirotta, A Lazaric, L Wang, SS Du
International Conference on Learning Representations, 2021
Cited by 6* · 2021
Articles 1–20