Han Zhong
Verified email at stu.pku.edu.cn
Title
Cited by
Year
Can Reinforcement Learning Find Stackelberg-Nash Equilibria in General-Sum Markov Games with Myopically Rational Followers?
H Zhong, Z Yang, Z Wang, MI Jordan
Journal of Machine Learning Research 24 (35), 1-52, 2023
Cited by 17*, 2023
Pessimistic minimax value iteration: Provably efficient equilibrium learning from offline datasets
H Zhong, W Xiong, J Tan, L Wang, T Zhang, Z Wang, Z Yang
International Conference on Machine Learning, 27117-27142, 2022
Cited by 14, 2022
Nearly Minimax Optimal Offline Reinforcement Learning with Linear Function Approximation: Single-Agent MDP and Markov Game
W Xiong, H Zhong, C Shi, C Shen, L Wang, T Zhang
arXiv preprint arXiv:2205.15512, 2022
Cited by 10, 2022
Optimistic Policy Optimization is Provably Efficient in Non-stationary MDPs
H Zhong, Z Yang, Z Wang, C Szepesvári
arXiv preprint arXiv:2110.08984, 2021
Cited by 7, 2021
A posterior sampling framework for interactive decision making
H Zhong, W Xiong, S Zheng, L Wang, Z Wang, Z Yang, T Zhang
arXiv preprint arXiv:2211.01962, 2022
Cited by 4, 2022
A Self-Play Posterior Sampling Algorithm for Zero-Sum Markov Games
W Xiong, H Zhong, C Shi, C Shen, T Zhang
International Conference on Machine Learning, 24496-24523, 2022
Cited by 4, 2022
Why robust generalization in deep learning is difficult: Perspective of expressive power
B Li, J Jin, H Zhong, J Hopcroft, L Wang
Advances in Neural Information Processing Systems 35, 4370-4384, 2022
Cited by 3, 2022
Nearly optimal policy optimization with stable at any time guarantee
T Wu, Y Yang, H Zhong, L Wang, S Du, J Jiao
International Conference on Machine Learning, 24243-24265, 2022
Cited by 3, 2022
A Reduction-Based Framework for Conservative Bandits and Reinforcement Learning
Y Yang, T Wu, H Zhong, E Garcelon, M Pirotta, A Lazaric, L Wang, SS Du
International Conference on Learning Representations, 2021/9/29, 2021
Cited by 3*, 2021
Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy
H Zhong, EX Fang, Z Yang, Z Wang
arXiv preprint arXiv:2012.14098, 2020
Cited by 3, 2020
Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation
X Chen, H Zhong, Z Yang, Z Wang, L Wang
International Conference on Machine Learning, 3773-3793, 2022
Cited by 1, 2022
Provably Efficient Exploration in Quantum Reinforcement Learning with Logarithmic Worst-Case Regret
H Zhong, J Hu, Y Xue, T Li, L Wang
arXiv preprint arXiv:2302.10796, 2023
2023
A Reduction-based Framework for Sequential Decision Making with Delayed Feedback
Y Yang, H Zhong, T Wu, B Liu, L Wang, SS Du
arXiv preprint arXiv:2302.01477, 2023
2023
Provable Sim-to-real Transfer in Continuous Domain with Partial Observations
J Hu, H Zhong, C Jin, L Wang
arXiv preprint arXiv:2210.15598, 2022
2022
Breaking the Moments Condition Barrier: No-Regret Algorithm for Bandits with Super Heavy-Tailed Payoffs
H Zhong, J Huang, L Yang, L Wang
Advances in Neural Information Processing Systems 34, 2021
2021