Maximize to explore: One objective function fusing estimation, planning, and exploration Z Liu, M Lu, W Xiong, H Zhong, H Hu, S Zhang, S Zheng, Z Yang, Z Wang Advances in Neural Information Processing Systems 36, 2024 | 17* | 2024 |
Welfare maximization in competitive equilibrium: Reinforcement learning for Markov exchange economy Z Liu, M Lu, Z Wang, M Jordan, Z Yang International Conference on Machine Learning, 13870-13911, 2022 | 17 | 2022 |
Learning from demonstration: Provably efficient adversarial policy imitation with linear function approximation Z Liu, Y Zhang, Z Fu, Z Yang, Z Wang International Conference on Machine Learning, 14094-14138, 2022 | 14* | 2022 |
Reason for future, act for now: A principled architecture for autonomous LLM agents Z Liu, H Hu, S Zhang, H Guo, S Ke, B Liu, Z Wang | 10* | 2023 |
Guarded policy optimization with imperfect online demonstrations Z Xue, Z Peng, Q Li, Z Liu, B Zhou arXiv preprint arXiv:2303.01728, 2023 | 3 | 2023 |
Can Large Language Models Play Games? A Case Study of A Self-Play Approach H Guo, Z Liu, Y Zhang, Z Wang arXiv preprint arXiv:2403.05632, 2024 | 1 | 2024 |
A Principled Framework for Knowledge-enhanced Large Language Model S Wang, Z Liu, Z Wang, J Guo arXiv preprint arXiv:2311.11135, 2023 | 1 | 2023 |
Sample-Efficient Multi-Agent RL: An Optimization Perspective N Xiong, Z Liu, Z Wang, Z Yang arXiv preprint arXiv:2310.06243, 2023 | 1 | 2023 |
How Can LLM Guide RL? A Value-Based Approach S Zhang, S Zheng, S Ke, Z Liu, W Jin, J Yuan, Y Yang, H Yang, Z Wang arXiv preprint arXiv:2402.16181, 2024 | | 2024 |