Thomas William Anthony
Google DeepMind
Verified email at google.com
Title
Cited by
Year
Thinking fast and slow with deep learning and tree search
TW Anthony, Z Tian, D Barber
Advances in Neural Information Processing Systems, 5360-5370, 2017
Cited by: 396; Year: 2017
OpenSpiel: A framework for reinforcement learning in games
M Lanctot, E Lockhart, JB Lespiau, V Zambaldi, S Upadhyay, J Pérolat, ...
arXiv preprint arXiv:1908.09453, 2019
Cited by: 255; Year: 2019
Mastering the game of Stratego with model-free multiagent reinforcement learning
J Perolat, B De Vylder, D Hennes, E Tarassov, F Strub, V de Boer, ...
Science 378 (6623), 990-996, 2022
Cited by: 156; Year: 2022
From Poincaré recurrence to convergence in imperfect information games: Finding equilibrium via regularization
J Perolat, R Munos, JB Lespiau, S Omidshafiei, M Rowland, P Ortega, ...
International Conference on Machine Learning, 8525-8535, 2021
Cited by: 80; Year: 2021
On the role of planning in model-based deep reinforcement learning
JB Hamrick, AL Friesen, F Behbahani, A Guez, F Viola, S Witherspoon, ...
arXiv preprint arXiv:2011.04021, 2020
Cited by: 72; Year: 2020
Learning to Play No-Press Diplomacy with Best Response Policy Iteration
T Anthony, T Eccles, A Tacchetti, J Kramár, I Gemp, TC Hudson, N Porcel, ...
arXiv preprint arXiv:2006.04635, 2020
Cited by: 51; Year: 2020
Policy Gradient Search: Online Planning and Expert Iteration without Search Trees
TW Anthony, R Nishihara, P Moritz, T Salimans, J Schulman
arXiv preprint arXiv:1904.03646, 2019
Cited by: 31; Year: 2019
Learning to Resolve Alliance Dilemmas in Many-Player Zero-Sum Games
E Hughes, TW Anthony, T Eccles, JZ Leibo, D Balduzzi, Y Bachrach
arXiv preprint arXiv:2003.00799, 2020
Cited by: 24; Year: 2020
OpenSpiel: A Framework for Reinforcement Learning in Games. CoRR abs/1908.09453 (2019)
M Lanctot, E Lockhart, JB Lespiau, V Zambaldi, S Upadhyay, J Pérolat, ...
arXiv preprint cs.LG/1908.09453, 2019
Cited by: 23; Year: 2019
Sample-based Approximation of Nash in Large Many-Player Games via Gradient Descent
I Gemp, R Savani, M Lanctot, Y Bachrach, T Anthony, R Everett, ...
arXiv preprint arXiv:2106.01285, 2021
Cited by: 18; Year: 2021
Smooth markets: A basic mechanism for organizing gradient-based learners
D Balduzzi, WM Czarnecki, TW Anthony, IM Gemp, E Hughes, JZ Leibo, ...
arXiv preprint arXiv:2001.04678, 2020
Cited by: 16; Year: 2020
Iterative Empirical Game Solving via Single Policy Best Response
MO Smith, T Anthony, MP Wellman
Cited by: 16*
Learning to play against any mixture of opponents
MO Smith, T Anthony, MP Wellman
Frontiers in Artificial Intelligence 6, 2023
Cited by: 13; Year: 2023
Turbocharging solution concepts: Solving NEs, CEs and CCEs with neural equilibrium solvers
L Marris, I Gemp, T Anthony, A Tacchetti, S Liu, K Tuyls
Advances in Neural Information Processing Systems 35, 5586-5600, 2022
Cited by: 12; Year: 2022
Expert iteration
TW Anthony
UCL (University College London), 2021
Cited by: 6; Year: 2021
Heterogeneous Social Value Orientation Leads to Meaningful Diversity in Sequential Social Dilemmas
U Madhushani, KR McKee, JP Agapiou, JZ Leibo, R Everett, T Anthony, ...
arXiv preprint arXiv:2305.00768, 2023
Cited by: 4; Year: 2023
Population-based Evaluation in Repeated Rock-Paper-Scissors as a Benchmark for Multiagent Reinforcement Learning
M Lanctot, J Schultz, N Burch, MO Smith, D Hennes, T Anthony, J Perolat
arXiv preprint arXiv:2303.03196, 2023
Cited by: 4; Year: 2023
Designing all-pay auctions using deep learning and multi-agent simulation
I Gemp, T Anthony, J Kramar, T Eccles, A Tacchetti, Y Bachrach
Scientific Reports 12 (1), 16937, 2022
Cited by: 4; Year: 2022
Developing, evaluating and scaling learning agents in multi-agent environments
I Gemp, T Anthony, Y Bachrach, A Bhoopchand, K Bullard, J Connor, ...
AI Communications 35 (4), 271-284, 2022
Cited by: 4; Year: 2022
Strategic Knowledge Transfer
MO Smith, T Anthony, MP Wellman
Journal of Machine Learning Research 24 (233), 1-96, 2023
Cited by: 3; Year: 2023