Samy Jelassi
Verified email at fas.harvard.edu
Title
Cited by
Year
Vision transformers provably learn spatial structure
S Jelassi, M Sander, Y Li
Advances in Neural Information Processing Systems 35, 37822-37836, 2022
Cited by: 90, Year: 2022
A momentumized, adaptive, dual averaged gradient method
A Defazio, S Jelassi
Journal of Machine Learning Research 23 (144), 1-34, 2022
Cited by: 86*, Year: 2022
Global convergence of neuron birth-death dynamics
G Rotskoff, S Jelassi, J Bruna, E Vanden-Eijnden
International Conference on Machine Learning, 2019
Cited by: 66*, Year: 2019
A mean-field analysis of two-player zero-sum games
C Domingo-Enrich, S Jelassi, A Mensch, G Rotskoff, J Bruna
Advances in Neural Information Processing Systems 33, 20215-20226, 2020
Cited by: 60, Year: 2020
A permutation-equivariant neural network architecture for auction design
J Rahme, S Jelassi, J Bruna, SM Weinberg
Proceedings of the AAAI Conference on Artificial Intelligence 35 (6), 5664-5672, 2021
Cited by: 56, Year: 2021
Auction learning as a two-player game
J Rahme, S Jelassi, SM Weinberg
arXiv preprint arXiv:2006.05684, 2020
Cited by: 50, Year: 2020
Towards understanding how momentum improves generalization in deep learning
S Jelassi, Y Li
International Conference on Machine Learning, 9965-10040, 2022
Cited by: 43, Year: 2022
Repeat after me: Transformers are better than state space models at copying
S Jelassi, D Brandfonbrener, SM Kakade, E Malach
arXiv preprint arXiv:2402.01032, 2024
Cited by: 41, Year: 2024
Length generalization in arithmetic transformers
S Jelassi, S d'Ascoli, C Domingo-Enrich, Y Wu, Y Li, F Charton
arXiv preprint arXiv:2306.15400, 2023
Cited by: 33, Year: 2023
Smoothed analysis of the low-rank approach for smooth semidefinite programs
T Pumir, S Jelassi, N Boumal
Advances in Neural Information Processing Systems 31, 2018
Cited by: 29, Year: 2018
Towards closing the gap between the theory and practice of SVRG
O Sebbouh, N Gazagnadou, S Jelassi, F Bach, R Gower
Advances in Neural Information Processing Systems 32, 2019
Cited by: 21, Year: 2019
Dissecting adaptive methods in GANs
S Jelassi, D Dobre, A Mensch, Y Li, G Gidel
arXiv preprint arXiv:2210.04319, 2022
Cited by: 19*, Year: 2022
Depth separation beyond radial functions
L Venturi, S Jelassi, T Ozuch, J Bruna
Journal of Machine Learning Research 23 (122), 1-56, 2022
Cited by: 18, Year: 2022
Extra-gradient with player sampling for faster convergence in n-player games
S Jelassi, C Domingo-Enrich, D Scieur, A Mensch, J Bruna
International Conference on Machine Learning, 4736-4745, 2020
Cited by: 14*, Year: 2020
Depth Dependence of μP Learning Rates in ReLU MLPs
S Jelassi, B Hanin, Z Ji, SJ Reddi, S Bhojanapalli, S Kumar
arXiv preprint arXiv:2305.07810, 2023
Cited by: 6, Year: 2023
Universal length generalization with Turing programs
K Hou, D Brandfonbrener, S Kakade, S Jelassi, E Malach
arXiv preprint arXiv:2407.03310, 2024
Cited by: 4, Year: 2024
Q-Probe: A Lightweight Approach to Reward Maximization for Language Models
K Li, S Jelassi, H Zhang, S Kakade, M Wattenberg, D Brandfonbrener
arXiv preprint arXiv:2402.14688, 2024
Cited by: 3, Year: 2024
Mixture of Parrots: Experts improve memorization more than reasoning
S Jelassi, C Mohri, D Brandfonbrener, A Gu, N Vyas, N Anand, ...
arXiv preprint arXiv:2410.19034, 2024
Cited by: 1, Year: 2024
LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks
A Prabhakar, Y Li, K Narasimhan, S Kakade, E Malach, S Jelassi
arXiv preprint arXiv:2410.13025, 2024
Cited by: 1, Year: 2024
Collective Model Intelligence Requires Compatible Specialization
J Pari, S Jelassi, P Agrawal
arXiv preprint arXiv:2411.02207, 2024
Year: 2024