I recently completed my PhD in computer science at the University of Southern California, where I was advised by Stefanos Nikolaidis and Fei Sha and specialized in meta-reinforcement learning and autocurriculum learning. I completed research internships at Google DeepMind, studying effective LLM sampling strategies for producing RL post-training data, and at Google Brain, working on semi-supervised skill learning. Before USC, I studied adversarial machine learning under the guidance of Junfeng Yang at Columbia University.
Experience
Google DeepMind (Gemini RL & Code)
💼 Research Scientist Intern (Summer 2025)
Designed agentic LLM sampling methods which improved pass@k on Humanity's Last Exam, with immediate applications for RL (RLVR) post-training and inference-time scaling.
University of Southern California (Viterbi School of Engineering)
🎓 Ph.D. in Computer Science (2020-2025)
Defended my thesis, "Open-Ended Training of Adaptive Agents", which examines the intersection of online meta-reinforcement learning agents and autocurriculum methods. Its central perspective is that the relationship between a learner and its task distribution should be treated as fundamental rather than incidental.
Google Research (Brain Team)
💼 Student Researcher (2022)
Columbia University (Fu Foundation School of Engineering)
🎓 B.S. in Computer Science - Intelligent Systems (2020)
Bard College at Simon's Rock (Early College)
🎓 A.A. (2017) and B.A. in Computer Science (2020)
Preprints
Scale-Resistant Learning Objectives Produce Emergent Internal Autocurricula [Paper]
Establishes a novel connection between scale-resistant actor-critic meta-RL learners and autocurriculum methods (namely, their similar effect on the loss distribution across level difficulty during training) and demonstrates that the two interventions, though related, are synergistic.
Selected publications
Enabling Adaptive Agent Training in Open-Ended Simulators by Targeting Diversity [Site] [Paper] [Code]
Introduces DIVA, an evolutionary approach which uses quality diversity (QD) optimization for generating diverse tasks to train adaptive agents in open-ended simulators.
ALMA: Hierarchical Learning for Composite Multi-Agent Tasks [Paper] [Code]
Presents ALMA, a general end-to-end learning method for hierarchically structured multi-agent tasks, resulting in sophisticated coordination behavior and outperforming competitive MARL baselines.
Possibility Before Utility: Learning And Using Hierarchical Affordances [Paper] [Code]
Introduces HAL, a hierarchical reinforcement learning (HRL) approach that learns a high-level world model of subtask affordances and uses it to explore and learn more efficiently by pruning impossible (i.e., unafforded) subtasks.