I am a machine learning researcher and engineer who designs scalable, open-ended learning systems for adaptive agents. My long-term vision is to build autonomous agents that are reliable collaborators and enablers of human creativity.

I recently completed my PhD in computer science at the University of Southern California, where I was advised by Stefanos Nikolaidis and Fei Sha and specialized in meta-reinforcement learning and autocurriculum learning. I completed research internships at Google DeepMind, where I studied effective LLM sampling strategies for producing RL post-training data, and at Google Brain, where I worked on semi-supervised skill learning. Before USC, I studied adversarial machine learning under the guidance of Junfeng Yang at Columbia University.

Experience

Google DeepMind (Gemini RL & Code)

💼 Research Scientist Intern (Summer 2025)

Designed agentic LLM sampling methods that improved pass@k on Humanity's Last Exam, with immediate applications to RL (RLVR) post-training and inference-time scaling.

University of Southern California (Viterbi School of Engineering)

🎓 Ph.D. in Computer Science (2020-2025)

Defended my thesis, "Open-Ended Training of Adaptive Agents," which examines the intersection of online meta-reinforcement learning agents and autocurriculum methods. Its central claim is that the relationship between a learner and its task distribution should be treated as fundamental rather than incidental.

Google Research (Brain Team)

💼 Student Researcher (2022)

Columbia University (Fu Foundation School of Engineering)

🎓 B.S. in Computer Science - Intelligent Systems (2020)

Bard College at Simon's Rock (Early College)

🎓 A.A. (2017) and B.A. in Computer Science (2020)


Preprints

Scale-Resistant Learning Objectives Produce Emergent Internal Autocurricula [Paper]

R Costales, S Nikolaidis

In preparation (Dec. 2025)

Establishes a novel connection between scale-resistant actor-critic meta-RL learners and autocurriculum methods (namely, their similar effect on the loss distribution across level difficulty during training) and demonstrates that the two interventions, though related, are synergistic.

Selected Publications

Enabling Adaptive Agent Training in Open-Ended Simulators by Targeting Diversity [Site] [Paper] [Code]

R Costales, S Nikolaidis

Neural Information Processing Systems (NeurIPS) 2024

Introduces DIVA, an evolutionary approach that uses quality diversity (QD) optimization to generate diverse tasks for training adaptive agents in open-ended simulators.

ALMA: Hierarchical Learning for Composite Multi-Agent Tasks [Paper] [Code]

S Iqbal, R Costales, F Sha

Neural Information Processing Systems (NeurIPS) 2022

Presents ALMA, a general end-to-end learning method for hierarchically structured multi-agent tasks that produces sophisticated coordination behavior and outperforms competitive MARL baselines.

Possibility Before Utility: Learning And Using Hierarchical Affordances [Paper] [Code]

R Costales, S Iqbal, F Sha

🏅 (Spotlight) Int. Conf. on Learning Representations (ICLR) 2022

Introduces HAL, a hierarchical reinforcement learning (HRL) approach that learns a high-level world model of subtask affordances and uses it to explore and learn more efficiently by pruning impossible (i.e., unafforded) subtasks.