Thomas Jiralerspong

I am a PhD student in computer science co-supervised by Yoshua Bengio and Guillaume Lajoie at Mila and Université de Montréal.

I am also currently an AI research fellow at Anthropic, working on mechanistic interpretability of emergently misaligned models.

I have a wide array of interests, most of which I have been fortunate enough to work on:

LLMs/VLMs:
- LLM agents (internship at Occam AI)
- LLM internals and mechanistic interpretability (paper on compositionality in LLMs)
- LLM applications (paper on causal graph discovery with LLMs)
Neuroscience/cognitive science/psychology-inspired AI:
- Input-driven learning (paper on bias-only learning)
Modularity:
- Compositionality (papers on a complexity-based theory of compositionality and on compositionality in LLMs)
- Discrete representations (paper on discovering discrete subgoals for RL)
Model-based AI:
- Causality (papers on causal graph discovery with LLMs and causal imputation)
- Model-based reinforcement learning (paper on temporally extended tree-search planning)
AI for good:
- AI for healthcare (paper on RL for mechanical ventilation)
- AI for climate change (paper on RL for HVAC control)
- AI for autonomous driving (internship at Waabi)
- AI for drug discovery

In my free time, I enjoy traveling, watching/analyzing good movies, and writing sad songs (check out my (very) amateur music here)!

Reach out at thomas.jiralerspong@mila.quebec if there is anything you want to discuss. I’m always happy to talk!