Do Language Models Know When to Change Their Mind?
Working technical report. Code at github.com/jbarnes850/metacognition.
Given a capable model, how should you spend test-time compute? More training, more samples, or smarter selection?
Every frontier model I tested correctly identified the ground-truth threat when it acted. That looked promising, until I gave eight of them actual containment tools and watched them act on 45-97.5% of episodes they should have left alone.
Part 1 of a series on practical post-training pipelines for deployed agents.
This is a working note on research in progress. If you're working on adaptive evaluation, continual learning, or tool-use agents, reach out at jbarnes850@gmail.com or on Twitter.
This is a working note on how I think about world models: what they are, how to train them, and how they sit alongside agents. It’s written for a technical audience, and many of the ideas borrow from human learning.
My first attempt at building a distributed learning system wasn’t for a tech company. It was for a network of food banks.
My daughter was born in November of 2023. At the time, I was a new dad asking AI every question I could think of. I even recorded her cries, desperately prompting AI: “Tell me what this means—help me!” (Welcome to parenting in the age of AI.)