Writing

February 2, 2026 AI Engineering

Where should test-time compute go? Surprisal-guided selection in verifiable environments

Given a capable model, how should you spend test-time compute? More training, more samples, or smarter selection?

January 23, 2026 AI Security Research

Frontier Security Agents Don't Lack Detection. They Lack Restraint.

LLMs achieve 94% precision on alert classification. That number looked promising until I gave four frontier models actual containment tools and watched them act on 82.5% of episodes they should have left alone.


January 15, 2026 AI Engineering

When Sampling Beats Training: Multi-Turn RL's Cost-Benefit Problem

Part 1 of a series on practical post-training pipelines for deployed agents.


January 13, 2026 AI Research

Rethinking Evaluation for Agents That Never Stop Learning

This is a working note on research in progress. If you’re working on adaptive evaluation, continual learning, or tool-use agents, reach out at jbarnes850@gmail.com or on Twitter.


November 19, 2025 AI Engineering

Building a World Model of Consequence

This is a working note on how I think about world models: what they are, how to train them, and how they sit alongside agents. It’s written for a technical audience, and many of the ideas borrow from human learning.


July 16, 2025 Distributed Systems

My Agents Keep Failing. Yours Will Too.

My first attempt at building a distributed learning system wasn’t for a tech company. It was for a network of food banks.


April 30, 2025 Technology

Everything is Changing...Again

My daughter was born in November of 2023. At the time, I was a new Dad asking AI every question I could think of. I even recorded her cries, desperately prompting AI: “Tell me what this means—help me!” (welcome to parenting in the age of AI)
