Essays and my paper feed.

Series · Ongoing
Paper Feed
Monthly notes on under-appreciated research. Latest issue: June 2026.
Essay · May 19, 2026
Attackers are not burning their best jailbreaks (yet)
We might be overestimating current model safeguards, because attackers may be strategically withholding their best jailbreaks until more capable models are released.
Essay · April 10, 2026
Finding Widespread Cheating on Popular Agent Benchmarks
We find over 1,000 instances of cheating across 28+ submissions on 9 benchmarks, including the top 3 Terminal-Bench 2 agents.
Essay · March 18, 2026
Introducing OpenConjecture, a living dataset of mathematics conjectures from the ArXiv
We are releasing OpenConjecture, a dataset of (currently) 890 unproved conjectures from recent arXiv math papers. On a small subset, GPT-5.4 finds candidate proofs or counterexamples, and formalizes several in Lean.