Blog | Davis Brown

Series · Ongoing

Paper Feed

Monthly notes on under-appreciated research. Latest issue: July 2026.

Essay · May 19, 2026

Attackers are not burning their best jailbreaks (yet)

We might be overestimating current model safeguards, because attackers may strategically withhold their best jailbreaks until more capable models are released.

Essay · April 10, 2026

Finding Widespread Cheating on Popular Agent Benchmarks

We find over 1,000 instances of cheating across 28+ submissions on 9 benchmarks, including the top 3 Terminal-Bench 2 agents.

Essay · March 18, 2026

Introducing OpenConjecture, a living dataset of mathematics conjectures from the arXiv

We are releasing OpenConjecture, a dataset of (currently) 890 unproved conjectures from recent arXiv math papers. On a small subset, GPT-5.4 finds candidate proofs or counterexamples, and formalizes several in Lean.