It would be nice to have a BOTEC estimate of how much more expensive synchronous monitoring for misalignment would be. See also this Redwood Research post.
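As a starting point, here is a minimal sketch of what such a BOTEC might look like. The sync-vs-async framing and every number below are placeholder assumptions of mine, not figures from the post; the point is just the structure of the calculation: asynchronous monitoring pays only for the monitor's inference, while synchronous monitoring additionally pays for reserved agent capacity idling during the monitor's latency.

```python
# Minimal BOTEC sketch: cost of synchronous vs. asynchronous monitoring.
# All numbers are made-up placeholder assumptions, not figures from any post.

agent_cost_per_action = 0.01      # $ of agent inference per action (assumed)
monitor_cost_per_action = 0.002   # $ of monitor inference per action (assumed)

# Asynchronous: the monitor runs off the critical path, so monitoring adds
# only the monitor's own inference cost.
async_cost = agent_cost_per_action + monitor_cost_per_action

# Synchronous: each action blocks on the monitor, so capacity reserved for
# the trajectory sits idle during the monitor's latency.
monitor_latency_s = 2.0           # seconds per monitor call (assumed)
idle_cost_per_s = 0.003           # $/s of idle reserved capacity (assumed)
sync_cost = async_cost + monitor_latency_s * idle_cost_per_s

premium = sync_cost / async_cost - 1
print(f"async ${async_cost:.4f}/action, sync ${sync_cost:.4f}/action, "
      f"premium {premium:.0%}")  # -> premium 50% under these assumptions
```

Under these (arbitrary) numbers the synchronous premium is 50%; the real answer hinges on the monitor's latency and on how expensive idle reserved capacity actually is.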
I really like these kinds of red-blue team games, have run one before, and am in the process of running a couple of others. However, my guess is that the use of humans in red-blue team games will soon be less important, as humans become less relevant to actually coming up with and implementing ideas in control or security. (Although it's possible that larger-scale competitions stay relevant with good mechanism design.) To do: how might the mechanism design of these games differ when the players are language model agents rather than humans? (Note: please contact me if you are interested in this!)
Multiple hypotheses are given. I think it would be nice to test this one: "The AI might have learned to stop before running out of context or compaction in training because compaction is bad for task completion." Some anecdotal evidence is provided here: "For Opus 4.5, I've seen many cases where when given a big task it stops right before running out of context." My personal guess, though, is that it is due to the "AI being unreliable in decision making combined with selection effects."
Some evidence that RL priors matter more than priors picked up from pretraining, although this is only a single setting (more or better data, or data introduced at different points of training, might change things).
Given the arguments in the post, I am a bit surprised the authors think on-episode reward seekers are only 55% more likely than beyond-episode reward seekers. One important factor keeping this credence low is that on-episode and beyond-episode reward seekers have very similar motivations (namely, they enjoy a good reward).
Rehashes a standard line of argument: serial bottlenecks (ML experiments, human feedback) start to bind very hard, and the extent to which agents can overcome these bottlenecks (by predicting the results of ML experiments or simulating human feedback) is important. Of course, agents might also be able to design better experiments.
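The serial-bottleneck part of this argument is essentially Amdahl's law: if a fraction s of research time is irreducibly serial (wall-clock ML experiments, human feedback loops), then no matter how much agents speed up everything else, the overall speedup is capped at 1/s. A toy illustration; the values of s and k below are placeholders of mine, not numbers from the post:

```python
# Toy Amdahl's-law illustration of the serial-bottleneck argument.
# s: fraction of research time that remains serial (assumed values below).
# k: speedup agents achieve on the parallelizable remainder.

def overall_speedup(s: float, k: float) -> float:
    """Amdahl's law: overall speedup when a fraction s cannot be sped up."""
    return 1.0 / (s + (1.0 - s) / k)

for s in (0.5, 0.2, 0.05):
    print(f"serial fraction {s:.0%}: 10x elsewhere -> "
          f"{overall_speedup(s, 10):.1f}x overall; cap as k->inf: {1/s:.0f}x")
```

Predicting experiment results or simulating human feedback matters precisely because it shrinks s, raising the cap itself rather than just speeding up the parallel part.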
Ord came out with an analysis arguing the contrary: that improvements have largely come from spending more on inference. If true, this would have significant (and, I think, largely good) implications for the near-term impact of AIs. However, it seems likely to be incorrect: Ord's analysis is much too sensitive to rare / expensive long tasks, and anchored too heavily on older / more expensive reasoning models.