Paper Feed: May 2026

Highlighting research I find interesting and think may deserve more attention (as of 04/21/26).

Training and Interpretability

Synthetic Data for any Differentiable Target
Tristan Thrush, Sung Min Park, Herman Brunborg, [...], Tatsunori Hashimoto (2026)
Mechanisms of Introspective Awareness
Uzay Macar, Li Yang, Atticus Wang, Peter Wallich, Emmanuel Ameisen, Jack Lindsey (2026)
Subliminal Effects in Your Data: A General Mechanism via Log-Linearity
Ishaq Aden-Ali, Noah Golowich, Allen Liu, Abhishek Shetty, Ankur Moitra, Nika Haghtalab (2026)
Disentangling MLP Neuron Weights in Vocabulary Space
Asaf Avrahamy, Yoav Gur-Arieh, Mor Geva (2026)

Misalignment and Generalization

Comment on: Current AIs seem pretty misaligned to me
Sam Marks (2026)

Some notes

See also the Greenblatt follow-up.
From personas to intentions: towards a science of motivations for AI models
David Africa, Jacob Pfau (2026)
How do LLMs generalize when we do training that is intuitively compatible with two off-distribution behaviors?
Dylan Xu, Alek Westover, Vivek Hebbar, [...], Julian Stastny (2026)
Blind Refusal: Language Models Refuse to Help Users Evade Unjust, Absurd, and Illegitimate Rules
Cameron Pattison, Lorenzo Manuali, Seth Lazar (2026)

Security, Governance, and Agents

Coding Agents Are Changing the Biosecurity Risk Landscape
Luca Righetti, Kamile Lukosiute, James Black (2026)
LinuxArena: A Control Setting for AI Agents in Live Production Software Environments
Tyler Tracy, Ram Potham, Nick Kuhn, [...], Aryan Bhatt (2026)

AI Economics and Forecasting

WAGE-Bench: Measuring the Economic Value of AI in Real Work
Kris Gulati (2026)
The least understood driver of AI progress
Anson Ho (2026)