Paper Feed: April 2025

Highlighting research I find interesting and think may deserve more attention (as of 04/01/25) from academia, government, or the AI safety community.

Evals

Science of DL / Interp

Compute / Reasoning

General Safety

Security/Control

Miscellaneous

  • Undermining Mental Proof: How AI Can Make Cooperation Harder by Making Thinking Easier
    Zachary Wojtowicz, Simon DeDeo (2025)
    Why this is notable
    I found this framework very useful. For example, I think this kind of setup can mostly explain why (current) workshop paper submissions from AI scientists are damaging to the field. The story is something like this: the reviewing bar for workshops is quite low, so submitting a paper is mainly a signal, a kind of proof-of-work. At a minimum, a workshop paper is a proof-of-mental-work showing that the author is thinking somewhat coherently about a topic. AI scientists undermine this signal by circumventing the (not very robust) workshop review process.
  • attention is logarithmic, actually
    spike (2025)