I am a first-year PhD student at the University of Pennsylvania advised by Eric Wong and Hamed Hassani, with research interests in AI safety, security, and the science of deep learning. I'm also a research scientist in the National Security Directorate at Pacific Northwest National Lab.

I've most recently been thinking about a) evals and b) AI control and security. See below for my current working papers, and please do reach out if you'd like to chat.

I keep a paper feed of research I've found interesting and that I think deserves more attention.

In my free time, I like to run, ski, and (re-)read Robert Caro's biographies.

Recent Papers
2025
Adaptively evaluating models with task elicitation
Davis Brown, Prithvi Balehannina, Helen Jin, Shreya Havaldar, Hamed Hassani, Eric Wong
[Figure: task elicitation visualization]
Language models have a 'jagged frontier' of capabilities and behaviors, performing exceptionally well on some tasks while remaining brittle on others. We map this frontier by adaptively probing the model under evaluation with new tasks, a procedure we term task elicitation, and generate new hard tasks in truthfulness, forecasting, social harms, and more.
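A minimal sketch of an adaptive evaluation loop of this kind, under my own assumptions about the setup rather than the paper's exact algorithm; `respond`, `propose_tasks`, and `grade` are hypothetical stand-ins the caller supplies:

```python
# Hypothetical interfaces (not from the paper):
#   respond(task) -> the target model's answer
#   propose_tasks(failures) -> new tasks conditioned on past failures
#   grade(task, answer) -> True if the answer passes
def adaptive_eval(respond, propose_tasks, grade, rounds: int = 5) -> list:
    failures = []  # tasks the target model has gotten wrong so far
    for _ in range(rounds):
        # Steer task generation toward observed weaknesses, mapping out
        # the jagged frontier one round at a time.
        for task in propose_tasks(failures):
            if not grade(task, respond(task)):
                failures.append(task)
    return failures
```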
2025
Machine Learning meets Algebraic Combinatorics: A Suite of Datasets Capturing Research-level Conjecturing Ability in Pure Mathematics
{Herman Chau, Helen Jenne, Davis Brown}, Jesse He, Mark Raugas, Sara C. Billey, Henry Kvinge
[Figure: semistandard Young tableaux]
Datasets that evaluate the ability of models to generate new conjectures in research-level algebraic combinatorics. Many of the problems we select are currently open and pose strong challenges both to frontier models and to mathematicians using models as tools. For each problem, we provide a large amount of associated data and train narrow models to serve both as baselines and as objects of study for interpretability.
2024
How Does LLM Compression Affect Weight Exfiltration Attacks?
Davis Brown, JP Rivera, Mantas Mazeika
[Figure: model exfiltration diagram; higher decompression costs enable smaller exfiltrated weights]
Models can be compressed far more than standard practice (e.g., 4 bits per parameter) suggests, if one is willing to do a bit of additional training to 'decompress' them afterwards. This increases the risk of model weight exfiltration. We find some early evidence that larger models are easier and cheaper to compress in this way.
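A simplified stand-in for the compression step, under my own assumptions (uniform quantization; the paper's schemes and the recovery-training phase are more involved):

```python
import numpy as np

def quantize_uniform(w: np.ndarray, bits: int = 2) -> np.ndarray:
    """Quantize a weight tensor to 2**bits evenly spaced levels."""
    levels = 2 ** bits - 1
    lo, hi = float(w.min()), float(w.max())
    scale = max(hi - lo, 1e-12)                # avoid dividing by zero
    codes = np.round((w - lo) / scale * levels)  # small integer codes to exfiltrate
    return lo + codes / levels * scale           # dequantized weights

# On the receiving end, a short finetuning run on the dequantized model
# can recover much of the lost performance; that recovery step is the
# 'decompression' that makes aggressive compression an exfiltration risk.
```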
EMNLP '23
Understanding the Inner Workings of Language Models Through Representation Dissimilarity
Davis Brown, Charles Godfrey, Nicholas Konz, Jonathan Tu, Henry Kvinge
We apply model-diffing methods to compare the hidden layers of different language models, and show that these dissimilarity measures can identify and localize generalization properties of models that are invisible from test-set performance alone.
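For illustration, one standard measure in this family is linear CKA (Kornblith et al., 2019); the paper's analyses need not use this exact measure, so treat this as a hedged example of the layer-comparison setup:

```python
import numpy as np

def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two activation matrices of shape (n_samples, dim)."""
    x = x - x.mean(axis=0)  # center each feature
    y = y - y.mean(axis=0)
    num = np.linalg.norm(y.T @ x, "fro") ** 2
    den = np.linalg.norm(x.T @ x, "fro") * np.linalg.norm(y.T @ y, "fro")
    return float(num / den)

# Dissimilarity between corresponding layers of two models:
# 1.0 - linear_cka(acts_model_a, acts_model_b)
```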
HiLD ICML '23
On Privileged and Convergent Bases in Neural Network Representations
{Davis Brown, Nikhil Vyas}, Yamini Bansal
The neuron basis of neural networks 'matters,' in that having one is necessary for good performance. However, it does not matter that much, in that the particular basis is inconsistent across training runs.
NeurIPS '22
On the Symmetries of Deep Learning Models and their Internal Representations
{Charles Godfrey, Davis Brown}, Tegan Emerson, Henry Kvinge
We characterize the symmetries that common activation functions induce in deep learning models. We then run model-stitching experiments, 'gluing' together the hidden layers of different models by learning permutations between their respective neurons; this works surprisingly well.
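A minimal sketch of permutation stitching, under my own simplifying assumption that a usable permutation can be found by matching neurons on activation correlations (the paper learns its permutations; correlation matching is just one cheap way to obtain one). `a_bottom` and `b_top` are hypothetical callables for the two model halves:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_neurons(acts_a: np.ndarray, acts_b: np.ndarray) -> np.ndarray:
    """Return indices that reorder model A's neurons into model B's ordering.

    acts_a, acts_b: (n_samples, n_neurons) hidden activations collected
    from the two models on the same inputs.
    """
    a = (acts_a - acts_a.mean(0)) / (acts_a.std(0) + 1e-8)
    b = (acts_b - acts_b.mean(0)) / (acts_b.std(0) + 1e-8)
    corr = a.T @ b / len(a)                  # neuron-by-neuron correlations
    row, col = linear_sum_assignment(-corr)  # maximize total matched correlation
    perm = np.empty_like(col)
    perm[col] = row                          # perm[j] = A-neuron matched to B-neuron j
    return perm

# Stitched forward pass: run model A's bottom half, permute its neurons
# into B's basis, and finish with model B's top half.
def stitched_forward(x, a_bottom, b_top, perm):
    return b_top(a_bottom(x)[:, perm])
```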