Hi, I'm Trenton.

I'm a Member of Technical Staff on the Alignment Science team at Anthropic. I'm currently working on enabling Claude to automatically audit models and detect misalignment.

About Me

If the world were devoid of both interesting research questions and global catastrophic risks(!), you'd find me backpacking around the world with my film camera. I still try to do this when I have time off and get the chance to travel somewhere cool.

Publications

Cross-Architecture Model Diffing with Crosscoders: Unsupervised Discovery of Differences Between LLMs
Thomas Jiralerspong, Trenton Bricken
arXiv, February 2026
Findings from a Pilot Anthropic–OpenAI Alignment Evaluation Exercise
Samuel R. Bowman, Megha Srivastava, Jon Kutasov, Rowan Wang, Trenton Bricken, Benjamin Wright, Ethan Perez, and Nicholas Carlini
Anthropic, August 2025
Building and evaluating alignment auditing agents
Trenton Bricken, Rowan Wang, Sam Bowman, Euan Ong, Johannes Treutlein, Jeff Wu, Evan Hubinger, Samuel Marks
Anthropic, July 2025
On the Biology of a Large Language Model
Jack Lindsey†, Wes Gurnee*, Emmanuel Ameisen*, Brian Chen*, Adam Pearce*, Nicholas L. Turner*, Craig Citro*, David Abrahams, Shan Carter, Basil Hosmer, Jonathan Marcus, Michael Sklar, Adly Templeton, Trenton Bricken, Callum McDougall, Hoagy Cunningham, Thomas Henighan, Adam Jermyn, Andy Jones, Andrew Persic, Zhenyi Qi, T. Ben Thompson, Sam Zimmerman, Kelley Rivoire, Thomas Conerly, Chris Olah, Joshua Batson*
Anthropic, March 2025
Circuit Tracing: Revealing Computational Graphs in Language Models
Emmanuel Ameisen*, Jack Lindsey*, Adam Pearce*, Wes Gurnee*, Nicholas L. Turner*, Brian Chen*, Craig Citro*, David Abrahams, Shan Carter, Basil Hosmer, Jonathan Marcus, Michael Sklar, Adly Templeton, Trenton Bricken, Callum McDougall, Hoagy Cunningham, Thomas Henighan, Adam Jermyn, Andy Jones, Andrew Persic, Zhenyi Qi, T. Ben Thompson, Sam Zimmerman, Kelley Rivoire, Thomas Conerly, Chris Olah, Joshua Batson*
Anthropic, March 2025
Auditing Language Models for Hidden Objectives
Samuel Marks, Johannes Treutlein, Trenton Bricken, Jack Lindsey, Jonathan Marcus, Siddharth Mishra-Sharma, Daniel Ziegler, Emmanuel Ameisen, Joshua Batson, Tim Belonax, Samuel R. Bowman, Shan Carter, Brian Chen, Hoagy Cunningham, Carson Denison, Florian Dietz, Satvik Golechha, Akbir Khan, Jan Kirchner, Jan Leike, Austin Meek, Kei Nishimura-Gasparian, Euan Ong, Christopher Olah, Adam Pearce, Fabien Roger, Jeanne Salle, Andy Shih, Meg Tong, Drake Thomas, Kelley Rivoire, Adam Jermyn, Monte MacDiarmid, Tom Henighan, Evan Hubinger
Anthropic, March 2025
Insights on Crosscoder Model Diffing
Siddharth Mishra-Sharma, Trenton Bricken, Jack Lindsey, Adam Jermyn, Jonathan Marcus, Kelley Rivoire, Christopher Olah, Thomas Henighan
Anthropic, February 2025
Stage-Wise Model Diffing
Trenton Bricken, Siddharth Mishra-Sharma, Jonathan Marcus, Adam Jermyn, Christopher Olah, Kelley Rivoire, Thomas Henighan
Anthropic, December 2024
Using Dictionary Learning Features as Classifiers
Trenton Bricken, Jonathan Marcus, Siddharth Mishra-Sharma, Meg Tong, Ethan Perez, Mrinank Sharma, Kelley Rivoire, Thomas Henighan; edited by Adam Jermyn
Anthropic, October 2024
Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet
Adly Templeton*, Tom Conerly*, Jonathan Marcus, Jack Lindsey, Trenton Bricken, Brian Chen, Adam Pearce, Craig Citro, Emmanuel Ameisen, Andy Jones, Hoagy Cunningham, Nicholas L Turner, Callum McDougall, Monte MacDiarmid, Alex Tamkin, Esin Durmus, Tristan Hume, Francesco Mosconi, C. Daniel Freeman, Theodore R. Sumers, Edward Rees, Joshua Batson, Adam Jermyn, Shan Carter, Chris Olah, Tom Henighan
Anthropic, May 2024
Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
Trenton Bricken*, Adly Templeton*, Joshua Batson*, Brian Chen*, Adam Jermyn*, Tom Conerly, Nicholas L Turner, Cem Anil, Carson Denison, Amanda Askell, Robert Lasenby, Yifan Wu, Shauna Kravec, Nicholas Schiefer, Tim Maxwell, Nicholas Joseph, Alex Tamkin, Karina Nguyen, Brayden McLean, Josiah E Burke, Tristan Hume, Shan Carter, Tom Henighan, Chris Olah
Anthropic, October 2023
Emergence of Sparse Representations from Noise
Trenton Bricken*, Rylan Schaeffer, Bruno Olshausen, Gabriel Kreiman
ICML, May 2023
Sparse Distributed Memory is a Continual Learner
Trenton Bricken*, Xander Davies, Deepak Singh, Dmitry Krotov, Gabriel Kreiman
ICLR, September 2022
Attention Approximates Sparse Distributed Memory
Trenton Bricken*, Cengiz Pehlevan
NeurIPS, December 2021
High-content screening of coronavirus genes for innate immune suppression reveals enhanced potency of SARS-CoV-2 proteins
Erika J Olson*, David M Brown*, Timothy Z Chang, Lin Ding, Tai L Ng, H. Sloane Weiss, Peter Koch, Yukiye Koide, Nathan Rollins, Pia Mach, Tobias Meisinger, Trenton Bricken, Joshua Rollins, Yun Zhang, Colin Molloy, Yun Zhang, Bridget N Queenan, Timothy Mitchison, Debora Marks, Jeffrey C Way, John I Glass, Pamela A Silver
bioRxiv, March 2021
Computationally Optimized SARS-CoV-2 MHC Class I and II Vaccine Formulations Predicted to Target Human Haplotype Distributions
Ge Liu*, Brandon Carter*, Trenton Bricken, Siddhartha Jain, Mathias Viard, Mary Carrington, David K Gifford
Cell Systems, July 2020

See Google Scholar for a full list of publications and citation record.

Talks

Podcasts

Dwarkesh Podcast (2025)

Dwarkesh Podcast (2024)

Contact

I am pretty active on Twitter. My DMs are open and you should feel free to reach out, but I can't promise I'll be good at replying! I sometimes upload my film photography to Instagram and to my portfolio.