Hi, I'm Trenton.

I’m a Member of Technical Staff on the Mechanistic Interpretability team at Anthropic. A nice overview of our mission can be found here. I’m currently working on using dictionary learning to disentangle superposition in artificial neural networks.
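
As a rough illustration of what dictionary learning looks like in this setting (a minimal sketch, not the exact setup from any of the papers below; the dimensions, coefficient, and names are made up), the technique is commonly instantiated as a sparse autoencoder trained to reconstruct a model's internal activations as sparse combinations of learned dictionary features:

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal sparse-autoencoder sketch of dictionary learning on activations."""

    def __init__(self, d_activation: int = 512, d_dictionary: int = 4096):
        super().__init__()
        # Overcomplete dictionary: many more features than activation dimensions.
        self.encoder = nn.Linear(d_activation, d_dictionary)
        self.decoder = nn.Linear(d_dictionary, d_activation)

    def forward(self, activations: torch.Tensor):
        features = torch.relu(self.encoder(activations))  # sparse feature coefficients
        reconstruction = self.decoder(features)
        return reconstruction, features

def sae_loss(activations, reconstruction, features, l1_coeff: float = 1e-3):
    # Reconstruction error plus an L1 penalty that encourages sparse feature use.
    mse = (reconstruction - activations).pow(2).mean()
    return mse + l1_coeff * features.abs().mean()
```

The hope is that each learned feature corresponds to a single interpretable concept, even when the underlying neurons represent many concepts in superposition.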

Information about me:

  • I’ve paused my PhD research for the foreseeable future. I was investigating the extent to which there is convergence between machine learning and neuroscience. This work was done through the “Systems, Synthetic and Quantitative Biology” Program at Harvard in the Kreiman Lab and supported by the NSF Graduate Research Fellowship. I also spent time at the Berkeley Redwood Center for Theoretical Neuroscience as a visiting researcher.
  • I graduated from Duke University in May 2020 with a self-designed major in “Minds and Machines: Biological and Artificial Intelligence”. I was lucky to attend as a Robertson Scholar, which provided full funding for all four years, including summer experiences.
  • During my time at Duke, I spent a year (June 2018 - May 2019) doing research in Dr. Michael Lynch’s Lab, attempting to use machine learning to design new CRISPR guide RNAs for safer, more effective genome editing. Afterwards, I was affiliated with Dr. Debora Marks’s Lab at Harvard Medical School, first as a summer intern and then throughout my Senior year, including my Senior Thesis research.

I am involved in the movement/philosophy/set of ideas that is Effective Altruism. I am also a fan of prediction markets and make public forecasts on Metaculus here. If the world were devoid of both interesting research questions and global catastrophic risks (!), you’d find me backpacking around the world with my film camera. I still try to do this when I have time off and get the chance to travel somewhere cool.

Have any feedback for me? Please consider filling out this anonymous feedback form so I can learn and grow.

Publications (in reverse chronological order):

Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet
Adly Templeton*, Tom Conerly*, Jonathan Marcus, Jack Lindsey, Trenton Bricken, Brian Chen, Adam Pearce, Craig Citro, Emmanuel Ameisen, Andy Jones, Hoagy Cunningham, Nicholas L Turner, Callum McDougall, Monte MacDiarmid, Alex Tamkin, Esin Durmus, Tristan Hume, Francesco Mosconi, C. Daniel Freeman, Theodore R. Sumers, Edward Rees, Joshua Batson, Adam Jermyn, Shan Carter, Chris Olah, Tom Henighan
*(Core Contributor)
Anthropic, May 2024
[paper] [blog-post] [tweet-thread]

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
Trenton Bricken*, Adly Templeton*, Joshua Batson*, Brian Chen*, Adam Jermyn*, Tom Conerly, Nicholas L Turner, Cem Anil, Carson Denison, Amanda Askell, Robert Lasenby, Yifan Wu, Shauna Kravec, Nicholas Schiefer, Tim Maxwell, Nicholas Joseph, Alex Tamkin, Karina Nguyen, Brayden McLean, Josiah E Burke, Tristan Hume, Shan Carter, Tom Henighan, Chris Olah
*(Core Contributor)
Anthropic, October 2023
[paper] [blog-post] [tweet-thread]

Emergence of Sparse Representations from Noise
Trenton Bricken*, Rylan Schaeffer, Bruno Olshausen, Gabriel Kreiman
*(First author)
ICML, May 2023
[paper]

Sparse Distributed Memory is a Continual Learner
Trenton Bricken*, Xander Davies, Deepak Singh, Dmitry Krotov, Gabriel Kreiman
*(First author)
ICLR, September 2022
[paper] [tweet-thread]

Attention Approximates Sparse Distributed Memory
Trenton Bricken*, Cengiz Pehlevan
*(First author)
NeurIPS, December 2021
[paper] [blog-post] [tweet-thread]

MIT Center for Brains, Minds + Machines Talk:

I gave a longer talk that enabled me to cover more of SDM’s biological plausibility to the VSA Online community here.

High-content screening of coronavirus genes for innate immune suppression reveals enhanced potency of SARS-CoV-2 proteins.
Erika J Olson*, David M Brown*, Timothy Z Chang, Lin Ding, Tai L Ng, H. Sloane Weiss, Peter Koch, Yukiye Koide, Nathan Rollins, Pia Mach, Tobias Meisinger, Trenton Bricken, Joshua Rollins, Yun Zhang, Colin Molloy, Bridget N Queenan, Timothy Mitchison, Debora Marks, Jeffrey C Way, John I Glass, Pamela A Silver
*(First authors)
bioRxiv, March 2021
[preprint] [tweet-thread]

Computationally Optimized SARS-CoV-2 MHC Class I and II Vaccine Formulations Predicted to Target Human Haplotype Distributions.
Ge Liu*, Brandon Carter*, Trenton Bricken, Siddhartha Jain, Mathias Viard, Mary Carrington, David K Gifford
*(First authors)
Cell Systems, July 2020
[paper] [code] [preprint] [tweet-thread]

My Google Scholar profile can be found here.

Invited Talks:

Podcasts:

Past Projects (in reverse chronological order):

  • Upside Down Free Energy - Fall 2020 - Motivated by progress in “Upside Down” supervised reinforcement learning, I tried to connect it to Friston’s Free Energy Principle (FEP) and develop more hierarchical versions of FEP. This required first implementing benchmarks of the existing “Upside Down” RL algorithms (see the next entry). I was starting to get somewhat interesting results, but RL is really hard and I started down the rabbit hole of Sparse Distributed Memory. See the GitHub repository for a draft PDF write-up. Thanks to Beren Millidge and Alec Tschantz for their supervision and discussions about this project.

  • RewardConditionedUDRL - Fall 2020 - Open source codebase combining implementations of Reward Conditioned Policies and Training Agents using Upside-Down Reinforcement Learning. The former had no public implementation, and the latter had a few, implemented as Jupyter Notebooks, that had a number of issues I flagged, e.g. here and here. I hope this open source codebase will serve both to fully replicate the aforementioned papers and as a starting point for further research in the exciting domain of supervised RL.

  • SARS-CoV-2 mutation effects and 3D structure prediction from sequence covariation. - Summer 2020 - Collaborated with the Marks lab to help produce their SARS-CoV-2 mutation effect and 3D structure predictions using EVCouplings.
    Website: https://marks.hms.harvard.edu/sars-cov-2

  • RL Learning Byzantine Fault Tolerant (BFT) Consensus Protocols - Senior Year - Supervised by Dr. Kartik Nayak; a final class project turned research project. Investigated the ability of deep reinforcement learning agents to discover and prove BFT consensus protocols. This was a great way to learn more about reinforcement learning, but the tasks were too difficult for the agents to learn given the algorithms we were attempting to use. A write-up of the project and an uncleaned version of the codebase are available here.

  • Protein Generation and Optimization - Supervised by Dr. Debora Marks’s Lab as my Senior Thesis - This research was motivated by the promise of recent developments in our ability to predict protein functionality, and by the problem of finding novel sequences that maximize this prediction. We tried developing a new approach using invertible neural networks and variational inference to approximate the intractable distribution defined by any protein function predictor, with reason to believe it would outperform Markov Chain Monte Carlo methods. My Senior Thesis write-up of the work, including where it seemed to succeed and fail, can be found with the codebase here.

  • PyTorch Discrete Normalizing Flows - Winter Break 2019 - While learning about discrete normalizing flows from “Discrete Flows: Invertible Generative Models of Discrete Data” by Dustin Tran et al. (https://arxiv.org/pdf/1905.10347.pdf), I tried implementing them using the code provided in edward2 but found that none of it worked. I ended up porting all of the code into PyTorch, which required making a number of modifications, and got it working on a toy example. As of October 2022, this repo has 95 GitHub stars and two developers have reached out to collaborate and help me replicate the results.

  • Tail Free Sampling - Independent project, advice from friends and mentors - Developed a new method to sample sequences from autoregressive neural networks for open-ended sequence generation (see the sketch after this list).

  • Primary and Tertiary Protein AutoEncoder - Final Class Project - Investigated whether a deep AutoEncoder could learn the relationship between protein sequence and tertiary structure in order to then do either sequence or structure optimization in the latent space. It didn’t work very well, but I learned a lot!

  • Facebook Chatbot for Spaced Repetition Learning - HackDuke 2016 - Spaced repetition is both wonderful and highly neglected. Can we make it more popular and easier to do routinely by using a Facebook chatbot to both harass and motivate us? Got everything working! But there were always more bugs, and it didn’t solve the fundamental problem that spaced repetition learning still takes a huge amount of motivation. You could argue that presenting the cards over Messenger just created more distractions.
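
As promised above, here is a rough sketch of the tail free sampling idea as I understand it (the threshold value, function name, and implementation details are illustrative, not a definitive spec): sort the token probabilities, use the absolute second derivative of that sorted curve to locate where the low-quality “tail” begins, and sample only from the head.

```python
import numpy as np

def tail_free_sample(probs: np.ndarray, z: float = 0.95, rng=None) -> int:
    """Illustrative tail free sampling: truncate where the curvature of the
    sorted probability distribution suggests the tail begins."""
    rng = rng or np.random.default_rng()
    order = np.argsort(probs)[::-1]      # token indices, most to least probable
    sorted_probs = probs[order]

    d1 = np.diff(sorted_probs)           # discrete first derivative
    d2 = np.abs(np.diff(d1))             # absolute discrete second derivative
    if d2.sum() == 0:
        keep = len(sorted_probs)         # degenerate/flat case: keep everything
    else:
        d2 = d2 / d2.sum()               # normalize curvature into a distribution
        keep = int(np.searchsorted(np.cumsum(d2), z)) + 1

    head = sorted_probs[:keep]
    head = head / head.sum()             # renormalize the surviving head
    return int(order[rng.choice(keep, p=head)])
```

Compared to top-k or nucleus (top-p) sampling, the intent is for the cutoff to adapt to the shape of each distribution rather than relying on a fixed count or probability mass.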

Other Locations on the Interwebs

I am pretty active on Twitter, where you can DM me if you’re trying to get in touch. I sometimes upload my film photography to Instagram and to my portfolio.