Austin Meek

I am a researcher focused on AI safety and digital minds. Currently I'm working on a mix of monitoring and control research and on building evaluations for digital minds. I'm funded by UK AISI's Alignment Project and Longview Philanthropy's Digital Sentience Consortium. I'm also a PhD candidate at the University of Delaware, where I've done computational neuroscience research alongside the above topics. Previously, I was a MATS scholar, where I worked on evaluations for chain-of-thought monitorability.

I'm interested in how we instill values, preferences, and other high-level cognitive phenomena in current frontier models. Building infrastructure and monitoring systems to catch misalignment also helps us understand the failure modes of our current attempts at the alignment problem and improve on them in future iterations, while protecting us in the present. I'm interested in both black-box and white-box methods for making progress on these research questions.

I've previously collaborated with researchers at the Center for AI Safety, Anthropic, and other safety organizations. I'm open to further collaborations, and if you have similar research interests, I'd like to hear about your current projects and ideas! I've also mentored for the SPAR program in the past and am happy to meet new collaborators through that.