We need safe AI
AI systems have made remarkable progress over the past decade. Yet as they become more capable, they raise increasingly serious safety concerns. Today's systems already face issues around bias, environmental impact, and surveillance. Tomorrow's more capable, autonomous systems could pose even greater challenges – from misuse by malicious actors to the possibility of systems pursuing harmful objectives at scale.
As we race to develop solutions to these challenges, we already have a blueprint for a flexible and safer intelligence: the human brain. We've evolved sophisticated mechanisms for safe exploration, graceful handling of novel situations, and cooperation. Understanding and reverse-engineering these neural mechanisms could be key to developing AI systems that are aligned with human values.
The human brain might seem like a counterintuitive model for developing safe AI systems: we engage in war, exhibit systematic biases, and often fall short of our lofty ambitions. However, our brains have specific properties that are worth emulating from an AI safety perspective. We therefore propose a selective approach to studying the brain as a blueprint for safer AI.
We've prepared a comprehensive technical roadmap to make this vision a reality. Our goals are:
- To galvanize the community around a coordinated research effort.
- To help decision makers and funders understand promising approaches.
- To orient neuroscientists and AI researchers technically in this space.
This website hosts this high-level introduction and takeaways for decision makers, the technical roadmap, and, soon, policy pieces.
Let's be ambitious about neuroscience
To date, traditional neuroscience has moved far too slowly to impact AI development on relevant timescales. The pace of capability advancement far outstrips our ability to study and understand biological intelligence through conventional means. If neuroscience is to meaningfully contribute to AI safety, we need to dramatically accelerate our ability to record, analyze, simulate, and understand neural systems.
The good news is that the catalysts for large-scale neuroscience are already here, thanks in part to massive investments made by the BRAIN Initiative in the past decade. New recording technologies can capture the activity of thousands of neurons simultaneously. Advanced microscopy techniques let us map neural circuits with unprecedented precision. High-throughput behavioral systems allow us to study complex cognitive behaviors at scale. Virtual neuroscience is far more feasible than in the past thanks to dramatically lower compute costs and advances in machine learning tooling.
The time to begin this work is now. Current AI systems are already powerful enough to raise serious safety concerns, yet still far from human-level capabilities in many domains. This gives us a crucial window of opportunity to understand and implement safety mechanisms inspired by neuroscience before more advanced AI systems are developed.
This effort won’t just benefit AI safety – it will help us understand the brain. It could shorten the translational timelines for new neurotechnologies, lead to breakthroughs in treating neurological conditions, and let neuroscientists run experiments faster and more cheaply. This creates a "default good" scenario where even if the direct impact on AI safety is smaller than hoped, other fields will benefit greatly.
We hope you see what we see – a differential path toward a more human-like AI, advancing neuroscience, neurotechnology, and our understanding of neurological diseases along the way.
7 ways neuroscience can make AI safer
We've organized our roadmap around these 7 themes. For each, we perform an in-depth technical analysis, identify key bottlenecks, and make recommendations for further research and investment:
- Reverse-engineer the representations of sensory systems. Understanding how the brain achieves robust perception and handles novelty could help us build AI systems that are more resistant to adversarial attacks and better at generalizing to new situations.
- Create embodied digital twins. Functional simulations of brain activity combined with physical models of bodies and environments could help us understand how embodied cognition contributes to safe and robust behavior.
- Develop detailed simulations. Creating biophysically detailed simulations of neural circuits could capture the fundamental constraints of biological intelligence, which could serve as templates for building safer AI systems.
- Build better cognitive architectures. Based on our understanding of how the brain implements capabilities like theory of mind, causal reasoning, and cooperation, we could build modular, probabilistic and transparent cognitive architectures that better align with human values and intentions.
- Advance brain-informed process supervision. Using neural and behavioral data, we could fine-tune existing AI models to better align with brains and encourage safe behavior.
- Reverse-engineer loss functions of the brain. By using functional, structural, and behavioral data to determine the brain's loss functions, we could derive better training objectives for AI systems.
- Leverage neuroscience-inspired methods for mechanistic interpretability. We could apply the tools neuroscientists use to study biological neural networks to understand artificial ones, and vice versa. This could help make AI systems more transparent and verifiable, and accelerate what we can learn from the brain (see the sketch below).
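To make the last of these directions more concrete, here is a minimal, hypothetical sketch of one neuroscience tool – representational similarity analysis (RSA) – applied to an artificial network. The data and variable names below are synthetic placeholders, not code or results from the roadmap; in a real analysis, the neural recordings and model activations would be collected on the same set of stimuli.

```python
# Toy illustration of representational similarity analysis (RSA) applied to an
# artificial network. All arrays are synthetic stand-ins: in a real study,
# `neural_responses` would come from recordings (e.g. spike counts per stimulus)
# and `model_activations` from a network layer evaluated on the same stimuli.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(seed=0)

n_stimuli = 50
neural_responses = rng.standard_normal((n_stimuli, 200))   # stimuli x recorded neurons
model_activations = rng.standard_normal((n_stimuli, 512))  # stimuli x hidden units


def rdm(responses: np.ndarray) -> np.ndarray:
    """Representational dissimilarity matrix (condensed form):
    1 - correlation between the response patterns of each pair of stimuli."""
    return pdist(responses, metric="correlation")


# Compare the two representational geometries with a rank correlation,
# the standard RSA summary statistic.
alignment, _ = spearmanr(rdm(neural_responses), rdm(model_activations))
print(f"Brain-model representational alignment: {alignment:.3f}")
```

In principle, the same alignment score could also be turned into a regularizer during training, which is one way the brain-informed process supervision theme above could be implemented.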
Importantly, these approaches are not independent – progress in one area means progress in others. We cannot afford to pursue these directions sequentially – we need a coordinated effort that advances them all in parallel. This means investing in neurotechnology development, scaling up neural recording capabilities, and building neural models at scale across abstraction levels.
Who are we?
We're a group of neuroscientists and AI researchers supporting ambitious research in NeuroAI. This roadmap was led by Patrick Mineault, PhD. It was supported by the Amaranth Foundation. Read more about the authors here.
Email us at neuroaisafety@amaranth.foundation, or sign up to our newsletter for updates.