AI interpretability
We build neural networks we can't read. Can we ever truly understand what happens inside them?
What makes this fascinating
We build minds we can't read: Neural networks work remarkably well, yet their internal reasoning is largely a black box.
Mechanistic interpretability: Researchers reverse-engineer the circuits and “features” inside a model, almost neuron by neuron.
Why it matters for safety: We can't fully trust or audit a system whose decisions we can't explain.
Frequently asked questions
- What is AI interpretability?
  It is the effort to understand what happens inside neural networks, and in particular why a model produces a given output, rather than treating the model as an opaque black box.
- Why can't we understand how AI models work?
  Large models spread their “knowledge” across billions of numerical weights with no human-readable structure, so their internal reasoning is not directly inspectable.
- Why does interpretability matter?
  Without it we cannot fully trust, debug, or guarantee the safety of AI systems in high-stakes settings, which makes interpretability central to AI safety.