AI interpretability

We build neural networks we can't read. Can we ever truly understand what they learn inside?

Frequently asked questions

What is AI interpretability?
It is the effort to understand what happens inside a neural network, in particular why a model produces a given output, rather than treating the model as a black box.
Why can't we understand how AI models work?
Large models spread what they learn across billions of numerical weights with no human-readable structure, so their internal reasoning cannot be inspected directly.
Why does interpretability matter?
Without it we cannot fully trust, debug, or guarantee the safety of AI systems in high-stakes settings, which is why interpretability is central to AI safety.
