Introduction to Mechanistic Interpretability
This introduction covers common mechanistic interpretability concepts to prepare you for the rest of this session's resources.
Original text: https://aisafetyfundamentals.com/blog/introduction-to-mechanistic-interpretability/
Author(s): Sarah Hastings-Woodhouse
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
Chapters
1. Introduction to Mechanistic Interpretability (00:00:00)
2. Why might mechanistic interpretability be useful? (00:01:16)
3. Looking inside neural networks (00:03:34)
4. What makes mechanistic interpretability hard? (00:06:33)
5. Addressing polysemanticity (00:08:34)
85 episodes