Artwork

Вміст надано Daniel Filan. Весь вміст подкастів, включаючи епізоди, графіку та описи подкастів, завантажується та надається безпосередньо компанією Daniel Filan або його партнером по платформі подкастів. Якщо ви вважаєте, що хтось використовує ваш захищений авторським правом твір без вашого дозволу, ви можете виконати процедуру, описану тут https://uk.player.fm/legal.
Player FM - додаток Podcast
Переходьте в офлайн за допомогою програми Player FM !

29 - Science of Deep Learning with Vikrant Varma

2:13:46
 
Поширити
 

Manage episode 414614825 series 2844728
Вміст надано Daniel Filan. Весь вміст подкастів, включаючи епізоди, графіку та описи подкастів, завантажується та надається безпосередньо компанією Daniel Filan або його партнером по платформі подкастів. Якщо ви вважаєте, що хтось використовує ваш захищений авторським правом твір без вашого дозволу, ви можете виконати процедуру, описану тут https://uk.player.fm/legal.

In 2022, it was announced that a fairly simple method can be used to extract the true beliefs of a language model on any given topic, without having to actually understand the topic at hand. Earlier, in 2021, it was announced that neural networks sometimes 'grok': that is, when training them on certain tasks, they initially memorize their training data (achieving their training goal in a way that doesn't generalize), but then suddenly switch to understanding the 'real' solution in a way that generalizes. What's going on with these discoveries? Are they all they're cracked up to be, and if so, how are they working? In this episode, I talk to Vikrant Varma about his research getting to the bottom of these questions.

Patreon: patreon.com/axrpodcast

Ko-fi: ko-fi.com/axrpodcast

Topics we discuss, and timestamps:

0:00:36 - Challenges with unsupervised LLM knowledge discovery, aka contra CCS

0:00:36 - What is CCS?

0:09:54 - Consistent and contrastive features other than model beliefs

0:20:34 - Understanding the banana/shed mystery

0:41:59 - Future CCS-like approaches

0:53:29 - CCS as principal component analysis

0:56:21 - Explaining grokking through circuit efficiency

0:57:44 - Why research science of deep learning?

1:12:07 - Summary of the paper's hypothesis

1:14:05 - What are 'circuits'?

1:20:48 - The role of complexity

1:24:07 - Many kinds of circuits

1:28:10 - How circuits are learned

1:38:24 - Semi-grokking and ungrokking

1:50:53 - Generalizing the results

1:58:51 - Vikrant's research approach

2:06:36 - The DeepMind alignment team

2:09:06 - Follow-up work

The transcript: axrp.net/episode/2024/04/25/episode-29-science-of-deep-learning-vikrant-varma.html

Vikrant's Twitter/X account: twitter.com/vikrantvarma_

Main papers:

- Challenges with unsupervised LLM knowledge discovery: arxiv.org/abs/2312.10029

- Explaining grokking through circuit efficiency: arxiv.org/abs/2309.02390

Other works discussed:

- Discovering latent knowledge in language models without supervision (CCS): arxiv.org/abs/2212.03827

- Eliciting Latent Knowledge: How to Tell if your Eyes Deceive You: https://docs.google.com/document/d/1WwsnJQstPq91_Yh-Ch2XRL8H_EpsnjrC1dwZXR37PC8/edit

- Discussion: Challenges with unsupervised LLM knowledge discovery: lesswrong.com/posts/wtfvbsYjNHYYBmT3k/discussion-challenges-with-unsupervised-llm-knowledge-1

- Comment thread on the banana/shed results: lesswrong.com/posts/wtfvbsYjNHYYBmT3k/discussion-challenges-with-unsupervised-llm-knowledge-1?commentId=hPZfgA3BdXieNfFuY

- Fabien Roger, What discovering latent knowledge did and did not find: lesswrong.com/posts/bWxNPMy5MhPnQTzKz/what-discovering-latent-knowledge-did-and-did-not-find-4

- Scott Emmons, Contrast Pairs Drive the Performance of Contrast Consistent Search (CCS): lesswrong.com/posts/9vwekjD6xyuePX7Zr/contrast-pairs-drive-the-empirical-performance-of-contrast

- Grokking: Generalizing Beyond Overfitting on Small Algorithmic Datasets: arxiv.org/abs/2201.02177

- Keeping Neural Networks Simple by Minimizing the Minimum Description Length of the Weights (Hinton 1993 L2): dl.acm.org/doi/pdf/10.1145/168304.168306

- Progress measures for grokking via mechanistic interpretability: arxiv.org/abs/2301.0521

Episode art by Hamish Doodles: hamishdoodles.com

  continue reading

49 епізодів

Artwork
iconПоширити
 
Manage episode 414614825 series 2844728
Вміст надано Daniel Filan. Весь вміст подкастів, включаючи епізоди, графіку та описи подкастів, завантажується та надається безпосередньо компанією Daniel Filan або його партнером по платформі подкастів. Якщо ви вважаєте, що хтось використовує ваш захищений авторським правом твір без вашого дозволу, ви можете виконати процедуру, описану тут https://uk.player.fm/legal.

In 2022, it was announced that a fairly simple method can be used to extract the true beliefs of a language model on any given topic, without having to actually understand the topic at hand. Earlier, in 2021, it was announced that neural networks sometimes 'grok': that is, when training them on certain tasks, they initially memorize their training data (achieving their training goal in a way that doesn't generalize), but then suddenly switch to understanding the 'real' solution in a way that generalizes. What's going on with these discoveries? Are they all they're cracked up to be, and if so, how are they working? In this episode, I talk to Vikrant Varma about his research getting to the bottom of these questions.

Patreon: patreon.com/axrpodcast

Ko-fi: ko-fi.com/axrpodcast

Topics we discuss, and timestamps:

0:00:36 - Challenges with unsupervised LLM knowledge discovery, aka contra CCS

0:00:36 - What is CCS?

0:09:54 - Consistent and contrastive features other than model beliefs

0:20:34 - Understanding the banana/shed mystery

0:41:59 - Future CCS-like approaches

0:53:29 - CCS as principal component analysis

0:56:21 - Explaining grokking through circuit efficiency

0:57:44 - Why research science of deep learning?

1:12:07 - Summary of the paper's hypothesis

1:14:05 - What are 'circuits'?

1:20:48 - The role of complexity

1:24:07 - Many kinds of circuits

1:28:10 - How circuits are learned

1:38:24 - Semi-grokking and ungrokking

1:50:53 - Generalizing the results

1:58:51 - Vikrant's research approach

2:06:36 - The DeepMind alignment team

2:09:06 - Follow-up work

The transcript: axrp.net/episode/2024/04/25/episode-29-science-of-deep-learning-vikrant-varma.html

Vikrant's Twitter/X account: twitter.com/vikrantvarma_

Main papers:

- Challenges with unsupervised LLM knowledge discovery: arxiv.org/abs/2312.10029

- Explaining grokking through circuit efficiency: arxiv.org/abs/2309.02390

Other works discussed:

- Discovering latent knowledge in language models without supervision (CCS): arxiv.org/abs/2212.03827

- Eliciting Latent Knowledge: How to Tell if your Eyes Deceive You: https://docs.google.com/document/d/1WwsnJQstPq91_Yh-Ch2XRL8H_EpsnjrC1dwZXR37PC8/edit

- Discussion: Challenges with unsupervised LLM knowledge discovery: lesswrong.com/posts/wtfvbsYjNHYYBmT3k/discussion-challenges-with-unsupervised-llm-knowledge-1

- Comment thread on the banana/shed results: lesswrong.com/posts/wtfvbsYjNHYYBmT3k/discussion-challenges-with-unsupervised-llm-knowledge-1?commentId=hPZfgA3BdXieNfFuY

- Fabien Roger, What discovering latent knowledge did and did not find: lesswrong.com/posts/bWxNPMy5MhPnQTzKz/what-discovering-latent-knowledge-did-and-did-not-find-4

- Scott Emmons, Contrast Pairs Drive the Performance of Contrast Consistent Search (CCS): lesswrong.com/posts/9vwekjD6xyuePX7Zr/contrast-pairs-drive-the-empirical-performance-of-contrast

- Grokking: Generalizing Beyond Overfitting on Small Algorithmic Datasets: arxiv.org/abs/2201.02177

- Keeping Neural Networks Simple by Minimizing the Minimum Description Length of the Weights (Hinton 1993 L2): dl.acm.org/doi/pdf/10.1145/168304.168306

- Progress measures for grokking via mechanistic interpretability: arxiv.org/abs/2301.0521

Episode art by Hamish Doodles: hamishdoodles.com

  continue reading

49 епізодів

Усі епізоди

×
 
Loading …

Ласкаво просимо до Player FM!

Player FM сканує Інтернет для отримання високоякісних подкастів, щоб ви могли насолоджуватися ними зараз. Це найкращий додаток для подкастів, який працює на Android, iPhone і веб-сторінці. Реєстрація для синхронізації підписок між пристроями.

 

Короткий довідник

Слухайте це шоу, досліджуючи
Відтворити