Content provided by Daniel Bashir. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Daniel Bashir or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://uk.player.fm/legal.

Jonathan Frankle: From Lottery Tickets to LLMs

1:08:22
 

In episode 96 of The Gradient Podcast, Daniel Bashir speaks to Jonathan Frankle.

Jonathan is the Chief Scientist at MosaicML (as of release). He completed his PhD at MIT, where he investigated the properties of sparse neural networks that allow them to train effectively through his lottery ticket hypothesis. He also spends a portion of his time working on technology policy, and currently works with the OECD to implement the AI principles he helped develop in 2019.

Have suggestions for future podcast guests (or other feedback)? Let us know here or reach us at editor@thegradient.pub

Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS
Follow The Gradient on Twitter

Outline:

* (00:00) Intro

* (02:35) Jonathan’s background and work

* (04:25) Origins of the Lottery Ticket Hypothesis

* (06:00) Jonathan’s empiricism and approach to science

* (08:25) More Karl Popper discourse + hot takes

* (09:45) Walkthrough of the Lottery Ticket Hypothesis

* (12:00) Issues with the Lottery Ticket Hypothesis as a statement

* (12:30) Jonathan’s advice for PhD students, on asking good questions

* (15:55) Strengths and Promise of the Lottery Ticket Hypothesis

* (18:55) More Lottery Ticket Hypothesis Papers

* (19:10) Comparing Rewinding and Fine-tuning

* (23:00) Care in making experimental choices

* (25:05) Linear Mode Connectivity and the Lottery Ticket Hypothesis

* (27:50) On what is being measured and how

* (28:50) “The outcome of optimization is determined to a linearly connected region”

* (31:15) On good metrics

* (32:54) On the Predictability of Pruning Across Scales — scaling laws for pruning

* (34:40) The paper’s takeaway

* (38:45) Pruning Neural Networks at Initialization — on a scientific disagreement

* (45:00) On making takedown papers useful

* (46:15) On what can be known early in training

* (49:15) Jonathan’s perspective on important research questions today

* (54:40) MosaicML

* (55:19) How Mosaic got started

* (56:17) Mosaic highlights

* (57:33) Customer stories

* (1:00:30) Jonathan’s work and perspectives on AI policy

* (1:05:45) The key question: what we want

* (1:07:35) Outro

Links:

* Jonathan’s homepage and Twitter

* Papers

* The Lottery Ticket Hypothesis and follow-up work

* Comparing Rewinding and Fine-tuning in Neural Network Pruning

* Linear Mode Connectivity and the LTH

* On the Predictability of Pruning Across Scales

* Pruning Neural Networks at Initialization: Why Are We Missing The Mark?

* Desirable Inefficiency


Get full access to The Gradient at thegradientpub.substack.com/subscribe