110. Alex Turner - Will powerful AIs tend to seek power?

46:57
 
Content provided by The TDS team. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by The TDS team or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process described here: https://uk.player.fm/legal.

Today’s episode is somewhat special, because we’re going to be talking about what might be the first solid quantitative study of the power-seeking tendencies that we can expect advanced AI systems to have in the future.

For a long time, there’s been a debate in the AI safety world between:

  • People who worry that powerful AIs could eventually displace, or even eliminate, humanity altogether as they find ever more clever, creative and dangerous ways to optimize their reward metrics, and
  • People who say that’s Terminator-baiting Hollywood nonsense that anthropomorphizes machines in a way that’s unhelpful and misleading.

Unfortunately, recent work in AI alignment — and in particular, a spotlighted 2021 NeurIPS paper — suggests that the AI takeover argument might be stronger than many had realized. In fact, it’s starting to look like we ought to expect to see power-seeking behaviours from highly capable AI systems by default. These behaviours include things like AI systems preventing us from shutting them down, repurposing resources in pathological ways to serve their objectives, and even in the limit, generating catastrophes that would put humanity at risk.
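To give a feel for the kind of claim involved, here’s a minimal, purely illustrative sketch (my own toy example, not the paper’s formal setup): a tiny Markov decision process with an absorbing “shutdown” state, where reward functions are sampled at random. For most sampled rewards, the optimal policy avoids shutdown, simply because the states that stay “switched on” keep more options open.

```python
import numpy as np

# Toy MDP (all state names and numbers are made up for illustration):
# 0 = start, 1-3 = "operational" states the agent can move between freely,
# 4 = shutdown (absorbing). From the start the agent either keeps operating
# (-> state 1) or accepts shutdown (-> state 4).
GAMMA, N_SAMPLES, N_STATES = 0.95, 2000, 5

def successors(s):
    """Deterministic successor states reachable from state s."""
    if s == 0:
        return [1, 4]        # keep operating vs. accept shutdown
    if s in (1, 2, 3):
        return [1, 2, 3]     # move freely among operational states
    return [4]               # shutdown is absorbing

def prefers_to_keep_operating(reward, iters=300):
    """Run value iteration, then compare the two choices at the start state."""
    v = np.zeros(N_STATES)
    for _ in range(iters):
        v = np.array([reward[s] + GAMMA * max(v[t] for t in successors(s))
                      for s in range(N_STATES)])
    return v[1] > v[4]       # does operating beat shutting down?

rng = np.random.default_rng(0)
frac = np.mean([prefers_to_keep_operating(rng.uniform(size=N_STATES))
                for _ in range(N_SAMPLES)])
print(f"Optimal policy avoids shutdown for {frac:.1%} of sampled reward functions")
# With i.i.d. uniform rewards this lands around three-quarters of the time:
# three freely reachable states beat a single absorbing one for most reward
# draws, so "staying switched on" is the default for optimal policies even
# though nothing in the sampled rewards mentions survival.
```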

As concerning as these possibilities might be, it’s exciting that we’re starting to develop a more robust and quantitative language to describe AI failures and power-seeking. That’s why I was so excited to sit down with AI researcher Alex Turner, the author of the spotlighted NeurIPS paper on power-seeking, and discuss his path into AI safety, his research agenda and his perspective on the future of AI on this episode of the TDS podcast.

***

Intro music:

➞ Artist: Ron Gelinas

➞ Track Title: Daybreak Chill Blend (original mix)

➞ Link to Track: https://youtu.be/d8Y2sKIgFWc

***

Chapters:

- 2:05 Interest in alignment research

- 8:00 Two camps of alignment research

- 13:10 The NeurIPS paper

- 17:10 Optimal policies

- 25:00 Two-piece argument

- 28:30 Relaxing certain assumptions

- 32:45 Objections to the paper

- 39:00 Broader sense of optimization

- 46:35 Wrap-up
