
Content provided by JACK WAUDBY. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by JACK WAUDBY or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://uk.player.fm/legal.

Alex Isenko | Where Is My Training Bottleneck? Hidden Trade-Offs in Deep Learning Preprocessing Pipelines | #1

24:32
 

Archived series ("Inactive feed" status)

When? This feed was archived on October 13, 2022 19:36 (1+ y ago). Last successful fetch was on August 21, 2022 08:18 (1+ y ago)

Why? Inactive feed status. Our servers were unable to access the podcast feed for an extended period of time.

What now? You might be able to find a more up-to-date version using the search function. This series will no longer be checked for updates. If you believe this to be in error, please check if the publisher's feed link below is valid and contact support to request the feed be restored or if you have any other concerns about this.


Summary:

Preprocessing pipelines in deep learning aim to provide sufficient data throughput to keep the training processes busy. Maximizing resource utilization is becoming more challenging as the throughput of training processes increases with hardware innovations (e.g., faster GPUs, TPUs, and interconnects) and advanced parallelization techniques that yield better scalability. At the same time, the amount of training data needed to train increasingly complex models is growing. As a consequence, data preprocessing and provisioning are becoming a severe bottleneck in end-to-end deep learning pipelines.


In this interview, Alex talks about his in-depth analysis of data preprocessing pipelines from four different machine learning domains. Additionally, he discusses a new perspective on efficiently preparing datasets for end-to-end deep learning pipelines and extracts individual trade-offs to optimize throughput, preprocessing time, and storage consumption. Alex and his collaborators have developed an open-source profiling library that can automatically decide on a suitable preprocessing strategy to maximize throughput. By applying their insights to real-world use cases, a throughput increase of 3x to 13x can be obtained compared to an untuned system while keeping the pipeline functionally identical. These findings show the enormous potential of data pipeline tuning.
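To make the trade-off between throughput, preprocessing time, and storage concrete, here is a minimal sketch in Python. It is not PRESTO's API and is not taken from the paper: the step functions (`decode`, `normalize`, `crop`), the helper names, and the synthetic data are all hypothetical. The assumed idea is that a pipeline can be split at any step, with the steps before the split run once offline and stored, and the steps after it run online for every epoch.

```python
import time
import zlib

# Hypothetical preprocessing steps operating on raw bytes (illustration only).
def decode(sample: bytes) -> bytes:
    return zlib.decompress(sample)

def normalize(sample: bytes) -> bytes:
    return bytes(255 - b for b in sample)

def crop(sample: bytes) -> bytes:
    return sample[: len(sample) // 4]

STEPS = [decode, normalize, crop]

def apply_steps(samples, steps):
    """Apply each step in order to every sample."""
    for step in steps:
        samples = [step(s) for s in samples]
    return samples

def profile_split(raw_samples, steps, split):
    """Run steps[:split] offline (stored once) and steps[split:] online."""
    start = time.perf_counter()
    stored = apply_steps(raw_samples, steps[:split])
    offline_seconds = time.perf_counter() - start

    # Storage cost of materializing the dataset at this point in the pipeline.
    storage_bytes = sum(len(s) for s in stored)

    start = time.perf_counter()
    apply_steps(stored, steps[split:])
    online_seconds = time.perf_counter() - start

    return {
        "split": split,
        "offline_seconds": offline_seconds,
        "storage_bytes": storage_bytes,
        "samples_per_second": len(raw_samples) / max(online_seconds, 1e-9),
    }

def profile_all(raw_samples, steps):
    """Profile every possible split point, from fully online to fully offline."""
    return [profile_split(raw_samples, steps, k) for k in range(len(steps) + 1)]

# Synthetic dataset: 200 compressed samples of 4096 zero bytes each.
raw = [zlib.compress(bytes(4096)) for _ in range(200)]
results = profile_all(raw, STEPS)

for r in results:
    print(r["split"], r["storage_bytes"], round(r["samples_per_second"]))
```

In this toy setup, storing the data after decoding costs far more space than storing the compressed input, while storing it after cropping shrinks it again. The measured online throughput depends on the machine, so which split is best has to be profiled rather than guessed.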

Questions:

0:36 - Can you explain to our listeners what a deep learning pipeline is?

1:33 - In this pipeline, how does data preprocessing become a bottleneck?

5:40 - In the paper you analyse several different domains. Can you go into more detail about the domains and pipelines?

6:49 - What are the key insights from your analysis?

8:28 - What are the other insights?

13:23 - Your paper introduces PRESTO, the open-source profiling library. Can you tell us more about that?

15:56 - How does this compare to other tools in the space?

18:46 - Who will find PRESTO useful?

20:13 - What is the most interesting, unexpected, or challenging lesson you encountered whilst working on this topic?

22:10 - What do you have planned for future research?

Links:

Contact Info:



Our GDPR privacy policy was updated on August 8, 2022. Visit acast.com/privacy for more information.
