
Content provided by Confluent, founded by the original creators of Apache Kafka®. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Confluent or its podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process described here: https://uk.player.fm/legal.

Next-Gen Data Modeling, Integrity, and Governance with YODA

55:55
 


In this episode, Kris interviews Doron Porat, Director of Infrastructure at Yotpo, and Liran Yogev, Director of Engineering at ZipRecruiter (formerly at Yotpo), about their experiences and strategies in dealing with data modeling at scale.
Yotpo has a vast and active data lake, comprising thousands of datasets that are processed by different engines, primarily Apache Spark™. They wanted to provide users with self-service tools for generating and utilizing data with maximum flexibility, but encountered difficulties, including poor standardization, low data reusability, limited data lineage, and unreliable datasets.
The team realized that Yotpo's modeling layer, which defines the structure and relationships of the data, needed to be separated from the execution layer, which defines and processes operations on the data.
This separation would give programmers better visibility into data pipelines across all execution engines, storage methods, and formats, as well as more governance control for exploration and automation.
To address these issues, they developed YODA, an internal tool that combines an excellent developer experience with dbt, Databricks, Airflow, Looker, and more, backed by a strong CI/CD and orchestration layer.
Yotpo is a B2B SaaS e-commerce marketing platform that provides businesses with tools for accurate customer analytics, remarketing, support messaging, and more.
ZipRecruiter is a job site that utilizes AI matching to help businesses find the right candidates for their open roles.


Chapters

1. Intro (00:00:00)

2. What is Yotpo? (00:02:29)

3. Building an ETL framework based on Spark (00:05:25)

4. What is Apache Spark? (00:10:18)

5. Decoupling the data model (00:15:40)

6. Using data mesh principles (00:18:51)

7. How to address different data personas (00:22:24)

8. What is the "shift left" movement? (00:26:35)

9. How can organizations change the way they treat their data? (00:28:47)

10. Use-cases for tooling and documenting data sets (00:31:01)

11. Schema vs. schema-less (00:32:07)

12. What is YODA? (00:40:07)

13. Takeaways from the conversation with Doron and Liran (00:48:35)

14. It's a wrap! (00:52:45)

265 episodes

