Best Data Engineering Podcasts (2026)

How Snowflake Uses Airflow Sensors To Keep Financial Metrics Accurate with Ayush Pradhan

2d ago · 18:28

Clean financial reporting depends on orchestration that catches issues early, prevents bad data from spreading and helps teams react fast when something breaks. In this episode, Ayush Pradhan, Senior Analytics Engineer at Snowflake, joins us to explain how Snowflake’s finance data team relies on Apache Airflow, sensors and dbt to keep revenue, cos…

Beyond Dashboards: How Data Teams Earn a Seat at the Table

6d ago · 49:21

Summary In this episode Goutham Budati talks about his Data–Perspective–Action framework and how it empowers data teams to become true business partners. Goutham traces his path from automating Excel reports to leading high‑impact data organizations, then breaks down why technical excellence alone isn’t enough: teams must pair reliable data systems with …

The $100M Problem: How Lyft's Data Platform Prevents ML Failures with Ritesh Varyani at Lyft

25d ago · 25:46

In this episode of the Data Engineering Show, host Benjamin Wagner sits down with Ritesh Varyani, Staff Software Engineer at Lyft, to explore how the company manages a sophisticated multi-engine data stack serving thousands of engineers, while simultaneously integrating AI across infrastructure and user-facing analytics. What You'll Learn: How to a…

Unfreezing The Data Lake: The Future-Proof File Format

13d ago · 59:24

Summary In this episode PhD researcher Xinyu Zeng talks about F3, the “future-proof file format” designed to address today’s hardware realities and evolving workloads. He digs into the limitations of Parquet and ORC - especially CPU-bound decoding, metadata overhead for wide-table projections, and poor random-access behavior for ML training and ser…

From Context to Semantics: How Metadata Powers Agentic AI

20d ago · 1:06:17

Summary In this episode Suresh Srinivas and Sriharsha Chintalapani explore how metadata platforms are evolving from human-centric catalogs into the foundational context layer for AI and agentic systems. They discuss the origins and growth of OpenMetadata and Collate, why “context” is necessary but “semantics” is critical for precise AI outcomes, an…

From Data Engineering to AI Engineering: Where the Lines Blur

27d ago · 26:59

Summary In this solo episode of the Data Engineering Podcast, host Tobias Macey reflects on how AI has transformed the practice and pace of data engineering over time. Starting from its origins in the Hadoop and cloud warehouse era, he explores the discipline's evolution through ML engineering and MLOps to today's blended boundaries between data, M…

The Role of Airflow in Building Smarter ML Pipelines at Vivian Health with Max Calehuff

1M ago · 19:30

The integration of data orchestration and machine learning is critical to operational efficiency in healthcare tech. Vivian Health leverages Airflow to power both its ETL pipelines and ML workflows while maintaining strict compliance standards. Max Calehuff, Lead Data Engineer at Vivian Health, joins us to discuss how his team uses Airflow for ML o…

Malloy: Hierarchical Data, Semantic Models, and the Future of Analytics

1M ago · 58:48

Summary In this episode Michael Toy, co-creator of Malloy, talks about rethinking how we work with data beyond SQL. Michael shares the origins of Malloy from his and Lloyd Tabb’s experience at Looker, why SQL’s mental model often fights human problem solving, and how Malloy aims to be a composable, maintainable language that treats SQL as the assem…

Scaling Airflow to 11,000 DAGs Across Three Regions at Intercom with András Gombosi and Paul Vickers

1M ago · 34:24

The evolution of Intercom’s data infrastructure reveals how a well-built orchestration system can scale to serve global needs. With thousands of DAGs powering analytics, AI and customer operations, the team’s approach combines technical depth with organizational insight. In this episode, András Gombosi, Senior Engineering Manager of Data Infra and …

Blurring Lines: Data, AI, and the New Playbook for Team Velocity

2M ago · 1:00:57

Summary In this crossover episode, Max Beauchemin explores how multiplayer, multi‑agent engineering is transforming the way individuals and teams build data and AI systems. He digs into the shifting boundary between data and AI engineering, the rise of “context as code,” and how just‑in‑time retrieval via MCP and CLIs lets agents gather what they n…

How Covestro Turns Airflow Into a Simulation Toolbox with Anja Mackenzie

2M ago · 23:10

Building scalable, reproducible workflows for scientific computing often requires bridging the gap between research flexibility and enterprise reliability. In this episode, Anja MacKenzie, Expert for Cheminformatics at Covestro, explains how her team uses Airflow and Kubernetes to create a shared, self-service platform for computational chemistry. …

60 Billion Predictions Daily: Inside Credit Karma’s Agentic Data Layer with Maddie Daianu

2M ago · 19:55

What does MLOps look like when you are deploying 60 billion machine learning predictions a day? Maddie Daianu, Head of Data and AI at Intuit Credit Karma, joins the Data Bros to pull back the curtain on one of the most high-volume data environments in FinTech. With a 100-person team serving 140 million members, standard data practices break down. M…

State, Scale, and Signals: Rethinking Orchestration with Durable Execution

2M ago · 51:46

Summary In this episode Preeti Somal, EVP of Engineering at Temporal, talks about the durable execution model and how it reshapes the way teams build reliable, stateful systems for data and AI. She explores Temporal’s code‑first programming model—workflows, activities, task queues, and replay—and how it eliminates hand‑rolled retry, checkpoint, and…

Building Secure Financial Data Platforms at AgileEngine with Valentyn Druzhynin

2M ago · 21:16

The use of Apache Airflow in financial services demands a balance between innovation and compliance. AgileEngine’s approach to orchestration showcases how secure, auditable workflows can scale even within the constraints of regulatory environments. In this episode, Valentyn Druzhynin, Senior Data Engineer at AgileEngine, discusses how his team lev…

The AI Data Paradox: High Trust in Models, Low Trust in Data

2M ago · 51:35

Summary In this episode of the Data Engineering Podcast Ariel Pohoryles, head of product marketing for Boomi's data management offerings, talks about a recent survey of 300 data leaders on how organizations are investing in data to scale AI. He shares a paradox uncovered in the research: while 77% of leaders trust the data feeding their AI systems,…

How Redica Transformed Their Data With Airflow and Snowflake with Shankar Mahindar

2M ago · 23:48

The life sciences industry relies on data accuracy, regulatory insight and quality intelligence. Building a unified system that keeps these elements aligned is no small feat. In this episode, we welcome Shankar Mahindar, Senior Data Engineer II at Redica Systems. We discuss how the team restructures its data platform with Airflow to strengthen gove…

Bridging the AI–Data Gap: Collect, Curate, Serve

2M ago · 50:40

Summary In this episode of the Data Engineering Podcast Omri Lifshitz (CTO) and Ido Bronstein (CEO) of Upriver talk about the growing gap between AI's demand for high-quality data and organizations' current data practices. They discuss why AI accelerates both the supply and demand sides of data, highlighting that the bottleneck lies in the "middle …

How Airflow and AI Power Investigative Journalism at the Financial Times with Zdravko Hvarlingov

2M ago · 24:28

The Financial Times leverages Airflow and AI to uncover powerful stories hidden within vast, unstructured data. In this episode, Zdravko Hvarlingov, Senior Software Engineer at the Financial Times, discusses building multi-tenant Airflow systems and AI-driven pipelines that surface stories that might otherwise be missed. Zdravko walks through entit…

Beyond the Perimeter: Practical Patterns for Fine‑Grained Data Access

3M ago · 1:05:00

Summary In this episode of the Data Engineering Podcast Matt Topper, president of UberEther, talks about the complex challenge of identity, credentials, and access control in modern data platforms. With the shift to composable ecosystems, integration burdens have exploded, fracturing governance and auditability across warehouses, lakes, files, vect…

Episode 2: AWS Data Store Mastery

3M ago · 14:16

Where should you put your data? We tackle Domain 2 (26% of the DEA-C01 exam) by comparing Redshift, DynamoDB, and RDS. Learn how to design optimal schemas, use the AWS Glue Data Catalog, and implement S3 Lifecycle Policies to manage data lifespan and control costs.

James

Episode 4: The Data Fortress: Securing and Governing Data for the DEA-C01

3M ago · 12:20

Lock down your data platform! This is the final domain, Domain 4 (18% of the DEA-C01 exam). We cover essential security best practices: using IAM and Lake Formation for access control, enforcing encryption with KMS (at rest and in transit), and securing network access via VPC and Security Groups.

James

Episode 3: The Pipeline Pit Crew: Monitoring, Troubleshooting, and Optimizing Your AWS Data

3M ago · 12:36

Keep your data pipelines running smoothly! This episode covers Domain 3 (22% of the DEA-C01 exam). We dive into setting up alarms with CloudWatch, troubleshooting stuck jobs with Glue Logs, optimizing performance and cost in Redshift, and ensuring data quality with AWS Glue DataBrew.

James

Episode 1: Mastering the AWS Data Assembly Line

3M ago · 18:05

This is the essential guide to Domain 1: Data Ingestion and Transformation—the biggest section (34%) of the AWS Certified Data Engineer - Associate (DEA-C01) exam! We break down the core components of a successful data pipeline. Learn to compare Batch vs. Streaming with services like Kinesis and DMS, master ETL/ELT using AWS Glue and EMR, and orche…

Inside Vinted’s Code-Generated Airflow Pipelines with Oscar Ligthart and Rodrigo Loredo

3M ago · 29:36

The shift from monolithic to decentralized data workflows changes how teams build, connect and scale pipelines. In this episode, we feature Oscar Ligthart, Lead Data Engineer, and Rodrigo Loredo, Lead Analytics Engineer, both at Vinted, as we unpack their YAML-driven abstraction that generates Airflow DAGs and standardizes cross-team orchestration.…

The True Costs of Legacy Systems: Technical Debt, Risk, and Exit Strategies

3M ago · 1:04:16

Summary In this episode Kate Shaw, Senior Product Manager for Data and SLIM at SnapLogic, talks about the hidden and compounding costs of maintaining legacy systems—and practical strategies for modernization. She unpacks how “legacy” is less about age and more about when a system becomes a risk: blocking innovation, consuming excess IT time, and cr…

Transforming Data Pipelines at XENA Intelligence with Naseem Shah

3M ago · 28:32

The shift from simple cron jobs to orchestrated AI-powered workflows is reshaping how startups scale. For a small team, these transitions come with unique challenges and big opportunities. In this episode, Naseem Shah, Head of Engineering at Xena Intelligence, shares how he built data pipelines from scratch, adopted Apache Airflow and transformed A…

Context Engineering as a Discipline: Building Governed AI Analytics

3M ago · 51:58

Summary In this episode of the Data Engineering Podcast, host Tobias Macey welcomes back Nick Schrock, CTO and founder of Dagster Labs, to discuss Compass - a Slack-native, agentic analytics system designed to keep data teams connected with business stakeholders. Nick shares his journey from initial skepticism to embracing agentic AI as model and a…

Scaling Geospatial Workflows With Airflow at Overture Maps Foundation and Wherobots with Alex Iannicelli and Daniel Smith

3M ago · 24:03

Using Airflow to orchestrate geospatial data pipelines unlocks powerful efficiencies for data teams. The combination of scalable processing and visual observability streamlines workflows, reduces costs and improves iteration speed. In this episode, Alex Iannicelli, Staff Software Engineer at Overture Maps Foundation, and Daniel Smith, Senior Soluti…

Block Bad Data Before the Write with Nike’s Ashok Singamaneni

3M ago · 20:20

The Firebolt Data Bros

The Data Model That Captures Your Business: Metric Trees Explained

3M ago · 1:01:05

Summary In this episode of the Data Engineering Podcast Vijay Subramanian, founder and CEO of Trace, talks about metric trees - a new approach to data modeling that directly captures a company's business model. Vijay shares insights from his decade-long experience building data practices at Rent the Runway and explains how the modern data stack has…

Scaling Airflow for Enterprise Data Platforms at PepsiCo with Kunal Bhattacharya

3M ago · 19:04

PepsiCo’s data platform drives insights across finance, marketing and data science. Delivering stability, scalability and developer delight is central to its success, and engineering leadership plays a key role in making this possible. In this episode, Kunal Bhattacharya, Senior Manager of Data Platform Engineering at PepsiCo, shares how his team m…

From GPUs-as-a-Service to Workloads-as-a-Service: Flex AI’s Path to High-Utilization AI Infra

3M ago · 56:31

Summary In this crossover episode of the AI Engineering Podcast, host Tobias Macey interviews Brijesh Tripathi, CEO of Flex AI, about revolutionizing AI engineering by removing DevOps burdens through "workload as a service". Brijesh shares his expertise from leading AI/HPC architecture at Intel and deploying supercomputers like Aurora, highlighting…

Building a Unified Data Platform at Pattern with William Graham

4M ago · 24:09

The orchestration of data workflows at scale requires both flexibility and security. At Pattern, decoupling scheduling from orchestration has reshaped how data teams manage large-scale pipelines. In this episode, we are joined by William Graham, Senior Data Engineer at Pattern, who explains how his team leverages Apache Airflow alongside their open…

How Astronomer Turns Proactive Monitoring Into Customer Success with Collin McNulty

4M ago · 25:34

The evolution of Airflow continues to shape data orchestration and monitoring strategies. Leveraging it beyond traditional ETL use cases opens powerful new possibilities for proactive support and internal operations. In this episode, we are joined by Collin McNulty, Sr. Director of Global Support at Astronomer, who shares insights from his journey …

From RAG to Relational: How Agentic Patterns Are Reshaping Data Architecture

4M ago · 52:58

Summary In this episode of the AI Engineering Podcast Mark Brooker, VP and Distinguished Engineer at AWS, talks about how agentic workflows are transforming database usage and infrastructure design. He discusses the evolving role of data in AI systems, from traditional models to more modern approaches like vectors, RAG, and relational databases. Ma…

Postgres vs. Elasticsearch: The Unexpected Winner in High-Stakes Search for Instacart with Ankit Mittal

4M ago · 21:38

In this episode of The Data Engineering Show, Benjamin Wagner sits down with Ankit Mittal, former Senior Engineer at Instacart, to explore how they revolutionized their search infrastructure by transitioning from Elasticsearch to PostgreSQL. Learn how Instacart tackled the unique challenges of fast-moving grocery inventory, achieved high-performanc…

Overcoming Data Engineering Challenges at Daiichi Sankyo Europe GmbH with Evgenii Prusov

4M ago · 19:26

The shift to a unified data platform is reshaping how pharmaceutical companies manage and orchestrate data. Establishing standards across regions and teams ensures scalability and efficiency in handling large-scale analytics. In this episode, Evgenii Prusov, Senior Data Platform Engineer of Daiichi Sankyo Europe GmbH, joins us to discuss building a…
