This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.
Welcome to The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI, the podcast where we keep you up to date with insights and ideas propelling the Airflow community forward. Join us each week as we explore the current state, future, and potential of Airflow with leading thinkers in the community, and discover how best to leverage this workflow management system to meet the ever-evolving needs of data engineering and AI ecosystems. Podcast Webpage: https://www.astronomer.io ...
The Data Engineering Show is a podcast for data engineering and BI practitioners to go beyond theory. Learn from the biggest influencers in tech about their practical day-to-day data challenges and solutions in a casual and fun setting. SEASON 1: DATA BROS. Eldad and Boaz Farkash shared the same stuffed toys growing up as well as a big passion for data. After founding Sisense and building it to become a high-growth analytics unicorn, they moved on to their next venture, Firebolt, a leading hig ...
Discussions around Data Engineering
Databases and data engineering episodes of Software Engineering Daily
"Unlocking the Power of Data: A Guide for Leaders and Executives." As a leader or executive, you know the importance of data in driving business decisions and staying ahead of the competition. But with the increasing amount of data generated daily, it can be hard to know where to start and how to use this valuable asset effectively. This blog covers multiple topics and explains the technical terminology of data engineering and analytics on the cloud.
StyleSeat is revolutionizing how beauty and wellness professionals grow their businesses through data-driven tools. From streamlining scheduling to optimizing marketing, their platform empowers professionals to focus on their craft while expanding their client base. In this episode, Paschal Onuorah, Senior Data Engineer at StyleSeat, shares how the…
Aligning Business and Data: The Essential Role of Data Modeling
1:06:51
Summary In this episode of the Data Engineering Podcast Serge Gershkovich, head of product at SQL DBM, talks about the socio-technical aspects of data modeling. Serge shares his background in data modeling and highlights its importance as a collaborative process between business stakeholders and data teams. He debunks common misconceptions that dat…
Explore the future of AI-powered business intelligence with Lei Tang, CTO and Co-founder of Fabi.ai, as he discusses the evolution from traditional self-service BI to "Vibe-analytics." Learn how AI is transforming data accessibility, enabling anyone to perform sophisticated analytics without deep technical expertise. From building trust in AI-gener…
The evolution of orchestration in Airflow continues with innovations that address both scalability and security. From improving executor reliability to enabling remote execution, these advancements reshape how organizations manage data pipelines. In this episode, we’re joined by Ian Buss, Principal Software Engineer at Astronomer, and Piotr Chomiak…
Summary In this episode of the Data Engineering Podcast Professor Paul Groth, from the University of Amsterdam, talks about his research on knowledge graphs and data engineering. Paul shares his background in AI and data management, discussing the evolution of data provenance and lineage, as well as the challenges of data integration. He explores t…
Scaling 2,000+ data pipelines isn’t easy. But with the right tools and a self-hosted mindset, it becomes achievable. In this episode, Sébastien Crocquevieille, Data Engineer at Numberly, unpacks how the team scaled their on-prem Airflow setup using open-source tooling and Kubernetes. We explore orchestration strategies, UI-driven stakeholder access…
High Performance And Low Overhead Graphs With KuzuDB
1:01:29
Summary In this episode of the Data Engineering Podcast Prashanth Rao, an AI engineer at KuzuDB, talks about their embeddable graph database. Prashanth explains how KuzuDB addresses performance shortcomings in existing solutions through columnar storage and novel join algorithms. He discusses the usability and scalability of KuzuDB, emphasizing its…
Managing financial data at scale requires precise orchestration and proactive monitoring to maintain operational efficiency. In this episode, we are joined by Adeolu Adegboye, Data Engineer at Moniepoint Group, who shares how his team uses data pipelines and workflow automation to manage high volumes of transactions, ensure timely alerts and suppor…
Bridging Data and Decision-Making: AI's Role in Modern Analytics
1:10:44
Summary In this episode of the Data Engineering Podcast Lucas Thelosen and Drew Gilson from Gravity talk about their development of Orion, an autonomous data analyst that bridges the gap between data availability and business decision-making. Lucas and Drew share their backgrounds in data analytics and how their experiences have shaped their approa…
The evolution of Airflow has reached a milestone with the introduction of remote execution in Airflow 3, enabling flexible orchestration across distributed environments. In this episode, Jens Scheffler, Test Execution Cluster Technical Architect at Bosch, shares insights on how his team’s need for large-scale, cross-environment testing influenced t…
Summary In this episode of the Data Engineering Podcast Andy Warfield talks about the innovative functionalities of S3 Tables and Vectors and their integration into modern data stacks. Andy shares his journey through the tech industry and his role at Amazon, where he collaborates to enhance storage capabilities, discussing the evolution of S3 from …
Managing modern data platforms means navigating a web of complex infrastructure, competing team needs and evolving security standards. For data teams to truly thrive, infrastructure must become both accessible and compliant without sacrificing velocity or reliability. In this episode, we’re joined by Cory O’Daniel, CEO and Co-Founder at Massdriver,…
Summary In this episode of the Data Engineering Podcast Akshay Agrawal from Marimo discusses the innovative new Python notebook environment, which offers a reactive execution model, full Python integration, and built-in UI elements to enhance the interactive computing experience. He discusses the challenges of traditional Jupyter notebooks, such as…
Building Uber's AI Assistant: How Genie Revolutionizes On-Call Support with Paarth Chothani from Uber
25:31
Journey inside Uber's innovative AI assistant "Genie" with Paarth Chothani, Staff Engineer at Uber, as he shares how they're revolutionizing on-call support using LLMs and vector search. From processing massive amounts of internal documentation to building scalable RAG pipelines, discover how Uber tackles the challenges of implementing AI assistants…
Summary In this episode of the Data Engineering Podcast Dan Sotolongo from Snowflake talks about the complexities of incremental data processing in warehouse environments. Dan discusses the challenges of handling continuously evolving datasets and the importance of incremental data processing for optimized resource use and reduced latency. He expla…
Telemetry has the potential to guide the future of Airflow, but only if it’s implemented transparently and with community trust. In this episode, we’re joined by Bolke de Bruin, Director at Metyis and a long-time Airflow PMC member. Bolke discusses how telemetry has been handled in the past, why it matters now and what it will take to get it right.…
Summary In this episode of the Data Engineering Podcast Kacper Łukawski from Qdrant talks about integrating MCP servers with vector databases to process unstructured data. Kacper shares his experience in data engineering, from building big data pipelines in the automotive industry to leveraging large language models (LLMs) for transforming unstructured d…
Contributing to open-source projects can be daunting, but it can also unlock unexpected innovation. This episode showcases how one engineer’s journey with Apache Airflow led to impactful UI enhancements and infrastructure solutions at scale. Shubham Raj, Software Engineer II at Cloudera, shares how his team built a drag-and-drop DAG editor for non-…
Managing data pipelines at scale is not just a technical challenge. It is also an organizational one. At Lyft, success means empowering dozens of teams to build with autonomy while enforcing governance and best practices across thousands of workflows. In this episode, we speak with Yunhao Qing, Software Engineer at Lyft, about building a governed d…
Summary In this episode of the Data Engineering Podcast Effie Baram, a leader in foundational data engineering at Two Sigma, talks about the complexities and innovations in data engineering within the finance sector. She discusses the critical role of data at Two Sigma, balancing data quality with delivery speed, and the socio-technical challenges …
Summary In this episode of the Data Engineering Podcast Arun Joseph talks about developing and implementing agent platforms to empower businesses with agentic capabilities. From leading AI engineering at Deutsche Telekom to his current entrepreneurial venture focused on multi-agent systems, Arun shares insights on building agentic systems at an org…
Understanding the complexities of Apache Airflow can be daunting for newcomers and seasoned data engineers alike. But with the right guidance, mastering the tool becomes an achievable milestone. In this episode, Marc Lamberti, Head of Customer Education at Astronomer, joins us to share his journey from Udemy instructor to driving education at Astronomer,…
Embracing Data Mesh and SQL Sensors for Scalable Workflows at lastminute.com with Alberto Crespi
30:09
The flexibility of Airflow plays a pivotal role in enabling decentralized data architectures and empowering cross-functional teams. In this episode, we speak with Alberto Crespi, Data Architect at lastminute.com, who shares how his team scales Airflow across 12 teams while supporting both vertical and horizontal structures under a data mesh approac…
Dagster's New Era: Modularizing Data Transformation in the Age of AI
1:01:37
Summary In this episode of the Data Engineering Podcast we welcome back Nick Schrock, CTO and founder of Dagster Labs, to discuss the evolving landscape of data engineering in the age of AI. As AI begins to impact data platforms and the role of data engineers, Nick shares his insights on how it will ultimately enhance productivity and expand softwa…
Innovation in orchestration is redefining how engineers approach both traditional ETL pipelines and emerging AI workloads. Understanding how to harness Airflow’s flexibility and observability is essential for teams navigating today’s evolving data landscape. In this episode, Anu Pabla, Principal Engineer at The ODP Corporation, joins us to discuss …
Summary In this episode of the Data Engineering Podcast Alex Albu, tech lead for AI initiatives at Starburst, talks about integrating AI workloads with the lakehouse architecture. From his software engineering roots to leading data engineering efforts, Alex shares insights on enhancing Starburst's platform to support AI applications, including an A…
AI's transformative impact on data engineering and analytics is reshaping how professionals create value, shifting focus from technical skills to strategic thinking and communication. In this episode of The Data Engineering Show, the bros talk with Sumit Gupta, Lead BI Engineer at Notion, about his journey through prominent tech companies, modern d…
The orchestration layer is foundational to building robust AI- and ML-powered data pipelines, especially in complex hybrid enterprise environments. IBM’s partnership with Astronomer reflects a strategic alignment to simplify and scale Airflow-based workflows across industries. In this episode, we’re joined by IBM’s Senior Product Manager, BJ Adesoj…
Amazon S3: The Backbone of Modern Data Systems
1:01:01
Summary In this episode of the Data Engineering Podcast Mai-Lan Tomsen Bukovec, Vice President of Technology at AWS, talks about the evolution of Amazon S3 and its profound impact on data architecture. From her work on compute systems to leading the development and operations of S3, Mai-Lan shares insights on how S3 has become a foundational element …
Efficient orchestration and maintainability are crucial for data engineering at scale. Gil Reich, Data Developer for Data Science at Wix, shares how his team reduced code duplication, standardized pipelines, and improved Airflow task orchestration using a Python-based framework built within the data science team. In this episode, Gil explains how t…
Summary In this episode of the Data Engineering Podcast Chakravarthy Kotaru talks about scaling data operations through standardized platform offerings. From his roots as an Oracle developer to leading the data platform at a major online travel company, Chakravarthy shares insights on managing diverse database technologies and providing databases a…
Modernizing Legacy Data Systems With Airflow at Procter & Gamble with Adonis Castillo Cordero
22:13
Legacy architecture and AI workloads pose unique challenges at scale, especially in a global enterprise with complex data systems. In this episode, we explore strategies to proactively monitor and optimize pipelines while minimizing downstream failures. Adonis Castillo Cordero, Senior Automation Manager at Procter & Gamble, joins us to share action…
Summary In this episode of the Data Engineering Podcast, host Tobias Macey welcomes back Shinji Kim to discuss the evolving role of semantic layers in the era of AI. As they explore the challenges of managing vast data ecosystems and providing context to data users, they delve into the significance of semantic layers for AI applications. They dive i…
Building reliable data pipelines starts with maintaining strong data quality standards and creating efficient systems for auditing, publishing and monitoring. In this episode, we explore the real-world patterns and best practices for ensuring data pipelines stay accurate, scalable and trustworthy. Joseph Machado, Senior Data Engineer at Netflix, jo…