Episode 142: Orchestrating Large and Small Projects With Apache Airflow

The Real Python Podcast

Jan 27, 2023 54m

Have you worked on a project that needed an orchestration tool? How do you define the workflow of an entire data pipeline or a messaging system with Python? This week on the show, Calvin Hendryx-Parker is back to talk about using Apache Airflow and orchestrating Python projects.

Calvin is the co-founder and CTO of Six Feet Up and a Python Web Conference co-organizer. He’s recently been working on a massive project that requires thousands of jobs involving transferring and transforming data. Through his research into orchestration systems, he found Apache Airflow.

Airflow is an open-source tool to define, schedule, and monitor workflows. The platform is pure Python and integrates with a wide variety of services. We discuss how workflows are defined by creating directed acyclic graphs (DAGs).
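To make the DAG idea concrete, here is a minimal sketch that uses only Python's standard library `graphlib` module, not Airflow's own API. The task names (`extract`, `transform`, `load`, `report`) and the `execution_order` helper are hypothetical and chosen purely for illustration. The sketch shows the core property an orchestrator relies on: each task runs only after the tasks it depends on, and a graph with a cycle is rejected.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}


def execution_order(graph):
    """Return one valid run order, or raise ValueError if the graph has a cycle."""
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # exc.args[1] holds the list of nodes that form the detected cycle.
        raise ValueError(f"not a DAG, cycle found: {exc.args[1]}") from exc


if __name__ == "__main__":
    # Dependencies always appear before the tasks that need them.
    print(execution_order(pipeline))
```

A real orchestrator adds scheduling, retries, and monitoring on top of this ordering step, but the "acyclic" requirement is what guarantees that a valid run order exists at all.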

Calvin talks about how a recent project outgrew the system and how his team built a clever solution using Python. We also discuss the upcoming Python Web Conference and what virtual attendees can expect.

Topics:

  • 00:00:00 – Introduction
  • 00:02:24 – Describing the large data pipeline
  • 00:04:38 – What format was the data in?
  • 00:06:04 – Was the format of the data changed for storage?
  • 00:09:34 – Data engineering and describing sources and targets
  • 00:11:29 – Apache Airflow orchestration and hitting limitations
  • 00:18:12 – Sponsor: CData Software
  • 00:18:54 – DAG: Directed acyclic graphs
  • 00:22:29 – Streaming data and other tool choices
  • 00:25:38 – Overcoming DAG Factory limitations
  • 00:31:49 – Another industry example for Airflow
  • 00:34:24 – Finding solutions as a consultancy
  • 00:35:12 – Is there a minimum-size project for Airflow?
  • 00:37:37 – Django under the hood
  • 00:38:31 – Video Course Spotlight
  • 00:39:58 – The Python Web Conference 2023
  • 00:44:24 – Do you have any upcoming conference talks?
  • 00:45:53 – How can people follow your work online?
  • 00:46:52 – IndyPy talk by Mariatta Wijaya
  • 00:48:01 – What are you excited about in the world of Python?
  • 00:51:45 – What do you want to learn next?
  • 00:53:22 – Thanks and goodbye

Show Links: