Apache Airflow is an open-source workflow management platform for data engineering pipelines. It started at Airbnb in October 2014 as a solution to manage the company's increasingly complex workflows. Creating Airflow allowed Airbnb to programmatically author and schedule their workflows and monitor them via the built-in Airflow user interface. From the beginning, the project was made open source, becoming an Apache Incubator project in March 2016 and a top-level Apache Software Foundation project in January 2019.
Airflow is written in Python, and workflows are created via Python scripts. Airflow is designed under the principle of "configuration as code". While other "configuration as code" workflow platforms exist using markup languages like XML, using Python allows developers to import libraries and classes to help them create their workflows.
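To illustrate what "configuration as code" buys over static markup, the following is a minimal sketch in plain Python. It does not use Airflow's own API: the table names, the `build_workflow` helper, and the task-naming scheme are all hypothetical, chosen only to show that a workflow definition can be generated with ordinary loops and functions instead of being written out by hand.

```python
# Hypothetical table names, used only for illustration.
TABLES = ["users", "bookings", "payments"]


def build_workflow(tables):
    """Return a mapping of task name -> set of upstream task names.

    This is an illustrative data structure, not Airflow's DAG class.
    """
    workflow = {"start": set()}
    for table in tables:
        extract = f"extract_{table}"
        load = f"load_{table}"
        workflow[extract] = {"start"}   # every extract waits for "start"
        workflow[load] = {extract}      # each load waits for its extract
    # A final task that waits for every load task to finish.
    workflow["report"] = {f"load_{table}" for table in tables}
    return workflow


workflow = build_workflow(TABLES)

for task, upstream in workflow.items():
    print(task, "<-", sorted(upstream))
```

Adding a fourth table means appending one string to `TABLES`. An equivalent static markup file would need two new task entries and an updated dependency list for the final task.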
Overview
Airflow uses directed acyclic graphs (DAGs) to manage workflow orchestration. Tasks and dependencies are defined in Python and then Airflow manages the scheduling and execution. DAGs can be run either on a defined schedule (e.g. hourly or daily) or based on external event triggers (e.g. a file appearing in Hive). Previous DAG-based schedulers like Oozie and Azkaban tended to rely on multiple configuration files and file system trees to create a DAG, whereas in Airflow, DAGs can often be written in one Python file.
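The way a DAG constrains execution order can be sketched with Python's standard-library `graphlib` module (Python 3.9 or later). This is not Airflow's scheduler or API. The task names and the `execution_order` and `has_cycle` helpers are assumptions made for illustration. The sketch only shows the underlying idea: tasks run after their upstream dependencies, and a dependency cycle makes the graph invalid.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical tasks: each key maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "notify": {"load"},
}


def execution_order(deps):
    """Return one valid order in which the tasks could run."""
    return list(TopologicalSorter(deps).static_order())


def has_cycle(deps):
    """Return True if the dependency graph is not acyclic."""
    try:
        execution_order(deps)
    except CycleError:
        return True
    return False


order = execution_order(dependencies)
print(order)
print(has_cycle({"a": {"b"}, "b": {"a"}}))
```

Because the example graph is a single chain, only one ordering is possible. A graph with independent branches has several valid orderings, and a scheduler can run those branches in parallel.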
Managed providers
Three notable providers offer ancillary services around the core open-source project. Astronomer has built a SaaS tool and Kubernetes-deployable Airflow stack that assists with monitoring, alerting, devops, and cluster management. Cloud Composer is a managed version of Airflow that runs on Google Cloud Platform (GCP) and integrates well with other GCP services. Since November 2020, Amazon Web Services has offered Managed Workflows for Apache Airflow.