Apache Beam is an open source unified programming model to define and execute data processing pipelines, including ETL, batch and stream (continuous) processing.
Beam pipelines are defined using one of the provided SDKs and executed on one of Beam's supported ''runners'' (distributed processing back-ends), including Apache Flink, Apache Samza, Apache Spark, and Google Cloud Dataflow.
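The separation between defining a pipeline and executing it on a runner can be illustrated with a minimal sketch. The code below is a toy model in plain Python and does not use the Beam SDK: the names `ToyPipeline`, `split_words`, `count_per_element`, and `run_locally` are hypothetical and chosen only for illustration. It assumes a pipeline can be modelled as an ordered list of transforms, and a runner as a function that decides how those transforms are executed.

```python
from collections import Counter
from typing import Callable, Iterable, List

# A transform maps an iterable of elements to a new iterable of elements.
Transform = Callable[[Iterable], Iterable]


class ToyPipeline:
    """Hypothetical stand-in for a pipeline: an ordered list of transforms."""

    def __init__(self) -> None:
        self.transforms: List[Transform] = []

    def apply(self, transform: Transform) -> "ToyPipeline":
        # Record the transform; nothing is executed at definition time.
        self.transforms.append(transform)
        return self


def split_words(lines: Iterable[str]) -> Iterable[str]:
    """Emit every whitespace-separated word of every input line."""
    for line in lines:
        yield from line.split()


def count_per_element(words: Iterable[str]) -> Iterable[tuple]:
    """Count occurrences of each word, sorted by word for stable output."""
    return sorted(Counter(words).items())


def run_locally(pipeline: ToyPipeline, source: Iterable) -> list:
    """Hypothetical 'runner': executes the transforms in order, in-process."""
    data = source
    for transform in pipeline.transforms:
        data = transform(data)
    return list(data)


# The pipeline is defined once, independently of how it will be executed.
word_count = ToyPipeline().apply(split_words).apply(count_per_element)

# A different runner function could execute the same definition elsewhere.
result = run_locally(word_count, ["beam flink spark", "beam spark", "beam"])
print(result)  # [('beam', 3), ('flink', 1), ('spark', 2)]
```

In this sketch, swapping `run_locally` for another function with the same signature would change where and how the work runs without touching the pipeline definition, which is the role the runners listed above play for real Beam pipelines.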
History
Apache Beam is one implementation of the model described in the Dataflow model paper. The Dataflow model is based on previous work on distributed processing abstractions at Google, in particular on FlumeJava and MillWheel.
Google released an open SDK implementation of the Dataflow model in 2014, along with an environment to execute Dataflow pipelines locally (non-distributed) as well as in the Google Cloud Platform service.
Timeline
Apache Beam makes minor releases every 6 weeks.
See also
* List of Apache Software Foundation projects