HOME

TheInfoList



OR:

Apache Beam is an
open source Open source is source code that is made freely available for possible modification and redistribution. Products include permission to use the source code, design documents, or content of the product. The open-source model is a decentralized sof ...
unified programming model to define and execute data processing pipelines, including ETL,
batch Batch may refer to: Food and drink * Batch (alcohol), an alcoholic fruit beverage * Batch loaf, a type of bread popular in Ireland * A dialect term for a bread roll used in North Warwickshire, Nuneaton and Coventry, as well as on the Wirra ...
and
stream A stream is a continuous body of water, body of surface water Current (stream), flowing within the stream bed, bed and bank (geography), banks of a channel (geography), channel. Depending on its location or certain characteristics, a stream ...
(continuous) processing. Beam Pipelines are defined using one of the provided SDKs and executed in one of the Beam’s supported ''runners'' (
distributed processing A distributed system is a system whose components are located on different networked computers, which communicate and coordinate their actions by passing messages to one another from any system. Distributed computing is a field of computer sci ...
back-ends) including
Apache Flink Apache Flink is an open-source, unified stream-processing and batch-processing framework developed by the Apache Software Foundation. The core of Apache Flink is a distributed streaming data-flow engine written in Java and Scala. Flink exec ...
,
Apache Samza Apache Samza is an open-source, near-realtime, asynchronous computational framework for stream processing developed by the Apache Software Foundation in Scala and Java. It has been developed in conjunction with Apache Kafka. Both were originally ...
,
Apache Spark Apache Spark is an open-source unified analytics engine for large-scale data processing. Spark provides an interface for programming clusters with implicit data parallelism and fault tolerance. Originally developed at the University of Californi ...
, and
Google Cloud Dataflow Google Cloud Dataflow is a fully managed service for executing Apache Beam pipelines within the Google Cloud Platform ecosystem. History Google Cloud Dataflow was announced in June, 2014 and released to the general public as an open beta in Apr ...
.


History

Apache Beam is one implementation of the Dataflow model paper. The Dataflow model is based on previous work on distributed processing abstractions at Google, in particular on FlumeJava and Millwheel. Google released an open SDK implementation of the Dataflow model in 2014 and an environment to execute Dataflows locally (non-distributed) as well as in the Google Cloud Platform service.


Timeline

Apache Beam makes minor releases every 6 weeks.


See also

*
List of Apache Software Foundation projects This list of Apache Software Foundation projects contains the software development projects of the Apache Software Foundation (ASF). Besides the projects, there are a few other distinct areas of Apache: *Incubator: for aspiring ASF projects *Attic ...


References

{{DEFAULTSORT:Beam Apache Software Foundation Apache Software Foundation projects Big data products Cluster computing Distributed stream processing Google software Hadoop Java platform Free software programmed in Java (programming language)