Apache Samza

Apache Samza is an open-source, near-realtime, asynchronous computational framework for stream processing developed by the Apache Software Foundation in Scala and Java. It has been developed in conjunction with Apache Kafka. Both were originally developed by LinkedIn.


Overview

Samza allows users to build stateful applications that process data in real time from multiple sources, including Apache Kafka. It provides fault tolerance, isolation, and stateful processing. Unlike batch systems such as Apache Hadoop or Apache Spark, it provides continuous computation and output, which results in sub-second response times. Samza is one of the more mature products in the crowded field of real-time stream processing. It became an Apache project in 2013. Samza is used by multiple companies; the largest installation is at LinkedIn.
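The idea of stateful, continuous processing described above can be illustrated with a minimal sketch. This is plain Python and does not use Samza's API; the class `PageViewCounter`, the `run` helper, and the in-memory `changelog` list are hypothetical names chosen for illustration. The changelog replay is an assumption about how such state could be made recoverable after a failure, not a description of Samza's internals.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


class PageViewCounter:
    """Hypothetical stateful task: keeps a running count per key."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = defaultdict(int)  # local state
        self.changelog: List[Tuple[str, int]] = []      # log of state updates

    def process(self, key: str) -> Tuple[str, int]:
        """Handle one message and immediately emit the updated count."""
        self.counts[key] += 1
        update = (key, self.counts[key])
        self.changelog.append(update)  # record the update for later recovery
        return update

    @classmethod
    def restore(cls, changelog: Iterable[Tuple[str, int]]) -> "PageViewCounter":
        """Rebuild the state of a failed task by replaying its changelog."""
        task = cls()
        for key, count in changelog:
            task.counts[key] = count
            task.changelog.append((key, count))
        return task


def run(task: PageViewCounter, stream: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Process messages one at a time, yielding output as each one arrives."""
    for key in stream:
        yield task.process(key)


task = PageViewCounter()
outputs = list(run(task, ["home", "about", "home"]))
print(outputs)  # [('home', 1), ('about', 1), ('home', 2)]

# Simulate a restart: a fresh task recovers its state from the changelog.
recovered = PageViewCounter.restore(task.changelog)
print(recovered.process("home"))  # ('home', 3)
```

Each message produces output as soon as it is handled, in contrast to a batch job that emits results only after the whole input has been read. The count for a key depends on every earlier message with that key, which is what makes the task stateful.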


See also

* Apache Beam
* Druid (open-source data store)
* List of Apache Software Foundation projects
* Storm (event processor)


External links


Apache Samza website