Apache Storm

Apache Storm is a distributed stream processing computation framework written predominantly in the Clojure programming language. Originally created by Nathan Marz and team at BackType, the project was open sourced after BackType was acquired by Twitter. It uses custom created "spouts" and "bolts" to define information sources and manipulations to allow batch, distributed processing of streaming data. The initial release was on 17 September 2011. A Storm application is designed as a "topology" in the shape of a directed acyclic graph (DAG) with spouts and bolts acting as the graph vertices. Edges on the graph are named streams and direct data from one node to another. Together, the topology acts as a data transformation pipeline. At a superficial level the general topology structure is similar to a MapReduce job, with the main difference being that data is processed in real time as opposed to in individual batches. Additionally, Storm topologies run indefinitely until killed, while a MapReduce job DAG must eventually end. Storm entered Apache incubation in September 2013 and became an Apache Top-Level Project in September 2014.
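The topology idea described above can be illustrated with a small in-memory model. This is only a sketch and does not use the Storm API: the `Topology` class, its `set_spout`, `set_bolt`, and `run` methods, and the word-count bolts are all hypothetical names chosen for illustration. Unlike a real Storm topology, which runs until killed, this toy version stops when its finite spout is exhausted.

```python
from collections import defaultdict, deque


class Topology:
    """Toy in-memory model of a spout/bolt DAG (hypothetical, not the Storm API)."""

    def __init__(self):
        self.spouts = {}                  # spout name -> iterable of tuples
        self.bolts = {}                   # bolt name -> function(tuple) -> list of tuples
        self.streams = defaultdict(list)  # source vertex -> downstream bolt names (edges)

    def set_spout(self, name, source):
        self.spouts[name] = source

    def set_bolt(self, name, fn, inputs):
        self.bolts[name] = fn
        for src in inputs:
            self.streams[src].append(name)

    def run(self):
        """Push every spout tuple through the graph; return tuples emitted by terminal vertices."""
        outputs = defaultdict(list)
        queue = deque()
        for name, source in self.spouts.items():
            for tup in source:
                queue.append((name, tup))
        while queue:
            src, tup = queue.popleft()
            downstream = self.streams.get(src, [])
            if not downstream:
                outputs[src].append(tup)  # no outgoing stream: collect as final output
                continue
            for bolt in downstream:
                for out in self.bolts[bolt](tup):
                    queue.append((bolt, out))
        return dict(outputs)


def split_sentence(tup):
    """Bolt: split a one-field sentence tuple into one-field word tuples."""
    return [(word,) for word in tup[0].split()]


def make_counter():
    """Build a stateful counting bolt; also return its counts for inspection."""
    counts = defaultdict(int)

    def count(tup):
        counts[tup[0]] += 1
        return [(tup[0], counts[tup[0]])]

    return count, counts


def build_word_count(sentences):
    """Wire spout -> split bolt -> count bolt."""
    topo = Topology()
    count_bolt, counts = make_counter()
    topo.set_spout("sentences", [(s,) for s in sentences])
    topo.set_bolt("split", split_sentence, inputs=["sentences"])
    topo.set_bolt("count", count_bolt, inputs=["split"])
    return topo, counts


topo, counts = build_word_count(["the cow jumped", "the moon"])
topo.run()
print(dict(counts))  # {'the': 2, 'cow': 1, 'jumped': 1, 'moon': 1}
```

Each bolt sees one tuple at a time and may keep state between tuples, which is the main contrast with a batch job that processes a complete data set and then ends.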


Development

Apache Storm is developed under the Apache License, a permissive license that most companies are able to use. Git is used for version control and Atlassian JIRA for issue tracking, under the Apache Incubator program.


Apache Storm architecture

The Apache Storm cluster comprises the following critical components:
* Nodes: There are two types of nodes, master nodes and worker nodes. A master node runs a daemon called Nimbus, which assigns tasks to machines and monitors their performance. A worker node runs a daemon called Supervisor, which starts and stops worker processes on that node to carry out the tasks Nimbus has assigned to it. Storm relies on ZooKeeper to coordinate the cluster and track its state; Nimbus and the Supervisors communicate through it.
* Components: Storm's core abstractions are the topology, the stream, the spout, and the bolt. A topology is a network of spouts and bolts connected by streams. A stream is an unbounded sequence of tuples. A spout is a source of data streams: it converts incoming data into tuples and emits them to bolts, which process them.
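The division of labour between the master node, the coordination service, and the worker nodes can be sketched as follows. This is a simplified model, not Storm's scheduler: the round-robin policy, the `CoordinationStore` class standing in for ZooKeeper, and the function names `assign_tasks` and `supervisor_tasks` are all assumptions made for illustration.

```python
from itertools import cycle


def assign_tasks(tasks, supervisors):
    """Nimbus stand-in: spread tasks over supervisors round-robin (assumed policy)."""
    if not supervisors:
        raise ValueError("no supervisors available")
    assignments = {sup: [] for sup in supervisors}
    for task, sup in zip(tasks, cycle(supervisors)):
        assignments[sup].append(task)
    return assignments


class CoordinationStore:
    """Stand-in for ZooKeeper: a shared key-value store both sides can access."""

    def __init__(self):
        self._data = {}

    def write(self, key, value):
        self._data[key] = value

    def read(self, key, default=None):
        return self._data.get(key, default)


def supervisor_tasks(store, supervisor_id):
    """Supervisor stand-in: look up the tasks assigned to this node."""
    return store.read("assignments", {}).get(supervisor_id, [])


store = CoordinationStore()
store.write(
    "assignments",
    assign_tasks(["spout-0", "split-0", "split-1", "count-0"], ["sup-a", "sup-b"]),
)
print(supervisor_tasks(store, "sup-a"))  # ['spout-0', 'split-1']
```

The master and the workers never call each other directly in this sketch; they only read and write the shared store, which mirrors the role the text gives to ZooKeeper.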


Peer platforms

Storm is one of dozens of stream processing engines; for a more complete list, see Stream processing. Twitter announced Heron, which is API compatible with Storm, on June 2, 2015. Other comparable streaming data engines include Spark Streaming and Flink.


See also

* C++ AMP
* Data parallelism
* Lambda architecture
* Message passing
* OpenMP
* OpenCL
* OpenHMPP
* Parallel computing
* TPL
* Thread (computing)


External links


Project Homepage