
Apache Kafka is a distributed event store and stream-processing platform. It is an open-source system developed by the Apache Software Foundation and written in Java and Scala. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. Kafka can connect to external systems (for data import/export) via Kafka Connect, and provides the Kafka Streams libraries for stream-processing applications. Kafka uses a binary TCP-based protocol that is optimized for efficiency and relies on a "message set" abstraction that naturally groups messages together to reduce the overhead of the network round trip. This "leads to larger network packets, larger sequential disk operations, contiguous memory blocks [...] which allows Kafka to turn a bursty stream of random message writes into linear writes."
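The saving from grouping messages can be illustrated with a small simulation. This is a minimal sketch in plain Java, not Kafka's wire format or client code: the `BatchingSender` class, its `maxBatchBytes` threshold and the notion of counting "round trips" are hypothetical simplifications. The real Java producer exposes comparable controls through its `batch.size` and `linger.ms` settings.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: messages accumulate into a "message set" that is
// sent in one simulated network round trip once it would exceed a byte limit.
class BatchingSender {
    private final int maxBatchBytes;                          // assumed flush threshold
    private final List<byte[]> pending = new ArrayList<>();
    private int pendingBytes = 0;
    private int roundTrips = 0;                               // simulated requests sent
    private final List<Integer> batchSizes = new ArrayList<>(); // messages per request

    BatchingSender(int maxBatchBytes) {
        this.maxBatchBytes = maxBatchBytes;
    }

    void send(String message) {
        byte[] bytes = message.getBytes(StandardCharsets.UTF_8);
        // Flush the current set first if this message would push it over the limit.
        if (!pending.isEmpty() && pendingBytes + bytes.length > maxBatchBytes) {
            flush();
        }
        pending.add(bytes);
        pendingBytes += bytes.length;
    }

    void flush() {
        if (pending.isEmpty()) return;
        roundTrips++;                     // one request carries the whole set
        batchSizes.add(pending.size());
        pending.clear();
        pendingBytes = 0;
    }

    int roundTrips() { return roundTrips; }
    List<Integer> batchSizes() { return batchSizes; }

    public static void main(String[] args) {
        BatchingSender sender = new BatchingSender(64);
        for (int i = 0; i < 20; i++) {
            sender.send("event-" + i);    // 7 or 8 bytes each
        }
        sender.flush();                   // send whatever is left
        System.out.println("messages=20 roundTrips=" + sender.roundTrips()
                + " batches=" + sender.batchSizes());
    }
}
```

Running `main` prints `messages=20 roundTrips=3 batches=[9, 8, 3]`: twenty writes cost three simulated requests instead of twenty. The trade-off, as with the producer's `linger.ms`, is that a message may wait briefly before its set is sent.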


History

Kafka was originally developed at LinkedIn and was subsequently open-sourced in early 2011. Jay Kreps, Neha Narkhede and Jun Rao helped co-create Kafka (Li, S. (2020), "He Left His High-Paying Job At LinkedIn And Then Built A $4.5 Billion Business In A Niche You've Never Heard Of", Forbes, retrieved 8 June 2021). Graduation from the Apache Incubator occurred on 23 October 2012. Jay Kreps chose to name the software after the author Franz Kafka because it is "a system optimized for writing", and he liked Kafka's work.


Operation

Apache Kafka is a distributed, log-based messaging system that guarantees ordering within individual partitions rather than across an entire topic. Unlike queue-based systems, Kafka retains messages in a durable, append-only log, allowing multiple consumers to read at different offsets. Consumers can manage their offsets manually (automatic offset commits are also available), which gives them control over retries and failure handling. If a consumer fails to process a message, it can delay committing that message's offset, halting progress in that partition while other partitions remain unaffected. This partition-based design enables fault isolation and parallel processing, and ordering within a partition is preserved as long as the consumer handles that partition's records in sequence.

In 2025, Apache Kafka introduced "Queues for Kafka", which adds share groups as an alternative to consumer groups. Share groups provide queue-like semantics: consumers can cooperatively process records from the same partitions, with individual message acknowledgment and delivery tracking. Whereas a traditional consumer group assigns each partition exclusively to one consumer, a share group allows the number of consumers to exceed the partition count, making it well suited to work-queue patterns while retaining Kafka's durability and scalability. This addresses the common problem of "over-partitioning", in which users create more partitions than they otherwise need just to obtain enough consumer parallelism.
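The per-partition ordering and delayed-commit behavior can be modeled in a few lines. This is a simplified sketch, not Kafka's implementation: `PartitionedLog` and `OffsetTrackingConsumer` are hypothetical names, partitions are in-memory lists, and the key-to-partition mapping uses `String.hashCode` as a stand-in (Kafka's default partitioner hashes the serialized key with murmur2). The consumer commits an offset only after a record succeeds, so a failing record blocks its own partition but no other.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model: a topic is a fixed set of append-only partitions.
class PartitionedLog {
    private final List<List<String>> partitions = new ArrayList<>();

    PartitionedLog(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) partitions.add(new ArrayList<>());
    }

    // Records with the same key always land in the same partition, so their
    // relative order is preserved. String.hashCode stands in for murmur2.
    int append(String key, String value) {
        int p = Math.floorMod(key.hashCode(), partitions.size());
        partitions.get(p).add(value);
        return p;
    }

    int partitionCount() { return partitions.size(); }
    List<String> read(int partition) { return partitions.get(partition); }
}

// Hypothetical consumer that keeps one committed offset per partition.
class OffsetTrackingConsumer {
    private final PartitionedLog log;
    private final long[] committed;   // next offset to read, per partition

    OffsetTrackingConsumer(PartitionedLog log) {
        this.log = log;
        this.committed = new long[log.partitionCount()];
    }

    // Processes each partition from its committed offset. On failure the
    // offset is not advanced, so that partition stops (and retries the same
    // record on the next poll) while the other partitions keep going.
    void poll(Predicate<String> process) {
        for (int p = 0; p < committed.length; p++) {
            List<String> records = log.read(p);
            while (committed[p] < records.size()) {
                String record = records.get((int) committed[p]);
                if (!process.test(record)) break;   // do not commit; retry later
                committed[p]++;                      // "commit" the next offset
            }
        }
    }

    long committed(int partition) { return committed[partition]; }
}
```

For example, if key `"a"` receives `a1`, `bad`, `a3` and key `"b"` receives `b1`, `b2`, a poll whose handler rejects `bad` leaves partition "a" at offset 1 while partition "b" advances to offset 2. Later polls retry from `bad` onwards.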

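The difference between the two group types can be sketched as two assignment models. This is a hypothetical illustration, not Kafka's share-group protocol: `GroupModels`, `ShareGroupPartition` and the state names are invented for the example, and the sketch omits details of the real feature such as lock timeouts and delivery-attempt limits. A consumer group hands out whole partitions, so extra consumers sit idle. A share group hands out individual records, acknowledges them one by one, and makes a released record available again with an incremented delivery count.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class GroupModels {
    // Consumer group: each partition goes to exactly one consumer (round-robin
    // here), so consumers beyond the partition count receive nothing.
    static Map<Integer, List<Integer>> consumerGroupAssignment(int partitions, int consumers) {
        Map<Integer, List<Integer>> assignment = new HashMap<>();
        for (int c = 0; c < consumers; c++) assignment.put(c, new ArrayList<>());
        for (int p = 0; p < partitions; p++) assignment.get(p % consumers).add(p);
        return assignment;
    }
}

// Share group (simplified): any consumer may acquire the next available record
// of a partition; each record is acknowledged or released individually.
class ShareGroupPartition {
    enum State { AVAILABLE, ACQUIRED, ACKNOWLEDGED }

    private final List<String> records;
    private final State[] state;
    private final int[] deliveryCount;

    ShareGroupPartition(List<String> records) {
        this.records = records;
        this.state = new State[records.size()];
        this.deliveryCount = new int[records.size()];
        Arrays.fill(state, State.AVAILABLE);
    }

    // Returns the offset of the lowest available record, or -1 if none.
    int acquire() {
        for (int i = 0; i < records.size(); i++) {
            if (state[i] == State.AVAILABLE) {
                state[i] = State.ACQUIRED;
                deliveryCount[i]++;
                return i;
            }
        }
        return -1;
    }

    void acknowledge(int offset) { state[offset] = State.ACKNOWLEDGED; }
    void release(int offset) { state[offset] = State.AVAILABLE; }  // redeliver later
    int deliveryCount(int offset) { return deliveryCount[offset]; }
    String record(int offset) { return records.get(offset); }
}
```

With two partitions and four consumers, `consumerGroupAssignment` leaves consumers 2 and 3 with no work. By contrast, several consumers can call `acquire()` on the same `ShareGroupPartition` and each receives a different record.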

Kafka APIs


Connect API

Kafka Connect (or Connect API) is a framework for importing data from, and exporting data to, other systems. It was added in the Kafka 0.9.0.0 release and uses the Producer and Consumer APIs internally. The Connect framework itself executes so-called "connectors" that implement the actual logic for reading data from, or writing data to, other systems. The Connect API defines the programming interface that must be implemented to build a custom connector. Many open source and commercial connectors for popular data systems are already available. However, Apache Kafka itself does not include production-ready connectors.
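The division of labor between the framework and a connector can be sketched with simplified stand-in interfaces. These are hypothetical and are NOT the real `org.apache.kafka.connect` types, although they loosely mirror the shape of Connect's `SourceTask.poll()` and `SinkTask.put(...)` methods. The real interfaces also handle configuration, task assignment, offsets and schemas. Here a connector supplies only the read or write logic, and a hypothetical `MiniConnectRunner` plays the framework's role of driving it.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins for the connector pattern (not the real Connect API).
interface SimpleSourceTask {
    List<String> poll();               // next records read from the external system
}

interface SimpleSinkTask {
    void put(List<String> records);    // write a batch to the external system
}

// Example source: reads "rows" from an in-memory list, a few at a time.
class ListSourceTask implements SimpleSourceTask {
    private final Iterator<String> rows;
    private final int maxBatch;

    ListSourceTask(List<String> rows, int maxBatch) {
        this.rows = rows.iterator();
        this.maxBatch = maxBatch;
    }

    @Override
    public List<String> poll() {
        List<String> batch = new ArrayList<>();
        while (rows.hasNext() && batch.size() < maxBatch) batch.add(rows.next());
        return batch;
    }
}

// Example sink: collects records (a real sink might write to a database).
class CollectingSinkTask implements SimpleSinkTask {
    final List<String> written = new ArrayList<>();

    @Override
    public void put(List<String> records) { written.addAll(records); }
}

// The "framework" side: repeatedly polls the source and delivers each batch to
// the sink. In real Connect, source and sink connectors run independently with
// a Kafka topic between them; this sketch connects them directly for brevity.
class MiniConnectRunner {
    static int run(SimpleSourceTask source, SimpleSinkTask sink) {
        int polls = 0;
        while (true) {
            List<String> batch = source.poll();
            if (batch.isEmpty()) break;   // nothing more to read
            polls++;
            sink.put(batch);
        }
        return polls;
    }
}
```

With five rows and `maxBatch = 2`, the runner makes three non-empty polls, and the sink ends up holding all five rows in their original order.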


Streams API

Kafka Streams (or Streams API) is a stream-processing library written in Java. It was added in the Kafka 0.10.0.0 release. The library allows for the development of stateful stream-processing applications that are scalable, elastic, and fully fault-tolerant. The main API is a stream-processing domain-specific language (DSL) that offers high-level operators like filter, map, grouping, windowing, aggregation, joins, and the notion of tables. Additionally, the Processor API can be used to implement custom operators for a more low-level development approach. The DSL and the Processor API can also be combined. For stateful stream processing, Kafka Streams uses RocksDB to maintain local operator state. Because RocksDB can write to disk, the maintained state can be larger than available main memory. For fault tolerance, all updates to local state stores are also written to a topic in the Kafka cluster. This allows the state to be recreated by reading those topics and feeding all of the data back into RocksDB.
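To show what a filter, map, group and windowed-count pipeline computes, the sketch below uses `java.util.stream` over a finite list as a stand-in. It is an illustration only and does not use the Kafka Streams DSL, which operates on unbounded streams and also handles late-arriving records. `TumblingWindowCount`, `Event` and the `"key@windowStart"` result format are hypothetical. Tumbling windows are fixed-size and non-overlapping, so each timestamp belongs to exactly one window, `[start, start + size)`.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Illustration only: java.util.stream operators standing in for a
// filter -> map -> group -> windowed count pipeline.
class TumblingWindowCount {
    static final class Event {
        final long timestampMs;
        final String key;
        final String value;

        Event(long timestampMs, String key, String value) {
            this.timestampMs = timestampMs;
            this.key = key;
            this.value = value;
        }
    }

    // Start of the tumbling window that contains timestamp t.
    static long windowStart(long t, long sizeMs) {
        return t - Math.floorMod(t, sizeMs);
    }

    // Counts events with a non-empty value per (normalized key, window start).
    static Map<String, Long> countPerWindow(List<Event> events, long sizeMs) {
        return events.stream()
                .filter(e -> !e.value.isEmpty())                        // filter
                .map(e -> e.key.toLowerCase() + "@"
                        + windowStart(e.timestampMs, sizeMs))           // map to key + window
                .collect(Collectors.groupingBy(
                        k -> k, TreeMap::new, Collectors.counting()));  // group and count
    }
}
```

With 5-second windows, events for keys `"A"` at 1 s and `"a"` at 4 s both fall into window `a@0`, while an `"a"` event at 6 s starts a new count in `a@5000`.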

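The fault-tolerance mechanism (every local update is also written to a topic, and state is rebuilt by replaying that topic) can be modeled as follows. This is a simplified sketch: a `HashMap` stands in for RocksDB and a `List` stands in for the Kafka topic. `ChangelogBackedStore` and `WordCounter` are hypothetical names, not Kafka Streams classes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical changelog-backed key-value store (HashMap in place of RocksDB,
// List in place of the changelog topic).
class ChangelogBackedStore {
    static final class Change {
        final String key;
        final long value;

        Change(String key, long value) {
            this.key = key;
            this.value = value;
        }
    }

    private final Map<String, Long> local = new HashMap<>();
    private final List<Change> changelog;

    ChangelogBackedStore(List<Change> changelog) {
        this.changelog = changelog;
    }

    // Every local update is also appended to the changelog.
    void put(String key, long value) {
        local.put(key, value);
        changelog.add(new Change(key, value));
    }

    Long get(String key) {
        return local.get(key);
    }

    // A fresh instance rebuilds its state by replaying the changelog from the
    // beginning; later entries for a key overwrite earlier ones.
    static ChangelogBackedStore restore(List<Change> changelog) {
        ChangelogBackedStore store = new ChangelogBackedStore(changelog);
        for (Change c : changelog) store.local.put(c.key, c.value);
        return store;
    }
}

// A stateful word count that keeps its running totals in the store.
class WordCounter {
    private final ChangelogBackedStore store;

    WordCounter(ChangelogBackedStore store) {
        this.store = store;
    }

    void process(String word) {
        Long current = store.get(word);
        store.put(word, current == null ? 1 : current + 1);
    }
}
```

After counting `"kafka"`, `"streams"`, `"kafka"`, the changelog holds three entries, and a store restored from it reports 2 for `"kafka"` and 1 for `"streams"`. Only the latest value per key matters for restoration, which is why key-value changelog topics in Kafka Streams are typically log-compacted.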

See also

* RabbitMQ
* Redis
* NATS
* Apache Flink
* Apache Samza
* Apache Spark Streaming
* Data Distribution Service
* Enterprise Integration Patterns
* Enterprise messaging system
* Streaming analytics
* Event-driven SOA
* Hortonworks DataFlow
* Message-oriented middleware
* Service-oriented architecture

