Everything You Need to Know About Apache Kafka

3 min read · Aug 20, 2019

When it comes to Big Data, it is hard to fathom how much data is collected and put to use every minute of every day. Working with data at this volume raises two major challenges.

The first is how to collect such large volumes of data; the second is how to analyze the data once it has been collected.

This is where a messaging system comes in to overcome these challenges.

Let us understand messaging systems first
As the name suggests, a messaging system is responsible for sending messages or data from one party to another. In our daily lives, we do not worry about how a piece of information will reach the other person. We focus on the content of the message.

Similarly, in Big Data, a messaging system is responsible for transferring data from one application to another, so that the applications can focus on the data rather than on how to share it.

Distributed messaging is based on the concept of reliable message queuing. Messages are queued asynchronously between the client applications and the messaging system.

There are two messaging patterns in common use today: i) point-to-point and ii) publish-subscribe (pub-sub). Most messaging systems follow the pub-sub pattern.

Point-to-Point Messaging System
In a point-to-point system, messages are held in a queue. One or more consumers can read from the queue, but any particular message can be consumed by at most one consumer.

Once a consumer reads a message, it disappears from the queue. A typical example is an order processing system, in which each order is processed by exactly one order processor, while multiple order processors can work at the same time.
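The order processing example can be sketched in a few lines of Python. This is only a toy, in-memory illustration of the point-to-point idea, not how any real broker is implemented; the names `PointToPointQueue`, `process_orders`, and the processor names are made up for this example, and the round-robin hand-out is an assumption chosen to keep the sketch deterministic.

```python
from collections import deque


class PointToPointQueue:
    """Toy in-memory queue: each message goes to at most one consumer."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # Removing the message on read is what makes it disappear from the queue.
        return self._messages.popleft() if self._messages else None


def process_orders(queue, processors):
    """Hand out queued orders to the processors in round-robin order."""
    handled = {name: [] for name in processors}
    turn = 0
    while True:
        order = queue.receive()
        if order is None:  # queue is empty
            break
        name = processors[turn % len(processors)]
        handled[name].append(order)
        turn += 1
    return handled


orders = PointToPointQueue()
for order_id in ["order-1", "order-2", "order-3", "order-4"]:
    orders.send(order_id)

handled = process_orders(orders, ["processor-a", "processor-b"])
print(handled)
```

Both processors do work, yet no order appears in more than one processor's list, and the queue is empty once every order has been read.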

Publish-Subscribe Messaging System
In the publish-subscribe (pub-sub) system, messages are held in a topic. Unlike in the point-to-point system, consumers can subscribe to one or more topics and consume all the messages in those topics.

In the pub-sub system, message producers are known as publishers and message consumers are called subscribers. A real-life example is Dish TV, or any satellite-based channel subscription provider, which publishes different channels such as sports, movies, and music; individuals can then subscribe to their own set of channels and view them whenever they want.
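The channel subscription analogy can be sketched the same way. The snippet below is a minimal, in-memory illustration of pub-sub, assuming a hypothetical `PubSubBroker` class with `subscribe` and `publish` methods; these names are invented for the example and do not correspond to any real library.

```python
from collections import defaultdict


class PubSubBroker:
    """Toy in-memory broker: every subscriber of a topic gets every message."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Unlike a point-to-point queue, the message is delivered to ALL subscribers.
        for callback in self._subscribers[topic]:
            callback(topic, message)


broker = PubSubBroker()
alice_inbox, bob_inbox = [], []

# Alice subscribes to two channels, Bob to one.
broker.subscribe("sports", lambda topic, msg: alice_inbox.append((topic, msg)))
broker.subscribe("movies", lambda topic, msg: alice_inbox.append((topic, msg)))
broker.subscribe("sports", lambda topic, msg: bob_inbox.append((topic, msg)))

broker.publish("sports", "match highlights")
broker.publish("movies", "new release")
broker.publish("music", "top 10")  # no subscribers, so nobody receives it

print(alice_inbox)
print(bob_inbox)
```

The "sports" message reaches both Alice and Bob, which is the key difference from the point-to-point pattern, where a message is consumed by a single consumer.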

What is Kafka?
Apache Kafka is a distributed publish-subscribe messaging system and a robust queue that can handle a high volume of data and enables users to pass messages from one endpoint to another.

Kafka is suitable for both offline and online message consumption. Kafka messages are persisted on disk and replicated within the cluster to prevent data loss. Kafka is built on top of the ZooKeeper synchronization service.
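To see why keeping messages on disk helps with both online and offline consumption, consider the rough sketch below. It is a deliberately simplified, single-file imitation of an append-only log and is not Kafka's actual storage format; the `FileLog` class, the file name `events.log`, and the use of a line index as the read position are all assumptions made for illustration, and replication is left out entirely.

```python
import os
import tempfile


class FileLog:
    """Toy append-only log on disk: reading never deletes a message."""

    def __init__(self, path):
        self.path = path
        open(self.path, "a", encoding="utf-8").close()  # create the file if missing

    def append(self, message):
        # Assumes messages contain no newline characters.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(message + "\n")

    def read_from(self, offset):
        """Return every message whose position in the log is >= offset."""
        with open(self.path, encoding="utf-8") as f:
            return [line.rstrip("\n") for line in f][offset:]


log_dir = tempfile.mkdtemp()
log = FileLog(os.path.join(log_dir, "events.log"))
for event in ["signup", "login", "purchase"]:
    log.append(event)

online_view = log.read_from(2)   # a consumer that has kept up reads only the newest message
offline_view = log.read_from(0)  # a batch job can replay the whole log later

print(online_view)
print(offline_view)
```

Because a read does not remove anything, a consumer that keeps up with the stream and a batch job that runs much later can both work from the same stored messages.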

Kafka integrates well with Apache Storm and Spark for real-time streaming data analysis.

Conclusion
Kafka is designed for distributed, high-throughput systems, and it works well as a replacement for a more traditional message broker. Compared to other messaging systems, Kafka offers better throughput, built-in partitioning, replication, and inherent fault tolerance, which makes it a good fit for large-scale message processing applications.

Written by A Smith