Microservices have communicated with each other in different ways since their inception. Some have preferred to use HTTP REST APIs, but these come with their own queuing issues, while some have preferred older Message Queues, like RabbitMQ, which come with scaling and operational concerns.
Kafka-centric architectures aim to solve both problems.
In this article, I’ll explain how Apache Kafka improves upon the historical HTTP REST API/message queuing architectures used in microservices and how it further extends their capabilities.
A Tale of Two Camps
The first camp in our story is one where communication is handled by calling other services directly, often over HTTP REST APIs or some other form of Remote Procedure Calls (RPC).
The second camp, while borrowing from the Service-Oriented Architecture (SOA) concept of an Enterprise Service Bus, uses an intermediary that’s in charge of talking to the other services and operates as a message queue.
This role has often been accomplished by using a message broker like RabbitMQ. This way of communicating removes much of the communication burden from the individual services at the cost of an additional network hop.
Microservices Using HTTP REST APIs
HTTP REST APIs are a popular way of performing RPC between services. Its main benefits are simplified setup in the beginning and relative efficiency in sending messages.
However, this model requires its implementor to consider things like queuing and what to do if the amount of incoming requests exceeds the node’s capacity. For example, if you assume that you have a long chain of services preceding the one exceeding its capacity, all of the preceding services in the chain will need to have the same sort of back pressure handling to cope with the problem.
Additionally, this model requires that all of the individual HTTP REST API services need to be made highly available. In a long processing pipeline made of microservices, none of the microservices can afford to lose all of their component parts and this only works as long as at least one process from any given group is still operating normally.
This often requires load balancers to be put in front of these microservices. Also, service discovery is often a must since the different microservices need to find out where to call in order to communicate with each other.
One advantage of this model is that it provides for potentially excellent latency, as there are few middle-men in a given request path and those components, like web servers and load balancers, are highly performant and thoroughly battle-tested.
General dependency handling between different RPC microservices often quickly becomes more complex and eventually starts to slow development efforts. New approaches like Envoy proxy, which provides a service mesh, have also been introduced to solve these issues.
Although these address many of the load balancing and service discovery problems with the model, they necessitate an increase in the system’s overall complexity from just using simple, direct RPC calls.
Many companies start with only a few microservices talking to each other but eventually their systems grow and become more complex, creating a spaghetti of connections between each other.
Another way of building microservices communications revolves around the use of a message bus or message queuing systems. Old-fashioned Service-Oriented Architecture called these enterprise service buses (ESBs). Often, they’ve been message brokers like RabbitMQ or ActiveMQ.
Message brokers act as a centralized messaging service through which all of the microservices in question talk to each other, with the messaging service handling things like queuing and high availability to ensure reliable communication between services.
By supporting message queuing, messages can be received into a queue for later processing instead of dropping them when processing capacity is maxed out during peak demand.
However, many message brokers have demonstrated scalability limitations and caveats around how they handle message persistence and delivery in clustered environments.
Another large topic of conversation around message queues is how they behave in error cases, e.g. whether the message delivery is guaranteed to happen at least once, at most once, etc.
The semantics chosen depend on the message queue implementation, meaning you will have to familiarize yourself with its message delivery semantics.
Additionally, adding a message queue to an architecture adds a new component to be operated and maintained, and network latency is also increased by having one additional network hop for sent messages, which incurs extra latency.
Security matters are slightly simplified in this model by the centralized nature of the Access Control Lists (ACLs) that can be used with message queuing systems, giving central control over who can read and write what messages.
Centralization also brings some other benefits in regard to security. For example, instead of having to allow all services to connect to each other, you can allow only connections to the message queue service instead and firewall the rest of the services away from each other, decreasing the attack surface.
Advantage of a New Kafka-Centric Age
Apache Kafka is an event streaming platform created and open-sourced by LinkedIn. What makes it radically different from older message queuing systems is its ability to totally decouple senders from receivers in the sense that the senders need not know who will be receiving the message.
In many other message broker systems, foreknowledge of who would be reading the message was needed; this hampered the adoption of new use cases in traditional queuing systems.
When using Apache Kafka, messages are written to a log-style stream called a topic and the senders writing to the topic are completely oblivious as to who or what will actually read the messages from there. Consequently, it is business as usual to come up with a new use case to process Kafka topic contents for a new purpose.
Kafka is completely agnostic to the payload of the sent messages, allowing messages to be serialized in arbitrary ways, though most people still use JSON, AVRO, or Protobufs as their serialization format.
You can also easily set ACLs to limit which producers and consumers can write to and read from which topics in the system, giving you centralized security control over all messaging.
It’s common to see Kafka employed as a receiver of firehose-style data pipelines where the data volumes are potentially enormous. For example, Netflix reports that they process over two trillion messages a day using Kafka.
One of the important properties that consumers have is that when message load increases and the number of Kafka consumers changes due to faults or increased capacity, Kafka will automatically rebalance the processing load between the consumers. This moves the need to handle high availability explicitly from within the microservices to the Apache Kafka service itself.
The ability to handle streaming data extends Kafka’s capabilities beyond operating as a messaging system to a streaming data platform.
To top it all off, Apache Kafka provides fairly low latency when using it as a microservices communication bus, even though it incurs an additional network hop for all requests.
This powerful combination of low latency, auto-scaling, centralized management, and proven high availability allows Apache Kafka to extend its reach beyond microservices communications to many streaming real-time analytic use cases that you’ve yet to imagine.