Written by Sami Alashabi and Flavius Fernandes
Essent is well-known for its impressive stature in the Dutch market. With a history dating back to 1999, the company has always explored new ways of doing things and kept up with the latest trends. This year, Essent embarked on a new journey: exploring Kafka and integrating it within the organization. Kafka's possibilities have always impressed us, and we are confident that this technology can help Essent grow even further. This week we take a deep dive into how we would like to integrate Kafka and prepare it for future use-cases.
What is Apache Kafka?
Kafka is a distributed streaming system used for stream processing, real-time data pipelines, and data integration at scale. It allows the creation of multiple topics, with replication if needed, that store streaming data for a configurable retention period while it is processed downstream by consumers. Kafka is highly scalable and fault-tolerant, and provides high-level APIs for producers and consumers. Many organizations use Kafka for a variety of reasons, including:
- Providing real-time dashboards/metrics
- Ingesting data from multiple sources for further analysis
- As a message bus to connect different microservices
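To make the producer/consumer model above concrete, here is a minimal sketch using the Confluent Python client. The broker address, topic name, and group id are illustrative assumptions, not Essent's configuration; the Kafka calls are kept inside functions so the serialization logic can be read on its own.

```python
import json

# Illustrative settings -- broker and topic are placeholders.
BROKER = "localhost:9092"
TOPIC = "demo-events"

def encode_event(event: dict) -> bytes:
    """Serialize an event to the JSON bytes Kafka will store."""
    return json.dumps(event, sort_keys=True).encode("utf-8")

def produce_one(event: dict) -> None:
    """Publish a single event to the topic (requires a reachable broker)."""
    from confluent_kafka import Producer
    producer = Producer({"bootstrap.servers": BROKER})
    producer.produce(TOPIC, value=encode_event(event))
    producer.flush()  # block until the broker acknowledges delivery

def consume_forever() -> None:
    """Read events from the topic as part of a consumer group."""
    from confluent_kafka import Consumer
    consumer = Consumer({
        "bootstrap.servers": BROKER,
        "group.id": "demo-readers",
        "auto.offset.reset": "earliest",
    })
    consumer.subscribe([TOPIC])
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None or msg.error():
            continue
        print(json.loads(msg.value()))
```

Because the topic is replicated and retained, several independent consumer groups can each read the same stream at their own pace.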
If you would like to read more on Kafka, the official Apache Kafka documentation is a good starting point.
Why Kafka is a good idea for Essent
There are many benefits that Kafka can bring to an organization like Essent. It is a fast, scalable, and reliable platform that can handle large volumes of data with ease. In addition, Kafka is easy to use, has a wide range of applications, and is an industry-leading event streaming platform. This makes it a perfect fit for Essent, which is always looking for new ways to improve its operations.
Let’s look at a real-world example. At Essent we favour breaking our components up into small asynchronous pieces to increase the load we can handle and make it easier for the teams to deliver new functionality. One of the hurdles to overcome is the sharing of data between these individual components. This is where Kafka comes in. We broadcast a “Domain Event” when the state of a component has changed. Other components can listen to these events and can update their own state, resulting in an eventually consistent landscape.
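A domain event of this kind might be sketched as follows. The envelope fields, aggregate name, and topic name are hypothetical illustrations, not our actual event contract; only the pattern of broadcasting state changes is taken from the text above.

```python
import json
import uuid
from datetime import datetime, timezone

def domain_event(aggregate: str, event_type: str, payload: dict) -> dict:
    """Build a domain-event envelope; all field names are illustrative."""
    return {
        "eventId": str(uuid.uuid4()),
        "eventType": event_type,
        "aggregate": aggregate,
        "occurredAt": datetime.now(timezone.utc).isoformat(),
        "payload": payload,
    }

# A component owning contracts might broadcast on a state change:
event = domain_event(
    aggregate="contract",
    event_type="ContractActivated",
    payload={"contractId": "C-123", "tariff": "green-fixed"},
)
serialized = json.dumps(event)
# "serialized" would be produced to a topic such as "domain.contract.events";
# other components consume it and update their own state, which is what
# yields the eventually consistent landscape described above.
```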
Our other use-case is the ingestion of streaming data from, for example, our front ends to our Data Lake in a quicker and more efficient manner. Kafka is extremely capable of managing these large streams of data and will allow Essent to handle these high volumes of data with ease. This is a major advantage for the company, as it often must deal with large amounts of data.
Having promising use-cases is one thing; integrating new software while ensuring quality and standards are retained is a substantial challenge in itself. To allow the creation of a quality gate that enforces the required standards, while keeping the development of a scalable architecture in mind, the following solution was created:
As illustrated in the figure above, data can originate from multiple sources, e.g. web or mobile application client interactions. In this example, an AWS Lambda was created to simulate a data source, streaming continuous data to the API Gateway via API calls.
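The simulated source might look roughly like the hypothetical Lambda below, which POSTs synthetic events to the API Gateway endpoint. The endpoint URL and event fields are placeholders; only the stdlib is used so the sketch stays self-contained.

```python
import json
import urllib.request

# Placeholder endpoint -- not a real API Gateway URL.
ENDPOINT = "https://example.execute-api.eu-west-1.amazonaws.com/prod/events"

def build_request(event: dict) -> urllib.request.Request:
    """Wrap one synthetic event in the POST request the simulator sends."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def handler(lambda_event, context):
    """Hypothetical Lambda handler streaming a batch of synthetic events."""
    count = lambda_event.get("count", 10)
    for i in range(count):
        req = build_request({"sequence": i, "source": "simulator"})
        with urllib.request.urlopen(req) as resp:  # one API call per event
            resp.read()
    return {"sent": count}
```

In practice the Lambda would be invoked on a schedule (e.g. an EventBridge rule) to keep a continuous stream flowing into the pipeline.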
The producer was created using the Confluent Python Client for Apache Kafka, combined with API Gateway and AWS Lambda, giving us an asynchronous, scalable infrastructure with low maintenance. Furthermore, this solution gave us the opportunity to create a quality gate in which we can set standards for the events received, respond with a concise failure message on schema drift, and/or apply throttling when needed.
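The quality gate can be pictured as a validation step in front of the produce call. The required fields below are hypothetical; in practice the contract would live in a schema registry rather than in code, and the Kafka produce itself is left as a comment so the sketch runs without a broker.

```python
import json

# Hypothetical event contract enforced by the quality gate.
REQUIRED_FIELDS = {"eventId": str, "eventType": str, "payload": dict}

def quality_gate(raw_body: str):
    """Validate an incoming event; return (event, None) or (None, error)."""
    try:
        event = json.loads(raw_body)
    except json.JSONDecodeError:
        return None, "body is not valid JSON"
    for field, expected in REQUIRED_FIELDS.items():
        if field not in event:
            return None, f"missing required field '{field}'"
        if not isinstance(event[field], expected):
            return None, f"field '{field}' must be of type {expected.__name__}"
    return event, None

def lambda_handler(api_event, context):
    """Sketch of the producer Lambda sitting behind API Gateway."""
    event, error = quality_gate(api_event["body"])
    if error:
        # Concise failure message back to the caller on schema drift.
        return {"statusCode": 400, "body": json.dumps({"error": error})}
    # With a valid event, produce it to Kafka via confluent_kafka
    # (omitted here so the sketch stays runnable without a broker).
    return {"statusCode": 202, "body": json.dumps({"accepted": event["eventId"]})}
```

Throttling would sit in front of this handler, e.g. as API Gateway rate limits, rather than inside the validation code.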
Confluent offers multiple managed connectors to consume data available in Kafka topics. The Amazon S3 Sink connector best fit our needs to store the streaming data into the data lake.
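For a self-managed deployment, an S3 sink connector configuration might look roughly like the following; the connector name, topic, bucket, and sizing values are placeholders, not our production settings.

```json
{
  "name": "datalake-s3-sink",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "topics": "domain-events",
    "s3.bucket.name": "example-data-lake",
    "s3.region": "eu-west-1",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
    "flush.size": "1000",
    "tasks.max": "2"
  }
}
```

With a fully managed Confluent connector the same choices (topic, bucket, format, flush size) are made through the Confluent UI or API instead of a hand-written config file.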
The idea behind the creation of the architecture was to offer an asynchronous scalable and reliable streaming data processing platform with Kafka, that could easily be integrated into any company's ecosystem. We also planned to use this opportunity to create a quality gate that would allow us to enforce the required standards and improve the overall quality of the data received.
The integration process was not without its challenges, but we are confident that we will be able to overcome them. We are currently in the process of testing Kafka and making sure that it works well with the rest of the organization. So far, the results have been promising and we are optimistic about the future.
In terms of future endeavors there are three main pillars that could be useful: Kafka streaming, business events, and getting data out of SAP. Kafka streaming means processing data as it arrives, rather than waiting to load all the data into memory and then processing it. This makes Kafka an ideal tool for streaming data, real-time data pipelines, and data integration at scale. Kafka streaming can be used for a variety of purposes, such as creating real-time dashboards/metrics, ingesting data from multiple sources into Hadoop for further analysis, or serving as a message bus to connect different microservices.
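The per-record processing model behind Kafka streaming can be illustrated without Kafka at all: a metric is updated incrementally as each event arrives, instead of after a batch load. The event shapes below are invented for the sketch.

```python
from collections import Counter

def rolling_counts(stream):
    """Process events one at a time as they arrive, yielding an
    updated metric snapshot after every event (no batch loading)."""
    counts = Counter()
    for event in stream:
        counts[event["type"]] += 1
        yield dict(counts)  # snapshot that could feed a live dashboard

# Simulate a small stream of incoming events.
events = [{"type": "click"}, {"type": "view"}, {"type": "click"}]
snapshots = list(rolling_counts(iter(events)))
# snapshots[-1] reflects everything seen so far: {"click": 2, "view": 1}
```

A Kafka Streams (or comparable) topology applies exactly this idea continuously, with the added guarantees of partitioned, fault-tolerant state.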
Second, in a quest to move away from traditional middleware for SAP integrations, we explored different possibilities to get data out of SAP and directly into our Data Lake. After some in-depth research and a couple of PoCs, we zeroed in on a solution which leverages Kafka-native functionalities, thus providing benefits such as high performance, high scalability, and exactly-once semantics.
In the following months, Kafka will have a big role within Essent. As many of our engineers get familiar with the software and experience its benefits first-hand – as described earlier – the future for Kafka at Essent seems bright. Undoubtedly, we will run into problems and “Eureka!” moments, all of which will be shared on our blog. Make sure to follow along with our journey, and for now, until next time!