
NBA: From giant imports to Kafka

Teun Willems
7 minutes

WHAT IS NBA?

At Essent we are always trying to provide our customers with tips on how to save on their energy bill. One part of that is the banner in the app, which contains one of many relevant messages for a particular customer; internally, we call these messages actions. Actions range from an introductory call for a solar panel installation (like in Figure 1) to a reminder for a customer to spend their accumulated credit in our Thuisvoordeel shop.



The actionstore is the name of the backend that determines which action is served to each customer in real time. To facilitate this matching, we need data both on the actions that can be served and on the customers that will see these actions.
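As a rough illustration of that matching step, here is a minimal TypeScript sketch. The types, field names, and priority rule are hypothetical stand-ins, not the real actionstore model: the point is only that each action carries an eligibility check against customer data, and the most relevant eligible action wins.

```typescript
// Hypothetical sketch of the matching step: the names and fields
// below are illustrative, not the real actionstore data model.
interface Customer {
  hasSolarPanels: boolean;
  thuisvoordeelCredit: number;
}

interface Action {
  id: string;
  priority: number; // lower number = more relevant
  isEligible: (customer: Customer) => boolean;
}

// Pick the most relevant action the customer is eligible for.
function selectAction(actions: Action[], customer: Customer): Action | undefined {
  return actions
    .filter((a) => a.isEligible(customer))
    .sort((a, b) => a.priority - b.priority)[0];
}

const actions: Action[] = [
  { id: "solar-intro-call", priority: 1, isEligible: (c) => !c.hasSolarPanels },
  { id: "spend-credit", priority: 2, isEligible: (c) => c.thuisvoordeelCredit > 0 },
];

// A customer who already has solar panels but has credit left to spend:
const banner = selectAction(actions, { hasSolarPanels: true, thuisvoordeelCredit: 25 });
// banner?.id === "spend-credit"
```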


OLD SYSTEM

The old system at Essent relied on full table synchronizations: we select millions of rows from databases like Oracle and Postgres using AWS Glue or a TypeScript script in AWS Batch. We have been using this process for many things, and for most use cases it does the job. There are quite a few disadvantages, though:

  • Full data imports can take multiple hours, because we must import all rows every time. Most imports run once per day, some multiple times per day. The time spent importing adds up quickly, which costs a lot of money. 
  • We use a database transaction while importing, so that changes are only applied once an import has completed successfully. This does mean we must wait until all data is loaded before we can start using the new data. Even changing a single record takes at least a few hours to propagate, because we need to check all records again. 
  • If an importer process fails at any point, it needs to be restarted from the beginning. The importers complete successfully most of the time, so restarting them is not a big deal; in the worst case we keep showing the same actions as yesterday while the new data is imported. 
  • When we import and combine a lot of data (hundreds of millions of rows), we need AWS Glue with many workers to create a diff between runs. This diff is needed because otherwise we would have to insert, update, or delete every record in Postgres, which takes far longer. Glue is very quick at finding the differences, but it is also very expensive: we spent about 33% of our monthly AWS bill on Glue alone, on the same account that the entire backend and database run on.
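The diff between runs boils down to comparing two full snapshots keyed by id. A minimal in-memory TypeScript sketch of that idea (simplified: Glue does this distributed across many workers over hundreds of millions of rows, and the row shape here is made up):

```typescript
// In-memory sketch of diffing two full import snapshots.
// A Map stands in for the snapshot tables; the row shape is illustrative.
type Row = { id: string; value: string };

interface Diff {
  inserts: Row[];
  updates: Row[];
  deletes: string[]; // ids no longer present in the new snapshot
}

function diffSnapshots(previous: Map<string, Row>, current: Map<string, Row>): Diff {
  const diff: Diff = { inserts: [], updates: [], deletes: [] };
  for (const [id, row] of current) {
    const old = previous.get(id);
    if (old === undefined) diff.inserts.push(row);        // new row
    else if (old.value !== row.value) diff.updates.push(row); // changed row
  }
  for (const id of previous.keys()) {
    if (!current.has(id)) diff.deletes.push(id);          // removed row
  }
  return diff;
}
```

Only the rows in the diff need to be written to Postgres, instead of upserting every record on every run.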



NEW SYSTEM

The new system uses Apache Kafka for sending messages to the different systems. Kafka is an open-source distributed streaming platform which we have used with great success, as you might have seen in our other blog posts like Essent and Web data processing using Kafka and The Kafka serverless journey.

In this case, using Kafka had the following advantages:

  • A small change is processed within seconds. Remember, before we had to run an import that took multiple hours to complete.
  • Big changes can still be processed just fine. Although they might take a while, it is still much faster than imports because Kafka allows for parallelization. For the action store we have settled on 15 Kafka partitions, meaning up to 15 messages can be processed at the same time, which greatly increases speed.
  • If a processor fails to process a message, it can simply retry that batch. This won’t help against hard errors, but it does help with occasional hiccups in the database or network. Before, a single database or network failure meant we had to restart from the beginning, whereas with Kafka we can resume where we left off.
  • Because we have a central Kafka topic, we aren’t the only ones who can watch for changes. If we ever want another system to watch the topic, it can simply subscribe too.
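The parallelization point rests on Kafka's key-based partition assignment. A simplified sketch of the idea, assuming messages are keyed by customer id (Kafka's default partitioner actually uses murmur2 hashing; the basic string hash below is just a stand-in):

```typescript
// Simplified sketch of Kafka's key-based partition assignment.
// Kafka's default partitioner uses murmur2; a basic 32-bit string
// hash stands in here. Messages with the same key (e.g. a customer id)
// always land on the same partition, preserving per-customer ordering,
// while the 15 partitions can be consumed in parallel.
const PARTITIONS = 15;

function partitionFor(key: string, partitions: number = PARTITIONS): number {
  let hash = 0;
  for (const ch of key) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // keep it an unsigned 32-bit int
  }
  return hash % partitions;
}
```

Because the mapping is deterministic, all updates for one customer are processed in order by one consumer, and throughput scales with the number of partitions.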




This new system gives us a lot more flexibility. Any change can simply be put on the topic, and it will be imported in a few seconds at most, compared to a few hours at least before. There is also far less overhead: we can deploy one Lambda that handles the topic, whereas in the old situation we had one Lambda to check for changes and Glue scripts to do the actual import. This simplifies the process, saves on our AWS bill, and is easier to develop.
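The retry-and-resume behaviour described above can be sketched as a small wrapper around batch processing. This is a hedged illustration, not our production code: the handler, attempt count, and exponential backoff are assumptions; the Kafka-relevant part is that the offset is only committed after the batch succeeds, so a crash resumes from the last committed offset instead of restarting the whole import.

```typescript
// Sketch of retrying a batch on transient failures. On success the
// caller commits the consumer offset; on a hard error the exception
// propagates and processing resumes from the last committed offset
// after a restart.
async function processWithRetry<T>(
  batch: T[],
  handler: (batch: T[]) => Promise<void>,
  maxAttempts = 3,
): Promise<void> {
  for (let attempt = 1; ; attempt++) {
    try {
      await handler(batch);
      return; // success: the caller can now commit the offset
    } catch (err) {
      if (attempt >= maxAttempts) throw err; // hard error: give up
      // transient hiccup (database or network): back off briefly and retry
      await new Promise((resolve) => setTimeout(resolve, 100 * 2 ** attempt));
    }
  }
}
```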


This is just one of the improvements Kafka has enabled at Essent. We are going to do a lot more to streamline business processes, which we will share on this blog. Make sure to follow for more content, and see you next time!

 


Teun Willems

DevOps Engineer

Hi, I'm Teun, a DevOps engineer in the personalisation team. I mostly work on the backend that determines the offers and banners you see on the frontend. In my own time I play video games and go swimming.