I joined Essent on a day in March that, as the story goes, was cold and stormy. While my memory of the weather may not be entirely accurate, it certainly makes for a good beginning 😊
I started in the newly formed Proposition team. We were tasked with building a new software product for our landscape.
We embraced Essent's new way of working, with our event-driven architecture (EDA) supported by Confluent Kafka at the core.
We developed AWS Lambda functions written in TypeScript, housed within a monorepo, and deployed through the Serverless Framework. To improve the developer experience, we equipped ourselves with an array of tools, including numerous annotations that streamline the deployment of our Lambdas and their integration with API Gateway, which is crucial for the smooth development of APIs.
I had worked with an EDA and with Apache Kafka before, but Confluent Kafka was new to me. Combined with the Serverless Framework and the tooling in use, this posed its own set of challenges. I embarked on a journey.
WHAT IS APACHE KAFKA?
For those unfamiliar with Apache Kafka, it's an open-source distributed streaming platform that plays a critical role in event-driven architectures (EDAs). Imagine it as a marketplace where on one end, producers create and send out messages, and on the other, consumers pick and process these messages. Kafka ensures that this exchange happens smoothly through something called 'topics,' which are like designated areas in the marketplace for specific types of messages. Each topic can be accessed by multiple producers and consumers, making the system highly scalable.
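The marketplace analogy can be made concrete with a toy, in-memory sketch. To be clear, this is not Kafka itself (a real client talks to a running broker, and Kafka adds partitions, offsets, persistence, and consumer groups on top); it only illustrates the publish/subscribe flow between producers, topics, and consumers:

```typescript
// A toy in-memory "broker": topics map to lists of subscriber callbacks.
// Purely illustrative; real Kafka clients talk to a broker over the network.
type Handler = (message: string) => void;

class ToyBroker {
  private topics = new Map<string, Handler[]>();

  // A consumer registers interest in a topic.
  subscribe(topic: string, handler: Handler): void {
    const handlers = this.topics.get(topic) ?? [];
    handlers.push(handler);
    this.topics.set(topic, handlers);
  }

  // A producer publishes a message; every subscriber of the topic receives it.
  publish(topic: string, message: string): void {
    for (const handler of this.topics.get(topic) ?? []) {
      handler(message);
    }
  }
}

const broker = new ToyBroker();
const received: string[] = [];
broker.subscribe('order-events', (msg) => received.push(msg));
broker.publish('order-events', 'order-123 created');
```

The key property the sketch shares with Kafka is decoupling: the producer never knows who, if anyone, is listening on the topic.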
Kafka comes packed with a range of sophisticated features, but at its core, for the context of this discussion, you can simply view it as a reliable courier, ensuring messages are transferred from point A to point B.
Confluent Kafka is a commercial, managed platform offering for Kafka, aimed at simplifying the work of operating a Kafka cluster.
DEVELOPERS WANT TO DEVELOP
Through my experience, I’ve observed that developers primarily want to develop (no shock there, right?). Event-Driven Architecture (EDA) may be an elegant design pattern, but many developers are less concerned with the details of event transmission and more with the assurance that events are received, triggering their code to perform the required actions.
This perspective might seem like an oversimplification, yet it's similar to how developers view APIs. When I create an endpoint at https://example.com/foo, my main interest is that it receives calls and behaves as expected. The intricate "magic" that enables the call — unless troubleshooting is required — is often secondary.
Kafka is perceived similarly. To most developers, it represents another tool in their kit. Need to stay informed about a specific topic? Simply subscribe to it. The goal is to achieve this with minimal fuss and maximum efficiency.
CALLING OUR FIRST LAMBDA
Our initial strategy was straightforward: develop Lambda functions within a monorepo, deploy them, and then do the necessary configuration in Confluent. Although this method worked, it wasn't without its quirks. For instance, our Lambdas lacked the triggers that are typically standard; we overcame this with some workarounds, allowing Confluent to invoke our Lambda whenever an event arrived. This achievement was a cause for celebration!
But our triumph soon gave way to frustration:
- We had to individually configure Confluent for every single Lambda across multiple environments: development, testing, acceptance, and production.
- This setup led to an unnatural separation between the Lambda's operation and its trigger configuration, a workflow we hadn't encountered before.
- Our resources weren't consistently managed through Infrastructure as Code (IaC), leading to inconsistencies.
Faced with these challenges, it was time to explore how the Serverless Framework could streamline our process.
THE SERVERLESS FRAMEWORK
We discovered that AWS offers a Kafka event source for Lambda functions. There is a specific integration for Amazon MSK (Kafka managed by AWS), as well as a generic option that connects directly to any Kafka cluster. The event source takes care of all the polling and invokes your Lambda function with batches of records, which seemed very promising. However, using it required support from the Serverless Framework.
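For reference, a Lambda behind the self-managed Kafka event source receives batches of records grouped per topic-partition, with record values base64-encoded. A minimal sketch of decoding such a batch follows; the event shape mirrors AWS's documented format, but the topic name and payload here are made up:

```typescript
// Simplified shape of a self-managed Kafka event as delivered to Lambda.
interface KafkaRecord {
  topic: string;
  partition: number;
  offset: number;
  value: string; // base64-encoded payload
}

interface KafkaEvent {
  eventSource: string; // "SelfManagedKafka"
  records: Record<string, KafkaRecord[]>; // keyed by "topic-partition"
}

// Decode every record value in the batch into a plain UTF-8 string.
function decodeValues(event: KafkaEvent): string[] {
  return Object.values(event.records)
    .flat()
    .map((record) => Buffer.from(record.value, 'base64').toString('utf8'));
}

// Example event: one record on partition 0 of a fictional topic.
const sampleEvent: KafkaEvent = {
  eventSource: 'SelfManagedKafka',
  records: {
    'order-events-0': [
      {
        topic: 'order-events',
        partition: 0,
        offset: 15,
        value: Buffer.from('order-123 created').toString('base64'),
      },
    ],
  },
};
```

A real handler would typically parse the decoded value as JSON and dispatch on it, but the base64 step is the part that tends to surprise people the first time.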
After some research, it seemed we were in luck: the Serverless Framework does indeed support Apache Kafka. Yay! However, the real question was whether we could actually utilize it. As it turned out, the situation was not as straightforward as we had hoped. Our configuration for the Serverless Framework relies on types provided by DefinitelyTyped, which, unfortunately, lacked types for self-managed Kafka. True to our commitment as active open-source contributors, we decided it was time to submit a small PR upstream to add support, albeit for a very minimal set of features.
The PR was merged quickly and smoothly, and a new version of our dependency that included the support was released within a day. Now we could begin establishing the very basics and getting our system up and running. The outcome was an annotation in which the required fields can be supplied, generating a Lambda function with the Kafka event source configured.
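Stripped of our annotation layer, the configuration this generates boils down to a Serverless Framework function definition with a kafka event. A rough sketch of such a fragment, in the style of a serverless.ts, follows; the handler path, topic, broker address, and secret ARN are all placeholders, not our real values, and the exact key names may vary between framework versions:

```typescript
// Hypothetical fragment of a serverless.ts: one function wired to a
// self-managed Kafka topic. All identifiers below are placeholders.
const functions = {
  processOrderEvent: {
    handler: 'src/handlers/order.main',
    events: [
      {
        kafka: {
          // Secrets Manager ARN holding the SASL credentials (placeholder).
          accessConfigurations: {
            saslScram512Auth:
              'arn:aws:secretsmanager:eu-west-1:000000000000:secret:kafka-creds',
          },
          topic: 'order-events',
          bootstrapServers: ['broker-1.example.com:9092'],
        },
      },
    ],
  },
};
```

Our annotation fills in this structure from a handful of required fields, so developers declare only the topic and credentials and never touch the raw event definition.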
This approach offers a significant advantage: the only resources we still need to create by hand are the topics in Confluent Kafka, while the rest of the Infrastructure as Code (IaC) is managed alongside the code in Git. This centralization makes it much easier for developers to see where their Lambda functions are being used and what they are connected to.
BRINGING IT TO PRODUCTION
Now that the code was operational in development, it was time to consider what was necessary to transition to production. While one might think it's simply a matter of deploying to production, reality proved more complex. Our production Confluent cluster isn't accessible over the public internet; instead, it requires a connection via a VPC, a task that is easier said than done.
We had to ensure that our event source ran within the VPC, with the correct security group and assigned subnets. The cloud team at Essent did a significant amount of debugging to establish these connections and ensure they worked as intended. And just when we thought we were in the clear, we encountered another hurdle: by default, our deployment role lacked the permissions to use VPCs and security groups. Fortunately, once we identified the missing permissions, rectifying the issue was swift.
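In configuration terms, connecting through the VPC amounts to adding subnet and security-group entries to the event's access configuration. A sketch of what that shape looks like follows; the subnet and security-group IDs are fake, and the exact key names are an assumption based on the framework's conventions rather than a verbatim copy of our setup:

```typescript
// Hypothetical accessConfigurations for a Kafka event source that must
// reach a cluster inside a VPC. All identifiers below are fake.
const accessConfigurations = {
  // Subnets the event source mapping attaches to.
  vpcSubnet: ['subnet-0123456789abcdef0', 'subnet-0fedcba9876543210'],
  // Security group allowing outbound traffic to the brokers.
  vpcSecurityGroup: ['sg-0123456789abcdef0'],
  // SASL credentials in Secrets Manager (placeholder ARN).
  saslScram512Auth:
    'arn:aws:secretsmanager:eu-west-1:000000000000:secret:kafka-creds',
};
```

Getting these three pieces (subnets, security group, credentials) to line up with what the cluster actually accepts was exactly where the trial-and-error debugging happened.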
One of the biggest challenges was the vague error messages generated by the AWS Kafka source. Messages like "Something went wrong" or "Could not connect" were unhelpful at best. The debugging process was largely trial and error. However, once we got everything functioning manually, it was a matter of replicating the steps we had already taken. This involved extending the Lambda decorator to properly set these connection parameters, necessitating changes in the types and an update of those packages. After these adjustments, we achieved our first successful deployment.
We were finally able to gather events in production and have them processed by our Lambdas. It was time to celebrate!
AND NOW WHAT?
With our annotations now functioning correctly, we're positioned to accelerate the integration of Kafka with serverless technologies at Essent. This development relieves our developers from the burdensome task of manually connecting services, which would have been an operational nightmare. They can now focus more on the logic of their code and the topics it connects to, rather than the underlying infrastructure ensuring message delivery.
Kafka has effectively become just another event source, akin to how Amazon SQS has been utilized for some time—a seamless part of our event-driven architecture.
Yet, our work doesn't stop here. There are additional configurations for the Kafka event source in AWS that we’re incrementally incorporating into our decorator, such as the batch size of events triggering the Lambda function. Although not critical during the initial phases of our development, it became apparent that other Essent developers required more nuanced control. With the core workflow established, we can now implement such enhancements swiftly, often within days, facilitating a smooth transition into production.
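Batching is a good example of such a knob. On the raw event source this corresponds to a batch size setting that the decorator can simply forward. A sketch of what the resulting event configuration might look like follows; the topic, broker, ARN, and the value 200 are all illustrative placeholders:

```typescript
// Hypothetical kafka event with a batching control: up to 200 records
// handed to the Lambda per invocation. All values are placeholders.
const kafkaEvent = {
  kafka: {
    topic: 'order-events',
    bootstrapServers: ['broker-1.example.com:9092'],
    accessConfigurations: {
      saslScram512Auth:
        'arn:aws:secretsmanager:eu-west-1:000000000000:secret:kafka-creds',
    },
    batchSize: 200,
  },
};
```

Because the decorator already owns the event definition, exposing a new field like this is a small, local change rather than a per-environment reconfiguration.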
If you have any lingering questions, do not hesitate to drop a comment below!