Loosely coupled systems have been a noble goal of many IT architectures since the 70s of the last millennium. We have already seen approaches to this in various projects: CORBA, RMI, SOAP, ReST and so on. If the whole thing is also asynchronous, it becomes directly more complex. Nevertheless, the technology selection of Message Oriented Middleware with IBM WebSphere MQ, RabbitMQ, Apache ActiveMQ and Co. is still manifold and with iPaaS a new candidate is added. In short, the standard has not yet been found.
With Apache Kafka – Franz Kafka served as the name sponsor – an open source, community-driven, distributed and highly available event streaming platform has been in the focus of IT architects since 2011. Can we use it to modernize the mainframe-based IT of the insurance industry?
Last year, the project Phönix started. With our customer, we took on the challenge of not only establishing a new inventory system with Factor IPM and a new quote/application route with Factor IOS, but also taking a distributed, asynchronous approach to linking systems together.
How do I get to the title of the blog? The lead IT architect coined an image in his remarks that I find very fitting and would like to mention here: “In a household, you can solder the toaster, the coffee maker, etc. firmly to the power grid – that works fine as long as you don’t have to replace the toaster; we want to do better with Phoenix and install sockets!”
This is how the architectural picture emerges: the system landscape envisions micro-services communicating with each other as asynchronously as possible via Kafka (our Schuko socket). This is our chosen “standard” or default integration pattern, and deviations from it must be justified.
If one decides in favor of a technology, then there are always pros and cons. If one decides anyway, then it must be well documented (e.g. https://arc42.de/template) why one has decided in favor of it or what alternative options exist.
To use the analogy: is it 230V or 150V?
In the architecture decision for the project Phoenix you can read: We use JSON, define data transfer objects and perform simple versioning by package names. The alternative with Apache Avro (3) is often preferred, since one has here a very optimized data format. However, Avro additionally requires a schema registry, which we did not want to introduce in the “scope” of the project.
Faktor Zehn Suite
The Faktor Zehn Insurance Suite has a rich collection of functional building blocks. I’ve always liked that, namely walking the fine line between a) offering really usable building blocks and b) not being too prescriptive about what processes look like or even imposing ready-made applications on the customer. Thus, the suite makes no assumptions about the systems to be connected and offers abstract interfaces (Java interfaces) and many extension points (specializations, listeners, etc.).
Once again, this philosophy works in the Phönix project: Processes and actions are modeled in Faktor IPM. Here, in addition to the behavior of the historization and various settings, we can also attach lifecycle listeners that offer us possible extension points.
As it stands, the inventory management factor IPM is a JEE application.
If you want to do it right, a JEE container only talks to the environment via managed (ge-managed) resources. Therefore it would be good if the connection to the Kafka system, i.e. the connection to the (plural!) Kafka brokers could be provided and injected as a container resource (CDI, @Inject).
Looking for implementations, one can find two places to start: Payara Cloud Connectors (1) and a paper from IBM (2). I could not extract any executable code for the latter, while the Payara Kafka Connector runs with a few adjustments in a WildFly server and talks to Kafka via
@Resource(lookup = "java:/KafkaConnectionFactory") KafkaConnectionFactory factory
JEE talks to Kafka.
We have some scenarios where we are done posting to a Kafka topic – so a fire&forget. Since Kafka doesn’t lose messages, this is a good usable approach. However, there is also a chance that the Kafka message simply cannot be processed by consumers (poision message). We set up dead letter topics DLT for this purpose.
Transactions – does it work or does it not?
JEE server managed connections can help immensely. If you have ever had to update multiple databases and a JMS connection in a “unit” in a real project, you will be happy to know that distributed transactions, XA-Transactions/Two-Phase-Commit, exist.
We have exactly these requirements in some places. Factor IPM generates orders for premium payment, the (subsequent) debit position. The whole thing happens as a JEE batch in Faktor IPM and with its database.
Following our project philosophy, these contribution payments are now sent on their way via Kafka to be further processed in the collection system. Unfortunately, the first tests show that there is no common transaction bracket for the IPM database update and sending the message to Kafka. After checking everything, i.e. XA datasource, transaction control in JEE batch, commit size, at least three Kafka brokers running, etc., the source code of the Payara project shows that the connector is not XA-enabled.
We decided to face the problem and define as solution: in the Kafka Topic is the truth; all other parties must compensate. An XA transaction is now no longer an option, but we have to make sure that the respective message arrived once in Kafka and was replicated (Kafka transaction).
In case of error, we can also ensure here that a rollback command does not write to the IPM database. Unfortunately, we cannot ensure that in the “happy case” the database commit works (probably an academic rather than real problem). This is where a skew between the database and Kafka can occur. We have decided that the dataset in Kafka represents the truth. “Truth” in our context means that a message has already been written to Kafka once (an indent was ordered). Kafka messages have a key and a value. For our use case, where a batch run creates an identifiable record, we can use exactly this key. The combination of batch run ID and message ID uniquely identifies the message in the Kafka topic. If our batch now fails for technical reasons and is restarted, we can determine that the generated message is already available in Kafka and thus only perform the database update.
This gives us a sufficiently robust solution for project purposes. If you research further, you may come across this link from Microservices.io and find an even more elegant way. Again, the Faktor Zehn suite offers options with the
which can easily be enabled. However, an additional process must be implemented here (the Message Relay) that converts the ExternalSystemNotification into Kafka messages.
View after production run
Thanks to the tireless efforts of the entire project team, we went live on 10/15/2020 (4). We experienced pretty much all the pitfalls: missing or wrong connection configurations, lack of experience with the tools, APIs used incorrectly. In short, we have had a very steep learning curve. After more than two months, we can now say that the architectural goals have been achieved, the loose coupling via Kafka is working, and normal operation is becoming more and more common.
We have distributed the first sockets and thus paved the way for a new IT line. From a developer’s point of view, everything feels good. Now it remains to be seen how the architecture will bear up in the future, what new flows will emerge, how operations will warm up with the new concepts.
About the author
Carsten has been working at Faktor Zehn since January 2019 and is at home as a developer, architect and consultant in both customer projects and product development.
If we have piqued your interest or you would like more information, please feel free to contact firstname.lastname@example.org.
Your Faktor Zehn-Team