Think different – Event Streaming with Apache Kafka.

The 2020s – a decade of event-driven business processes.

The digital world is becoming real-time, not only in business but also in private everyday life: communication runs via WhatsApp chats, movies are streamed, stock orders are executed in fractions of a second on trading platforms around the globe, and online food orders land directly in kitchens. This is made possible by a rapid acceleration of business processes. They are based on constant data flows, "data streams" or "streaming data", which are processed in real time, or at least near real time. In the past, IT systems were mainly concerned with static data, "data at rest"; now the focus is on "data in motion". This requires a new approach to data processing: "events" move to the center of attention, and business processes become event-driven. It can be assumed that two-thirds of all business processes will develop in this direction in the next few years, across all industries: logistics, sports betting, the financial sector, retail, software, the travel industry, the energy industry, and classic industrial companies. At companies like Walmart, Expedia, Adobe, BNP PARIBAS, Betgenius, or PLEX, this is already part of everyday business.

"It's not about things, it's about events."

In traditional IT, the focus is on things, whose current state is stored in a database. Companies that base their business processes on streaming data look at things differently: for them, the focus is on a continuous flow of events, that is, time-stamped states. Examples of events are the change of a delivery address in an ongoing shipping process, a percentage price fluctuation of a share within a defined period, a vibration measured for predictive maintenance, a suspicious transaction flagged by fraud detection in digital payments, or a prosumer connecting their photovoltaic system to the grid. What all these events have in common is that they have happened in real life and must be followed by an action in real time. Once a business process enters the realm of big data, this real-time processing can hardly be handled with classical approaches.
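The difference between storing things and storing events can be sketched in a few lines of Python. This is a minimal illustration, not Kafka code: the `Event` class, its fields, and the shipment data are hypothetical. The sketch keeps an append-only list of time-stamped events and derives the state of a shipment at any point in time by replaying them, whereas a classical database table would typically keep only the latest row.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """A time-stamped fact that happened in the real world (hypothetical schema)."""
    timestamp: datetime
    shipment_id: str
    field: str
    value: str


# Event log: facts are only ever appended, never overwritten.
events = [
    Event(datetime(2021, 9, 1, 8, 0), "S-1", "address", "Street 1, Amsterdam"),
    Event(datetime(2021, 9, 1, 9, 30), "S-1", "status", "in transit"),
    Event(datetime(2021, 9, 1, 11, 15), "S-1", "address", "Street 2, Rotterdam"),
]


def state_as_of(log, shipment_id, as_of):
    """Rebuild the 'thing' (its state) by replaying its events up to a point in time."""
    state = {}
    for event in sorted(log, key=lambda e: e.timestamp):
        if event.shipment_id == shipment_id and event.timestamp <= as_of:
            state[event.field] = event.value  # a later event overrides an earlier one
    return state


print(state_as_of(events, "S-1", datetime(2021, 9, 1, 10, 0)))
# {'address': 'Street 1, Amsterdam', 'status': 'in transit'}
print(state_as_of(events, "S-1", datetime(2021, 9, 1, 12, 0)))
# {'address': 'Street 2, Rotterdam', 'status': 'in transit'}
```

Because the history is kept, the address change itself remains visible as an event that downstream processes can react to, instead of silently replacing the old value.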

Apache Kafka – a bit of classic navigation.

For many reasons, classical databases are not a suitable way to map event streaming data. The solution is provided by the open source platform Apache Kafka, whose approach is reminiscent of navigation in the times before GPS, or even before sextants and nautical almanacs. To know where a ship was, navigators used "dead reckoning": at regular intervals or whenever the course changed, the speed was measured with a log, and together with wind direction, estimated drift, compass course, and time it was entered as an event on a chart. On longer voyages, these events were distributed over several charts.

Apache Kafka does essentially the same. It writes events, records describing a state including a timestamp, into a log: an ordered, append-only series of events, which Apache Kafka calls a "topic". Such a topic could be "navigate Amsterdam - London". In terms of data volume and system architecture, Apache Kafka imposes hardly any restrictions. A topic does not need a certain size for Kafka to make sense: it can cover a few seconds or grow indefinitely. Even scalability, often a difficult issue with classical databases, is handled by Apache Kafka: each topic is split into partitions that are distributed across the servers (brokers) of a cluster, which is what makes its "elastic scalability" possible.
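A toy model can show what a topic is and how the navigation analogy maps onto it. The following sketch is plain Python and deliberately does not use the Kafka client API; `Topic`, `Record`, and `dead_reckoning` are hypothetical names. Records with the same key always land in the same partition and keep their order there. Kafka's default partitioner hashes keys with murmur2; `zlib.crc32` is only a stand-in. The topic name uses hyphens because real Kafka topic names may only contain ASCII letters, digits, '.', '_' and '-'. Replaying the course and speed events then reproduces dead reckoning on a flat-earth approximation.

```python
import math
import zlib
from dataclasses import dataclass, field


@dataclass
class Record:
    offset: int        # position within its partition, assigned on append
    timestamp: float   # seconds since departure (simplified)
    key: str
    value: dict


@dataclass
class Topic:
    """Toy model of a topic: a fixed number of append-only partitions."""
    name: str
    num_partitions: int = 3
    partitions: list = field(init=False)

    def __post_init__(self):
        self.partitions = [[] for _ in range(self.num_partitions)]

    def append(self, key, timestamp, value):
        # Same key -> same partition, so all events of one ship stay in order.
        # crc32 is a stand-in for Kafka's murmur2-based default partitioner.
        partition = zlib.crc32(key.encode()) % self.num_partitions
        log = self.partitions[partition]
        log.append(Record(len(log), timestamp, key, value))
        return partition, len(log) - 1

    def read(self, partition, from_offset=0):
        # Reading never deletes records; each consumer keeps track of its own offset.
        return self.partitions[partition][from_offset:]


def dead_reckoning(records, start=(0.0, 0.0)):
    """Replay speed/course events to estimate a position in nautical miles (flat-earth approximation)."""
    east, north = start
    # Each event's speed and course hold until the next event; the last one only starts a new leg.
    for current, following in zip(records, records[1:]):
        hours = (following.timestamp - current.timestamp) / 3600
        distance = current.value["speed_kn"] * hours
        course = math.radians(current.value["course_deg"])  # 0 = north, 90 = east
        east += distance * math.sin(course)
        north += distance * math.cos(course)
    return east, north


topic = Topic("navigate-amsterdam-london")
partition, _ = topic.append("ship-1", 0, {"speed_kn": 10, "course_deg": 270})
topic.append("ship-1", 3600, {"speed_kn": 8, "course_deg": 180})
topic.append("ship-1", 7200, {"speed_kn": 8, "course_deg": 180})

east, north = dead_reckoning(topic.read(partition))
print(round(east, 3), round(north, 3))  # -10.0 -8.0
```

Reading a partition does not remove anything from it: a second consumer could call `topic.read(partition, from_offset=1)` and process the same records independently, which is one reason a single topic can serve many applications at once.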

Kafka Streams API – processing Kafka topics.

Apache Kafka was designed to manage logs at virtually unlimited scale. Albert Camus said of Franz Kafka's work: "It is the fate and perhaps the greatness of this work that it presents all possibilities and confirms none." The same is true of Apache Kafka. Topics are processed with Kafka Streams. The idea behind Kafka Streams is many small applications instead of a classic monolithic program: they read from topics; group, aggregate, filter, or enrich events; and write the results to new topics. No separate processing cluster is needed for this, because Kafka Streams is a Java library that is embedded in ordinary applications. What these applications do, however, is up to their developers: Apache Kafka itself provides no ready-made answers.
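The idea of small chained processing steps can be sketched without a Kafka cluster. The Python code below only imitates, in spirit, Kafka Streams operations such as `filter`, a stream-table join, `groupByKey`, and `count`; it is not the Kafka Streams API. The payment records, the `home_country` lookup table, and the threshold are made up for illustration. Each step reads records and emits new records, so each could run as its own small application between two topics.

```python
from collections import defaultdict

# Hypothetical input topic: (key, value) records of card payments, keyed by account.
payments_topic = [
    ("acc-1", {"amount": 12.5, "country": "NL"}),
    ("acc-2", {"amount": 980.0, "country": "AT"}),
    ("acc-1", {"amount": 1500.0, "country": "BR"}),
    ("acc-2", {"amount": 20.0, "country": "AT"}),
    ("acc-1", {"amount": 2100.0, "country": "BR"}),
]

# Hypothetical reference data; in Kafka Streams this would typically be a KTable.
home_country = {"acc-1": "NL", "acc-2": "AT"}


def filter_records(records, predicate):
    """Filter: pass on only the records that satisfy the predicate."""
    for key, value in records:
        if predicate(key, value):
            yield key, value


def enrich(records, table):
    """Enrich: add a field derived from reference data, similar to a stream-table join."""
    for key, value in records:
        yield key, {**value, "abroad": value["country"] != table.get(key)}


def count_by_key(records):
    """Group and aggregate: emit the updated running count per key for every record."""
    counts = defaultdict(int)
    for key, _ in records:
        counts[key] += 1
        yield key, counts[key]


large = filter_records(payments_topic, lambda k, v: v["amount"] > 1000.0)
abroad = filter_records(enrich(large, home_country), lambda k, v: v["abroad"])
# The result would be written to a new topic; here it is simply collected in a list.
suspicious_counts_topic = list(count_by_key(abroad))
print(suspicious_counts_topic)  # [('acc-1', 1), ('acc-1', 2)]
```

Emitting an updated count for every input record roughly mirrors how an aggregation in Kafka Streams produces a stream of updates rather than one final result.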

Kafka Connect API – not everything is kafkaesque.

The new data world does not consist of "data in motion" alone. Event streams and static data complement each other to form a larger whole when they are combined in a meaningful way. This is accomplished through the Kafka Connect API, a framework with a library of connectors (some open source, some commercial, some in between) that exchange data with external systems. Data flows either into Apache Kafka (source connectors) or out of it (sink connectors).

Even "data at rest", static data, is not always set in stone. For example, if a customer updates a shipping address in an SQL database during an ongoing logistics process, this update is an event that can be written to a Kafka topic via an SQL connector. Because events relevant to a business process can occur anywhere, the use of Apache Kafka tends to spread virally once it has started. Events arise in statistical analyses in market research, in time series for forecasts, and in much more.
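One simple way to turn database changes into events is to poll a table for rows whose timestamp column has advanced since the last run; this resembles the timestamp mode of Confluent's JDBC source connector. The sketch below uses Python's built-in `sqlite3` and a plain list as a stand-in for the topic; the `customers` table, its columns, and the addresses are assumptions. Polling of this kind cannot see deleted rows, which is why log-based change data capture tools such as Debezium read the database's transaction log instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, shipping_address TEXT, updated_at INTEGER)")
conn.execute("INSERT INTO customers VALUES (1, 'Street 1, Vienna', 100)")
conn.execute("INSERT INTO customers VALUES (2, 'Street 9, Graz', 100)")

topic = []      # stand-in for a Kafka topic
last_seen = 0   # the connector's stored position, comparable to a source offset


def poll(conn, topic, last_seen):
    """Append rows changed since the last poll to the topic as change events."""
    rows = conn.execute(
        "SELECT id, shipping_address, updated_at FROM customers "
        "WHERE updated_at > ? ORDER BY updated_at, id",
        (last_seen,),
    ).fetchall()
    for customer_id, address, updated_at in rows:
        topic.append({"key": customer_id, "shipping_address": address, "ts": updated_at})
        last_seen = max(last_seen, updated_at)
    return last_seen


last_seen = poll(conn, topic, last_seen)  # initial load: two events
conn.execute("UPDATE customers SET shipping_address = 'Street 5, Linz', updated_at = 200 WHERE id = 1")
last_seen = poll(conn, topic, last_seen)  # only the changed row: one more event
print(topic[-1])  # {'key': 1, 'shipping_address': 'Street 5, Linz', 'ts': 200}
```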

Communication is not a one-way street. An event stream in a Kafka topic can contain events that are meaningful for external systems. Services that detect such events, for example to feed dashboards, can write them to new Kafka topics, which Kafka Connect in turn makes available to external systems.
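The way back out can be sketched in the same manner. In the hypothetical example below, a small detection service picks share price moves of at least 5 percent from an input topic and writes them to an alerts topic; a sink step then exports the alerts to a database that a dashboard could read. The data, the table, and the threshold are assumptions, and lists plus `sqlite3` stand in for Kafka topics and a real sink connector. Keying the table by the source offset makes the export idempotent, which matters because records can be delivered more than once, for example after a restart.

```python
import sqlite3

# Hypothetical input topic: (offset, symbol, percent change) share price events.
prices_topic = [
    (0, "ABC", 0.4),
    (1, "XYZ", -6.1),
    (2, "ABC", 7.3),
]


def detect(records, threshold_pct):
    """Detection service: select events relevant to external systems for a new topic."""
    return [
        {"source_offset": offset, "symbol": symbol, "change_pct": change}
        for offset, symbol, change in records
        if abs(change) >= threshold_pct
    ]


def sink(conn, alerts):
    """Sink step: export alerts to an external table, keyed by source offset (idempotent)."""
    conn.executemany(
        "INSERT OR REPLACE INTO alerts (source_offset, symbol, change_pct) "
        "VALUES (:source_offset, :symbol, :change_pct)",
        alerts,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE alerts (source_offset INTEGER PRIMARY KEY, symbol TEXT, change_pct REAL)")

alerts_topic = detect(prices_topic, threshold_pct=5.0)
sink(conn, alerts_topic)
sink(conn, alerts_topic)  # redelivery: the table still holds only two rows
print(conn.execute("SELECT symbol, change_pct FROM alerts ORDER BY source_offset").fetchall())
# [('XYZ', -6.1), ('ABC', 7.3)]
```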

Kafka is complex – help in the cloud.

As brilliantly simple as the idea of Apache Kafka is, implementing it in existing system architectures is technically challenging. Its design as a distributed system, which must deliver scalability and performance for big data, often does not fit classic IT infrastructures. Companies therefore tend to use managed solutions that scale elastically, offer SQL-based query languages such as KSQL, provide connector libraries, and give performance guarantees. Players in this segment include IBM with Event Streams; Confluent, a company founded by the three creators*) of Apache Kafka; Hazelcast; and axual. The latter focuses its solution on the energy industry.

Apache Kafka in the Energy Industry – webinar axual and HAKOM Time Series.

Event streaming will clearly become a key topic in the energy transition. Microgrids, prosumers, battery power plants, and more will produce big data event streams that require real-time actions to keep grids stable. In the 30-minute webinar "Event Streaming in the Energy Industry", Jeroen van Disseldorp (CEO and co-founder of axual), Stefan Komornyik (CEO and co-founder of HAKOM Time Series), and Ricardo Wickert (Head of R&D at HAKOM Time Series) highlight opportunities and challenges for the energy industry. A central topic is how classic time series data in the energy industry can be connected with event streams.
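One way to connect event streams with classic time series is a tumbling-window aggregation: individual events are summed into fixed, non-overlapping intervals. The Python sketch below assumes quarter-hour intervals and hypothetical smart meter events in kWh; it illustrates the technique and is not the approach presented in the webinar. Kafka Streams provides windowed aggregations for this purpose, including handling of late events, which this sketch ignores.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=15)  # assumption: quarter-hour intervals


def window_start(ts, size=WINDOW):
    """Align a timestamp to the start of its tumbling window."""
    epoch = datetime(1970, 1, 1)
    return epoch + ((ts - epoch) // size) * size


def to_time_series(events):
    """Sum energy per meter and interval: event stream in, interval time series out."""
    series = defaultdict(float)
    for meter_id, ts, kwh in events:
        series[(meter_id, window_start(ts))] += kwh
    return dict(sorted(series.items()))


# Hypothetical meter events: (meter ID, timestamp, energy in kWh).
events = [
    ("meter-7", datetime(2021, 9, 16, 10, 2), 0.25),
    ("meter-7", datetime(2021, 9, 16, 10, 14, 59), 0.5),
    ("meter-7", datetime(2021, 9, 16, 10, 15), 0.75),
]

for (meter, start), kwh in to_time_series(events).items():
    print(meter, start.isoformat(), kwh)
# meter-7 2021-09-16T10:00:00 0.75
# meter-7 2021-09-16T10:15:00 0.75
```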

Watch the webinar "Streaming Data in the Energy Industry" from September 16, 2021, on demand.

Contact the author Stefan Komornyik.

*) Jay Kreps, Neha Narkhede, Jun Rao
