- Hey, I'm Tim Berglund with Confluent. I wanna tell you everything you need to know, and only what you need to know, about Apache Kafka. (upbeat music) Apache Kafka is an event streaming platform used to collect, store and process real-time data streams at scale.
It has numerous use cases, including distributed logging, stream processing and pub-sub messaging. And that sounds like something that a committee of MBAs would write if they had their persona document in front of them and they were just trying to nail the messaging, and that's not what we're gonna do in this series. But all those words are true, that really is kind of a nice one-sentence description of what Kafka is, but there's so much in there that we have to expand on for any of this to make sense.
I mean, even the phrase event streaming platform, while totally accurate, requires a bit of a journey before the full significance of the words really lands on you. And these videos are that journey. To begin with, I wanna start with just the idea of an event.
It's worth just thinking about what an event is. Once we do that, then we can talk about how Kafka stores events, how events get in and out, how to analyze them, all that stuff. But first, we have to agree on what an event is.
Now, an event is just a thing that has happened, that's it. And I know that sounds a little abstract, but that really is true. It can be any kind of thing.
My go-to example is a smart thermostat phoning home to report the current temperature and humidity and status of the HVAC system in the house, like that's an event. But an event can be other kinds of things. An event can be the change in the status of some business process, say an invoice becomes past due, well that's an event.
An event can be some kind of user interaction, somebody is mousing over a certain link on a screen or clicking on a thing. That's certainly an event. A microservice completes some unit of work and wants to put the record of that unit of work somewhere.
That's an event. All these things are events. They're just things that have happened combined with the description of what happened.
So, an event is a combination of notification and state. The notification is the element of when-ness, the thing that can be used to trigger some other activity. Now, the state of an event is usually fairly small, say less than a megabyte or so in concrete terms, and it's normally represented in some structured format, like JSON, or JSON Schema, or Avro, or Protocol Buffers, something like that. The state is serialized in some usually standard format. Now, Kafka has a little bit of a data model for an event.
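To make that concrete, here's a small sketch in Python of the thermostat event from earlier: the timestamp carries the notification (the when-ness), and the rest of the payload is the state, serialized to JSON bytes. The field names here are made up for illustration, not from any real device.

```python
import json

# A hypothetical smart-thermostat event: the timestamp is the
# notification (when it happened); the rest of the fields are the state.
event_state = {
    "timestamp": "2023-05-01T12:00:00Z",
    "temperature_c": 21.5,
    "humidity_pct": 40,
    "hvac_status": "heating",
}

# The state is serialized into some standard format, here JSON,
# producing the sequence of bytes that would actually go to Kafka.
serialized = json.dumps(event_state).encode("utf-8")

print(type(serialized))             # <class 'bytes'>
print(len(serialized) < 1_000_000)  # True: well under the ~1 MB guideline
```

The same idea applies if you swap JSON for Avro or Protocol Buffers; the event's state always ends up as a small serialized byte payload.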
An event in Kafka is modeled as a key/value pair. Internally, inside Kafka, when these things are actually stored, keys and values are just sequences of bytes. Kafka internally is loosely typed, but externally, your programming language, whatever it is, is probably not that loosely typed.
There's probably some kind of structure to the data. And so there's a process of going back and forth between the way that key/value pair, that event, is represented in your language's type system and the representation inside Kafka. Kafka famously calls that process serialization and de-serialization. (We came up with those words ourselves.)
And again, that serialized format is usually like JSON, or JSON Schema, Avro, Protobuf, something like that. And the value, that serialized object, is usually the representation of an application domain object, or some form of raw message input, like the output of a sensor or something like that. So that's why the structure of that thing is important, 'cause in your world, as you think about it, it probably has some structure.
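Here's a minimal sketch of what a serializer and deserializer do. This is not the actual Kafka client API, just the idea: a typed application domain object (the class and field names are hypothetical) goes to bytes on the way into Kafka, and comes back into your language's type system on the way out.

```python
import json
from dataclasses import dataclass, asdict

# A hypothetical application domain object; names are illustrative.
@dataclass
class ThermostatReading:
    device_id: str
    temperature_c: float
    hvac_status: str

def serialize(reading: ThermostatReading) -> bytes:
    # Your typed object -> the loosely typed bytes Kafka actually stores.
    return json.dumps(asdict(reading)).encode("utf-8")

def deserialize(data: bytes) -> ThermostatReading:
    # Bytes read back from Kafka -> an object in your type system.
    return ThermostatReading(**json.loads(data.decode("utf-8")))

original = ThermostatReading("thermo-42", 21.5, "heating")
round_tripped = deserialize(serialize(original))
print(round_tripped == original)  # True: the round trip is lossless
```

Real Kafka clients let you plug in serializers and deserializers like these, typically backed by JSON, Avro, or Protobuf rather than hand-rolled code.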
Now, the key part. I said a message is a key/value pair. Keys in Kafka can be a fairly rich topic, so I'm gonna summarize them very simply right now. They can be complex domain objects, serialized with all those same formats, but are often just primitive types like strings or integers.
So the key part of a Kafka message is probably not a unique identifier for the event. If you're thinking of something like a primary key in a database table, where the key uniquely identifies the row, the key in a Kafka message is not like that.
It's more likely the identifier of some entity in the system, like a user or an order or a particular connected device, like the ID of that smart thermostat or something like that. And this may not sound significant right now, but we will see later on that keys are crucial for how Kafka deals with things like parallelization and data locality and things like that. So that's the very basics of Kafka, and the one sentence definition and the notion of events.
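To preview why keys matter for parallelization: Kafka's default partitioner chooses a partition by hashing the key, so all events with the same key, say the same thermostat or the same user, land on the same partition and stay in order relative to each other. Here's a simplified sketch of that idea; the real Java client uses a murmur2 hash, and this uses CRC32 as a stand-in.

```python
import zlib

NUM_PARTITIONS = 6  # an illustrative partition count

def partition_for(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    # Simplified stand-in for Kafka's default partitioner: hash the
    # serialized key, take it modulo the partition count. The point is
    # that it's deterministic: same key -> same partition, every time.
    return zlib.crc32(key) % num_partitions

# Two events from the same device always map to the same partition,
# so their relative order is preserved within that partition.
p1 = partition_for(b"thermostat-42")
p2 = partition_for(b"thermostat-42")
print(p1 == p2)  # True
```

This deterministic key-to-partition mapping is the mechanism behind the data locality and ordering guarantees we'll come back to later in the series.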