Let's talk about how messages are retained. It's kind of implied: you produce a message, it's stored on the broker, and it's kept there. Then, because this is a decoupled system, a consumer comes along later, at any point in time, and says, hey, I want to get this message, I want to read this information. There are two different scenarios that decide how long we keep a message. First, if it was published to a topic and there's still an active subscription on that topic that hasn't acknowledged it, we call that visible
to a subscription. There's an active subscription out there that hasn't been explicitly deleted: as we saw in that earlier scenario, you create a consumer, give it a subscription name, and then that subscription exists. So we know there's registered interest in getting that message, and we will always keep it, whatever other policies come into play. This is the overriding rule: we know somebody out there wants the message, so we keep it for them no matter how long it takes, regardless of other
policies. The second case is once the message is outside of a subscription: every active subscription on that topic has acknowledged it, and the cursor has moved beyond that position in the topic. Then it falls under what's called the retention policy. This is for the write-once, read-many-times scenario you want for a streaming use case: you publish the data, it's kept inside of Pulsar, and you use the Reader interface and things like that to go back and read it multiple times, and
the retention policy specifies how long you keep those messages around. Now, by default no retention policy is set, which means that if no subscriptions are present, messages are immediately deleted. I ran into this when I first used Pulsar. I came from the Kafka world, so I said, okay, you create a producer, you write some messages, you connect a consumer, and you go read them. And I was very confused, because I would publish the messages, then connect the consumer, and there were no messages there. I'm like, what's going on? Pulsar's broken,
right? Turns out I had run into exactly this behavior. You have to create the subscription first, so you do it backwards: you create a consumer first, then you start producing messages on that topic, and then they're retained. Alternatively, you set a retention policy, and then those messages are retained even with no subscription. Now, the retention policy is one thing, and it applies to acknowledged messages. For unacknowledged messages, TTLs and backlog quotas come into play.
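To pin down the two cases, here's a toy Python model of the keep-or-delete decision as described above. It's a sketch, not Pulsar's broker code: Message, can_delete, and retention_seconds are hypothetical names, retention is modeled as a time window only (Pulsar's retention policy can also carry a size limit, which is left out), and a subscription with no connected consumers still counts as active, as in the first case.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Message:
    """Toy message: only the fields the keep-or-delete decision needs."""
    publish_time: float                       # seconds
    acked_by: Set[str] = field(default_factory=set)
    last_ack_time: Optional[float] = None     # time of the latest ack

    def ack(self, subscription: str, at: float) -> None:
        self.acked_by.add(subscription)
        if self.last_ack_time is None or at > self.last_ack_time:
            self.last_ack_time = at


def can_delete(msg: Message, subscriptions: Set[str], now: float,
               retention_seconds: float = 0.0) -> bool:
    """Hypothetical rule: may the broker drop this message at time `now`?"""
    # Case 1: still visible to an active subscription (one that hasn't
    # acked it), so keep it regardless of any other policy.
    if any(sub not in msg.acked_by for sub in subscriptions):
        return False
    # Case 2: outside every subscription, so the retention window decides.
    # With no subscriptions at all, it was outside from the moment it was
    # published, and the default retention of 0 lets it go immediately.
    released_at = (msg.last_ack_time if msg.last_ack_time is not None
                   else msg.publish_time)
    return now - released_at >= retention_seconds


# Produce with no subscription and no retention: the message is gone.
orphan = Message(publish_time=0.0)
print(can_delete(orphan, subscriptions=set(), now=0.0))          # True

# Subscribe first: kept until acked, then kept for a 60 s retention window.
msg = Message(publish_time=0.0)
subs = {"my-sub"}
print(can_delete(msg, subs, now=1000.0, retention_seconds=60))   # False
msg.ack("my-sub", at=5.0)
print(can_delete(msg, subs, now=30.0, retention_seconds=60))     # False
print(can_delete(msg, subs, now=70.0, retention_seconds=60))     # True
```

The "subscribe first" ordering from the story shows up directly: with an empty subscription set and zero retention, nothing protects the message.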
These limit the messages that are sitting in an active subscription: the cursor is at a certain position, and the messages that haven't been acknowledged yet are after it. You don't want the consumer on that subscription to get too far behind, processing data that's seconds, minutes, hours, or even weeks late, when you're only interested in recent data. So you can specify a TTL, which is the maximum age a message is kept in that subscription, or what's called a backlog quota, which is
just the total size: if I fall, say, 10 kilobytes behind, the policy kicks in. The different policies come into play to make sure the cursor is basically fast-forwarded so that your consumer effectively catches up.

A TTL limits the length of time, in seconds, that a message remains in a subscription. As we saw earlier, each message has a publish time on it, and that's what's used to determine its age.
So say a message was published and arrived at 9:58. Every message has a timestamp: 9:58, 9:59, 10:00. If your TTL is set to 120 seconds, then once a message is older than that, you don't want to consume it anymore, and the cursor is automatically moved forward past it. So when it gets to be 10:02, everything more than two minutes old is skipped automatically, whether it was
consumed or not. That's what this is showing: the cursor is pulled ahead automatically. The broker checks periodically and moves the cursor forward for you. This happens on the broker, on the dispatcher's side: when it decides which message to send, it first asks, is this message outside the TTL? If it is, it skips it. That guarantees your consumer only processes data that's no older than the TTL you specify.
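The TTL walkthrough can be modeled the same way. In this sketch the cursor is just an index into a list of publish times, and before "sending" the next message the dispatcher first skips everything older than the TTL, following the description above. The function names and the fixed date are illustrative assumptions, and the broker's periodic background check isn't modeled.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def expire_messages(publish_times: List[datetime], cursor: int,
                    now: datetime, ttl: timedelta) -> int:
    """Move the cursor past every message older than `ttl`.

    Messages before `cursor` are treated as done. Assumes publish times are
    in topic order, so expired messages are always at the front.
    """
    cutoff = now - ttl
    while cursor < len(publish_times) and publish_times[cursor] < cutoff:
        cursor += 1  # skipped, whether or not anyone consumed it
    return cursor


def dispatch_next(publish_times: List[datetime], cursor: int,
                  now: datetime, ttl: timedelta) -> Tuple[Optional[int], int]:
    """Skip expired messages, then return (index to send or None, cursor)."""
    cursor = expire_messages(publish_times, cursor, now, ttl)
    if cursor < len(publish_times):
        return cursor, cursor + 1
    return None, cursor


def at(hour: int, minute: int, second: int = 0) -> datetime:
    return datetime(2024, 1, 1, hour, minute, second)


published = [at(9, 58), at(9, 59), at(10, 0, 30), at(10, 1, 30)]
ttl = timedelta(seconds=120)

# At 10:02 the cutoff is 10:00:00, so 9:58 and 9:59 are skipped.
to_send, cursor = dispatch_next(published, cursor=0, now=at(10, 2), ttl=ttl)
print(to_send, cursor)  # 2 3
```

With a 120-second TTL at 10:02, the 10:00:30 message is the first one that still gets delivered.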
For certain use cases that makes sense. Say you're handling sensor data, or stock ticker quotes, and updating a UI: I don't care about the stock quote from five seconds ago or a minute ago, I care about the most recent one. So I can skip the old data; I don't need to process it, I only need the most recent information. Again, TTLs can be set on a per-topic or per-namespace basis, so if you want a global setting across multiple topics, you set it on the namespace.
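Since a TTL (like other policies) can be set on a namespace and overridden on an individual topic, looking up the value that applies to a topic amounts to "topic value if present, else namespace value". This is a hypothetical in-memory sketch of that fallback; the tenant, namespace, and topic names and the policy key are made up, and Pulsar's broker-wide defaults are left out.

```python
from typing import Any, Dict, Optional

# Hypothetical policy stores; names and values are made up for illustration.
NAMESPACE_POLICIES: Dict[str, Dict[str, Any]] = {
    "my-tenant/market-data": {"message_ttl_seconds": 120},
}
TOPIC_POLICIES: Dict[str, Dict[str, Any]] = {
    "persistent://my-tenant/market-data/quotes": {"message_ttl_seconds": 5},
}


def namespace_of(topic: str) -> str:
    """'persistent://tenant/ns/topic' -> 'tenant/ns'."""
    path = topic.split("://", 1)[1]
    tenant, namespace, _ = path.split("/", 2)
    return f"{tenant}/{namespace}"


def effective_policy(topic: str, name: str) -> Optional[Any]:
    """A topic-level value wins; otherwise fall back to the namespace."""
    topic_value = TOPIC_POLICIES.get(topic, {}).get(name)
    if topic_value is not None:
        return topic_value
    return NAMESPACE_POLICIES.get(namespace_of(topic), {}).get(name)


print(effective_policy("persistent://my-tenant/market-data/quotes",
                       "message_ttl_seconds"))   # 5 (topic override)
print(effective_policy("persistent://my-tenant/market-data/trades",
                       "message_ttl_seconds"))   # 120 (namespace setting)
```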
Here's the command to do it: pulsar-admin namespaces set-message-ttl, which takes the TTL in seconds. You can also override namespace policies at the topic level; that's another piece of information embedded in here. So you can set a TTL on a topic if you want, and it overrides one previously set on the namespace. This is true for all policies, not
just the TTL: you can set them at either level. Here are some examples of use cases where a TTL comes into play. Stock quotes, as we discussed. Real-time sports data, like scores: you don't care about the score in the first quarter when it's the third quarter, or the first period when it's the second, so you want to skip forward. And real-time push notifications, where you only want the most recent information, say a weather notification: hey, there's a
storm alert. You don't want to get that 10 seconds or 10 minutes late; you want to send that alert right away, and you want the most recent information, not old messages. So those are good examples. That's how the TTL is handled: it happens automatically on the broker side, and the broker moves the cursor for you. You don't have to do anything; you're not even aware of it. On the consumer side you just get messages that are within the TTL.

Now, backlog quotas
approach the same problem differently. The problem is that your consumer is falling behind the producers: messages are being produced faster than they can be consumed. So you can say, once my backlog gets to 100 megabytes or more, I want to do something about it; I never want to fall more than that volume of data behind. Let's say you have fairly fixed-size messages, so you know what the message sizes are. Then that volume translates into some
number of messages: if I get 20,000 messages behind, that's 100 megabytes, or whatever it works out to. That's what the quota is there for. Now, you can specify what happens when it's exceeded, and two of the options are how you communicate back to the producer to throttle it and have it stop producing messages. The first is producer request hold, which we talked about before: the broker blocks new publishes from the producer. The way it works is, your consumer is consuming, the backlog gets to this
point, and the broker basically stops acknowledging the producer's sends, so the pending queue on your producer side fills up and it can't send more messages. It's a way of reaching through from the consumer to the producer to say, hey, stop sending messages, I can't keep up. The second behavior is producer exception, where you specify that you want the producer to get an exception. Rather than withholding the acknowledgment while the producer keeps filling its internal queue, when the producer tries to send a message, an
exception is thrown that explicitly says the backlog quota was exceeded. On the producer side you can catch that and decide what action makes the most sense. Those are the two policies you can set that act on the producer in this producer-consumer backlog equation. The third acts proactively on the consumer side: it's called backlog eviction, and it silently removes older messages in such a way that
you're back under whatever quota you specified, in this case 100 MB. So if you're up to a 150 MB backlog, it will automatically start moving that cursor forward for you, much like the TTL does. It just sweeps forward: the older messages are skipped and dropped, they're never processed, and that gets you back under 100 MB. So it skips forward for you.
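Here's a toy model of a single subscription with a size quota, showing what each of the three behaviors does when a publish would push the backlog past the limit. The three policy names match the values Pulsar uses (producer_request_hold, producer_exception, consumer_backlog_eviction), but everything else, including ToySubscription, the exception class, and measuring sizes in KB, is an illustrative assumption rather than how the broker implements it. The numbers follow the talk's example: a 100 MB quota (taking 1 MB as 1,000 KB) and 5 KB messages, so 20,000 messages fill it.

```python
from collections import deque
from enum import Enum
from typing import Deque


class BacklogQuotaPolicy(Enum):
    # These three values are the policy names Pulsar uses; the behavior
    # below is a simplified, hypothetical model of each one.
    PRODUCER_REQUEST_HOLD = "producer_request_hold"
    PRODUCER_EXCEPTION = "producer_exception"
    CONSUMER_BACKLOG_EVICTION = "consumer_backlog_eviction"


class BacklogQuotaExceededError(Exception):
    """Stand-in for the error a producer sees under producer_exception."""


class ToySubscription:
    """One subscription's unacknowledged backlog, with a size quota in KB."""

    def __init__(self, limit_kb: int, policy: BacklogQuotaPolicy) -> None:
        self.limit_kb = limit_kb
        self.policy = policy
        self.backlog: Deque[int] = deque()  # unacked sizes, oldest first
        self.backlog_kb = 0
        self.held: Deque[int] = deque()     # publishes parked under hold
        self.evicted = 0                    # messages skipped by eviction

    def publish(self, size_kb: int) -> bool:
        """Return True if the message was stored, False if it is held."""
        over_quota = self.backlog_kb + size_kb > self.limit_kb
        if self.policy is BacklogQuotaPolicy.PRODUCER_REQUEST_HOLD:
            if over_quota or self.held:  # stay in order behind held ones
                self.held.append(size_kb)
                return False
        elif self.policy is BacklogQuotaPolicy.PRODUCER_EXCEPTION:
            if over_quota:
                raise BacklogQuotaExceededError("backlog quota exceeded")
        self._store(size_kb)
        # Only reachable over quota with consumer_backlog_eviction: silently
        # skip the oldest unacked messages until the backlog fits again.
        while self.backlog_kb > self.limit_kb:
            self.backlog_kb -= self.backlog.popleft()
            self.evicted += 1
        return True

    def ack_oldest(self) -> None:
        """The consumer acks its oldest message; held publishes may fit."""
        if self.backlog:
            self.backlog_kb -= self.backlog.popleft()
        while self.held and self.backlog_kb + self.held[0] <= self.limit_kb:
            self._store(self.held.popleft())

    def _store(self, size_kb: int) -> None:
        self.backlog.append(size_kb)
        self.backlog_kb += size_kb


# 100 MB quota and 5 KB messages: 20,000 messages fill it exactly.
for policy in BacklogQuotaPolicy:
    sub = ToySubscription(limit_kb=100_000, policy=policy)
    for _ in range(20_000):
        sub.publish(5)
    try:
        stored = sub.publish(5)  # message 20,001
        print(policy.value, "stored" if stored else "held", sub.evicted)
    except BacklogQuotaExceededError as err:
        print(policy.value, "raised:", err)
```

Under request hold, the 20,001st message waits until the consumer acknowledges something; under producer exception, the producer gets an error it has to handle; under eviction, the message is stored and the oldest unacknowledged one is silently skipped.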
So backlog quotas apply backpressure to producers in the first two cases, as I talked about: the producer blocks or gets an exception. Eviction covers a different case. Let's say you don't have any active consumers: you have a subscription open, but all the consumers that used it are dead, and you haven't deleted it. Since we don't throw away messages a subscription still needs, you can end up with unbounded
disk storage. Backlog eviction takes care of that: it limits the total size of the storage by removing these messages, keeping the data volume within the quota. That way you never have unbounded storage growth just because you forgot to delete a subscription on a topic. It's a way of proactively deleting messages on that side.

So those are your two approaches, TTLs and backlog quotas, and they're not mutually exclusive: you can use both at the same time
because they solve somewhat different problems. In this scenario, you use the TTL to limit the age of messages and a backlog quota to limit the total size. If older messages pile up, say after a burst, the TTL periodically pulls the cursor forward, and messages can also be slowed down on the producer side. So you have this throttling mechanism: when the producer side gets bursty and you fall
behind, you have the backlog quota set up to throttle the producer back, and the TTL automatically moving the cursor forward. They work together: throttling slows down the producer so the consumer can keep up, which means fewer messages expire, so you skip fewer messages. That's the balance between the two.
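Here's a sketch of a TTL and a size quota applied to the same backlog: the TTL bounds the age of what's left, and the quota bounds the total size. For brevity the quota here uses eviction; with producer_request_hold instead, the size limit would throttle the producer rather than skip messages, which is the balance described above, and that variant isn't modeled. Msg and enforce are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque


@dataclass
class Msg:
    publish_time: float  # seconds
    size_kb: int


def enforce(backlog: Deque[Msg], now: float, ttl_s: float,
            limit_kb: int) -> int:
    """Apply a TTL, then a size quota with eviction; return how many skipped."""
    skipped = 0
    # TTL: limits the age of what stays in the subscription.
    while backlog and now - backlog[0].publish_time > ttl_s:
        backlog.popleft()
        skipped += 1
    # Backlog quota (eviction): limits the total size, oldest first.
    total_kb = sum(m.size_kb for m in backlog)
    while total_kb > limit_kb:
        total_kb -= backlog.popleft().size_kb
        skipped += 1
    return skipped


# Three stale messages, then a burst of eight recent ones, 10 KB each.
backlog = deque([Msg(0, 10) for _ in range(3)] +
                [Msg(100, 10) for _ in range(8)])
skipped = enforce(backlog, now=120, ttl_s=60, limit_kb=50)
print(skipped, len(backlog))  # 6 5
```

The TTL removes the three stale messages, and the size limit then trims the recent burst from 80 KB down to 50 KB.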