By default, if you spin up a Pulsar cluster, there's no encryption, authentication, or authorization enabled, right? It doesn't configure any encryption like TLS, or authentication, or things of that nature. So by default any client can connect to the Pulsar cluster with just the plain-text URI, and your messages are coming in as clear text, without authentication either. So everybody who's eavesdropping on that connection can just look at all the raw contents of the data. So the first step in locking this down is to look at

authentication, but remember that without TLS-encrypted endpoints those credentials are going to be transmitted in clear text as well. So it really begins with just enabling TLS; that's how we basically recommend you start locking it down. Now, if you're using Pulsar inside a private network with a set of strong firewall rules or something like that, just being firewalled might work for you and you might not have to do it. But if you're going to expose it

over the public internet in any way, outside of your private or corporate firewall, then we highly recommend that you use both TLS and authentication on your Pulsar cluster, at a minimum, before you expose it to the wild. Or else people and bots will find it, they'll start publishing data to it, and then all these script kiddies will be taking over your Pulsar cluster and doing nasty things to it. So you definitely want to make sure that you're fixing

all that. So obviously there are multiple layers to security, the first being connection encryption, encryption over the wire. Again, Pulsar supports TLS encryption between all components. So you can expose the endpoint via the broker, and the clients connect to a TLS-enabled endpoint, but then you can also make sure that your communication between Pulsar and ZooKeeper, and Pulsar and BookKeeper, is TLS encrypted as well, to make for safer communication, if you don't want people inside your organization eavesdropping on this communication. So it does make sense to use

TLS. How that works is the client encrypts it on its side and sends it up; when it lands on the broker, the broker decrypts it for a second and then decides to send it on. Now, at that point everything's in clear text again. So if you have sensitive information in your messages, as it travels from your broker to the bookies it's all in plain text. So if somebody gets in behind your firewall and starts looking around, they can get access to that information as well. So that's something you

want to consider doing, if at all possible. The brokers and the proxy can all do the TLS termination directly, or you can use something like a load balancer to terminate it at the endpoint. So wherever you want to do TLS termination, that's up to you; it's just something you want to perform at some point. And after that TLS termination you can use a different

sort of TLS, with different certificates internally or something like that, to make sure that it's actually secured. Now, authentication is basically challenging somebody to prove they are who they say they are. We support multiple authentication techniques within Pulsar, so things like JWT tokens, OAuth tokens, mutual TLS, and Athenz, which is another security mechanism. These aren't a username/password sort of thing, so you have to provide your credentials, and when the clients first connect they must specify:

"I'm connecting with these security credentials, this is the type of security I want to use, and these are the credentials I have," and then we'll either authenticate you or not. And then, once you've proven who you are, authorization comes in to say what actions you're allowed to perform on different topics and namespaces. Are you allowed to do any administrative tasks? Can you create topics? Can you create namespaces? Can you delete those assets? Can you write to a topic? Can you read from a topic? So all those

authorization rules are done on a role-based basis, right? As I mentioned, authentication is pluggable, so we support more than one mechanism, and you can actually have multiple authentication mechanisms active simultaneously as well. So if you have different clients with different kinds of security credentials, some can use JWTs, some can use OAuth, some can use TLS, all at the same time, and those can all be supported simultaneously. As long as you provide the right credentials, you'll be authenticated with the system. Again, once you

have authenticated, you are assigned a role. We map the user ID that you signed up with, or your token, to a specific role, whether it's things like admin or app1 or student01, for example. You get mapped to these individual roles, which you can define, and then permissions on those roles define what the different clients can do. So when I set up this class, for example, I created everybody their own tenant with a student ID and gave each of you admin permissions on your own tenant.
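
To make the token-to-role mapping concrete, here is a minimal sketch using only the Python standard library. It assumes the role travels in the token's `sub` claim and that tokens are signed with a shared secret (HS256); the function names `issue_token` and `role_from_token` and the secret are made up for illustration, and a real deployment would rely on the broker's own token provider rather than hand-rolled code like this.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 without "=" padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that was stripped when encoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(role: str, secret: bytes) -> str:
    """Create an HS256-signed JWT whose "sub" claim carries the role."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps({"sub": role}).encode())
    signing_input = f"{header}.{payload}".encode("ascii")
    signature = hmac.new(secret, signing_input, hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url(signature)}"


def role_from_token(token: str, secret: bytes) -> str:
    """Verify the signature and return the role, or raise PermissionError."""
    try:
        header, payload, signature = token.split(".")
    except ValueError:
        raise PermissionError("malformed token") from None
    signing_input = f"{header}.{payload}".encode("ascii")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature)):
        raise PermissionError("bad signature")
    return json.loads(_b64url_decode(payload))["sub"]
```

A token issued for `student01` comes back as the role `student01` when it is checked with the same secret, and it is rejected when the secret does not match.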

On your tenant, you can create topics as you see fit and namespaces as you see fit, and you can publish and write to all the topics and namespaces in your tenant by default. That was sort of the way we implemented this class, for example. So you have full admin rights on your tenant only, but if you're student one and you try to do something in student two's tenant, you have no authorization, and you cannot publish messages there. We made a big deal about, hey, we start out with

this code, and by default it's student one, but you want to replace it with your student ID, because if you try running with your token on a different student's tenant it's not going to work. Everything's going to fail and you're going to get all these nasty errors. So that's just something to be aware of. And again, that's the difference between authentication, which is who you are, and access control, which determines what you're authorized to do. As I mentioned before, these are the authentication providers that are natively supported. With mutual TLS you want to

have third-party signed certificates. Obviously you can use self-signed, but it is not recommended; it's not a best practice. Athenz, again, was the authentication system that came out of Yahoo, and since Pulsar came out of Yahoo, it natively has built-in support for that. Then there's Kerberos, if you want to use Kerberos tickets: if you have a Linux system and you want to use Kerberos, or link your Kerberos to something like LDAP, or if you're running in a Microsoft environment with Active Directory, you can use Kerberos

to be a bridge for that sort of system. Then JSON Web Tokens, and OAuth tokens for StreamNative Cloud as well. And this is just a note that that one's really a StreamNative-only thing; we haven't really released it to the public yet, but it is supported with StreamNative Cloud. So again, we validate your credentials when a connection is first established. So you do a client connect: you specify the service URL, as we saw in the code, you specify your authentication method and your credentials, and then

you try to connect, say, create the client, and at that point that's sent over, the initial connection is authenticated, and that data is kept on the client side. Now, the broker is periodically going to check the status. It's going to challenge you, to force you to re-authenticate, by default every 60 seconds, just to make sure that if you want to revoke security permissions from somebody after the fact, those wishes will be honored, right? So if you have

a long-running application, and again, we're working in the streaming world: you start an application, you're consuming data, you're processing, I don't know, web clicks on your website. You spin it up, and that thing is supposed to run for days and weeks and months before you do a code change, right? So this thing is just designed to run continuously. Now, somewhere along the way, say you decide, hey, I don't want this guy to have permission, and you change their authorization, or change their credentials, to just disconnect them

and revoke their security credentials. Well, if you only authenticate them when they connect, then that's a big security gap, because they'll continue to operate for however long that process is running, until it goes and re-authenticates and finds out, hey, this guy has actually had his security credentials revoked. So that's why periodically, every minute, the broker will send a challenge to all the connected clients to say, re-authenticate. The client sends the same authentication credentials again, and the broker just validates that you're still authorized; if not, you get cut off.
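
The periodic challenge can be pictured with a toy model. This is only a sketch of the idea, not how the Pulsar broker is implemented: `ToyBroker`, `Connection`, and `tick` are hypothetical names, the clock is passed in by hand so nothing actually sleeps, and the 60-second interval is simply the default mentioned in the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Connection:
    client_id: str
    token: str
    open: bool = True


@dataclass
class ToyBroker:
    # Hypothetical stand-in for a broker; not Pulsar's real implementation.
    valid_tokens: set = field(default_factory=set)
    connections: list = field(default_factory=list)
    recheck_seconds: int = 60  # default interval mentioned in the talk
    last_check: float = 0.0

    def connect(self, client_id: str, token: str) -> Connection:
        # Credentials are validated when the connection is first established.
        if token not in self.valid_tokens:
            raise PermissionError("authentication failed")
        conn = Connection(client_id, token)
        self.connections.append(conn)
        return conn

    def revoke(self, token: str) -> None:
        self.valid_tokens.discard(token)

    def tick(self, now: float) -> None:
        # Re-validate every open connection, at most once per interval.
        if now - self.last_check < self.recheck_seconds:
            return
        self.last_check = now
        for conn in self.connections:
            if conn.open and conn.token not in self.valid_tokens:
                conn.open = False  # revoked credentials: cut the client off
```

With this model, a client whose token is revoked keeps running until the next check comes around, and is disconnected at that check, while clients with valid tokens are untouched.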

Basically, you're disconnected. So it forces re-authentication periodically to make sure that you are who you say you are, and if you get revoked, you get revoked on that check, right? So again, this is just walking through that process. Internally, the client supports authentication refreshing: if the credentials have expired, then on the client side we have a refresh-authentication method to initiate the refreshing process, right? So again, a lot of systems that you set up and

these customers set up have JWT tokens or something like that, but they don't live forever; they force you to go back, or with a TLS certificate, they force you to go back and get a new one every so often, right? So back to the use case: you want to have a continuously running client, and you want to change these credentials, but you don't want your processes to crash because you refreshed them. You've just forced them to cycle these out, just in case somebody else got access to these

credentials. You want to sort of expire them and just make sure that you have newer, fresher ones, so that you're making sure access is limited to only the people who are authorized. Then there's this refresh-authentication method. Internally you can have basically different ways of doing it; it's basically an interface that you implement, and you can have it do whatever you want to do this refreshing. And we have some built-in ones that automatically go get new certificates or go get a JWT

token. And if this refresh fails, you get disconnected. So the workflow would be: you're using a JWT token, and let's say it expires. So that minute goes by, the broker challenges you again, you pass in your old JWT token, which is now expired, and it fails, right? Once that fails, then if you have this enabled, it will try to refresh it and go out and get a new JWT token. If it gets one, it'll pass that new one in and everything is good.
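
The refresh side of that workflow might look roughly like this on the client. It is a simplified sketch, not the Pulsar client's actual interface: `RefreshingCredentials` and the `fetch_token` callback are hypothetical, and the expiry is checked locally before answering instead of waiting for the broker to reject the old token.

```python
import time
from typing import Callable, Optional


class RefreshingCredentials:
    """Holds a token plus its expiry and knows how to fetch a replacement."""

    def __init__(self, fetch_token: Callable[[], Optional[tuple]], clock=time.time):
        # fetch_token is a hypothetical callback that returns
        # (token, expires_at) on success or None when no token is available.
        self._fetch = fetch_token
        self._clock = clock
        self.token, self.expires_at = None, 0.0

    def expired(self) -> bool:
        return self._clock() >= self.expires_at

    def answer_challenge(self) -> Optional[str]:
        """Return a usable token, refreshing first if the current one expired.

        Returning None models a failed refresh, which in the workflow
        described here ends with the client being disconnected.
        """
        if self.expired():
            fresh = self._fetch()
            if fresh is None:
                return None
            self.token, self.expires_at = fresh
        return self.token
```

The design choice worth noticing is that the long-running process never restarts: it keeps answering challenges with the current token and only goes out for a new one once the old one has expired.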

It's a new token, it's the one that authorizes you, and everything goes well. If you fail to get that token, or whatever, then you're going to be disconnected. So it's another way of making sure that you're always authenticated, and you can lock down your cluster a little bit more with that. So we talked a little bit about authorization and ACLs. Again, a role can be any arbitrary string you want. There aren't any reserved ones other than superuser and admin;

those are really the only ones that are sort of reserved keywords, and those mean exactly what you think they mean. But you can create whatever you want, as we saw on the previous slide: app1, or microservice this, or finance person that, or whatever you want these things to be, and map users to these different roles. And many people can map to an individual role: these are all developers, these are all microservices, for example.

They map to that role, and then those roles can be applied to different policies or topics. The different permissions you can grant are produce, consume, and admin, right? And then there's superuser access and tenant admin access; tenant admin means they can do anything on a tenant, and that's what I've done with your individual student IDs. So again, you have tenant-level access permissions for your roles, and your roles are mapped to the token that's been granted

to you. So when you sign in with that token, you get that role, and then you're granted tenant admin commands with that. And you can assign additional things: you, as a tenant admin, can create new users off of that and say, this particular sub-user can only have produce permission on this namespace or this topic, and consume as well. So you have very granular read/write access, and that's how your ACLs are implemented: through these produce and consume permissions, granted based on your role, right? So these are the different levels.
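
As a rough picture of how role-based produce and consume grants could be evaluated, here is a small sketch. `ToyAcl` is a hypothetical class written for this illustration, not Pulsar's authorization provider; it assumes topic names of the form `tenant/namespace/topic` and treats a grant on a namespace as covering every topic inside it.

```python
from collections import defaultdict


class ToyAcl:
    """Role-based permissions on namespaces and topics (illustrative only)."""

    ACTIONS = {"produce", "consume"}

    def __init__(self, superusers=()):
        self.superusers = set(superusers)
        self.tenant_admins = defaultdict(set)  # tenant -> roles
        self.grants = defaultdict(set)         # (resource, role) -> actions

    def grant(self, resource: str, role: str, *actions: str) -> None:
        unknown = set(actions) - self.ACTIONS
        if unknown:
            raise ValueError(f"unknown actions: {sorted(unknown)}")
        self.grants[(resource, role)] |= set(actions)

    def is_allowed(self, role: str, action: str, topic: str) -> bool:
        # topic is assumed to look like "tenant/namespace/topic"
        tenant, namespace, _ = topic.split("/")
        if role in self.superusers or role in self.tenant_admins[tenant]:
            return True
        ns = f"{tenant}/{namespace}"
        # A grant on the namespace covers every topic inside it.
        return (action in self.grants[(ns, role)]
                or action in self.grants[(topic, role)])
```

For example, if `student01` is the admin of the `student01` tenant, that role can produce anywhere in its own tenant but nowhere in `student02`, while a role such as `app1` with only a produce grant on one namespace can write there but not read.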

So if you're a superuser, and this is kind of like a hierarchy, the superuser can do anything and everything. It administers the entire cluster, and it can set cluster-wide policies, like resource quotas for tenants. It also creates the tenant administrators, right? So it is a delegation sort of model, where you have one superuser for your entire cluster, and every time a new team comes onto your cluster you say, okay, development team X, who's going to be your tenant administrator? You

create one role for that, you give that token out to one person, and then that person is responsible for administering their tenant. Superusers can do it, but you really want to delegate and say, hey, this is your tenant, do whatever you want with it. So as a tenant admin you can create and set your policies on your namespaces, you can create the different topics and create partitioned topics, and then you can also grant permissions to different parts of your organization. You can grant

consume and produce permission on individual topics or namespaces to say this person can do this or that. Then these are sort of a list of the policies you can set, right? Cluster creation, geo-replication, cluster quotas; and at the tenant level, again, you can set namespace resource quotas or topic-level retention policies, backlog, message TTL, tiered storage, which we talked about, message deduplication. All these different policies can be enabled by the individual tenants, because you know your namespace better, you know your policies, you know

the use case, and so we give you more granular control over that. That's sort of the model that's adopted, and Pulsar's is a delegation model, right? So now, while authentication doesn't require encryption, as we talked about, it's strongly recommended to use TLS. All the different components within a Pulsar cluster speak TLS, and they have to communicate with one another. Typically, or at a minimum, the proxy or the publicly facing endpoint definitely should be TLS encrypted. That way, again, all your incoming, inbound traffic, whether it's

a proxy or a broker or a load balancer or something in front, should have TLS enabled. And then you can have internal communication be unencrypted. I don't know if that's typical anymore, but that's the minimum, and then you can obviously increase your security by having TLS enabled between all these different communication points: broker to BookKeeper, broker to ZooKeeper, BookKeeper to ZooKeeper. Have them all secured.
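
On the client side, the "at a minimum, encrypt the public endpoint" advice comes down to two checks, sketched here with the Python standard library rather than a Pulsar client: use an encrypted service URL, and verify the server's certificate. The helper names and host names below are made up for illustration; the `pulsar+ssl://` and `pulsar://` schemes are the ones Pulsar uses for TLS and plain-text binary connections.

```python
import ssl
from typing import Optional
from urllib.parse import urlparse


def is_encrypted_service_url(url: str) -> bool:
    """True when the URL scheme asks for TLS (pulsar+ssl:// or https://)."""
    return urlparse(url).scheme in {"pulsar+ssl", "https"}


def client_tls_context(ca_file: Optional[str] = None,
                       cert_file: Optional[str] = None,
                       key_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a client-side TLS context that verifies the server certificate."""
    # The default context requires a valid certificate and checks the hostname.
    ctx = ssl.create_default_context(cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if cert_file and key_file:
        # Presenting a client certificate as well is what makes it mutual TLS.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

A quick sanity check before connecting, for instance `is_encrypted_service_url("pulsar+ssl://broker.example.com:6651")`, catches the easy mistake of pointing a client at the plain-text port of a cluster that is exposed publicly.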

L1Dev Lesson: Apache Pulsar Security