You're here for optimizing ELB traffic for high availability. I'm John, customer success at Elastic Load Balancing, and with me is Enrico, who is a Solutions Architect. Today we're going to talk about some of the different things you can do for the applications you deliver over the internet to get high availability, deep dive into the routing we have on the load balancers themselves, and also dive into a new feature we launched just before re:Invent, without a prior announcement: the LCU capacity reservation, which we launched for Application and Classic Load Balancers. So that's kind of a bonus item. After the session you should have a better idea of what high availability looks like in AWS, and how you can configure your load balancers and other systems to ensure that your traffic is being routed to healthy and adequately scaled endpoints. You'll know about DNS (we always like to talk about DNS), you'll know how some of the advanced routing features of Application Load Balancer can be used, along with some best practices, and of course we'll cover the LCU capacity reservation.
use a few different uh icons during the slides to indicate we've got internal details or if there's a new feature launch or links to documentation for a QR code we'll pause try to give you time to take pictures of That the recording will be on YouTube afterwards so uh we do have a lot of content to get into so let's Jump Right In uh we're all here because we deliver applications over the internet now when I say internet it could be the public internet it could be private networking generally these days everybody's landed on IP
networking uh we do use two protocol versions today four and six uh one of the nice things about a load balancer or a bunch of AWS systems is You don't have to learn and ramp up to IPv6 across your entire stack to get the benefit at the front door we're also going to be talking about TCP and UDP and then on top of TCP HTTP request traffic and we'll talk about the different load balancers and how they route for the different U protocols for our example here uh we've got just a normal client so how
do your clients connect to your services and just very very generically um a lot of you Probably already know this really well but uh just so we're all on the same page your client's going to have a network they're going to get some kind of Gateway with ipv4 they're going to have private networks uh and then the Gateway is going to anat them to a public IP where they can then connect to the rest of the internet uh if your application in AWS this eventually ends up at an Ingress Gateway which usually does the opposite
kind of network Address translation sending the traffic to your private network uh one other thing we're going to do so in the uh in all elastic load balancers you've got Route 53 built in with h checks and DNS so when you create a load balancer we create a host name that's unique and you can use that for C name or Route 53 Alias and your clients will connect to that and then get into the the traffic to load balancer we use this for everything so we do this for uh health Checks we do this for
uh adequate scale if you look at DNS for your load balancer those IPS will be the ones that are adequately scaled for the detected level of traffic as well as healthy and passing health checks now cloudfront and Global accelerator are both great things to put in front of your load balancer and all of these three systems give you roughly the same options as far as what you can do for availability but some of them like Cloudfront or glob accelerator have some additional features you can leverage for this example for my part of the talk we're
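Since everything here hinges on the client resolving that load balancer hostname, here's a minimal Python sketch of fetching the current set of load balancer IPs. The hostname and the stand-in resolver are made up so the demo runs offline; a real client would just pass `socket.getaddrinfo`.

```python
import socket

def resolve_lb_ips(hostname, resolver=socket.getaddrinfo):
    """Return the unique IPv4 addresses currently published for a hostname.

    Route 53 only returns ELB IPs that are healthy and adequately scaled,
    so re-resolving gives the client the current candidate set.
    """
    infos = resolver(hostname, 443, socket.AF_INET, socket.SOCK_STREAM)
    # getaddrinfo yields (family, type, proto, canonname, (ip, port)) tuples
    seen, ips = set(), []
    for info in infos:
        ip = info[4][0]
        if ip not in seen:
            seen.add(ip)
            ips.append(ip)
    return ips

# Offline demo with a stand-in resolver (the ALB name below is made up):
def fake_resolver(host, port, family, type_):
    return [(family, type_, 6, "", ("203.0.113.10", port)),
            (family, type_, 6, "", ("203.0.113.20", port)),
            (family, type_, 6, "", ("203.0.113.10", port))]

print(resolve_lb_ips("my-alb-123.us-east-1.elb.amazonaws.com", fake_resolver))
# → ['203.0.113.10', '203.0.113.20']
```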
For my part of the talk, we're going to focus on Global Accelerator, whose main benefit is getting your clients onto the AWS backbone as quickly as possible. There's a great demo on the product page that runs a whole test so you can see the actual difference. The benefit is that you reach one of the many edge locations where we have private capacity, and from there we route to all the regions. In a simple setup with one zone or one region, your clients connect to the closest edge location, and from there it's AWS backbone all the way to the closest region your applications are in. We have the option to do this for many different regions at the same time, and this slide is highlighting that the health checks for each of those regions happen at multiple levels. Just like Route 53 takes an IP out of DNS when it goes unhealthy, Global Accelerator will send traffic only to the healthy IPs. And generally when we say "traffic" we mean new TCP connections (or UDP flows) if you're doing TCP or UDP, and new requests if you're doing HTTP.

If we zoom in to a single region, under the hood we've got the ELB's fully qualified domain name. If you look it up, you'll usually see one IP address per zone, but an ALB can scale to multiple IPs per zone depending on how big it grows. Each of those IPs is independently health checked by Route 53, is individually a target that Global Accelerator can route to, and can be failed out of DNS if it becomes unhealthy.

One of the things a lot of people are confused about, and that we'd like to clear up, is when you are going to fail out of an IP or a zone. When you look at your load balancer and see all these different IPs, how do you know which ones are healthy, and what should you expect if one of them becomes unhealthy? These are the different reasons a load balancer IP can fail out of a zone. Now, a Network Load Balancer is slightly different from an Application Load Balancer, because it has one IP per availability zone and that IP won't change for the life of the load balancer. So when you create a public NLB, we recommend using an Elastic IP that you pre-allocate, because you get to keep it; if you allocate an internal NLB, or an NLB without an EIP, we'll just pick an address from your range. Because of that, on a Network Load Balancer a zone can be failed out of entirely just by pulling that one IP.
On an ALB you could have many nodes in that zone, so part of the zone could be healthy and part could be unhealthy. The other thing that can trigger these failures: the target group you have on each of your load balancers can fail its health check. I went deeper into this last year, if you want to go back and see a deeper dive into the feature. It lets you change the threshold; today the default threshold is 100% unhealthy for everyone, and AWS doesn't like to change defaults on people, so we didn't change it when we brought this feature in. We recommend people look at it, see what makes sense for them, and configure it. There are two settings you can set. One is the target group's fail open (and remember, fail open means treat all targets as if they were healthy), so with the default, 100% of your targets need to be unhealthy to trigger fail open. The more relevant part is the next setting, which is to fail DNS. You can set both of these settings, and you can set them in a way that effectively disables the Route 53 health check: maybe you have a target group you're still populating that isn't expected to be healthy, and you never want it to fail the Route 53 health check. But if you have a case where, say, half of your application in a zone is unhealthy and you want to actually fail out of that zone, this setting would do that, and it would propagate forward to anything in front of the load balancer that also uses Route 53.
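Those two thresholds can be sketched as a tiny simulation. The function name and shape are hypothetical, but the percentages behave as described in the talk: defaults of 100% mirror the historical behavior.

```python
def routing_decision(healthy, total, fail_open_threshold=100, dns_fail_threshold=100):
    """Sketch of the target group health thresholds (all values in percent).

    - If the unhealthy share reaches fail_open_threshold, the target group
      "fails open": all targets are treated as healthy.
    - If it reaches dns_fail_threshold, the zone's IP is failed out of DNS,
      steering new clients to other zones.
    Defaults mirror the historical behavior: act only at 100% unhealthy.
    """
    unhealthy_pct = 100 * (total - healthy) / total
    return {
        "fail_open": unhealthy_pct >= fail_open_threshold,
        "fail_dns": unhealthy_pct >= dns_fail_threshold,
    }

# Default: with 1 of 4 targets still healthy, nothing is triggered.
print(routing_decision(healthy=1, total=4))
# → {'fail_open': False, 'fail_dns': False}

# Tuned: fail the zone out of DNS once half the targets are unhealthy.
print(routing_decision(healthy=1, total=4, dns_fail_threshold=50))
# → {'fail_open': False, 'fail_dns': True}
```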
The other big thing you can use here is ARC zonal shift; all load balancers are integrated with it. With zonal shift you can say "this zone is unhealthy, we want to fail away from it." And ARC just launched autoshift, which lets you say: if this zone is breaching these alarms on these unhealthy characteristics, fail out of the zone. You can turn that on, it's automatic, and it's completely under your control what should and shouldn't trigger a shift. On the ELB side, we will sometimes shift out of a zone ourselves: if there's a big issue, we might fail all load balancers out of that zone. We do try to take precautions to minimize impact when that happens, for example when someone has one healthy zone and all of their other zones are unhealthy, but there are risks there you should look at, and the best practice is: use every zone that's configured on your load balancer. If you have a zone you're not actually using, don't leave it configured; create a new load balancer without it, or remove it if it's an ALB.

So those are the first few things. There is one exception on the target group side, and that's the NLB empty target group. On ALB the default is cross-zone on, so we think an empty target group is less of a risk, and failing everything out has a big blast radius. On NLB the default is cross-zone off, and because of that we defaulted an empty target group to count as unhealthy. You can disable that, but if you have empty target groups in some zones and cross-zone off, you could have zones that simply aren't in DNS, and if something then triggered the zone that was healthy to fail out of DNS, you would be failing open and potentially sending traffic to unhealthy zones. So populate the zones you've configured, use ARC, and configure autoshift when you're ready to shift traffic out based on application metrics or whatever metrics you want to select. When this fails out, whatever IPs are detecting the failure on the ELB will be removed from DNS.
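A hedged sketch of how zone health and a zonal shift together determine the DNS answer. Zone names and the function shape are illustrative; the fail-open fallback (never return an empty answer) mirrors the behavior described above.

```python
def dns_answer(zone_ips, zone_healthy, shifted_away=()):
    """Sketch: which load balancer IPs stay in the DNS answer.

    A zone's IP is withdrawn if its health checks fail or if an ARC zonal
    shift (manual or autoshift) moves traffic away from that zone. If every
    zone would be withdrawn, DNS fails open and returns all IPs rather than
    an empty answer.
    """
    answer = [ip for zone, ip in zone_ips.items()
              if zone_healthy.get(zone, False) and zone not in shifted_away]
    return answer or list(zone_ips.values())  # fail open: never empty

zones = {"use1-az1": "203.0.113.1", "use1-az2": "203.0.113.2", "use1-az3": "203.0.113.3"}
health = {"use1-az1": True, "use1-az2": True, "use1-az3": True}
print(dns_answer(zones, health, shifted_away={"use1-az2"}))
# → ['203.0.113.1', '203.0.113.3']
```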
That happens automatically; you can change the thresholds, or shift manually with zonal shift. Now, if we go back to our global application and pivot our view to look at your targets: these are application servers in a bunch of zones, spread across six regions of the world. We're using Global Accelerator, and it's routing traffic by geo-proximity to any of those regions. If we zoom in, each of those regions has at least one IP per zone on the ELB; those IPs are all health checked by Route 53, as I mentioned, and if those health checks fail, we pull the IPs out of the zones. If we go back to one of the regions: each of these IPs relates to a machine in a single zone, and if one of them becomes unhealthy for any of the reasons on the previous slide, it gets removed from DNS. Zooming out to a global perspective, you can have a global application that's actively routing around multiple zonal failures, and possibly even regional failures: when you send traffic through Global Accelerator, you get routed around any of the unhealthy zones and only to the healthy ones.

One other thing we do: the same health check that Route 53 is doing, we're also doing on the node itself. When we ping the node and ask "are you healthy?", Route 53's action is to remove it from DNS, and our action is to actually go replace it. We'll detect that it failed and pull it out of DNS, though usually not as fast as Route 53, which takes about 30 seconds to trigger plus the DNS TTL of 60 seconds, so about 90 seconds until the IPs are completely out of DNS and clients should be reconnecting if they see any errors; then we'll come in and replace the node. On top of that, we have another system that looks at a node's performance across its peers: we take all of the nodes in a load balancer and ask which one has any outlying metrics. If it's a metric that means you're having a problem, say higher ELB 5xx or a higher unhealthy host count, then we'll proactively and gracefully remove that IP from DNS, add a new node to replace it, and later come back, run some checks, and ask whether that was a good replacement. We run this anomaly detection on every load balancer, over 5-minute, 15-minute, and 4-hour windows, every day, all the time. This is detecting gray failures.
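As a rough illustration (this is not the actual AWS implementation), peer comparison can be as simple as flagging a node whose metric sits far outside the median of its peers:

```python
from statistics import median

def outlier_nodes(metrics, factor=3.0):
    """Sketch of peer comparison across a load balancer's nodes.

    Flags any node whose metric (e.g. a 5xx count or unhealthy host count)
    is far above the median of its peers, using the median absolute
    deviation as the yardstick. The real system evaluates several metrics
    over multiple time windows.
    """
    mid = median(metrics.values())
    spread = median(abs(v - mid) for v in metrics.values()) or 1.0
    return [node for node, v in metrics.items() if (v - mid) / spread > factor]

# Node C is returning far more 5xx than its peers and would be
# gracefully replaced (removed from DNS, then substituted).
errors_5xx = {"node-a": 4, "node-b": 6, "node-c": 95, "node-d": 5}
print(outlier_nodes(errors_5xx))  # → ['node-c']
```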
These are failures that aren't strong enough to trigger the health check, and again, this is on the AWS side, not on your target group side. On NLB we have something similar to what I described for ALB: something we tag onto the health check packets. We use AWS Hyperplane under the hood, and Hyperplane gives us an ENI that behaves like an anycast IP address integrated with EC2 software-defined networking. It can distribute incoming traffic among a host of resources supporting that traffic, which then route it to the target, and we can change all of those under the hood without affecting in-flight traffic because of that integration with Hyperplane. Every health check goes through that same path: when you create your NLB we have an ENI, we send the health check through the ENI to the targets, and if we detect it was dropped somewhere along the way, we can proactively shift that load balancer out of that zone as a gray failure. So you might sometimes see the IP for one zone pulled out of DNS and, if you don't see any problem with any of your hosts, it could be this case, where we're proactively shifting away from it.

Last year we talked about how we integrated with zonal shift for the Application Recovery Controller, which lets you proactively decide "I want to shift out of this zone" based on whatever you want. We made a change this year: we now support cross-zone on with zonal shift. In the past you had to have cross-zone off or zonal shift wouldn't work; now you can actually have it on. When we shift out, we currently shift out the whole zone, the entire IP, and traffic shouldn't go there anymore, even if cross-zone is on. I also mentioned autoshift, the new feature ARC launched this year, which lets you configure alarms: if these are breaching, fail out of the zone. And a reminder that this is in addition to all of the other reasons that will fail a zone out.

To recap: we have zones that we will fail out of. Logically, these are our EC2 blast radiuses, where we say everything in this zone has no relation to everything in that zone; we try not to cross anything over. That means things like earthquake zones, flood plains, and power grids are kept separate across all the zones, and we have many things that can trigger an actual zonal failover, including ones you can't control. With that, I'll take a break and hand it over to Enrico.

Thank you, John. Let's see now how the traffic distribution to targets in an ELB is actually the result of multiple decisions taken at different levels. The first level of decision is the ELB IP selected by the client in order to send traffic to the load balancer. Once this traffic arrives at the ELB, a target group is selected by the ELB to handle the traffic. The third phase is the actual target selection within the selected target group. Now let's review each one of these phases, starting with the ELB IP selection. The selection of the ELB IP used to send traffic is made by the client, and it relies on DNS.
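Those three decision levels can be sketched end to end in a few lines. This is purely illustrative Python, not an AWS API: the rule shape and names are invented, and the round-robin at level 3 anticipates ALB's per-request behavior discussed later.

```python
import itertools

def make_router(dns_ips, rules, default_group):
    """Illustrative three-level router (not an AWS API).

    Level 1: the client picks one of the IPs returned by DNS.
    Level 2: the ELB matches listener rules in priority order to pick a
             target group (the default group if nothing matches).
    Level 3: the ELB picks a target inside that group (round robin here).
    """
    cycles = {}
    def route(request, ip_index=0):
        ip = dns_ips[ip_index % len(dns_ips)]                 # level 1
        group = default_group
        for rule in sorted(rules, key=lambda r: r["priority"]):
            if rule["match"](request):                        # level 2
                group = rule["targets"]
                break
        rr = cycles.setdefault(id(group), itertools.cycle(group))
        return ip, next(rr)                                   # level 3
    return route

route = make_router(
    dns_ips=["203.0.113.1", "203.0.113.2"],
    rules=[{"priority": 10,
            "match": lambda r: r["path"].startswith("/api"),
            "targets": ["api-1", "api-2"]}],
    default_group=["web-1"])
print(route({"path": "/api/users"}))   # → ('203.0.113.1', 'api-1')
print(route({"path": "/api/users"}))   # → ('203.0.113.1', 'api-2')
print(route({"path": "/index.html"}))  # → ('203.0.113.1', 'web-1')
```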
In fact, when a customer creates a load balancer, an ELB, a DNS name is generated and assigned to the load balancer. This DNS name returns the IP addresses assigned to the ELB, and the number of those IPs changes depending on the product used: with Network Load Balancer there is one IP assigned per availability zone, while with Application Load Balancer there are always at least two IPs assigned, and that number can grow up to 100 IPs. Regardless of the specific product, the right way to access an ELB is through this DNS name, so the first thing that happens inside a client is the DNS resolution. The client sends a DNS query to Route 53, and, from what John shared before, Route 53 is the AWS service maintaining an updated list of the healthy IPs assigned to a load balancer; those IPs are also the ones considered scaled enough for the detected workload. So Route 53 returns the list to the client, and the client selects one IP from the list.
Usually it's the first one in the list. This is also why Route 53, when returning DNS queries for the same DNS name to multiple clients, always tries to return the IPs in a different order: it's done to encourage a more balanced spread across all the available IPs. Ultimately, though, the decision of which IP to send traffic to remains with the client. For this reason it is extremely important that customers configure their clients to re-resolve DNS when they want to send traffic to an ELB, and to respect the time-to-live associated with the DNS entry. This is important with Application Load Balancer because we know the IPs change dynamically over time, but it's important with Network Load Balancer too, because, as John shared before, there are events in which the NLB control plane can remove an IP from the DNS name. So when customers configure their clients, they should always keep the fault tolerance and availability aspects in mind. In this example, it means configuring those clients to refresh DNS when a TCP connection closes unexpectedly, so that they get an updated list of ELB IPs. Then, when reconnecting, clients should use exponential backoff and jitter, in order to avoid overwhelming a specific ELB IP when a large number of clients are reconnecting all at the same time. It is also a good idea to try another IP address in the list if a client is experiencing problems with a specific IP.
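Putting those client-side recommendations together (rotate through the resolved IP list, back off exponentially with full jitter), here's a hedged sketch; `try_connect` and the IPs are stand-ins for a real connect call.

```python
import random
import time

def reconnect(ips, try_connect, max_attempts=5, base=0.5, cap=30.0, sleep=time.sleep):
    """Sketch of the client-side reconnect guidance above.

    - Rotate through the resolved IP list so one bad IP doesn't block us.
    - Back off exponentially with full jitter so a fleet of clients does
      not reconnect to (and overwhelm) the same ELB IP simultaneously.
    """
    for attempt in range(max_attempts):
        ip = ips[attempt % len(ips)]          # try another IP on each retry
        if try_connect(ip):
            return ip
        delay = random.uniform(0, min(cap, base * 2 ** attempt))  # full jitter
        sleep(delay)
    raise ConnectionError("all attempts failed")

# Demo: the first IP is broken, the second works.
print(reconnect(["203.0.113.9", "203.0.113.10"],
                try_connect=lambda ip: ip.endswith(".10"),
                sleep=lambda s: None))
# → 203.0.113.10
```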
That's because, even though the ELB control plane has systems to detect when an IP is unhealthy and replace it automatically, the process takes time, usually about 30 seconds, so trying other IP addresses in the list expedites the client's recovery.

Now, this IP selection made by the clients can have a direct impact on the traffic distribution produced by the ELB. This is true when cross-zone load balancing is disabled. Cross-zone is the setting that allows the ELB to send traffic to targets located in a different availability zone from the IP that received the traffic. When cross-zone is off, the ELB selects only targets within the same availability zone as the ELB IP that received the traffic. This means that if the clients are not selecting all the available IPs in a balanced way, and cross-zone is off, the result is an uneven distribution, as shown on this slide: a graph of the number of requests received by some targets. I performed a test with four targets in a target group, two in each availability zone, and as you can see, essentially only two targets were receiving traffic, because my clients were sending traffic to only one ELB IP and cross-zone was disabled. When customers detect this kind of uneven distribution, they should consider enabling the cross-zone setting, which allows the ELB to consider all the targets in the target group when selecting which target should receive the traffic. However, they should also be aware that this might increase the charges applied: with Network Load Balancer, when a request crosses an availability zone boundary, an inter-AZ data transfer charge applies, whereas this traffic is free of charge with Application Load Balancer.

Now let's move to the second level of decision, the target group selection. I'm going to focus only on Application Load Balancer, because with the other products, Network Load Balancer and Gateway Load Balancer, the selection of the target group is fixed, based on the listener that receives the traffic.
That's because customers can configure only one target group behind a single listener, whereas with Application Load Balancer customers can configure multiple target groups behind the same listener, leveraging listener rules. When an HTTP request arrives at an ALB listener that has multiple rules associated, those rules are evaluated by the ELB in sequential order, following the priority associated with each rule. When a match is determined for a specific rule, the target group associated with that rule is selected. If none of the rules match, there is always a default rule, which handles the default case. Rules have conditions associated, and those conditions are what the ELB technically evaluates. They can be based on a great variety of parameters in the client request: the path included in the request, the HTTP method, more generally any HTTP header (including cookies), query parameters, and also the IP address used by the client to send the request to the ELB.

Now, this rich set of options gives the ELB great flexibility, and indeed the ELB has been designed to simplify architectures: it's intended to load balance traffic directly to the targets behind it, regardless of the specific technology used in those applications. One way to simplify those architectures arises when, for example, customers have multiple ELBs configured in their account to serve traffic to multiple domains.
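As an illustrative sketch (the rule structure below is hypothetical, not the real API), here is priority-ordered rule evaluation with host and path conditions and a default rule:

```python
def select_target_group(request, rules, default="default-tg"):
    """Sketch of ALB listener rule evaluation.

    Rules are evaluated in priority order; the first rule whose conditions
    all match selects the target group, otherwise the default rule applies.
    Conditions here cover host header and path prefix, two of the
    supported condition types.
    """
    for rule in sorted(rules, key=lambda r: r["priority"]):
        cond = rule["conditions"]
        if ("host" not in cond or request["host"] == cond["host"]) and \
           ("path_prefix" not in cond or request["path"].startswith(cond["path_prefix"])):
            return rule["target_group"]
    return default

# One ALB serving two domains, each mapped to its own target group:
rules = [
    {"priority": 1, "conditions": {"host": "shop.example.com"}, "target_group": "shop-tg"},
    {"priority": 2, "conditions": {"host": "blog.example.com"}, "target_group": "blog-tg"},
]
print(select_target_group({"host": "blog.example.com", "path": "/"}, rules))
# → blog-tg
print(select_target_group({"host": "bot-scan.invalid", "path": "/admin"}, rules))
# → default-tg
```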
In that situation, they can consider aggregating those multiple load balancers into a single ELB, and this can be done leveraging host-header-based rules: the Host header in the client request contains the domain the client wants to access. If customers are offloading TLS to the load balancer, they can configure multiple certificates on the same listener, and the ELB will be able to present the right certificate to the right request thanks to support for Server Name Indication. SNI is information included in the client request, specifically in the TLS handshake, and it too contains the hostname the client is trying to access; by checking it, the ELB presents the right certificate to the client, allowing customers to use a single load balancer to serve traffic to multiple domains even when HTTPS is needed. This has the benefit of less operational overhead, because they will need to manage only one load balancer, and it also helps with cost optimization. However, when making this decision, they should also be careful about the blast radius: an impact to that one ELB will now affect all of the domains it's serving. It's always a balance between the blast radius and the benefit of this type of architecture.

The second best practice I'd like to share concerns the order of the rules on the listeners. In general terms, customers should always prioritize the most-utilized rules, because this enables the ELB to minimize the number of evaluations it performs on each single request received on the listener. Even though the ELB is able to scale its capacity based, among other things, on the number of rule evaluations, those evaluations do have a computational cost. So customers who want to minimize the latency added by those evaluations should prioritize the rules that are expected to be matched most often.
The last best practice I'd like to share is specifically for internet-facing load balancers. When customers expose a load balancer to the internet, they cannot really predict what type of requests that load balancer is going to receive: together with legitimate requests, the load balancer will receive a certain amount of random requests from bots and intruders on the internet. So it is a good idea to handle the redirect or rejection of those random requests through the default rule. This way, the ELB takes responsibility for producing an answer to those random requests without forwarding them to the targets behind it, and if we imagine a moment in which the ELB receives a very large number of those random requests, the targets behind the ELB will be protected by the ELB's scaling system, which will scale up and absorb the increased amount of traffic.

A last mention for request routing is the weighted target group feature. With this feature, customers can associate multiple target groups behind the same listener rule, along with the percentage of traffic they would like to steer to one target group rather than another. This is very helpful for blue/green deployments: when customers want to roll out a new version of an application, they can leverage the weighted target group feature to test the new version of the application with some real client requests. Then, if they want, they can also turn on stickiness at the target group level, which in this example means a single client will be routed consistently to the same target group.
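The weighted target group idea can be sketched with a simple weighted random choice. Group names and weights are illustrative; a seeded generator makes the demo repeatable.

```python
import random

def pick_target_group(weighted_groups, rng=random):
    """Sketch of the weighted target group feature for blue/green rollout.

    Each entry is (target_group, weight); the ELB steers roughly the given
    share of new requests to each group. With stickiness enabled, a given
    client would keep hitting whichever group it was first assigned.
    """
    groups = [g for g, _ in weighted_groups]
    weights = [w for _, w in weighted_groups]
    return rng.choices(groups, weights=weights, k=1)[0]

# Canary: send roughly 10% of traffic to the new application version.
rng = random.Random(42)  # seeded so the demo is repeatable
picks = [pick_target_group([("v1-tg", 90), ("v2-tg", 10)], rng) for _ in range(1000)]
print(picks.count("v2-tg"))  # close to 100, i.e. about 10% of 1000
```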
That means the same version of the application. Now let's move to the last stage: the target selection within the target group. We'll see that each ELB product has its own routing algorithm, and for this reason I'm going to discuss each one of them separately, starting with Network Load Balancer. NLB is the AWS managed service for layer 4 load balancing and, as a layer 4 load balancer, it performs target selection based on a combination of information at layer 3 and layer 4. When NLB is used with TCP listeners, the target selection is performed when a new connection is established. When this traffic arrives at the Network Load Balancer, a 6-tuple is extracted from the TCP and IP headers, and that 6-tuple is the input parameter of a flow hash algorithm, which is the routing algorithm the Network Load Balancer uses to select a target behind it and forward traffic to it. When performing this selection, the Network Load Balancer tracks the association between the 6-tuple and the selected target in a distributed database; we'll see in a moment why this is important.
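The flow hash idea can be sketched as follows. The real algorithm is internal to NLB, so a stable cryptographic hash stands in here; the tuples match the talk (6-tuple for TCP including the sequence number, 5-tuple for UDP), and the UDP part also previews the source-port pitfall discussed later.

```python
import hashlib

def flow_hash_target(tuple_fields, targets):
    """Sketch of NLB's flow hash: hash the flow tuple, pick one target.

    TCP uses a 6-tuple (protocol, src IP, src port, dst IP, dst port,
    TCP sequence number); UDP uses a 5-tuple (no sequence number). A
    stable hash is enough to show the idea.
    """
    digest = hashlib.sha256("|".join(map(str, tuple_fields)).encode()).digest()
    return targets[int.from_bytes(digest[:4], "big") % len(targets)]

targets = ["t1", "t2", "t3"]
udp_flow = ("udp", "198.51.100.7", 40001, "203.0.113.5", 53)
# The same 5-tuple always lands on the same target...
assert flow_hash_target(udp_flow, targets) == flow_hash_target(udp_flow, targets)
# ...which is why clients reusing one source port for many UDP flows
# collapse onto a single target, while varied source ports spread out:
spread = {flow_hash_target(("udp", "198.51.100.7", p, "203.0.113.5", 53), targets)
          for p in range(40000, 40050)}
print(sorted(spread))
```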
Before that, I'd like to mention the return traffic, the traffic that is generated by the targets and sent back to the client. This type of traffic also travels through the Network Load Balancer. When a new connection is established, a target is selected by NLB and traffic is forwarded to the target; then, thanks to the connection tracking system implemented in AWS Hyperplane, which, as John shared before, is the underlying technology that allows the NLB to manipulate traffic and send it back and forth between a client and a target, and thanks to Hyperplane's integration with the EC2 software-defined network, the return traffic is forced to go back through the Network Load Balancer. This is not so obvious, specifically when the client IP preservation feature is enabled on the Network Load Balancer: in that case, the destination IP of the traffic sent from a target to a client will be the real client IP that opened the connection, and that IP is not an IP assigned to the Network Load Balancer. For this reason there is a need for this special tracking system, implemented in the EC2 software-defined network, which forces the traffic back into the Network Load Balancer; the NLB then rewrites the source IP of the traffic with its own IP and sends the traffic back to the client.

I mentioned the importance of the distributed database; it is important for multiple reasons. The first is that it allows the Network Load Balancer to avoid recalculating the flow hash algorithm each time it receives traffic for an existing connection.
In fact, when data arrives for an existing connection, the NLB searches the distributed database for the association between the 6-tuple identifying this TCP connection and the target selected; if it finds a record, it selects the same target that was initially chosen. This means the NLB performs the target selection only once, when the connection is established, and then consistently routes traffic to that same target for as long as the traffic is exchanged inside the same TCP connection. The second reason, something John already mentioned, is that maintaining the connection state externally from the physical hosts, in Hyperplane, allows the NLB to replace one of those physical hosts, for example for a scaling activity, without any impact on customer traffic.

Every open connection has an associated idle timeout, which by default is set to 350 seconds. When a connection is idle for more than this timeout, the NLB removes that connection's record from its distributed database, meaning that when traffic later arrives for the same connection, the NLB won't find the record in the distributed database and will reply to the client with a TCP reset. Ever since we released NLB, a very popular customer request was the ability to customize that idle timeout, and recently we released this feature for TCP listeners: customers can now set a value from 60 seconds to 6,000 seconds for the idle timeout.
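A toy version of that connection-tracking behavior, including the idle timeout and the reset on expired flows; this is illustrative only, with invented names and a plain dict standing in for the distributed database.

```python
class FlowTable:
    """Sketch of NLB's per-connection tracking with an idle timeout.

    The first packet of a connection triggers target selection; later
    packets look the flow up. After `idle_timeout` seconds of silence the
    entry is removed, and a late packet gets a TCP RST instead of a target.
    The default is 350 seconds; TCP listeners now allow 60-6,000.
    """
    def __init__(self, targets, idle_timeout=350):
        self.targets, self.idle_timeout = targets, idle_timeout
        self.table = {}  # flow tuple -> (target, last_seen)

    def handle(self, flow, now):
        entry = self.table.get(flow)
        if entry and now - entry[1] <= self.idle_timeout:
            target = entry[0]                 # existing flow: same target
        elif entry:                           # idle too long: entry purged
            del self.table[flow]
            return "RST"
        else:                                 # new connection: select once
            target = self.targets[hash(flow) % len(self.targets)]
        self.table[flow] = (target, now)
        return target

ft = FlowTable(["t1", "t2"], idle_timeout=350)
flow = ("tcp", "198.51.100.7", 40001, "203.0.113.5", 443, 1000)
first = ft.handle(flow, now=0)
again = ft.handle(flow, now=100)    # within the timeout: same target
late = ft.handle(flow, now=1000)    # 900s idle: record gone, client reset
print(first == again, late)  # → True RST
```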
If you want to know more about this feature, feel free to follow the QR code on this slide; we'll pause for a moment. OK. When NLB is used with UDP listeners, the same flow hash algorithm is used for the target selection; however, a 5-tuple is used to calculate the flow hash: protocol, source and destination IP, and source and destination port. The missing parameter compared to the TCP listener is the sequence number, because UDP has no concept of connections or sequence numbers. However, NLB applies a concept of UDP flows: when traffic arrives for a new UDP flow, the Network Load Balancer selects a target and forwards traffic to it, and if that target replies within 30 seconds, the NLB tracks the association between the 5-tuple identifying this UDP flow and the selected target, again in the distributed database. This means that for UDP traffic too, the NLB selects the target behind it once and then consistently routes traffic belonging to the same UDP flow to that same target. This is an important concept, because we sometimes see customers configuring their clients to use the same source port for multiple UDP flows. When that happens, all this traffic arrives at the Network Load Balancer carrying the same 5-tuple, meaning the NLB selects just one target to handle all of it. So it is important that clients are configured to diversify the source port used across multiple UDP flows; this enables the NLB to distribute the traffic evenly across all the healthy targets in the target group.

Now let's change product: let's talk about Gateway Load Balancer.
Now let's change products and talk about Gateway Load Balancer. When Gateway Load Balancer is used with TCP traffic, the same flow hash algorithm is used to select a target, and the same 5-tuple is used to calculate the flow hash. Here, however, the association between the TCP connection and the target is extended to the return traffic as well, the traffic generated from the server back to the client. Gateway Load Balancer is, in fact, a resource that sits in between a client and a server, and both directions of the traffic enter the Gateway Load Balancer listener, so it is important that Gateway Load Balancer makes a consistent decision about the selected target. That's because behind a Gateway Load Balancer there is usually a firewall appliance, which enforces stateful rules on the traffic; for this reason, Gateway Load Balancer natively provides this type of stickiness between the TCP connection and the selected target for each direction of the traffic. When Gateway Load Balancer is used with non-TCP traffic, just a 3-tuple is used to calculate the flow hash: source and destination IP, and protocol.
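One way to picture that bidirectional stickiness, purely as an illustration and not GWLB's actual implementation, is a flow key normalized so both directions of a connection map to the same appliance:

```python
def symmetric_flow_key(protocol, src_ip, src_port, dst_ip, dst_port):
    # Sort the two endpoints so client->server and server->client
    # produce the same key, and therefore the same appliance.
    endpoints = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return (protocol, endpoints[0], endpoints[1])

appliances = ["fw-a", "fw-b", "fw-c"]

def pick_appliance(key):
    # Toy selection, not AWS's hash.
    return appliances[hash(key) % len(appliances)]

forward = symmetric_flow_key("tcp", "10.0.0.5", 44321, "172.16.0.9", 443)
reverse = symmetric_flow_key("tcp", "172.16.0.9", 443, "10.0.0.5", 44321)
print(pick_appliance(forward) == pick_appliance(reverse))  # True: both directions hit the same appliance
```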
Now let's move, finally, to Application Load Balancer, starting by clarifying when the target selection is performed. ALB is capable of performing the target selection for each single HTTP request that it receives, and this is true even when a single client is reusing the same TCP connection to send multiple HTTP requests: those HTTP requests will be individually balanced across all the available and healthy targets. Then, when ALB opens a connection with a target behind it, it tries to keep that connection open; this allows the ALB to have a sort of pool of connections that are always ready to be used, and it avoids the ALB having to reopen a TCP connection each time it wants to distribute an HTTP request to a target behind it. Now, if we take this example and use a Network Load Balancer instead, as on the left of this slide, then from what we've shared so far the NLB would route all of those HTTP requests to the same target behind it, because, as I've said many times, ALB routes HTTP requests while NLB routes connections. What I shared regarding the ability to select a target for each HTTP request is true as long as
session stickiness is not enabled on the ALB. Stickiness is the functionality of the load balancer to repeatedly route traffic belonging to the same client session to the same target behind it. This means that when a new request arrives at an ALB with stickiness enabled, if the request is for a new session, the ALB performs the target selection; if the request is for an existing session, meaning it includes an AWSALB cookie, the ALB looks inside this cookie to see which target was initially selected and then, instead of applying the target selection following the routing algorithm, uses the target included in the cookie. This means that when a few clients with existing sessions send the majority of the requests to the load balancer, the resulting distribution can look something like this; here it's an extreme case where just two client sessions were sending the majority of the requests. However, it's fair to say that stickiness inherently produces an unbalanced distribution by the load balancer, so customers should really use it carefully, and only when they really need it.
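Conceptually, stickiness works like the sketch below; in practice the AWSALB cookie is opaque and managed by the load balancer, so this is just an illustration of the pinning behavior:

```python
import random

targets = ["t1", "t2", "t3"]

def route(cookies, session_id):
    # New session: pick a target and record it (what the AWSALB cookie conceptually holds).
    # Existing session: skip the routing algorithm and reuse the recorded target.
    if session_id not in cookies:
        cookies[session_id] = random.choice(targets)
    return cookies[session_id]

cookies = {}
first = route(cookies, "client-a")
repeats = [route(cookies, "client-a") for _ in range(50)]
print(all(t == first for t in repeats))  # True: every request in the session lands on one target
```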
If it's possible, remove this dependency on session stickiness. Now let's finally talk about the routing algorithms available on the ALB. There are three, and the routing algorithm is an attribute associated with the target group; if customers don't configure a specific routing algorithm, the default choice is round robin. With round robin, ELB nodes maintain a list of healthy targets that can receive traffic, and when selecting a target they sequentially pick the next one from that list. Round robin is the least sophisticated of the routing algorithm options available on the ALB; however, it is able to produce very well-balanced distributions in terms of the number of requests per target, and it's indicated for use cases where requests are similar in terms of complexity, taking approximately the same amount of time to be served, or where the targets in the target group have similar capabilities. With round robin, though, the ELB doesn't apply any special logic when performing the target selection, for example regarding how busy a specific target in the target group is. That concept is implemented in least outstanding requests, which is the second option for the routing algorithm. With least outstanding requests, the ELB nodes maintain a table including, for each target, the number of in-flight requests that target is handling; by in-flight request I mean a request that has been forwarded to a target but for which the target has not yet produced an answer. So with this routing algorithm the ELB selects the target that is handling, in that specific moment, the least amount of in-flight requests, and in doing so it is able to prioritize the targets that are less busy at that moment in terms of traffic distribution. This means that least outstanding requests is a very good choice when customers have instances of different EC2 sizes in the target group: in that case it will prioritize the targets with more capability, and more capability also means those targets can process requests faster, so it's expected that in each specific moment the target with more capability is also handling the least amount of outstanding requests.
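A toy simulation of that prioritization, with made-up numbers rather than ELB internals: each request goes to the target with the fewest in-flight requests, and one target drains twice as fast as the other:

```python
# "slow" completes 1 request per tick, "fast" completes 2, e.g. a smaller
# and a larger EC2 size. Three requests arrive per tick.
inflight = {"slow": 0, "fast": 0}
served = {"slow": 0, "fast": 0}

for tick in range(100):
    for _ in range(3):
        target = min(inflight, key=inflight.get)   # least outstanding requests
        inflight[target] += 1
        served[target] += 1
    inflight["slow"] = max(0, inflight["slow"] - 1)
    inflight["fast"] = max(0, inflight["fast"] - 2)

print(served)
```

The faster target ends up serving roughly twice as many requests, which mirrors the behavior described above.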
So least outstanding requests will prioritize those targets, as is visible in this graph, which shows the number of requests received by each of my four targets in the target group. In terms of the benefits of this routing algorithm, there is for sure an increased availability of the service, specifically in the case where, as I said before, targets are configured with different EC2 sizes. Here I'm comparing a load test using round robin against least outstanding requests: with round robin it was easier for me to overwhelm the targets that had the least capacity, so those targets were producing 5xx errors, errors that I didn't see when running the same load test with least outstanding requests, and the reason is that my ALB preferred to route requests to the targets that were more capable of processing them. The last option for the routing algorithm is weighted random. With weighted random, ALB nodes select a target randomly from the target group; customers using weighted random then have the possibility to enable the anomaly mitigation feature, which relies on
the anomaly detection system. The anomaly detection system is a capability that is always turned on on the ALB, regardless of the specific routing algorithm used: the ALB nodes constantly monitor the responses returned by the targets, and they also monitor some CloudWatch metrics showing TCP and HTTP errors. The purpose of this monitoring is to identify targets that are starting to return errors; when that happens, the ALB marks those targets as anomalous, and this is visible from the AWS console and also from the AnomalousHostCount CloudWatch metric. Looking at these two sources of information, customers can decide to turn on anomaly mitigation. Anomaly mitigation allows the ALB to take action upon those findings: when a target is marked as anomalous, the ALB will quickly steer traffic away from the anomalous target, preferring in the target selection the other targets that are considered healthy. However, it still forwards part of the traffic to the anomalous targets, and this is done in order to probe the health of those targets; if the anomalous target starts to return successful responses again, the ALB will gradually, incrementally increase the traffic to it until it gets back its full share. Now, the anomaly mitigation capability shouldn't be seen as a replacement for the health check system. The health check system still applies and exists; its main functionality is to mark a target as unhealthy and then stop sending traffic to it. However, that process takes time, from the moment the target starts to return errors to the moment the target is marked as unhealthy, and the anomaly mitigation feature can help the ALB react faster to this type of event. This concept is clearer if I show, again, a load test I performed comparing round robin and weighted random. Here I put one target out of service on purpose, by stopping a service on it. With round robin, requests were still routed normally to that target, because it was not yet considered unhealthy by the health check system, so that target was returning 5xx errors. Those errors were reduced by the use of weighted random, because the ALB was able to quickly steer the traffic to the other healthy targets in the target group. Then, as soon as I restarted the service on the target, you can see how the ALB gradually increases the traffic to the anomalous target again, and all of this is done for increased availability of the service; in fact, the number of 5xx errors was reduced. John went into great detail about weighted random during last year's session, so feel free to follow this QR code here to watch last year's session from John.
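As a rough sketch of the idea, with made-up weights since the real values are internal to the ALB, anomaly mitigation behaves like weighted random selection where a flagged target keeps only a small probe share:

```python
import random

random.seed(7)  # deterministic for the example
weights = {"t1": 1.0, "t2": 1.0, "t3": 1.0}

def pick():
    names = list(weights)
    return random.choices(names, weights=[weights[n] for n in names])[0]

# t3 flagged anomalous: steer most traffic away, but keep a trickle for probing.
weights["t3"] = 0.05
sample = [pick() for _ in range(2000)]
print(sample.count("t1"), sample.count("t2"), sample.count("t3"))
```

The anomalous target still receives a small share of requests, so the load balancer can notice when it starts answering successfully again.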
This is my last slide. During this session we shared several best practices, and I encourage you to follow the QR code on the slide: you'll get to the AWS ELB best practices page on GitHub, where you'll find some of the best practices discussed today, together with an extended version of the best practices categorized following the pillars of the Well-Architected Framework. And with this, I'll pass the ball back to John, who is going to talk about an exciting feature we recently released. Thanks, Enrico. Who here has ever had their ELB
pre-warmed? Some? Okay, good, good. So we just launched this feature, what, a week ago; it's brand new. We call it LCU capacity reservation; sorry, load balancer capacity unit reservation. Before we get into the details of what it does, let's talk about what we all should know about ELB scaling. ELBs will organically scale, no matter what, to the detected level of traffic, so if you have a big workload and you're transferring it, or you're growing, usually you don't need any of this. That's because we over-provision capacity when we're provisioning it; because we have redundant zones, where we make sure we have enough capacity to lose one of those at any time without needing to scale; and because when we trigger scaling, we do so very aggressively, with minimum thresholds, trying to scale quickly, and then we scale down very cautiously: we gate scaling down to make sure that a typical workload is not impacted by constant scale-up and scale-down. And we scale on virtually every dimension of traffic: if your load balancer receives it, we probably scale on it, whether by bandwidth, CPU, connection count, you name it. Again, our thresholds are very low; 35% CPU is about our average, and the reason is that we want to drive over-scaling. I went into this a little deeper last year, talking through the things we do for scaling; it applies to both Application and Network Load Balancer. The capacity reservation can be used for both Application and Network Load Balancer today, and it will come to Gateway Load Balancer in the future. This doesn't change your ELB's ability to scale up; it only limits the scaling down, and you can think of it as raising a floor of capacity. If you don't have anything set, there's no floor, and we'll cautiously scale down to almost zero; if you start to send traffic, we'll aggressively scale up. This doesn't limit further scaling, which is a key point I'll make many, many times. Provisioning usually takes a few minutes, but the API itself is instant.
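As a sketch, assuming placeholder ARNs and the CLI operations documented at launch (verify names against the current CLI reference), setting and checking a reservation looks roughly like:

```shell
# Reserve a capacity floor of 5,500 LCUs on a load balancer (placeholder ARN).
aws elbv2 modify-capacity-reservation \
  --load-balancer-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/app/my-alb/abc123 \
  --minimum-load-balancer-capacity CapacityUnits=5500

# Poll this describe call until the reservation reports as provisioned.
aws elbv2 describe-capacity-reservation \
  --load-balancer-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/app/my-alb/abc123
```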
When you make the API call, we start configuring the capacity immediately, and then there's a describe call you can loop on to ask whether we're done provisioning the capacity. It's available for Application and Network Load Balancer, as I mentioned. So when do we expect folks to use this? What are the use cases where you can use it to make sure your events and other things go smoothly? Traffic migration is a big one: we have customers who move from one load balancer to another, or from a different product to a load balancer, and if you have the information in AWS and you have metrics, you can go look at them. This is the simplest case, especially ELB to ELB, because of some of the metrics we've launched. It's also useful for planned events: if you have a big sale coming, or something where you know you're going to have a surge of traffic to your site at a specific time, it's great to set this an hour or two before the event, let it scale up, make sure everything is good, and run the event.
And then Mondays, because Mondays, right? We all love Mondays. The Monday case here is based on the fact that we scale in in 12-hour increments. If you send traffic to a load balancer and it scales up, we won't scale in, and this is for ALB since NLB is a little more dynamic, in less than 12-hour increments; so if you have a surge of traffic and then you take all the traffic away, it'll take at least 12 hours for each tier of scaling that you've had, and that could be as many as, you know, dozens and dozens; it could take days to weeks. But there are people who have two or more periods over the weekend where their traffic is low enough that they'll actually scale down. A simple example would be a market that opens at the same time every day, where all your clients show up at your site to use your products and it aligns with the market open: you may have scaled down enough since the Friday market close that you'd benefit from using this feature. Intermittent spiky traffic is a little trickier; we'll talk about it a little bit, but if you have a case like this, it's probably good to reach out to AWS Support and get some feedback on your use case, because this feature is essentially allocating whatever capacity you tell us, all the time, and with intermittent spiky traffic you may want to consider other strategies, like sharding, which I talked about in a little more depth last year.
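A back-of-the-envelope sketch of why a Friday close matters, using the 12-hours-per-tier scale-in just described; the tier count here is hypothetical:

```python
HOURS_PER_TIER = 12   # ALB sheds at most one tier of scale per 12 hours, as described above

def hours_to_scale_in(tiers_to_shed):
    return tiers_to_shed * HOURS_PER_TIER

# A market closing Friday evening with 4 tiers of headroom could take
# 48 hours to fully scale in, bottoming out over the weekend,
# right before Monday's open.
print(hours_to_scale_in(4))  # 48
```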
On the ALB, we organically scale up and then out. That means, as you all probably know, we're provisioning EC2 instances, single-tenant instances: when you launch an ELB we create instances for it, those instances get IPs, those IPs go into DNS, and Route 53 health-checks them and returns them if they're healthy, which is why we say Route 53, or DNS, will have healthy and adequately scaled nodes. When we're doing this scaling, we're also graceful about any previous nodes: if we pull a node out of DNS, we don't shut it down until it has had zero connections for five minutes, and that can last days and days; if clients don't update their DNS, they'll potentially still be talking to nodes that are out of DNS until they finally, completely drain. NLB is slightly different in how it scales. Organic scaling on NLB is zonal, meaning one zone's capacity or utilization doesn't drive scaling in the other zones; so if you have an NLB in three zones and one of them is getting a lot more traffic than the others, it could organically scale to a much larger size than the others. Now, when you set the LCU reservation on your NLB, we actually divide it across all of the zones, so this goes back to: don't put a zone on your NLB that you're not actually using. With this feature we're just looking at what you're provisioning, and providing as much capacity as we can, but we're going to spread it across all the zones. When you're planning for your event, the metric we launched on ALB is called Peak LCUs. Anyone see it pop up in the console and say, what is that? Has it been
a case to support? Okay, a few people did that. So Peak LCUs is the metric where your ALB is saying, this is how many LCUs I think we need for this workload, and the "I" in this case is actually the ELB scaling system: our scaling systems are looking at all of those dimensions of traffic and trying to map them to, essentially, an LCU capacity. The one-minute sum relates to the one-hour bill, so if you use 1,000 LCUs in an hour, your one-minute sum will probably look somewhere around a thousand. If you don't know how much to use, or you're moving into AWS, a load test is the preferred way to find out: do a load test at the capacity you think you need, or some fraction of it, then take the Peak LCUs and multiply up to the full value. You can also get this from the one-hour metrics. One-minute metrics we only keep for a couple of weeks; one-hour metrics we keep for over a year and a half. The one-hour metrics are a combination of the one-minute metrics, and what that means is you can take the maximum of the one-hour metric, rely on that being the p100, the maximum of a single node's view for that hour, and multiply it by the sample count. I know this is tricky math; we have it on our demo site to walk through all of this if you're confused, and it is confusing. But you multiply the maximum by the sample count: the sample count is how many nodes there are, and the maximum is the most expensive one of those. With that, you can go back a year and a half and see, for example, the same annual sale we ran last year. The console has also been updated to do the same math, so if you look at the console you'll see these values from the one-hour metric even if you're looking at the current time period.
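Here's that math with made-up numbers, assuming hypothetical 1-hour datapoints for the Peak LCUs metric:

```python
# Hypothetical 1-hour datapoints for the Peak LCUs metric during last year's sale.
# Each datapoint carries the Maximum (worst single-node value for that hour)
# and the SampleCount (how many nodes reported).
datapoints = [
    {"Maximum": 120.0, "SampleCount": 8},
    {"Maximum": 150.0, "SampleCount": 10},   # the busiest hour
    {"Maximum": 90.0,  "SampleCount": 8},
]

# Fleet-wide LCU estimate per hour = per-node maximum * node count;
# the worst hour gives the value you'd consider reserving.
estimate = max(dp["Maximum"] * dp["SampleCount"] for dp in datapoints)
print(estimate)  # 1500.0 LCUs for the busiest hour
```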
So let's look at what happens when you make the reservation. You basically determine how much capacity you want, you tell the API you want it, we provision it, and then you confirm everything is good: you're ready for your event. Once you've had the event, or once the traffic hits what you think is the peak of the event, it is safe to remove the provisioning, because when you remove the provisioning we actually look at the actual utilization for the past 42 hours, pick the highest point, and scale to that. Then we go back to that organic scaling, where we're going to cautiously scale down in no less than 12-hour periods on ALB; on NLB it's much faster, but it's also much faster to scale up, because we have the capacity sitting there and we can just move it between the Hyperplane ENIs. So let's look at some examples of traffic, and apologies for the graph, it's not the best one I've ever made. The green represents the capacity the ELB is using, and the purple shows our sinusoidal pattern, which we commonly see with daily cyclical workloads: you can think of
these as a day at the peak, then a night, then a day and a night. If you're growing, this could also happen at a faster rate, because we're doubling every 5 minutes; but as you grow organically, we're provisioning a lot of extra capacity as we trigger, so if we trigger at 35%, we're going to have a bunch of capacity sitting there, ready in case the traffic keeps growing. So this is just organic scaling; there are no problems here, it's all well scaled, and we see this all the time. This is the normal case for provisioning. The other normal case we see is the sinusoidal pattern where, because of the 12-hour delay in scaling, we usually won't scale down between one day and the next; so if your traffic ends late in the day and starts early the next day, you're probably going to be at the same scale you were the day before, and if you scale down, say, one level, you won't run into the Monday problem, because we're only going to scale down one level and each level can support the one above it. The capacity we provision is enough to include a rapid scale-up, and then we'll come back and add capacity, because we want to maintain redundancy; so the scaling action itself can be thought of more as a backfilling of capacity, planning for future growth or failure. So let's look at our first bad case: under-scaling. In this case the traffic is migrating from somewhere to an ALB or an NLB, and we can see that the spike went over; we detected the scaling when the spike crossed
45% of the current capacity, but it takes us a few minutes to provision, and because of that, the provisioning might be too late to support the actual spike your traffic is generating. For that we have this new feature, LCU capacity reservation, where you can call it an hour, or minutes, before to make sure you're provisioned, and then when the traffic arrives you can remove it as soon as you've hit the peak, and you'll quickly transition. You have a very fine line: you've got a period of traffic where you know it's coming, you've reserved, and then you remove the reservation. It's similar for an event, except that with an event the traffic usually scales back down; you do the same thing there, provision in advance, start your event, and remove it afterwards, because we won't scale down too quickly. This is our Monday example. You can see that we had a couple of scale-downs over the weekend, and then when Monday came we didn't have enough capacity, so we scaled, but there were minutes there where we weren't caught up yet. Provisioning an hour before the start of the market is a great way to be ahead of this. And if you looked at this load balancer, the numbers you would see for your LCU metrics would be here: Peak LCUs would be one of these, and you could plug that in before your start, and you would actually get the same capacity, because when you tell us to provision this, we still pad it with all of this extra capacity, just so we're ready in case you need to scale further.
So let's recap. This new feature lets you raise the minimum capacity of your Application or Network Load Balancer; it starts provisioning immediately, usually takes a few minutes, and doesn't block further scaling. So check it out today; it's available for ALBs and NLBs. We will be available in the hall for questions. I know we went over time, but thank you all for joining us.