Great, thanks very much, thanks everyone for joining us here this afternoon. My name is Colin, I'm a principal engineer in our infrastructure networking team, I'm based in Dublin in Ireland, and I specialize in looking after our data center networks. So we're going to take a look today at some of the work that goes on in supporting our global network infrastructure. As you're all aware, we have lots of networking and content delivery services that are provided as part of AWS. I'm not going to talk to you today about really any of these; what I'm going to talk about is the global network infrastructure that underpins all of these services and how they work together. At a couple of points during the talk we will pop up from this substrate layer and see where some of these infrastructure components are visible to you as customers, but really our goal for today is to talk about how our approach has evolved over the last decade or so and how we think about designing, building, and operating the network.

So that's our agenda for today. We're going to talk about some of the key themes and tenets we think about when we build the network; we're going to do a deep dive into the data center network, because that's my specialty area; we're going to talk about some of the things we do to operate the network at the scale we do; then we're going to step back and have a look at the availability zones and the regional network topology; we're going to take a quick look at network encryption, which was a feature we announced earlier in the year; and then we'll finish off by looking at how the global network backbone ties all of our regions together.

So as I said, we're going to kick off with the key themes and tenets we apply to the network. These are going to be common across the talk, so I wanted to get them out of the way at the start so we can have them in mind as we go through the afternoon. Security is our most important tenet, so we're going to talk about some of the security features we've implemented. Availability is fundamental to what we do; it's table stakes, both for us and for yourselves, and we've grown to have strong convictions about how we build and operate the network in order to make sure that we maintain the availability that you all depend on. In particular, we want to make sure that failures are constrained and isolated and don't spread any further than they have to: for example, availability zones provide isolation within the region, regions are isolated from each other, and we'll look at some of the isolation properties we have inside the data center. Then scalability is key: we want to be able to keep growing and not be constrained at any point. Performance is vital, and the particular part of performance I focus on is ensuring that the network is consistent, that the performance is the same all the time, even when failures happen, so that failures don't impact performance and we maintain consistency. And then lastly, we obviously want the network to be globally available; we want to reach our customers wherever they are.

So before we jump into the data center part, I want to talk briefly about some host network features, because they're important and they enable some of the work we do in the data center network.
You've probably all seen this slide at various points over the last year or so; it's our Nitro architecture. From a network point of view, what's really important for me is that Nitro offloads lots of network features in hardware. Importantly, it gives us consistent network performance, but it also allows us to offload many traditionally complex network features such as ACLs, security groups, and VPC peering; a lot of the Hyperplane functions that underpin NLB, NAT Gateway, and PrivateLink are implemented in the Nitro controller. For me as the network designer for the infrastructure, the key property of this is that it removes the requirement for the kind of special middleboxes you might commonly find in traditional network designs, and this all makes the network simpler and easier to build.

Just while we're talking about Nitro, I wanted to talk about one key security feature we launched earlier in the year, which is VPC encryption. VPC encryption is hardware-accelerated encryption of all of your network traffic from an instance. It's implemented in the hardware built by Annapurna Labs; that network card in the middle you see there is our third-generation Nitro card that's in C5n and similar N-series hardware. It encrypts your traffic all the time: there's nothing to do, you don't have to turn it on, it's always on, and it does not impact performance. We just encrypt all of the traffic within your VPC, or between VPCs that are peered with each other. So that's just one of the little network security features we've shipped this year.
But let's start and dig into the data center network; this is where I spend my time. There are really two categories of traffic we have to worry about in the data center network. There's host-to-host traffic, which is going side to side, or east-west as you'll see it referred to, and there's traffic that has to leave the data center, going either to a different data center in a different availability zone, to another region, or right out to the internet. We want to be able to scale each of these categories of traffic independently; we need them to be elastic, we need them to grow and not be constrained. And to give you just some context in terms of how much capacity we require: in terms of host-to-host traffic, in my team we're commonly working in units of hundreds or thousands of terabits a second of capacity in the data center, and in terms of traffic leaving the data center it's smaller, but still quite large scale; we're commonly working in units of tens of terabits a second of capacity egressing a data center.
So let's look at how we want to go about building a network to deliver this. We want to build a network that's scalable. We want it to be built out of building blocks, something we can scale easily, units that we can just keep adding incrementally as the business grows and as we need new features. In order to add those building blocks, we need them to be easy to install and self-contained; we don't want them to bring additional complexity to the rest of the network, we want it to be stable, and we want them to be reasonably significant chunks of capacity, so that they allow us to scale in increments as we go. In terms of how we're going to build these building blocks, there are about two or three areas where we have to make decisions: there's the physical router hardware choice, there's how we want to connect them together, and then finally there's the control plane. We're going to dig into the first two today, and we might have to save the third one for a follow-up session.
But let's talk about the hardware options. Amazon, like most enterprises, started off using large-scale chassis systems. This is a nice block diagram of a traditional chassis-based router, and the first thing you're going to see looking at this is that there's a lot of stuff going on; it's pretty complicated, with lots and lots of moving parts. On the left-hand side you have the line cards, which have the physical ports you can attach to; they usually have one or more forwarding ASICs involved. Then, to interconnect all those line cards, we have the switch fabric cards on the right, and we need more than one of them in order to provide redundancy. Then, to coordinate all these parts inside the box, we need some kind of brain, a control plane, and so we have a control plane CPU at the top that's responsible for keeping all of these things aligned, synchronized, and operating correctly. However, if it fails you're going to lose a huge amount of capacity, so you probably want a second control plane CPU in order to have redundancy. But now you have to bring in a whole bunch of additional complexity and protocols to manage keeping them in sync, detecting failure, and taking over, and that has its own set of reliability problems. The other challenge with this kind of architecture is that it's difficult to troubleshoot: there are lots of places inside the device for failures to occur, and it can be challenging to isolate, localize, and deal with them. So if you find yourself having to take the device out of service because you think something's wrong with it and you want to work on it, that's a large amount of capacity; the blast radius of a device failure is really significant. And at a really practical level, if you do discover there's something physically wrong with it, it's really big and heavy, and it takes lots of people to actually move it and do the hardware work as well. So we didn't really love this approach, and the team sat down, dug deep, and thought about what we wanted to achieve, and we came at it from a different angle, which was that we wanted to focus on really simple network devices.
Single-chip boxes: this is basically the block diagram of pretty much every network device we've deployed in the last eight or nine years. You have a single forwarding ASIC connected to the front-panel ports on the switch, you have a single control plane CPU attached to it, and it's a fixed form factor, usually only one rack unit high, so it's pretty easy for someone to handle if they need to replace it. And unlike the large-scale chassis, because you know how many ports are going to be on the switch when you build it, you can pick a CPU with the correct amount of capacity to support all the control plane functions you want. Sometimes with large chassis platforms you end up getting the CPU sizing incorrect, and as your usage grows you can find that you run out of CPU and the network can become unstable. So we like being able to have these fixed devices, and then we can integrate them into the network. There are some trade-offs here, simple versus complex and the ability to manage them. While large chassis have some benefit in that you have fewer devices, once you cross a certain point that benefit goes away, and you still need to build the software to manage them. And while there's a belief that you can use different types of line cards, you end up being constrained by what the chassis designer designed at the beginning; subsequent line cards might not fit or might not interoperate, and you certainly can't mix multiple vendors, because they don't interoperate. So by having standard interconnects, we're able to take advantage of the fixed form factor design.
OK, so what are the building blocks we have in the network? The first one is our host racks, which actually contain the physical servers that are running our services. Every rack has its own power source and its own networking infrastructure. And how is that visible to you? This is one of the first places where we actually expose the network infrastructure to customers, and we do that in two ways. Firstly, we have partition placement groups. Partition placement groups are an EC2 feature that allows you to get groups of EC2 instances, basically called partitions, and ensure that all of the EC2 instances in a given partition do not share any physical racks with the instances in any of the other partitions. This can be used, for example, if you had something like a video processing pipeline, where you want to ensure that the redundant pipelines never have any failure modes in common, or maybe Cassandra clusters or things like that, where you want to ensure capacity is separated for redundancy. There's a second, related placement group type called spread placement groups, which is aimed at a much smaller number of instances, usually limited to seven, where you can guarantee that no instance in the placement group shares a physical rack with any other instance in the placement group. This is mainly aimed at things like database clusters, maybe some legacy applications that need to be kept separate, or things like locking and leader election, the kinds of systems that depend on quorums, where you want to make sure there are no correlated failure modes.
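If you want to try those placement strategies yourself, here's a minimal boto3 sketch; the group names, AMI ID, and instance types are placeholders I've invented, not details from the talk:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Partition placement group: instances in different partitions are
# guaranteed not to share physical racks.
ec2.create_placement_group(
    GroupName="video-pipeline",   # illustrative name
    Strategy="partition",
    PartitionCount=3,
)

# Launch one redundant pipeline worker into each partition.
for partition in range(3):
    ec2.run_instances(
        ImageId="ami-0123456789abcdef0",  # placeholder AMI
        InstanceType="c5n.large",
        MinCount=1,
        MaxCount=1,
        Placement={"GroupName": "video-pipeline", "PartitionNumber": partition},
    )

# Spread placement group: every instance lands on a distinct rack
# (limited to seven running instances).
ec2.create_placement_group(GroupName="quorum-nodes", Strategy="spread")
```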
So that's our first building block, but what's going to be our generic building block? We start with an empty building block and ask: what are we going to put inside it? We're going to take our fixed form factor platforms, grab a bunch of them, and stick them in there. Now, just a quick disclaimer: in order to make the network diagrams easier to draw and for everyone to follow along, I've simplified a lot of these, and we just have fewer devices. We're going to keep the topology the same, but in our real network we'd have many more devices inside this building block; for today we're going to keep it simple with four in each layer on screen. We arrange the devices in two tiers: the bottom tier we're going to use for providing ports for connecting to our host racks, and the tier on top is going to be used for connecting out of the data center, northbound. Well, we need to connect them together, so how do we do that? We end up with a topology that looks a little like this; again, I've eliminated a lot of the duplicate lines just to make it easy to follow. The rule we follow here is that a device in a given layer has at least one link to every device in the layer it's adjacent to. So in this case, if we look at the device in the bottom left, it has a link to each of the routers in the tier above it, and the ratio of ports facing down to ports facing up is always one to one, so we're never constrained: there's no oversubscription inside this building block, and we don't find ourselves capacity constrained. You might be thinking this pattern looks really similar to that chassis design I showed you earlier, and you're correct, it has a lot in common. But because we've moved to discrete physical units, we mitigate a lot of the challenges that came with the large chassis: the connectors are all standardized, so if we need to change one of the devices it's easy to do that; they each come with their own control plane CPU; and they use standard protocols for synchronizing their state between each other, so it's easy to take one in and out of service, and it's easy to ensure they're all making the right forwarding decisions.
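As a toy illustration of that wiring rule (this is not AWS tooling, just a sketch assuming the simplified four-by-four cell from the diagram):

```python
# Wiring rule for a two-tier cell: every lower-tier router has at least
# one link to every upper-tier router, and each device dedicates equal
# port counts up and down, so there's no oversubscription.

def cell_links(lower: int, upper: int) -> list[tuple[str, str]]:
    """Full bipartite mesh between the two tiers of an access cell."""
    return [
        (f"lower-{i}", f"upper-{j}")
        for i in range(lower)
        for j in range(upper)
    ]

links = cell_links(lower=4, upper=4)
assert len(links) == 16  # the 4 x 4 mesh from the simplified diagram

# 1:1 ratio check for a hypothetical 32-port lower-tier switch:
# 16 ports face hosts (down) and 16 face the upper tier (up).
ports_down, ports_up = 16, 16
assert ports_down == ports_up  # no oversubscription inside the cell
```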
So how do we attach host racks to this building block? They just get connected to two or more of those front-tier routers, and the number of uplinks that a host rack uses is determined by the bandwidth requirements of that host rack. For example, a rack of C5n instances, which are 100 gigabit per second instances, needs substantially more uplink capacity than a rack that may be composed of a previous generation like C4s. So we're able to provision variable amounts of uplink from the racks based on their traffic demand, and we've given a very crude example of that here, where you can see some of the racks have four uplinks while others only have two. But they're all able to talk to each other, and they all have unconstrained bandwidth within this cell, because there's a full amount of interconnect between the layers, so traffic can flow inside this access cell really easily.
The challenge, though, is that this is obviously a fixed size, so what happens when we run out of ports? How do we scale? This is where we start to approach the network in a cellular manner, like Werner was talking about today: we just start adding more access cells, and we keep doing that as the data center needs to grow and as more host racks need to be installed. So now we have a methodology and building blocks we can install in the data center that allow us to be elastic and just scale as the data center grows and as host capacity increases. But how are these cells going to talk to each other? This is where we introduce our third building block, which we call a spine cell, and it sits here on top. The spine cells provide the capacity that interconnects each of the access cells in the data center; every access cell is connected to every spine cell, with sufficient capacity to deliver full bisection bandwidth, so there's no constraint on bandwidth inside this. This is what we call a placement group network, and it's visible to you as customers as a cluster placement group, with which you can ensure that all of the EC2 instances in a cluster placement group will be placed on the same network in the data center, so that they have low latency and unconstrained capacity between them.
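To round out the earlier placement group sketch, requesting a cluster placement group looks like this in boto3; the names and counts are again placeholders:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Cluster placement group: instances are packed onto the same placement
# group network for low latency and unconstrained bandwidth between them.
ec2.create_placement_group(GroupName="hpc-job", Strategy="cluster")

ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder AMI
    InstanceType="c5n.18xlarge",
    MinCount=8,
    MaxCount=8,
    Placement={"GroupName": "hpc-job"},
)
```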
So, in order to get out of the data center network, we're going to need a way to egress this placement group network, and we just repeat our pattern: we dedicate some cells to providing connectivity out of the data center, and they're able to be scaled based on demand. We can just keep attaching additional core cells as we need them, in order to keep up with utilization and ensure we have sufficient capacity. It's a very simple building block approach: we can just keep attaching cells, one at a time. But let's talk about how this has evolved over time, because one of the benefits of this approach is that as we add additional cells, they can take advantage of newer and better hardware and newer and better capabilities.
So here's a quick look at how we've evolved over time. On the left is our original first-generation access cell, built with devices using 10 gigabit Ethernet, and on the right is our latest 2019 third-generation fabric, which is built with hardware using 400 gigabit Ethernet. Just to give you a sense of the scale these capacities support in the data center: in 2013 we were supporting placement group networks with nearly five hundred terabits a second of capacity and 12 microseconds of latency across the fabric, and today our latest generation is supporting 10,000 terabits a second of capacity in the data center with only seven microseconds of latency, almost half the latency across the network. That works out to about a 20x increase in performance over the last six years. Several years ago my colleague James Hamilton, who's here, talked about how excited we were by the move to 25 gigabit Ethernet, because it took the same set of components that were used to deliver 10 gigabit and delivered 25 gigabit, two and a half times the bandwidth for the same component count, and that made delivering 50 gigabit per second ports cheaper than 40 gigabit per second ports. Today we're effectively making the next transition on that journey: we're moving from 25 gigabit interfaces to 100 gigabit interfaces using the same components, so a 4x increase in capacity for only a very minor increase in hardware cost. It's slightly more expensive to make the interfaces do 100 gigabit, but it's certainly a lot less than the bandwidth increase.

You might have noticed, looking at those pictures, that the first-generation switches were black and the latest-generation switches are green, and we've previously shown you our 25 gigabit second-generation switch, which was blue. The colors actually have meaning for us. Physically the switches all look the same, they have the same connector layout on the front, and so they're very hard to tell apart. In order to make life easier for our field team when we're doing part replacements, the colors are used to signify the generations, so that when an engineer has to replace a device in the field, they replace it with one of the same color. It's one of those things we've done, through our control of the hardware, to simplify how we operate and build the network, and we keep the color coding to make life easy for everyone.

So we've talked through this topology that's built out of a large number of network devices, but how do we actually operate them? How do we operate that many devices and keep it all going? There are three pillars we think about here in terms of how we operate the network at scale.
The first is device lifecycle. Automation is key here: with so many devices in the network, we can't have anything handcrafted or have any kind of differences. We don't want the devices to have any personality; we need them to conform to what we want them to be. That starts with software that generates the configuration for all of the devices, that understands their role, which part of the network they're going to be in, what they have to connect to, and what features need to exist on each device. Then, when devices are delivered into the data center, when we roll those cells off a truck, they're connected to the network, they're powered up, and they're automatically provisioned, without a human having to interact with them. They get the correct OS releases and the correct software packages, they're validated to ensure they're healthy, and then they're put into service, all without a human or a network engineer having to be involved. One of the things we do as part of our design: for every device role in the network, we have a function we call traffic shift that we're able to exercise to take traffic on or off that device, in order to take it in or out of service. This is a key part of how we deploy to the network and keep it reliable: we have a continuous deployment pipeline, a system that just takes a device, calls the traffic shift function to take it out of service, deploys any needed configuration changes and any needed software updates, verifies the device is healthy, and then places it back in service. Then it just goes to the next device in the network and repeats. It's kind of like painting a bridge: we start at one end, we work our way across the network, and when we're done, we loop back around to the start. So the network is always up to date, fixes go out in a timely manner, and we ensure that everything has the configuration it's supposed to have.
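In pseudocode, that painting-the-bridge loop might look like the sketch below; every helper here is a hypothetical stand-in for internal systems the talk doesn't name:

```python
# Minimal sketch of the continuous deployment loop described above.

def shift_away(device: str) -> None:
    print(f"draining traffic from {device}")       # the "traffic shift" function

def deploy(device: str, config: str) -> None:
    print(f"pushing {config} to {device}")         # config and OS updates

def is_healthy(device: str) -> bool:
    return True                                    # stand-in for validation checks

def shift_back(device: str) -> None:
    print(f"restoring traffic to {device}")

def deploy_fleet(devices: list[str], config: str) -> None:
    for device in devices:
        shift_away(device)             # take it out of service first
        deploy(device, config)
        if not is_healthy(device):     # never restore traffic to a sick box
            raise RuntimeError(f"{device} failed validation; halting rollout")
        shift_back(device)
    # When the loop finishes, start again from the beginning, so the
    # network is always converging on the latest known-good state.

deploy_fleet(["tor-1", "tor-2", "tor-3"], "config-v42")
```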
So now we've got the devices operating and doing what we want them to do; how do we tell that they stay healthy? This is really where the second pillar of how we operate comes in, which is device monitoring. We collect all the usual metrics you'd expect in terms of interfaces and system load, and we collect them at a very high frequency. We extract signal from anywhere we can, whether it's syslog messages, events from the software daemons on the boxes, or SDK errors from controlling the hardware. We also go deep into the hardware: we'll pull out any hardware values we can in terms of register and table states, we'll pull information out of the optical interfaces and the optical modules in the switches, and we'll extract all of that and store it. Our telemetry team collects more than six trillion observations a day from the network, which we store in CloudWatch; we're one of the CloudWatch team's biggest customers, and we take advantage of CloudWatch to store all that data and generate alarm events.
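To make that concrete, publishing one observation to CloudWatch from Python looks like this; the namespace, dimensions, and values are illustrative, not what the internal telemetry pipeline actually emits:

```python
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

# Emit one interface counter observation at high resolution.
cloudwatch.put_metric_data(
    Namespace="Example/Network",
    MetricData=[{
        "MetricName": "InterfaceErrors",
        "Dimensions": [
            {"Name": "Device", "Value": "tor-1"},
            {"Name": "Interface", "Value": "Ethernet1"},
        ],
        "Value": 0.0,
        "Unit": "Count",
        "StorageResolution": 1,  # one-second resolution metrics
    }],
)
```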
But sometimes those metrics aren't enough and we need to generate new data or new insights from them, so we do a couple of other things that we think are interesting in how we operate. One: we look for statistical deviations, changes in the behavior of a device that are anomalous. For example, we will look at all the traffic that goes into a device and compare it to all the traffic that's leaving it; those two should match, and if that changes, we're able to suspect something's wrong. If any of the metrics develops a big step change, that's something anomalous we'll look at, and it will trigger an event. Or if we see an error message from the hardware that we've never seen before, that will be statistically anomalous and will trigger an event. We don't depend on a human having pre-configured every possible alarm scenario; we dig into those as well, and they form part of the event stream that comes out of our monitoring system.
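A toy version of that first check, comparing a device's in and out counters; the two percent tolerance is my own illustrative number:

```python
# Traffic into a healthy device should roughly match traffic out of it.

def looks_anomalous(bytes_in: int, bytes_out: int, tolerance: float = 0.02) -> bool:
    """Flag the device if the in/out counters diverge by more than 2%."""
    if bytes_in == 0:
        return bytes_out != 0
    return abs(bytes_in - bytes_out) / bytes_in > tolerance

assert not looks_anomalous(1_000_000, 998_000)  # normal jitter
assert looks_anomalous(1_000_000, 700_000)      # device may be dropping traffic
```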
But we can only trust the devices so much to tell us what's going on, so we have another string to our bow here, which is active data plane monitoring: we probe the network continuously to ensure that traffic is healthy. Every host in the Amazon network is running our active monitoring agent; today the agents are actually built into the Nitro controller, so it's baked into the hardware. Those agents send test traffic between each other continuously, at a reasonably high data rate, and they cover the whole network: we ensure that every device and every link in the network is being monitored by multiple probes. We're able to take the signal that comes back from all those probes and identify, effectively in real time, if a device becomes unhealthy, through triangulation. Now, triangulating problems in the network is an area we've had to invest in very heavily, because it's much harder than it seems at first glance. Firstly, the network topology is very large, so you've got this huge amount of data you're trying to process. Secondly, if something is broken in the network, it's possible that your telemetry is one of the things being impacted by the network failure, so both the success and failure data you're getting may be partial, and you've got to try and deal with partial data. And the last bit that makes it challenging is that we want to do this in almost real time: we want to process this partial data stream and identify the faulty component in a matter of seconds, so that we can take action and trigger the alarm. So those are the main parts of how we think about gathering that rich telemetry.
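Here's a heavily simplified sketch of the triangulation idea: if each probe's path through the topology is known, the element shared by the most failing probes is the prime suspect. The real system has to cope with partial data and a vastly larger topology:

```python
from collections import Counter

def suspect_element(failed_probe_paths: list[list[str]]) -> str:
    """Rank topology elements by how many failing probes traverse them."""
    votes = Counter(elem for path in failed_probe_paths for elem in path)
    return votes.most_common(1)[0][0]

failures = [
    ["tor-1", "spine-2", "tor-7"],
    ["tor-3", "spine-2", "tor-9"],
    ["tor-4", "spine-2", "tor-1"],
]
print(suspect_element(failures))  # -> "spine-2"
```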
We do, however, also store all of this telemetry data in a data lake in S3, and we use all of the standard AWS data analytics tools, so Athena, Glue, and QuickSight, to process this data, to try and find patterns or things we think are worthy of further investigation. It allows us to track down and compare the reliability of components, or see whether particular software releases are improving our performance. So we take advantage of all those same data analytics tools to process the huge amount of data we've collected.

Then the third pillar of how we operate the network is how we handle those events when we detect a problem in the network, and that's where we depend on automated remediation and repair. We don't want a human to have to investigate and deal with the problem; we want that to be dealt with by software. So when an event comes in for a device that we think is unhealthy, our auto-remediation system will take action and use that traffic shift feature to take the device out of service immediately and mitigate the customer impact. It will actually check, after it's removed the device from service, that the impact has gone away, and if it sees the metrics go green, it knows it can move the device into our repair workflow. If the issue hasn't recovered, it will then escalate to an engineer to dig in and figure out why. As this system has improved, we're now at the point where almost all of the alarms and events that flow out of our monitoring system are dealt with and managed by software, so humans are almost never involved in remediating network faults. Once we've got the fault mitigated and we need to do the repair, we also depend on software systems to do that. Many faults can be fixed just through simple remote actions, either restarting software or rebooting the router, but sometimes they need physical repair, and in that case the work is dispatched to our field team, who will carry out whatever replacement action is required. Then the replacement device will be automatically provisioned and verified to ensure that the fault is fixed: if it was a bad link, we'll ensure that the link no longer has errors; if it was the switch not programming its routing table correctly, we will have validated that. Once it's proved healthy, it'll be returned to service automatically.
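Sketched as code, that remediation flow might look like this; all of the helpers are hypothetical stand-ins for internal systems:

```python
# Mitigate first, diagnose later: shift traffic off the suspect device,
# confirm the customer impact clears, then repair or escalate.

def traffic_shift_away(device: str) -> None:
    print(f"shifting traffic off {device}")

def impact_cleared() -> bool:
    return True  # stand-in for re-checking the alarm metrics

def enqueue_repair(device: str) -> None:
    print(f"{device} queued for automated repair and revalidation")

def escalate_to_engineer(device: str) -> None:
    print(f"paging an engineer about {device}")

def remediate(device: str) -> None:
    traffic_shift_away(device)
    if impact_cleared():
        enqueue_repair(device)        # software restart, reboot, or field dispatch
    else:
        escalate_to_engineer(device)  # the device wasn't the real culprit

remediate("spine-2")
```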
This is one of the things that computers are really good at: ensuring that things are tested the same way every time, ensuring they're correct, and responding fast. This process is continuous and happening all the time, 24/7, ensuring that the network is healthy. OK, so we've now talked about how we think about building the data center network and how we operate it, and now we can take a look at how we put these data centers together and build regions and availability zones.
Just as a reminder, availability zones provide fault isolation from other availability zones; that's the reason they're always physically separated from each other and never share any facilities in common. Availability zones are directly connected to each other, in order to ensure that applications with latency requirements can run and do things like synchronous replication while still having physical separation, with a large amount of capacity so that they're never constrained. But the availability zones themselves have to be scalable, and the way we scale availability zones is by having multiple data centers in an availability zone. An availability zone is always at least one data center, but in a lot of cases it's many, and in our largest regions, such as Virginia, you can have availability zones with more than ten data centers composing them. Each of those data centers has the data center network we described, providing thousands of terabits of capacity, so that's a lot of capacity and a lot of hosts available.

So let's have a quick look at a simplified view of the regional network. You see here we've got three availability zones. The data centers within an availability zone are connected to each other directly, in order to provide capacity and low latency, and then the availability zones are interconnected with each other. To keep the diagram simple I haven't shown the links that go from availability zone one to availability zone three, but they do exist, so there's full geographic diversity in terms of how things are connected together. There's one piece of the topology that's missing here, and that's our transit centers; this is how we connect an AWS region to the internet and the global backbone. Every availability zone is redundantly connected to the transit centers on its own fiber, and the transit centers are in locations that have dense interconnection opportunities, usually where there are internet exchanges or network operators, so that we can interconnect easily. If we update the diagram to show what that topology looks like, you end up with this view, where each transit center has connectivity into each availability zone, so a fault in an availability zone doesn't impact services running in any of the other availability zones. So how do we build that network?
This is where we leverage the cellular architecture we talked about earlier. We'll start here with our placement group network at the bottom; those core cells we had at the top of the earlier diagram are now here at the bottom, and we're going to attach those core cells into another set of spine devices. These spine devices will also support cells connecting to other functions: in this case, these cells here are going to support connectivity to other data centers in the same availability zone. Here we're leveraging the fact that these cells are separate failure domains from each other: they have their own control planes, they're isolated from each other, so a fault with any one of them doesn't impact any of the other cells in the core network. We can add other cells for other use cases; for example, we'll add cells dedicated to providing inter-availability-zone capacity, and then finally we'll add some cells for connecting northbound out of the data center, up to the transit centers. So we're simply leveraging the same design pattern: many devices in cells that provide strong isolation and lots of capacity. Each of these cells has many routers in it, and one of the things that's really important about this design, that we've come to love, is that we don't have anything operating in an active-standby role. Every device is carrying traffic all the time, and it's being monitored by our active monitoring system. We don't ever want to find ourselves in a situation where we fail over from an active device to a standby device that has itself developed a latent fault. So a pattern we depend on quite heavily is constant work: everything's in service, which allows us to have lots of capacity to handle bursts and to tolerate device failure.
So that's the network inside the data centers; how do we actually connect them together? That depends on a lot of fiber. Within availability zones we have dark fiber spans in bulk cables; the photos here at the top show some of the cables we built several years ago, which have more than three and a half thousand fiber cores in them. Since then we've developed newer versions that at this point are approaching 7,000 strands of fiber in a single cable, which allows us to provide lots of capacity. We pay really careful attention to how those cables are installed: we map out the physical routes to ensure they have the lowest possible latency, but also that they have physical separation from each other, so they don't share any failure modes in common. For connectivity that has to go slightly further, for example interconnects between availability zones or up to transit centers that may be a little further away than we'd want to use bulk cables for, we take advantage of dense wavelength-division multiplexing (DWDM) to run multiple signals on a given fiber pair, and we can get up to 20 terabits a second out of a single fiber pair using the current generation of DWDM hardware.
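As rough, illustrative arithmetic only (the talk doesn't give the actual channel plan), here's one way such a figure could be reached:

```python
# Hypothetical DWDM channel plan: 100 wavelengths at 200 Gb/s each on
# one fiber pair would give the ~20 Tb/s figure mentioned above.
wavelengths = 100
gbps_per_wavelength = 200
print(wavelengths * gbps_per_wavelength / 1000, "Tb/s")  # -> 20.0 Tb/s
```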
One of the other benefits of DWDM is that we're able to take advantage of optical-level failover to reduce the impact of physical faults. Before I show you an example of what that looks like, I want to point out the cable on the bottom right that's blue: that's a special cable we've had to build for use in Australia. The blue coating discourages termites from wanting to eat the cable; it turns out termites are one of the many risk factors that are present in the network.
But let's look at what happens when we have a physical fault on a cable. This photograph is from one of our US regions earlier in the year: a construction crew hit the cable with a bucket while digging, and as you can see, the cable is pretty badly damaged; there aren't going to be many packets flowing through that. But our optical-level failover system saw this impact beginning and was able to transition the signals over to a backup path, and that active monitoring system I talked about earlier, which is generating thousands of packets a second of monitoring traffic, saw only 13 packets dropped when that cable was hit. Effectively, that impact was invisible to customers, which is really important. While the links were all running on the backup path, we dispatched a construction crew, and they fixed the hole, put in a replacement cable, and spliced it back together. That takes about a day at this point: the cables have an awful lot of cores in them, so it takes a while to splice back together. But then they're able to bring it back online again without impacting customers. So that's some of what we deal with, and take advantage of, to operate the network effectively. But this is a good point to segue into how we secure customer traffic when it's outside of our data centers like this.
Earlier this year we announced that we've been deploying physical network encryption on any link that passes outside of our physical control. So if traffic leaves our data centers in any way, both within a region and across the backbone, it's protected by physical network encryption. All traffic that passes between availability zones or between regions (apart from China, the rest of the world is carried on our backbone) is protected: most of the links are protected either with MACsec or, where it's DWDM, our DWDM hardware has implemented optical encryption at the link level. Most of the links are using AES-256; there's a very small number of links still using AES-128 on some older hardware that's in the process of being retired. The MACsec implementation is one we've customized ourselves, because it's running on our own hardware platforms, in order to ensure that it has forward secrecy: even if someone had recorded the traffic and at a later date got hold of the encryption keys, it's impossible for them to recover the data.
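As a generic illustration of forward secrecy (this is not AWS's MACsec implementation, just the standard pattern built from off-the-shelf primitives): the session key is derived from ephemeral key agreement, so once the ephemeral private keys are discarded, a recording of the ciphertext can't be decrypted later, even if long-term credentials leak:

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

# Each end generates a fresh, ephemeral key pair per session.
a_priv, b_priv = X25519PrivateKey.generate(), X25519PrivateKey.generate()
shared = a_priv.exchange(b_priv.public_key())

# Derive a 256-bit session key; AES-256-GCM protects the link traffic.
session_key = HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
                   info=b"link-session").derive(shared)
nonce = os.urandom(12)
ciphertext = AESGCM(session_key).encrypt(nonce, b"packet payload", None)

# Dropping a_priv and b_priv destroys the only route back to session_key.
```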
In the case of a small number of links that cross short distances, for example across a shared corridor in a data center, or where there are two data centers across the road from each other, we actually take advantage of laser monitoring of the cables: we have devices that attach to the cable and generate light pulses down it, and they're able to detect very minute vibrations in the cable. If someone were attempting to touch the cable, they'd trigger an alarm, and our security team is able to spawn an investigation. This is a very exciting piece of technology, and it allows us to control the security of the cables. So that's an example of how we protect your traffic, and if you were using the VPC encryption we talked about earlier, that traffic is then encrypted a second time, so traffic passing between availability zones would have two layers of encryption on it; it's doubly protected. That's a good segue into our global backbone. The global backbone is used for a number of AWS services, for example Direct Connect or Global Accelerator, or just traffic between AWS regions.
You've probably seen this diagram a bit this week; this is how our global infrastructure looks. It has all 22 regions and 210 CloudFront edge locations, plus 97 Direct Connect locations. Each of the links on the diagram is composed of one or more 100 gigabit per second Ethernet links; many of them are actually multiple tens of links, terabits of capacity, and they're all interconnected together. So why do we have a backbone network? It goes back to some of those tenets we talked about earlier. Firstly, security: we want to control the traffic and the infrastructure it traverses; we don't want to hand it off to a third party and not have control of it. We want to ensure availability: we want control over scaling and redundancy, and we want to ensure that there's always sufficient headroom for physical faults. We want to control the hardware so we can operate it as best we can. We want to control the performance: we want to know where the traffic is going to go in a failure and what the backup paths are going to be. And we want to be able to connect closer to our customers, in order to maximize their customer experience; we want to avoid internet hotspots or any suboptimal connectivity. When we think about building that global backbone, we have to focus on a couple of things. Firstly, latency really matters: the physical distances are large, and the speed of light is unfortunately a constraint we haven't yet been able to work around, so we're doing our best. We spend a lot of time thinking about how circuits are going to be routed and how they're going to be managed.
Additionally, we want to make sure that when physical failures occur, the backup paths add the smallest possible amount of additional latency. As I said, 100 gig is the new normal in the backbone; everything is at least a hundred gig, in order to provide burst headroom for any spikes in traffic. We use many of the same design patterns we have in the data centers to operate the backbone network, but before we talk about that, let's talk a little bit about what it means to actually build the fiber paths.

For the backbone fiber paths, we go to extreme lengths to audit them and understand how they're built and where they're physically laid. As I said, these routes are really long, potentially thousands of kilometres, and that means there are just many opportunities for risks to be exposed. So we want to understand: where is that cable going? Is it going through a rail tunnel, where there may be a risk of a derailment or something else that could damage the cable? Is it crossing a bridge that might be at risk of being damaged during flooding or washed away? Are there areas that are prone to construction? In the case of subsea fibres, is any of it in areas that are prone to heavy fishing or trawling that might damage it? And then we also want to understand, if and when it gets damaged, how it is going to be repaired. That can be challenging: unlike metro, where it's easy to dispatch a crew to go repair something the same day, if there's a break on a submarine cable, it may take several weeks for a repair ship to get there, particularly if the weather's bad in the winter, so your recovery time might be long. We need to understand those risks so we can plan for how long it's going to take to repair.
We also care about fiber path diversity: we want to make sure we understand anywhere two paths may have commonalities. Maybe there's one cable running east-west across the country and another cable running north-south; we want to know exactly where they cross, how much of the route is common, and what the risks are. Can we get any kind of separation? Maybe one's on the road below an underpass and the other crosses on the bridge across the top, so they're vertically separated. We want to go chase down all those details and make sure we understand them. We also want to make sure that they stay that way, that they don't change when we're not looking. So one of the things we do is measure the latency of all of these backbone circuits continuously. If the latency changes, maybe there's an event where the link goes down and comes back a couple of hours later after it's been repaired, but the latency has changed by two or three milliseconds, we know someone's just added several hundred kilometers of fiber into the circuit: the length of the cable has changed, which means it has possibly ended up on a path we didn't know about. We will then use that to trigger an investigation and make sure we understand where the connectivity is going.
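A toy check of that idea: light in fiber covers roughly 200 km per millisecond one way, so about 100 km per millisecond of round-trip time. The circuit name, baseline, and threshold below are invented for illustration:

```python
KM_PER_MS_RTT = 100  # ~200 km/ms one way => ~100 km per ms of round-trip time

def check_circuit(circuit: str, baseline_rtt_ms: float, measured_rtt_ms: float,
                  threshold_ms: float = 1.0) -> None:
    """Flag a circuit whose latency implies new fiber was spliced into the path."""
    delta = measured_rtt_ms - baseline_rtt_ms
    if delta > threshold_ms:
        print(f"{circuit}: RTT up {delta:.1f} ms, "
              f"~{delta * KM_PER_MS_RTT:.0f} km of fiber added; investigate")

check_circuit("dub-fra-01", baseline_rtt_ms=18.2, measured_rtt_ms=21.1)
```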
We also want to understand the capacity and scale limitations. What is the type of fiber on these long-haul routes, and how much capacity can we get down it? Maybe it's slightly older and doesn't support as many circuits as newer cables, and we want to make sure we understand where those constraints are going to be, so we can go dig in and make it better.

So how do we actually build this network, and what does it look like? You're probably going, hey, I recognize this diagram; it looks remarkably similar to what we do in the data center. We again have multiple cells that we're deploying, and in this case we're reusing them to provide either connectivity to remote backbone locations or egress to our transit centers or edge PoPs. Just like in the data center, we want to take advantage of a large number of devices, so as to minimize the capacity loss when a device fails: we go wide across all of these devices to ensure that a device failure only impacts a very small percentage of the total capacity.
So what are the services that run in our edge PoPs? We're going to take a quick look through some of these. As I said, there's a bunch of AWS services running in our edge PoPs: you've got Direct Connect, CloudFront, and Route 53, along with others. But we also use edge PoPs for our global network access and our external internet connectivity. So why do we want to use them for those? How does AWS connect to the internet? We want to connect to the internet in the most optimal way possible, and we do that in two ways: we take advantage of the transit centers we talked about earlier, which are part of the regions, but also the edge PoPs that are spread across the global backbone. We use those to extend our internet access edge from a region as wide as we can, both to ensure we get the best possible performance for customers, delivering the traffic as close to the destination network as we can and avoiding any congestion that might happen, and also as a way to ensure we can scale the network appropriately. If we delivered all the traffic originated by an AWS region to networks in that one location, we would overwhelm those networks, so by taking advantage of the backbone, we take the traffic and spread it wider across a region, ensuring we don't generate any hotspots and that we can control how traffic flows. This allows us to have a much greater aggregate amount of connectivity for delivering traffic for customers, and it allows us to manage for failure: in the event that either we or another network operator has a fault in a particular location, we're able to just move that traffic to another location and still have sufficient capacity to deliver it.

So edge PoPs are again built on a very similar pattern to what we've seen before. In this case we're dedicating cells for connecting back into our backbone network, cells for connecting to the external internet networks and service providers that we're peering with, and then cells dedicated to the various AWS services that are running inside the PoPs. And here again is an example of where we're using our cellular architecture to provide isolation: if CloudFront has a big event, maybe it's Thursday Night Football, or another customer has generated a large amount of traffic because they've done an update to a game or something else, then we don't want any issues there to impact customers running on Direct Connect, or Route 53 performance. So we take advantage of the cellular architecture to provide isolation and separation.
So, a quick word on why some of these services take advantage of our edge PoPs. CloudFront is our content distribution network, and Route 53 is our DNS service. Both of those services want to have the lowest possible latency to the end users, and by being distributed out across our 200-plus edge locations, they get the lowest possible latency to the users and direct access to the networks those users are on. In many of these PoP locations we're directly peered with the largest access providers in those markets, so we're able to deliver traffic directly to the end users. Those platforms are also able to take advantage of the backbone for doing origin fetches back to either S3 or resources running in EC2 in the region, so they get low latency and high performance origin fetches.

Direct Connect is used by customers to access their VPC resources privately, and also public services; again, it's about low latency access, getting as close to our customers as possible. Customers can connect through their nearest Direct Connect location, either directly or through one of our partner networks, and then they're able to use the backbone network to access resources in any AWS region in the world. In these locations we always have multiple customer-facing edge routers, so customers can land their circuits across two devices for redundancy, but they can also connect across multiple PoPs for geographic diversity as well, if they desire.

AWS Shield is our DDoS attack mitigation platform. It provides traffic scrubbing, and we want that deployed out at the edge locations because we want to be able to classify and eliminate attack traffic as soon as possible, so we don't have to carry it across the backbone, where it might impact other customer traffic; we're able to identify and stop the traffic at the source, and it's all done automatically as the traffic enters the network.

And then finally, Global Accelerator. Global Accelerator provides anycast-like functionality for customers: you get a pair of statically assigned IP addresses, and they're advertised at diverse PoPs, so no one location will advertise both IP addresses. This ensures that your traffic always arrives at diverse PoPs, where it enters our backbone, and then it can be routed across our backbone for consistent performance. Customers have the ability to control how that traffic flows and which regions the traffic is delivered to, so you can do regional routing: you can have users in Europe directed to resources in Europe and users in North America go to US-based resources, with failover and traffic distribution control. It's all easy to set up and manage; you don't have to do any tricks with BGP announcements, and you can control how the traffic flows.
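As a sketch of what that looks like from the customer side with boto3 (the ARNs are placeholders, and note the Global Accelerator API is homed in us-west-2 regardless of where your endpoints are):

```python
import boto3

ga = boto3.client("globalaccelerator", region_name="us-west-2")

acc = ga.create_accelerator(Name="my-app", Enabled=True)
listener = ga.create_listener(
    AcceleratorArn=acc["Accelerator"]["AcceleratorArn"],
    Protocol="TCP",
    PortRanges=[{"FromPort": 443, "ToPort": 443}],
)

# Regional routing: send European users to eu-west-1, and dial that group
# up or down with TrafficDialPercentage instead of touching BGP.
ga.create_endpoint_group(
    ListenerArn=listener["Listener"]["ListenerArn"],
    EndpointGroupRegion="eu-west-1",
    TrafficDialPercentage=100.0,
    EndpointConfigurations=[
        {"EndpointId": "arn:aws:elasticloadbalancing:eu-west-1:111122223333:loadbalancer/app/example/123", "Weight": 128},  # placeholder ALB ARN
    ],
)
```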
So that's the set of services running there. As we finish up, I want to touch back on what we talked about before. Hopefully you've seen how we design the network to ensure strong isolation from failures; how we take advantage of our extensive network monitoring and auto-remediation to ensure that we detect and mitigate faults as fast as possible; that we ensure, through redundancy and over-provisioning, that we can manage those faults without impacting customer experience and that we deliver consistent performance to you; that we're able to scale the network at every layer without constraint; and that we take advantage of our control over the hardware and software platforms to deliver features such as encryption and optical failover, to give you the best possible experience. If you want to know more about any of the services I talked about, we have a large amount of training material you can look up. That's basically all I had today, folks. We have about 10 minutes left, so I'm happy to take questions if folks have stuff they want to know more about.
I believe we have a microphone somewhere for folks who have questions.

Sure, yes. So the question is whether I really said there were seven thousand strands, and yes: we have bulk fiber cables now where a cable about two inches across has, I think, six thousand nine hundred and something; I can't remember the exact number, so it's just barely below seven thousand. Several years ago we went public with a cable that had three thousand four hundred and fifty-six, and over time, as we've worked with our partners, we've figured out how to package more strands into the same physical space, so we've pushed up to nearly seven thousand strands. It takes a reasonably large amount of time to splice one of those back together; if someone cuts one, it's a multi-crew job.

Gentleman down the back there. So the question you're asking is how we ensure that the encryption is working; OK, so the question is, given that encryption may potentially change the size of the data that's leaving the device, how do we monitor for that? We understand how the encryption is modifying the packet sizes, so we're able to effectively account for that in the model: if you know the traffic leaving on encrypted interfaces is always two percent bigger, you're able to account for that in the monitoring system and calibrate for it.

This one? OK. So the question is, do these blocks represent single devices, and no, they're representing that cellular network building block we described earlier in the talk. They're composed of many, many network devices in that block, commonly 16 or 32 devices, so they provide a large amount of redundancy, and a single device failing is only four or five percent of the capacity.

No, so some of them are commercial boxes, but predominantly this is all our own custom-built routers that we use in the data center and now across the backbone network. They're all that fixed form factor, yes.

Yep, so the question is, within the placement group network, is it non-blocking, and the answer is yes: we have sufficient capacity to not be blocking inside them, and that's how we deliver high-capacity, HPC-type workloads on top of the placement group network; you should never see constraints inside the placement group.

Gentleman here. Yep, so the question is, do we own all of these undersea cables, and the answer's no. We are owners of several subsea cables; we've announced in the past our investment in several trans-Pacific and transatlantic cables. So generally it's a mix of either cables we own or infrastructure we're leasing from cable operators, and it's the same with the terrestrial side.
We're involved in joint ventures, where folks are leasing fibers from us and we're connecting, or we're leasing capacity from companies and then using it.

So the question is, do we run custom routing protocols, for things like inside the data center. We try as much as possible to stick to standard, industry-standard protocols, things like OSPF and BGP, because they're well understood and well verified. We do take advantage of the fact that we have control over the software platform to customize them where it makes sense.

Well, it can be quite a good distance; it can be a hundred-plus kilometers, in order to provide physical separation. Yeah, we have our DWDM hardware to provide that connectivity, and then we attach our standard routers.

OK, so the question is, at our scale we must be using multiple vendors, so how do we manage those multiple vendors? We have multiple hardware partners that we're working with to build the routers we use in our network. They're custom to us, but we run our own operating system and control plane stack on top of them, and that's consistent across all the devices. So effectively, even though there may be multiple generations of hardware, or hardware from multiple ODMs, they're all running our own operating system and protocol stack on top, and that's effectively how they stay consistent. It's largely all internal tooling, because some of it predates a lot of the industry work in this area, so we have our own kind of ecosystem for that management software.

Gentleman on the far right there. So the question is how Outposts changes this, and it's certainly bringing some changes, mainly that the racks are far away, but almost all of the same technology and systems are being used in Outposts racks. For example, the active monitoring solution I described earlier, because it's integrated into the Nitro cards, is present in the Outposts rack, so we're able to use that to monitor the health of the network inside an Outposts deployment.

Yep. OK, so the question is how the forward secrecy works in the encryption. It's not actually TLS, but it's using the same underlying cryptographic primitives. I'm not a cryptography expert, but
I can happily bring you over to our crypto team later; they've been deeply involved in ensuring that the encryption meets the bar that they set. Great, OK, I'll take one last question.

So the question is how we roll out software changes and how long it takes to get them across the world. Generally, like all continuous deployment processes, we'll do one box, then a small number, maybe only one availability zone, and let it soak for a period of time to ensure that it's correct and there are no errors, and then expand kind of geometrically across the world, doing more regions and more availability zones. So deployments can take from days to weeks, depending on how aggressive we want to be, but generally it's a couple of weeks to a month or so to roll across the world.

Great. We're just about out of time; thank you all for staying and listening. I'll be here for a couple of minutes and can take further questions, and if you don't have time for that, I'll be back at our global infrastructure booth in the expo in the Venetian later today, and either myself or one of my colleagues will happily take your questions. I'd ask that you all complete your session survey in the mobile app; let us know what you liked and didn't like, and what you'd like to see more of, because your feedback helps drive the agenda and what we present. Thank you very much, and enjoy the rest of the conference.

[Applause]