Thank you everyone for coming and listening. I'm going to be talking, as it says here, about high-performance .NET JSON serialization with code generation. First of all I'll talk about the performance issues that exist around what you might call old-school JSON serialization — anything using the Newtonsoft.Json (Json.NET) library. It's great, but it's not as fast as it could be. We'll then look at what has changed since then: how do we do things if we're targeting modern .NET, what are the options in today's .NET 7
and what it's going to be like in .NET 8. We'll look at the performance enhancements that are available there, and we'll see why they're able to go so much faster than the Newtonsoft versions of these libraries. We'll also look at how the .NET SDK includes functionality that can take advantage of C# source generators to improve certain aspects of JSON serialization performance, and at where that does help and also where it doesn't, because the way in which it can improve things is quite targeted. We're
also going to look at how you might go beyond what the SDK does and use some open-source code generators to provide very high-performance JSON deserialization, in a way that has a lot of the benefits of regular serialization but with some of the performance normally associated with much more difficult APIs — it kind of brings the best of both worlds — and we'll look at how that's possible and how it works. So let's start with Newtonsoft.Json: what's up with it? Basically, Newtonsoft.Json is great.
It's everywhere; it's one of the most popular .NET libraries out there. Forever it was at the top of the NuGet package list, because everything used it. Newtonsoft.Json, or Json.NET as it's sometimes also known, is hugely popular and does everything you might want of a JSON library. So what's the problem? Why aren't we just still using it — and in fact many projects still are? Why might you not, though? Well, performance-wise it's got three major problems, two of which are kind of fundamental; the third could be fixed but probably won't be.
The first two are basically inherent to how it works. The first big problem with Newtonsoft.Json is that it doesn't understand UTF-8. It was designed to work with .NET strings, and .NET strings do not use UTF-8 encoding. For historical reasons that made perfect sense 25 years ago, .NET uses UTF-16 as its encoding for strings — but that's not how anything out in the wild works. Any real JSON document you encounter over the network, or in a data store or data lake or whatever, is almost certainly going to be in UTF-8
form. So the first major hurdle Newtonsoft.Json has to overcome is that the JSON you want to process is in a different encoding than the one it understands, so you've got to convert it somehow. That's a problem. Another problem with Newtonsoft.Json is that it tends to allocate lots of objects and lots of strings, and although the .NET garbage collector is pretty good — it does an amazing job — the fact is that the more work you give the garbage collector to do, the less time your program
can spend doing useful work. It's fundamental to some aspects of Newtonsoft.Json's design that it can't really avoid quite a lot of the allocations it makes, and these essentially limit the performance it's able to offer. And the third thing is that its serialization engine relies on reflection, which has some issues — it's not the biggest of the three, but it is an issue. All three of these things are addressed by the modern .NET JSON APIs. So let's look in a bit more detail at why some of those things
are actually a problem. Here is some JSON — a very straightforward JSON object. If you look at it closely you'll see the top level is an object; it's got a property called name whose value is a nested object, and one of that object's nested values is an array. So it's not massively complicated, but it's got a little bit of structure, like most real JSON objects will have, and this is typical of the sort of thing we might want to process — maybe we're going to process 10 million of these things because they're sat in a data lake, but the basic structure might be just like this.
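The slide itself isn't captured in this transcript, but based on that description — a top-level object, a name property holding a nested object, and an array nested inside that — the document presumably looks something like this sketch (the property names and values are illustrative guesses, not the slide's exact JSON):

```json
{
  "name": {
    "givenName": "Arthur",
    "familyName": "Dent",
    "otherNames": [ "Philip", "Todd" ]
  },
  "dateOfBirth": "1952-03-11"
}
```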
Here's what it looks like in UTF-8 form. That might be a bit of a blur, and the actual numbers aren't hugely relevant, but it shows you roughly how many bytes you're going to need for that much JSON. As I said, though, Newtonsoft.Json works with .NET strings, and .NET strings don't use UTF-8; they use UTF-16. So one of the things you're going to have to do to work with Newtonsoft.Json is get your JSON data into .NET
string form, and this is how big it looks in UTF-16: twice the size. If you peer closely you'll see that half of those bytes are zeros, because that's what you mostly end up with with most JSON. Now, if you've got large volumes of non-English text then maybe it's not so inefficient, because UTF-8 does get bigger as you use non-ASCII characters, but most JSON certainly has a high proportion of ASCII-range characters, so in practice your data can easily grow to twice the size just in the process of converting it. So there are already
two costs up front: one, the cost of the conversion, and two, the cost of storing three times as much memory as you really need — your original data plus a copy that's twice its size. You're already starting on the back foot before you've even begun to look at the first character. Now let's look at what happens if you try to deserialize this with our performance goggles on. Let's say we've decided we've got a .NET type that represents the structure of the JSON data we're looking at, and
we just want to convert that JSON data into an instance of that .NET type. Pretty straightforward, right? That's how we tend to do it with JSON. So this is a Person object: it's got a name and a birth date, and that name, if you look at the JSON, is actually a sub-object broken down into a given name, family name and so on, and when we deserialize it we just expect those properties to be populated. That's actually quite a lot of objects, because we've got an object to allocate for the person's name, which has
three more properties; two of those are strings, another one's an array with two more strings in it; and the birth date is yet another string. So we've got one, two, three, four, five, six, seven, eight .NET objects just for this one bit of JSON. It's not horrible, but it is a fair amount of data to allocate, and if you're dealing with very high volumes of data you can start to give the garbage collector just too much work to do — you can basically end up spending huge amounts of time in the garbage collector as it cleans
up after you, and that's not ideal; it's not the best way to be spending our CPU cycles. Now, some of this overhead is inherent to this style of serialization; this really isn't specifically Newtonsoft.Json's fault. You can deserialize in exactly the same way with System.Text.Json and you will have all the same problems for the same reasons — you're going to have to allocate this many objects if that's the structure you're asking for. But there are also some Json.NET-specific overheads, largely around the format of the data that's coming in.
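For reference, the .NET types being deserialized into would look roughly like this — a sketch based on the property names mentioned in the talk (PersonName and the member names are assumptions, not the speaker's exact code):

```csharp
// Sketch of the target types: one Person object, one nested name object,
// two name strings, an array, two more strings, and a birth-date string —
// the eight allocations counted above.
public class PersonName
{
    public string GivenName { get; set; }
    public string FamilyName { get; set; }
    public string[] OtherNames { get; set; }
}

public class PersonSerializable
{
    public PersonName Name { get; set; }
    public string DateOfBirth { get; set; }
}
```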
So let's take a look at what this means in practice. (Slight pause there, because we've discovered that every time I switch windows we get a slight break in the audio, so I'll stop talking every time I switch windows — sorry, we weren't able to work out how to stop that in the tech check today.) Anyway, I have a benchmark. I'm using BenchmarkDotNet here to measure the performance of various things, and I'm going to kick it off running right now, so that by the time I finish describing what's going on you'll hopefully be able to
see the results. So that's going to run in the background. What do my benchmarks do? Well, they all basically do the same thing: I've got several benchmarks that are all trying slightly different ways of doing the same job. They're all essentially looking through an array of objects for a particular value. So let's look at the data that gets set up first. I'm going to go into my base class — the job of this base class is to set up this array here. Before every single test runs, and before the timing
starts, this constructor here is going to run, and it's going to populate this array with the JSON data that I'm going to parse as part of my test. So what's going to go in here? Basically, 10,000 objects. Here's the actual .NET version of the data that we're going to load into the array — this PersonSerializable class, which is basically a .NET representation of the exact same JSON data structure I was just showing you on the slide. We've got a date-of-birth property, we've got a nested property representing the name, and inside
that we've got the given name, the family name and an array of other names. So it's basically exactly what you were just seeing on the slide, but in .NET type form. This loop down here just creates 10,000 of these things and writes all of them into that JSON UTF-8 buffer down here, so we end up with one great big JSON array. This part's not being timed — it's part of the test setup, just giving us some data to look at. The main thing to know is
that every person here has a different name. We're going to create 10,000 entries; the first one is going to be called Arthur1, because we stick the loop index on the end, then Arthur2, Arthur3, Arthur4 and so on. They've all got slightly different names, they've all got slightly different birth dates, and so they're all just slightly different from each other. Then our tests basically take that UTF-8 data, somehow get it into a form we can wrangle — the different
benchmarks do it in different ways — and then go looking for a particular piece of data in that array. We're going to look for the item whose given name is Arthur5000. There are 10,000 entries in my array, so that one's going to be right in the middle, which means we only actually need to look through half of the data to find it. This is simple, but it's doing enough work to be representative: we have to be able to parse the JSON, and we have to be able to go and look at nested properties of objects in the JSON and work out whether they're what we're interested in, so it's not totally unrealistic.
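A sketch of that setup, assuming the types described in the talk (the speaker's actual setup code isn't reproduced in the transcript, so details like the family name and birth-date scheme here are illustrative):

```csharp
using System;
using System.Text.Json;

public class PersonName
{
    public string GivenName { get; set; }
    public string FamilyName { get; set; }
    public string[] OtherNames { get; set; }
}

public class PersonSerializable
{
    public PersonName Name { get; set; }
    public string DateOfBirth { get; set; }
}

public static class BenchmarkData
{
    // Builds 10,000 slightly different people and serializes them into one
    // big UTF-8 JSON array, outside the timed part of the benchmark.
    public static byte[] BuildJsonUtf8()
    {
        var people = new PersonSerializable[10_000];
        for (int i = 0; i < people.Length; i++)
        {
            people[i] = new PersonSerializable
            {
                Name = new PersonName
                {
                    GivenName = "Arthur" + (i + 1),   // Arthur1, Arthur2, ...
                    FamilyName = "Dent",
                    OtherNames = new[] { "Philip", "Todd" }
                },
                // Every entry also gets a slightly different birth date.
                DateOfBirth = new DateTime(1950, 1, 1).AddDays(i).ToString("yyyy-MM-dd")
            };
        }
        return JsonSerializer.SerializeToUtf8Bytes(people);
    }
}
```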
So let's look at what this first one is doing in detail. It takes our UTF-8 data, wraps it in a MemoryStream, and wraps that in a StreamReader, because we need to deal with the fact that this is in UTF-8 format and Json.NET doesn't do UTF-8 — the StreamReader can be told the encoding, and then it can read all the data back out into a string for us. Now, some of you might be thinking: hold on, that can't possibly be the most efficient way to pass data into Newtonsoft.Json. I did try a load of different ways, and they're all basically indistinguishable in this benchmark, so this was the simplest and was not noticeably slower than any of the others — that's why I've gone with it. There are some very tiny differences between exactly how you feed data into the deserializer, but they're so small that they make no meaningful difference here. So we get all the JSON data as a .NET string — which, remember, is going to convert it from UTF-8 to UTF-16 — and then we feed that into Newtonsoft's JsonConvert.DeserializeObject method and tell it: I would like you to deserialize all of this JSON as one .NET array of PersonSerializable. So as my result I'm expecting an array with 10,000 entries in it, and then I'm just using LINQ to Objects — this data.First is the LINQ to Objects implementation of First, which says: please find me the first element that matches these criteria. Then we rather arbitrarily return the date of birth. Why am I doing that? Well, I've found that with BenchmarkDotNet, if you don't return something, you can occasionally find that the thing you were hoping to time gets optimized out of existence, which rather invalidates your benchmark; returning something makes it harder for the optimizer to get over-enthusiastic. So anyway, that's my first benchmark — let's see how it did. Has my benchmark finished yet? Let's have a look. (I think I froze again for a second there, but hopefully I'm back.) Let's scroll down to the end of that — yeah, they've all run. OK.
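Putting that together, the first benchmark presumably looks roughly like this — a sketch, since the class and method names are mine, and I'm assuming the PersonSerializable/PersonName types described earlier:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;
using Newtonsoft.Json;

public class PersonName { public string GivenName { get; set; } public string FamilyName { get; set; } public string[] OtherNames { get; set; } }
public class PersonSerializable { public PersonName Name { get; set; } public string DateOfBirth { get; set; } }

public static class NewtonsoftWholeArray
{
    public static string Find(byte[] jsonUtf8)
    {
        // Json.NET wants a .NET (UTF-16) string, so decode the UTF-8 first.
        using var reader = new StreamReader(new MemoryStream(jsonUtf8), Encoding.UTF8);
        string json = reader.ReadToEnd();

        // Deserialize all 10,000 objects in one go.
        PersonSerializable[] data = JsonConvert.DeserializeObject<PersonSerializable[]>(json);

        // LINQ to Objects finds the first match; returning the date of birth
        // stops the optimizer removing the work being measured.
        return data.First(p => p.Name.GivenName == "Arthur5000").DateOfBirth;
    }
}
```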
Wow, that is a lot slower than it normally is — probably because I'm running with the streaming stuff simultaneously. However, the basic ratios all look the same; everything's just gone a bit slower than normal. Normally I'd expect this first one to take about 34 milliseconds — it's done that in every dry run I've done so far today, but now, live, it's gone a bit slower. It doesn't really matter, though, because the ratios between the numbers are all consistent. So this first one took 50 milliseconds to process 10,000
entries — that's about five microseconds per object. And if we look at the memory usage, you can see it actually ended up allocating about 22 and a half megabytes of data in order to process those objects. It's also told us how many garbage collections happened — these are averages, by the way, in case you're wondering what 0.18 of a garbage collection is; that's just the average across the number of times it ran the benchmark. So this caused an average of about 2,800-ish generation 0 garbage collections and 1,272 generation 1
collections. These numbers are high — that's a lot of garbage collections for a single iteration of a benchmark that only took 50 milliseconds — so that's not great. On the other hand, five microseconds to parse some JSON is not that horrible. But this next benchmark, at 11 milliseconds, is clearly a lot better, so let's go and look at that one. This next benchmark, still on Newtonsoft here, starts off similarly: we wrap the JSON UTF-8 data in a stream and then wrap that in a StreamReader, but now we're going to pass that into this
JsonTextReader. This is a streaming API — it doesn't try to decode the entire document in one go; we repeatedly call Read, and it hands us the JSON one sort of feature at a time. So we say: let's wait for the array to start, and then, once we're inside the array, keep going until we hit the end, individually deserializing each object we come across — and, crucially, stopping as soon as we find the one we're interested in. This ought to run a lot faster, because it doesn't bother to look at the second half
of the objects: Arthur5000 is halfway through, so it's never even going to try looking at the second 5,000 objects, because as soon as it finds that one it's going to stop. So this should be a lot faster — it should be twice as fast, right, because we're doing half as much work? But actually it's a lot better than twice as fast: 50 milliseconds versus 11 and a half milliseconds. (Normally when I run this it's 34-ish versus 9-ish, so it's a bigger ratio than normal.)
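The streaming version described above might be sketched like this (again assuming the same types; the exact structure of the speaker's code may differ):

```csharp
using System;
using System.IO;
using System.Text;
using Newtonsoft.Json;

public class PersonName { public string GivenName { get; set; } public string FamilyName { get; set; } public string[] OtherNames { get; set; } }
public class PersonSerializable { public PersonName Name { get; set; } public string DateOfBirth { get; set; } }

public static class NewtonsoftStreaming
{
    public static string Find(byte[] jsonUtf8)
    {
        using var streamReader = new StreamReader(new MemoryStream(jsonUtf8), Encoding.UTF8);
        using var jsonReader = new JsonTextReader(streamReader);
        var serializer = new JsonSerializer();

        // Wait for the array to start...
        while (jsonReader.Read() && jsonReader.TokenType != JsonToken.StartArray) { }

        // ...then deserialize one object at a time, stopping as soon as we
        // find the match, so the second half of the data is never parsed and
        // each deserialized object becomes unreachable almost immediately.
        while (jsonReader.Read() && jsonReader.TokenType == JsonToken.StartObject)
        {
            var person = serializer.Deserialize<PersonSerializable>(jsonReader);
            if (person.Name.GivenName == "Arthur5000")
                return person.DateOfBirth;
        }
        return null;
    }
}
```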
I think this test just ran a bit slow. Normally, though, the first benchmark is about three and a half times slower than this second one. Why is it three and a half times slower when it's only doing twice as much work? That's peculiar. Well, let's look at the memory characteristics. In terms of allocated memory, the second one managed to allocate a little bit under half as much. The main reason is that the first one is trying to return the entire array in one go, and to do that it's actually building up a list object, and it doesn't know
how many elements it needs to start with — it just guesses how many elements might fit, and every time we run out of space in a List&lt;T&gt; it allocates a new buffer that's twice the size, so it ends up throwing away all the too-small buffers it created along the way. So it's actually a bit wasteful: it's going to allocate more than twice as much memory as the more efficient version, which is why that's a higher number. But look at the garbage collection generations: the second one has well under half
the number of gen 0 collections, and the gen 1 and gen 2 counts are way lower — there are only 15 gen 1s as opposed to about 1,200, and there are no gen 2s here at all. Now, that's really significant; it's the main reason this second benchmark actually runs so much faster. Yes, it's doing half as much work, but it's going almost five times faster in this case — usually over three times faster — despite only doing half the work, and it's because it's not clobbering the garbage collector. This first one is way
slower than it ought to be because it's spending so much time in the GC. So let's look at why that is, going back to the first benchmark (and this will all be relevant when we get to the System.Text.Json stuff). In the first benchmark we're saying: I want you to return me the entire array in one go. So what's going to happen as this starts to do its work? DeserializeObject is going to start churning through the string we've given it, gradually creating objects. Eventually it's going to
create 10,000 of these things — and actually we worked out there are about eight sub-objects per thing, so it's going to create about 80,000 objects — and the garbage collector is going to go: you seem to be creating lots and lots and lots of objects; maybe it's time for a garbage collection, maybe we can get some of that back. So it halts the thread, looks at what you're doing, and discovers that it can't really free anything much, because the deserializer is going to have to hold on to everything until it's ready to
return it to you — we've said we want all the data in one great big lump. And so the garbage collector goes: oh well, you've created thousands of these objects, but they're all reachable, because you're not done with them yet. It's not going to be able to garbage collect anything until we get to about here — that's the first point where we're not holding on to all the objects anymore. So this code inherently can't free up memory, but the garbage collector doesn't know that until it tries, so it's desperately going: oh, you've allocated so much, I'm
going to try again — oh, that didn't work; maybe a gen 1 will help — that hasn't helped either; maybe a gen 2 will help — oh, that hasn't helped either. It just doesn't realize that we're building objects that are going to live for a relatively long time. So that's not good, and it's causing the GC to behave inefficiently. Whereas this second one deserializes the objects one at a time — there are no array brackets on this one; we're saying: just the current object, please, just deserialize that — and then we
take a look at it and go: was that the one? If it was, we're done; if it wasn't, we move on, and the object being returned here is no longer reachable, so if the garbage collector does run, it will be able to recover that memory. This usage pattern is much better suited to the way the garbage collector expects things to run: you allocate objects and then almost immediately make them unreachable, and the GC is really good at that — it copes very well with very short-lived objects. That's why
this thing goes over three times faster even though it's only doing half the work: we get a bigger boost from the GC efficiencies than we got from simply doing less work. OK, so that's some analysis of Newtonsoft.Json. Now let's take a look at how things look in .NET today. What would you use for JSON deserialization in a modern .NET app that's targeting .NET 7 or .NET 8 or whatever? You would almost certainly use System.Text.Json. This is built into the .NET runtime libraries, so it's right there — you don't need to add any
extra dependencies to get hold of it, you don't need to worry about whether you're depending on a different version from someone else, and it's going to be serviced as part of the runtime, so you don't need to worry about security issues being discovered with NuGet packages and all that kind of stuff — it simplifies your deployments. But it also has some advantages in its design, the biggest one being that it works directly, natively, with UTF-8. It does not need data to be converted from UTF-8 into UTF-16 before it can start to work with it, so it
doesn't use the native .NET string type internally as it works. But that's not the only thing: it offers a wider range of trade-offs between performance and convenience. We saw with Newtonsoft that we could either deserialize everything in one go, or use the streaming API to process things one item at a time and then fall back to deserialization for each individual thing — or, if you want to, in Newtonsoft you can go streaming all the way, but that makes the code a lot more complicated. Those are both options with System.Text.Json
as well, but it also offers a couple of other flavors of working with JSON that offer different trade-offs — different places on that spectrum between complexity and performance. For example, there's an API known as JsonElement — it's a pair of APIs, really: JsonDocument and JsonElement — and this offers a read-only view over some JSON data. It's basically an object model (except technically they're structs, but it's like an object model). You can't modify JSON with it; you can only look at an existing JSON document. But its superpower is that it's very low-
allocation: it does not allocate very much memory at all when it's working with JSON data. There is also another object model which is modifiable. The downside of the ultra-high-performance read-only one is that you can't actually generate a new document with it — you can't build up a description of a new document, and you can't load a document into memory, edit it, and then write out the edited version. That's not available with JsonElement, but it is available with this other modifiable model, which is not as efficient — it can't be as efficient as
the read-only one, because being able to support modification comes at a cost. So it's not free, but it's still pretty efficient. And System.Text.Json also, as I say, supports serialization and streaming just like Newtonsoft does, so you've got those benefits too. Also, it's possible to get System.Text.Json to do compile-time code generation to drive its serialization engine instead of using reflection — it will use reflection by default, but you can make it not, and that improves performance in certain ways that I'll discuss when we get to
that. So those are the differences; now let's start to look into them. Let's take a look at our benchmark — this is the Newtonsoft one; if we scroll down a bit, the next one here is FindWholeArray, System.Text.Json Deserialize. This is basically exactly the same idea as our first benchmark. If I quickly scroll back up to the first Newtonsoft one — what did we do? Well, we loaded in the entire UTF-8 data, deserialized the whole thing as a PersonSerializable array, and then used LINQ to Objects to find the object
of interest. And this third benchmark here does precisely the same thing: we are deserializing all of the JSON into a PersonSerializable array, and then we're using LINQ to Objects to find the item we wanted. The only difference is that we're now using System.Text.Json's JsonSerializer to do our deserialization. Notice that we can pass the UTF-8 data in directly — there is no need to do any sort of conversion before we do this. Otherwise it's literally the same code; the part that processes the results is identical to the previous version. So let's see how that compares for performance purposes.
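As a sketch, the System.Text.Json equivalent of the first benchmark is the same code minus the string conversion (type and value names assumed as before; this is not the speaker's exact code):

```csharp
using System;
using System.Linq;
using System.Text.Json;

public class PersonName { public string GivenName { get; set; } public string FamilyName { get; set; } public string[] OtherNames { get; set; } }
public class PersonSerializable { public PersonName Name { get; set; } public string DateOfBirth { get; set; } }

public static class SystemTextJsonWholeArray
{
    public static string Find(byte[] jsonUtf8)
    {
        // No StreamReader, no UTF-16 conversion: the serializer reads the
        // UTF-8 bytes directly.
        PersonSerializable[] data = JsonSerializer.Deserialize<PersonSerializable[]>(jsonUtf8);
        return data.First(p => p.Name.GivenName == "Arthur5000").DateOfBirth;
    }
}
```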
We're now looking at this third benchmark. Again, that's usually a lower number when I run this, but it's in the same ballpark, and the main thing is that it's a fair bit faster: normally I'd expect this to be about 23 milliseconds, whereas the first one is normally about 34, every other time I've run it today. For some reason these two have gone a bit slower today — don't know why, probably because I'm sharing my screen — but look at the memory usage: the
Newtonsoft.Json one took over 22 and a half megabytes, whereas the System.Text.Json one took 4.86 megabytes — a lot less. A big chunk of that is down to the fact that it hasn't had to convert the data into a UTF-16 representation; it's been able to use the UTF-8 data exactly as it is, and hasn't had to convert to .NET strings before it even begins to parse the JSON. Now, it is still having to do some of that conversion, because we've asked it to deserialize this into
objects, and some of those objects have string properties, so the values of all of those properties are going to have to be converted to UTF-16, because they're .NET strings. However, most of the JSON data isn't actually string values — most of it is structural, or it's property names, or stuff like that — so the amount of data it's had to do that for is much smaller, and the memory footprint is correspondingly a lot lower. And this is reflected in the garbage collection cost: whereas the Newtonsoft one caused over 2,800 garbage collections in gen 0 alone,
our System.Text.Json version of the same thing caused just 687 and a half on average. The gen 1 count is similarly lower — 1,272 versus 625, so about half — and the gen 2s are also lower. Now, it's not ideal: you don't really want any gen 2s if you can help it, and you don't want that many gen 1s, but it's certainly better, and that accounts for a significant chunk of the performance improvement. What you will have noticed, though, is that this first attempt with System.Text.Json is actually quite a lot slower than our
second attempt with Newtonsoft.Json. What this illustrates is that upgrading to a faster library probably isn't going to save you from bad performance decisions, because the fundamental problem with the first benchmark also exists in this one: we're asking it to deserialize all 10,000 objects before we even look at them. That causes two problems. First, we end up deserializing 5,000 objects we never even look at, because the object we're looking for turns out to be about halfway through the data — so that's a waste of
effort. But also, the fact that we're asking these two things to return us one massive array with everything in it, all at once, is responsible for this garbage collection pattern — the fact that we're seeing lots of gen 2 collections. So simply upgrading to System.Text.Json isn't going to save you from yourself: if you were making certain kinds of mistakes before, and you carry on making those same kinds of mistakes today, you'll have the same kinds of problems. I'm calling it a mistake, but maybe it doesn't matter — if you're not in performance-critical
code then, you know, who cares, it's fine — but this is a talk about high-performance JSON processing, so presumably you do care about the perf, and this is not a good look. It is definitely significantly faster than the direct equivalent in Newtonsoft, though, and definitely a lot lower memory consumption than either of the Json.NET ones — even the efficient Json.NET one still took nearly twice as much memory as our inefficient System.Text.Json one. And that's interesting, actually: the fact that we're allocating less memory could mean that the difference between these two, in this simple test,
would actually look a lot closer in the context of a real application, just because the amount of allocation you do has a system-wide impact. But let's look at this other one: per-element System.Text.Json Deserialize. That's obviously better — although you'll notice it's still worse than the best Newtonsoft one; I'll explain why. Let's look at the code. This is the second of the System.Text.Json ones, and it's kind of similar to the second Newtonsoft one: in the Newtonsoft one we basically blast through the array one object at a time,
deserialize each one, and stop as soon as we find the one we're interested in, and that is kind of what's happening in this new one. We're looping through all the elements in the document, deserializing them one at a time and stopping when we find the one we're interested in. It's not quite the same, though: we're not using the streaming API here. We're going one element at a time through this JsonElement API, which comes from JsonDocument, and you'll
notice that we begin by saying JsonDocument.Parse over all of the UTF-8. So we are, logically speaking, saying: please read the whole document in. Now, it doesn't deserialize the whole thing, but when you load data into one of these things it does actually do a certain amount of processing — it essentially builds up a little index in memory to keep track of where everything is in the document, so that all subsequent operations, everything you do with the JSON in that document, will go faster as a result. But it does mean that in situations like
this, where we don't even look at half the JSON, it's actually doing a bit more work than it really needs to. That's part of the reason this one is not as fast as our best Newtonsoft.Json effort — but don't worry, we will get much faster than that. So what is this doing? We're saying: I'd like to read the whole thing in as a JsonDocument; I'd then like to retrieve the root element; and I believe that's an array, so I'd like to iterate over all the elements in that JSON
array. (This will throw an exception if I turn out to be wrong about that, by the way — if I'm mistaken and it's actually just a string or an object at the root, that just won't work — but it is an array in this case, so that's fine.) And then, for each element in the array, we're back to deserializing: we deserialize one at a time and bail as before. So that's the code. Now let's go back and look at the benchmark results again.
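A sketch of that per-element approach — the JsonElement.Deserialize&lt;T&gt; extension used here is the .NET 6+ API, and the type and value names are assumptions as before:

```csharp
using System;
using System.Text.Json;

public class PersonName { public string GivenName { get; set; } public string FamilyName { get; set; } public string[] OtherNames { get; set; } }
public class PersonSerializable { public PersonName Name { get; set; } public string DateOfBirth { get; set; } }

public static class PerElementDeserialize
{
    public static string Find(byte[] jsonUtf8)
    {
        // Parse indexes the whole document up front (the cost discussed above).
        using JsonDocument doc = JsonDocument.Parse(jsonUtf8);

        // EnumerateArray throws if the root turns out not to be an array.
        foreach (JsonElement element in doc.RootElement.EnumerateArray())
        {
            // Deserialize one element at a time and bail on the first match,
            // so each deserialized object becomes garbage almost immediately.
            var person = element.Deserialize<PersonSerializable>();
            if (person.Name.GivenName == "Arthur5000")
                return person.DateOfBirth;
        }
        return null;
    }
}
```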
I believe the reason it's actually a tiny bit slower than the Newtonsoft version is that initial document parse, where it does look at the whole document, so it's at a slight disadvantage. However, it's using a lot less memory — only 4.6 megabytes, as opposed to the 9.3 being used by our best Newtonsoft effort — and the garbage collection counts are much better. Not only is it the lowest gen 0 count we've seen so far, only 546 and a bit per test iteration — about half what the best Newtonsoft effort was able to achieve — but there are also no gen
1 or gen 2 collections, and that matters more. So you might find that even though these two benchmarks look fairly closely matched, the fact that this one makes the garbage collector work much less hard might mean that on average your application performs better, because the garbage collector is not having to step in and freeze threads as often, and it's not having to move data around as often as it otherwise would. The collateral damage that the other one does by allocating twice as much memory might actually have a
larger impact on your overall app performance than is visible through these kinds of micro-benchmark measurements. Now, you're obviously all looking at these two down here — why am I talking about these slow ones up here when there are clearly some better ones available down here? OK, let's talk about this next one (I'll get to the last one in a second, because that's not as simple as it looks). So this next one — that's a clear winner. Again, these numbers are different from what I've seen every time I've run this test recently; normally I've
seen it come out at about three and a half milliseconds, so it's normally about 10 times faster than the first one. Here it's not far off four milliseconds versus fifty, so an order of magnitude faster even with these slightly wonky test results I'm getting today, and three-ish times faster than the per-element version. But wow, look at the memory: 222 bytes. Just to remind you what this test is doing: it's looking through ten thousand JSON objects in an array, and it's allocated 222 bytes of memory to do that. That's not per object —
that's 222 bytes to process the entire JSON document, so less than one byte per object on average. That's clever, and we'll talk about how it does that in a moment, but let's look at the code first. So it's this next one: System.Text.Json JsonElement. That's not a typo — this benchmark is System.Text.Json, and we're looking at direct use of JsonElement. Now, it starts off the same way as the preceding test: we load the whole thing into a JsonDocument and we get the root element like we did last time, and we're still using EnumerateArray as before, but now I'm actually using LINQ to Objects again, just using the First operator to find the first thing that matches my criteria. The big difference here is I'm not deserializing individual objects. I still want to do the same thing, I want to look at the name.givenName property and see if it's equal to that value, but I'm not going to do that through deserialization this time, I'm going to use the JsonElement type's methods. It offers a TryGetProperty method which says: is this an object, and does it have a property with this name? If it does, please return true and put that property in a new JsonElement here. And then, if that worked, we're going to look at that new JsonElement and say, well, is your value an object, and does it have a givenName? Sorry, I misspoke: these are not pulling out the key-value pairs, these are pulling out the values; there's a separate way to get the property name without the value. So yes, this first one gets the value of the name property if it's there, and the second one then drills into that to get the nested givenName. So we're doing name.givenName, but we're doing it without using deserialization. We're just saying, I want to go looking for these properties by name, and if that succeeds then we say, okay, I'm expecting this to be a string: is it equal to this value here?
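The JsonElement lookup just described can be sketched like this. This is a minimal stand-in, not the talk's actual benchmark code; the document shape (a name object containing givenName) follows the talk's running example, and the search value is made up:

```csharp
using System;
using System.Text.Json;

// Tiny stand-in document in the shape the talk uses.
string json = """
[
  { "name": { "givenName": "Zaphod" } },
  { "name": { "givenName": "Arthur" } }
]
""";

using JsonDocument doc = JsonDocument.Parse(json);
foreach (JsonElement item in doc.RootElement.EnumerateArray())
{
    // Each TryGetProperty hands back another JsonElement (a struct)
    // that just points at a slice of the original UTF-8 data.
    if (item.TryGetProperty("name", out JsonElement name) &&
        name.TryGetProperty("givenName", out JsonElement given) &&
        given.ValueEquals("Arthur"))   // compares without allocating a new string
    {
        Console.WriteLine("found a match");
        break;
    }
}
```

The ValueEquals call at the end is the allocation-avoiding comparison discussed next.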
Now, this one up here is a little more subtle. Up here I'm using deserialization, so I'm saying please just take the JSON and turn it into a .NET object, and then I'm retrieving the GivenName property and using ordinary string comparison to say, is this the one that I'm looking for? For this to work, that GivenName property has to contain a string, it has to have an actual .NET string in it, so I'm forcing this deserialize method to create a .NET string object for every single givenName property of every JSON object in there. It's going to have to allocate 10,000 strings in all for this code to work. But here, I never actually asked for a string. I asked for the JsonElement that represents that property value, but I'm not asking it to turn that into a string; I'm saying, can you just tell me whether it is equal to this other string over here? Why does that matter? It matters because this does not need to allocate anything on the .NET garbage-collected heap. That's a string constant, it's going to be interned, so there's going to be exactly one allocation of that string across the entire lifetime of the process; that's essentially free once you're up and running. And this thing doesn't convert the underlying JSON data to a string. Instead, character by character, it looks at the JSON data and asks, is the stuff in the JSON the same characters as this string I've got here? It never allocates a .NET string, it just compares an existing string I passed in with the raw UTF-8 data. Now, it is having to do some conversion there, it is actually having to convert from the .NET string format into UTF-8. We could avoid that, by the way: you can stick a u8 suffix on the end of a string literal in recent versions of C#, and that will end up passing this as UTF-8 byte data, which will go a little bit faster. But for the purposes of this discussion, that's not the most important thing. The most important thing is that this code doesn't allocate anything except for the iterator required by LINQ to Objects to do this First operation. Each of these JsonElements
is a struct, so they don't need to live on the heap, and the way I've called it here they're all going to end up being hidden local variables, essentially living on the stack frame. So there are no heap allocations done in this code other than the one for LINQ to Objects, which is why our allocation is just a couple of hundred bytes for the whole thing; that's basically LINQ consuming that. It's not nothing, but it's very little compared to the others, 222 bytes as opposed to several megabytes. It's a big step forward, and that's a large part of why this goes so fast. You can see it allocated so little that the garbage collector never felt the need to step in and do anything. Phenomenally efficient. And we were parsing ten thousand — well, five thousand records really, because we skipped the second half, we're not fully parsing all of them — so we're looking at 5,000 records in a millisecond. That's less than one microsecond per JSON object. Now, I find that phenomenal. I find it mind-boggling that it's possible to do meaningful processing of a JSON object in under a microsecond. When I first started using computers in the 1980s, you couldn't execute a single machine language instruction in a microsecond; it took several clock cycles, on CPUs with clock speeds measured in megahertz, to do anything at all. So the idea that you can interrogate the structure of a JSON object and compare one of its values with something in less time than it takes to execute one instruction on a 1980s computer blows my mind. That's Moore's law for you. Admittedly, well, technically not Moore's law, it's the other laws of computers getting faster and faster, which is not the same thing, but I digress. Anyway, it's very, very fast, certainly a lot faster than our first attempt — an order of magnitude faster — and the memory efficiency is pretty good. However, obviously all of you are going: why isn't he talking about this one? This one's obviously even better. So maybe this is the one you want, because that is way faster. 50 bytes, look at that, that's almost nothing at all, 50 bytes to process 10
,000 rows of JSON. This has got to be the one, right? Let's take a look at it. How much do you want this method? Here it is. A bit more of it. Still want it? How about now? Still a fan? Nearly there. And the rest. Okay, that was kind of complicated. This is using the streaming API in System.Text.Json, so we are using a Utf8JsonReader, which is this thing here. It's really, really fast, and it's what underpins everything else, by the way: anything you do with System.Text.Json to read JSON ultimately relies on this thing at some level; they're all just wrapping it to various extents. So this is the basis of everything. A bit like the earlier example, we just repeatedly call Read, saying please move to the next item, and we check that it starts with an array. But this has got a lot more complicated, because we're not falling back to deserialization once we get to the individual objects, we're doing the whole thing streaming-wise. And the big problem with that is you have no idea what order the data will show up in. It's going to be valid JSON, but the properties won't necessarily be in any particular order, because JSON doesn't require ordering. Basically, you're not allowed to say, you've got to put your properties in a certain order or I won't talk to you; that's not really permissible in the JSON world. So we have to be ready for things to happen in any order, and it gets a bit weird and complicated because we've got nested objects. We have to remember: were we in the middle of the nested name type, or are we still in the outer object, or are we actually up in the array waiting for the first object to appear? So we have to keep track of how deep we are in the JSON right now, and what we've already seen so far for this iteration — did we already work out that this is not the one we're looking for, but we haven't got far enough through the JSON to be able to skip to the next object yet, because we're still embedded several layers deep? It all gets really complicated. You basically end up having to build a little miniature state machine just to keep track of everything that's going on. And the huge problem with this is it's not at all obvious what the code is doing. There's no obvious relationship between the structure of this code and the actual goal. So somewhere buried in here... is it here? No, it's not here. Somewhere... I can't even find it now. There it is. That's the actual line that matters. Sorry, no, it's not even that, I've got it wrong twice now. There it is, sorry, after 5000. That's what we're looking for. So this is the actual logic, and everything else is just complexity mandated by the fact that this is a streaming API and we're in no control over the order in which we see things. So for this reason, people often decide that the performance boost you get from this just isn't worth it. It might be — that's a business decision. It depends on how much performance is worth to you. Is it worth the extra development overhead it takes to create that in the first place, and
the extra maintenance overhead involved in having an almost completely unreadable bit of code that's going to be a nightmare to maintain? Are those costs justified by the performance improvements? Now, if you are processing trillions of records of JSON, those performance improvements could well translate directly into lower cloud computing costs, at which point the answer may be yes, this pays for itself a hundred times over, of course we're going to do it this way. If that's not what you're doing, it might be that the much easier development path means this other one is the preferred approach, because this JsonElement technique gets you most of the way there. It's more complicated than the serialization approach up here, but not much more complicated. So maybe this is the way. Maybe... foreshadowing of what's coming later in the talk... maybe there's a better alternative. However, for the time being, that's enough of this demo. I'm going to go back to my slides and talk in a bit more detail about how it is that JsonElement is able to achieve this remarkable performance. So just remember what we've seen:
it was able to process 10,000 rows of JSON apparently without having to allocate any data on a per-object basis; it was 222 bytes up front and that was it, no more allocations. How can that work? Well, there's a basic principle in operation here, which is that JsonElement never makes copies of data. We don't copy data, we just point to data that was already there. So the basic observation that JsonElement makes is this: you've already got all the data that constitutes your JSON document right there, in that big blob of UTF-8 data. It's all there; we don't need to store it somewhere else, because it's all there in the source document. We don't need to copy the property names out into separate strings, we don't need to copy the property values out into other objects — they're all right there. So when you ask the JsonDocument for the root element, what does it give you? It gives you a JsonElement that says, I represent this range of text in the JSON. All of it, in this case, because it's the root element: I represent everything from the opening brace to the closing brace, or in the example of my benchmarks, everything from the opening square bracket to the closing square bracket. So this is saying, I'm all this data here, and it knows that it's a JSON object. If you ask this to give you the name property, it will return you another JsonElement that represents just the section of the JSON that is the property. So in this case I've called GetProperty, and this does return the name-value pair, so it gives me both pieces: quote name quote, that's the property name, and everything after the colon, that's the property value. And then if we ask this thing for its value, we get another JsonElement which is just the bit after the colon. This is an object; if we ask it for the givenName property, it's going to give us just that slice, and if we ask that for its value, it's going to give us just that piece there. Every single one of these things is pointing to the same lump of data, they're all pointing to the same UTF-8 array; they just point to different subsets of it. Each one is just saying, I'm this piece, I'm characters 47 to 95 or whatever it might be. They're all referring to the one and only copy of the underlying data, and they're all struct types, they're all value types, which means it's possible to use them in a way that they never have to live on the heap. It's not true that structs always live on the stack — that's a popular misconception — but it is possible to use them in a way that doesn't cause any heap allocations. So that is what enables these
to go so fast: they don't ever make copies of the data. Now let's talk about the other performance-oriented feature that System.Text.Json offers that you don't get in Newtonsoft.Json. This is to do with reflection. By default, the mechanisms in both Newtonsoft.Json and System.Text.Json do the same thing: they both use reflection to discover what properties your .NET types have. So when I said, please deserialize this JSON array into an array of PersonSerializable objects, it would have gone: okay, what does a PersonSerializable look like? Oh, it's an object with two properties, Name and DateOfBirth. What does Name look like? Oh, that's another .NET type, an object called PersonNameSerializable, and it's got three more properties, and so on. It uses reflection to discover the structure of my .NET object, and that's what drives the behavior of the serializer. It's very convenient: we can just write a .NET type whose structure reflects the structure we expect in our JSON, and it just works. That's great. However, there are some problems with this. Reflection has a bad reputation for speed, which is partially deserved.
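That discovery step is easy to picture. Here's a minimal sketch of what a reflection-based serializer does on first use; the record types are hypothetical stand-ins mirroring the talk's PersonSerializable example:

```csharp
using System;
using System.Reflection;

// Walk the type's public properties — roughly how a reflection-based
// serializer learns a type's shape the first time it sees it.
foreach (PropertyInfo p in typeof(PersonSerializable).GetProperties())
{
    Console.WriteLine($"{p.Name}: {p.PropertyType.Name}");
}

// Hypothetical types mirroring the talk's example.
public record PersonNameSerializable(string FamilyName, string GivenName, string OtherNames);
public record PersonSerializable(PersonNameSerializable Name, DateTime DateOfBirth);
```

It then repeats this recursively for each property type it finds, such as PersonNameSerializable here.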
It's actually not as slow as a lot of people think. I mean, it's definitely slow — it's definitely a lot slower to read and write through reflection than it is to use normal property access, because normal property accesses can run in nanoseconds, whereas reflection probably takes tens, maybe hundreds of nanoseconds, maybe even as much as a microsecond. But it's not that slow; it's not like it's going to take the blink of an eye to do it, nothing as slow as that. It's definitely slower, but actually that's not the real problem with reflection. It can be, but often it's not. Often the real killer with reflection is how much it costs the first time you use it. That doesn't always matter, because clever reflection-based things can use reflection to work out what's going on and may actually do runtime code generation to avoid the recurring overheads — they're just using reflection to work out what to do, then using codegen to actually do it. That can amortize the costs, but the startup costs for it can be horrible, because it's having to discover everything about the structure of your data types at runtime, then generate code, then get that compiled, and this all takes quite a long time. If you're processing a trillion records, you probably don't care about that, because the first-time cost is dwarfed by the overall cost of the total job. If, however, you're writing a desktop application or a phone application, the first-time costs really matter, because that's when you make your first impression with users. If the first time the user clicks on a button it takes two seconds before it responds, that's the thing they're going to remember. They're not going to remember the next ten times on that button that it was nice and fast, they're going to remember that the first time it was horribly slow, particularly if that's true every single morning when they launch the application. And it's not just desktop or phone applications; this can also be a problem in the cloud, because a lot of people run stuff on cloud platforms in a way where you don't pay when you're not using things. Things like AWS Lambda or Azure Functions have billing models where they say, if you're not handling requests, we're not going to charge you any money. Which is great, but obviously they're not going to have hardware on standby for your benefit if you're not paying. So the downside is, if you have one of these models and your app is idle for four hours, it's not going to be running, and when the next request comes in, they're going to have to quickly find a VM to host you on, or a container or whatever, fire it up, start the .NET runtime, load your application, hit your application's entry point, get your app to the point where it's able to respond to a request, invoke the request handler, and get a response. All of that has to happen, and it can be quite slow, because the first time things happen they're often much, much slower than every subsequent time, and reflection is particularly bad for that; reflection has horrible first-use characteristics. So if that's the kind of deployment model you've got, then not using reflection has its benefits. First-use performance — cold start performance — is a big deal. There's also a memory overhead, typically a few megabytes, not huge but not nothing. Also, if you're using ahead-of-time compilation, if you're doing native AOT, you're likely going to be trimming your binaries to include only the code that you're actually using, and anything that uses reflection makes that a lot harder to get right. So avoiding reflection entirely has some benefits for those deployment models as well. And there's also a separate related feature that can generate high-performance output serialization, which is sort of orthogonal to all of this, but
anyway, let's take a look at how this works. The way they're able to avoid using reflection is through C# source generators. This is a feature of the .NET build chain that enables libraries to inject code at compile time. Those libraries could be NuGet packages, or they might be libraries built into the .NET SDK — usually both. They're things that get involved in the build process and are able to generate code at build time that gets compiled into your application and can then execute at runtime. So if you're using source generators, the build process looks like this: the C# compiler starts, loads all the files it's been given — all the files your project says to load — and tries to process them all, looks for syntax errors, then starts to do semantic analysis to work out what you mean by string (oh, it's System.String), basically resolving all the symbols. But once it's looked at everything, it doesn't necessarily worry if it doesn't have a complete picture, because it says: okay, I've done as much as I can, but are there any generators registered in the build, are there any source generators that would like to have a go? And there are actually a couple built into the .NET SDK, one of which is for JSON serialization purposes. What these things do is get access to all of the analysis the C# compiler has done so far — everything the compiler has done to try to understand the source code is available to them — and they can look at the code and go, I might add some more, I might add some extra source files into the mix. And whatever comes out of your
source generator, the compiler will then incorporate into the build process: oh, there are some more files, great, let's keep going. And only once it's looked at all of these does it consider compilation to be complete. So this gives source generators the ability to fill in gaps that were there after stage one. Let's look at how this works with JSON. If you want to use the System.Text.Json features for avoiding reflection, but you still want to use serialization, here's what you do: you write a type that derives from JsonSerializerContext, and then for each type that you wish to serialize or deserialize, you apply an attribute to that class of type JsonSerializable. This can generate two kinds of code: it can generate property descriptors, which basically contain all of the information it would otherwise have got from reflection, and it can also optionally generate a high-performance serializer. So let's take a look at this — rather than describing it, it's easier to see it in practice. I have up here a class called TestSerializationContext. I'm just going to quickly uncomment something in the code here, because I don't have it compiled in by default. Let me go back here now. So, as I just said, you need to derive from JsonSerializerContext, which is defined by the System.Text.Json libraries, and then for each type you want to use in serialization without the reflection overheads, you add one of these attributes. I've said I need to be able to deserialize and/or serialize PersonSerializable, I also want that to work for arrays of PersonSerializable, and I've also had to include this PersonNameSerializable, which, remember, is my nested object
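In code, the setup just described looks roughly like this. The type and property names are sketched to follow the talk's example; the source generator built into the .NET SDK fills in the partial class at build time:

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

var people = new[]
{
    new PersonSerializable(new PersonNameSerializable("Dent", "Arthur"),
                           new DateTime(1978, 3, 8)),
};

// The generated Default instance exposes a JsonTypeInfo per annotated type,
// so serialization here needs no reflection at runtime.
string json = JsonSerializer.Serialize(
    people, TestSerializationContext.Default.PersonSerializableArray);
Console.WriteLine(json);

public record PersonNameSerializable(string FamilyName, string GivenName);
public record PersonSerializable(PersonNameSerializable Name, DateTime DateOfBirth);

// One [JsonSerializable] per type we want handled, including the array
// type and the nested name type, as described in the talk.
[JsonSerializable(typeof(PersonSerializable))]
[JsonSerializable(typeof(PersonSerializable[]))]
[JsonSerializable(typeof(PersonNameSerializable))]
internal partial class TestSerializationContext : JsonSerializerContext
{
}
```

Note that the class must be `partial` so the generator can add the members in a separate source file, which is exactly what the demo shows next.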
so my Name property is actually one of these PersonNameSerializable, so I've had to include it in the list as well. So what has this done? If I come over to my benchmark and look for a particular piece of code... right, here we are. So here's a benchmark that wasn't running earlier because it was #if'ed out, and it is using JsonSerializer. This one's actually serializing things — it's writing stuff out rather than reading it in — so it isn't directly comparable with the earlier benchmarks, which is one reason I wasn't showing it earlier. It's saying, I would like to write data into this memory stream, and the data I'd like to serialize is this, the people, the same 10,000 elements that we were also reading in earlier. My benchmark base class just sets up both the UTF-8 data and also the equivalent .NET representation, so this is going to go from the .NET representation out to UTF-8 data. But notice the Context.PersonSerializableArray argument. What's that? Well, Context is my TestSerializationContext class, and we're using its PersonSerializableArray property. But I don't see a PersonSerializableArray property in there — it looks pretty empty to me. However, I've annotated it with partial. That tells the compiler this might not be everything; there might be other source files around that contribute more members to this class, and that's what enables the source generator to add members to it. So for every JsonSerializable attribute I apply, it's going to add members to this class in a different source file. So if I go back to the benchmark that uses this and say, show me this PersonSerializableArray property — I'm going to hit F12 to go to the definition... this thing here... this tab, which you may or may not be able to read, says this file is auto-generated by the generator System.Text.Json.SourceGeneration.JsonSourceGenerator and cannot be edited. So it's telling me this is generated code, actually generated as part of the build process, and it's extending my TestSerializationContext class, adding in extra members including this PersonSerializableArray thing that my benchmark was using. So I'm not going to go into huge amounts of detail on
this, but if we dig into it a little bit, we'll find... where is it... oh, I'm probably in the wrong file here, hold on a second, there's more than one generated file. Sorry — I actually want PersonSerializable, not PersonSerializableArray, because PersonSerializable is going to be more interesting. Let's go and look at that one. Right, so this one is for not the array type but the actual type itself, and this here is what I was interested in. This is basically everything you could have discovered from reflection: it's saying this is a property, it's public, it's not virtual, it's declared by this type here, it doesn't have a type converter, here are its getter and setter, and so on. This is all the stuff that the PropertyInfo type in reflection would have told you, but this is a dead simple object with very little in it. It's going to take almost no time at all for this thing to be constructed, compared to the milliseconds it will take for the reflection engine to get itself up and running after you start the process. So this gives all the information that it would otherwise have got from reflection, in a tiny fraction of the time, and that is the key benefit this offers: it enables us to get all that information really, really quickly. Now, it's actually quite hard to benchmark this, because it mainly affects first-time usage, and once reflection is up and running it's actually relatively fast, so I'm not going to show you a benchmark for it. It's kind of better to demonstrate by deploying stuff up into the cloud and timing
your cold-start request times, but that takes about four hours to measure properly, so I can't redo it right now. It does make a difference, though; it tends to make multiple milliseconds of difference. Exactly how much depends on how complicated your object models are, but it's definitely a significant improvement for cold start time. Okay, so the other thing... can I find it in here... oh yes, this one here. The other feature I mentioned is the ability to write data out with a very fast generated serializer. The other thing you get with this code generator is a method for each serializable type that uses Utf8JsonWriter — the write-mode streaming API — to write the data out, and this basically generates the simplest, fastest possible code that writes the data out. So this is also going to be way, way faster than trying to use reflection. Streaming is much easier when you're writing than when you're reading, because you get to control the order. So that's the other thing this will do. Okay, I'm going to quickly undo the change I just made there, because I don't want to break the build, and then I'm going to go back to the slides. Okay, so the basic heart of this is that all that codegen creates, and makes available, instances of a thing called JsonTypeInfo<T>, where T is your chosen serializable type: PersonNameSerializable, PersonSerializable, whatever it might be. And this is almost empty from an API perspective — it only has one public member, which points to that high-speed serializer; it's a delegate that points to the serializer. Everything else is internal, but it gives the System.Text.Json serializer all of the information it needs so that it doesn't have to go and talk to reflection at runtime. Okay, so that's pretty much it for what's built into System.Text.Json, but I want to show you another step we can take. We can do better, because what I've shown you so far is: either you deserialize things in the most straightforward way possible, but you often end up with a lot of overheads, because you end up creating more objects than you might actually need
and you might operate in a way that's quite inefficient for the garbage collector; or you can go a little bit out of your way to get slightly improved performance; or a long way out of your way to get very fast performance. But what if there were a way to offer a programming model that looked very similar to serialization, but without the problems? Just to remind you what the problems are: no matter which library you're using, there's a fundamental issue with trying to represent JSON data as .NET objects. In fact, there are two fundamental problems. One is that the .NET type system isn't really a very good match for JSON. Look at what people actually do with JSON, particularly if they're TypeScript programmers: they'll often say, okay, this thing might be one of these or it might be one of those, it can be two different things, and we'll decide at runtime which it is. That's not really something the .NET type system is particularly good at dealing with — it doesn't do sum types yet — so that can be an issue. But even if you constrain yourself to JSON that's easy to represent in .NET, we've still got this big problem of allocations. If you've got a hierarchy of data like the example I've been using throughout, you've got a bunch of objects to allocate just to represent one record, and strings are a problem in particular, because the moment you force System.Text.Json to give you a .NET string, you're throwing away the advantage you would have got from its ability to work with UTF-8 data natively. Because you say, I want this as a .NET string, it's going to have to take that UTF-8 and turn it into UTF-16; it's going to have to make a copy. So this is fundamentally unavoidable with this model of serialization. Maybe we're thinking about serialization all wrong, though. Could we do something else? Do we have to turn our JSON data into ordinary-looking .NET types? Could we maybe generate .NET types that are harder to write — but it doesn't matter, because a code generator is going to write them for us — which do a bit more work and behave a bit more like the JsonElement API? Maybe rather than making copies of data
and turning it into .NET objects, what if they were just a facade around the underlying UTF-8 data? So what we could do is basically have a thing that looks like JsonElement, only strongly typed: it knows about the structure of the data we actually expect. What would that look like? Well, to whet your appetite, here's an idea of how it might perform. These are the same benchmarks I've been running throughout — the benchmarks get slower and slower as I run more of them, which is why I'm not running this one live. You'll notice the numbers are now more like I expect, 35 milliseconds for the first JSON one. But look down at the bottom: the last three are benchmarks that use the new technique I'm talking about, and notice these are all around three to four milliseconds to process the records. And if you look at the memory allocation side, they all have this ultra-low allocation characteristic, and yet they have a programming model that's very similar to serialization. That sounds intriguing, so let's actually see what it would look like. I'm going to go into my code, comment out that thing, and take a look. So here is how it could look. We start off with JsonDocument, as we did with all the other JsonElement-based examples: you've first got to load stuff into a JsonDocument, and that then lets you use this very efficient read-only JsonElement-based API over the top. So we can get the root element, which comes back as a JsonElement, one of these very efficient
things. But then I've wrapped it in this type here. So I've got a static class — sorry, a namespace — generated from my JSON schema, and its PersonArray type's FromJson method gives me this thing here, which is also a struct. It's kind of like a JsonElement, but it knows more about what's going to be in there. So this offers an EnumerateArray just like JsonElement does, but if I mouse over the lambda argument here, notice that the argument of the lambda is not just any old JsonElement — it's actually passing me a Person. So this array enumerator knows this is not any old array, this is a PersonArray, and each of the elements in it is a Person. And these are still all structs, still basically wrappers around JsonElement, so these are all very low allocation. So I'm able to enumerate the entire array and use LINQ to Objects over it to find the thing I want. Now, here's the really interesting thing about this: it uses the same basic structure as the very first benchmark I showed you, the not-very-clever one up here that reads the entire JSON in as one big array and then uses LINQ to Objects over it. If I come back to this latest example, it's basically doing the same thing: load the JSON, get the whole thing as one big PersonArray, and then use LINQ to Objects over that to get the results. But look at the results in the benchmark — and I promise you these are real; if I had time to run them, you'd see the same results. That's the third one up from the bottom, the find-whole-array LINQ benchmark over the schema-generated types: it's almost ten times faster than the first one, and it doesn't have those garbage collection problems the other ones did. And we can do the kind of per-element stuff, but actually there's much less benefit to doing that now. It turns out that with these wrappers we can do the stupid thing and magically it goes faster. How on earth is that working? Well, let's talk about that. The way this works is that these generated types — and I'll show you the
website that offers the tool to build these in a moment these generated types look on the face of it Like normal dotted objects let me just quickly go back and show you the Benchmark again because I kind of went over that quite quickly these benchmarks are able to do uh thing.name.given name like like it was an ordinary deserialized object they're able to do this but we don't see those memory allocation those bad memory allocation characteristics this whole thing REM with very low total memory allocation if I had to back up 350 bytes most of which
was Link's fault so this is working because these things are just facades over the data and exactly the same with adjacent elements is they're not really deserializing anything they're just sort of things saying I'm these range of tech characters from here to here I'm this range of characters from here to here it's exactly the same sort of picture you saw earlier but we've just stuck an extra layer On top of the Json elements these things are just wrappers for Json elements that annotate them with properties that give you a way of accessing those underlying things
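The pattern being described can be sketched roughly as follows — the `PersonArray`, `Person`, and `FromJson` names here are illustrative stand-ins for whatever the schema-driven generator actually produces, not the exact generated API:

```csharp
using System.Linq;
using System.Text.Json;

string json = """[{"name":{"givenName":"Ada","familyName":"Lovelace"}}]""";

// Parse once into a JsonDocument; everything below is a view over its data.
using JsonDocument doc = JsonDocument.Parse(json);

// Wrap the root JsonElement in the generated struct. Nothing is copied or
// deserialized here — PersonArray is just a typed facade over the element.
PersonArray people = PersonArray.FromJson(doc.RootElement);

// EnumerateArray yields Person structs (thin wrappers over array elements),
// so ordinary LINQ to Objects works without per-item heap allocations.
Person match = people.EnumerateArray()
    .First(p => p.Name.GivenName == "Ada");
```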
And I'll quickly — I'm getting close to the end, but I'll quickly show you one example of that. If I go looking at this Name property in the generated code — it's a little bit of a mess, but this thing says: OK, you can actually run in one of two modes. These are actually writable things: you can either build a Person by parsing some JSON, or you can build one up from scratch. So the first thing it does is ask: am I made out of JSON, or am I made out of values? And it has two completely different code paths. If it says "I'm made out of JSON" — I'm actually just a wrapper for a JsonElement — it first of all checks: is this the right kind of storage, is this actually an object under here, or have you got the wrong sort of thing entirely? But if it is the right sort of thing, then it calls the same method our earlier benchmark did: TryGetProperty. It's calling JsonElement.TryGetProperty with the name it's expecting — it's represented as a span, but it's basically the property name — and if that works, it gets a JsonElement out and wraps it in another generated type, PersonName. So these things are all basically just wrappers around JsonElements. It's kind of complicated because it's generated code, but fundamentally it's just this picture here: a generated layer sitting on top of the thing you already saw earlier.

The generation, by the way, is driven by JSON Schema — that's how we know what to generate, so this only works for schematized data. But it also supports validation: if you have a JSON schema, not only can we generate the wrappers for it, it'll also generate the code to validate that your JSON data actually conforms to the schema, so you can be confident it looks exactly like it should.

So, in conclusion: System.Text.Json can do serialization just like Newtonsoft's Json.NET, but it's just faster out of the box — if you write like-for-like code it will be faster, mainly because it doesn't have to do any UTF-8 to UTF-16 conversion, but also because, generally, the code base has had a lot more attention paid to avoiding unnecessary allocations. If you're prepared to use the new JsonElement API directly — I say new; it's been here since .NET 5, so the newish JsonElement API — then that gives you very, very low-allocation parsing at extremely high performance. You can go faster still: Utf8JsonReader and Utf8JsonWriter give you ultra-high-performance, very low-level access to the structure of the JSON, and that's the way to get the fastest performance — it can potentially go twice as fast again, but it's really hard work. There's built-in code generation for avoiding the use of reflection during serialization, and it's also possible to use a tactic of code generation to boost performance further without throwing away the benefits of deserialization-like behavior.

I'm going to quickly show you the website that's on. Corvus.JsonSchema is an open source library maintained by my employer, endjin. This is all free to use — you'll see the license when I go to the thing — Corvus.JsonSchema is under the Apache 2 license, which is pretty permissive, and the instructions here will tell you how to use the tool that generates all those types I showed you. So if you would like to use this, it's available up there on GitHub — please have fun. But in any case, everything else I showed is built into .NET itself. So that is how we do high-performance JSON processing in .NET. Thank you for listening.

Oh, there we go — that was fantastic, thank you for that. We're pressing all the wrong buttons backstage here. Yeah, I was speaking into a dead microphone for the last 45 minutes! Yeah — Ian, stop, no no no — we've all been listening here and it's been fantastic. It's quite astonishing how much of an impact all of these techniques can have, and it's nice to see computers be really fast. You hear so much about computers being slow again these days, so it's nice to be able to make them really fast — nice one. Yeah, love it. And there's actually — yeah — there's also — no no, go ahead, go ahead — an environmental impact to all of this, right? The amount of CPU time you consume doing stuff has a direct impact on carbon — every CPU cycle saved is a bit less impact on the environment. And that's a thing we spend quite a lot of effort on at endjin: we've actually developed other high-performance processing libraries in a similar vein. We produce a library called Ais.Net that parses the radio messages sent by ships to report GPS data. This was for a customer who was processing enormous volumes of that data, and we were able to get a server farm from about ten servers down to one by using more efficient methods — that's pure power saving right there; these techniques make a big difference. Yeah, that's pretty cool. And it's always nice with reflection where you kind of know this stuff up front, because your assemblies aren't going to change for the lifetime of the application, for the deployment — yet it's being generated every single time, so it makes sense to get that information up front.
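That up-front approach is what System.Text.Json's built-in source generation does: you declare a partial JsonSerializerContext, and the SDK's source generator emits the serialization metadata at compile time instead of discovering it via runtime reflection. A minimal sketch (the Person and AppJsonContext names are just examples):

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

public record Person(string GivenName, string FamilyName);

// The source generator fills in this partial class at build time with the
// JsonTypeInfo<T> metadata needed to handle the listed types.
[JsonSerializable(typeof(Person))]
internal partial class AppJsonContext : JsonSerializerContext
{
}

// Usage — passing the generated type info avoids reflection in both
// directions, serialize and deserialize:
//   string json = JsonSerializer.Serialize(
//       new Person("Ada", "Lovelace"), AppJsonContext.Default.Person);
//   Person? p = JsonSerializer.Deserialize(json, AppJsonContext.Default.Person);
```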
Khalid, I think you had a question. Yeah, it's kind of interesting — I think Ian maybe alluded to the story behind it, but when you were going through the UTF-8 and UTF-16 encoding examples, I was like, well, duh, yeah, that makes total sense. But then I thought to myself: what was the story behind that discovery for you? Like, yes, this makes a lot of sense, but when you're dealing with .NET APIs that stuff isn't so evident when you're using just C# method calls and stuff like that.

So you're asking how I made the connection that these sorts of differences exist? OK, yes. So, I mean, the short answer is: I'm really old. I'm turning 50 in a couple of weeks. As I mentioned, I first started using computers back at school in the 1980s, and back then you kind of had to know what all the bits and bytes were doing. You only had, you know, a 64-kilobyte address space, and you could know in your head what most of the bytes in memory were for — and if you didn't, you wouldn't be able to fit anything of value inside a machine that small. So I come from a background of trying to cram a quart into a pint pot, so to speak.

And then my first job was actually writing kernel-mode device drivers, which is another thing where you really have to think about what is actually happening at the hardware level, because there, if you want high performance, you need to think about things like: how many times is the data being copied? Can we get this incoming data to go exactly where it needs to be? Because that's actually quite challenging — you think of a server sitting there on the network, and suddenly some electrical signals start wiggling up and down on the network port. How does it know which of the, you know, 400 processes running on your machine is supposed to receive that data? And if it can't work that out fast enough, it's going to have to put the data somewhere else while it works out where it's going, and then copy it into the place it needed to be. Whereas if you get it right, you can devise things in such a way that you don't make additional copies. And that directly connects to what I've been talking about here, because the single biggest thing about performance for this sort of processing is not making additional copies of your data. That's the number one thing — if you learn nothing else: don't make copies.

And OK, I'm going to plug my book here, because it's also relevant to the talk. I write Programming C# for O'Reilly, and the final chapter in that book is dedicated to these sorts of programming techniques. We have a whole chapter on memory-efficiency measures in C#: we talk about Span<T> and, um, Memory<T>, the pipeline readers, and all that sort of stuff — and this is all part of that world, in essence. I show an example of high-performance JSON parsing in it, and "don't make copies" is the single most important message of that whole chapter: don't make unnecessary copies of your data, because that, above anything else, is what will kill you. From a device-driver developer's point of view, it's like: the bus that connects your memory to your CPU can only do so many cycles per second, and if you make it do more things than necessary, you're going to max out its capacity — it's as simple as that. But that's because I'm used to thinking at that level of hardware — I've had to debug problems by putting probes onto the motherboard and trying to measure things on oscilloscopes, you know — you have to think in those terms.

And somebody on the chat has just described you — well, me too, because I'm of the same age — as coming from the Clive Sinclair generation. Oh yes, absolutely, yes, when I was younger. No clue what that means! Oh — thank you. They were branded in the US as Timex; that was the rebranding of Sinclair. But yeah — it still does in the US. OK, boomers — that's all I can say.
Um, it was 45 servers down to one — I said about ten, but it was more: they had 45 servers and reduced it to one. That was very impressive, and it shows what a big impact this sort of thing can have — even better than the BenchmarkDotNet figures, really. Yeah, I mean, that was applying a load of techniques together, but yes, it's all part of a package.

An obvious question: are these code samples available on GitHub — are they available for the audience to get access to? All of the code-gen stuff I showed at the end is up there, and that was the link I put up — someone has reshared that link in the comments. The demos are currently not available; the reason for that was that I was doing this talk at a bunch of conferences and didn't want to make it available to people who hadn't paid and were going to go to a conference later. But yeah, I probably could do that now — they're in a repo which is not currently public, but I can set that public now, I think. Yeah, that'd be cool. And I guess people can follow me — OK, good. I think we've got a delay — Khalid, why don't you go ahead. Well, I was just going to say folks should just follow Ian Griffiths on his social media channels, go to endjin — maybe you'll end up writing a blog post there about this talk. Yeah, the endjin blog's the best place. I am idg10 in most places — idg10 is me on GitHub, it's me on Twitter — although I've been a lot less active lately on Twitter, because reasons. But — sorry, Matt. Yeah, no problem, I was just going to ask a different question.

So we've been asked — I'm not quite sure what they're getting at, but — can we add our own JsonPropertyInfo arrays into the partial class that's getting generated? So if you wanted to override, or have custom handling for, what the code generator produces — is that possible? Right, so — if I understand the question (sorry, someone said "Twitter? X" — I should have said X, but anyway, back to your question) — the thing about that code gen is that it's creating a type, JsonTypeInfo<T>, which is almost entirely opaque, so its innards are not accessible to you. It's a sort of in-only API — it's not really meant for public deconstruction, so you certainly can't get any of the information back out of it yourself. All of the generated code just pushes information into what looks like a black hole, and to the best of my knowledge you can't really change that. The only thing you could do would be to disable it entirely, copy what the code gen had done, and then write your own version of it. You could do that, because obviously the code gen can't do anything you couldn't write by hand, but there's no mechanism in the tooling, as far as I know, for getting it to do most of the work and then letting you change a bit of it — I don't think they support that. Yeah — and it all depends on what the original asker was intending to do with that additional property.

I think we've got one last question before we wrap up. Again with the code generators: you showed that it's kind of caching some reflection data to prevent that sort of first warm-up cost, but that was a serializer, wasn't it — does it do anything at all for deserializing, or is that not applicable? Oh no, it works in both directions, because serialization and deserialization both need the same information fundamentally — they need to know what properties are there — so they both use the same thing, if you're using the JsonTypeInfo<T> stuff. However, if we're talking about the streaming stuff — there was the code that basically does a very fast write out to the Utf8JsonWriter — there is no reader equivalent of that; it's asymmetrical. And the reason it's asymmetrical is — well, do you remember the code I showed you for the fastest one, that was, you know, eight pages of code? It's hard to write that stuff, but the more subtle aspect is that it's not at all clear what the output format should be. That particular thing knew exactly what it was looking for and returned a single thing once it found it, but that's a very specific goal. They could write a code gen that did that — but what if you wanted something else? What if you wanted the top three things? What if you wanted to group stuff? You'd need a different code gen every time, and no one has yet proposed — within Microsoft, or on the Microsoft repos — how that might look in a way that's plausible and sufficiently general for them to do it. I think they would like to do it — they've said they would — but they say they don't have a design they consider good enough yet, and that's why they haven't done it. Fair enough — that's good.

OK, well then, I think we're done. Thank you very much for your session — very interesting, lots of cool tips and useful things to know. I'm sure a lot of people will put that into practice and benefit a lot from it. So thank you very much. Thank you.