Hello, my name is Dan Boneh, and I'd like to welcome you to my course on cryptography that I'll be teaching at Stanford University this quarter. This quarter I'm experimenting with recording the lectures and having the students watch them online; in fact, anyone is welcome to watch the lectures and join the course. This is an experiment, so we'll see how it goes. My goals for this course are basically to teach you how cryptographic primitives work, but more importantly, I'd like to teach you how to use cryptographic primitives correctly and reason about the security of your constructions. We will see various abstractions of cryptographic primitives, and we'll do some security proofs. My goal is that by the end of the course you'll be able to reason about the security of cryptographic constructions and be able to break ones that are not secure.

Now, I'd like to say a few words on how I would like you to take the class. First of all, I'm a big believer in taking notes as you listen to the lectures, so I would really encourage you to summarize the material being presented in your own words. Also, I should mention that on the videos I'm able to go much faster than I would in a normal classroom, so I would encourage you to periodically pause the video and think about the material being covered, and not move forward until it is clear in your mind. Also, from time to time the video will pause and pop-up questions will come up. These are intended to help you along with the material, and I would really encourage you to answer those questions yourselves rather than skip them. Usually, the questions
are about the material that has just been covered, so it shouldn't be too difficult to answer them, and I would really encourage you to do them rather than skip them.

Now, by now I'm sure everybody taking the class knows that cryptography is used everywhere computers are. It's a very common tool that's used to protect data. For example, web traffic is protected using a protocol called HTTPS. Wireless traffic, for example Wi-Fi traffic, is protected using the WPA2 protocol, which is part of 802.11i. Cell phone traffic is protected using an encryption mechanism in GSM. Bluetooth traffic is protected using cryptography, and so on. We're going to see how these various systems work; in fact, we're going to cover SSL, and even 802.11i, in quite a bit of detail, and you'll see how these systems work in practice. Cryptography is also used for protecting files stored on disk by encrypting them, so that if the disk is stolen the files are not compromised. It's also used for content protection: for example, when you buy DVDs and Blu-ray discs, the movies on these discs are encrypted. In particular, DVD uses a system called CSS, the content scrambling system, and Blu-ray uses a system called AACS. We'll talk about how CSS and AACS work. It turns out that CSS is a fairly easy system to break; we'll do some cryptanalysis and actually show how to break the encryption used in CSS. Cryptography is also used for user authentication and many other applications that we'll talk about in the next segment.

Now, I want to go back to secure communication and talk about the case where we have a laptop trying to communicate with a web server. This is
a good time to also introduce our friends Alice and Bob, who are going to be with us throughout the quarter. Essentially, Alice is trying to communicate securely with Bob; here Alice is on the laptop and Bob is on the server. The protocol used to do that is called HTTPS, but in fact the actual protocol is called SSL; sometimes it's called TLS. The goal of these protocols is basically to make sure that as this data travels across the network, first of all, an attacker can't eavesdrop on it, and second, an attacker can't modify it while it's in the network. So: no eavesdropping and no tampering.

Now, as I said, the protocol used to secure web traffic, called TLS, actually consists of two parts. The first part is called the handshake protocol, where Alice and Bob talk with one another, and at the end of the handshake a shared secret key appears between the two of them. Both Alice and Bob know this secret key, but an attacker looking at the conversation has no idea what the key k is. The way you establish this secret key, the way you do the handshake, is using public-key cryptography techniques, which we're going to talk about in the second part of the course. Once Alice and Bob have the shared key, they can use it to communicate securely by properly encrypting data between them, and in fact this is going to be the first part of the course: once the two sides have a shared secret key, how do they use it to encrypt and protect the data that goes back and forth between them?

Now, as I said, another application of cryptography is to protect files on disk. So here you have a file that happens to be encrypted, so that even if the disk is stolen, an attacker can't actually read the contents of the file. And if an attacker tries to modify the data in the file while it's on disk, that will be detected when Alice tries to decrypt the file, and she'll then basically ignore its contents. So we have both confidentiality and integrity for files stored on disk. Now, I want to make a minor philosophical point: storing encrypted files on disk is very much the same as protecting communication between Alice and Bob. In particular, when you store files on disk, it's basically Alice, who stores a file today, who wants to read the file tomorrow. So rather than communicating between two parties, Alice and Bob, in the case of disk encryption it's Alice today who's communicating with Alice tomorrow. Really, the two scenarios, secure communication and secure files, are philosophically the same. Now, the building block for securing traffic is what's called a symmetric encryption system, and we're going to talk in
the first half of the course extensively about symmetric encryption systems. In a symmetric encryption system, the two parties, Alice and Bob, share a secret key k, which the attacker does not know; only they know the secret key k. They're going to use a cipher, which consists of two algorithms, E and D. E is called the encryption algorithm and D is called the decryption algorithm. The encryption algorithm takes the message and the key as inputs and produces the corresponding ciphertext, and the decryption algorithm does the opposite: it takes the ciphertext as input, along with the key, and produces the corresponding message.

Now, a very important point that I'd like to stress. I'm only going to say this once and never again, but it's an extremely important point: the algorithms E and D, the actual encryption algorithms, are publicly known. The adversary knows exactly how they work. The only thing that's kept secret is the secret key k; other than that, everything else is completely public. It's really important to realize that you should only use algorithms that are public, because those algorithms have been peer-reviewed by a very large community of hundreds of people for many, many years, and these algorithms only begin to be used once this community has shown that they essentially cannot be broken. So if someone comes to you and says, hey, I have a proprietary cipher that you might want to use, the answer usually should be that you stick to standards, to standard algorithms, and not use a proprietary cipher. In fact, there are many examples of proprietary ciphers that, as soon as they were reverse engineered, were easily broken by simple analysis.
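To make the (E, D) interface concrete, here is a minimal sketch in Python. The construction is purely illustrative, a toy XOR stream cipher with a SHA-256-derived keystream, not one of the standard algorithms I just told you to use, and it should never protect real data; it only shows the shape of the interface: c = E(k, m) and m = D(k, c).

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Derive a deterministic keystream from the key (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def E(key: bytes, message: bytes) -> bytes:
    """Encryption: c := E(k, m)."""
    return bytes(m ^ s for m, s in zip(message, keystream(key, len(message))))

def D(key: bytes, ciphertext: bytes) -> bytes:
    """Decryption: m := D(k, c). XOR is its own inverse, so D mirrors E."""
    return bytes(c ^ s for c, s in zip(ciphertext, keystream(key, len(ciphertext))))

k = b"shared secret key"
m = b"attack at dawn"
c = E(k, m)
assert D(k, E(k, m)) == m   # consistency: decryption undoes encryption
```

The only property this sketch demonstrates is the consistency requirement D(k, E(k, m)) = m; its security properties are not to be trusted.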
Even in the simple case of symmetric encryption, which we're going to discuss in the first half of the course, there are actually two cases that we'll discuss in turn. The first is when every key is only used to encrypt a single message; we call these one-time keys. So, for example, when you encrypt email messages, it's very common that every single email is encrypted using a different symmetric key. Because the key is used to encrypt only one message, there are fairly efficient and simple ways of encrypting messages using these one-time keys, and we'll discuss those in the next module. Now, there are many cases where keys need to be used to encrypt multiple messages; we call these many-time keys. For example, when you encrypt files in a file system, the same key is used to encrypt many different files. It turns out that if the key is going to be used to encrypt multiple messages, we need a bit more machinery to make sure the encryption system is secure. So after we talk about one-time keys, we'll move on and talk about encryption modes that are specifically designed for many-time keys, and we'll see that there are a couple more steps that need to be taken to ensure security in those cases.

Okay, the last point I want to make is that there are a couple of important things to remember about cryptography. First of all, cryptography is of course a fantastic tool for protecting information in computer systems. However, it's also very important to realize that cryptography has its limitations. First, cryptography is really not the solution to all security problems. For example, if you
have software bugs, then very often cryptography is not going to be able to help you. Similarly, if you're worried about social engineering attacks, where the attacker tries to fool the user into taking actions that will hurt the user, then cryptography is very often not going to help you either. So although it's a fabulous tool, it's not the solution to all security problems. Another very important point is that cryptography essentially becomes useless if it's implemented incorrectly. For example, there are a number of systems, and we'll see examples of them, that work perfectly fine in the sense that messages Alice sends to Bob, Bob can receive and decrypt; however, because the cryptography is implemented incorrectly, the systems are completely insecure. I should mention an old encryption standard called WEP that's used for encrypting Wi-Fi traffic. WEP contains many mistakes, and often, when I want to show you how not to do things in cryptography, I will point to how things were done in WEP as an example. So for me it's very fortunate to have an example protocol I can point to for how not to do things. Finally, a very important point I'd like you to remember is that cryptography is not something you should try to invent and design yourself. As I said, there are standards in cryptography, standard cryptographic primitives, which we're going to discuss at length during this course, and you're supposed to use these standard primitives rather than invent them yourself. The standards have all gone through many years of review by hundreds of people, and that's not something that's going to happen to an ad hoc design. As I said, over the years there have been many examples of ad hoc designs that were broken as soon as they were analyzed.

Before we start with the technical material, I want to give you a quick overview of what cryptography is about and the different areas of cryptography. The core of cryptography, of course, is secure communication. That essentially consists of two parts: the first is secure key establishment, and then, how do we communicate securely once we have a shared key? We already said
that secure key establishment essentially amounts to Alice and Bob sending messages to one another such that at the end of the protocol there's a shared key k that they both agree on. And beyond just a shared key, in fact, Alice would know that she's talking to Bob, and Bob would know that he's talking to Alice; but a poor attacker who listens in on this conversation has no idea what the shared key is. We'll see how to do that later in the course. Now, once they have a shared key, they want to exchange messages securely using this key, and we'll talk about encryption schemes that allow them to do that in such a way that an attacker can't figure out what messages are being sent back and forth, and furthermore, an attacker cannot even tamper with this traffic without being detected. In other words, these encryption schemes provide both confidentiality and integrity.

But cryptography does much more than just these two things, and I want to give you a few examples. The first example I want to give you is what's called a digital signature.
A digital signature is basically the analog of a signature in the physical world. In the physical world, remember, when you sign a document, you essentially write your signature on that document, and your signature is always the same: you write the same signature on all documents that you want to sign. In the digital world, this can't possibly work, because if the attacker obtained just one document signed by me, he could cut and paste my signature onto some other document that I might not have wanted to sign. So it's simply not possible in the digital world for my signature to be the same on all documents I want to sign. We're going to talk about how to construct digital signatures in the second half of the course; it's really quite an interesting primitive, and we'll see exactly how to do it. Just to give you a hint, the way digital signatures work is basically by making the signature a function of the content being signed. An attacker who tries to copy my signature from one document to another is not going to succeed, because the signature on the new document is not going to be the proper function of the data in that document, and as a result the signature won't verify. As I said, we'll see exactly how to construct digital signatures later on, and then we'll prove that those constructions are secure.

Another application of cryptography that I wanted to mention is anonymous communication. Here, imagine user Alice wants to talk to some chat server, Bob. Perhaps she wants to talk about a medical condition, and so she wants to do it anonymously, so that the chat server doesn't actually know who she is. Well, there's a standard method called a mix net that allows Alice to communicate over the public internet with Bob through a sequence of proxies, such that at the end of the communication Bob has no idea who he just talked to. The way mix nets work is basically that as Alice sends her messages to Bob through a sequence of proxies, these messages get encrypted and decrypted appropriately, so that Bob has no idea who he talked to, and the proxies themselves don't even know that Alice is talking to Bob or, more generally, who is talking to whom.
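To make the layering idea a bit more concrete, here is a toy sketch of how a message might be wrapped in one encryption layer per proxy. Everything here is illustrative: real mix nets use proper public-key encryption at each hop (where the peeling order matters), whereas this sketch reuses a toy XOR cipher just to show the wrap-and-peel structure, and the hop keys are hypothetical.

```python
import hashlib

def xor_crypt(key: bytes, data: bytes) -> bytes:
    """Toy XOR stream cipher; encrypting and decrypting are the same operation."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))

# Alice wraps her message in one layer per proxy (hypothetical hop keys).
proxy_keys = [b"proxy-1 key", b"proxy-2 key", b"proxy-3 key"]
message = b"hello bob, please keep my identity private"

onion = message
for key in reversed(proxy_keys):
    onion = xor_crypt(key, onion)   # add one layer for each proxy

# Each proxy then peels exactly one layer with its own key; after the last
# layer is removed, the plaintext emerges at the final hop.
for key in proxy_keys:
    onion = xor_crypt(key, onion)

assert onion == message
```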
One interesting thing about this anonymous communication channel is that it's bidirectional: even though Bob has no idea who he's talking to, he can still respond to Alice, and Alice will get those messages. Once we have anonymous communication, we can build other privacy mechanisms, and I want to give you one example, which is called anonymous digital cash. Remember that in the physical world, if I have a physical dollar, I can walk into a bookstore and buy a book, and the merchant would have no idea who I am. The question is whether we can do the exact same thing in the digital world. In the digital world, Alice might have a digital dollar coin, and she might want to spend that digital dollar at some online merchant, perhaps an online bookstore. What we'd like is that when Alice spends her coin at the bookstore, the bookstore has no idea who Alice is, so we provide the same anonymity we get from physical cash. Now, the problem is that in the digital world, Alice can take the one-dollar coin she had and, before she spends it, actually make copies of it; and then all of a sudden, instead of having just one dollar coin, she has three dollar coins, all identical of course, and there's nothing preventing her from taking those replicas of the dollar coin and spending them at other merchants. So the question is: how do we provide anonymous digital cash, but at the same time prevent Alice from double-spending the dollar coin at different merchants? In some sense there's a paradox here, where anonymity is in conflict with security: if we have anonymous cash, there's nothing to prevent Alice from double-spending the coin, and because the coin is anonymous, we have no way of telling who committed this fraud. So how do we resolve this tension? It turns out it's completely doable, and we'll talk about anonymous digital cash later on. Just to give you a hint, the way we do it is basically by making sure that if Alice spends the coin once, then no one knows who she is, but if she spends the coin more than once, all of a sudden her identity is completely exposed, and then she could be subject to all sorts of legal problems. So that's how anonymous digital cash works at a high level, and we'll see how to implement it later on in the course.

Another application of cryptography has to do with more abstract protocols, but before I state the general result, I want to give you two examples. The first example has to do with election systems. Here's the election problem: suppose we have two parties, party zero and
party one, and voters vote for these parties. So, for example, this voter could have voted for party zero, this voter for party one, and so on. In this election, party zero got three votes and party one got two votes, so the winner of the election is of course party zero; more generally, the winner of the election is the party that receives the majority of the votes. Now, the voting problem is the following: the voters would somehow like to compute the majority of the votes, but do it in such a way that nothing else is revealed about their individual votes. So the question is how to do that. To do so, we're going to introduce an election center, which is going to help us compute the majority but otherwise keep the votes secret. Each party will send a specially crafted encryption of their vote to the election center, in such a way that at the end of the election, the election center is able to compute and output the winner; however, other than the winner of the election, nothing else is revealed, and the individual votes otherwise remain completely private. Of course, the election center is also going to verify that this voter, for example, is allowed to vote, and that the voter has only voted once; but beyond that information, the election center and the rest of the world learn nothing about the voters' votes other than the result of the election. So this is an example of a protocol that involves six parties: in this case, five voters and one election center. These parties compute amongst themselves, and at the end of the
computation, the result of the election is known, but nothing else is revealed about the individual inputs.

Now, a similar problem comes up in the context of private auctions. In a private auction, every bidder has his own bid that he wants to submit. Suppose the auction mechanism being used is what's called a Vickrey auction, where, by definition, the winner is the highest bidder, but the amount the winner pays is actually the second-highest bid. So this is a standard auction mechanism, called the Vickrey auction, and what we'd like to do is enable the participants to figure out who the highest bidder is and how much he's supposed to pay, while all other information about the individual bids remains secret. For example, the actual amount that the highest bidder bid should remain secret; the only things that should become public are the second-highest bid and the identity of the highest bidder. Again, the way we'll do that is to introduce an auction center, and in a similar way, everybody will send their encrypted bids to the auction center. The auction center will compute the identity of the winner, and it will also compute the second-highest bid; but other than these two values, nothing else is revealed about the individual bids.

Now, this is actually an example of a much more general problem called secure multi-party computation. Let me explain what secure multi-party computation is about. Abstractly, the participants each have a secret input. In the case of an election, the inputs would be the votes;
in the case of an auction, the inputs would be the secret bids. What they'd like to do is compute some function of their inputs: in the case of an election, the function is the majority; in the case of an auction, the function happens to be the second-highest number among x1 to x4. The question is, how can they do that such that the value of the function is revealed, but nothing else is revealed about the individual inputs? Let me show you kind of a dumb, insecure way of doing it. We introduce a trusted party, and this trusted authority basically collects the individual inputs and promises to keep them secret, so that only it knows what they are; it then publishes the value of the function to the world. The idea is that the value of the function becomes public, but nothing else is revealed about the individual inputs. Of course, you've got this trusted authority that you have to trust, and if for some reason it's not trustworthy, then you have a problem.
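The trusted-party baseline is easy to write down directly; the surprising theorem that follows says the same functionality can be achieved without it. Here is a sketch where the function f is the Vickrey rule from the auction example above (the same pattern works for the election's majority function); the participant names and bid values are hypothetical.

```python
def trusted_party(bids: dict[str, int]) -> tuple[str, int]:
    """Collects everyone's secret bid and publishes only the Vickrey outcome:
    the identity of the highest bidder and the second-highest bid (the price
    paid). Everything else about the individual bids stays with the trusted
    party, which everyone must simply trust to keep them secret."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner, _ = ranked[0]   # the highest bidder wins...
    _, price = ranked[1]    # ...but pays the second-highest bid
    return winner, price

bids = {"alice": 120, "bob": 95, "carol": 110}   # hypothetical secret inputs
winner, price = trusted_party(bids)
assert (winner, price) == ("alice", 110)
```

The whole point of the theorem below is that this `trusted_party` role can be replaced by a protocol among the bidders themselves.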
And so there's a very central theorem in crypto, and it really is quite a surprising fact, that says that any computation you'd like to do, any function f that you can compute with a trusted authority, you can also compute without a trusted authority. Let me explain at a high level what this means. Basically, we're going to get rid of the authority: the parties are not going to send their inputs to the authority, and in fact there's no longer going to be an authority in the system. Instead, the parties are going to talk to one another using some protocol, such that at the end of the protocol, all of a sudden, the value of the function becomes known to everybody, and yet nothing other than the value of the function is revealed. In other words, the individual inputs are still kept secret; there's no authority, just a way for the parties to talk to one another such that the final output is revealed. This is a fairly general result. It's kind of a surprising fact that this is at all doable, and in fact it is, and towards the end of the class we'll actually see how to make this happen.

Now, there are some applications of cryptography that I can't classify in any way other than to say they're purely magical, and let me give you two examples. The first is what's called privately outsourcing computation, and I'll use a Google search as an example just to illustrate the point. Imagine Alice has a search query that she wants to issue. It turns out there are very special encryption schemes such that Alice can send an encryption of her query to Google, and then, because of the properties of the encryption scheme, Google can actually compute on the encrypted values without knowing what the plaintexts are. So Google can run its massive search algorithm on the encrypted query and recover an encrypted result. Google sends the encrypted result back to Alice; Alice decrypts it and receives her results. But the magic here is that all Google saw was encryptions of her queries and nothing else, so Google has no idea what Alice just searched for, and nevertheless Alice learned exactly what she wanted to learn. So these are magical kinds of encryption schemes. They're fairly recent; this is a new development from only about two or three years ago that allows us to compute on encrypted data even though we don't really know what's inside the encryption. Now, before you rush off and think about implementing this, I should warn you that at this point it's really just theoretical, in the sense that running a Google search on encrypted data would probably take a billion
years. But nevertheless, just the fact that this is doable is already really surprising, and it's already quite useful for relatively simple computations; in fact, we'll see some applications of this later on.

The other magical application I want to show you is what's called zero knowledge, and in particular I'll tell you about something called a zero-knowledge proof of knowledge. Here's what happens: there's a certain number N which Alice knows, and the number N was constructed as a product of two large primes. So imagine we have two primes, p and q, each of which you can think of as about a thousand digits long. You probably know that multiplying two thousand-digit numbers is fairly easy, but if I just give you their product, figuring out its factorization into primes is actually quite difficult; in fact, we're going to use the fact that factoring is difficult to build public-key cryptosystems in the second half of the course. Okay, so Alice happens to have this number N, and she also knows the factorization of N. Bob just has the number N; he doesn't actually know the factorization. Now, the magical fact about the zero-knowledge proof of knowledge is that Alice can prove to Bob that she knows the factorization of N. She can actually give Bob a proof that Bob can check and become convinced that Alice knows the factorization of N; however, Bob learns nothing at all about the factors p and q, and this is provable: Bob absolutely learns nothing at all about the factors p and q. And this statement is very general; it's not just about proving knowledge of the factorization of N. In fact, for almost any puzzle that you want to prove you know an answer to, you can prove it in zero knowledge. So if you have a crossword puzzle that you've solved, well, maybe crosswords are not the best example, but if you have, say, a Sudoku puzzle that you want to prove you've solved, you can prove it to Bob in a way such that Bob learns nothing at all about the solution, and yet Bob would be convinced that you really do have a solution to this puzzle. Okay, so those are kind of magical applications. And so the
last thing I want to say is that modern cryptography is a very rigorous science. In fact, every concept we're going to describe is going to follow three very rigorous steps, and we're going to see these three steps again and again, so I want to explain what they are. The first thing we do when we introduce a new primitive, like a digital signature, is specify precisely what the threat model is: that is, what can an attacker do to attack a digital signature, and what is his goal in forging signatures? So we'll define exactly what it means, for example, for a signature to be unforgeable. Again, I'm giving digital signatures just as an example; for every primitive we describe, we'll precisely define the threat model. Then we'll propose a construction, and then we'll give a proof that any attacker able to attack the construction under this threat model can also be used to solve some underlying hard problem. As a result, if the problem really is hard, that actually proves that no attacker can break the construction under the threat model. These three steps are quite important. In the case of signatures, we'll define what it means for a signature to be unforgeable, then we'll give a construction, and then, for example, we'll show that anyone who can break our construction can be used to, say, factor integers, which is believed to be a hard problem. So we're going to follow these three steps throughout, and you'll see how this actually comes about. Okay, so this is the end of the segment, and then in
the next segment we'll talk a little bit about the history of cryptography.

Before we start with the technical material, I want to tell you a little bit about the history of cryptography. There's a beautiful book on this topic by David Kahn called The Codebreakers that covers the history of cryptography all the way from the Babylonian era to the present. Here I'm just going to give you a few examples of historical ciphers, all of which are badly broken. To talk about ciphers, the first thing I'm going to do is introduce our friends Alice and Bob, who are going to be with us for the rest of the quarter. Alice and Bob are trying to communicate securely, and there is an attacker who's trying to eavesdrop on their conversation. To communicate securely, they're going to share a secret key, which I'll denote by k; they both know the secret key, but the attacker knows nothing about this key k. Now they're going to use a cipher, which is a pair of algorithms: an encryption algorithm denoted by E and a decryption algorithm denoted by D. These algorithms work as follows. The encryption algorithm E takes the message m as input, and it also takes as input the key k. (I'm going to put a wedge above the key input just to denote that this input really is the key input.) It then outputs a ciphertext, which is the encryption of the message m using the key k. I'm always going to write the key first, and when I write colon-equals, what I mean is that the expression defines what the variable c stands for. Now, the ciphertext is transmitted to Bob somehow; it could be transmitted over the internet, or via an encrypted file system, it doesn't really matter. When the ciphertext reaches Bob, he can plug it into the decryption algorithm and give the decryption algorithm the same key k (again, I'll put a wedge here as well to denote the key input), and the decryption algorithm outputs the original plaintext message. The reason we say that these are symmetric ciphers is that both the encryptor and the decryptor use the same key. As we'll
see later in the course There are ciphers where the encrypter uses one key and the decrypter uses a different key but here we're just going to focus on symmetric ciphers where both sides use the same key okay so let me give you a few historic examples of ciphers the first example simplest one is called a substitution cipher i'm sure all of you played with substitution ciphers when you were in Kindergarten basically a key for a substitution cipher is a substitution table that basically says how to map letters so here for example the letters a would
be mapped to c the letter b would be mapped to w the letter c would be mapped to n so on and so forth and then the letter z would be mapped say to a so this is one example Of a key for a substitution cipher so just to practice the notation we introduced before the encryption of a certain message using this key let's say the message is b c z a the encryption of this message using this key here would be uh is done by substituting one letter at a time so b becomes w
c becomes n z becomes a and a becomes C so the encryption of bcza is wnac and this defines the ciphertext similarly we can decrypt the ciphertext using the same key and of course we'll get back the original message okay so just for historical reasons there's one example of something that's related to the substitution cipher called the caesar cipher the caesar cipher actually is not really A cipher at all and the reason is that it doesn't have a key what a caesar cipher is is basically a substitution cipher where the substitution is fixed namely it's
a shift by 3. so a becomes d b becomes e c becomes f and so on and so forth so y becomes b and z becomes c it's a fixed substitution that's applied to all plaintext messages so again this is not a cipher because there is no key the key is fixed so if an attacker knows how our encryption scheme works he can easily decrypt the message the key is not random and therefore decryption is very easy once you understand how the scheme actually works okay so now let's go back to the
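As a quick aside, the substitution cipher and the fixed-shift caesar cipher described above can be sketched in a few lines of python (the random table here is illustrative, not the specific table from the slide):

```python
import random
import string

LETTERS = string.ascii_lowercase

def random_substitution_key():
    """A key is a random permutation of the 26 letters (a substitution table)."""
    shuffled = random.sample(LETTERS, len(LETTERS))
    return dict(zip(LETTERS, shuffled))

def encrypt(key, msg):
    return "".join(key[c] for c in msg)

def decrypt(key, msg):
    inverse = {v: k for k, v in key.items()}  # invert the substitution table
    return "".join(inverse[c] for c in msg)

# the caesar "cipher" is the special case where the table is FIXED:
# a shift by 3, so there is no key at all
caesar_table = {c: LETTERS[(i + 3) % 26] for i, c in enumerate(LETTERS)}
```

With the table from the slide (b maps to w, c to n, z to a, a to c) the message bcza would encrypt to wnac, exactly as computed above.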
substitution cipher where the keys Really are chosen at random these substitution tables are chosen at random and let's see how we break this substitution cipher turns out to be very easy to break the first question is how big is the key space how many different keys are there assuming we have 26 letters so i hope all of you said that the number of keys is 26 factorial because a key a substitution key is simply a table A permutation of all 26 letters the number of permutations of 26 letters is 26 factorial if you calculate this
out 26 factorial is about 2 to the 88 which means that describing a key in a substitution cipher takes about 88 bits so each key is represented by about 88 bits now this is a perfectly fine size for a key space in fact we're going to be seeing ciphers that are perfectly secure or adequately secure with key spaces that are roughly of this size however even though the substitution cipher has a large key space of size 2 to the 88 it's still terribly insecure so let's see how to break it
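The key-space size just quoted is easy to check:

```python
import math

n_keys = math.factorial(26)   # one key per permutation of the 26 letters
print(n_keys)                 # 403291461126605635584000000
print(math.log2(n_keys))      # about 88.4, so roughly 88 bits per key
```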
and to break it we're going to be using letter frequencies so the first question is what is the most frequent letter in english text i imagine all of you know that in fact e is the most common letter and if we make that quantitative that's going to help us break the substitution cipher just given the ciphertext we can already recover the plaintext the way we do that is first of all using frequencies of english letters so here's how this works you give me an encrypted message using a
substitution cipher all i know is that the plaintext is in english and i know that the letter e is the most frequent letter in english and in fact it appears 12.7 percent of the time in standard english text so what i'll do is i'll look at the ciphertext you gave me and i'm going to count how many times every letter appears now the most common letter in the ciphertext is going to be the encryption of the letter e with very high probability so now i'm able to
recover one entry in the key table namely now i know what the letter e maps to the next most common letter in english is the letter t which appears about 9.1 percent of the time so now again i count how many times each letter appears in the ciphertext and the second most frequent letter is very likely to be the encryption of the letter t so now i've recovered the second entry in the key table and i can continue this way in fact the letter a is the next most common letter it appears
8.1 percent of the time so now i can guess that the third most Common letter in the ciphertext is the encryption of the letter a and now i've recovered three entries in the key table well so now what do i do the remaining letters in english appear roughly same amount of time other than some rare letters like q and x but we're kind of stuck at this point we figured out three entries in the key table but what do we do next so the next idea is to use frequencies Of pairs of letters sometimes these
are called digrams so what i'll do is i'll count how many times each pair of letters appears in the ciphertext and i know that in english the most common pairs of letters are things like h e, a n, i n, and t h and so i know that the most common pair of letters in the ciphertext is likely to be the encryption of one of these four pairs and so by trial and error i can figure out more entries in the key table and
again by more trial and error and by looking at trigrams i can actually figure out the entire key table so the bottom line here is that this substitution cipher is vulnerable to the worst possible type of attack namely a ciphertext only attack just given the ciphertext the attacker can recover the decryption key and therefore recover the original plaintext so there's really no point in encrypting anything using a substitution cipher because the attacker can easily decrypt it all you might as well send your plaintext completely in the clear so now we're
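The first step of the frequency attack described above, mapping the three most common ciphertext letters to e, t, and a, might be sketched like this (a toy fragment of the attack, not a complete break):

```python
from collections import Counter

def guess_partial_key(ciphertext):
    """Guess which ciphertext letters are the encryptions of e, t, a.

    The three most frequent ciphertext letters are assumed to be the
    encryptions of the three most frequent english letters.
    """
    counts = Counter(c for c in ciphertext if c.isalpha())
    ranked = [letter for letter, _ in counts.most_common(3)]
    return dict(zip(ranked, "eta"))
```

On a long english ciphertext this typically recovers three entries of the substitution table; digram and trigram statistics then fill in the rest by trial and error.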
going to fast forward from the roman era to the renaissance and look at a cipher designed by a fellow named vigenère who lived in the 16th century he designed a couple of ciphers here i'm going to show you a variant of one of his ciphers this is called the vigenère cipher so in a vigenère cipher the key is a word in this case the word is crypto it's got six letters in it and then to encrypt the message what you do is you write the message under
the key so in this case the message is what a nice day today and then you replicate the key as many times as needed to cover the message and then the way you encrypt is basically you add the key letters to the message letters modulo 26. so just to give you an example if you add y and a you get z if you add t and a you get u and you do this for all the letters and remember whenever you add you add modulo 26 so if you go past z
you go back to a so that's the vigenère cipher and in fact decryption is just as easy as encryption basically the way you decrypt is again you would write the ciphertext under the key you would replicate the key and then you would subtract the key from the ciphertext to get the original plaintext message now breaking the vigenère cipher is actually quite easy let me show you how you do it the first thing we need to do is assume that we know the length of the key so let's just assume we know that
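The vigenère encryption and decryption just described, adding and subtracting the repeated key modulo 26, can be sketched as follows (numbering the letters a=0 through z=25, so that adding a leaves a letter unchanged):

```python
def vigenere_encrypt(key, msg):
    """Add each letter of the replicated key to the message, modulo 26."""
    a = ord("a")
    return "".join(
        chr((ord(c) - a + ord(key[i % len(key)]) - a) % 26 + a)
        for i, c in enumerate(msg)
    )

def vigenere_decrypt(key, ct):
    """Subtract the key instead of adding it."""
    a = ord("a")
    return "".join(
        chr((ord(c) - ord(key[i % len(key)])) % 26 + a)
        for i, c in enumerate(ct)
    )
```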
in this case the length of the key is six and then what we do is we break the ciphertext into groups of six letters each okay so we're going to get a bunch of groups like this each one contains six letters and then we're going to look at the first letter in each group so in this case we're looking at the first letter every six characters now what do we know about these letters we know that in fact they're all encrypted using the same letter of the key all of
these are encrypted using the letter c in other words z l w is just a shift by three of the plaintext letters so if we collect all these letters then the most common letter among the set is likely to be the encryption of e right e is the most common letter in english therefore if i look at every sixth letter the most common letter in that set is likely to be the encryption of the letter e so let's just suppose that in fact the most common letter in every sixth position happens to be the
letter h then we know that e plus the first letter of the key is equal to h that says that the first letter of the key is equal to h minus e and in fact that is the letter c so now we've recovered the first letter of the key and now we can continue doing this with the second letter so we look at the second letter in every group of six characters and again we repeat the same exercise we find the most common letter among the set and we know that this most common
letter is likely the encryption of e and therefore whatever this most common letter is if we subtract e from it we're going to get the second letter of the key and so on and so forth with the third letter every six characters and this way we recover the entire key and that allows us to decrypt the message now the only caveat is that i had to assume ahead of time that i know the length of the key which in this case is six but if i don't know the
length of the key ahead of time that's not a problem either what i would do is i would run this decryption procedure assuming the key length is one then i'd run it assuming the key length is two then i would run it assuming the key length is three and so on until finally i get a decryption that makes sense and once i do that i know that i've recovered the right length of the key and i know that i've also recovered the
right key and therefore the right message okay so very quickly you can decrypt vigenère ciphers again this is a ciphertext only attack the interesting thing is vigenère had a good idea here this addition mod 26 is actually a good idea and we'll see that later except it's executed very poorly here and so we'll correct that a little bit later okay we're going to fast forward now from the renaissance to the 19th century where everything became electric and so people wanted to design ciphers that use electric motors
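Before moving on, the vigenère attack just described, collecting every sixth letter and subtracting e from the most common one, can be sketched in code (for the demo the plaintext is all e's, so each stripe's statistics are trivially dominated by e):

```python
from collections import Counter

def break_vigenere(ciphertext, key_len):
    """Recover the key, assuming the most common letter in each stripe
    is the encryption of 'e' (letters numbered a=0 through z=25)."""
    a = ord("a")
    key = []
    for i in range(key_len):
        stripe = ciphertext[i::key_len]                  # every key_len-th letter
        top = Counter(stripe).most_common(1)[0][0]       # likely encryption of e
        key.append(chr((ord(top) - ord("e")) % 26 + a))  # key letter = top - e
    return "".join(key)
```

Not knowing the key length is no obstacle: run this for key lengths 1, 2, 3, and so on until the resulting decryption reads as sensible english.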
in particular these electromechanical ciphers are called rotor machines because they use rotors so an early example is called a hebern machine which uses a single rotor here you have a picture of this machine the rotor is over here and the secret key is captured by this disc here it's embedded inside of this disc which rotates by one notch every time you press a key on the typewriter okay so every time you hit a key the disk rotates by one notch now what does this key do well the key
actually encodes the substitution table and therefore the disk actually is the secret key and as i said this disk encodes a substitution table in this case if you happen to press c as the first letter the output would be the letter t and then the disk would rotate by one notch after rotating by one notch the new substitution table becomes the one shown here you see that e basically moves up and the remaining letters move down so imagine this is basically a two dimensional rendering of the disk rotating by
one notch then you press the next letter and the disk rotates again you notice again n moved up and the remaining letters moved down so in particular if we hit the letter c three times the first time the output would be t the second time the output would be s and the third time the output would be k so this is how a single rotor machine works and as it turned out very quickly after it was advertised it was again broken basically using letter frequencies digram frequencies and trigram frequencies it's not
that hard given enough ciphertext to directly recover the secret key and then the message again a ciphertext only attack so to work against these frequency attacks these statistical attacks these rotor machines became more and more complicated over time until finally i'm sure you've all heard of the enigma the enigma is a kind of complicated rotor machine it uses three, four, or five rotors there are different versions of the enigma machine here you see an example of the enigma machine with three rotors the secret key in the enigma machine is the initial setting of the
rotors okay so in the case of three rotors there would be 26 cubed possible different keys when you type on the typewriter basically these rotors here rotate at different rates i forgot to say this is a diagram of an enigma machine using four rotors as you type on the typewriter the rotors rotate and output the appropriate letters of the ciphertext so in this case the number of keys is 26 to the fourth which is about 2 to the 18 which is actually a relatively small key space today you can kind of brute force a search using
a computer through 2 to the 18 different keys very quickly you know my wristwatch can do it in just a few seconds and so this enigma machine was using a relatively small key space but i'm sure you've all heard that the british cryptographers at bletchley park were able to mount ciphertext only attacks on the enigma machine they were able to decrypt german ciphers back in world war ii and that played an important role in many different battles during the war after the war i guess that
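The enigma key-space arithmetic above is easy to verify:

```python
import math

three_rotor_keys = 26 ** 3         # 17576 initial settings
four_rotor_keys = 26 ** 4          # 456976 initial settings
print(math.log2(four_rotor_keys))  # about 18.8, i.e. roughly a 2^18 key space
```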
was the end of the mechanical age and the start of the digital age where folks were using computers and as the world migrated to using computers the government realized that it was buying a lot of digital equipment from industry and so it wanted industry to use a good cipher so that when it buys equipment from industry it would be getting equipment with a decent cipher and so the government put out this request for proposal for a federal data encryption standard and we're going to talk about this
effort in more detail later on in the course but in 1974 a group at ibm put together a cipher that became known as des the data encryption standard which became a federal standard for encrypting data the key space for des is 2 to the 56 which is relatively small these days but was large enough back in 1974 and another interesting thing about des is that unlike rotor machines which encrypt one character at a time the data encryption standard encrypts 64 bits at a time namely eight characters at a time and we'll see the significance
of this later on in the course because des uses such a small key space these days it can be broken by a brute force search and so these days des is considered insecure unfortunately it is used in some legacy systems but it is definitely on its way out and definitely should not be used in new projects today there are newer ciphers like the advanced encryption standard which uses 128 bit keys again we'll talk about the advanced encryption standard in much more detail later
on in the course there are many many other types of ciphers i mentioned salsa20 here we'll see why in just a minute but this is all for the quick historical survey and now we can get into the more technical material over the years many natural cryptographic constructions were found to be insecure in response modern cryptography was developed as a rigorous science where constructions are always accompanied by a proof of security the language used to describe security relies on discrete probability in this segment and the next i'll give a quick overview
of discrete probability and i point to this wikibooks article over here for a longer introduction discrete probability is always defined over a universe which i'll denote by u this universe in our case is always going to be a finite set in fact very commonly our universe is going to be simply the set of all n bit strings which here is denoted by 0 1 to the n so for example the set 0 1 squared is the set of all two-bit strings which happens to be the strings 0 0, 0 1, 1 0, and 1
1. so there are four elements in this set And more generally in the sets 0 1 to the n there are 2 to the n elements now a probability distribution over this universe u is simply a function which i'll denote by p and this function what it does is it assigns to every element in the universe a number between zero and one and this number is what i'll call the Weight or the probability of that particular element in the universe now there's only one requirement on this function p and that is that the sum of
all the weights sum up to one that is if i sum the probability of all elements x in the universe what i end up with is the number one so let's look at a very simple example looking back to our two bit universe 0 0, 0 1, 1 0, and 1 1 you can consider the following probability distribution which for example assigns to the element 0 0 the probability one half to the element 0 1 we assign the probability 1 8 to 1 0 we assign the probability one quarter and to 1 1 we
assign the probability 1 8. okay and you can see that if we sum up these numbers in fact we get 1 which means that this Probability p is in fact the probability distribution now what these numbers mean is if i sample from this probability distribution i'll get the string 0 0 with probability one half i'll get the string 0 1 with probability 1 8 and so on and so forth so now that we understand what a probability distribution is let's look at two classic Examples of probability distributions the first one is what's called the uniform
distribution the uniform distribution assigns to every element in the universe exactly the same weight i'm going to use u between two bars to denote the size of the universe u that is the number of elements in the universe and since we want the sum of all the weights to sum up to one and we want all these weights to be equal what this means is that for every element x in the universe we assign a probability of one over the size of u so in particular if we look at our example the uniform distribution on the set
of two bit strings would simply assign the weight one quarter to each one of these strings and clearly the sum of all the weights sums up to one again what this means is that if i sample at random from this distribution i'll get a uniform sample across all two-bit strings so all four of these two-bit strings are equally likely to be sampled by this distribution another distribution that's very common is what's called the point distribution at the point x0 and what this point distribution does is basically it puts all the weight
on a single point namely x0 so here we assign to the point x0 all the weight one and then to all other points in the universe we assign the weight 0. and by the way i want to point out that this inverted a here should be read as uh for all so all this says is that for all x that are not equal to X0 the probability of that x is equal to 0. so again going back to our example a point distribution for example that would put all its mass on the string 1 0
would assign the probability 1 to the string 1 0 and 0 to all other strings so now if i sample from this distribution i'm guaranteed to always sample the string 1 0 and never sample any of the other strings so now we know what a distribution is and i just want to make one last point and that is that because this universe u is always going to be a finite set for us we can actually write down the weights that the distribution assigns to every element in u and represent the entire
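Concretely, the three distributions above, and the vector view about to be described, can be written out for the two-bit universe:

```python
U = ["00", "01", "10", "11"]                 # the universe of two-bit strings

example = {"00": 1/2, "01": 1/8, "10": 1/4, "11": 1/8}
uniform = {x: 1 / len(U) for x in U}         # weight 1/|U| on every element
point_10 = {x: 1.0 if x == "10" else 0.0 for x in U}  # all mass on "10"

for dist in (example, uniform, point_10):
    assert abs(sum(dist.values()) - 1.0) < 1e-12      # weights sum to one

vector = [example[x] for x in U]             # the distribution as a vector
print(vector)                                # [0.5, 0.125, 0.25, 0.125]
```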
distribution as a vector so here for example if we look at the universe of all three bit strings we can literally write down the weight that the distribution assigns to the string 0 0 0 then the weight that the distribution assigns to the string 0 0 1 and so on and so forth and you can see that we can write this as a vector in this case it will be a vector of dimension 8 since there are eight strings of three bits and as a result basically the entire distribution is captured
by this vector of eight real numbers in the interval zero to one the next thing i want to do is define the concept of an event so consider a subset a of our universe and i'll define the probability of this subset to be simply the sum of the weights of all the elements in the set a in other words i'm summing Over all x and a the weight of these elements x in the set a now because the sum over the entire universe of all weights needs to be one this means that if we sum
well if we look at the probability of the entire universe basically we get one and if we look at the probability of a subset of the universe we're going to get some number in the Interval 0 to 1. and we say that the probability of this set a is the sum which is a number between 0 and 1. and i'll tell you that a subset a of the universe is called an event and the probability of the set a is called the probability of that event so let's look at a simple example so suppose we
uh look at the universe u which consists of all eight bit strings right so the size Of this universe u is 256 because there are 256 8 bit strings essentially we're looking at all byte values all 256 possible byte values now let's define the following events basically the event is going to contain all bytes so all eight bit strings in our universe such that the two least significant bits of the byte happens to be one one So for example if we look at zero one zero one one zero 1 0 that's an element in the
universe that's not in the set a but if we change this 0 to a 1 then that's an element in the universe which is in our set a and now let's look at the uniform distribution over the universe u and let me ask you what is the probability of the event a So what is the probability that when we choose a random byte the two least significant bits of that byte happens to be 1 1. well the answer is one-fourth and the reason that's true is because it's not too difficult to convince yourself that of
the 256 8-bit strings exactly 64 of them one-quarter of them end in one one and since we're looking at the uniform distribution the probability of each string is exactly one over the size of the universe namely one over 256 and the sum of these 64 weights each one being 1 over 256 is exactly 1 4 which is the probability of the event a that we were looking at so a very simple bound on the probability of events is called the union bound so imagine we have two
events a1 and a2 so these are both subsets of some universe u and we want to know what is the probability that either a1 occurs or a2 occurs in other words what is the probability of the union of these two events this little u here denotes the union of the two sets so the union bound tells us that the probability that Either a1 occurs or a2 occurs is basically less than the sum of the two probabilities and that's actually quite easy to see so simply look at this picture here you can see that when we
look at the sum of the two probabilities we're basically summing the probability of all the elements in a1 and all the elements in a2 and you realize we kind of double counted the elements in the intersection they get summed twice here on the right hand side and as a result the sum of the two probabilities is actually going to be larger than or equal to the actual probability of the union of a1 and a2 so that's the classic union bound and in fact i'll tell you that if the two events are disjoint
in other words their intersection is empty in that case if we look at the probability that either a1 happens or a2 happens that's exactly equal to the sum of the two probabilities okay so we'll use these facts here and there throughout the course so just to be clear the inequality always holds but when the two events are disjoint then in fact we get an equality so let's look at a simple example suppose our event a1 is the set of all n-bit strings that happen to end in
1 1 and suppose a 2 is the set of all n bit strings that happen to begin with 1 1. okay so n think of it as 8 or some large number and i'm asking now what is the probability that either a1 happens or a2 happens in other words if i sample Uniformly from the universe u what is the probability that either the least significant bits are 1 1 or the most significant bits are 1 1 well as we said that's basically the probability of the union of a1 and a2 we know that the probability
of each one of these events is one quarter by what we just did on the previous slide and therefore by the union bound the probability of the or is at most the probability of a1 plus the probability of a2 which is a quarter plus a quarter and we just proved that the probability of seeing one one in the most significant bits or one one in the least significant bits is at most one half so that's a simple example of how we might go about using the union bound to bound the
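Both the byte-counting example and the union bound example above can be checked by brute force over all 256 bytes (taking n = 8):

```python
A1 = {b for b in range(256) if b & 0b11 == 0b11}    # ends in 11
A2 = {b for b in range(256) if (b >> 6) == 0b11}    # begins with 11

assert len(A1) == 64                  # so Pr[A1] = 64/256 = 1/4, as computed
p_union = len(A1 | A2) / 256          # exact probability of the union
bound = len(A1) / 256 + len(A2) / 256

print(p_union)   # 0.4375, the 16 strings counted twice explain the gap
print(bound)     # 0.5, the union bound
```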
probability that one of two events might happen the next concept we need to define is What's called a random variable now random variables are fairly intuitive objects but unfortunately the formal definition of a random variable can be a little confusing so what i'll do is i'll give an example and hopefully that will be clear enough so formally a random variable denoted say by x is a function from the universe into some set v and we say that the set v is where the random variable takes its Values so let's look at a particular example so
suppose we have a random variable x and this random variable maps into the set 0 1. so the values of this random variable are going to be either zero or one so one bit basically now this random variable maps our universe which is the set of all n-bit binary strings 0 1 to the n and how does it do it well given a particular sample in the Universe a particular n-bit string y what the random variable will do is simply output the least significant bit of y and that's it that's the whole random variable so
now let me ask you suppose we look at the uniform distribution on the set 0 1 to the n let me ask you what is the probability that this random variable outputs 0 and what is the probability that a random variable Outputs 1. well you can see the answers are half and half but let's just reason through why that's the case so here we have a picture showing the universe and the possible output space and so in this case the variable can output either 0 or 1. well when does the variable output 0 the variable
outputs 0 when the sample in the universe happens to have its least significant bit set to zero and the variable outputs one when the sample in the universe happens to have its least significant bit set to one well if we choose strings uniformly at random the probability that we choose a string that has its least significant bit set to zero is exactly one-half which is why the random variable outputs zero with probability exactly one-half similarly if we choose a random n bit string the probability that the least significant bit is
equal to one is also one-half and so we say that the random variable outputs one also with probability exactly one-half now more generally if we have a random variable taking values in a certain set v then this random variable actually induces a distribution on the set v and here i just wrote in symbols what this distribution means but it's actually very easy to explain essentially what it says is that the variable outputs v basically with the same probability that if we sample a random element in the universe and then we
apply the function x we ask how likely is it that the output is actually equal to v so formally we say that the probability That x outputs v is the same as the probability of the event that when we sample a random element in the universe we fall into the pre-image of v under the function x and again if this wasn't clear it's not that important all you need to know is that a random variable takes values in a particular set v and it induces a distribution on that set v Now there's a particularly important
random variable called a uniform random variable and it's basically defined as you would expect so let's say that u is some finite set for example the set of all n bit binary strings and we're going to denote a random variable r that basically samples uniformly from the set u by this little funny arrow with a little r on top of it and this again denotes that the random variable r is literally a uniform random variable sampled over the set u so in symbols what this means is that for all elements a in the universe the
probability that r is equal to a is simply 1 over the size of u and if you want to stick to the formal definition of a uniform random variable it's actually not that important but i'll just say that formally the uniform random variable is just the identity function namely r of x is equal to x for all x in the universe so just to see that this is clear let me ask you a simple puzzle suppose we have a uniform random variable over two bit strings so over the set 0 0, 0 1, 1 0, and
1 1. and now let's define a new random variable x To basically sum the first and second bits of r that is x simply is the sum of r1 and r2 the first and second bits of r treating those bits as integers so for example if r happens to be 0 0 then x will be 0 plus 0 which is 0. so let me ask you what is the probability that x is equal to 2 so it's not difficult to see that the Answer is exactly 1 4 because basically the only way that x is
equal to 2 is if r happens to be 1 1 but the probability that r is equal to 1 1 is basically 1 4 because r is uniform over the set of all two-bit strings the last concept i want to define in the segment is what's called a randomized algorithm so i'm sure you're all familiar with Deterministic algorithms these are algorithms that basically take a particular input data as inputs and they always produce the same output say y so if we run the algorithm a hundred times on the same input we'll always get the same
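The puzzle above can be checked by enumerating the four equally likely values of r:

```python
from fractions import Fraction

U = ["00", "01", "10", "11"]          # r is uniform over these four strings

def X(r):
    """Sum of the first and second bits of r, treated as integers."""
    return int(r[0]) + int(r[1])

p = Fraction(sum(1 for r in U if X(r) == 2), len(U))
print(p)   # 1/4, since only r = "11" gives X = 2
```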
output so you can think of a deterministic algorithm as a function that given a particular input data m will always produce exactly the same output a of m a randomized algorithm is a little different in that as before it takes the input data m but it also has an implicit argument called r where this r is sampled anew every time the algorithm is run and in particular this r is sampled uniformly at random from the set of all n-bit strings for some arbitrary n now what happens is every time we run
the algorithm on a particular input m we're going to get a different output because a different r is generated every time so the first time we run the algorithm we get one output the second time we run the algorithm a new r is generated and we get a different output The third time we run the algorithm a new r is generated and we get a third output and so on so really the way to think about a randomized algorithm is it's actually defining a random variable right so given a particular input message m it's defining
a random variable which is defining a distribution over the set of all possible outputs of this algorithm given the input m so the thing to remember is that the output of a randomized algorithm changes every time you run it and in fact the algorithm defines a distribution on the set of all possible outputs so let's look at a particular example so suppose we have a randomized algorithm that takes as input a message m and of course it also takes an implicit input which is this random string that is used to randomize its operation so now
what the algorithm will do is simply encrypt the message m using the random string as input so this basically defines a random variable this random variable takes values that are encryptions of the message m and really what this random variable defines is a distribution over the set of all possible encryptions of the message m under a uniform key so the main point to remember is that even though the inputs to a randomized algorithm might always be the same every time you run the randomized algorithm you're going to get a different output okay so that
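As a hedged illustration (a toy construction, not one of the ciphers from the course), a randomized encryption algorithm might derive a one-time pad from the key and a fresh random string r, so that encrypting the same message twice gives different ciphertexts:

```python
import hashlib
import os

def encrypt(key, msg):
    """Toy randomized encryption: r is sampled anew on every call.

    The pad is sha256(key || r), so msg must be at most 32 bytes here.
    """
    r = os.urandom(16)                                   # the implicit random input
    pad = hashlib.sha256(key + r).digest()[: len(msg)]
    ct = bytes(m ^ p for m, p in zip(msg, pad))
    return r, ct                                         # r travels with the ciphertext

def decrypt(key, r, ct):
    pad = hashlib.sha256(key + r).digest()[: len(ct)]
    return bytes(c ^ p for c, p in zip(ct, pad))
```

Running encrypt twice on the same key and message yields different outputs because a different r is generated each time; this is exactly the distribution over ciphertexts described above.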
concludes this segment and we'll see a bit more discrete probability in the next segment in this segment we're going to continue with a few more tools from discrete Probability and i want to remind everyone that if you want to read more about this there's more information in this wikibooks article that is linked over here so first let's do a quick recap of where we are we said the discrete probability is always defined over a finite set which we're going to denote by u and typically for us u is going to be the set of All
n-bit binary strings, which we denote by {0,1}^n. A probability distribution P over this universe U is a function that assigns to every element of the universe a weight in the interval [0,1], such that the sum of the weights of all the elements is 1. We said that a subset of the universe is what's called an event, and that the probability of an event is the sum of the weights of the elements in the event. The probability of an event is some real number in the interval [0,1], and I want to remind everyone that the probability of the entire universe is exactly 1, because the sum of all the weights is 1. Then we defined what a random variable is: formally, a random variable is a function from the universe to some other set. The thing I want you to remember is that a random variable takes values in some set V, and in fact a random variable defines a distribution
on this set V. The next concept we need is what's called independence, and I'm only going to define it very briefly; if you want to read more about independence, please look at the Wikibooks article. Essentially, we say that two events A and B are independent of one another if knowing that event A happened tells you nothing about whether event B happened. Formally, we define independence by saying that the probability of A and B, namely that both events happen, is equal to the probability of event A times the probability of event B. In some sense, the fact that probabilities multiply under conjunction captures the fact that the events are independent; as I said, if you want to read more about this, take a look at the background material. The same thing can be said for random variables. Suppose we have two random variables X and Y that take values in some set V. We say that these random variables are independent if the probability that X = a and Y = b equals the product of the two probabilities Pr[X = a] and Pr[Y = b]. Basically, what this means is that even if you know that X = a, that tells you nothing about the value of Y; that's what the multiplication captures. Again, this needs to hold for all a and b in the set of values of these random variables. To jog your memory in case you've seen this before, here's a very quick example. Let's look at the set of two-bit strings, so 00, 01, 10, and 11, and suppose
we choose a random element from this set; that is, we randomly choose one of these four elements, each with equal probability. Now let's define two random variables: X is the least significant bit of the element that was generated, and Y is the most significant bit. I claim that these random variables X and Y are independent of one another. The way to see this intuitively is to realize that choosing r uniformly from this set of four elements is the same as flipping an unbiased coin twice: the first bit corresponds to the outcome of the first flip, and the second bit corresponds to the outcome of the second flip. There are four possible outcomes, all equally likely, which is why we get the uniform distribution over two-bit strings. Now, why are the variables X and Y independent? Intuitively, if I tell you the result of the first flip, namely the least significant bit of r (whether the first coin fell on its head or on its tail), that tells you nothing about the result of the second flip, which is why you might expect these random variables to be independent of one another. But formally, we would have to prove that for all pairs of bits (X = 0 and Y = 0, X = 1 and Y = 1, and so on) the probabilities multiply. Let's just do it for one of these pairs, and look at the probability that X = 0 and Y = 0. You can see that the probability that X = 0 and Y = 0 is simply the probability that r = 00. And what is the probability that r = 00? By the uniform distribution, it's one over the size of the set, which is one quarter in this case. And lo and behold, the probabilities do multiply: the probability that X = 0, namely that the least significant bit of r is 0, is exactly one half, because exactly two of the four elements have their least significant bit equal to 0, and similarly the probability that Y = 0 is also one half. One half times one half is one quarter, so the probabilities indeed multiply. Okay, so that's the concept of independence, and the reason I wanted to show it to you is that we're going to look at an important property of XOR that we're going to use again and again. Before we talk about that property of XOR, let me just do a very quick
review of what XOR is. Of course, XOR of two bits means the addition of those bits modulo 2. Just to make sure everybody is on the same page: if we take the two-bit pairs 00, 01, 10, and 11, the truth table of their XOR is just addition modulo 2. As you can see, 1 plus 1 equals 2, and modulo 2 that's equal to 0. So this is the truth table for XOR, and I'm always going to denote XOR by this circle with a plus inside. When I want to apply XOR to bit strings, I apply the addition-modulo-2 operation bitwise. For example, the XOR of the two strings shown here starts 1 1 0, and I'll let you fill out the rest of the XORs just to make sure we're all on the same page; of course, it comes out to 1 1 0 1. Now, we're going to be doing a lot of XORing in this class. In fact, there's a classical joke that the only thing cryptographers know how to do is
XOR things together. But I want to explain why we see XOR so frequently in cryptography. XOR has a very important property, and the property is the following. Suppose we have a random variable Y that is distributed arbitrarily over {0,1}^n, so we know nothing about the distribution of Y. Now suppose we have an independent random variable X that happens to be uniformly distributed, also over {0,1}^n; it's very important that X is uniform and that it's independent of Y. Let's define the random variable Z to be the XOR of X and Y. Then I claim that no matter what distribution Y started with, Z is always going to be a uniform random variable. In other words, if I take an arbitrarily malicious distribution and XOR it with an independent uniform random variable, what I end up with is a uniform random variable. This is a key property that makes XOR very useful for crypto. It's actually a very simple fact to prove, so let's go ahead and do it; let's just prove it for one
bit, so for n = 1. The way we'll do it is to write out the probability distributions for the various random variables. For the random variable Y: the variable can be either 0 or 1, so let's say that p0 is the probability that it's equal to 0 and p1 is the probability that it's equal to 1. That's one of our tables. Similarly, we'll have a table for the variable X, which is much easier because X is a uniform random variable: the probability that it's equal to 0 is exactly one half, and the probability that it's equal to 1 is also exactly one half. Now let's write out the probabilities for the joint distribution; in other words, let's see how likely each of the joint values of Y and X is: Y = 0 and X = 0, Y = 0 and X = 1, Y = 1 and X = 0, and Y = 1 and X = 1. Because we assumed the variables are independent, all we have to do is multiply the probabilities. The probability that Y = 0 is p0, and the probability that X = 0 is one half, so the probability that we get the pair (0,0) is exactly p0/2. Similarly, for (0,1) we get p0/2, and for (1,0) we get p1/2. For (1,1), the probability that Y = 1 and X = 1 is p1 times the probability that X = 1, which is one half, so it's p1 over
two. Okay, so those are the four probabilities for the various options for X and Y. Now let's analyze the probability that Z = 0. The probability that Z = 0 is the probability that the pair (X, Y) equals (0,0) or the pair (X, Y) equals (1,1); those are the only two cases in which Z = 0, because Z is the XOR of X and Y. These two events are disjoint, so the expression can simply be written as the sum of the two probabilities: the probability that (X, Y) = (0,0) plus the probability that (X, Y) = (1,1). Now we can simply look up these probabilities in our table: the probability that (X, Y) = (0,0) is p0/2, and the probability that (X, Y) = (1,1) is p1/2, so we get p0/2 + p1/2. But what do we know about p0 and p1? Well, they form a probability distribution, therefore p0 + p1 must equal 1, and therefore the sum of these two terms must equal one half. And we're done: we proved that the probability that Z = 0 is one half, therefore the probability that Z = 1 is also one half, and therefore Z is a uniform random variable. This simple theorem is the main reason why XOR is so useful in cryptography. The last
thing I want to show you about discrete probability is what's called the birthday paradox, and I'm going to go through it really fast here because we'll come back later and talk about it in more detail. The birthday paradox says the following. Suppose I choose n random variables r_1, ..., r_n from our universe U, and it so happens that these variables are independent of one another. They don't actually have to be uniform; all we need to assume is that they're distributed in the same way. The most important property, though, is that they're independent of one another. The theorem says that if you choose roughly the square root of the size of U elements (we're ignoring the factor of 1.2 here, it doesn't really matter), then there's a good chance that two of the elements are the same. In other words, if you sample about sqrt(|U|) times, it's likely that two of your samples will be equal to one another. By the way, I should point out that this inverted E symbol just means "exists": there exist indices i and j such that r_i = r_j. Here's a concrete example that we'll actually see many, many times. Suppose our universe consists of all strings of length 128 bits, so the size of U is gigantic; it's 2^128, a very, very large set. But it so happens that if you sample around 2^64 times from this set, which is about the square root of the size of U, then it's very likely that two of your sampled messages will actually be the same.
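This square-root scaling is easy to check empirically on a small universe. Here's a minimal Python sketch; the universe size of 10,000, the trial count, and the function name are my own illustrative choices, not from the lecture:

```python
import random

def collision_fraction(universe_size, num_samples, trials, seed=0):
    """Estimate the probability that `num_samples` independent uniform
    draws from a universe of `universe_size` elements contain a repeat."""
    rng = random.Random(seed)  # fixed seed so the experiment is reproducible
    hits = 0
    for _ in range(trials):
        seen = set()
        for _ in range(num_samples):
            r = rng.randrange(universe_size)
            if r in seen:       # found r_i = r_j for some i != j
                hits += 1
                break
            seen.add(r)
    return hits / trials

# |U| = 10,000, and 1.2 * sqrt(10,000) = 120 samples:
# the estimated collision probability already comes out near 1/2.
print(collision_fraction(10_000, 120, trials=2_000))
```

The same effect is what the 2^128 example relies on: with a universe of 128-bit strings, collisions become likely at around 2^64 samples.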
Now, why is this called the birthday paradox? Well, traditionally it's described in terms of people's birthdays. Think of each one of these samples as someone's birthday, and the question is how many people need to get together so that two of them have the same birthday. As a simple calculation: there are 365 days in the year, so you need about 1.2 times the square root of 365 people before the probability that two of them have the same birthday is more than one half. If I'm not mistaken, that's about 24, which means that if 24 random people get together in a room, it's quite likely that two of them will have the same birthday. This is why it's called a paradox: 24 is supposedly a smaller number than you would expect. Interestingly, people's birthdays are not actually uniform across the 365 days of the year (there's a bias towards September), but I guess that's not relevant to the discussion here. The last thing I want to do is show you the birthday paradox a bit more concretely.
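These numbers can also be computed exactly: with n independent uniform samples from a universe of size N, the probability of no repeat is the product of (1 - i/N) for i = 0, ..., n-1. Here's a small Python sketch (the function name is mine) that reproduces both the birthday numbers and the million-element example:

```python
def birthday_probability(num_samples, universe_size):
    """Exact probability that num_samples uniform, independent draws
    from a universe of universe_size elements contain a repeat."""
    p_no_repeat = 1.0
    for i in range(num_samples):
        p_no_repeat *= 1 - i / universe_size
    return 1 - p_no_repeat

# With 365 possible birthdays, the probability passes 1/2
# at around two dozen people.
print(birthday_probability(24, 365))

# Universe of a million: about 1200 samples give probability roughly 1/2,
# and it then converges to 1 very quickly.
for n in (1200, 2200, 3000):
    print(n, birthday_probability(n, 1_000_000))
```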
So suppose we have a universe of about a million elements. You can see that when we sample roughly 1200 times, the probability that we sample the same element twice is roughly one half. But the probability of sampling the same element twice actually converges very quickly to 1: if we sample about 2200 items, the probability that two of those items are the same is already ninety percent, and at 3000 it's basically 1. So this converges to 1 very quickly as soon as you go beyond the square root of the size of the universe. We're going to come back and study the birthday paradox in more detail later on, but for now I just wanted you to know what it is. So that's the end of this segment, and in the next segment we'll start with our first example encryption systems.

Now that we've seen a few examples of historic ciphers, all of which are badly broken, we're going to switch gears and talk about ciphers that are much better designed. But before we do that, I first want to
define more precisely what a cipher is. First of all, remember that a cipher is made up of two algorithms: an encryption algorithm and a decryption algorithm. In fact, a cipher is defined over a triple of sets: the set of all possible keys, which I'll denote by script K and sometimes call the key space; the set of all possible messages; and the set of all possible ciphertexts. This triple in some sense defines the environment over which the cipher is defined. The cipher itself is then a pair of "efficient" algorithms E and D, where E is the encryption algorithm and D is the decryption algorithm. Of course, E takes keys and messages and outputs ciphertexts, and D takes keys and ciphertexts and outputs messages. The only requirement is that these algorithms are consistent; they satisfy what's called the correctness property: for every message in the message space and every key in the key space, if I encrypt the message with the key k and then decrypt using the same key k, I had better get back the original message I started with. This equation is what's called the consistency equation, and every cipher has to satisfy it in order to be a cipher; otherwise it's not possible to decrypt. One thing I want to point out is that I put the word "efficient" in quotes, and the reason is that efficient means different things to different people. If you're more inclined towards theory, efficient means runs in polynomial time, so algorithms E and D have to run in polynomial time in the size of their inputs. If you're more practically inclined, efficient means runs within a certain time period; for example, algorithm E might be required to take under a minute to encrypt a gigabyte of data. Either way, the word "efficient" captures both notions, and you can interpret it in your head whichever way you like; I'm just going to keep referring to it as "efficient", in quotes. As I said, if you're theory-inclined, think of it as polynomial time; otherwise, think of it as concrete time constraints. Another comment I want to make is that algorithm E is often a randomized algorithm. What that means is that as it encrypts messages, algorithm E generates random bits for itself and uses those random bits to encrypt the messages that are given to it. On the other hand, the decryption algorithm is always deterministic: given the key and the ciphertext, the output is always the same and doesn't depend on any randomness used by the algorithm. Okay, so now that we understand better what a cipher is,
I want to show you the first example of a secure cipher. It's called the one-time pad, and it was designed by Vernam back at the beginning of the 20th century. Before I explain what the cipher is, let's state it in the terminology we've just seen. The message space for the Vernam cipher, the one-time pad, is the same as the ciphertext space: both are the set of all n-bit binary strings, meaning all sequences of 0-1 characters of length n. The key space is the same as the message space, again simply the set of all n-bit binary strings. So a key in the one-time pad is simply a random bit string, a random sequence of bits that is as long as the message to be encrypted. Now that we've specified what the cipher is defined over, we can specify how the cipher works, and it's really simple: the ciphertext, which is the result of encrypting a message with a particular key, is simply the XOR of the two, c = k ⊕ m. Let's see a quick example. Remember that this plus with a circle around it, XOR, means addition modulo 2. If I take a particular message, say 0 1 1 0 1 1 1, and a particular key, the one shown here beginning 1 0 1, then to compute the encryption of the message under this key all I do is compute the XOR of the two strings; in other words, I do addition modulo 2 bit by bit, and I get 1 1 0 1 1 1 0. That's the ciphertext.
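The worked example above is easy to reproduce in code. Here's a minimal Python sketch of one-time pad encryption over byte strings; the function name `otp_encrypt` and the sample message are my own choices, and the key must be a fresh uniform string as long as the message:

```python
import secrets

def otp_encrypt(key: bytes, msg: bytes) -> bytes:
    """One-time pad: the ciphertext is the bitwise XOR of key and message."""
    assert len(key) == len(msg), "the pad must be as long as the message"
    return bytes(k ^ m for k, m in zip(key, msg))

msg = b"attack at dawn"
key = secrets.token_bytes(len(msg))  # uniformly random key, as long as msg
ct = otp_encrypt(key, msg)

# Decryption is the very same XOR applied again:
assert otp_encrypt(key, ct) == msg
```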
Then how do I decrypt? I basically do the same thing: to decrypt a ciphertext using a particular key, I XOR the key and the ciphertext again. All we have to verify is that this satisfies the consistency requirement, and I'm going to do it slowly once; from now on I'm going to assume this is all simple to you. We have to make sure that if I decrypt a ciphertext that was encrypted using a particular key, I get back the message m. So what happens here? The encryption of m under k is just k ⊕ m by definition. What is the decryption of k ⊕ m under k? That's just k ⊕ (k ⊕ m). Since XOR is addition modulo 2 and addition is associative, this is the same as (k ⊕ k) ⊕ m, and as you know, k ⊕ k is just zero, and zero XOR anything is simply m. Okay, so this actually
shows that the one-time pad is in fact a cipher, but it says nothing about the security of the cipher, and we'll talk about security in just a minute. First, let me quickly ask you a question just to make sure we're all in sync. Suppose you're given a message m and c, the encryption of that message using the one-time pad, so all you're given is the message and the ciphertext. My question to you is: given this pair m and c, can you figure out the one-time pad key that was used in the creation of c from m? I hope you all realized that, given the message and the ciphertext, it's very easy to recover the key: in particular, the key is simply m ⊕ c. If that's not immediately obvious to you, we'll see why it's the case a little later in the lecture. All right, so the one-time pad is really cool from a performance point of view: all you're doing is XORing the key and the message, so it's a super fast cipher for encrypting and decrypting very long messages. Unfortunately, it's very difficult to use in practice, and the reason is that the keys are essentially as long as the message. If Alice and Bob want to communicate securely, say Alice wants to send a message m to Bob, then before she can send even the first bit of the message she has to transmit to Bob a key that's as long as that message. But if she has a way to securely transmit a key that's as long as the message, she might as well use that same mechanism to transmit the message itself. So the fact that the key is as long as the message is quite problematic and makes the one-time pad very difficult to use in practice, although we'll see a little later that the idea behind the one-time pad is actually quite useful. For now, though, I want to focus on security. The obvious questions are: why is the one-time pad secure, and why is it a good cipher? To answer that
question, the first thing we have to answer is: what is a secure cipher to begin with? What makes a cipher secure? To study the security of ciphers we have to talk a little bit about information theory. In fact, the first person to study the security of ciphers rigorously was the very famous father of information theory, Claude Shannon, who published a famous paper back in 1949 in which he analyzes the security of the one-time pad. The idea behind Shannon's definition of security is the following: if all you get to see is the ciphertext, then you should learn absolutely nothing about the plaintext; in other words, the ciphertext should reveal no information about the plaintext. And you can see why it took the person who invented information theory to come up with this notion, because you have to formally explain what "information about the plaintext" actually means. That's what Shannon did, so let me show you Shannon's definition; I'll write it out slowly first. What Shannon said is: suppose we have a cipher (E, D) defined over a triple (K, M, C) just as before, where K, M, and C are the key space, the message space, and the ciphertext space. We say that the cipher has perfect secrecy if the following condition holds: for every two messages m0 and m1 in the message space, where the only requirement I'm going to put on these messages is that they have the same length (we'll see why this requirement is necessary in just a minute), and for every ciphertext in the ciphertext space, it had better be the case that the probability that encrypting m0 with k gives c, that is, how likely it is that we get c when we pick a random key and encrypt m0, is exactly the same as the probability of encrypting m1 and getting c. And
this, as I said, is where the distribution comes in: the distribution is over the key, and the key is uniform in the key space, so k is uniform in K. I'm often going to write k with an arrow and a little R above it to denote that k is a random variable uniformly sampled from the key space K. Okay, this is the main part of Shannon's definition, so let's think a little about what it actually says. What does it mean that these two probabilities are the same? It says that if I'm an attacker and I intercept a particular ciphertext c, then the probability that this ciphertext is the encryption of m0 is exactly the same as the probability that it's the encryption of m1, because those probabilities are equal. So if all I have is the intercepted ciphertext c, I have no idea whether the ciphertext came from m0 or from m1, because c is equally likely to be produced whether m0 is being encrypted or m1 is being encrypted. So here we have the definition stated again, and let me write these properties out more precisely. What this definition means is that given a particular ciphertext, I can't tell where it came from; I can't tell whether the message that was encrypted was m0 or m1. And in fact this property holds for all messages m0 and m1, so not only can I not tell whether c came from m0 or m1, I can't tell whether it came from m2, or m3, or m4, or m5, because all of them are equally likely to produce the ciphertext c. What this really means is that the most powerful adversary, and I don't care how smart you are, can learn nothing about the plaintext from the ciphertext. To say it one more way, what this proves is that there is no ciphertext-only attack on a cipher that has perfect secrecy. Now, ciphertext-only attacks aren't the only attacks possible, and in fact other attacks may be possible. But now that we understand what perfect secrecy means, the question is whether we can build ciphers that actually have perfect secrecy. It turns out we don't have to look very far: the one-time pad in fact has perfect secrecy. This is Shannon's first result, and I want to prove this fact to you. It's a very
simple proof, so let's go ahead and do it. We first need to interpret what this probability, that E(k, m0) equals c, actually means. It's not hard to see that for every message and every ciphertext, the probability, over a random choice of key, that the encryption of m under the key k equals c is, by definition, the number of keys k in K such that E(k, m) = c, divided by the total number of keys. I literally count the number of such keys and divide by the total number of keys: that's what it means for a random key to map m to c. So this probability is the number of keys that map m to c divided by the total number of keys. Now suppose we had a cipher such that for all messages and all ciphertexts, this number, the number of keys k in K such that E(k, m) = c, in other words the number of keys that map m to c, happens to be a constant, say 2, or 3, or 10, or 15, just some absolute constant. If that's the case, then by definition, for all m0 and m1 and for all c, this probability has to be the same, because the denominator is the same and the numerator is the same constant, and therefore the probability is the same for all m and c. So if this property holds, the cipher has perfect secrecy. Now let's see what we can say about this quantity for the one-time pad. The question to you is: if I have a message and a ciphertext, how many one-time pad keys are there that map this message m to the ciphertext c? In other words, how many keys are there such that m ⊕ k = c? I hope you've all answered "one", and let's see why that's the case. For the one-time pad, if the encryption of m under k is equal to c, then by definition k ⊕ m = c, which says that k must equal m ⊕ c: I just XOR both sides by m, and I get that k = m ⊕ c. So for the one-time pad, the number of keys k in K such that E(k, m) = c is exactly 1, and this holds for all messages and ciphertexts. By what we said before, this means that the one-time pad has perfect secrecy, and that completes the proof of this very, very simple lemma. Now, the funny thing is that even though this lemma is so simple to prove, it actually proves a pretty powerful statement: it says that for the one-time pad there is no ciphertext-only attack. So unlike the substitution
cipher, the Vigenère cipher, or the rotor machines, all of which could be broken by a ciphertext-only attack, we've just proved that for the one-time pad that's simply impossible: given the ciphertext, you learn nothing about the plaintext. However, as we'll see, this is not the end of the story. Are we done? Are we basically done with the course, now that we have a way to encrypt so that an attacker can't recover anything about our message? Well, no: as we'll see, there are other attacks that are possible, and in fact the one-time pad is actually not such a secure cipher; we'll see that shortly. So let me emphasize again that the fact that it has perfect secrecy does not mean that the one-time pad is a secure cipher to use. And as we said, the problem with the one-time pad is that the secret key is really long: if you had a way of communicating the secret key to the other side, you might as well use that exact same method to communicate the message to the other side, in which case you wouldn't need a cipher to begin with. So the problem, again, is that the one-time pad has really long keys, and the obvious question is whether there are other ciphers that have perfect secrecy but much, much shorter keys. Well, the bad news is that Shannon, after proving that the one-time pad has perfect secrecy, proved another theorem which says that if a cipher has perfect secrecy, then the number of keys must be at least the number of messages that the cipher can handle. In particular, this means that if I have perfect secrecy, then necessarily the number of keys, or rather the length of my key, must be at least the length of the message. Since the one-time pad satisfies this with equality, the one-time pad is an optimal cipher among those that have perfect secrecy. So basically, what this shows is that perfect secrecy is an interesting notion, and the one-time pad is an interesting cipher, but in reality it's actually quite hard to use, again because of these long keys. The notion of perfect secrecy, even though it's quite interesting, doesn't really tell us that practical ciphers are going to be secure. But as we said, the idea behind the one-time pad is quite good, and we're going to see in the next lecture how to turn it into a practical system.

Now that we know about the one-time pad, let's talk about making it more practical using something called a stream cipher. But before we do that, let's do a quick
review of where we were. Let me remind you that a cipher is defined over a triple of sets, called the key space, the message space, and the ciphertext space, and that a cipher is a pair of efficient algorithms E and D, where E stands for encryption and D stands for decryption. The only property we need to satisfy is that decryption is the inverse of encryption; in other words, if I encrypt a message m using a particular key and then decrypt using the same key, I get back the original message. Last time we looked at a couple of weak ciphers, like the substitution cipher and the Vigenère cipher, and we showed that all of them can be easily broken, so you should never, ever use those ciphers; they were just for historical reference. Then we looked at our first example of a good cipher, namely the one-time pad. Let me remind you how the one-time pad is defined: the message space is the set of all n-bit strings, the ciphertext space is the set of all n-bit strings, and similarly the key space is the set of all n-bit strings. The way we encrypt is by a simple XOR: to encrypt the message, we just XOR the message and the key, which gives us the ciphertext, and to decrypt the ciphertext we apply the same XOR again. It's easy to show, by the properties of XOR, that decryption is indeed the inverse of encryption. Then we talked about, and in fact proved, the lemma that says that the one-time pad has perfect secrecy, which means that if you're just an eavesdropper who gets to see a single ciphertext, you're not going to be able to deduce any information about the encrypted plaintext. Unfortunately, we also said that Shannon proved another lemma, which we called the bad-news lemma, that basically says that any cipher with perfect secrecy must have really long keys; in other words, the key length must be at least as long as the length of the message. This means the cipher is not particularly useful, because if two parties have a way to agree on really long keys that are as long as the message, they in some sense might as well
use that mechanism to already transmit the message itself so in this lecture we're going to take the idea of the one-time pad and try to make it uh into a practical encryption scheme so this is called What's called the stream cipher so the idea in a stream cipher is rather than using a totally random key we're actually going to use a pseudo-random key and to explain how that works i need to define what is a pseudo random generator a prg so a prg really all it is is just a function and i'll call it g
for generator that takes a seed so i'm going to use 0 1 to the s to denote All strings of length s so this will call the seed space so it takes an s bit seeds and maps it into a much larger string which we'll denote by 0 1 to the n and the property is that n must be much much larger than s so in other words we take a seed that might be maybe only 128 bits and we expand it into a much much larger output string That could could be gigabytes long that's
what the pseudorandom generator does and of course the goal is that first of all the generator is efficiently computable so the function g there should be some sort of an efficient algorithm that computes it uh so efficiently computable by a deterministic algorithm it's important to understand that the function g itself Has no more randomness in it it's totally deterministic the only thing that's random here is the random seed that's given as input of the function g and the other property of course is that the output should look random and the question is what does it
mean to look random and that's something that we'll define later on in the lecture okay so suppose we have such a generator How do we use that to build a stream cipher well the idea is that we're going to use the seed as our key so our short seed is going to be the secret key and then we're going to use the generator to basically expand the seed into a much much larger random looking sequence or pseudorandom sequence as it's known so this would be g of k and then we are going to xor it
just like in the one-time pad we're going to xor the pseudorandom sequence with the message and that's going to give us the ciphertext or if we want to write this in math we'll write c equals the encryption of the message m with a key k which is simply defined as m xor g of k and then when we want to decrypt we do exactly the same thing it's the ciphertext xor g of k just like in the one-time pad except that instead of xoring with k we xor with the output of the
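the scheme just described can be sketched in python; the hash-based prg here is purely illustrative and stands in for a real cryptographic prg, so treat this as a sketch, not a vetted cipher:

```python
import hashlib

def prg(seed: bytes, n: int) -> bytes:
    # Illustrative stand-in for G: expand a short seed into n
    # pseudorandom-looking bytes by hashing seed || counter.
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(seed + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n]

def stream_encrypt(key: bytes, msg: bytes) -> bytes:
    # E(k, m) = m XOR G(k): the one-time pad with a pseudorandom pad
    pad = prg(key, len(msg))
    return bytes(p ^ b for p, b in zip(pad, msg))

# D(k, c) = c XOR G(k) is the identical operation
stream_decrypt = stream_encrypt
```

note that the key can now be far shorter than the message, which is exactly the point of moving from a random pad to a pseudorandom one.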
generator applied to k so the first question to ask is why is this secure well so far we only have one notion of security which we called perfect secrecy so let's just quickly ask can a stream cipher have perfect secrecy remember in the stream cipher the key is much much shorter than the message so can it nevertheless have perfect secrecy i hope everybody said the answer is no the key is much shorter than the message and we said that in a perfectly secure cipher the key must be as
long as the message and therefore it's not possible that the stream cipher actually has perfect secrecy so the question is then well why is it secure first of all we would need a different definition of security to argue that the stream cipher is secure and in particular the security property is going to depend on the specific generator that we used in fact the definition of security that we'll need to argue security of stream ciphers we'll see in the next lecture but for now let me show you one particular property that a generator must have
a minimal property needed for security this property is called unpredictability so let's just suppose for one second that the prg is in fact predictable what does that mean it means essentially that there is some i such that if i give you the first i bits of the output this notation bar one to i means look at the first i bits of the output of the function okay so i give you the first i bits of the stream there is some sort of an efficient
algorithm that will compute the rest of the stream okay so given the first i bits you can compute the remainder of the bits i claim that if this is the case then the stream cipher would not be secure so let's see why well suppose an attacker actually intercepts a particular ciphertext let's call it c if this is the case then in fact we have a problem because suppose that just by some prior knowledge the attacker actually knows that the initial part of the message happens to be some known value for example you know that
in the mail protocol smtp the standard sendmail protocol used on the internet every message starts with the word from colon well that would be a prefix that the adversary knows that is the adversary knows that the message must begin with from colon what it could do is xor the ciphertext with the word from colon with the little prefix of the message that it actually knows and what that would give it is a prefix of the pseudorandom sequence so as a result it would learn
a prefix of the pseudorandom sequence but then we know that once it has a prefix of the pseudorandom sequence it can predict the remainder of the pseudorandom sequence and that would allow it to then predict the rest of the message m okay so for example if the pseudorandom generator was predictable given you know five bits of the pad then every email encrypted using a stream cipher would be decryptable because again the attacker knows the prefix of the message from which it deduces a prefix of the pad which then allows it to
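a toy version of this known-prefix step, with a made-up pad and the smtp from-colon prefix (all the concrete values here are invented for illustration):

```python
def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# hypothetical pseudorandom pad G(k) -- values chosen for illustration
pad = bytes([0x13, 0x37, 0xde, 0xad, 0xbe, 0xef, 0x01, 0x02,
             0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0a])
msg = b"from: alice@a.b "
ct = xor(pad, msg)

# the attacker knows every message begins with "from: "
known = b"from: "
recovered_pad_prefix = xor(ct[:len(known)], known)
assert recovered_pad_prefix == pad[:len(known)]
# if the PRG is predictable, this prefix yields the rest of the pad,
# and hence the rest of the message
```

the point is that a known plaintext prefix plus the ciphertext immediately reveals a pad prefix; predictability of the prg turns that prefix into the whole pad.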
compute the rest of the pad which then allows it to recover the entire plaintext okay so this is an example that shows that in fact if a prg is predictable then already there are security problems because a small prefix would reveal the entire message as it turns out even if i could just predict one bit of the output even if given the first i bits i can predict the next bit the i plus first bit already this is a problem because that would say that given again the first couple of letters
in a message i can essentially decrypt and recover the next bit of the message or the next letter of the message and so on so this predictability property shows that if we use a prg in a stream cipher it had better be unpredictable so what does it mean for a prg to be unpredictable well first we'll define more precisely what it means for a prg to be predictable so we say that
g is predictable if there exists an efficient algorithm let's call it a and some position i between 1 and n minus 1 such that if we look at the probability over a random key remember this notation means choose a random key from the set k so this arrow with r just means choose a random key from the set k basically if i give this algorithm the prefix of the output so i give it the first i bits of the output the probability that
it's able to predict the next bit of the output is greater than half plus epsilon for some non-negligible epsilon and non-negligible for example would be epsilon which is greater than 1 over 2 to the 30 one over a billion for example we would consider non-negligible these terms negligible and non-negligible will come back at the end of the lecture where we'll define them more precisely but for now let's just stick with this intuitive notion of what non-negligible means and so this is what it means for a generator to be
predictable basically there's some algorithm that is able to predict the i plus first bit given the initial prefix okay and then we say that a prg is unpredictable if it doesn't satisfy the property that we just defined in other words it is not predictable but what does it mean more precisely for it not to be predictable it means that in fact for all positions for all i there's no efficient adversary no efficient algorithm a that can predict the i plus first bit with non-negligible probability epsilon okay and this has
to be true for all i so no matter which prefix i give you you're not going to be able to predict the next bit that follows the prefix okay so let's look at some examples here's a silly example suppose i give you a generator and i ask you is it predictable well this generator happens to have the property that if i xor all the bits of the output i always happen to get one okay so i xor all the bits bit number one xor bit number two xor bit number three and so on if i
xor all those bits i happen to get one the question is is that a predictable generator and again i hope everybody answered yes essentially given the first n minus 1 bits of the output i can predict the nth bit because it would just be the bit that's needed to make the xor of all the bits be one in other words if i give you all but one of the bits of the generator you can actually predict the last bit of the generator now that we've seen that prgs have to be unpredictable i just
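for instance, a predictor for this xor-equals-one generator is trivial to write (a sketch, with a hard-coded example output):

```python
def predict_last_bit(first_bits):
    # The generator guarantees the XOR of all output bits is 1, so the
    # last bit is whatever value makes the overall XOR come out to 1.
    running = 0
    for b in first_bits:
        running ^= b
    return running ^ 1

# any output such a generator could produce has XOR 1, e.g.:
output = [1, 0, 1, 1]
assert predict_last_bit(output[:-1]) == output[-1]
```

so the predictor succeeds with probability 1, far more than one half plus a non-negligible epsilon, and the generator is predictable by the definition above.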
want to mention a couple of weak prgs that should never ever be used for crypto this is a very common mistake and i just want to make sure none of you guys make this mistake so a very common prg that should actually never be used for crypto is called a linear congruential generator so let me explain what a linear congruential generator is basically it has three parameters i'll call them a b and p a and b are just integers and p is a prime and the generator is defined as follows essentially
we'll say r0 is the seed of the generator and then the way you generate randomness is basically you iteratively run through the following steps you compute ri equals a times r of i minus 1 plus b modulo p then you output a few bits of the current state that is a few bits of ri then of course you increment i and you iterate this again and again okay so you can see how this generator proceeds it starts with a particular seed at every step there is this linear transformation that's being applied to the
seed and then you output a few bits of the current state and then you do that again and again unfortunately even though this generator has good statistical properties in the sense that for example the number of zeros it outputs is likely going to be similar to the number of ones and so on and you can actually argue all sorts of nice statistical properties about it nevertheless it is a very easy generator to predict and should never ever be used in fact just given a few
output samples it's easy to predict the remainder of the sequence and as a result this generator should never ever be used another example is a random number generator that's very closely related to the linear congruential generator this is the random number generator implemented in glibc a very common library i just wrote down the definition here and you can see that it basically outputs a few bits at every iteration and it just does this simple linear transformation at every step again this is a very easy generator to predict and should never
ever be used for crypto and so the lesson i want to emphasize here is never ever use the built-in glibc function random for crypto because it doesn't produce cryptographic randomness in the sense that it's easy to predict and in fact systems like kerberos version 4 have used random and have been bitten by that so please don't make that mistake yourself we'll talk about how to do secure random number generation in the next lecture before we conclude this lecture i just want to give a
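here is a minimal linear congruential generator in python, with made-up parameters, together with the prediction: anyone who learns one full state can simply run the public recurrence forward:

```python
# toy linear congruential generator: r_i = (a * r_{i-1} + b) mod p
# the parameters are public; only the seed r_0 is meant to be secret
A, B, P = 1103515245, 12345, 2**31 - 1   # example values, chosen for illustration

def lcg_outputs(seed: int, count: int) -> list:
    r, out = seed, []
    for _ in range(count):
        r = (A * r + B) % P
        out.append(r)
    return out

# an attacker who sees one full state r_i predicts everything after it
outs = lcg_outputs(seed=42, count=5)
predicted = (A * outs[-1] + B) % P
assert lcg_outputs(seed=42, count=6)[5] == predicted
```

real implementations output only some bits of each state, but even then the linear structure makes the sequence recoverable from a few samples, which is exactly why these generators are unusable for crypto.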
little bit more detail about these concepts of negligible and non-negligible values so different communities in crypto actually define these concepts differently for practitioners the terms negligible and non-negligible are just particular scalars that are used in the definition so for example a practitioner would say that if a value is more than one over a billion that is one over two to the 30 then the value is non-negligible the reason that's so is because if you happen to use a key for example for encrypting a gigabyte of
data a gigabyte of data is about 2 to the 30 or maybe even 2 to the 32 bytes then an event that happens with probability 1 over 2 to the 30 will likely happen after about a gigabyte of data so since a gigabyte of data is within reason for a particular key this event is likely to happen therefore one over two to the 30 is non-negligible on the other hand one over two to the 80 is much much much smaller and an event that happens with this probability is an
event that's actually not going to happen over the life of the key and therefore i will say that that's a negligible event as it turns out these practical definitions of negligible and non-negligible are quite problematic and we'll see examples of why they're problematic later on so in fact in the more rigorous theory of cryptography the definitions of negligible and non-negligible are somewhat different in fact when we talk about the probability of events we don't talk about these probabilities as scalars but rather we talk about them as functions of a security parameter so let me
explain what that means these are functions that map non-negative integers to non-negative real values which are supposed to be probabilities okay so what does it mean for a function to be non-negligible what it means is that the function is bigger than one over some polynomial infinitely often in other words for infinitely many values the function is bigger than some one over polynomial okay so i wrote the exact definition here and we'll see an example in just a minute okay so if something
is infinitely often bigger than one over a polynomial we'll say that it's non-negligible however if something is smaller than all polynomial fractions then we'll say that it's negligible so what it says here is basically for all degrees d there exists some lower bound lambda d such that for all lambda bigger than this lambda d the function is smaller than one over the polynomial okay so all this says is that the function is negligible if it's less than all polynomial fractions in other words it's less than one over lambda to the d for
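written out, the two definitions for a function f mapping non-negative integers to non-negative reals are:

```latex
% f : \mathbb{Z}_{\ge 0} \to \mathbb{R}_{\ge 0}
\[
\text{non-negligible:}\quad \exists d:\ f(\lambda) \ge \frac{1}{\lambda^{d}}
  \ \text{for infinitely many } \lambda
\]
\[
\text{negligible:}\quad \forall d\ \exists \lambda_d:\ \forall \lambda \ge \lambda_d,\quad
  f(\lambda) \le \frac{1}{\lambda^{d}}
\]
```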
sufficiently large lambda so let's look at some examples we'll see applications of these negligible and non-negligible concepts later on but i just want to make it clear that this is how you would rigorously define these concepts basically either smaller than one over poly or bigger than one over poly one would be negligible the other would be non-negligible so for example a function that drops exponentially in lambda clearly would be negligible because for any constant d there is a sufficiently large lambda such that one over two to
the lambda is less than one over lambda to the d okay so this is clearly less than all polynomials however consider the function one over lambda to the thousand this is a function that goes to zero very very slowly it barely ever moves but nevertheless this function is non-negligible because if i set d to be 10000 then clearly this function is bigger than 1 over lambda to the 10000 and so this function is bigger than some polynomial fraction now let's look at a confusing example just to be tricky what do
you think if i have a function that for odd lambda happens to be exponentially small and for even lambda happens to be polynomially small is this a negligible or non-negligible function well by our definition this would be a non-negligible function and the intuition is that if a function happens to be only polynomially small very often then an event that happens with this probability is already too large to be tolerated in a real crypto system okay so the main points to remember here are that these terms basically
correspond to smaller than every polynomial fraction or bigger than some polynomial fraction but throughout the course we'll mostly use negligible to mean smaller than one over exponential and non-negligible to mean bigger than one over a polynomial so now we saw the core idea for converting the one-time pad into a practical cipher namely a stream cipher and in the next lecture we're going to see how to actually argue that the stream cipher is secure that's going to require a new definition of security since perfect secrecy is not good enough here and we will see that
in the next lecture in this segment we're going to look at attacks on the one-time pad and some things you need to be careful with when you use a stream cipher but before we do that let's do a quick review of where we were so recall that the one-time pad encrypts messages by xoring the message and a secret key where the secret key is as long as the message similarly decryption is done by xoring the ciphertext and the same secret key when the key is uniform and random we proved that the one-time pad
has this information-theoretic security that shannon called perfect secrecy a problem was of course that the keys are as long as the message so the one-time pad is very difficult to use we then talked about a way to make the one-time pad practical by using a pseudorandom generator that expands a short seed into a much larger pseudorandom string and the way a stream cipher works using a pseudorandom generator is essentially the same as the one-time pad but rather than using a truly random pad we use a pseudorandom pad that's expanded to be as long
as the message from the short key that's given as input to the generator we said that security no longer relies on perfect secrecy because stream ciphers cannot be perfectly secure instead security relies on properties of the pseudorandom generator and we said that the pseudorandom generator essentially needs to be unpredictable but in fact it turns out that definition is a little bit hard to work with and we're going to see a better definition of security for prgs in about two segments but in this segment we're going to talk about attacks on the one-time pad and
the first attack i want to talk about is what's called the two-time pad attack okay so remember that the one-time pad is called one-time pad because the pad can only be used to encrypt a single message i want to show you that if the same pad is used to encrypt more than one message then security goes out the window and basically an eavesdropper can completely decrypt the encrypted messages so let's look at an example so here we have two messages m1 and m2 that are encrypted using the same pad so the resulting ciphertexts c1 and c2
are encryptions of these messages m1 and m2 but both are encrypted using the same pad now suppose an eavesdropper intercepts c1 and c2 so he has both of them the natural thing for the eavesdropper to do is to compute the xor of c1 and c2 and what does he get when he computes this xor i hope everybody sees that once you xor c1 and c2 the pads cancel out and essentially what comes out of this is the xor of the plaintext messages and it turns out that
english basically has enough redundancy such that if i give you the xor of two plaintext messages you can actually recover those two messages completely more importantly for us since these messages are encoded using ascii in fact ascii encoding has enough redundancy such that given the xor of two ascii-encoded messages you can recover the original messages back okay so essentially given this xor you can recover both messages so the thing to remember here is if you ever use the same pad to encrypt multiple messages an attacker who intercepts the resulting ciphertexts can basically recover
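the pad-cancellation step is easy to see in code (the pad and messages below are invented for illustration):

```python
def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

pad = bytes([0xa1, 0xb2, 0xc3, 0xd4, 0xe5, 0xf6,
             0x07, 0x18, 0x29, 0x3a, 0x4b])      # the reused pad
m1 = b"attack dawn"
m2 = b"defend hill"
c1, c2 = xor(pad, m1), xor(pad, m2)

# the eavesdropper xors the two ciphertexts: the pad cancels out
assert xor(c1, c2) == xor(m1, m2)
# from m1 XOR m2, the redundancy of English / ASCII is enough to
# recover both plaintexts
```

notice the attacker never needs the pad itself; the xor of the ciphertexts already depends only on the two plaintexts.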
the original plaintexts without too much work so a stream cipher key or one-time pad key should never ever be used more than once so let's look at some examples where this comes up in practice it's a very common mistake to use a stream cipher key or a one-time pad key more than once and let me show you some examples where this comes up so you know how to avoid these mistakes when you build your own systems the first example is a historic example from the beginning of the 1940s where the russians actually used
a one-time pad to encrypt various messages unfortunately the pads that they were using were generated by a human throwing dice so a human would throw these dice and write down the results of the throws and the collected throws would then form the pads that were used for encryption now because it was laborious for them to generate these pads it seemed wasteful to use a pad to encrypt just one message so they ended up using these pads to encrypt multiple messages and us intelligence was actually able to intercept these two-
time pads that is ciphertexts encrypted using the same pad applied to different messages and it turns out over a period of several years they were able to decrypt something like three thousand plaintexts just by intercepting these ciphertexts the project is called project venona and it's actually a fascinating story of cryptanalysis made possible just because the two-time pad is insecure more importantly i want to talk about more recent examples that come up in networking protocols so let me give you an example from windows nt in a protocol called the point-to-point tunneling protocol pptp this is a protocol for a
client wishing to communicate securely with a server the client and the server both share a secret key and they both send messages to one another here we'll denote the messages from the client by m1 so the client sends a message the server responds the client sends a message the server responds and so on and so forth now the way pptp works is that the entire interaction from the client to the server is considered as one stream in other words the messages m1 and
m2 and m3 are viewed as one long stream here these two parallel lines mean concatenation so essentially we're concatenating all the messages from the client to the server into one long stream and that stream is encrypted using the stream cipher with key k so that's perfectly fine there's nothing wrong with that these messages are treated as one long stream and they're all encrypted using the same key the problem is the same thing is happening also on the server side in other words all the messages from the server are
also treated as one long stream so here they're all concatenated together and encrypted using unfortunately the same pseudorandom seed in other words using the same stream cipher key so basically what's happening here is you see in effect that a two-time pad is taking place where the set of messages from the client is encrypted using the same one-time pad as the set of messages from the server the lesson here is that you should never use the same key to encrypt traffic in both directions in fact what you need to do is to have one key for
interaction between the client and the server and one key for interaction between the server and the client the way i like to write this is that the shared key k really is a pair of keys one key is used to encrypt messages from server to client and one key is used to encrypt messages from client to server so these are two separate keys and both sides of course know this pair of keys and they can both encrypt so one key is used to encrypt messages in
one direction and the other is used to encrypt messages in the other direction another important example of the two-time pad comes up in wi-fi communication in particular in the 802.11b protocol so all of you i'm sure know that 802.11 contains an encryption layer and the original encryption layer was called wep and wep fortunately for us is actually a very badly designed protocol so that i can always use it as an example of how not to do things there are many many mistakes inside of wep and here i want to use it as an example
of how the two-time pad came about so let me explain how wep works so in wep there's a client and an access point here's the client here's the access point they both share a secret key k and they transmit frames to one another say the client wants to send a frame containing the plaintext m to the access point what it does first of all is append some sort of a checksum to this plaintext that checksum
is not important at this point what is important is that this concatenation then gets encrypted using a stream cipher where the stream cipher key is the concatenation of a value iv and a long-term key k this iv is a 24-bit string and you can imagine that it starts from zero and it's a counter that increments by one for every packet the reason they did this is the designers of wep realized that in a stream cipher the key is only supposed to be
used to encrypt one message and so they said well let's go ahead and change the key after every frame and the way they changed the key essentially was by prepending this iv to it and you notice this iv changes on every packet so it increments by one on every packet and the iv then is sent in the clear Along with the ciphertext so the recipient knows the key k he knows what the iv is he can rederive the prg of iv concatenated k and then decrypt the ciphertext to recover the original message m now the
problem with this of course is that the iv is only 24 bits long which means that there are only two to the 24 possible ivs which means that after 16 million frames are transmitted the iv has to cycle and once it cycles after 16 million frames essentially we get a two-time pad the same iv would be used to encrypt two different messages the key k never changes it's a long-term key and as a result the same key namely iv concatenated k would be used to encrypt two different frames and the attacker can then figure out
the plaintext of both frames so that's one problem an even worse problem is that on many 802.11 cards if you power cycle the card the iv will reset back to zero and as a result every time you power cycle the card you'll be encrypting the next payload using zero concatenated k so after every power cycle you'll be using the zero-concatenated-k key many many times so you see how in wep the same pad could be used to encrypt many different messages as soon as the
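the per-frame key structure and the iv wraparound can be sketched as follows (the long-term key value here is a placeholder):

```python
def wep_frame_key(iv: int, k: bytes) -> bytes:
    # per-frame stream-cipher key: 24-bit IV || 104-bit long-term key
    return (iv % 2**24).to_bytes(3, "big") + k

k = b"\xaa" * 13                    # placeholder 104-bit long-term key
# the IV is a 24-bit counter, so frame 5 and frame 5 + 2^24 get the
# exact same per-frame key: a two-time pad
assert wep_frame_key(5, k) == wep_frame_key(5 + 2**24, k)
# and after a power cycle the counter restarts at 0, reusing 0 || k
assert wep_frame_key(0, k) == wep_frame_key(2**24, k)
```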
iv is repeated and there's nothing to prevent the iv from repeating after a power cycle or after every 16 million frames which isn't that many frames in a busy network so while we're talking about wep i want to mention one more mistake that was made in wep this is a pretty significant mistake and let's see how we might design it better so you notice that the designers of wep basically wanted to use a different key for every packet so every frame is encrypted using a different key this concatenation of iv and k unfortunately
they didn't randomize the keys if you look at the key for frame number one it's the concatenation of one and k where this field is 24 bits the key for frame number two is the concatenation of two and k the key for frame number three is the concatenation of three and k so the keys are very closely related to one another and i should probably mention also that the key k itself can be 104 bits so that the resulting prg key is actually 104 plus 24 bits
which is 128 bits unfortunately these keys are very much related to one another these are not random keys you notice they all have the same suffix of 104 bits and it turns out the pseudorandom generator used in wep is not designed to be secure when you use keys that are so closely related in other words the majority of the bits of these keys are the same and in fact for the prg that's used in wep that prg is called rc4 we'll talk about it more in the next segment it turns out there's an attack that
was discovered by fluhrer mantin and shamir back in 2001 that shows that after about 10 to the 6 that is after about a million frames you can recover the secret key k so this is kind of a disastrous attack that says essentially all you have to do is listen to about a million frames these frames as we said are all generated from very closely related seeds namely 104 bits of these seeds are all the same and the fact that such closely related keys were used is enough to actually recover the original
key and it turns out even after the 2001 attack better attacks have come out to show that these related keys are disastrous in fact these days something like 40000 frames are sufficient so that within a matter of minutes you can actually recover the secret key in any wep network so wep provides no security at all for two reasons first of all it can result in a two-time pad but more significantly because these keys are so closely related it's actually possible to recover the key by watching just a few
ciphertexts and by the way when we do a security analysis of these types of constructions in a few segments when we start talking about how to analyze these types of constructions we'll see that when we have related keys like this our security analysis will fail we won't be able to get the proof to go through so one could ask what should the designers of wep have done instead well one approach is to basically treat the frames m1 m2 m3 where each one is a separate frame transmitted from the
client to the server as one long stream and then xor them essentially using the pseudorandom generator as one long stream so the first segment of the pad would have been used to encrypt m1 the second segment of the pad would have been used to encrypt m2 the third segment of the pad would have been used to encrypt m3 and so on and so forth so they basically would never have had to change the key because the entire interaction is viewed as one long stream but they chose to have a different
key for every frame so if you want to do that a better way to do it is rather than slightly modifying this iv which just slightly modifies the prefix of the prg key to use a prg again so essentially what you could do is take your long-term key and feed that directly through a prg so now we get a long stream of bits that look essentially random and then the first segment could be used as the
key for frame number one and then the second segment would be used as the key for frame number two and so on and so forth the third segment would be used to encrypt frame number three and so on okay so the nice thing about this is that by doing this each frame has a pseudorandom key these keys now have no relation to one another they look like random keys and as a result if the prg is secure for random seeds it would also be secure on these inputs
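a sketch of that fix, again with a hash-based stand-in for the prg (names and parameters are my own, purely illustrative):

```python
import hashlib

def prg(seed: bytes, n: int) -> bytes:
    # Illustrative expansion of a short seed (a sketch, not vetted crypto)
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(seed + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n]

def frame_keys(long_term_key: bytes, num_frames: int, key_len: int = 16):
    # Carve the PRG output into per-frame keys that look independent,
    # instead of WEP's closely related IV || k keys.
    stream = prg(long_term_key, num_frames * key_len)
    return [stream[i * key_len:(i + 1) * key_len] for i in range(num_frames)]

keys = frame_keys(b"long-term-secret", 3)
assert len(set(keys)) == 3     # distinct, unrelated-looking keys
```

both sides can derive the same sequence of frame keys from the shared long-term key, and no two frames ever share a key or even a related one.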
These derived keys essentially look as though they're independent of one another; we'll see how to do this analysis formally once we talk about security definitions for these constructions.

Since the two-time pad attack comes up so often in practice, and it's such a common mistake, I want to give one more example of where it comes up, so you know how to avoid it. The last example is in the context of disk encryption. Imagine we have a file that begins with the words "To Bob:", followed by the contents of the file. When this is stored on disk, the file gets broken into blocks and everything is encrypted: maybe "To Bob:" goes into the first block and the rest of the content into the remaining blocks. I'll use these lines to denote the fact that the blocks are encrypted, so an attacker looking at the disk has no idea what the contents are. Now suppose that at a later time the user fires up an editor and modifies the file, so that instead of saying "To Bob:" it says "To Eve:", and nothing else changes; that's the only edit that was made. When the user saves the modified file, it gets re-encrypted: again the file is broken into blocks, now saying "To Eve:", and everything is encrypted. An attacker who took a snapshot of the disk before the edit and looks at the disk again after the edit will see that the only thing that changed is one little segment; everything else looks exactly the same. So even though the attacker doesn't know what's in the file or what changed, he knows exactly the location where the edit took place. Because the one-time pad, or a stream cipher, encrypts one bit at a time, a single change makes it very easy to tell where that change occurred, and that leaks information the attacker shouldn't learn. Ideally you'd like to say that even if the file changed just a little bit, the entire contents of the file should change, or at least the entire contents of the affected block. Here the attacker even knows where within the block the change was made.
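The disk-snapshot attack is easy to demonstrate. This sketch uses a toy SHA-256-based keystream as a stand-in for a real stream cipher; diffing the two ciphertext snapshots pinpoints exactly which bytes of the file were edited:

```python
import hashlib

def keystream(key: bytes, n: int) -> bytes:
    """Toy keystream (SHA-256 in counter mode), standing in for a stream cipher pad."""
    out = b""
    i = 0
    while len(out) < n:
        out += hashlib.sha256(key + i.to_bytes(4, "big")).digest()
        i += 1
    return out[:n]

def encrypt(key: bytes, pt: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(pt, keystream(key, len(pt))))

key = b"disk key"
before = b"To Bob: the launch code is 1234"
after_ = b"To Eve: the launch code is 1234"

c1, c2 = encrypt(key, before), encrypt(key, after_)

# The attacker diffs the two snapshots: since c1 ^ c2 = before ^ after_,
# the positions where the ciphertexts differ are exactly the positions
# where the plaintext was edited.
changed = [i for i in range(len(c1)) if c1[i] != c2[i]]
assert changed == [i for i in range(len(before)) if before[i] != after_[i]]
```

Note that the keystream cancels out entirely: the attacker learns the edit locations without knowing the key at all.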
So the lesson here is that we generally need to do something different for disk encryption; we'll talk about what to do in a later segment, but essentially the one-time pad is generally not a good idea for encrypting blocks on disk. To summarize the two-time pad attack: I hope I've convinced you that you are never, ever, ever supposed to use a stream cipher key more than once. Even though there are natural settings where that might happen, you have to take care and make sure you're not using the same key more than once. For network traffic, typically every session should have its own key; within the session, the messages from the client to the server are treated as one stream and encrypted using one key, and the messages from the server to the client are treated as another stream and encrypted using a different key. And for disk encryption you typically would not use a stream cipher, because as changes are made to the file you would be leaking information about its contents. That concludes our brief discussion of the two-time pad.

The next attack I want to mention is the fact that the one-time pad, and stream ciphers in general, provide no integrity at all. All they try to provide is confidentiality, and only when the key is used once. But it's actually worse than that: it's very easy to modify a ciphertext and have a known effect on the corresponding plaintext. This property, by the way, is called malleability; let me explain what I mean. Imagine we have a message m that gets encrypted with a stream cipher, so the ciphertext is m XOR k. An attacker intercepts the ciphertext. That doesn't tell him what the plaintext is, but beyond eavesdropping he can become an active attacker and modify the ciphertext. Suppose he XORs the ciphertext with a certain value p, called the perturbation. The resulting ciphertext is then m XOR k XOR p. So let me ask you: when we decrypt this ciphertext, what will it decrypt to? I hope everybody sees that, by manipulating the XORs, the decryption becomes m XOR p. Notice that by XORing with this pad p, the attacker was able to have a very specific effect on the resulting plaintext. To summarize: not only can you modify the ciphertext and have the modifications go undetected, the modifications have a very specific impact on the plaintext, namely whatever you XOR into the ciphertext has exactly that effect on the plaintext.

To see where this can be dangerous, let's look at a particular example. Suppose the user sends an email that starts with the words "From: Bob", and the attacker gets to intercept the corresponding ciphertext. He doesn't know what the plaintext is, but for the sake of the example let's pretend he knows the message is actually from Bob. What he wants to do is modify the ciphertext so that the plaintext looks like it came from somebody else, say from Alice, or here, from Eve. All he has is the ciphertext, but he can XOR it with a certain set of three characters, which we'll compute in just a second, such that the resulting ciphertext is actually an encryption of a message from Eve. When the user decrypts it, he'll think the message is from Eve, not from Bob, and that might cause the wrong thing to happen. So even though the attacker could not himself have created a ciphertext that says "From: Eve", by modifying an existing intercepted ciphertext he was able to produce one.

To be specific, let's look at what these three characters need to be. Write the word "Bob" in ASCII: it corresponds to 42 hex, 6F hex, 62 hex; capital B is encoded as 42, little o as 6F, little b as 62. The word "Eve" is encoded as 45 hex, 76 hex, 65 hex. Now XOR these two words literally as bit strings: "Bob" XOR "Eve" gives the three bytes 07, 19, 07 in hex. By XORing these three bytes at the right positions into the ciphertext, the attacker changes the ciphertext to look like it came from Eve rather than from Bob. So this is an example where a predictable impact on the plaintext can cause quite a bit of trouble, and this is the property called malleability: we say the one-time pad is malleable.
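Here is the whole forgery sketched end to end: encrypt "From: Bob" under a one-time pad, XOR the three-byte difference into positions 6 to 8 of the ciphertext, and watch it decrypt to "From: Eve":

```python
import os

msg = b"From: Bob"
key = os.urandom(len(msg))                      # one-time pad
ct = bytes(m ^ k for m, k in zip(msg, key))

# The perturbation: "Bob" XOR "Eve" = 07 19 07 (hex)
delta = bytes(a ^ b for a, b in zip(b"Bob", b"Eve"))
assert delta == bytes([0x07, 0x19, 0x07])

# XOR the perturbation into the ciphertext at the position of "Bob"
forged = bytearray(ct)
for i, d in enumerate(delta):
    forged[6 + i] ^= d

# The receiver decrypts the forged ciphertext with the correct key:
dec = bytes(c ^ k for c, k in zip(forged, key))
assert dec == b"From: Eve"
```

The attacker never touches the key: because (m XOR k) XOR p decrypts to m XOR p, the modification lands exactly where he aimed it.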
It's malleable because it's very easy to compute on ciphertexts and make prescribed changes to the corresponding plaintext. We'll see how to prevent all this in two or three lectures, when we show how to add integrity to encryption mechanisms in general. For now I want you to remember that the one-time pad by itself has no integrity and is completely insecure against attackers who modify ciphertexts.

In this segment I want to give a few examples of stream ciphers that are used in practice. I'm going to start with two old examples that are not supposed to be used in new systems, but are nevertheless still fairly widely used, so I want to mention their names and make you familiar with them. The first stream cipher I want to talk about is called RC4, designed back in 1987. I'm only going to give the high-level description, then we'll talk about some weaknesses of RC4 and leave it at that. RC4 takes a variable-size seed; as an example, say 128 bits, which is used as the key for the stream cipher. The first thing it does is expand the 128-bit secret key into 2048 bits, which are used as the internal state of the generator. Then it executes a very simple loop, where every iteration outputs one byte, so you can run the generator for as long as you want and generate one byte at a time. RC4 is fairly popular: it's used in the HTTPS protocol quite commonly (for example, Google currently uses RC4 in its HTTPS servers), and it's also used in WEP, as we discussed in the last segment; of course, in WEP it's used incorrectly and is completely insecure. Over the years some weaknesses have been found in RC4, and as a result it's recommended that new projects not use RC4 and instead use a more modern pseudorandom generator, as we'll discuss toward the end of the segment.

Let me mention two of the weaknesses. The first one is kind of bizarre: the second byte of the output of RC4 is slightly biased. If RC4 were truly random, the probability that the second byte equals 0 would be exactly 1/256, since there are 256 possible byte values. It so happens that for RC4 the probability is actually 2/256, which means that if you use the RC4 output to encrypt a message, the second byte is likely not to be encrypted at all: it gets XORed with 0 with twice the probability it's supposed to, 2/256 instead of 1/256. And by the way, there's nothing special about the second byte; the first and third bytes are also biased. In fact it's now recommended that if you're going to use RC4, you ignore the first 256 bytes of output and start using the generator's output from byte 257 onward, since the first couple hundred bytes turn out to be biased. The second attack that was discovered is that if you look at a very long output of RC4, you're more likely than you should be to see the 16-bit sequence (0,0), two consecutive zero bytes. If RC4 were truly random, the probability of seeing (0,0) would be exactly 1/256^2; it turns out RC4 is a little biased, and the bias is 1/256^3. This bias actually only shows up after several gigabytes of data are produced by RC4, but nevertheless it can be used to predict the generator, and it can definitely be used to distinguish the output of the generator from a truly random sequence: the fact that (0,0) appears more often than it should gives a distinguisher. And then, in the last segment, we talked about the related-key attacks that were used to attack WEP: if one uses keys that are closely related to one another, it's actually possible to recover the root key. These are the known weaknesses of RC4, and as a result it's recommended that new systems not use RC4 and instead use a modern pseudorandom generator.

The second example I want to give is a badly broken stream cipher that's used for encrypting DVD movies. When you buy a DVD in the store, the actual movie is encrypted using a stream cipher called the Content Scrambling System, CSS. CSS turns out to be a badly broken stream cipher: we can very easily break it, and I want to show you how the attack algorithm works. We're doing this so you can see an example of an attack algorithm, but in fact there are many systems out there that use this attack to decrypt encrypted DVDs. The CSS stream cipher is based on something that hardware designers like: it's designed to be a hardware stream cipher that's easy to implement in hardware, and it's based on a mechanism called a linear feedback shift register, an LFSR. An LFSR is basically a register that consists of cells, where each cell contains one bit. Certain cell positions, not all of them, are called taps, and the taps feed into an XOR. At every clock cycle the register shifts: the last bit falls off, and the bit shifted in at the other end is the result of the XOR of the tapped bits. You can see that this is a very simple mechanism to implement in hardware; it takes very few transistors: just a shift, the last bit falls off, and the incoming bit is the XOR of the tapped bits.
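A minimal LFSR can be sketched in a few lines. Each clock the register shifts, the top bit is output, and the XOR of the tapped positions is shifted in at the bottom; the 4-bit register and tap positions below are purely illustrative (not CSS's), chosen so the register cycles through all 15 nonzero states:

```python
def lfsr_bits(state: int, taps: tuple, n: int, count: int):
    """Yield `count` output bits of an n-bit Fibonacci LFSR.

    Each clock: the top bit "falls off" (is output) and the XOR of the
    tapped bit positions is shifted in at position 0.
    """
    mask = (1 << n) - 1
    for _ in range(count):
        out = (state >> (n - 1)) & 1
        fb = 0
        for t in taps:
            fb ^= (state >> t) & 1
        state = ((state << 1) | fb) & mask
        yield out

# Example: a 4-bit LFSR with taps at positions 3 and 2, which has the
# maximal period of 2^4 - 1 = 15 for any nonzero starting state.
bits = list(lfsr_bits(0b1000, (3, 2), 4, 30))
assert bits[:15] == bits[15:30]   # the output repeats with period 15
assert sum(bits[:15]) == 8        # a maximal-length sequence has 2^(n-1) ones
```

Note that the first n output bits are just the initial state read out top bit first; this simple fact is exactly what the CSS attack below exploits.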
The seed for an LFSR is simply its initial state, and LFSRs are the basis of a number of stream ciphers. Here are some examples. DVD encryption uses two LFSRs; I'll show you how in just a second. GSM encryption, the algorithms called A5/1 and A5/2, uses three LFSRs. Bluetooth encryption, an algorithm called E0, uses four LFSRs. These are all stream ciphers, and it turns out all of them are badly broken and really should not be trusted for encrypting traffic. But they're all implemented in hardware, so it's a little difficult now to change what the hardware does. The simplest of these, CSS, has a cute attack on it, so let me show you how the attack works, starting with how CSS itself works.

The key for CSS is five bytes, namely 40 bits (five times eight is 40). The reason they had to limit themselves to 40 bits is that DVD encryption was designed at a time when U.S. export regulations only allowed export of crypto algorithms whose keys were at most 40 bits, so the designers of CSS were limited to very, very short keys. Their design works as follows. CSS uses two LFSRs: one is a 17-bit LFSR, in other words the register contains 17 bits, and the other is a slightly longer 25-bit LFSR. The LFSRs are seeded as follows: the first LFSR's initial state is a 1 concatenated with the first two bytes of the key, and the second LFSR's initial state is a 1 concatenated with the last three bytes of the key. You can see that two bytes is 16 bits, plus the leading 1 gives 17 bits overall, while the second LFSR gets 24 bits plus the leading 1, which is 25 bits; and notice we've used all five bytes of the key. Then the LFSRs are each run for eight cycles, so each generates eight bits of output, and the two bytes go through an adder that does addition modulo 256. There's one more technical thing that happens: the carry from the previous block is actually also added in. In each block we do addition modulo 256, ignoring the carry, but that carry, 0 or 1, is added into the addition of the next block. That's a detail that's not so important here. So CSS outputs one byte per round, and each byte is of course XORed with the corresponding byte of the movie being encrypted. It's a very simple stream cipher: it takes very little hardware to implement, runs fast even on very cheap hardware, and it encrypts movies.

It turns out this is easy to break in time roughly 2^17, and let me show you how. Suppose you intercept an encrypted movie, so you have no idea what's inside. However, just because DVD encryption is applied to MPEG files, it so happens that you know a prefix of the plaintext, let's say the first 20 bytes. If you XOR the known plaintext prefix with the corresponding ciphertext bytes, what you get is the initial segment of the PRG output: the first 20 bytes of the output of CSS. So now here's what we're going to do. We try all 2^17 possible values of the first LFSR. For each of these 2^17 initial values, we run the first LFSR for 20 bytes, generating 20 bytes of its output. Remember, we have the full 20-byte output of the CSS system, so we can take that output and subtract from it the 20 bytes we got from the first LFSR. If our guess for the initial state of the first LFSR is correct, what we get should be the first 20 output bytes of the second LFSR, because that's by definition how the output of CSS is computed. Now, it turns out that looking at a 20-byte sequence, it's very easy to tell whether that sequence could have come from a 25-bit LFSR or not. If it couldn't, we know our guess for the 17-bit LFSR was incorrect, and we move on to the next guess, and the next, and so on, until eventually we hit the right initial state for the 17-bit LFSR. At that point the 20 candidate bytes really are a possible output of a 25-bit LFSR, and then not only have we learned the correct initial state of the 17-bit LFSR, we've also learned the correct initial state of the 25-bit LFSR. We can then predict the remaining output of CSS, and using that we can decrypt the rest of the movie and recover the remaining plaintext. I went through this a little quickly, but hopefully it was clear; we'll also do a homework exercise on this type of stream cipher, and you'll get a feel for how these attack algorithms work.
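The attack just described can be sketched on a scaled-down toy model. To keep the search instant I use 9- and 13-bit registers instead of 17 and 25, made-up tap positions, and I drop the carry between bytes; none of these simplifications change the structure of the attack. The consistency check uses the fact that the first n output bits of an n-bit LFSR are simply its initial state:

```python
def lfsr_bytes(state, taps, n, nbytes):
    """Run an n-bit Fibonacci LFSR and return `nbytes` output bytes (MSB-first)."""
    mask = (1 << n) - 1
    out = []
    for _ in range(nbytes):
        byte = 0
        for _ in range(8):
            bit = (state >> (n - 1)) & 1
            fb = 0
            for t in taps:
                fb ^= (state >> t) & 1
            state = ((state << 1) | fb) & mask
            byte = (byte << 1) | bit
        out.append(byte)
    return out

N1, TAPS1 = 9, (8, 4)           # small stand-in for CSS's 17-bit LFSR
N2, TAPS2 = 13, (12, 3, 2, 0)   # small stand-in for CSS's 25-bit LFSR

def css_keystream(s1, s2, nbytes):
    """Toy CSS: add the two LFSR byte streams modulo 256 (carry omitted)."""
    a = lfsr_bytes(s1, TAPS1, N1, nbytes)
    b = lfsr_bytes(s2, TAPS2, N2, nbytes)
    return [(x + y) % 256 for x, y in zip(a, b)]

# Known-plaintext step: XORing the ciphertext with the known MPEG header
# would reveal these keystream bytes; here we just generate them directly.
secret1, secret2 = 0b101101011, 0b1011001110101
ks = css_keystream(secret1, secret2, 8)

def attack(ks):
    for guess in range(1, 1 << N1):            # 2^9 guesses instead of 2^17
        a = lfsr_bytes(guess, TAPS1, N1, len(ks))
        cand = [(k - x) % 256 for k, x in zip(ks, a)]   # candidate LFSR-2 bytes
        # First N2 output bits of the second LFSR are its initial state,
        # so read a candidate state straight off the candidate bytes:
        bits = (cand[0] << 8) | cand[1]
        state2 = bits >> (16 - N2)
        if lfsr_bytes(state2, TAPS2, N2, len(ks)) == cand:
            return guess, state2               # consistent: both states found
    return None

assert attack(ks) == (secret1, secret2)
```

The real attack is identical in shape: 2^17 guesses for the 17-bit register, a subtraction, and a "could this have come from a 25-bit LFSR?" check.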
I should mention that there are many open-source systems that use exactly this method to decrypt CSS-encrypted data. Okay, so now that we've seen two weak examples, let's move on to better ones, and in particular to the better pseudorandom generators that come from what's called the eSTREAM project. This is a project that concluded in 2008 and qualified five different stream ciphers; here I want to present just one. First of all, the parameters for these stream ciphers are a little different from what we're used to. As usual they have a seed, but in addition they also take what's called a nonce, and of course they produce a very large output: n here is much, much bigger than s. When I say nonce, I mean a value that never repeats as long as the key is fixed; I'll explain in more detail in just a second, but for now just think of it as a unique value that never repeats for a given key. Given such a PRG, you get a stream cipher just as before, except that the PRG now takes as input both the key and the nonce, and the property is that the pair (k, r), the key together with the nonce, is never used more than once. The bottom line is that you can reuse the key, because the nonce makes the pair unique. This nonce is a cute trick that saves us the trouble of moving to a new key every time.

The particular eSTREAM example I want to show you is called Salsa20. It's a stream cipher designed for both software and hardware implementations. It's kind of interesting: some stream ciphers are designed for software, like RC4, where everything is designed to make a software implementation run fast, whereas others are designed for hardware, like CSS, which uses LFSRs particularly to make hardware implementations very cheap. The nice thing about Salsa20 is that it's designed to be both easy to implement in hardware and very fast in software. So let me explain how Salsa20 works. Salsa20 takes either 128- or 256-bit keys; I'll only explain the 128-bit version. The key is the seed, it also takes a nonce, which happens to be 64 bits, and it generates a large output.

How does it actually work? The function is defined as follows: given the key and the nonce, it generates a pseudorandom sequence as long as necessary by using a function that I'll denote H. This H takes three inputs: the seed k, the nonce r, and a counter i that increments from step to step, going 0, 1, 2, 3, 4, for as long as we need. By evaluating H on (k, r) with this incrementing counter, we get a sequence as long as we want. So all I have to do is describe how H works. We start by expanding the inputs into a larger state, 64 bytes long, as follows: we put a constant tau_0 at the beginning, a 4-byte constant whose value the spec gives; then the key k, which is 16 bytes; then another 4-byte constant, again specified in the spec; then the nonce, which is 8 bytes; then the index, the counter 0, 1, 2, 3, 4, which is another 8 bytes; then another 4-byte constant tau_2; then the key again, another 16 bytes; and finally a third constant tau_3, another 4 bytes. If you sum these up, you get 64 bytes; notice the key is repeated twice. Then we apply a function I'll call little h. This function is one-to-one: it maps 64 bytes to 64 bytes and is completely invertible, so given the input you can compute the output, and given the output you can go back to the input. It's designed specifically so that (a) it's easy to implement in hardware, and (b) on an x86 it's extremely fast, because the SSE2 instruction set supports all the operations the function needs; as a result Salsa20 is a very fast stream cipher. Then we apply this function h again, getting another 64 bytes, and so on: the whole thing repeats 10 times, so we apply h 10 times. By itself this is not quite enough to look random, because as I said h is completely invertible: given the final output it's very easy to invert h, go back to the original input, and test that the input has the right structure. So we do one more thing, which is to combine the input with the final output. It's actually not an XOR but an addition, done word by word: the 64 bytes are added four bytes at a time. Finally we get the 64-byte output, and that's the whole generator: that's the function little h applied ten times plus the final addition, and the whole construction is the function big H. Then you evaluate H with the counter i incrementing from 0, 1, 2, 3 onward, and that gives you a pseudorandom sequence as long as you need.
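The 64-byte input block just described is easy to write down. In the Salsa20 spec, the four constants for 16-byte keys are the ASCII bytes of the phrase "expand 16-byte k" split into four words, and the counter is little-endian; still, treat this as a sketch of the layout rather than a reference implementation:

```python
# The spec's constants for the 128-bit-key variant ("expand 16-byte k"):
TAU0, TAU1, TAU2, TAU3 = b"expa", b"nd 1", b"6-by", b"te k"

def salsa_input_block(key16: bytes, nonce8: bytes, counter: int) -> bytes:
    """Assemble the 64-byte block: tau0 | k | tau1 | nonce | counter | tau2 | k | tau3."""
    assert len(key16) == 16 and len(nonce8) == 8
    block = (TAU0 + key16 + TAU1 + nonce8
             + counter.to_bytes(8, "little") + TAU2 + key16 + TAU3)
    assert len(block) == 64   # 4 + 16 + 4 + 8 + 8 + 4 + 16 + 4
    return block

block0 = salsa_input_block(bytes(16), bytes(8), 0)
```

The rest of the generator (the ten applications of the invertible round function h, followed by the word-by-word addition of the input block) operates on this 64-byte state.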
There are no significant attacks on this construction; it has security very close to 2^128 (we'll see what that means more precisely later on), it's a very fast stream cipher in both hardware and software, and as far as we can tell it's unpredictable, as required of a stream cipher. I should say that the eSTREAM project actually has five stream ciphers like this; I only chose Salsa20 because I think it's the most elegant. I can give you some performance numbers, measured on a 2.2 GHz x86-type machine. You can see that RC4 is actually the slowest, essentially because it doesn't take advantage of the hardware: it only does byte operations, so a lot of cycles are wasted. The eSTREAM finalists, Salsa20 and another finalist called Sosemanuk (these are stream ciphers approved by the eSTREAM project), achieve significant rates: 643 megabytes per second on this architecture, more than enough for a movie, and quite impressive rates overall. So now you've seen examples of two old stream ciphers that shouldn't be used, including attacks on them; you've seen what modern stream ciphers with a nonce look like; and you've seen the performance numbers for these modern stream ciphers. If you happen to need a stream cipher, use one of the eSTREAM finalists, in particular something like Salsa20. In the next few segments we will change gears a little bit and talk
about the definition of a prg this definition is a really good way to think of a prg and we will see many applications for this definition So consider a prg with key space k that outputs n bit strings our goal is to define what does it mean for the output of the generator to be indistinguishable from random in other words we're going to define a distribution that basically is defined by choosing a random key in the key space remember that arrow with r above it means choosing uniformly from the set script k and then We
output basically the output of the generator and what we'd like to say is that this distribution this distribution of pseudorandom strings is indistinguishable from a truly uniform distribution in other words if we just choose a truly uniform string in 0 1 to the n and simply output this string we'd like to say that these two distributions are indistinguishable from one another Now if you think about it this sounds really surprising because if we draw a circle here of all possible strings in 0 1 to the n then the uniform distribution basically can output any of
these strings with equal probability that's the definition of the uniform distribution however a pseudo-random distribution generated by this generator g Because the seed space is so small the set of possible outputs is really really small it's tiny inside of zero one to the n and this is really all that the generator can output and yet what we're arguing is that an adversary who looks at the output of the generator in this tiny set can't distinguish it from the output of the uniform distribution over the entire set okay that's the Property that we're actually shooting for
so to understand how to define this concept of indistinguishability from random we need the concept of a statistical test so let me define what a statistical test on 0 1 to the n is i'm going to define the statistical tests by the letter a and a statistical test is basically an algorithm that takes as input an n bit string and Simply outputs 0 or 1. now i'll tell you that 0 we're going to think of it as though the statistical test said the input you gave me is not random and 1 we're going to think
of it as saying that the input you gave me actually is random okay so all the statistical test does is it basically takes the input x that was given to us the n-bit string that was given to it and decides whether It looks random or doesn't look random let's look at a couple of examples so the first example basically will use the fact that for a random string the number of zeros is roughly equal to the number of ones in that string in other words the statistical test is going to say one if and only
if basically the number of zeros in the given string x Minus the number of ones in the given string x is these two numbers are not too far apart in other words the difference between the number of zeros and the number of ones let's just say is less than 10 times square root of n okay so if the difference is less than 10 times square of n the statistical test will say hey the string x looks random If the difference happens to be much bigger than 10 times square root of n that starts to look
suspicious and then the statistical test will say hey the string you gave me does not look random okay so this is our first example of a statistical test let's look at another similar example we'll say here the statistical test will say one if and only if say the number of times that we have two Consecutive zeros inside of x well let's think about this for a second this basically again counts in the string of n bits it counts the number of times that we see the pattern zero zero two consecutive zeros well for a random
string we will expect to see zero zero with probability one fourth and therefore in a random string we'll expect about N over four zero zeros yeah n over four blocks of zero zero and so what the statistical test will do is it will say well if the number of zero zeros is roughly n over four in other words the difference between the number and n over 4 is say less than 10 square root of n then we will say that x looks random and if the gap is much bigger than n over 4 will say
hey this string doesn't really Look random and then the statistical test will output 0. okay so here are two examples of statistical tests that basically for random strings they will output one with very high probability but for strings that you know don't look random for example think of the all zero string for the all zero string neither one of these tests will output a one and in fact the o zero string does Not look random let's look at one more example of a statistical test just to kind of show you that basically statistical tests can
pretty much do whatever they want so here's a third example let's say that the statistical test outputs one if and only if let's say the biggest block so we'll call this the maximum run of 0 inside of the string x This is basically the longest sequence of zeros inside of the string x in a random string you expect the longest sequence of zeros to be roughly of length log n so we'll say if the longest sequence of 0 happens to be less than 10 times log n then this test will say that x was random
but if all of a sudden we see a run of zeros that say is much bigger than 10 log n then the statistical test will say the string is not random okay so this is another crazy thing that a statistical test can do by the way you notice that if you give this test the all one string so 1 1 1 1 1 this test will also output one in other words this test will think that the all one string is random even though it's not yeah even though the all one string is not particularly random
okay so statistical tests don't have to get things right they can do whatever they like they can decide to output zero or one random or not however they like and similarly there are many many other statistical tests there are literally hundreds of statistical tests that one can think of and i can tell you that in the old days basically the way you would define that something looks random is you would say hey here's a battery of statistical tests and all of them said that the string looks random therefore
we say that this generator the generator of the string is a good generator but it turns out that this definition that uses a fixed set of statistical tests is actually not a good definition for security or more generally for crypto but before we talk about actually defining security the next thing we'll talk about is how do we evaluate whether a statistical test is good or not so to do that we define the concept of advantage and so let me define the advantage so here we have a generator that outputs n bit strings and we have a statistical
test on n bit strings and we define the advantage i'll denote it by advantage sub prg the advantage of the statistical test a relative to the generator g i'll define it as follows it's basically the difference between two quantities the first quantity is basically we ask how likely is this statistical test to output one when we give it a pseudorandom string okay so here k is chosen uniformly from the seed space we ask how likely is the statistical test to output 1 when we give it a pseudo-random output generated by the generator
versus now we ask how likely is the statistical test to output 1 when we give it a truly random string so here r is truly random in 0 1 to the n okay and we look at the difference between these two quantities now you realize because this is a difference of probabilities taken in absolute value this advantage is always going to lie in the interval zero to one so let's think a little bit about what this advantage actually means so first of all if the advantage happens to be close to one well what does that mean that means that somehow
the statistical test a behaved differently when we gave it pseudorandom inputs that is the output of the generator from when we gave it truly random inputs right it somehow behaved differently in one case it output one with a certain probability and in the other case it output one with a very different probability okay so somehow it was able to behave differently and what that really means is that the statistical test can basically distinguish the output of the generator from random okay so in some sense we'll say that the statistical test broke the generator
g because it was able to distinguish the output from random however if the advantage is close to zero well what does that mean that means that basically the statistical test behaves pretty much the same on pseudorandom inputs as it does on truly random inputs and basically there we would say that a could not distinguish the generator from random okay so this gives you a little bit of intuition about why this concept of advantage is important it basically tells us whether a was able to break the generator namely distinguish it from random or not able to
break it so let's look first of all at a very silly example suppose we have a statistical test a that simply ignores its input and always outputs zero what do you think is the advantage of this statistical test relative to a generator g so i hope everybody said the advantage is zero let me just explain why that's the case well if the statistical test always outputs zero that means that when we give it a pseudo-random input it will never output one so the probability that it outputs one is zero similarly when we give
a truly random input it still will never output one and so the probability that it outputs one is zero and so zero minus zero is zero so its advantage is zero so basically a statistical test that ignores its inputs is not able to distinguish a truly random input from a pseudorandom input obviously okay so now let's look at a more interesting example so suppose we have a generator g that satisfies a funny property it so happens that for two thirds of the keys the first bit of the output of the generator happens to be one
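As an aside, a property like this is easy to simulate, and the advantage defined a moment ago can be estimated numerically by sampling. Here is a hedged Python sketch: the generator below is a hypothetical stand-in that merely has the stated two-thirds first-bit bias, and the statistical test simply outputs the first bit, previewing the calculation worked out in the next few sentences:

```python
import random

def biased_gen(key, n=8):
    # Hypothetical toy generator: for exactly 2/3 of keys the first
    # output bit is 1; the remaining bits are filled pseudorandomly.
    first = 1 if key % 3 != 0 else 0
    rest = random.Random(key).choices([0, 1], k=n - 1)
    return [first] + rest

def msb_test(x):
    # Statistical test A: output 1 iff the most significant bit is 1.
    return x[0]

# Estimate Adv[A, G] = | Pr[A(G(k)) = 1] - Pr[A(r) = 1] | by sampling.
rng = random.Random(0)
trials = 100_000
# keys drawn uniformly from a range divisible by 3, so Pr[key % 3 != 0] = 2/3
p_pseudo = sum(msb_test(biased_gen(rng.randrange(3_000_000)))
               for _ in range(trials)) / trials
p_random = sum(rng.randrange(2) for _ in range(trials)) / trials  # msb of a truly random string
adv = abs(p_pseudo - p_random)  # expect roughly 2/3 - 1/2 = 1/6
```

The sampled `adv` comes out close to 1/6, matching the analysis that follows.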
okay so if i choose a random key then with probability two-thirds the generator will output one as its first bit okay so that's the property of the generator that we're looking at now let's look at the following statistical test the statistical test basically says if the most significant bit of the string you gave me is 1 i'm going to say 1 meaning i think it's random if the most significant bit of the string you gave me is not one namely it's zero i'm going to say zero okay so now my question to you is what is
the advantage of the statistical test on the generator g okay so remember i just wrote down the definition here again and i'll let you think about this for a second so let me explain suppose we give the statistical test pseudorandom inputs by definition of g we know that with probability two thirds the input will start with the bit one but if it starts with the bit one then the statistical test will output one in other words the probability that the statistical test outputs one is exactly two-thirds now let's look at the
case of a random string if i give you a random string how likely is it that the most significant bit of the random string is one well for a random string that happens exactly half the time and so in this case the statistical test will output one with probability one-half and so the overall advantage is one-sixth and one-sixth is actually a non-negligible number that's actually a fairly large number which basically means that this a was able to distinguish the output of the generator from random we'll say that a breaks the generator g with advantage one-sixth okay which basically means
that this generator is no good it's broken okay so now that we understand what statistical tests are we can go ahead and define what is a secure pseudorandom generator so basically we say that a generator g is secure if essentially no efficient statistical test can distinguish its output from random more precisely what we'll say is that for all efficient statistical tests a it so happens that if i look at the advantage of the statistical test a relative to g this advantage is negligible okay so in other words it's very close to
zero and as a result the statistical test was not able to distinguish the output from random and that has to be true for all efficient statistical tests so this is a very pretty and elegant definition that says that a generator is secure not only if a particular battery of statistical tests says that the output looks random but in fact all efficient statistical tests will say the output looks random okay one thing i'd like to point out is that the restriction to efficient statistical tests is actually necessary if we ask that all statistical tests regardless of
whether they're efficient or not not be able to distinguish the output from random then in fact that cannot be satisfied so in other words if we took out the requirement that the test be efficient then this definition would be unsatisfiable and i'll leave this as a simple puzzle for you to think about but basically the fact is that restricting this definition to only efficient statistical tests is actually necessary for it to be satisfiable now that we have a definition the next question is can we actually construct a generator and then prove that in fact
it is a secure prg in other words prove that no efficient statistical test can distinguish its output from random and it turns out that the answer is we actually can't in fact it's not known if there are any provably secure prgs and i'll just say very briefly the reason is that if you could prove that a particular generator is secure that would actually imply that p is not equal to np and i don't want to dwell on this because i don't want to assume you guys know what p and np are but i'll
just tell you as a simple fact that in fact if p is equal to np then it's very easy to show that there are no secure prgs and so if you could prove to me that a particular prg is secure that would imply that p is not equal to np again i will leave this to you as a simple puzzle to think about but even though we can't actually rigorously prove that a particular prg is secure we still have lots and lots of heuristic candidates and we even saw some of those in the
previous segments okay now that we understand what a secure prg is i want to talk a little bit about some applications and implications of this definition and so the first thing i want to show you is that in fact a secure prg is necessarily unpredictable in a previous segment we talked about what it means for a generator to be unpredictable and we said that basically what that means is that given a prefix of the output of the generator it's impossible to predict the next bit of the output okay so we'd like to show that if
a generator is secure then necessarily it's unpredictable and the way we're going to do that is using the contrapositive that is we're going to say that if you give me a generator that is predictable then necessarily it's insecure in other words necessarily i can distinguish it from random and so let's see this is actually a very simple fact so suppose you give me a predictor in other words suppose you give me an efficient algorithm such that if i give this algorithm the
output of the generator but i give it only the first i bits of the output it's able to predict the next bit of the output in other words given the first i bits it's able to predict the i plus first bit and it does that with a certain probability so let's say if we choose a random k from the key space then clearly a dumb predictor would be able to predict the next bit with probability one half simply guess a bit and you'll be right with probability one half however this algorithm a is able to
predict the next bit with probability half plus epsilon so it's bounded away from a half and in fact we require that this be true for some non-negligible epsilon so for example epsilon equals one over a thousand would already be a dangerous predictor because it can predict the next bit given a prefix with non-negligible advantage okay so suppose we have such an algorithm let's see that we can use this algorithm to break our generator in other words to show that the generator is distinguishable from random and therefore is insecure so what we'll do is we'll define
a statistical test b as follows given a string x b will simply run algorithm a on the first i bits of the string x that it was given and statistical test b is simply going to ask was a successful in predicting the i plus first bit of the string if it was successful then it's going to output one and if it wasn't successful then it's going to output zero okay this is our statistical test let's put it in a box so we can
take it wherever we like and we can run the statistical test on any n-bit string that's given to us as input so now let's look at what happens suppose we give this statistical test a truly random string r and we ask what is the probability that the statistical test outputs one well for a truly random string the i plus first bit is totally independent of the first i bits so whatever algorithm a outputs is completely independent of what the i plus first bit of the string r is
and so whatever a outputs the probability that it's going to be equal to the random independent bit x i plus 1 is exactly one-half in other words algorithm a simply has no information about what the bit x i plus one is and so necessarily the probability that it is able to predict x i plus one is exactly one half on the other hand let's look at what happens when we give our statistical test a pseudorandom sequence okay so now we're going to run the statistical test on the
output of the generator and we ask how likely is it to output one well by definition of a we know that when we give it the first i bits of the output of the generator it's able to predict the next bit with probability at least half plus epsilon so in this case our statistical test b will output one with probability at least half plus epsilon and basically what this means is if we look at the advantage of our statistical test over the generator g it's basically the difference between this quantity and that quantity and the difference between
the two you can see is clearly at least epsilon so what this means is that if algorithm a is able to predict the next bit with advantage epsilon then algorithm b is able to distinguish the output of the generator with advantage at least epsilon okay so if a is a good predictor b is a good statistical test that breaks the generator and as we said the contrapositive of that is that if g is a secure generator then there are no good statistical tests and as a result there are no predictors okay which means that the
generator is as we said unpredictable okay so so far what we've seen is that if the generator is secure then necessarily it's impossible to predict the i plus first bit given the first i bits now there's a very elegant and remarkable theorem due to yao back in 1982 that shows that in fact the converse is also true in other words if i give you a generator that's unpredictable so you cannot predict the i plus first bit from the first i bits and that's true for all i then that generator in fact is secure okay so let me state
the theorem a little bit more precisely so here we have our generator that outputs n bit outputs the theorem says the following basically for all bit positions it's impossible to predict the i plus first bit of the output given the first i bits and that's true for all i in other words again the generator is unpredictable for all bit positions then that in fact implies that the generator is a secure prg i want to paraphrase this in english and so the way to kind of interpret this result is to say that if basically these next bit
predictors these predictors that try to predict the i plus first bit given the first i bits if they're not able to distinguish g from random then in fact no statistical test can distinguish g from random so in some sense next bit predictors are universal predictors when it comes to distinguishing things from random this theorem by the way is not too difficult to prove but there's a very elegant idea behind its proof i'm not going to do the proof here but i encourage you to think about this as a puzzle and try to kind
of prove this theorem yourself let me show you one cute implication of this theorem so let me ask you the following question suppose i give you a generator and i tell you that given the last n over two bits of the output it's easy to compute the first n over two bits of the output okay so given the last bits you can compute the first bits that's kind of the opposite of predictability right predictability means given the first bits you can produce the next bits here given the last bits you can produce the first ones and my
question to you is does that mean that the generator is predictable can you somehow from this fact still build a predictor for this generator so this is kind of a simple application of yao's theorem and let me explain to you the answer is actually yes let me explain why how do we build this predictor well actually we're not going to build it i'm just going to show you that the predictor exists well because the last n over two bits reveal the first n over two bits that necessarily means that the generator here let me write it
this way that necessarily means that g is not secure because just as we did before it's very easy to build a statistical test that will distinguish the output of g from uniform so g is not secure but if g is not secure then by yao's theorem that means that g is predictable in other words there exists some i for which given the first i bits of the output you can predict the i plus first bit of the output okay and so even though i can't quite point you to a predictor we know that a
predictor must exist so that's one cute simple application of yao's theorem now before we end this segment i want to generalize a little bit what we did and introduce some important notation that's going to be useful throughout so we're going to generalize the concept of indistinguishability from uniform to indistinguishability of two general distributions so suppose i give you two distributions p1 and p2 and we ask can these two distributions be distinguished and so we'll say that the distributions are computationally indistinguishable and we'll denote this by p1 approximately equal to
p2 this means that in polynomial time p1 cannot be distinguished from p2 and we'll say that they're indistinguishable basically just as before if for all efficient statistical tests a it so happens that if i sample from the distribution p1 and give the sample to a versus if i sample from the distribution p2 and give the sample to a then basically a behaves the same in both cases in other words the difference between these two probabilities is negligible and this has to be true for all efficient
statistical tests okay so if this is the case then we say that a couldn't distinguish the two distributions its advantage in distinguishing them is negligible and if that's true for all efficient statistical tests then we say that these two distributions are computationally indistinguishable because an efficient algorithm cannot distinguish them and just to show you how useful this notation is basically using this notation the definition of security for a prg just says that if i give you a pseudo-random distribution in other words i choose k at random and then output g of k that distribution is computationally
indistinguishable from the uniform distribution so you can see this very simple notation captures the whole definition of pseudo-random generators okay so we're going to make use of this notation in the next segment when we define what it means for a cipher to be secure my goal for the next two segments is to show you that if we use a secure prg we get a secure stream cipher the first thing we have to do is define what it means for a stream cipher to be secure so whenever we define security we always define
it in terms of what can the attacker do and what is the attacker trying to do in the context of stream ciphers remember these are only used with a one-time key and as a result the most the attacker is ever going to see is just one cipher text encrypted using the key that we're using and so we're going to limit the attacker's abilities to basically obtaining just one cipher text and in fact later on we're going to allow the attacker to do much much more but for now we're just going to give him one
cipher text and we want to define what it means for the cipher to be secure so the first definition that comes to mind is basically to say well maybe we want to require that the attacker can't actually recover the secret key okay so given the cipher text you shouldn't be able to recover the secret key but that's a terrible definition because think about the following brilliant cipher here the way we encrypt a message using a key k is we just output the message okay this is a brilliant cipher yeah of course
it doesn't do anything given a message it just outputs the message as the cipher text so this is not a particularly good encryption scheme however given the cipher text namely the message the poor attacker cannot recover the key because he doesn't know what the key is and so as a result this cipher which clearly is insecure would be considered secure under this requirement for security so this definition is no good okay just recovering the secret key is the wrong way to define security so the next thing we can try is basically to
say well maybe the attacker doesn't care about the secret key what he really cares about is the plain text so maybe it should be hard for the attacker to recover the entire plaintext but even that doesn't work because let's think about the following encryption scheme so suppose what this encryption scheme does is it takes two messages so i'm going to use two vertical lines to denote the concatenation of two messages m0 line line m1 means concatenate m0 and m1 and imagine what the scheme does is it outputs m0 in the clear and concatenates to
that the encryption of m1 perhaps even using the one time pad okay and so here the attacker is going to be given one cipher text and his goal would be to recover the entire plaintext but the poor attacker can't do that because here maybe we encrypted m1 using the one-time pad so the attacker can't actually recover m1 because we know the one-time pad is secure given just one cipher text so this construction would satisfy the definition but unfortunately this is clearly not a secure encryption scheme because we just leaked half of the plaintext m0 is
completely available to the attacker so even though he can't recover all of the plaintext he might be able to recover most of the plaintext and that's clearly insecure so of course we already know the solution to this and we talked about shannon's definition of security perfect secrecy where shannon's idea was that in fact when the attacker intercepts a cipher text he should learn absolutely no information about the plaintext he shouldn't even learn one bit about the plaintext or even be able to predict a single bit of
the plain text absolutely no information about the plain text so let's recall very briefly shannon's concept of perfect secrecy basically we said that given a cipher we said the cipher has perfect secrecy if given two messages of the same length it so happens that if we pick a random key and look at the distribution of ciphertexts then when we encrypt m0 we get exactly the same distribution as when we encrypt m1 the intuition here was that if the adversary observes the ciphertext then he doesn't know whether it came from
the distribution that results from encrypting m0 or the distribution that results from encrypting m1 as a result he can't tell whether we encrypted m0 or m1 and that's true for all messages of the same length and as a result the poor attacker doesn't really know what message was encrypted of course we already said that this definition is too strong in the sense that it requires really long keys a cipher that has short keys can't possibly satisfy this definition and in particular stream ciphers can't satisfy this definition okay so let's try to
weaken the definition a little bit and let's think back to the previous segment we can say that instead of requiring that the two distributions be absolutely identical what we can require is that the two distributions just be computationally indistinguishable in other words a poor efficient attacker cannot distinguish the two distributions even though the distributions might be very different but just given a sample from one distribution or a sample from the other distribution the attacker can't tell which distribution the sample came from it turns out this definition is actually almost right but it's
still a little too strong this still cannot be satisfied so we have to add one more constraint and that is that instead of saying that this definition should hold for all m0 m1 it needs to hold only for pairs m0 m1 that the attacker can actually exhibit okay so this actually leads us to the definition of semantic security and so again this is semantic security for a one-time key in other words when the attacker is only given one ciphertext and so the way we define semantic security is by defining two experiments okay we'll define experiment
zero and experiment one and more generally we'll think of these as experiment parenthesis b where b can be zero or one okay so the way the experiment is defined is as follows we have an adversary that's trying to break the system an adversary a that's kind of the analog of a statistical test in the world of pseudorandom generators and then the challenger does the following so really we have two challengers but the challengers are so similar that we can just describe them as a single challenger that in one case takes as input the bit set
to zero and in the other case takes as input the bit set to one now let me show you what these challengers do the first thing the challenger is going to do is pick a random key and then the adversary is going to output two messages m0 and m1 okay so this is an explicit pair of messages that the attacker wants to be challenged on and as usual we're not trying to hide the length of the messages we require that the messages be of equal length and then the challenger basically will output either
the encryption of m0 or the encryption of m1 okay so in experiment 0 the challenger will output the encryption of m0 in experiment 1 the challenger will output the encryption of m1 okay so that's the difference between the two experiments and then the adversary is trying to guess whether he was given the encryption of m0 or the encryption of m1 okay so here's a little bit of notation let's define the event wb to be the event that in experiment b the adversary outputs one okay so that's just the event that basically in experiment
zero w0 means that the adversary output one and in experiment one w1 means the adversary output one and now we can define the advantage of this adversary what is called the semantic security advantage of the adversary a against the scheme e to be the difference of the probabilities of these two events in other words we're looking at whether the adversary behaves differently when he was given the encryption of m0 from when he was given the encryption of m1 and i want to make sure this is clear so i'm going to say it one
more time so in experiment zero he was given the encryption of m0 and in experiment one he was given the encryption of m1 now we're just interested in whether the adversary output one or not in these experiments if in both experiments the adversary outputs one with the same probability that means that the adversary wasn't able to distinguish the two experiments experiment zero basically looks to the adversary the same as experiment one because in both cases it output one with the same probability however if the adversary is able to output one in one experiment with significantly different
probability than in the other experiment then the adversary was actually able to distinguish the two experiments okay so to say this more formally essentially the advantage again because it's the difference of two probabilities the advantage is a number between zero and one if the advantage is close to zero that means that the adversary was not able to distinguish experiment zero from experiment one however if the advantage is close to one that means the adversary was very well able to distinguish experiment zero from experiment one and that really means that he was able to distinguish an
encryption of m0 from an encryption of m1 okay so that's just the definition of the advantage and then the definition of security is just what you would expect basically we'll say that a symmetric encryption scheme is semantically secure if for all efficient adversaries and again i'll put efficient in quotes the advantage is negligible in other words no efficient adversary can distinguish the encryption of m0 from the encryption of m1 and basically this is what it says that for the two messages that the adversary was able to exhibit he
wasn't able to distinguish these two distributions now i want to show you that this is actually a very elegant definition it might not seem so right away but i want to show you some implications of this definition and you'll see exactly why the definition is the way it is okay so let's look at some examples so the first example is suppose we have a broken encryption scheme in other words suppose we have an adversary a that somehow given the cipher text is always able to deduce the least significant bit of the plaintext okay so given the
encryption of m0 this adversary is able to deduce the least significant bit of m0 so that's a terrible encryption scheme because it basically leaks the least significant bit of the plaintext just given the cipher text so i want to show you that the scheme is therefore not semantically secure and that kind of shows that if a system is semantically secure then there is no attacker of this type okay so let's see why the system is not semantically secure well what we're going to do is basically use our adversary who's able
to learn the least significant bit we're going to use him to break semantic security so we're going to use him to distinguish experiment zero from experiment one okay so here's what we're going to do we're going to build an algorithm b and this algorithm b is going to use algorithm a in its belly okay so the first thing that's going to happen is of course the challenger is going to choose a random key then we need to output two messages the messages that we're going to output
basically are going to have different least significant bits so one message is going to end with 0 and one message is going to end with 1. now what is the challenger going to do the challenger is going to give us the encryption of either m0 or m1 depending on whether we're in experiment zero or in experiment one and then we just forward this cipher text to the adversary okay so now what's the property of adversary a given the ciphertext adversary a can tell us what the least significant bit of the plaintext is in other words
the adversary is going to output the least significant bit of m0 or m1 but lo and behold that's basically the bit b and then we're just going to output that as our guess so let's call this thing b prime okay so now this describes the semantic security adversary and now you tell me what is the semantic security advantage of this adversary well let's see in experiment zero what's the probability that adversary b outputs one well in experiment zero it's always given the encryption of m0 and therefore adversary a will always output the least
significant bit of m0 which happens to be zero so in experiment zero b always outputs zero and the probability that it outputs one is zero however in experiment one we're given the encryption of m1 so how likely is adversary b to output one in experiment one well it always outputs one again by the properties of algorithm a and therefore the advantage basically is one so this is a huge advantage it's as big as it's going to get which means that this adversary completely broke the system okay so under semantic security basically just deducing the least
significant bits is enough to completely break the system under a semantic Security definition okay now the interesting thing here of course is that this is not just about uh the least significant bit in fact take any predicate of the message for example the most significant bits you know bit number seven maybe the xor of all the bits in the message and so on and so forth any kind of information any bits about the plain Text that can be learned basically would mean that the system is not semantically secure so basically all the adversary i have
to do would be to come up with two messages m0 and m1 such that under one thing that a learns it's the value zero and another thing the value is one so for example if a was able to learn the xor of all the bits of the message Then m0 and m1 would just have different xors for all the bits of their messages and then this adversary a would also be sufficient to break semantic security okay so basically if a cypher is semantically secure then no bit of information is revealed to an efficient adversary okay
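the argument above can be sketched as a tiny python experiment note that the leaky cipher and the lsb-recovering algorithm a below are toy stand-ins i made up just to make the game runnable they are not the lecture's formal definitions

```python
import os

# toy cipher that (deliberately) leaks the least significant bit of the
# plaintext in the clear -- a stand-in for any scheme algorithm a can attack
def leaky_encrypt(key: bytes, msg: bytes) -> bytes:
    ct = bytearray(k ^ m for k, m in zip(key, msg))
    ct[-1] = (ct[-1] & 0xFE) | (msg[-1] & 1)  # lsb of the plaintext leaks
    return bytes(ct)

# hypothetical algorithm a: recovers the lsb of the plaintext from the ciphertext
def algorithm_a(ct: bytes) -> int:
    return ct[-1] & 1

# the semantic security adversary b built from a: it picks two messages that
# differ only in their least significant bit and outputs whatever a outputs
def adversary_b(experiment: int) -> int:
    m0 = bytes(8)               # eight zero bytes, lsb 0
    m1 = bytes(7) + bytes([1])  # same length, lsb 1
    key = os.urandom(8)         # challenger picks a random key
    ct = leaky_encrypt(key, m1 if experiment else m0)
    return algorithm_a(ct)      # b' equals the lsb of m_b, which is b

# in experiment 0 b always outputs 0 and in experiment 1 it always outputs 1,
# so |Pr[b'=1 | exp 1] - Pr[b'=1 | exp 0]| = 1, the largest possible advantage
assert all(adversary_b(0) == 0 and adversary_b(1) == 1 for _ in range(100))
```

the point of the sketch is only the shape of the reduction any algorithm that computes some bit of the plaintext from the ciphertext slots in where algorithm_a is and yields advantage one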
so this is exactly the concept of perfect secrecy only applied to efficient adversaries rather than all adversaries so the next thing i want to show you is that the one-time pad is in fact semantically secure indeed it's more than that it's perfectly secure so let's see why that's true well so again it's one of these experiments so suppose we have an adversary that claims to break semantic security of the one-time pad the first thing the adversary is going to do is output two messages m0 and m1 of the same length now what does he get back as a challenge as a challenge he gets either the encryption of m0 or the encryption of m1 under the one-time pad and he's trying to distinguish between those two possible ciphertexts that he gets right in experiment zero he gets the encryption of m0 and in experiment one he gets the encryption of m1 well so let me ask you what is the advantage of adversary a against the one-time pad so remember the property of the one-time pad is that k xor m0 is distributed identically to k xor m1 in other words these two distributions are absolutely identical this is a property of xor if we xor the random pad k with anything either m0 or m1 we get the uniform distribution so in both cases algorithm a is given as input exactly the same distribution namely the uniform distribution on ciphertexts and therefore it's going to behave exactly the same in both cases because it was given the same distribution as input and as a result its advantage is zero which means that the one-time pad is semantically secure now the interesting thing here is not only is it semantically secure it's semantically secure for all adversaries we don't even have to restrict the adversaries to be efficient no adversary it doesn't matter how smart you are no adversary will be able to distinguish k xor m0 from k xor m1 because the two distributions are identical okay so the one-time pad is semantically secure okay so that completes our definition of semantic security and the next thing we're going to do is prove that a secure prg in fact implies that the stream cipher is semantically
secure so now that we understand what a secure prg is and we understand what semantic security means we can actually argue that a stream cipher with a secure prg is in fact semantically secure so that's our goal for this segment it's a fairly straightforward proof and we'll see how it goes so the theorem we want to prove is that given a generator g that happens to be a secure pseudorandom generator the stream cipher that's derived from this generator is going to be semantically secure okay and i want to emphasize that there was no hope of proving a theorem like this for perfect secrecy for shannon's concept of perfect secrecy because we know that a stream cipher cannot be perfectly secure it has short keys and perfect secrecy requires keys to be as long as the message so this is really the first example that we see where we're able to prove that a cipher with short keys has security the concept of security is semantic security and this actually validates that this is a very useful concept and in fact we'll be using semantic security many many times throughout the course okay so how do we prove a theorem like this what we're actually going to be doing is proving the contrapositive what we're going to show is the following we're going to prove this statement down here but let me parse it for you suppose you give me a semantic security adversary a what we'll do is we'll build a prg adversary b that satisfies this inequality here now why is this inequality useful basically what do we know we know that if b is an efficient adversary then since g is a secure generator this advantage is negligible right a secure generator has a negligible advantage against any efficient statistical test so the right hand side basically is going to be negligible but because the right hand side is negligible we can deduce that the left hand side is negligible and therefore the adversary a actually has negligible advantage in attacking the stream cipher e okay so this is how this will work basically all we have to do is given an adversary a we're going to build an adversary b we know that b has negligible advantage against the generator but that implies that a has negligible advantage against the stream cipher okay so let's do that so all we have to do again is given a we have to build b so let a be a semantic security adversary against the stream cipher so let me remind you what that means basically there's this challenger the challenger starts off by choosing a key k and then the adversary is going to output two equal length messages and he's going to receive the encryption of m0 or
m1 and then he outputs b prime okay that's what a semantic security adversary is going to do so now we're going to start playing games with this adversary and that's how we're going to prove our lemma all right so the first thing we're going to do is we're going to make the challenger also choose a random r okay a random string r so well you know the adversary doesn't really care what the challenger does internally the challenger never uses r so this doesn't affect the adversary's advantage at all the adversary just doesn't care that the challenger also picks r but now comes the trick what we're going to do is instead of encrypting using g of k we're going to encrypt using r and you can see basically what we're doing here essentially we're changing the challenger so now the challenge ciphertext is encrypted using a truly random pad as opposed to the pseudorandom pad g of k okay now the property of the pseudorandom generator is that its output is indistinguishable from truly random so because the prg is secure the adversary can't tell that we made this change the adversary can't tell that we switched from a pseudorandom string to a truly random string again because the generator is secure well but now look at the game that we ended up with so the adversary's advantage couldn't have changed by much because he can't tell the difference but now look at the game that we ended up with this game is truly a one-time pad game this is a semantic security game against the one-time pad because now the adversary is getting a one-time pad encryption of m0 or m1 but in the one-time pad we know that the adversary's advantage is zero because you can't beat the one-time pad the one-time pad is unconditionally secure and as a result essentially since the adversary couldn't tell the difference when we moved from pseudorandom to random but he couldn't win the random game that also means that he couldn't win the pseudorandom game and as a result the original stream cipher must be secure so that's the intuition for how the proof is going to go but i want to do it rigorously once from now on we're just going to argue by playing
games with a challenger and we won't be doing things as formally as i'm going to do next but i want to do it formally and precisely once just so that you see how these proofs actually work okay so i'm going to introduce some notation and i'll do the usual notation basically in the original semantic security game when we're actually using a pseudorandom pad i'm going to use w0 and w1 to denote the event that the adversary outputs 1 when it gets the encryption of m0 or the encryption of m1 respectively okay so w0 corresponds to outputting 1 when receiving the encryption of m0 and w1 corresponds to outputting 1 when receiving the encryption of m1 so that's just as in the standard definition of semantic security now once we flipped to the random pad i'm going to use r0 and r1 to denote the event that the adversary outputs 1 when receiving the one-time pad encryption of m0 or the one-time pad encryption of m1 so we have four events w0 and w1 from the original semantic security game and r0 and r1 from the semantic security game once we switched over to the one-time pad so now let's look at relations between these events so first of all r0 and r1 are basically events from a semantic security game against the one-time pad so the difference between these probabilities is as we said basically the advantage of adversary a against the one-time pad which we know is zero okay so that's great so that basically means that the probability of r0 is equal to the probability of r1 so now let's put these events on a line on a line segment between zero and one so here are the events w0 and w1 are the events we're interested in we want to show that these two are close okay and i should say here is the probability of r0 and r1 and since they're both the same i just put them in the same place what we're going to do is we're going to show that both w0 and w1 are actually close to the probability of rb and as a result they must be close to one another okay so the way we do that
is using a second claim so now we're interested in the distance between the probability of wb and the probability of rb okay so we'll prove the claim in a second let me just state the claim the claim says that there exists an adversary b such that the difference between these two probabilities is basically the advantage of b against the generator g and this holds for both values of b okay so given these two claims the theorem is done because basically what do we know we know that this distance is equal to the advantage of b against g that's from claim two and similarly this distance is also equal to the advantage of b against g and as a result you can see that the distance between w0 and w1 is at most twice the advantage of b against g which is basically the theorem we're trying to prove okay so the only thing that remains is just proving claim two and if you think about what claim two says it basically captures the question of what happens in experiment zero when we replace the pseudorandom pad g of k by a truly random pad r right here in experiment zero we're using a pseudorandom pad and here in experiment zero we're using a truly random pad and we're asking can the adversary tell the difference between these two and we want to argue that he cannot because the generator is secure okay so here's what we're going to do so let's prove claim two so we want to argue that there is in fact a prg adversary b that has exactly the difference of the two probabilities as its advantage okay
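here's a small python sketch of this reduction the names the toy prg and the toy adversary below are mine not the lecture's the point is just the plumbing statistical test b feeds its input string y to adversary a as the pad so when y is truly random a is playing the one-time pad game event r0 and when y is g of k a is playing the real stream cipher game event w0

```python
import os
import random

N = 8  # pad length in bytes

# toy prg stand-in (NOT a secure generator, just to make the game runnable):
# deterministically expands a seed into an n-byte pad
def G(seed: bytes) -> bytes:
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(N))

# a hypothetical semantic security adversary a, modeled as two callbacks:
# choose() outputs (m0, m1) and guess(ct) outputs a bit
class AdversaryA:
    def choose(self):
        return bytes(N), bytes([0xFF] * N)
    def guess(self, ct: bytes) -> int:
        return ct[0] & 1  # some arbitrary rule based on the ciphertext

# statistical test b: given an n-byte string y, run a, answer its challenge
# with m0 xor y, and output whatever bit a outputs
def statistical_test_b(y: bytes, a: AdversaryA) -> int:
    m0, _m1 = a.choose()
    ct = bytes(mi ^ yi for mi, yi in zip(m0, y))
    return a.guess(ct)

# when y is truly random, b's output bit is the event r0;
# when y = G(k), b's output bit is the event w0 -- so b's advantage
# against G is |Pr[w0] - Pr[r0]|, which is exactly claim two
trials = 3000
p_r0 = sum(statistical_test_b(os.urandom(N), AdversaryA()) for _ in range(trials)) / trials
p_w0 = sum(statistical_test_b(G(os.urandom(16)), AdversaryA()) for _ in range(trials)) / trials
print(abs(p_w0 - p_r0))  # small for this toy a, which can't tell the pads apart
```

any semantic security adversary a slots into the same wrapper unchanged which is why a single b works for whatever a you hand me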
and the point is since this advantage is negligible this difference is negligible and that's basically what we wanted to prove okay so let's look at the statistical test b so what is statistical test b it's going to use adversary a in its belly so we get to build statistical test b however we want as we said it's going to use adversary a inside of it for its operation and it's a regular statistical test so it takes an n-bit string as input and it's supposed to output you know random or not random zero or one okay so
let's see so the first thing it's gonna do is run adversary a and adversary a is gonna output two messages m0 and m1 and then what adversary b is gonna do is respond with m0 xor the string that it was given as input all right that's the statistical test then whatever a outputs b is going to output as its own output and now let's look at its advantage so what can we say about the advantage of this statistical test against the generator well so by definition it's the probability that b outputs 1 when we choose a truly random string so here r is in 0 1 to the n minus the probability that b outputs 1 when we choose a pseudorandom string okay but let's think about what this is so what can you tell me about the first expression what can you tell me about this expression over here well by definition if you think about what's going on here that's exactly the probability of r0 because in this game that we're playing with the adversary he outputs m0 and m1 and he got the encryption of m0 under a truly random one-time pad okay so this is basically the probability of r0 now what can we say about the next expression well what happens when b is given a pseudorandom string y as input well in that case this is exactly experiment zero in the real stream cipher game because now we're computing m0 xor g of k so this is exactly w0 okay that's exactly what we had to prove so it's kind of a trivial proof okay so that completes the proof of claim two and again just to make sure this is all clear once we have claim two we know that w0 must be close to w1 and that's the theorem that's what we had to prove okay so now we've established that a stream cipher is in fact semantically secure assuming that the prg is secure before we start with the technical material i want to tell you a little bit
about the history of cryptography there's a beautiful book on this topic by david kahn called the codebreakers that covers the history of cryptography all the way from the babylonian era to the present here i'm just going to give you a few examples of historical ciphers all of which are badly broken so to talk about ciphers the first thing i'm going to do is introduce our friends alice and bob who are going to be with us for the rest of the quarter so alice and bob are trying to communicate securely and there is an attacker who's trying to eavesdrop on their conversation so to communicate securely they're going to share a secret key which i'll denote by k they both know the secret key but the attacker does not know anything about this key k so now they're going to use a cipher which is a pair of algorithms an encryption algorithm denoted by e and a decryption algorithm denoted by d these algorithms work as follows the encryption algorithm e takes the message m as input and it takes as input the key k i'm going to put a wedge above the key input just to denote the fact that this input is really the key input and then it outputs a ciphertext which is the encryption of the message m using the key k i'm always going to write the key first and when i write colon equals what i mean is that the expression defines what the variable c stands for now the ciphertext is transmitted over the internet to bob somehow actually it could be transmitted over the internet or it could be transmitted using an encrypted file system it doesn't really matter but when the ciphertext reaches bob he can plug it into the decryption algorithm and give the decryption algorithm the same key k again i'm going to put a wedge here as well to denote the key input and the decryption algorithm outputs the original plaintext message now the reason we say that these are symmetric ciphers is that both the encryptor and decryptor actually use the same key as we'll see later in the course there are ciphers where the encryptor uses one key and the decryptor uses a different key but here we're just going to focus on
symmetric ciphers where both sides use the same key okay so let me give you a few historic examples of ciphers the first and simplest example is called a substitution cipher i'm sure all of you played with substitution ciphers when you were in kindergarten basically a key for a substitution cipher is a substitution table that says how to map letters so here for example the letter a would be mapped to c the letter b would be mapped to w the letter c would be mapped to n and so on and so forth and then the letter z would be mapped say to a so this is one example of a key for a substitution cipher so just to practice the notation we introduced before let's compute the encryption of a certain message using this key let's say the message is b c z a the encryption of this message using this key is done by substituting one letter at a time so b becomes w c becomes n z becomes a and a becomes c so the encryption of b c z a is w n a c and this defines the ciphertext similarly we can decrypt the ciphertext using the same key and of course we'll get back the original message okay so just for historical reasons here's one example of something that's related to the substitution cipher called the caesar cipher the caesar cipher actually is not really a cipher at all and the reason is that it doesn't have a key what the caesar cipher is is basically a substitution cipher where the substitution is fixed namely it's a shift by 3 so a becomes d b becomes e c becomes f and so on and so forth y becomes b and z becomes c it's a fixed substitution that's applied to all plaintext messages so again this is not a cipher because there is no key the key is fixed so if an attacker knows how our encryption scheme works he can easily decrypt the message the key is not random and therefore decryption is very easy once you understand how the scheme actually works okay so now let's go back to the substitution cipher where the keys really are chosen at random these substitution tables are chosen at random and let's see how we break it the substitution cipher turns
out to be very easy to break the first question is how big is the key space how many different keys are there assuming we have 26 letters so i hope all of you said that the number of keys is 26 factorial because a substitution key is simply a table a permutation of all 26 letters and the number of permutations of 26 letters is 26 factorial if you calculate this out 26 factorial is about 2 to the 88 which means that describing a key in a substitution cipher takes about 88 bits so each key is represented by about 88 bits now this is a perfectly fine size for a key space in fact we're going to be seeing ciphers that are adequately secure with key spaces that are roughly of this size however even though the substitution cipher has a large key space of size 2 to the 88 it's still terribly insecure so let's see how to break it and to break it we're going to be using letter frequencies so the first question is what is the most frequent letter in english text so i imagine
all of you know that in fact e is the most common letter and if we make that quantitative that's going to help us break a substitution cipher so just given the ciphertext we can already recover the plaintext the way we do that is first of all using frequencies of english letters so here's how this works you give me a message encrypted using a substitution cipher all i know is that the plaintext is in english and i know that the letter e is the most frequent letter in english and in fact it appears 12.7 percent of the time in standard english text so what i'll do is i'll look at the ciphertext you gave me and i'm going to count how many times every letter appears now the most common letter in the ciphertext is going to be the encryption of the letter e with very high probability so now i'm able to recover one entry in the key table namely now i know what the letter e maps to the next most common letter in english is the letter t which appears about 9.1 percent of the time so again i count how many times each letter appears in the ciphertext and the second most frequent letter is very likely to be the encryption of the letter t so now i've recovered the second entry in the key table and i can continue this way in fact the letter a is the next most common letter it appears 8.1 percent of the time so now i can guess that the third most common letter in the ciphertext is the encryption of the letter a and now i've recovered three entries in the key table well so now what do i do the remaining letters in english appear roughly the same amount of time other than some rare letters like q and x so we're kind of stuck at this point we figured out three entries in the key table but what do we do next the next idea is to use frequencies of pairs of letters sometimes these are called digrams so what i'll do is i'll count how many times each pair of letters appears in the ciphertext and i know that in english the most common pairs of letters are things like h e a n i n and i guess t h is another common pair of letters and so i know that the most common pair of letters in the ciphertext is likely to be the encryption of one of these four pairs and so by trial and error i can figure out more entries in the key table and again by more trial and error this way by looking at trigrams i can actually figure out the entire key table so the bottom line here is that in fact the substitution cipher is vulnerable to the worst possible type of attack namely a ciphertext only attack just given the ciphertext the attacker can recover the decryption key and therefore recover the original plaintext so there's really no point in encrypting anything using a substitution cipher because the attacker can easily decrypt it you might as well send your plaintext completely in the clear so now we're going to fast forward to the renaissance i guess we're moving from the roman era to the renaissance and look at a cipher
designed by a fellow named vigenere who lived in the 16th century he designed a couple of ciphers here i'm going to show you a variant of one of his ciphers this is called the vigenere cipher so in the vigenere cipher the key is a word in this case the word is crypto which has six letters in it and then to encrypt the message what you do is you write the message under the key so in this case the message is what a nice day today and then you replicate the key as many times as needed to cover the message and then the way you encrypt is basically you add the key letters to the message letters modulo 26 so just to give you an example here if you add y and a you get z if you add t and a you get u and you do this for all the letters and remember whenever you add you add modulo 26 so if you go past z you go back to a so that's the vigenere cipher and in fact decryption is just as easy as encryption basically the way you decrypt is again you would write the ciphertext under the key you would replicate the key and then you would subtract the key from the ciphertext to get the original plaintext message so breaking the vigenere cipher is actually quite easy let me show you how you do it the first thing we need to do is assume that we know the length of the key so let's just assume we know that in this case the length of the key is six and then what we do is we break the ciphertext into groups of six
letters each okay so we're going to get a bunch of groups like this each one contains six letters and then we're going to look at the first letter in each group okay so in this case we're looking at the first letter every six characters now what do we know about these letters we know that in fact they're all encrypted using the same key letter all of these are encrypted using the letter c in other words z l w is just a shift by three of the plaintext letters so if we collect all these letters then the most common letter among the set is likely to be the encryption of e right e is the most common letter in english therefore if i look at every sixth letter the most common letter in that set is likely to be the encryption of the letter e aha so let's just suppose that in fact the most common letter every sixth letter happens to be the letter h then we know that e plus the first letter of the key is equal to h that says that the first letter of the key is equal to h minus e and in fact that is the letter c so now we've recovered the first letter of the key and now we can continue doing this with the second letter so we look at the second letter in every group of six characters and again we repeat the same exercise we find the most common letter among the set and we know that this most common letter is likely the encryption of e and therefore whatever this most common letter is if we subtract e from it we're going to get the second letter of the key and so on and so forth with the third letter of every group of six characters and this way we recover the entire key and that allows us to decrypt the message now the only caveat is that i had to assume ahead of time that i know the length of the key which in this case is six but if i don't know the length of the key ahead of time that's not a problem either what i would do is i would run this decryption procedure assuming the key length is one then i'd run it assuming the key length is two then i would run it assuming the key length is three and so on and so on until finally i get a decryption that makes sense and once i do that i know that i've recovered the right length of the key and i know that i've also recovered the right key and therefore the right message okay so very very quickly you can decrypt vigenere ciphers again this is a ciphertext only
attack the interesting thing is vigenere had a good idea here this addition mod 26 is actually a good idea and we'll see that later except it's executed very poorly here and so we'll correct that a little bit later okay we're going to fast forward now from the renaissance to the 19th century where everything became electric and so people wanted to design ciphers that use electric motors in particular these ciphers are called rotor machines because they use rotors an early example is called the hebern machine which uses a single rotor here you have a picture of this machine the rotor is over here and the secret key is embedded inside of this disk which rotates by one notch every time you press a key on the typewriter okay so every time you hit a key the disk rotates by one notch now what does this key do well the key actually encodes the substitution table and therefore the disk actually is the secret key and as i said this disk encodes a substitution table in this case if you happen to press c as the first letter the output would be the letter t and then the disk would rotate by one notch after rotating by one notch the new substitution table becomes the one shown here you see that e basically moves up and then the remaining letters move down so imagine this is basically a two dimensional rendering of the disk rotating by one notch then you press the next letter and the disk rotates again you notice again n moved up and the remaining letters moved down so in particular if we hit the letter c three times the first time the output would be t the second time the output would be s and the third time the output would be k so this is how a single rotor machine works and as it turned out very quickly after it was advertised it was again broken basically using letter frequencies digram frequencies and trigram frequencies it's not that hard given enough ciphertext to directly recover the secret key and then the message again a ciphertext only attack so to kind of work against these frequency attacks
these statistical attacks these rotor machines became more and more complicated over time until finally i'm sure you've all heard of the enigma the enigma is a more complicated rotor machine it uses three four or five rotors there are different versions of the enigma machine here you see an example of the enigma machine with three rotors the secret key in the enigma machine is the initial setting of the rotors okay so in the case of three rotors there would be 26 cubed possible different keys when you type on the typewriter basically these rotors rotate at different rates oh i forgot to say this is a diagram of an enigma machine using four rotors as you type on the typewriter the rotors rotate and output the appropriate letters of the ciphertext so in this case the number of keys is 26 to the fourth which is about 2 to the 18 which is actually a relatively small key space today you can brute force search through 2 to the 18 different keys using a computer very very quickly you know my wristwatch can do it in just a few seconds i guess and so this enigma machine was using a relatively small key space but i'm sure you've all heard that the british cryptographers at bletchley park were able to mount ciphertext only attacks on the enigma machine they were able to decrypt german ciphers back in world war ii and that played an important role in many different battles during the war after the war i guess that was the end of the mechanical age and the start of the digital age where folks were using computers and as the world kind of migrated to using computers the
government realized that it's buying a lot of digital equipment from industry and so it wanted industry to use a good cipher so that when it buys equipment from industry it would be getting equipment with a decent cipher and so the government put out a request for proposals for a federal data encryption standard and we're going to talk about this effort in more detail later on in the course but in 1974 a group at ibm put together a cipher that became known as des the data encryption standard which became a federal standard for encrypting data the key space for des is 2 to the 56 which is relatively small these days but was large enough back in 1974 and another interesting thing about des is that unlike rotor machines which encrypt one character at a time the data encryption standard encrypts 64 bits at a time namely eight characters at a time and we'll see the significance of this later on in the course because des uses such a small key space these days it can be broken by a brute force search and so these days des is considered insecure unfortunately it is used in some legacy systems but it definitely is on its way out and definitely should not be used in new projects today there are new ciphers things like the advanced encryption standard which uses 128-bit keys again we'll talk about the advanced encryption standard in much more detail later on in the course there are many many other types of ciphers i mentioned salsa 20 here we'll see why in just a minute but this is all
for the quick historical survey and now we can get into the more technical material in this segment we're going to continue with a few more tools from discrete probability and i want to remind everyone that if you want to read more about this there's more information in this wikibooks article that is linked over here so first let's do a quick recap of where we are we said that discrete probability is always defined over a finite set which we're going to denote by u and typically for us u is going to be the set of all n-bit binary strings which we denote by 0 1 to the n now a probability distribution p over this universe u is basically a function that assigns to every element in the universe a weight in the interval zero to one such that if we sum the weights of all these elements the sum basically comes out to one now we said that a subset of the universe is what's called an event and we said that the probability of an event is basically the sum of all the weights of the elements in the event and we said that the probability of an event is some real number in the interval 0 to 1. and i want to remind everyone that the probability of the entire universe is basically 1 by the fact that the sum of all the weights sums up to 1. then we defined what a random variable is formally a random variable is a function from the universe to some other set but the thing that i want you to remember is that the random variable takes values in some set v and in fact a random variable defines a distribution
on this set v so the next concept we need is what's called independence and i'm going to very briefly define this if you want to read more about independence please go ahead and look at the wikibooks article but essentially we say that two events a and b are independent of one another if when you know that event a happens that Tells you nothing about whether event b actually happened or not formally the way we define independence is to say that the probability of a and b namely that both events happen is actually equal to the
probability of event a times the probability of event b so multiplication in some sense the fact that probabilities multiply under conjunction captures the fact that these events are independent and as i said if you want to read more about that please take a look at the background material now the same thing can be said for random variables so suppose we have two random variables x and y they take values in some set v then we say that these random variables are independent if the probability that x is equal to a and y is equal to b is equal to the product of these two probabilities basically what this means is even if you know that x is equal to a that tells you nothing about the value of y okay that's what this multiplication means and again this needs to hold for all a and b in the space of values of these random variables so just again to jog your memory if you've seen this before a very quick example suppose we look at the set of two bit strings so 0 0 0 1 1 0 and 1 1 and suppose we choose a random element from this set okay so we randomly choose one of these four elements with equal probability now let's define two random variables x is going to be the least significant bit that was generated and y is going to be the most significant bit that's generated so i claim that these random variables x and y are independent of one another and the way to see that intuitively is to realize that choosing r uniformly from the set of four elements is basically the same as flipping an unbiased coin twice the first bit
corresponds to the outcome of the first flip and the second bit corresponds to the outcome of the second flip and of course there are four possible outcomes all four outcomes are equally likely which is why we get the uniform distribution over two bit strings now variables x and y why are they independent basically if i tell you the result of the first flip namely i tell you the least significant bit of r so i tell you whether the first coin fell on its head or fell on its tail that tells you nothing about the result of the second flip which is why intuitively you might expect these random variables to be independent of one another but formally we would have to prove that for all zero one pairs the probability of x equals zero and y equals zero x equals one y equals one and so on these probabilities multiply let's just do it for one of these pairs so let's look at the probability that x is equal to zero and y is equal to zero well you see that the probability that x is equal to zero and y is equal to zero is basically the probability that r is equal to zero zero and what's the probability that r is equal to zero zero well by the uniform distribution that's one over the size of the set which is one-fourth in this case and lo and behold these probabilities do in fact multiply because the probability that x is equal to zero the probability that the least significant bit of r is equal to zero is exactly one half because there are exactly two elements that have their least significant bit equal to zero two out of four elements gives you a probability of one half and similarly the probability that y is equal to zero is also one half so in fact the probabilities multiply okay so that's the concept of independence and the reason i wanted to show you that is because we're going to look at an important property of xor that we're going to use again and again so before we talk about xor let me just do a very quick review of what xor is so of course xor of two bits means the addition of those bits modulo two so just to kind of make sure everybody's on the same page if we have two bits so zero zero zero one one zero and one one their xor the truth table of the xor is basically just addition modulo 2. as you can see 1 plus 1 is equal to 2 modulo 2 that's equal to 0. so this is the truth table for the xor and i'm always going to denote xor by this circle with a plus inside
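to make the truth table concrete here is a tiny python check (my own illustration, not from the lecture) that the xor of two bits really is addition modulo 2:

```python
# xor of two bits equals addition modulo 2; python's ^ operator
# computes xor on integers, so we can check all four rows of the table
for a in (0, 1):
    for b in (0, 1):
        assert a ^ b == (a + b) % 2
        print(f"{a} xor {b} = {a ^ b}")
```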
and then when i want to apply xor to bit strings i apply the addition modulo 2 operation bit wise so for example the xor of these two strings would be 1 1 0 and i guess i'll let you fill out the rest of the xors just to make sure we're all on the same page so of course it comes out to one one Zero one now we're going to be doing a lot of xoring in this class in fact there's a classical joke that the only thing cryptographers know how to do is just xor things
together but i want to explain to you why we see xor so frequently in cryptography basically xor has a very important property and the property is the following suppose we have a random variable y That's distributed arbitrarily over 0 1 to the n so we know nothing about the distribution of y but now suppose we have an independent random variable that happens to be uniformly distributed also over 0 1 to the n so it's very important that x is uniform and that it's independent of y so now let's define the random variable which is the
xor of x and y then i claim that no matter what distribution y started with this z is always going to be a uniform random variable so in other words if i take an arbitrarily malicious distribution and i xor it with an independent uniform random variable what i end up with is a uniform random variable okay this is again kind of a key property that makes xor very useful for crypto so this is actually a very simple fact to prove let's just go ahead and prove it for one bit so
for n equals one okay so the way we'll do it is we'll basically write out the probability distributions for the various random variables so let's see for the random variable y well the random variable can be either 0 Or 1. and let's say that p0 is the probability that it's equal to 0 and p1 is the probability that it's equal to 1. okay so that's one of our tables similarly we're going to have a table for the variable x well the variable x is much easier that's a uniform random variable so the probability that it's
equal to zero is exactly one-half probability that it's equal to one is also exactly One half now let's write out the probabilities for the joint distribution in other words let's see what the probability is for the various joint values of y and x in other words how likely is it that y is 0 and x is 0 y is 0 and x is 1 y is 1 and x is 0 and 1 1. well so what we do is basically because we assume the variables are independent all we have to do is multiply the Probabilities
so the probability that y is equal to 0 is p 0 probability that x is equal to 0 is one half so the probability that we get 0 0 is exactly p 0 over 2. similarly for 0 1 we'll get p 0 over 2. for 1 0 we'll get p 1 over 2 and for 1 1 again the probability that y is equal to 1 and x is equal to 1. well that's p 1 times the probability that x is equal to 1 which is a half So it's p 1 over 2. okay so those
are the four probabilities for the various options for x and y so now let's analyze what is the probability that z is equal to zero well the probability that z is equal to zero is basically the same as the probability that let's write it this way x y is equal to zero zero or x y is equal to 1 1. those are the two possible cases that z is equal to 0 because z is the xor of x and y now these events are disjoint so this expression can simply be written as the sum of the two expressions given above so basically it's the probability that x y is equal to 0 0 plus the probability that x y is equal to 1 1. so now we can simply look up these probabilities in our table so the probability that x y is equal to 0 0 is simply p 0 over 2 and the probability that x y is equal to 1 1 is simply p 1 over 2. so we get p 0 over 2 plus p 1 over 2. but what do we know about p 0 and p 1 well it's a probability distribution therefore p 0 plus p 1 must equal 1 and therefore this fraction here must equal one half p 0 plus p 1 is equal to 1 so therefore the sum of these two terms must equal one half and we're done basically we proved that the probability that z is equal to zero is one half therefore the probability that z is equal to one is also one half therefore z is a uniform random variable so this simple theorem is the main reason why xor is so useful in cryptography The
last thing i want to show you about discrete probability is what's called the birthday paradox and i'm going to do it really fast here because we're going to come back later on and talk about this in more detail so the birthday paradox says the following suppose i choose n random variables in our universe u okay and it just so happens that these variables are independent of one another they actually don't have to be uniform All we need to assume is that they're distributed in the same way the most important property though is that they're independent
of one another so the theorem says that if you choose roughly the square root of the size of u elements we're kind of ignoring this 1.2 here it doesn't really matter but if you choose square root of the size of u elements then basically there's a good Chance that there are two elements that are the same in other words if you sample about square root of u times then it's likely that two of your samples will be equal to one another and by the way i should point out that this inverted e just means exists
so there exists indices i and j such that r i is equal to r j so here's a concrete example That we'll actually see many many times suppose our universe consists of all uh strings of length 128 bits so the size of u is gigantic it's actually 2 to the 128. it's a very very large set uh but it so happens that if you sample say around 2 to the 64 times from the set this is about the square root of u then it's very likely that two of your sample messages will actually be the
same so this is called the birthday paradox well traditionally it's described in terms of people's birthdays so you would think of each one of these samples as someone's birthday and so the question is how many people need to get together so that there are two people that have the same birthday so just as a simple calculation you can see there are 365 days in the year so you would need about 1.2 times the square root of 365 people until the probability that two of them have the same birthday is more than a half this if i'm not mistaken is about 23 which means that if 23 random people get together in a room it's very likely that two of them will actually have the same birthday this is why it's called a paradox because 23 supposedly is a smaller number than you would expect interestingly people's birthdays are not actually uniform across all 365 days in the year there's actually a bias towards september but i guess that's not relevant to the discussion here the last thing i wanted to do is just show you the birthday paradox a bit more concretely so suppose we have a universe of size about a million then you can see that when we sample roughly 1200 times the probability that we sample the same element twice is roughly a half but the probability of sampling the same element twice actually converges very quickly to one you can see that if we sample about 2200 items then the probability that two of those items are the same is already ninety percent and at three thousand it's basically one so this converges very quickly to one as soon as
you go beyond the square root of the size of the universe so we're gonna come back and study the birthday paradox in more detail later on but i just for now wanted you to know what it is so that's the end of this segment and then in the next segment we'll start with our first example encryption systems now that we understand stream ciphers We're going to move on and talk about a more powerful primitive called a block cipher so a block cipher is made up of two algorithms e and d these are encryption and decryption
algorithms and both of these algorithms take as input a key k now the point of a block cipher is that it takes an n-bit plaintext as input and it outputs exactly the same number of bits so it maps n bits of input to exactly n bits of output now there are two canonical examples of block ciphers the first one is called triple des in triple des the block size namely the number of input bits is 64. so triple des will map 64-bit blocks to 64-bit blocks and it does it using a key that's 168 bits long we're going to talk about how triple des is built in the next segment another block cipher which is more recent is called aes now aes has slightly different parameters so here the block size is 128 bits so aes will map 128 bits of input to 128 bits of output and it actually has three possible key sizes and i wrote down these sizes over here basically the longer the key the slower the cipher is but presumably the harder it is to break and we're going to talk about what it means for
block ciphers to be secure in just a minute now block ciphers are typically built by iteration they take in as input a key k for example in the case of aes the key could be 128 bits long and the first thing that happens to the key is it gets expanded into a sequence of keys K1 to kn called round keys now the way the cipher uses these round keys is by iteratively encrypting the message again and again and again using what's called a round function so here we have this function r that takes two inputs
this function r is going to be called the round function it takes as input the round key and it takes as input the current state of the message so here we have our input message say for aes the message would be 128 bits exactly because each block in aes is exactly 128 bits and then the first thing that happens is we apply the first round function using the key k1 to the message and we get some new message out as a result then we take this message m1 we apply the next round function to it
using a different key using the key k2 and we get the next round message out and so on and so forth until all the rounds have been applied and then the final output is actually the result of the cipher and again in the case of aes this would be 128 bits and the resulting ciphertext would also be 128 bits now different ciphers have different numbers of rounds and they have different round functions so for example for triple des the number of rounds is 48 and we're going to see exactly how the round
function for triple des works for aes for example the number of rounds is only 10 and again we're going to look at how the round functions for aes work as well in just a minute before we do that i just wanted to mention performance numbers so you can see here these are the performance numbers for the two typical block ciphers triple des and aes and these are the corresponding numbers for the stream ciphers that we looked at in the previous module rc4 salsa20 and sosemanuk and you can see that the block ciphers are considerably
slower than stream ciphers but we'll see that we can do many things with block ciphers that we couldn't do very efficiently with constructions like rc4 now my goal for This week is to show you how block ciphers work but more importantly i want to focus on showing you how to use block ciphers correctly for either encryption or integrity or whatever application you have in mind so to show you how to use block ciphers correctly it actually makes a lot of sense to abstract the concept a little bit so that we have kind of a clean
abstract concept of a block cipher to work with and then we can argue and reason about what constructions are correct and what constructions are incorrect and so a very elegant abstraction of a block cipher is what's called a pseudorandom function and a pseudorandom permutation so let me explain what these things are so a pseudorandom function basically is defined over a key space an input space and an output space and all it is is basically a function that takes a key and an input as inputs and outputs some element in the output space okay
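as a rough sketch of this syntax (using hmac-sha256 as the concrete keyed function, since it is commonly modeled as a prf — that instantiation is my own assumption, not something the lecture has introduced) a prf is just an efficiently evaluable two-argument function:

```python
import hashlib
import hmac

def F(k: bytes, x: bytes) -> bytes:
    """a keyed function K x X -> Y; hmac-sha256 here is only a
    stand-in that is commonly modeled as a prf (illustration)"""
    return hmac.new(k, x, hashlib.sha256).digest()

# easy to evaluate given the key and the input; no inverse required
y = F(b"secret key", b"some input")
print(len(y) * 8)  # outputs land in Y = {0,1}^256
```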
so it takes an element in k an element in x and outputs an element in y and the only requirement essentially is that there's an efficient way to evaluate the function for functions we're not requiring that they be invertible we just need them to be evaluable given the key and the input x now a related concept that more accurately captures what a block cipher is is called a pseudo-random permutation so a pseudo-random permutation is again defined over a key space and then just a set x and what it does is it takes an element in the
key space an element x and outputs basically one element in x now as usual The function e should be easy to evaluate so there should be an algorithm to evaluate the function e but more importantly once we fix the key k it's important that this function e be one to one in other words if you think of the space x as a set here and here's the same set x again then basically the function e what it does is it's a one to one function so every element in x Gets mapped to exactly one element
in x and then because it's one to one of course it's also invertible so given some output there's only one input that maps to that output and the requirement is that there is an efficient inversion algorithm called d that given a particular output will output the original pre-image that mapped to that output so really a pseudo-random permutation captures very accurately and syntactically what a block cipher is and often i'll use the terms interchangeably either a block cipher or a pseudo-random permutation i'll use whichever term depending on the context where we're discussing things
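here is a toy sketch of that syntax (a key-seeded shuffle on a tiny set x — completely insecure and purely my own illustration) showing e as a one to one function and d as its inverse:

```python
import random

def make_prp(k: int, n: int = 16):
    """toy prp on X = {0, ..., n-1}: the key seeds a shuffle.
    not secure in any way; it only illustrates that E is a
    bijection on X and that D inverts it given the same key."""
    perm = list(range(n))
    random.Random(k).shuffle(perm)   # key-dependent permutation of X
    inv = [0] * n
    for x, y in enumerate(perm):
        inv[y] = x                   # record the pre-image of each output
    return (lambda x: perm[x]), (lambda y: inv[y])

E, D = make_prp(k=42)
assert all(D(E(x)) == x for x in range(16))   # D undoes E under the same key
```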
okay so we have two examples as we said of pseudorandom permutations triple des and aes say for aes 128 the key space would be 128 bits and the input and output space would be 128 bits for triple des as we said the block size is only 64 bits and the key size is 168 bits okay so we'll use these running examples throughout so whenever i say a prp concretely you should be thinking aes or triple des now one thing that i wanted to point out is that in fact any pseudo-random permutation namely any block cipher you
can also think of it as a prf in fact the prp is a prf that has more structure in particular prp is a prf where the input space and the output space are the same so x is equal to y and in fact is efficiently invertible Once you have the secret key k okay so in some sense a prp is a special case of a prf although that's not entirely accurate and we'll see why in just a second so so far we've just described the kind of the syntactic definition of what is a pseudo-random permutation
and a pseudorandom function so now let's talk about what it means for a prf or prp to be secure and this concept will essentially capture what it means for a block cipher to be secure okay so this is why i wanted to show you these definitions before we look at actual block cipher constructions so at least it's clear in your mind what it is we're trying to construct okay so here we have a prf and i'm going to need a little bit of notation not too much though so i'm going to need to define the set funs of x y this is basically the set of all functions from the set x to the set y shown here as a big circle funs of x y now this set is gigantic its size is basically the size of y raised to the size of x so for example for aes remember both x and y would be of size 2 to the 128 so for aes the size of this set is enormous it'll be 2 to the 128 raised to the 2 to the 128. so it's kind of a double exponential so this is
a gigantic number this is more than the number of particles in the universe but regardless we can kind of think of this set abstractly we never have to write it down we can just keep it in our head and not worry about computing on it so this is a gigantic set the set of all functions from x to y now we're going to look at a much smaller set of functions namely i'll call this set s sub f and that's going to denote the set of all functions from x to y that
are specified by the prf as soon as we fix a particular key k okay so we fix the key k we let the second Argument float and that basically defines a function from x to y and we're going to look at the set of all such functions for all possible keys in the key space okay so if you think about it again for aes if we're using 128-bit keys the size of this uh i'll say s-a-e-s is basically going to be 2 to the 128. so much much much smaller than the set of all possible
functions from x to y and now we say that a prf is secure basically if a random function from x to y so literally we pick some arbitrary function in this gigantic set of all functions from x to y and we say that the prf is secure if in fact a random function from x to y is indistinguishable from a pseudorandom function from x to y namely when we pick a random function from the set s f okay so more precisely basically the uniform distribution on the set of pseudorandom functions is indistinguishable from the uniform distribution on the set of all functions let me be just a little bit more precise just to give you a little bit more intuition about what i mean by that and then we'll move on to actual constructions so let me be a bit more precise about what it means for a prf to be secure and so what we'll do is basically the following so we have our adversary trying to distinguish a truly random function from a pseudorandom function so what we'll do is we'll let him interact with one of the two so here
in the top cloud we're choosing a truly random function in the bottom cloud we're just choosing a random key for a Pseudorandom function and now what this adversary is going to do is he's going to submit points in x so he's going to submit a bunch of x's in fact he's going to do this again and again and again so he's going to submit x1 x2 x3 x4 and then for each one of those queries we're going to give him either the value of the truly random function at the point x or the value of
the pseudo-random function at the point x okay so the adversary doesn't know which one he's getting by the way for all queries he's always getting either the truly random function or the pseudorandom function in other words he's either interacting with a truly random function for all his queries or he's interacting with a pseudo-random function for all his queries and we say that the prf is secure if this poor adversary can't tell the difference he cannot tell whether he is interacting with a truly random function or interacting with a pseudo-random function okay we're going to come back actually
later on and define prfs more precisely but for now i wanted to give you the intuition for what it means for a prf to be secure so you'll understand what it is that we're trying to construct when we construct pseudo-random functions and i should say that the definition for a pseudo-random permutation is pretty much the same except instead of choosing a random function we're going to choose a random permutation on the set x in other words a random one to one function on the set x the adversary can either query this random function on the
set x or he can query a pseudorandom permutation and the prp is secure if the adversary cannot tell the difference okay so again the goal is to make these functions and permutations look like truly random functions of permutations okay so let's look at a simple question so suppose we have a secure prf so we know that this prf f happens to be Defined in the set x and it so happens you know it outputs 128 bits every time it so happens that this prf cannot be distinguished from a truly random function from x to 01
to the 128. now we're going to build a new prf let's call this prf g and the prf g is going to be defined as follows we say if x is equal to 0 always output 0 otherwise if x is not equal to 0 just output the value of f so my question to you is do you think this g is a secure prf well the answer of course is that it's very easy to distinguish the function g from a random function all the adversary has to do is just query the function at x equals 0.
for a random function the probability that the result is going to be 0 is 1 over 2 to the 128 whereas for the pseudorandom function he's always going to get 0 because at 0 the function is always defined to be 0 no matter what the key and so all he would do is he would say hey i'm interacting with a pseudo-random function if he gets 0 at x equals 0 and he'll say i'm interacting with a random function if he gets non-zero at x equals zero so it's very easy to distinguish this g from random
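the attack just described can be sketched in code (hmac-sha256 stands in for the secure prf f — an assumption of mine, not something the lecture specifies):

```python
import hashlib
import hmac

def f(k: bytes, x: bytes) -> bytes:
    # stand-in for a secure prf (hmac-sha256, assumption for illustration)
    return hmac.new(k, x, hashlib.sha256).digest()

def g(k: bytes, x: bytes) -> bytes:
    # the broken prf from the lecture: fixed all-zero output at x = 0
    if x == bytes(32):
        return bytes(32)
    return f(k, x)

def adversary(oracle) -> str:
    # a single query at x = 0 distinguishes g from a random function:
    # g always answers all zeros there, a random function almost never does
    return "pseudorandom" if oracle(bytes(32)) == bytes(32) else "random"

k = b"some secret key"
print(adversary(lambda x: g(k, x)))   # always says "pseudorandom"
```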
so what this example shows is that even if you have a secure prf it's enough that on just one known input the output is not random the output is fixed and already the entire prf is broken even though everywhere else the prf is perfectly indistinguishable from random okay so now let me show you the power of prfs let's look at a very easy application i want to show you that in fact a prf directly gives us a very simple pseudorandom generator okay so let's assume we have a pseudorandom function so this one happens to go from n bits to n bits and then let's define the following generator its seed space is going to be the key space for the prf and its output space is going to be basically t blocks of n bits each okay so you can see the output is a total of n times t bits for some parameter t that we can choose and it turns out basically you can do this very simple construction this is sometimes called counter mode where essentially you take the prf and you evaluate it at zero you evaluate it at one at two at three and so on up to t and you concatenate all these values that's the generator okay so we basically took the key for the prf and we expanded it into n times t bits okay a key property of this generator is that it's parallelizable and what i mean by that is if you have two processors or two cores that you can compute on then you can have one core compute the even entries of the output and you can have another core compute the odd entries of the output so basically if you have two cores you can make this generator run twice as fast as it would if you only had a single core so the nice thing about this is of course we know that pseudo-random generators give us stream ciphers and so this is an example of a parallelizable stream cipher and i just wanted to point out that many of the stream ciphers that we looked at before for example rc4 those were inherently sequential so even if you had two processors you couldn't make the stream cipher run any faster than if you just had a single
processor now the main question is why is this generator secure and so here i'm only going to give you a little bit of intuition and we're going to come back and argue this More precisely later on but i'll just say that security basically follows directly from the prf property and the way we reason about security is we say well this prf by definition is indistinguishable from a truly random function on 128 bits so in other words if i take this generator and instead i define a generator using a truly random Function in other words i'll
write the output of the generator as f of 0 concatenated with f of 1 and so on and so forth using a truly random function then the output of the generator using the truly random function would be indistinguishable from the output of the generator using a pseudorandom function that is the essence of the security property of a prf but with a truly random function you notice that the output is just truly random because for a truly random function f of 0 is a random value f of 1 is an independent random value f of 2 is an
independent random value and so on and so forth so the entire output is a truly random output and so with a truly random function this generator produces truly random output and is therefore a perfectly secure Generator and so you see how the prf security property lets us argue security basically we argue that when we replace the prf with a truly random function the construction is necessarily secure and that says that the construction with a pseudorandom function is also secure okay and we're going to see a couple more examples like this later on so now you
understand what a block cipher is and you have intuition for what security properties it's trying to achieve and in the next segments we're going to look at constructions for block ciphers so now that we understand what block ciphers are let's look at a classic example called the data encryption standard so just a quick reminder block ciphers basically map n bits of input to n bits of output and we talked about two canonical examples triple des and aes in this segment we're going to talk about des and we'll talk about triple des actually in the next
segment and then i also mentioned before the block ciphers are often built by iteration in particular we're going to look at block ciphers that are built by a form of iteration where a key k is first expanded into a bunch of round keys and then a round function is applied to The input message again and again and again and essentially after all these round functions are applied we obtain the resulting ciphertext okay and again we're going to look at how des the data encryption standard uses this format i just want to be clear that in
fact to specify a block cipher of this type one needs to specify the key expansion mechanism and one needs to specify the round function in this segment i'm going to focus on the round function i'm not going to talk much about key expansion but i just wanted to mention that in fact key expansion is also a big part of describing how a block cipher works okay so let's talk about the history of des essentially in the early 1970s ibm realized that their customers were demanding some form of encryption and so they formed a crypto group
and the head of that group was horst feistel who in the early 70s designed a cipher called lucifer now it's interesting in fact lucifer had a number of variations but one of the later variations in fact had a key length that was 128 bits and a block length that's also 128 bits okay in 1973 the government realized that it's buying many commercial off the shelf computers and so it wanted its suppliers to actually have a good crypto algorithm that they could use in products sold to the government so in 1973 the national bureau of standards as it was called at the time put out a request for proposals for a block cipher that's going to become a federal standard and in fact ibm submitted a variant of lucifer that variant actually went through some modification during the standardization process and then finally in 1976 the national bureau of standards adopted des as a federal standard and in fact for des it's interesting that the key length was far reduced from lucifer it's only 56 bits and the block length was also reduced to 64 bits and in fact these decisions especially the decision to reduce
the key length is kind of the achilles heel of des and was a source of many complaints over its life in particular already back in 1997 des was broken by exhaustive search meaning that a machine was able to search through all two to the 56 possible keys to recover a particular challenge key and in fact we're going to talk about exhaustive search quite a bit it's quite an interesting question and there are various ways to defend against exhaustive search and basically this 1997 experiment kind of spelled the doom of des it meant that des itself is no longer secure and as a result the national institute of standards and technology as it became known issued a request for proposals for a next generation block cipher standard and in 2000 it standardized on a cipher called rijndael which became the advanced encryption standard aes and we'll talk about aes later on but in this segment i want to describe how des works now des is an amazingly successful cipher it's been used in the banking industry in fact there's a classic network called the automated clearing house ach which banks use to clear checks with one another and
des is used for integrity in those transactions it's also used in commerce in fact it was very popular up until recently as the main encryption mechanism for the web of course now that's been replaced with aes and other ciphers overall it's a very successful cipher in terms of deployment des also has a very rich history of attacks which we'll talk about in the next segment okay so now let's talk about the construction of des so the core idea behind des is what's called a feistel network due to horst feistel and basically it's
a very clever idea for building a block cipher out of arbitrary functions f1 to fd okay so imagine we have these functions f1 to fd that happen to map n bits to n bits now these are arbitrary functions they don't have to be invertible or anything what we want to do is build an invertible function out of those d functions and the way we'll do it is by building a new function we'll call it capital f that maps 2n bits to 2n bits and the construction is described right here so here we have
our inputs you notice there are two blocks of n bits in other words the input is actually 2n bits the r and l stand for right and left typically people describe a feistel network from top to bottom in which case these n bits really would be right and left but here it's more convenient for me to describe it horizontally so if we follow the r input you realize it basically gets copied into the l output without any change at all okay however the l input is changed somewhat basically what happens is the r input is fed into the function f1 and the result is xored with l0 and that becomes the new r1 okay so this is called one round of a feistel network and it's done using the function f1 now we do this again with another round of the feistel network with the function f2 and we do it again and again and again so we get to the last round and we do it again with the function fd and finally the output is rd ld okay so if you like we can write this in symbols basically l i is simply equal to r i minus 1 and r i let's see that's the more complicated one well let's just follow the lines here r i is just equal to f i applied to r i minus 1 xored with l i minus 1 and this is for i going from 1 to d so this is the equation that defines a feistel network mapping a 2n-bit input to a 2n-bit output okay
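as a quick sketch, the round equations can be written directly in python; this is my own minimal illustration using integers as n-bit halves, with made-up round functions (none of this is part of des itself):

```python
def feistel_forward(left, right, round_funcs):
    """Apply d Feistel rounds: L_i = R_{i-1},  R_i = F_i(R_{i-1}) xor L_{i-1}."""
    for f in round_funcs:
        left, right = right, f(right) ^ left
    return left, right

# the round functions can be arbitrary -- they need not be invertible
rounds = [lambda x, k=k: (x * 2654435761 + k) & 0xFFFFFFFF for k in (7, 13, 42)]
l, r = feistel_forward(0x01234567, 0x89ABCDEF, rounds)
```

note how each round only swaps halves and xors in the image of the round function; that is exactly what makes the whole thing invertible later.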
matter which functions you give me for any functions f1 to fd that you Give me the resulting five-star network function is in fact invertible and the way we're going to prove that is basically we're going to construct an inverse because not only is it invertible it's efficiently invertible so let's see so let's look at one round of a feistel network so here this is the input ri li and this is the output ri plus 1 li plus 1. and now what i'm going to ask you is to Invert this so let's see so suppose now
the input that we're given is ri plus 1 li plus 1 and we want to compute ri li so we want to compute the round in the reverse direction so let's see if we can do it well let's look at ri so ri is very easy basically ri is just equal to li plus 1 so l i plus 1 just becomes ri but now let me ask you uh to write the expression for Li in terms of ri plus one and li plus one so i hope everybody sees that basically l i plus one is
fed into the function f i plus one the result is xored with ri plus one and that gives the li input so this is the inverse of one round of a feistel network and if we draw this as a diagram let's just write the picture for the inverse so here you notice the input is r i plus One l i plus one and the output is r i l i right so we're computing we're inverting the round so you notice that the inverse of a five star round looks pretty much the same as the five
stone round or in the forward direction it's literally well just for a technical reason it's kind of the mirror image of one another but it's basically uh the same construct and when we put These inverted rounds back together we essentially get the inverse of the entire feistel network so you notice we start off with round number d with the inverse of round number d then we do the inverse of round number d minus one and so on and so forth until we get to the inverse of the first round and we get our final output
which is r zero l zero right this is the inputs and we manage to invert basically r d l D and get r zero l zero and the interesting thing is that basically the inversion circuit looks pretty much the same as the encryption circuits and the only difference is that the functions are applied in reverse order right we started with fd and ended with f1 whereas when we were encrypting we started with f1 and ended with fd so for hardware designers This is very attractive because basically if you want to save hardware you realize that
your encryption hardware is identical to your decryption hardware so you only have to implement one algorithm and you get both algorithms the same way the only difference is that the functions are applied in reverse order okay so this feistel mechanism is a general method for building invertible functions from arbitrary functions f1 to fd and in fact it's used in many different block ciphers although interestingly it's not actually used in aes so there are many other block ciphers that use a feistel network of course they differ from des in the functions f1 to fd but aes actually uses a completely different type of structure that's not a feistel network we'll see how aes works in a couple of segments so now that we know what feistel networks are let me mention an important theorem about the theory of feistel networks that shows why they're a good idea this theorem is due to luby and rackoff back in 1985 and it says the following suppose i have a function that is a secure pseudorandom function okay so it's indistinguishable from random and happens to act on n bits so it maps n bits to n bits and uses a key k then it turns out that if you use this function in three rounds of a feistel network what you end up with is a secure pseudorandom permutation in other words what you end up with is an invertible function that's indistinguishable from a truly random invertible function and i hope you remember that the definition of a secure block cipher is that it needs to be a secure pseudorandom permutation so what this theorem says is if you start with a secure pseudorandom function you end up with a secure block cipher basically that's what this says now let me explain in a little bit more detail what's actually going on here so essentially the prf is used in every round of the feistel network in other words here what's computed is the prf using one secret key k0 applied to r0 here what's computed is the prf using a different secret key k1 applied to r1 and here we have yet another secret key k2 applied to r2 and you notice this is why basically this feistel construction uses keys k0 k1 and k2 in other words it uses three independent keys
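here is a sketch of the luby-rackoff construction, using hmac-sha256 truncated to the half-block size as a stand-in prf; the function names and the choice of hmac are my own for illustration (the theorem is about any secure prf), and the decryption direction is included to show the result really is a permutation:

```python
import hashlib
import hmac

HALF = 16  # bytes per half block

def prf(key, x):
    # stand-in keyed prf mapping HALF bytes to HALF bytes (assumption: hmac-sha256)
    return hmac.new(key, x, hashlib.sha256).digest()[:HALF]

def xor(a, b):
    return bytes(p ^ q for p, q in zip(a, b))

def lr3_encrypt(keys, left, right):
    # three feistel rounds with three independent keys k0, k1, k2
    for k in keys:
        left, right = right, xor(prf(k, right), left)
    return left, right

def lr3_decrypt(keys, left, right):
    # undo the rounds in reverse order -- so the construction is invertible
    for k in reversed(keys):
        left, right = xor(prf(k, left), right), left
    return left, right

keys = [b"k0" * 8, b"k1" * 8, b"k2" * 8]   # three independent keys (toy values)
l, r = lr3_encrypt(keys, b"A" * HALF, b"B" * HALF)
assert lr3_decrypt(keys, l, r) == (b"A" * HALF, b"B" * HALF)
```

this only illustrates the wiring of the theorem, of course; the security claim itself is about indistinguishability, which no code snippet can demonstrate.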
so it's very important that the keys are actually independent so really we need three independent keys and then we end up with a secure pseudorandom permutation okay so that's the theory behind feistel networks and now that we understand that we can actually look at the specifics of des so des is basically a 16-round feistel network okay so there are functions f1 to f16 that map 32 bits to 32 bits and as a result des itself acts on 64-bit blocks 2 times 32 now the 16 round functions in des are actually all derived from a single function f just used with different keys so in fact these are the different round keys so ki is a round key and it's basically derived from the 56-bit des key k okay i'll describe what this function f is in just a minute but basically you see that by using 16 different round keys we get 16 different round functions and that gives us the feistel network so just at a high level how des works basically you have a 64-bit input the first thing it does is this initial permutation that just permutes the 64 bits around namely maps bit number one to bit number six bit number two to bit number 17 and so on this is not for security reasons this is just specified in the standard then we go into the 16-round feistel network that you now know how it works basically using the functions f1 to f16 as specified before and then basically we have another permutation this is called the final permutation that's just the inverse of the initial permutation again it just permutes bits around this is not necessary for security reasons and then we finally get the final output okay now as we said there's a key expansion step which i'm not going to describe but basically this 56-bit des key is expanded into these round keys where each round key is 48 bits okay so we have 16 48-bit round keys and they're basically used in the 16 rounds of des and then when you want to invert the cipher all you do is you use these 16 round keys in reverse order okay so now that we understand the des
structure the only thing that's left to do is specify the function capital f so let me explain how this function works so basically it takes as input a 32-bit value let's call it x but in reality you remember this is r0 r1 r2 r3 and so on these are 32-bit values and it also takes a 48-bit round key so here we have our key ki which happens to be 48 bits the first thing it does is it goes through an expansion box and this expansion box basically takes 32 bits and maps them into 48 bits now all the expansion box does is just replicate some bits and move other bits around so for example bit number one of x is replicated into positions 2 and 48 in the output bit number 2 of x is positioned as bit number 3 of the output and so on and so forth just by replicating some of the bits of x we expand the input into 48 bits the next thing we do is we compute an xor with the round key sometimes people say that cryptographers only compute xors this is an example of that where we just do xors in this function and then comes the magic of des where these 48 bits are broken into eight groups of six bits so let me draw this each one of these wires is six bits and then they go into what are called s boxes s1 through s8 and i'll explain the s boxes in just a minute the s boxes are kind of the smarts of des and the s boxes basically map six bits to four bits so the outputs of the s boxes are these four bits they're collected and this gives us 32 bits right eight groups of four bits gives us 32 bits and then finally this is fed into yet another permutation which just maps the bits around so for example bit number one will go to bit number nine bit number two will go to bit number 15 and so on so it just permutes the 32 bits around and that's the final 32-bit output of this f function okay so by using different round
keys essentially we get different round functions and that's how we form the 16 round functions of des now the only thing that's left to specify are these s boxes so the s boxes literally are just functions from six bits to four bits and they're just implemented as a lookup table right so describing a function from six bits to four bits basically amounts to writing the output of the function on all two to the six possible inputs two to the six is 64 so we just have a table that literally contains 64 values where each value is four bits so here's an example this happens to be the fifth s box and you see that this is a table that contains 64 values right it's 4 by 16 so 64 values and for example if you want to look up the output that corresponds to 011011 okay then right you look at these two bits the first and last this is 01 and you look at these four bits the middle bits this is 1101 and you see that the output is 1001 the four-bit output 1001
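the lookup convention can be sketched as follows; note the table below is a made-up 4 by 16 example with the right shape, not the real des s-box 5, which i won't reproduce here:

```python
# hypothetical 4x16 s-box table -- NOT the real DES S5, just the right shape
SBOX = [[(7 * r + 3 * c) % 16 for c in range(16)] for r in range(4)]

def sbox_lookup(six_bits):
    # outer two bits (first and last) select the row, inner four bits the column
    row = ((six_bits >> 5) << 1) | (six_bits & 1)
    col = (six_bits >> 1) & 0b1111
    return SBOX[row][col]

# input 011011: row = 01 = 1, column = 1101 = 13
out = sbox_lookup(0b011011)
```

so the six input bits are split into a row index and a column index into the table, and the four-bit table entry is the output.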
okay so the s boxes are just implemented as these tables now the question is how are these s boxes chosen how are these tables actually chosen by the designers of des to give you some intuition for that let's start with a very bad choice for s boxes so imagine the s boxes were linear what do i mean by that i mean that imagine these six bit inputs were literally just xored with one another in different ways to produce the four bit outputs okay another way of writing that is that we can write the s box as a matrix vector product so here you have the matrix ai and the six bit vector x and you can see that if we write this matrix vector product basically we take the inner product of each row with the input vector remember these are all bits so a six bit row vector inner product with another six bit vector and we do that modulo two you realize basically what we're computing is x2 xor x3 right because only position two and position three have ones in them and similarly the next inner product will produce x1 xor x4 xor x5 and so on and so forth okay so you can literally see that if the s boxes are implemented this way then all they do is just apply the matrix ai to the input vector x which is why we say that in this case the s boxes are completely linear now i claim that in fact if the s boxes were linear then des would be totally insecure and the reason is if the s boxes are linear then all that des does is just compute the xor of various things and permute and shuffle bits around so it's just xors and bit permutations which means that as a result all of des is just a linear function in other words there would be a matrix b of these dimensions basically it's a matrix b that has width 832 basically what i would do is i would write the 64-bit message plus the 16 round keys as one long vector right so the message is 64 bits and there are 16 round keys each one is 48 bits and if you do the math that's basically 832 okay so i write these guys the keys and the message as one long vector and then there would be this matrix b such that when you compute the matrix vector product you get the different bits of the ciphertext so there are 64 of these rows and as a result you get 64 bits of ciphertext okay so this is what it means for des to be linear so if you think a little bit about this you realize the s boxes are the only non-linear part of des so if the s boxes were linear then the entire circuit is linear and therefore
it can be expressed as this matrix now if that's the case then des would be terrible as a secure pseudorandom permutation and let me give you a very simple example basically if you take the xor of three outputs of des well let's think what that means basically we would be looking at b the matrix that defines des times one vector xor b times another vector xor b times a third vector we can take the b out of the parentheses so we'd basically be doing b times the xor of the three vectors and each vector here contains a message and the key so we get the vector m1 xor m2 xor m3 together with k xor k xor k which of course is just k and so if you think about what that means basically we just got back des of k at the point m1 xor m2 xor m3 but this means that des has this horrible relation that can be tested right so basically if you xor the outputs of des at three points m1 m2 and m3 you'll get the value of des at the point m1 xor m2 xor m3 now this is not a relation that's going to hold for a random function a random function is not going to satisfy this equality and so you get a very easy test to tell you that des is not a random function and in fact maybe you can take this as a small exercise it's not even difficult to see that given enough input output pairs you can literally recover the entire secret key you just need 832 input output pairs and you'll be able to recover the entire secret key and so if the s boxes were linear des would be completely insecure
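here is a toy demonstration of that relation with a made-up all-linear "cipher" (a fixed bit rotation followed by a key xor); real des avoids this precisely because its s boxes are nonlinear:

```python
def rotl16(x, s):
    # a fixed bit permutation (rotation) -- a linear operation over xor
    return ((x << s) | (x >> (16 - s))) & 0xFFFF

def linear_cipher(k, m):
    # toy all-linear 'cipher': permute the bits, then xor in the key
    return rotl16(m, 3) ^ k

k, m1, m2, m3 = 0xBEEF, 0x1234, 0x5678, 0x9ABC
lhs = linear_cipher(k, m1) ^ linear_cipher(k, m2) ^ linear_cipher(k, m3)
rhs = linear_cipher(k, m1 ^ m2 ^ m3)
assert lhs == rhs   # the tell-tale relation a random function would not satisfy
```

the key xors cancel in pairs and the rotation distributes over xor, which is exactly the k-xor-k-xor-k argument above in miniature.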
it turns out actually that even if the s boxes were only close to being linear in other words if the s boxes were linear most of the time say for 60 out of the 64 inputs it turns out that would also be enough to break des and we're going to see why later on in particular if you choose the s boxes at random it turns out they'll tend to be somewhat close to linear functions and as a result you'll be able to totally break des you'll be able to recover the key in basically very little time and so the designers of des actually specified a number of rules they used for choosing the s boxes and it's not surprising the first rule is that these functions are far away from being linear okay so in other words there is no linear function that agrees with a large fraction of the outputs of the s box and then there are all these other rules for example the s boxes are exactly four to one maps right so every output has exactly four preimages and so on and so forth so we understand now why they chose the s boxes the way they did and in fact it's all done to defeat certain attacks on des okay so that's the end of the description of des and in the next two segments we're going to look at the security of des so now that we understand how des works let's look at a few attacks on des and we're going to start with an attack called exhaustive search so our goal here is basically that given a few input output pairs mi ci our goal is to find the key that maps these m's to
the c's in other words our goal is to find the key that maps m1 m2 m3 into c1 c2 c3 and as i said our goal is to find the key that does this mapping the first question is how do we know that this key is unique and so let's do a little bit of analysis to show that in fact just one pair is enough to completely constrain a des key and that's why this question makes sense okay so we're going to prove a simple lemma now let's assume that des is what's called an ideal cipher so what is an ideal cipher basically we're going to pretend that des is made up of random invertible functions in other words for every key des implements a random invertible function since there are two to the 56 keys in des we're going to pretend that des really is a collection of two to the 56 random invertible functions from {0,1}^64 to {0,1}^64 okay so of course des is not a collection of two to the 56 random functions but we can idealize the cipher and pretend that it is such a collection then what can we say well it turns out that just given one message and ciphertext pair you just give me one pair message and ciphertext there's already only one key that maps this message to that ciphertext so already just given one pair m and c i can ask you find me the key that maps m to c and the solution is very likely to be unique in fact it's going to be unique with probability roughly 99.5 percent i should say that the statement is true for all m and c and the probability is just over
the choice of the random permutations that make up the cipher so let's do the proof this is fairly straightforward so what we're basically asking is what's the probability that there exists some key k prime that's not equal to k such that well c we know is equal to des of k comma m by definition of c and m but we're asking how likely is it that there's this other key k prime that also satisfies this equality you realize that if such a key k prime exists then just given m and c you can't decide whether the right key is k or k prime because both of them work okay but i want to argue that this happens with low probability well what does it mean that there exists a key k prime that satisfies this relation well we're asking what's the probability that the first key you know the all-zero key satisfies it or the second key satisfies it or the third key satisfies it and so on and so forth so by the union bound we can bound this probability by the sum over all 56-bit keys k prime of the probability that des of k at m is equal to des of k prime at m okay so we're asking basically for a fixed key k prime how likely is it that it happens to collide with the key k at the message m well let's think about this for a second let's fix the value des of k at m and then we're asking how likely is it that a random permutation pi k prime at the point m happens to produce exactly this output well it's not difficult to answer and see that in fact for a single key k prime the probability is at most one over two to the 64 right there are 2 to the 64 possible outputs for the permutation what's the probability that it lands exactly on this output well it's 1 over 2 to the 64 and we're summing over all 2 to the 56 keys so we just multiply the two and we get 2 to the 56 over 2 to the 64 which is one over 2 to the 8 basically one over 256 okay so the probability that the key is not unique is at most one over 256 therefore the probability that it is unique is one minus that which is roughly 99.5 percent okay so already if you give me one plaintext ciphertext pair the key is completely determined there's only one key that will map that plaintext to that ciphertext and the question is just can you find that key now it turns out in fact if you give me two pairs so you give me m1 and m2 and their corresponding outputs c1 and c2 then if you do exactly the same analysis the probability that the key is unique basically becomes one there's very likely only one such key
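the union-bound arithmetic here is easy to check directly; this is just the computation from the lemma spelled out, not part of the lecture slides:

```python
# union bound: P[some other key k' also maps m to c] <= (#keys) / (#outputs)
p_not_unique = 2**56 / 2**64   # = 2**-8 = 1/256, exactly (both powers of two)
p_unique = 1 - p_not_unique    # about 0.996 -- the "roughly 99.5%" above
```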
okay essentially this is very very close to one and basically it says given two pairs it's very very likely that only one key will map this pair of messages to this pair of ciphertexts and as a result again we can ask well find me that unique key and by the way the same is true for aes if you look at aes 128 again just given two input output pairs there's going to be only one key with very high probability so essentially now we can ask for this exhaustive search problem i give you two or three pairs and i ask you well find me the key so how are you going to do it well you're going to do it by exhaustive search essentially by trying all possible keys one by one until you find the right one so this is what's known as the des challenge so let me explain how this challenge worked the challenge was issued by a company called rsa and what they did is basically they published a number of ciphertexts but three of the ciphertexts had known plaintexts so in particular what they did is they took a message beginning with the words the unknown message is followed by a colon and you can see they broke it up into blocks if you look at these these are basically eight byte blocks eight bytes as you know is 64 bits right so each one of these is 64 bits and then they encrypted them all using the same secret key to get three ciphertexts so this gives us three plaintext ciphertext pairs and then they gave us a whole bunch of other ciphertexts you know c4 c5 c6 and the challenge was decrypt these using the key that you found from an exhaustive search over the first three pairs that you were given okay so that was called the des challenge and let me tell you a little bit about how long it took to solve it so interestingly in 1997 using an internet search run by distributed.net basically they were able to search through enough of the key space to find the correct key in about three months you realize the key space has size 2 to the 56 but on average you only have to search through half the key space to find
the key and so it took them three months then kind of a miraculous thing happened the eff actually contracted paul kocher to build special purpose hardware to break des this was a machine called deep crack it cost about 250 000 dollars and it broke the next des challenge in only three days interestingly by the way rsa said that they would pay ten thousand dollars for each solution of the challenge so you can see that this is not quite economical they spent 250k and they got 10 000 for solving the challenge the next thing that happened is in 1999 rsa issued another challenge and they said well you've got to solve it in half the time of the previous solution and so using both deep crack and the internet search together they were able to break des in 22 hours so the bottom line here is essentially that des is completely dead essentially if you forget or you lose your 56-bit des key don't worry within 22 hours you can actually recover it and in fact anyone can recover it and so des essentially is dead and no longer secure and just as kind of a final nail in the coffin as hardware technology improved there was another project called copacobana they used fpgas just off the shelf fpgas only 120 fpgas it only cost ten thousand dollars and they were able to do an exhaustive key search in about seven days so with very very cheap off the shelf hardware you can break des very quickly so the lesson from all this is essentially that 56-bit ciphers are totally totally dead and so the question is what to do people really liked des it was deployed in a lot of
places there were a lot of implementations there was a lot of hardware support for it so the question was what to do and so the first thing that came to mind is well maybe we can take des and kind of artificially expand the key size so we strengthen it against this exhaustive search attack and the first idea that comes to mind is basically well let's iterate the block cipher a couple of times and this is what's called triple des so triple des is a general construction basically it says the following suppose you give me a block cipher e so here it has a key space k and it has a message space m and an output space of course m as well let's define the triple construction which now uses three keys and it's defined as follows basically the triple construction uses three independent keys encrypts the same message block as before and what it does is it will encrypt using the key k3 then it will decrypt using the key k2 and then encrypt again using the key k1 okay so basically encrypting three times using three independent keys
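the e-d-e wiring can be sketched with a toy invertible cipher standing in for des; the function names and the toy cipher (modular addition, which is an easily invertible permutation) are mine, purely for illustration:

```python
MOD = 2**16

def E(k, m):
    # toy 'block cipher': addition mod 2**16 -- invertible for every key
    return (m + k) % MOD

def D(k, c):
    return (c - k) % MOD

def triple_encrypt(k1, k2, k3, m):
    # the E-D-E triple construction: 3E((k1,k2,k3), m) = E(k1, D(k2, E(k3, m)))
    return E(k1, D(k2, E(k3, m)))

# the e-d-e hack: with k1 == k2 == k3 the inner E and D cancel,
# and triple encryption collapses to single encryption
assert triple_encrypt(5, 5, 5, 1234) == E(5, 1234)
```

note this toy cipher is far too structured to gain anything from iteration (the three additions collapse into one); it only shows the shape of the construction and the equal-keys backwards-compatibility trick.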
you might be wondering why is it doing e d e why not just do e e e why do we have to have a d in the middle well that's just kind of a hack you notice what happens if you set k1 equals k2 equals k3 what happens if all three keys are the same well basically what would happen is one e and one d would cancel and you would just get normal des out so it's just a hack so that if you have a hardware implementation of triple des you can set all three keys to be the same and you'll get a hardware implementation of single des of course it'll be three times as slow as a regular implementation of single des but nevertheless it's still an option okay so for triple des in fact now we get a key size that's 3 times 56 which is 168 bits so 168 bits is way too long to actually do exhaustive search on that would take time 2 to the 168 which is more than all the machines on earth working for 10 years would be able to do unfortunately of course the cipher is three times slower than des so this is a real problem with triple des now i want to mention that in fact you might think that triple des has security 2 to the 168 but in fact there is a simple attack that actually runs in time 2 to the 118 and i want to show you how that attack works okay but in fact 2 to the 118 is still a large number in fact anything that's bigger than 2 to the 90 is considered sufficiently secure 2 to the 118 is definitely sufficiently secure against exhaustive search and generally is considered a high enough level of security so clearly triple des is three times as slow as des so the question is why did they repeat the cipher three times why not repeat the cipher just two times or in particular the question is what's wrong with double des so here we have double des basically you see it uses only two keys and it uses only two applications of the block cipher and as a result it's only going to be twice as slow as des not three times as slow as
des well the key length for double des is 2 times 56 which is 112 bits and in fact doing exhaustive search on a space of 112 bits is too much 2 to the 112 is too big of a number to do exhaustive search over such a large space so the question is what's wrong with this construction well it turns out this construction is completely insecure and i want to show you an attack so suppose i'm given a bunch of inputs say m1 to m10 and i'm given the corresponding outputs c1 to c10 what's my goal well my goal is basically to find a pair of keys k1 k2 such that if i encrypt the message capital m using these keys in other words if i do this double des encryption then i get the ciphertext vector that was given to me okay so our goal is to solve this equation here now you stare at this equation a little bit and you realize hey wait a minute i can rewrite it in kind of an interesting way i can apply the decryption algorithm and then what i'll get is that i'm really looking for keys k1 k2 that satisfy this equation here where basically all i did is i applied the decryption algorithm using k1 to both sides okay now whenever you see an equation like this what just happened is that we separated our variables the variables now appear on independent sides of the equation and that usually means that there's a faster attack than exhaustive search and in fact this attack is called a meet in the middle attack where the meet in the middle is going to attack this particular point in the construction okay so we're going to try and find a key that maps m to a particular value here and a key that maps c to the same value okay so let me show you how the attack works so the first thing we're going to do is build a table let me clear up some space here the first step is to build a table that for all possible values of k2 encrypts m under that value okay so here we have this table you notice these are all 2 to the 56 single des keys okay so the
like this these are all 2 to the 56 dez keys single des keys okay so the Table has 256 entries and what we do is basically for each entry we compute the encryption of m under the appropriate key so this is the encryption of m under the all zero key the encryption of m under the one key and then at the bottom we have the encryption of m under the all one key okay so there are two to the 56 entries and we sort this table based on the Second column okay so far so good
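The table-building step, and the matching step it enables, can be sketched end-to-end with a toy 8-bit cipher standing in for DES (the cipher E, its inverse D, and the tiny key size are assumptions purely for illustration; a dictionary stands in for the sorted table):

```python
import random

# Toy 8-bit block "cipher" standing in for DES -- any invertible E
# with a small key space works for this illustration.
def E(k, m):
    return ((m + k) % 256) ^ k

def D(k, c):
    return ((c ^ k) - k) % 256

random.seed(1)
k1, k2 = random.randrange(256), random.randrange(256)
msgs = [random.randrange(256) for _ in range(4)]
cts  = [E(k1, E(k2, m)) for m in msgs]          # double encryption

# Step 1: build the forward table  E(k2', msgs) -> k2'  for every key.
table = {tuple(E(k, m) for m in msgs): k for k in range(256)}

# Step 2: decrypt the ciphertexts under every candidate k1' and look
# for a match in the table -- the "meet in the middle".
found = None
for k in range(256):
    mid = tuple(D(k, c) for c in cts)
    if mid in table:
        found = (k, table[mid])
        break

assert found is not None
rk1, rk2 = found
assert all(E(rk1, E(rk2, m)) == c for m, c in zip(msgs, cts))
```

Using several message-ciphertext pairs, as in the lecture, makes the match at the middle essentially unique.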
By the way, building this table takes time 2^56, and sorting takes n log n, so all together it's 2^56 · log(2^56). Now that we have the table, we've built every possible value at the middle point in the forward direction. Next comes the meet-in-the-middle step: we go in the reverse direction, computing the decryption of c under every possible key k1. For each such decryption we check whether the value appears in the second column of the table. If it does — aha, we've found a match. Say the decryption under a particular key k1 matched the table entry for some key, call it
ki. Then we know that E(ki, m) = D(k1, c): the two sides — the encryption of m under ki and the decryption of c under k1 — collided in the middle. But if they collided, then the pair (k1, ki) is exactly the pair we're looking for, and we've solved our challenge. So what's the running time? We had to build and sort the table, and then for each of the 2^56 possible decryptions do a search through the sorted table, each search taking log(2^56) time. Work it out and the total comes to about 2^63, which is way, way smaller than 2^112. So this is a serious attack — probably doable today — running in total time around 2^63, which is about the same as the exhaustive
search attack on DES. So double DES really didn't solve the exhaustive-search problem: there's an attack on it that runs in about the same time as exhaustive search on single DES. Someone might complain that this algorithm has to store a huge table, and so it takes a lot of space — but so be it; the running time is still much smaller than 2^112. Notice, by the way, that the same attack applies to triple DES: you would mount a meet-in-the-middle attack against the point after the first encryption. You build a table of size 2^56 of all possible encryptions of m, and then you try to decrypt with all 2^112 key pairs until you find the collision — and when you find it, you've recovered (k1, k2, k3). So even triple DES has an attack that explores only about 2^112 possible keys. But 2^112 is a large enough number that triple DES, as far as we know, is sufficiently secure. I should mention that triple DES
is actually a NIST standard, and it's used quite a bit. In fact, single DES should never, ever be used — if for some reason you're forced to use some version of DES, use triple DES, not DES. Now I want to mention one more method for strengthening DES against exhaustive-search attacks. This method is not standardized by NIST, because it doesn't defend against the more subtle attacks on DES; but if all you worry about is exhaustive search, and you don't want to pay the performance penalty of triple DES, then it's
an interesting idea, so let me show you how it works. Let E be a block cipher operating on n-bit blocks. We define the EX construction — for DES, DESX — as follows: we use three keys k1, k2, k3, and before encryption we XOR with k3, then we encrypt under k2, and after encryption we XOR with k1. That is, EX((k1, k2, k3), m) = k1 ⊕ E(k2, m ⊕ k3). That's the whole construction. Notice it barely slows the block cipher down, because all we added is two extra XORs, which are super fast. As for the key length: we have two keys as long as the block size and one key as long as the cipher's key size, so for DESX the total is 64 + 56 + 64 = 184 bits. It turns out the best known attack takes time about 2^120 — generically, attacking EX takes time 2^(n+k), block size plus key size — and it's fairly simple, a homework problem for
you to try to figure out; I think it's a good exercise. In fact, there's analysis showing there is no faster exhaustive-search attack on this type of construction, so it's a fine construction against exhaustive search — but there are more subtle attacks on DES, which we'll discuss in the next segment, that this construction will not prevent. One thing I want to point out — unfortunately I've found this mistake in a number of products — is that if you XOR only on the outside, or only on the inside, as opposed to XORing on both sides as DESX does, then the construction does nothing to strengthen your cipher: it's still just as vulnerable to exhaustive search as the original block cipher E. That's another homework problem, and you'll see it as part of our homeworks. So this basically concludes our discussion of exhaustive-search attacks; next we'll talk about more sophisticated attacks on DES.
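Before moving on, the EX key-whitening construction above can be sketched concretely (the toy 8-bit permutation below stands in for DES, and the key sizes are assumptions for illustration only):

```python
# Sketch of the EX (DESX-style) key-whitening construction with a toy
# 8-bit invertible map standing in for DES.
def E(k, m):
    return ((m + k) % 256) ^ 0xA5        # toy invertible block cipher

def D(k, c):
    return ((c ^ 0xA5) - k) % 256

def EX_encrypt(k1, k2, k3, m):
    return k1 ^ E(k2, m ^ k3)            # XOR inside, encrypt, XOR outside

def EX_decrypt(k1, k2, k3, c):
    return D(k2, c ^ k1) ^ k3            # undo the steps in reverse order

# Sanity check: decryption inverts encryption on every block.
for m in range(256):
    assert EX_decrypt(7, 42, 99, EX_encrypt(7, 42, 99, m)) == m
```

Dropping either the inner or the outer XOR reduces this back to a construction no stronger than E against exhaustive search, as the homework asks you to show.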
There is an immense literature on attacking block ciphers. In this segment I just want to give you a taste of what these attacks look like, and I hope to convince you that you should never, ever design your own block cipher — just stick to the standards, like triple DES and AES. The first set of attacks I want to talk about are attacks on the implementation of the block cipher. As an example, imagine you have a smart card implementing a block cipher; the smart card could, for example, be used for credit card payments,
and it might have a secret key inside it to authenticate your payments as you insert the card into a payment terminal. Now, if an attacker obtains your smart card, he can take the card to a lab, run it, and measure very precisely how much time the card takes to encrypt and decrypt. If the time the implementation takes to encrypt depends on bits of the secret key, then by measuring the time the attacker learns something about your secret key — and he might even be able to extract it completely. There are many examples of implementations where, simply by measuring the time of many encryption operations very precisely, you can completely extract the secret key. Another example: rather than measuring time, you can measure the power consumption of the card as it operates. Literally, you connect it to a device that measures the current the card draws and graphs that current very precisely. These cards are not very fast, and
as a result you can measure the exact amount of power consumed at every clock cycle as the card executes. When you do that, you get graphs like this one, showing a smart card in the middle of a DES computation. You can see very clearly where it does the initial permutation and where it does the final permutation, and in between you can count exactly 16 hills and troughs corresponding to the 16 rounds. When you zoom in on a graph like this, you can read the key bits off one by one, just from how much power the card consumed during the different operations. It turns out that even cards that take steps to mask this kind of information are still vulnerable: there's an attack called differential power analysis, where you measure the power consumed over many, many runs of the encryption algorithm, and as long as there is any dependence — even a tiny one — between the current consumed and the bits of the secret key, that dependence will
show up after enough runs of the encryption algorithm, and as a result you'll be able to completely extract the secret key. These attacks were discovered by Paul Kocher and his colleagues at Cryptography Research, and there's now a fairly large industry devoted just to defending against these power attacks. As far as timing attacks are concerned, I want to stress that they are real, and not just about smart cards. For example, imagine a multi-core processor where the encryption algorithm runs on one core and attacker code happens to run on another core. The cores share the same cache, so the attacker can observe exactly which cache misses the encryption algorithm incurred — and it turns out that by looking at cache misses you can completely recover the secret key used by the algorithm. One core can extract information from the other just by watching cache misses. So implementing these block ciphers is quite subtle, because you have to make sure side-channel attacks don't leak information about your secret key. Another type of
attack discussed in the literature is the fault attack. Here, if you're attacking a smart card, you cause the card to malfunction — perhaps by overclocking it, perhaps by warming it up — so that the processor outputs erroneous data. It turns out that if errors occur during the last round of the encryption process, the resulting ciphertexts are enough to expose the secret key k. It's quite an interesting result: if you ever output a wrong result, that alone can completely compromise your secret key. The defense, of course, is to check that the correct result was computed before you output it. That's non-trivial — how do you know the error didn't happen in your checking algorithm? — but there are known ways around it: for example, compute the result three or four times, take the majority over those results, and be assured the output is correct as long as not too many faults occurred during the computation. So those are attacks on the implementation, and I hope these examples show that not only should you not invent your own block cipher, you should never even implement these crypto primitives yourself — because (a) you have to make sure there are no side-channel attacks on your implementation, and (b) you have to make sure it's secure against fault attacks. Instead, use standard libraries like the ones in OpenSSL and many other libraries out there. Don't implement these primitives yourself;
use existing libraries. All right, now I want to turn to more sophisticated attacks on block ciphers, and in particular to how these attacks apply to DES. These attacks were discovered by Biham and Shamir back in 1989, and I'll describe a version of the attack discovered by Matsui in 1993. The goal here is: given many, many input-output pairs, can we recover the key faster than exhaustive search? Anything that runs faster than exhaustive search already counts as an attack on the block cipher.
The example I want to give you is called linear cryptanalysis. Suppose c is the encryption of m under key k, and suppose that for a random key and a random message there happens to be a dependence between the message, ciphertext, and key bits. Specifically: XOR together a certain subset of the message bits, and XOR that with a certain subset of the ciphertext bits — both of which the attacker can compute, since he sees the message and the ciphertext — and compare the result to the XOR of a certain subset of the key bits. Now, if these were completely independent — which is what you'd want; you definitely don't want your message and ciphertext to somehow predict your key bits — then this equality would hold with probability exactly one half. But suppose there's a bias, and the equality holds with probability 1/2 + ε for some small ε. It so happens that for DES there is such a relation, and it exists because of a flaw in the design of the fifth S-box: the fifth S-box turns out to be too close to a linear function, and that near-linearity, as it propagates through the entire DES circuit, generates a linear relation of exactly this type across the whole cipher. The ε is tiny: ε = 1/2^21,
and I wrote down what that is — the bias is really, really small, but nevertheless there is a bias, for these particular subsets of bits. I'm not going to show you how to derive this relation, or even exactly what it is; I'll just show you how to use a relation like this once you've found it. So here's our relation, and the question is how to use it. With a little bit of statistics you can use an equation like this to determine some of the key bits, and here's how. Suppose you're given 1/ε² message-ciphertext pairs — independently random messages with their corresponding ciphertexts. For each pair, use the left-hand side of the relation to compute the XOR of the message-bit subset and the ciphertext-bit subset. You know that for a 1/2 + ε fraction of these pairs, the computed value equals the XOR of the key bits. So take the majority over all the values you've computed: it's not hard to show that the majority gives the correct value of the key-bit XOR with probability 97.7%. In other words, if the relation happens to be correct more than half the time, the majority is right — and because of the ε bias, the probability that it's correct more than half the time is 97.7%, in which case the majority indeed gives you the correct XOR of the key bits. So this is kind of cool: in time 1/ε² you can figure out an XOR of a bunch of key bits. Now let's apply this to DES. For DES, ε = 1/2^21, which means that if you give me 1/ε² = 2^42 input-output pairs, I can figure out one XOR of key bits. And it turns out — I won't show exactly how — that using this method you don't just get one such bit of information; you
get two: you can use the relation once in the forward direction and once in the backward direction, which gives you two XORs of bits of the secret key — two bits of information about the key. Then it turns out you can get 12 more bits, essentially by figuring out the inputs to the fifth S-box (again, I won't show exactly how), for a total of 14 bits overall. So using this method you've recovered 14 bits of the secret key, and it took you time 2^42. Then the rest is easy: you do exhaustive search on the remaining bits. How many remain? 56 − 14 = 42 bits, so the brute-force search takes time 2^42. What's the total attack time? The first step, determining the 14 bits, took 2^42, and the remaining brute-force search also took 2^42, so overall the attack took 2^43. That's much better than exhaustive search: within 2^43 time we broke DES. Of course, this required 2^42 random input-output pairs, whereas exhaustive search needs only about three pairs — so a fairly large number of pairs is needed — but given such a number, you can recover the key faster than exhaustive search.
What's the lesson in all this? First, even a tiny bit of linearity — here in the fifth S-box, which was not designed as carefully as the other S-boxes — led to an attack on the whole algorithm. And I want to emphasize again that this is not the sort of thing you would think of when designing a cipher. So the conclusion, once more, is that there are very subtle attacks on block ciphers, which you will not be able to find yourself — so just stick to the standards; don't ever design your own block cipher. That's all I want to say about sophisticated attacks. Now let's move on to the last type of attack I want to mention, which I'll call quantum attacks; these, again, are generic attacks on all block ciphers. Let me explain what I mean. First, consider a generic search problem: suppose I have a function f on some large domain X that outputs either
0 or 1, and it so happens that the function is mostly 0 — there's maybe just one input where it evaluates to 1 — and your goal is to find an input where the function is 1. On a classical computer, the function is given to you as a black box, so the best you can do is try all possible inputs, which takes time linear in the size of the domain. Now it turns out there's an absolutely magical result saying that a computer based on quantum physics, as opposed to classical physics, can solve this problem faster. Let me explain what I mean. In the 70s and 80s it was observed — I believe initially by Richard Feynman — that it's very difficult to simulate quantum experiments on a classical computer. So Feynman said: if that's the
case, maybe these quantum experiments are computing things a classical computer can't — maybe they can compute very quickly things that are very difficult classically. That turned out to be correct, and the example I want to show you is one of these amazing results: if you could build a quantum computer, you could solve this search problem not in time |X| but in time √|X|. Even though the computer knows nothing about the function f — it treats it as a black box — it can find a point where the function is 1 in time √|X|. I'm not going to explain how here; at the end of the class we'll have an advanced-topics lecture, and if you'd like me to explain how this algorithm works, I can do it there. It's quite interesting, and in fact quantum computers have quite an impact on crypto — again, something I can explain in the very last lecture. All right, so
what does this have to do with breaking block ciphers? So far it's just a generic search problem. Actually, before I show you the application, I should mention that you might be wondering whether anyone can build a quantum computer — and that is still completely unknown; at this point nobody knows whether we can build quantum computers large enough to take advantage of this beautiful algorithm, which is due to Grover. All right: what does this have to do with block ciphers? Suppose I give you a message-ciphertext pair — just one, or just a few. We can define a function on the key space as follows: f(k) outputs 1 if the encryption of m under k equals c, and 0 otherwise. This is exactly the type of function that is 1 at essentially one point in the key space, so by Grover's algorithm we can find the secret key in time √|K|. So what
does that mean? For DES, this would be devastating: in time 2^28 you could find the key. 2^28 is about 270 million, and 270 million steps take essentially no time on a modern computer — so this would totally destroy DES. Even AES with 128-bit keys would be affected: you could find the secret key in time roughly 2^64, and 2^64 is these days considered insecure — it's within the realm of exhaustive search. So if somebody were able to build a quantum computer — if tomorrow you opened the newspaper and read an article saying so-and-so built one — the consequence is that you should immediately move to block ciphers that use 256-bit keys, because then the running time of Grover's algorithm is 2^128, which is more time than we consider feasible. And there are standard ciphers with 256-bit keys — for example AES-256. This is
one of the reasons AES was designed with 256-bit keys in mind — though to be honest not the only reason; there are other reasons to want larger key sizes. So this was, as I said, just a taste of the different attacks on block ciphers, and I'm going to leave it at that; if we decide on quantum computing for the last topic of the course, we'll cover it in the very last lecture. Over the years it became clear that DES and triple DES are simply not designed for modern hardware and are too slow. As a result, NIST started a process to standardize a new block cipher, called the Advanced Encryption Standard, or AES for short. NIST began this effort in 1997, when it requested proposals for a new block cipher; it received 15 submissions a year later, and in the year 2000 it adopted a cipher called Rijndael — designed in Belgium — as the Advanced Encryption Standard. We already said that its block size is 128 bits and that it has three possible key sizes: 128, 192, and 256 bits. The assumption
is that the larger the key, the more secure the block cipher is as a pseudorandom permutation — but because the larger key sizes also involve more rounds, the cipher becomes slower. So the larger the key, supposedly the more secure, but also the slower: AES-128 is the fastest of these ciphers and AES-256 the slowest. Now, AES is built as what's called a substitution-permutation network — it is not a Feistel network. Remember that in a Feistel network, half the bits are unchanged from round to round;
in a substitution-permutation network, all the bits change in every round. The network works as follows. In the first round, we XOR the current state with the first round key; then we go through a substitution layer, where blocks of the state are replaced with other blocks according to a substitution table; then we go through a permutation layer, where the bits are permuted and shuffled around. Then we do it again — XOR with the next round key, substitute, permute — and so on, until the final round, where we XOR with the very last round key and out comes the output. An important point about this design is that, because of how it's built, every step in the network needs to be reversible so that the whole thing is reversible: to decrypt, we take the output and apply each step in reverse order. We start by undoing the last permutation step — which must be reversible — and then undo the substitution layer — which must also be reversible. This is very different from DES: in DES, if you remember, the substitution tables were not reversible at all (they mapped six bits to four bits), whereas here everything has to be reversible, or it would be impossible to decrypt. The XOR with the round key is, of course, reversible as well. So inversion of a substitution-
permutation network is simply done by applying all the steps in reverse order. Now that we understand the generic construction, let's look at the specifics of AES. AES operates on a 128-bit block, which is 16 bytes, and we write those 16 bytes as a 4×4 matrix, one byte per cell. We start the first round by XORing with the first round key, and then we apply a round function — made of substitutions, permutations, and other operations — to the state; again, the three functions applied here have to be invertible so the cipher can be decrypted. Then we XOR with the next round key, apply the round function again, and so on — ten times in all — although, interestingly, in the last round the MixColumns step is missing. Finally we XOR with the last round key and out comes the output. At every stage we keep this 4×4 array of bytes, so the output is also 4×4, which is 16 bytes, or 128 bits. The round keys themselves come from the 16-byte AES key via key expansion: the expansion maps the 16-byte AES key into 11 round keys of 16 bytes each, and each round key is itself a 4×4 array that gets XORed into the current state. So that's the schematic of how AES works, and the only thing left is to specify the three functions: SubBytes, ShiftRows, and
MixColumns. Those are fairly easy to explain, so I'll just give you the high-level description; those interested in the details can look them up online. SubBytes works like this: there is a single S-box containing 256 bytes, and we apply it to every byte of the current state. The state is a 4×4 table — call it A — and for each of the sixteen entries, the next value of A[i][j] is the S-box evaluated at the current value: we use the current cell as an index into the lookup table, and the table's value is the output. The next step is ShiftRows, which is basically just a permutation: we do a cyclic shift on each of the rows, so the second row is cyclically shifted by one position, the third row by two positions, and the fourth row by three positions. The last step is MixColumns, where we apply a linear transformation to each of the columns: a fixed matrix multiplies each column to produce the next column, and this transformation is applied independently to each of the four columns.
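The three round functions can be sketched directly on the 4×4 state (the SBOX here is a toy stand-in permutation, not the real AES S-box; mix_one_column uses the standard AES column coefficients 2, 3, 1, 1 over GF(2^8)):

```python
# Sketches of the three AES round functions on the 4x4 byte state.
SBOX = [(x * 7 + 3) % 256 for x in range(256)]   # toy substitution table

def sub_bytes(a):
    # apply the S-box to every byte of the state
    return [[SBOX[a[i][j]] for j in range(4)] for i in range(4)]

def shift_rows(a):
    # row i is cyclically shifted left by i positions
    return [a[i][i:] + a[i][:i] for i in range(4)]

def xtime(b):
    # multiply by 2 in GF(2^8) with the AES reduction polynomial
    return ((b << 1) ^ 0x1B) & 0xFF if b & 0x80 else b << 1

def mix_one_column(col):
    # the AES MixColumns matrix [2 3 1 1; 1 2 3 1; 1 1 2 3; 3 1 1 2]
    a0, a1, a2, a3 = col
    return [xtime(a0) ^ xtime(a1) ^ a1 ^ a2 ^ a3,
            a0 ^ xtime(a1) ^ xtime(a2) ^ a2 ^ a3,
            a0 ^ a1 ^ xtime(a2) ^ xtime(a3) ^ a3,
            xtime(a0) ^ a0 ^ a1 ^ a2 ^ xtime(a3)]

def mix_columns(a):
    cols = [mix_one_column([a[i][j] for i in range(4)]) for j in range(4)]
    return [[cols[j][i] for j in range(4)] for i in range(4)]

# One round's worth of the three layers on a sample 4x4 state.
state = [[4 * i + j for j in range(4)] for i in range(4)]
out = mix_columns(shift_rows(sub_bytes(state)))
assert len(out) == 4 and all(len(r) == 4 for r in out)
```

Each of these maps is invertible — the S-box is a permutation of the 256 byte values, ShiftRows is a shuffle, and the MixColumns matrix is invertible over GF(2^8) — which is exactly what decryption requires.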
Now, I should point out that ShiftRows and MixColumns are very easy to implement in code, and the byte substitution is easily computable as well: you can write code, in less than 256 bytes, that computes the S-box rather than hard-wiring the table into your implementation, and thereby shrink the description of AES. In fact this is a general property of AES: if you allow no precomputation at all, including computing the S-box on the fly, you get a fairly small implementation that fits in very constrained environments without room for complicated code — but of course this will be the slowest implementation, because everything is computed on the fly. And then there's a trade-off. If you have a lot of space and can support large code, you can precompute quite a bit of the three steps I just described; there are several precomputation options — you can build tables of about 4 kilobytes, or larger ones of maybe 24 kilobytes. You carry these big tables in your implementation, but then performance is really good, because all you're doing is table lookups and XORs, no other complicated arithmetic: with enough precomputation, SubBytes, ShiftRows, and MixColumns become just a small number of table lookups and XORs. A middle option is to precompute just the S-box: the implementation hard-codes 256 bytes, and the rest is code that actually computes the three functions. That's slower than the big-table version, but the code footprint is smaller. So overall there's a nice trade-off between code size and performance: on high-end machines and servers, where you can afford a lot of code, you precompute and store the big tables and get the best performance, whereas on low-end machines like
8-bit smart cars or think of like an 8-bit wrist watch you would actually have a relatively small implementation of aas but as a result of course it won't be so fast so here's an example That's a little unusual suppose you wanted to implement aes in javascript so you can send an aes library to the browser and have the browser actually do aes by itself so in this case what you'd like to do is you'd like to both shrink the code size so that on the network there's minimum traffic to send the library over to the
browser but at the same time you'd like the browser performance to be as fast as Possible so this is something that we did a while ago essentially the idea is that the code that actually gets sent to the browser doesn't have any precomputed table and as a result it's fairly small code but then the minute it lands on the browser what the browser will do is it will actually pre-compute all the tables so in some sense the code goes from this being small and compact it gets bloated with all These pre-computed tables but those are
stored on the laptop which presumably has a lot of memory and then once you have the pre-computed tables you actually encrypt using them and that's how you get the best performance okay so if you have to send an implementation of aes over the network you can kind of get the best of all worlds where the code over the network is small but when it reaches the target client it can kind of inflate itself and then get the best performance as it's doing encryption on the client now aes is such a popular block cipher that
when you build crypto into products essentially you're supposed to be using aes and as a result intel actually put aes support into the processor itself so since westmere there are special instructions in the intel processor to help accelerate aes and so i listed these instructions here they come as a pair aesenc and aesenclast and then there's aeskeygenassist so let me explain what they do aesenc essentially implements one round of aes namely it applies the three functions and the xor with the round key and aesenclast implements the last round
of aes remember the last round didn't have the mix columns phase it only had the sub bytes and shift rows and so that's what aesenclast does and the way you call these instructions is using 128-bit registers which correspond to the state of aes and so you would have one register containing the state and one register containing the current round key and then when you call aesenc on these two registers basically it would run one round of aes and place the result inside this xmm1 state register and as a result if you wanted
to implement the whole of aes all you would do is call aesenc nine times and then call aesenclast one time and these 10 instructions are basically the entire implementation of aes that's it it's that easy to implement aes on this hardware and they claim that because these operations are now done inside the processor they can get a 14x speedup over an implementation that's running on the same hardware but implementing aes without these special instructions so this is
quite a significant speed up and in fact there are now lots of products that make use of these special instructions and i should say that this is not specific to intel if you're an amd fan amd also implemented similar instructions in their bulldozer architecture and future architectures okay so let's talk about the security of aes i want to mention just two attacks here obviously aes has been studied quite a bit but the only two attacks on the full aes are the following two so first of all if you wanted to
do key recovery the best attack basically is only four times faster than exhaustive search which means that instead of a 128-bit key really you should be thinking of aes as having a 126-bit key because the best attack is four times faster than exhaustive search of course 2 to the 126 is still more time than we have to compute and this really does not hurt the security of aes the more significant attack actually is on aes-256 it turns out there's a weakness in the key expansion design of aes which allows for what's called a related
key attack so what's a related key attack essentially if you give me about two to the 100 input output pairs for aes but from four related keys so these are keys that are very closely related namely key number two is the same as key number one except that a few bits of key number one have been flipped similarly key number three is the same as key number one except that a few bits are flipped and the same for key number four these are very closely related keys if you like their hamming distance is very small
but if you do that then in fact there is a 2 to the 100 attack now you should say well 2 to the 100 is still impractical this is still more time than we can actually run today but nevertheless the fact that this attack is so much better than exhaustive search so much better than 2 to the 256 is kind of a limitation of the cipher but generally it's not a significant limitation because it requires related keys and in practice of course you're supposed to be choosing your keys at random so that you have
no related keys in your system and as a result this attack wouldn't apply but if you do have related keys then there's a problem so this is the end of the segment and in the next segment we're going to talk about more provably secure constructions for block ciphers in this segment we ask whether we can build block ciphers from simpler primitives like pseudorandom generators we're going to show that the answer is yes so to begin with let's ask whether we can build pseudorandom functions as opposed to pseudorandom permutations from a pseudorandom generator can we build
a prf from a prg our ultimate goal though is to build a block cipher which is a prp and we'll get to that at the end okay for now we build a prf so let's start with a prg that doubles its input okay so the seed for this prg is an element in k and the output is actually two elements in k so here we have a schematic of this generator that basically takes as input a seed in k and outputs two elements in k and now what does it mean for this
prg to be secure recall this means that essentially the output is indistinguishable from a random element inside of k squared now it turns out it's very easy to define what's called a one bit prf from this prg so what's a one bit prf it's basically a prf whose domain is only one bit okay so it's a prf that just takes one bit as input okay and the way we'll do it is we'll say if the input bit x is zero output the left output and if the input bit x is one output the
right output of the generator okay in symbols basically we have what we wrote here now it's straightforward to show that in fact if g is a secure prg then this one bit prf is in fact a secure prf if you think about it for a second this is really a tautology it's really just saying the same thing twice so i'll leave it for you to think about this briefly and convince yourself that in fact this theorem is true the real question is whether we can build a prf that actually has a domain that's
bigger than just one bit ideally we'd like the domain to be 128 bits say just as aes has so the question is can we build a 128 bit prf from a pseudo-random generator well so let's see if we can make progress so the first thing we're going to do is we're going to say well again let's start with a prg that doubles its input let's see if we can build a prg that quadruples its input okay so it goes from k to k to the fourth instead of k to k squared okay so let's see how
to do it so here we start with our original prg that just doubles its input but now remember the fact that this is a prg means that the output of the prg is indistinguishable from two random values in k well if the output looks like two random values in k we can simply apply the generator again to those two outputs so let's say we apply the generator once to the left output and once to the right output and we're going to call the resulting quadruple of elements g1 of k
and i wrote down in symbols what this generator does but you can see basically from this figure exactly how the generator works okay so now that we have a generator from k to k to the fourth we actually get a two bit prf namely what we'll do is we'll say given two bits zero zero zero one one zero or one one we'll simply output the appropriate block of the output of g1 of k okay so now we basically have a prf that takes four possible inputs as opposed to just two possible inputs as before
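as a quick aside, the quadrupling step just described is easy to sketch in code. in the sketch below, SHA-256 with domain-separation prefixes stands in for the length-doubling prg — that substitution is my own illustration for runnability, not part of the lecture, and it is not a vetted prg:

```python
import hashlib

def prg(seed: bytes) -> tuple:
    # length-doubling PRG stand-in: one seed in K -> two elements of K.
    # SHA-256 with "L"/"R" domain-separation prefixes is purely
    # illustrative, not a vetted pseudorandom generator.
    return (hashlib.sha256(b"L" + seed).digest(),
            hashlib.sha256(b"R" + seed).digest())

def g1(k: bytes) -> tuple:
    # apply the PRG to the seed, then once more to each half: K -> K^4
    left, right = prg(k)
    return prg(left) + prg(right)

def two_bit_prf(k: bytes, x: str) -> bytes:
    # the two input bits "00".."11" select one of the four output blocks
    return g1(k)[int(x, 2)]
```

note that `two_bit_prf` is deterministic for a fixed key, exactly as a prf must be — all the randomness comes from the choice of k.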
so the question you should be asking me is why is this g1 of k secure why is it a secure prg that is why is this quadruple output indistinguishable from random and so let's do a quick proof of this we'll just do a simple proof by pictures so here's the generator that we want to prove is secure and what that means is we want to argue that this distribution is indistinguishable from a random four tuple in k to the fourth okay so our goal is to prove that these two are indistinguishable well let's do it one step
at a time we know that the generator is a secure generator therefore in fact the output of the first level is indistinguishable from random in other words if we replace the first level by truly random strings these two are truly random elements picked in the key space then no efficient adversary should be able to distinguish these two distributions in fact if you could distinguish these two distributions it's easy to show that you would break the original prg okay but essentially you can see that the reason we can do this replacement we can replace the output
of g with truly random values is exactly because of the definition of the prg which says that the output of the prg is indistinguishable from random so we might as well just put random there and no efficient adversary can distinguish the resulting two distributions okay so far so good but now we can do the same thing again to the left hand side in other words we can replace these two pseudorandom outputs by truly random outputs and again because the generator g is secure no efficient adversary can tell the difference between these two distributions put differently
if an adversary can distinguish these two distributions then we would also get an attack on the generator g and now finally we're going to do this one last time we're going to replace this pseudo-random pair by a truly random pair and then lo and behold we get the actual distribution that we were shooting for we get a distribution that's really made of four independent blocks and so now we've proved the transitions basically that these two are indistinguishable these two are indistinguishable and these two are indistinguishable and therefore the two ends are indistinguishable which is what
we wanted to prove okay so this is kind of the high level idea for the proof it's not too difficult to make this rigorous but i just wanted to show you kind of the intuition for how the proof works well if we were able to extend the generator's output once there's nothing preventing us from doing it again so here's a generator g1 that outputs four elements in the key space and remember the output here is indistinguishable from a random four tuple that's what we just proved and so there's nothing preventing us from applying the generator
again so we'll take the generator apply it to this random looking thing and we should be able to get this random looking thing this pair over here that's random looking and we can do the same thing again and again and again and now basically we've built a new generator that outputs elements in k to the eighth as opposed to k to the fourth and again the proof of security is pretty much the same as the one i just showed you essentially you gradually change the outputs into truly random outputs so we would change this to
a truly random output then this then that then this then that and so on and so forth until finally we get something that's truly random and therefore the original two distributions we started with g2 of k and truly random are indistinguishable okay so far so good so now we have a generator that outputs elements in k to the eighth and if we do that basically we get a three bit prf in other words at zero zero zero this prf would output this block and so on and so forth until 1 1 1 it would output this block
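this three bit prf is just a walk down a binary tree of prg applications, and the same loop works unchanged for n input bits. here's a minimal python sketch of that traversal, again using SHA-256 with domain separation as an illustrative stand-in for the length-doubling prg (my substitution, not the lecture's generator):

```python
import hashlib

def prg(seed: bytes) -> tuple:
    # length-doubling PRG stand-in (illustrative only): seed -> (left, right)
    return (hashlib.sha256(b"L" + seed).digest(),
            hashlib.sha256(b"R" + seed).digest())

def tree_prf(key: bytes, bits: str) -> bytes:
    # walk down the tree: at each level keep the left PRG output for a '0'
    # bit and the right output for a '1' bit; works for 3 bits or n bits
    k = key
    for b in bits:
        left, right = prg(k)
        k = left if b == "0" else right
    return k

# evaluating at the point 1 0 1: take right, then left, then right
y = tree_prf(b"demo key", "101")
```

the cost is one prg application per input bit, which is exactly the performance issue discussed a little later in the lecture.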
now the interesting thing is that in fact this prf is easy to compute so for example suppose we wanted to compute the prf at the point 1 0 1 it's a 3 bit prf so one zero one how would we do that well basically we would start from the original key k and now we would apply the generator g but we would only pay attention to the right output of g because the first bit is one and then we would apply the generator again but we would only pay attention to the left output of the
generator because the second bit is zero and then we would apply the generator again and only pay attention to the right output because the third bit is one and that would be the final output right so you can see that that led us to one zero one and in fact because the entire generator is pseudorandom we know in particular that this output here is pseudorandom okay so this gives us a three bit prf well if it worked three times there's no reason why it can't work n times and so if we apply this transformation again
and again we arrive at what's called the ggm prf ggm stands for goldreich goldwasser and micali these are the inventors of this prf and the way it works is as follows so we start off with a generator that just doubles its output and now we're able to build a prf that acts on a large domain namely a domain of size 0 1 to the n where n could be as big as 128 or even more so let's see suppose we're given an input in 0 1 to the n let me show you how to evaluate
the prf well by now you should actually have a good idea of how to do it essentially we start from the original key and then we apply the generator and we take either the left or the right side depending on the bit x0 and then we arrive at the next key k1 and then we apply the generator again and we take the left or the right side depending on x1 and we arrive at the next key and then we do this again and again until finally we arrive at the output so we've processed all n
bits and we arrive at the output of this function and basically we can prove security again pretty much along the same lines as we did before and we can show that if g is a secure prg then in fact we get a secure prf on zero one to the n on a very large domain so that's fantastic so now essentially we have a prf that's provably secure assuming the underlying generator is secure and the generator is supposedly much easier to build than an actual prf and in fact it works on blocks that
could be very large in particular zero one to the 128 which is what we needed so you might ask well why is this thing not being used in practice and the reason is that it's actually fairly slow so imagine we plug in as the generator the salsa generator now to evaluate this prf on a 128-bit input we would basically have to run the salsa generator 128 times one time per bit of the input and then we would get a prf that's 128 times slower than the original salsa and that's much much much
slower than aes aes is a heuristic prf but nevertheless it's much faster than what we just got here and so even though this is a very elegant construction it's not used in practice to build pseudo-random functions although in a week we will be using this type of construction to build a message integrity mechanism so the last step is basically now that we have built a prf the question is whether we can actually build a block cipher in other words can we actually build a secure prp from a secure prg everything we've done so far is
not invertible again if you look at this construction here we can't decrypt basically given the final output it's not possible to go back or at least we don't know how to go back to the original input so now the question of interest is can we actually solve the problem we wanted to solve initially namely can we actually build a block cipher from a secure prg so i'll let you think about this for a second and mark the answer so of course i hope everybody said the answer is yes and you already have all the ingredients
to do it in particular you already know how to build a prf from a pseudo-random generator and we said that once we have a prf we can plug it into the luby-rackoff construction which if you remember was just a three-round feistel so we said that if you plug a secure prf into a three-round feistel you get a secure prp so combining these two together basically gives us a secure prp from a pseudorandom generator and this is provably secure as long as the underlying generator is secure so it's a beautiful result but unfortunately again it's
not used in practice because it's considerably slower than heuristic constructions like aes okay so this completes our module on constructing pseudorandom permutations and pseudorandom functions and in the next module we're going to talk about how to use these things to do proper encryption now that we know what block ciphers are and we know how to construct them let's see how to use them for secure encryption but before that i want to briefly remind you of an important abstraction called a pseudorandom function and a pseudo-random permutation so as we said in the last module block
ciphers map n bits of input to n bits of output and we saw two examples of block ciphers triple des and aes now an important abstraction of the concept of a block cipher is captured by this idea of a prp and a prf and remember that a pseudorandom function a prf basically is a function that takes two inputs it takes a key and an element in some set x and it outputs an element in some set y and for now the only requirement is that there's an efficient algorithm to evaluate this function we're going to
talk about security for prfs in just a minute and then similarly there's a related concept called a pseudorandom permutation which is similar to a prf in that there's also an efficient algorithm to evaluate the pseudorandom permutation however there's an additional requirement that there's also an algorithm d that will invert this function e so a prp is basically a prf where the function is required to be one to one for all keys and there's an efficient inversion algorithm so now let's talk about how to define secure prfs so we already said that essentially the goal
of the prf is to look like a random function from the set x to y so to capture that more precisely we define the notation funs x y to be the set of all functions from the set x to the set y similarly we define the set s sub f to be the set of all functions from the set x to y that are defined by the prf in other words once you fix the key k you obtain a function from the set x to the set y and the set of all such functions given
a particular prf is the set s sub f so as we said last time funs x y is generally a gigantic set of all functions from x to y i think i mentioned that in fact for aes where x and y are of size 2 to the 128 the size of this set is 2 to the 128 raised to the power 2 to the 128 it's a double exponential which is an absolutely enormous number on the other hand the number of functions defined by the aes block cipher is just 2 to the 128 namely one function for each key and what we'd
like to say is that a random choice from this huge set is indistinguishable from a random choice from the small set and what do we mean by indistinguishable we mean that an adversary who can interact with a random function from here can't distinguish that interaction from an interaction with a random function from here now let's define that more precisely so as usual we're gonna define two experiments experiment zero and experiment one and our goal is to say that the adversary can't distinguish these two experiments so in experiment zero the challenger basically is gonna choose a
random pseudo-random function okay so he's going to fix the key k at random and that's going to define this function little f over here to be one of the functions implemented by the prf in experiment one on the other hand the challenger is going to choose a truly random function from the set x to the set y and again we're going to call this truly random function little f either way in either experiment zero or experiment one the challenger ends up with this little function f that's either chosen from the prf or chosen as a
truly random function from x to y now the adversary basically gets to query this function little f so he gets to submit a query x1 and he obtains the value of f at the point x1 then he submits x2 and he obtains the value of f at the point x2 and so on and so forth he makes q queries and so he learns the value of the function little f at those q points and now his goal is to say whether the function little f is chosen truly at random from funs x y
or chosen just from the set of functions implemented by the prf so he outputs a certain bit b prime and we'll refer to that output as the output of the experiment either experiment 0 or experiment 1 and as usual we say that the prf is secure if in fact the adversary can't distinguish these two experiments in other words the probability that he outputs 1 in experiment zero is pretty much the same as the probability that he outputs 1 in experiment 1 in other words the difference of these two probabilities is negligible so this captures nicely
the fact that the adversary couldn't distinguish a pseudo-random function from a truly random function from the set x to y now the definition for a secure pseudo-random permutation a secure prp which is basically a secure block cipher is pretty much the same in experiment 0 the challenger is going to choose a random instance of the prp so he's going to choose a random k and define little f to be the function that corresponds to little k within the pseudo-random permutation in experiment one the challenger is going to choose not a truly random function from x
to y but a truly random one to one function from x to x okay so the goal of a prp is to look like a random permutation from x to x namely a random one to one function from the set x to itself so the function little f here is again going to be a random function from the set x to itself and again the challenger ends up with this function little f as before the adversary gets to submit queries and he gets to see the results of those queries and then he shouldn't be able
to distinguish again experiment zero from experiment one so again given the value of the function f at q points chosen by the adversary he can't tell whether the function f came from the prp or whether it's a truly random permutation from x to x so let's look at a simple example suppose the set x contains only two points zero and one in this case perms x is really easy to define essentially there are two points zero and one and we're asking what is the set of all invertible functions on the
set 0 1 well there are only two such functions one function is the identity function and the other function is basically the function that does a crossover namely this function here these are the only two invertible functions on the set zero one so really perms x only contains two functions in this case now let's look at the following prp the key space is going to be zero one and of course x is going to be zero one and let's define the prp as basically x xor k okay so that's our prp and my question to
you is is this a secure prp in other words is this prp indistinguishable from a random function in perms x i hope everybody said yes because essentially the set of functions implemented by this prp is identical to the set of functions in perms x so a random choice of key here is identical to a random choice of function over here and as a result the two distributions either pseudorandom or random are identical so clearly an adversary can't distinguish the two distributions now we already said that we have a couple of examples of secure prps
triple des and aes and i just wanted to mention that if you want to make things very concrete here's a concrete security assumption about aes just as an example say that all algorithms that run in time 2 to the 80 have advantage against aes of at most 2 to the minus 40 this is a reasonable assumption about aes and i just wanted to state it for concreteness so let's look at another example consider again the prp from the previous question namely x xor k where remember the set x was just one bit namely the values zero
and one and this time we're asking is this prp a secure prf in other words is this prp indistinguishable from a random function from x to x now the set of random functions from x to x funs x x in this case contains only four elements there are the two invertible functions which we already saw namely the identity function and the negation function the function that sends zero to one and one to zero but there are two other functions namely the function that sends everything to 0 and the function that sends everything to 1 these are the
four functions inside funs x x and the question is is this prp that we just looked at also indistinguishable from a random choice from funs x x so i hope everybody said no and the reason it's not a secure prf is because there's a simple attack namely the attacker is supposed to distinguish whether he's interacting with this prp or with a random function from funs x x and the distinguisher is very simple basically we're going to query the function at both x equals 0 and x equals 1 and then if we get a
collision in other words if f of 0 is equal to f of 1 then for sure we're not interacting with the prp we must be interacting with a random function in which case we output 1 and otherwise we output 0 so let's look at the advantage of this distinguisher well when it's interacting with the prp we'll never output a 1 because f of 0 can never be equal to f of 1 in other words the probability of outputting 1 is 0 however when we interact with a truly random function in funs x
x the probability that f of 0 is equal to f of 1 is exactly one half because half the functions satisfy f of 0 equals f of 1 and half the functions don't so we'll output 1 with probability one half so the advantage of this distinguisher is one half which is not negligible and as a result this prp here is not a secure prf now it turns out this is only a problem because the set x is very small and in fact there's an important lemma called the prf switching lemma that says that a
secure prp is in fact a secure prf whenever the set x is sufficiently large and by sufficiently large i mean say the output space of aes which is of size 2 to the 128 so by this lemma which we'll state more precisely in a second if aes is a secure prp it is also a secure prf so the lemma basically says the following if you give me a prp over the set x then for any adversary that queries the prp at at most q points so it makes at most q queries to the challenge function then the difference between
its advantage in attacking the prp when compared to a random function is very close to its advantage in distinguishing the prp from a random permutation in fact the difference is bounded by this quantity here and since we said that x is very large this quantity q squared over twice the size of x is negligible so essentially again when x is large say 2 to the 128 and q say is 2 to the 32 that's about four billion queries that the adversary makes then still the ratio is going to be negligible in
which case we say that the adversary's advantage in distinguishing the prp from a random function is pretty much the same as its advantage in distinguishing the prp from a random permutation so again basically if e is already a secure prp then it's also a secure prf so for aes we believe it's a secure prp and therefore we can also use it as a secure prf and so as a final note i just want to mention that really from now on you can kind of forget about the inner workings of aes and
triple des we're simply going to assume that both are secure prps and then we're going to see how to use them but whenever i say prp or prf you should be thinking in your mind basically aes or triple des so as our first example let's look at a very simple way of using a block cipher for encryption in particular we'll see how to use a block cipher with a one-time key so in this segment we're just going to use the block cipher to encrypt using keys that are used one time in other words all the
adversary gets to see is one ciphertext and his goal is to break semantic security of that ciphertext now in the next segment we're going to turn to more interesting applications of block ciphers and we're going to see how to encrypt using keys that are used many many times to encrypt many messages so before we start i want to mention that there is like a classic mistake in using a block cipher unfortunately there are some products that actually work this way and they are badly broken so i want to make sure that none of you
guys actually make this mistake so this mode of operation is called electronic code book and it works as follows it's the first thing that comes to mind when you want to use a block cipher for encryption what we do is we take our message we break it into blocks each block as big as the block cipher's block size so in the case of aes we would be breaking our message into 16 byte blocks and then we encrypt each block separately so this mode is often called electronic code book and unfortunately it's terribly insecure because you
realize if two blocks are equal for example here these two blocks happen to be equal then necessarily the resulting ciphertext blocks are also going to be equal so an attacker who looks at the ciphertext even though he might not know what's actually written in these blocks will know that these two blocks are equal and as a result he learns something about the plain text that he shouldn't have learned and if this isn't clear enough abstractly the best way to explain this is using a picture and so here's this guy here that you
know has this really dark black hair and when we encrypt this bitmap image using electronic codebook mode you see that his hair that contains lots of ones basically always gets encrypted the same way so that his silhouette actually is completely visible even in the encrypted data okay so this is a nice example of how electronic code book mode can actually leak information about the plain text that could tell something to the attacker so the question is how to correctly use block ciphers to encrypt long messages and so i just want to briefly
remind you of the notion we're trying to achieve which is basically semantic security under a one-time key so the adversary outputs two messages m0 and m1 and then he gets either the encryption of m0 or the encryption of m1 these are two different experiments and then our goal is to say that the adversary can't distinguish between these two experiments so he can't distinguish the encryption of m0 from the encryption of m1 and the reason we call this security for a one-time key is that the key is only used to encrypt a single message and as
a result the adversary will only ever see one ciphertext encrypted using this key okay so the first thing we want to show is that in fact the mode that we just looked at electronic code book is not semantically secure and this is true as long as you're encrypting more than one block so here's an example suppose we encrypt two blocks using a block cipher let me show you that in fact electronic code book would not be secure so here's what we would do we're the adversary so we would output two messages
m0 and m1 where in one message the two blocks are distinct and in the other message the two blocks are equal to one another so what is the challenger going to do the challenger is going to encrypt either m0 or m1 either way we're going to get two blocks back so the ciphertext actually contains two blocks the first block is going to be an encryption of the word hello and the second block is going to be either an encryption of the word hello or the word world and then if the
two ciphertext blocks are the same then the adversary knows that he received an encryption of the message hello hello and if they're different he knows that he received an encryption of the message hello world okay so he just follows this simple strategy here and if you think about it for a second you'll see what his advantage is so what is the advantage well this adversary when he receives an encryption of the message m1 he will always output zero and when he receives an encryption of the message m0 he will always output one and because of
that the advantage is basically one which means that the scheme is not secure and again shows you that electronic code book is not semantically secure and should never ever be used to encrypt messages that are more than one block long so what should we do well here's a simple example of what we could do we could use what's called deterministic counter mode so in deterministic counter mode basically we build a stream cipher out of the block cipher so suppose we have a prf f so again you should think of aes when i say
that so aes is also a Secure prf and what we'll do is basically we'll evaluate aes at the point 0 at the point 1 at the point 2 up to the point l this will generate a pseudorandom pad and i will xor that with all the message blocks and recover the ciphertext as a result okay so really this is just a stream cipher that's built out of a prf Like aes and triple des and it's a simple way to do encryption i wanted to just very quickly show you the security theorem in fact we've already
seen the security theorem when it applied to stream ciphers using deuteronomy generators so i'm not going to repeat this again i'll just remind you that essentially for every adversary 8 it's trying to attack deterministic counter mode we prove that there's an adversary b That's trying to attack the prf and since this quantity is negligible because the prf is secure we obtain that this quantity is negligible and therefore the adversary has negligible advantage in defeating deterministic counter mode and the proof in pictures is a really simple proof so i'll just show it to you one more
time for completeness so basically what we Want to show is when the adversary is given the encryption of the message m0 here this is the encryption of the message m0 m0 xor counter applied to the prf versus in giving the encryption of the message m1 we want to argue these two distributions are computationally indistinguishable so the way we do that is basically we say well the top distribution if instead of a prf we use a truly Random function namely here f is a truly random function then the adversary because of the property of the prf
the adversary cannot distinguish these two experiments right a prf is indistinguishable from a truly random function therefore when we replace the prf on the left with a truly random function on the right the adversary is going to behave the same basically he can't distinguish these two Distributions but now because f is a truly random function that here is a truly one-time pad and therefore no adversary can distinguish an encryption of m0 from an encryption of m1 under the one-time pad so again these two distributions are the same in fact here there's an actual equality these
two distributions literally are the same distribution And similarly again when we go back from a truly random function here to a prf because the prf is secured the adversary can't distinguish these two bottom distributions left from the right and so by following these three qualities basically we've proven that the things we wanted to prove equal are actually computationally indistinguishable okay so that's a very simple Proof to show that deterministic counter mode is in fact secure and it's basically the same proof as we had when we proved that a stream cipher gives us semantic security okay
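As a rough illustration, deterministic counter mode fits in a few lines of Python. The lecture uses AES as the PRF; to keep this sketch self-contained I stand in HMAC-SHA256 for the PRF, and the function names are mine, not the lecture's notation:

```python
import hmac
import hashlib

BLOCK = 32  # HMAC-SHA256 output size; with AES as the PRF this would be 16 bytes

def F(k: bytes, x: int) -> bytes:
    # Stand-in PRF F(k, x); an assumption for illustration, the lecture uses AES
    return hmac.new(k, x.to_bytes(8, "big"), hashlib.sha256).digest()

def det_ctr_encrypt(k: bytes, m: bytes) -> bytes:
    # Evaluate the PRF at 0, 1, 2, ... to build a pseudorandom pad,
    # then XOR the pad into the message blocks
    out = bytearray()
    for i, start in enumerate(range(0, len(m), BLOCK)):
        block = m[start:start + BLOCK]
        pad = F(k, i)
        out += bytes(a ^ b for a, b in zip(block, pad))
    return bytes(out)

# XORing with the same pad inverts, so decryption is the same function
det_ctr_decrypt = det_ctr_encrypt
```

Note that the scheme is deterministic: the same key and message always produce the same ciphertext, which is exactly why this mode is only safe for a one-time key.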
So that completes this segment, and in the next segment we'll talk about modes that enable us to use a key to encrypt multiple messages.

In this segment we will look at how to use block ciphers to encrypt multiple messages using the same key. This comes up in practice, for example, in file systems, where the same key is used to encrypt multiple files, and in networking protocols, where the same key is used to encrypt multiple packets. So let's see how to do it. The first thing we need to do is define what it means for a cipher to be secure when the same key is used to encrypt multiple messages. When we use the key more than once, the adversary gets to see many ciphertexts encrypted using the same key. As a result, when we define security, we're going to allow the adversary to mount what's called a chosen plaintext attack. In other words, the adversary can obtain the encryption of arbitrary messages of his choice. For example, if the adversary is interacting with Alice, the adversary can ask Alice to encrypt arbitrary messages of the adversary's choosing, and Alice will go ahead and encrypt those messages and give the adversary the resulting ciphertexts. You might wonder why Alice would ever do this, how this could possibly happen in real life, but it turns out this is actually very common in real life, and in fact this modeling is quite a conservative modeling of real life. For example, the adversary might send Alice an email. When Alice receives the email, she writes it to her encrypted disk, thereby encrypting the adversary's email using her secret key. If later the adversary steals this disk, then he obtains the encryption of an email that he sent to Alice under Alice's secret key. So that's an example of a chosen plaintext attack: the adversary provided Alice with a message, she encrypted that message using her own key, and later the attacker was able to obtain the resulting ciphertext. So that's the adversary's power, and the adversary's goal is basically to break semantic security.

Let's define this more precisely. As usual, we're going to define semantic security under a chosen plaintext attack using two experiments, experiment zero and experiment one, modeled as a game between a challenger and an adversary. When the game begins, the challenger chooses a random key k, and now the adversary gets to query the challenger. The adversary begins by submitting a semantic security query, namely two messages m0 and m1 (I added another index, but let me ignore that extra index for a while) that happen to be of the same length, and then the adversary receives the encryption of one of those messages, either of m0 or of m1. In experiment 0 he receives the encryption of m0; in experiment 1 he receives the encryption of m1. So far this should look familiar; this looks exactly like a standard semantic security game. However, in a chosen plaintext attack the adversary can now repeat this query. He can issue a query with two other chosen plaintexts, again of the same length, and again he receives the encryption of one of them: in experiment zero he receives the encryption of m0, in experiment 1 he receives the encryption of m1.
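The structure of these two experiments can be sketched as a small harness in which the challenger answers every query (m0, m1) with an encryption of mb. Everything concrete below is an illustrative assumption rather than a construction from the lecture: the toy deterministic cipher stands in for any scheme that always maps the same message to the same ciphertext, and the adversary implements the repeat-query attack on deterministic encryption that is discussed later in this segment:

```python
import hmac
import hashlib
import secrets

def det_encrypt(k: bytes, m: bytes) -> bytes:
    # Deliberately deterministic toy "encryption" (illustrative assumption):
    # the pad depends only on the key, so equal messages give equal ciphertexts
    pad = hmac.new(k, b"pad", hashlib.sha256).digest()[:len(m)]
    return bytes(a ^ b for a, b in zip(m, pad))

def cpa_experiment(b: int, adversary) -> int:
    # Experiment b: every query (m0, m1) is answered with an encryption of mb
    k = secrets.token_bytes(32)
    def query(m0: bytes, m1: bytes) -> bytes:
        assert len(m0) == len(m1)  # the two messages must have the same length
        return det_encrypt(k, m0 if b == 0 else m1)
    return adversary(query)  # the adversary outputs its guess b'

def repeat_query_adversary(query) -> int:
    m0, m1 = b"hello world!", b"hello hello!"
    c0 = query(m0, m0)  # chosen-plaintext query: both sides equal, so this is E(k, m0)
    c = query(m0, m1)   # semantic security challenge: E(k, m0) or E(k, m1)?
    # deterministic encryption repeats exactly, so c == c0 reveals the bit b
    return 0 if c == c0 else 1
```

Running `cpa_experiment(b, repeat_query_adversary)` returns b for both values of b, i.e. the adversary's advantage against this deterministic scheme is 1.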
And the attacker can continue issuing queries like this; in fact, we'll say that he can issue up to q queries of this type. Remember, every time he issues a pair of messages that happen to be of the same length, and every time he gets the encryption of either the left one or the right one: in experiment zero he will always get the encryption of the left message, and in experiment one he will always get the encryption of the right message. The adversary's goal is then to figure out whether he is in experiment zero or in experiment one, in other words whether he was consistently receiving the encryption of the left message or the encryption of the right message. So in some sense this is a standard semantic security game, just iterated over many queries that the attacker can issue adaptively, one after the other.

Now, the chosen plaintext attack is captured by the fact that if the attacker wants the encryption of a particular message m, what he can do is, in query j for some j, set both the zero message and the one message to be exactly the same message m. In other words, both the left message and the right message are set to the message m. In this case, since both messages are the same, he knows that he's going to receive the encryption of this message m that he was interested in. So this is exactly what we meant by a chosen plaintext attack: the adversary can submit a message m and receive the encryption of that particular message m of his choice. Some of his queries might be of this chosen plaintext flavor, where the message on the left is equal to the message on the right, and some of the queries might be standard semantic security queries, where the two messages are distinct, and those actually give him information on whether he's in experiment zero or experiment one.

By now you should be used to this definition: we say that the system is semantically secure under a chosen plaintext attack if no efficient adversary can distinguish experiment 0 from experiment 1. In other words, the probability that the adversary outputs b prime equal to 1, which we denote as the output of experiment b, is essentially the same whether we're in experiment 0 or experiment 1. So the attacker can't distinguish between always receiving encryptions of the left messages and always receiving encryptions of the right messages. In your mind, I'd like you to be thinking of an adversary that is able to mount a chosen plaintext attack, namely be given the encryption of arbitrary messages of his choice, whose goal is to break semantic security for some other challenge ciphertext. And as I
said, this game models the real world, where the attacker is able to fool Alice into encrypting messages of his choice for him, and the attacker's goal is then to somehow break some challenge ciphertext.

So I claim that all the ciphers we've seen up until now, namely deterministic counter mode and the one-time pad, are insecure under a chosen plaintext attack. More generally, suppose we have an encryption scheme that always outputs the same ciphertext for a particular message m. In other words, if I ask the encryption scheme to encrypt the message m once, and then I ask it to encrypt the message m again, in both cases it outputs the same ciphertext. Then that system cannot possibly be secure under a chosen plaintext attack, and both deterministic counter mode and the one-time pad are of that flavor: they always output the same ciphertext given the same message. So let's see why that cannot be chosen plaintext secure; the attack is actually fairly simple. What the attacker is going to do is output the same message twice. This just says that he really wants the encryption of m0, so here the attacker is given c0, which is the encryption of m0. This was his chosen plaintext query, where he received the encryption of the message m0 of his choice. Now he's going to break semantic security: he outputs two messages m0 and m1 of the same length, and he's going to be given the encryption of mb. But lo and behold, we said that the encryption system always outputs the same ciphertext when encrypting the message m0. Therefore, if b is equal to zero, we know that c, the challenge ciphertext, is simply equal to c0, because it's the encryption of m0; however, if b is equal to one, then we know that the challenge ciphertext is the encryption of m1, which is something other than c0. So all the attacker does is check: if c is equal to c0 he outputs zero, otherwise he outputs one. In this case the attacker is able to perfectly guess the bit b, so he knows exactly whether he was given the encryption of m0 or the encryption of m1, and as a result his advantage in winning this game is one, meaning that the system cannot possibly be CPA secure; one is not a negligible number. So this shows that deterministic encryption schemes cannot possibly be CPA secure.

You might wonder what this means in practice. Well, in practice this means, again, that every message is always encrypted to the same ciphertext. If you're encrypting files on disk and you happen to encrypt two files that are the same, they'll result in the same ciphertext, and then the attacker, by looking at the encrypted disk, will learn that these two files contain the same content. The attacker might not learn what the content is, but he will learn that these two encrypted files are an encryption of the same content, and he shouldn't be able to learn that. Similarly, if you send two encrypted packets on the network that happen to be the same, the attacker will not learn the content of those packets, but he will learn that those two packets contain the same information. Think, for example, of an encrypted voice conversation: every time there's quiet on the line, the system will be sending encryptions of zero, but since encryptions of zero always map to the same ciphertext, an attacker looking at the network will be able to identify exactly the points in the conversation where there is quiet, because he will see the exact same ciphertext every time. So these are examples where deterministic encryption cannot possibly be secure; formally, we say that deterministic encryption cannot be semantically secure under a chosen plaintext attack. So what do we do? Well, the lesson
here is that if the secret key is going to be used to encrypt multiple messages, it had better be the case that, given the same plaintext to encrypt twice, the encryption algorithm produces different ciphertexts. There are two ways to do that. The first method is what's called randomized encryption. Here the encryption algorithm itself chooses some random string during the encryption process, and it encrypts the message using that random string. What this means is that a particular message m0, for example, isn't just mapped to one ciphertext; it's mapped to a whole ball of ciphertexts, and on every encryption we output one point in this ball. Every time we encrypt, the encryption algorithm chooses a random string, and that random string leads to one point in this ball. Of course, the decryption algorithm, when it takes any point in this ball, will always map the result to m0. Similarly, the message m1 will be mapped to a ball, and every time we encrypt m1 we output one point in that ball. These balls have to be disjoint, so that the decryption algorithm, when it obtains a point in the ball corresponding to m1, will always output the message m1. In this way, since the encryption algorithm uses randomness, if we encrypt the same message twice, with high probability we'll get different ciphertexts.

Unfortunately, this means that the ciphertext necessarily has to be longer than the plaintext, because the randomness that was used to generate the ciphertext is now somehow encoded in the ciphertext, so the ciphertext takes more space. Roughly speaking, the ciphertext is going to be larger than the plaintext by basically the number of random bits that were used during encryption. If the plaintexts are very big, if they are gigabytes long, the number of random bits is going to be on the order of 128, so maybe this extra space doesn't really matter. But if the plaintexts are very short, maybe themselves 128 bits, then adding an extra 128 bits to every ciphertext is going to double the total ciphertext size, and that could be quite expensive. So, as I say, randomized encryption is a fine solution, but in some cases it
actually introduces quite a bit of cost.

So let's look at a simple example. Imagine we have a pseudorandom function that takes inputs in a certain space R, which we'll call a nonce space, and outputs values in the message space. Now let's define the following randomized encryption scheme: when we want to encrypt a message m, the encryption algorithm first generates a random r in this nonce space R, and then outputs a ciphertext that consists of two components. The first component is the value r, and the second component is the evaluation of the pseudorandom function at the point r, XORed with the message m. My question to you is: is this encryption system semantically secure under a chosen plaintext attack? The correct answer is yes, but only if the nonce space R is large enough that, with very high probability, the value r never repeats. Let's quickly argue why that's true. First of all, because F is a secure pseudorandom function, we might as well replace it with a truly random function; in other words, this is indistinguishable from the case where we encrypt the message m using a truly random function f evaluated at the point r and then XORed with m. But since r never repeats, every ciphertext uses a different r, which means the values f(r) are uniform, independent random strings every time. So every time we encrypt a message, we essentially encrypt it using a new uniform random one-time pad, and since XORing a uniform string with any string simply generates a new uniform string, the resulting ciphertext is distributed as simply two uniform random strings; I'll call them r and r prime. So both in experiment zero and in experiment one, all the attacker gets to see are truly uniform random strings r comma r prime, and since in both experiments the attacker sees the same distribution, he cannot distinguish the two distributions. And since security holds completely when we're using a truly random function, it also holds when we're using a pseudorandom function. So this is a nice example of how we use the fact that a pseudorandom function behaves like a random function to argue the security of this particular encryption scheme.

Okay, so now we have a nice example of randomized encryption. The other approach to building chosen plaintext secure encryption schemes is what's called nonce-based encryption. In a nonce-based encryption system, the encryption algorithm takes three inputs rather than two: as usual, it takes the key and the message, but it also takes an additional input called a nonce. Similarly, the decryption algorithm also takes the nonce as input, and then produces the resulting decrypted plaintext. So what is this nonce value n? The nonce is
a public value; it does not need to be hidden from the adversary. The only requirement is that the pair (key, nonce) is only used to encrypt a single message; in other words, the pair (k, n) must change from message to message. There are two ways to change it: one way is to choose a new random key for every message, and the other way is to keep using the same key all the time, but then we must choose a new nonce for every message. And as I said, I want to emphasize again that this nonce need not be secret, and it need not be random; the only requirement is that the nonce is unique. In fact, we're going to use this term throughout the course: a nonce, for us, means a unique value that doesn't repeat; it does not have to be random.

So let's look at some examples of choosing a nonce. The simplest option is simply to make the nonce a counter. For example, in a networking protocol you can imagine the nonce being a packet counter that's incremented every time a packet is sent by the sender or received by the receiver. This means the encryptor has to keep state from message to message, namely it has to keep this counter around and increment it after every message is transmitted. Interestingly, if the decryptor has the same state, then there's no need to include the nonce in the ciphertext, since the nonce is implicit. Let's look at an example. The HTTPS protocol is run over a reliable transport mechanism, which means that packets sent by the sender are assumed to be received in order at the recipient: if the sender sends packet five and then packet six, the recipient will receive packet number five and then packet number six, in that order. This means that if the sender maintains a packet counter, the recipient can also maintain a packet counter, and the two counters increment in sync. In this case there's no reason to include the nonce in the packets, because the nonce is implicit between the two sides. However, in other protocols, for example in IPsec, which is a protocol designed to encrypt the IP layer, the IP layer does not guarantee in-order delivery, so the sender might send packet number five and then packet number six, but those might be received in reverse order at the recipient. In this case it's still fine to use a packet counter as a nonce, but now the nonce has to be included in the packet, so that the recipient knows which nonce to use to decrypt the received packet. So, as I say, nonce-based encryption is a very efficient way to achieve CPA security, and in particular, if the nonce is implicit, it doesn't even increase the ciphertext length.

Of course, another method to generate a unique nonce is simply to pick the nonce at random, assuming the nonce space is sufficiently large that, with high probability, the nonces will never repeat for the life of the key. In this case, nonce-based encryption simply reduces to randomized encryption; however, the benefit here is that the sender does not need to maintain any state from message to message. This is very useful, for example, if encryption takes place on multiple devices. I might have both a laptop and a smartphone; they might both use the same key, but if I required
stateful encryption, then my laptop and my smartphone would have to coordinate to make sure that they never reuse the same nonce, whereas if both of them simply pick nonces at random, they don't need to coordinate, because with very high probability they'll never choose the same nonce twice, assuming the nonce space is big enough. So there are cases where stateless encryption is quite important, in particular where the same key is used by multiple machines.

Now I want to define more precisely what security means for nonce-based encryption, and in particular I want to emphasize that the system must remain secure even when the nonces are chosen by the adversary. The reason it's important to allow the adversary to choose the nonces is that the adversary can choose which ciphertext it wants to attack. Imagine the nonce happens to be a counter, and it so happens that when the counter hits the value 15, at that point it's easy for the adversary to break semantic security. Then the adversary will wait until the 15th packet is sent, and only then will he try to break semantic security. So when we talk about nonce-based encryption, we generally allow the adversary to choose the nonce, and the system should remain secure even in those settings.

So let's define the CPA game in this case. It's actually very similar to the game before: the attacker gets to submit pairs of messages mi0 and mi1, which obviously have to be of the same length, and he also gets to supply the nonce. In response, the adversary is given the encryption of either mi0 or mi1, but using the nonce that the adversary chose, and as usual the adversary's goal is to tell whether he was given the encryption of the left plaintext or the right plaintext. As before, the adversary gets to iterate these queries, and he can issue as many queries as he wants; we usually let q denote the number of queries the adversary issues. Now, the only restriction, which is crucial, is that although the adversary gets to choose the nonces, he's restricted to choosing distinct nonces. The reason we force him to choose distinct nonces is that that's the requirement in practice: even if the adversary fools Alice into encrypting multiple messages for him, Alice will never use the same nonce twice, and as a result the adversary will never see messages encrypted using the same nonce. Therefore, even in the game, we require that all the nonces be distinct. Then, as usual, we say that the system is a nonce-based encryption system that's semantically secure under a chosen plaintext attack if the adversary cannot distinguish experiment zero, where he's given encryptions of the left messages, from experiment one, where he's given encryptions of the right messages.

So let's look at an example of a nonce-based encryption system. As before, we
have a secure PRF that takes inputs in the nonce space R and outputs strings in the message space M. When a new key is chosen, we reset our counter r to zero, and when we encrypt a particular message m, we increment our counter r and then encrypt the message m using the pseudorandom function applied to this value r. As before, the ciphertext contains two components: the current value of the counter, and then the one-time pad encryption of the message m. My question to you is whether this is a secure nonce-based encryption system. The answer, as before, is yes, but only if the nonce space is large enough that, as we increment the counter r, it will never cycle back to zero, so that the nonces will always be unique. We argue security the same way as before. Because the PRF is secure, this encryption system is indistinguishable from one using a truly random function, in other words one where we apply a truly random function to the counter and XOR the result with the plaintext m. But since the nonce r never repeats, every time we compute this f(r) we get a truly uniform and independent random string, so we're actually encrypting every message using a one-time pad. As a result, all the adversary gets to see, in both experiments, is basically just pairs of random strings: both in experiment 0 and in experiment 1, the responses to all his chosen plaintext queries are just pairs of uniformly distributed strings, and this is the same in experiment zero and experiment one. Therefore the attacker cannot distinguish the two experiments, and since he cannot win the semantic security game against a truly random function, he also cannot win it against the secure PRF, and therefore the scheme is secure.

So now we understand what it means for a symmetric system to be secure when the key is used to encrypt multiple messages: the requirement is that it be secure under a chosen plaintext attack. And we said that basically the only way to be secure under a chosen plaintext attack is either to use
randomized encryption or nonce-based encryption where the nonce never repeats. In the next two segments we're going to build two classic encryption systems that are secure when the key is used multiple times.

Now that we understand chosen plaintext security, let's build encryption schemes that are chosen plaintext secure. The first such encryption scheme is called cipher block chaining. Here's how cipher block chaining works: it's a way of using a block cipher to get chosen plaintext security, and in particular we're going to look at a mode called cipher block chaining with a random IV (CBC stands for cipher block chaining). So suppose we have a block cipher (E, D), and let's define ECBC to be the following encryption scheme. When the encryption algorithm is asked to encrypt a message m, the first thing it does is choose a random IV that's exactly one block of the block cipher, so the IV is one cipher block; in the case of AES, the IV would be 16 bytes. Then we're going to run the message through the algorithm shown here.
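This algorithm can be sketched in Python. To stay self-contained, the block cipher below is a toy stand-in for AES (an illustrative assumption): a small Feistel network built from HMAC-SHA256, which gives us an invertible permutation E with inverse D as CBC requires. The names and round count are mine:

```python
import hmac
import hashlib
import secrets

BLOCK = 16  # one cipher block, as with AES

def _round(k: bytes, i: int, half: bytes) -> bytes:
    # Round function for the toy Feistel cipher (stand-in, not a real cipher)
    return hmac.new(k, bytes([i]) + half, hashlib.sha256).digest()[:BLOCK // 2]

def E(k: bytes, block: bytes) -> bytes:
    # Toy 4-round Feistel PRP standing in for a real block cipher like AES
    L, R = block[:BLOCK // 2], block[BLOCK // 2:]
    for i in range(4):
        L, R = R, bytes(a ^ b for a, b in zip(L, _round(k, i, R)))
    return L + R

def D(k: bytes, block: bytes) -> bytes:
    # Inverse of E: undo the Feistel rounds in reverse order
    L, R = block[:BLOCK // 2], block[BLOCK // 2:]
    for i in reversed(range(4)):
        L, R = bytes(a ^ b for a, b in zip(R, _round(k, i, L))), L
    return L + R

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def cbc_encrypt(k: bytes, m: bytes) -> bytes:
    # Random IV; each ciphertext block masks the next plaintext block
    assert len(m) % BLOCK == 0
    iv = secrets.token_bytes(BLOCK)
    prev, out = iv, [iv]
    for i in range(0, len(m), BLOCK):
        prev = E(k, xor(m[i:i + BLOCK], prev))
        out.append(prev)
    return b"".join(out)  # ciphertext = IV || c1 || c2 || ...

def cbc_decrypt(k: bytes, c: bytes) -> bytes:
    blocks = [c[i:i + BLOCK] for i in range(0, len(c), BLOCK)]
    # m_i = D(k, c_i) XOR c_{i-1}; the IV (blocks[0]) seeds the chain, then is dropped
    return b"".join(xor(D(k, blocks[i]), blocks[i - 1])
                    for i in range(1, len(blocks)))
```

Note that the ciphertext comes out one block longer than the plaintext, because the IV is included, and that encrypting the same message twice gives different ciphertexts with high probability thanks to the random IV.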
The IV that we chose is XORed into the first plaintext block, the result is encrypted using the block cipher, and the output is the first block of the ciphertext. Now comes the chaining part, where we actually use the first block of the ciphertext to mask the second block of the plaintext: we XOR the two together, and the encryption of that becomes the second ciphertext block, and so on and so forth. So this is cipher block chaining: you can see that each ciphertext block is chained and XORed into the next plaintext block, and the final ciphertext is essentially the initial IV that we chose along with all the ciphertext blocks. I should say that IV stands for initialization vector, and we're going to see that term used quite a bit: every time we need to pick something at random at the beginning of the encryption scheme, we'll typically call that an IV, for initialization vector. Notice that the ciphertext is a little bit longer than the plaintext, because we have to include this IV in the ciphertext, which basically captures the randomness that was used during encryption.

So the first question is: how do we decrypt the result of CBC encryption? Let me remind you again that when we encrypt the first message block, we XOR it with the IV, encrypt the result, and that becomes the first ciphertext block. So let me ask you: given the first ciphertext block, how would you recover the original first plaintext block? Decryption is actually very similar to encryption. Here I wrote down the
decryption circuit. You can see it's almost the same thing, except the XOR is on the bottom instead of on the top, and you'll notice that we essentially chop off the IV as part of the decryption process and only output the original message; the IV is dropped by the decryption algorithm.

Okay, so the following theorem shows that CBC mode encryption with a random IV is in fact semantically secure under a chosen plaintext attack. Let's state that more precisely. Basically, if we start with a PRP, in other words a secure block cipher E defined over a space X, then we end up with an encryption algorithm ECBC that takes messages of length L blocks and outputs ciphertexts of length L plus one blocks. Now suppose we have an adversary that makes q chosen plaintext queries. Then we can state the following security fact: for every such adversary A attacking ECBC, there exists an adversary B attacking the PRP, the block cipher, with the following relation between the two algorithms. In other words, the advantage of algorithm A against the encryption scheme is less than the advantage of algorithm B against the original PRP, plus some noise term. So let me interpret this theorem for you, as usual. What this means is that, since E is a secure PRP, this quantity here is negligible, and our goal is to say that adversary A's advantage is also negligible. However, here we're prevented from saying that, because we have this extra term, often called an error term. To argue that CBC is secure, we have to make sure the error term is also negligible, because if both of the terms on the right are negligible, their sum is negligible, and therefore the advantage of A against ECBC is also negligible. So this says that for ECBC to be secure, it had better be the case that q squared times L squared is much, much smaller than the size of X.

Let me remind you what q and L are. L is simply the length of the messages that we're encrypting, so L could be, say, a thousand, which would mean that we're encrypting messages that are at most a thousand AES blocks long. q is the
number of ciphertexts that the adversary gets to see under the CPA attack, but in real life q is basically the number of times we've used the key k to encrypt messages. In other words, if we use a particular AES key to encrypt a hundred messages, q would be a hundred, because the adversary would then see at most 100 messages encrypted under this key.

Okay, so let's see what this means in the real world. Here I rewrote the error bound from the theorem; just to remind you, q is the number of messages encrypted with k, and L is the length of the messages. Suppose we want the adversary's advantage to be less than 1 over 2 to the 32; this means the error term had better be less than 1 over 2 to the 32. So let's look at AES and see what this means. AES of course uses 128-bit blocks, so the size of X is 2 to the 128, and if you plug this into the expression, you see that the product q times L had better be less than 2 to the 48. This means that after we use a particular key to encrypt 2 to the 48 AES blocks, we have to change the key; essentially, CBC stops being secure after the key is used to encrypt 2 to the 48 different AES blocks. So it's kind of nice that the security theorem tells you exactly how long the key can be used, and how frequently you have to replace it. Interestingly, if you apply the same analysis to DES, which has a much shorter block, only 64 bits, you see that the key has to be changed much more frequently, namely after every 2 to the 16, roughly 65,000, DES blocks you need to generate a new key. This is one of the reasons why AES has a larger block size: so that modes like CBC would be more secure, and one can use the key for a longer period of time before having to replace it. What this means is that, with 2 to the 16 blocks and each block of course being 8 bytes, after you encrypt about half
a megabyte of data you would have to change the des key which is actually quite low and you notice with aes you can encrypt quite a bit more data before you have to Change the key so i want to warn you about a very common mistake that people have made when using cbc with a random iv and that is that the minute that the attacker can predict the iv that you're going to be using for encrypting a particular message the cipher this ecbc is no longer cpa secure so when using cbc with a random iv
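as a quick sanity check of the key-lifetime numbers just worked out, here's a short python sketch; this is just the integer arithmetic behind the bound q times l less than the square root of the size of x divided by 2 to the 32, with small constant factors in the error term ignored:

```python
# key-lifetime bound from the cbc security theorem:
# adversary advantage is roughly q^2 * l^2 / |X|, and we want it
# below 2^-32, which gives q * l < sqrt(|X| / 2^32). q * l is
# roughly the total number of blocks encrypted under one key.

def max_blocks_per_key(block_bits: int, target_adv_log2: int = 32) -> int:
    # |X| = 2^block_bits, so solve q * l < 2^((block_bits - target) / 2)
    return 2 ** ((block_bits - target_adv_log2) // 2)

print(max_blocks_per_key(128))  # aes: 2^48 blocks before rekeying
print(max_blocks_per_key(64))   # des: 2^16, about 65,000 blocks
```

with des's 8-byte blocks, 2 to the 16 blocks is the half megabyte mentioned above; with aes you get 2 to the 48 sixteen-byte blocks, vastly more data per key.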
like we've just shown, when using cbc with a random iv it's crucial that the iv not be predictable. but let's see an attack. suppose it so happens that, given the encryption of a particular message, the attacker can actually predict the iv that will be used for the next message. well, let's show that in fact the resulting system is not cpa secure. the first thing the adversary is going to do is ask for the encryption of a one-block message, and in particular that one block is going to be zero. so what the adversary gets back is the encryption of one block, namely the block cipher encryption of zero xor the iv, and of course the adversary also gets the iv. okay, so now the adversary, by assumption, can predict the iv that's going to be used for the next encryption; let's just call that predicted iv, iv. so next the adversary is going to issue his semantic security challenge, and the message m0 is going to be the predicted iv xor iv1, which was used in the encryption of c1, and the message m1 is just going to be some other message; it doesn't really matter what it is. so now let's see what happens when the adversary receives the resulting semantic security challenge. well, he's going to get the encryption of m0 or the encryption of m1. so when the adversary receives the encryption of m0, tell me: what is the actual plaintext that's encrypted in the ciphertext c? well, the answer is that what's actually encrypted is the message, which is iv xor iv1, xor the iv that's used to encrypt that message, which happens to be iv, and this of course is iv1. so when the adversary receives the encryption of m0, he's actually receiving the block cipher encryption of iv1, and lo and behold, you notice that he already has that value from his chosen plaintext query. and when he receives the encryption of the message m1, he just receives a normal cbc encryption of the message m1. so you realize that now he has a simple way of breaking the scheme: he's going to ask, is the second block of the ciphertext c equal to the value that i received in my cpa query? if so, i'll say that i received the encryption of m0; otherwise i'll say that i received the encryption of m1. so really his test is: is c[1], the second block of c, equal to c1[1], the second block of c1? if the two are equal he says zero, and otherwise he says one. so the advantage of this adversary is going to be one, and as a result he completely breaks cpa security of the cbc encryption. so the lesson here is: if the iv is predictable, then in fact there is no cpa security. and unfortunately this is actually a very common mistake in practice; in particular, even in the ssl protocol, in tls 1.0, it turns out that the iv for record number i is in fact the last ciphertext block of record number i minus one. that means that given the encryption of record number i minus one, the adversary knows exactly what iv is going to be used for record number i. very recently, just last summer, this was actually converted into a pretty devastating attack on ssl; we'll describe that attack when we talk about ssl in more detail.
for now i wanted to make sure you understand that when you use cbc encryption with a random iv, it's absolutely crucial that the iv be random. okay, so now i'm going to show you the nonce-based version of cbc encryption. in this mode the iv is replaced by a non-random but unique nonce; for example, the numbers one, two, three, four, five could all be used as nonces. now, the appeal of this mode is that if the recipient already knows what the nonce is supposed to be, then there's no reason to include the nonce in the ciphertext, in which case the ciphertext is exactly the same length as the plaintext, unlike cbc with a random iv, where we had to expand the ciphertext to include the iv. here, if the nonce is already known to the recipient, there's no reason to include it in the ciphertext, and the ciphertext is exactly the same length as the plaintext. so it's perfectly fine to use a non-random but unique nonce; however, it's absolutely crucial to know that if you do this, there's one more step you have to do before you use the nonce in the cbc chain. in particular, in this mode we're now going to be using two independent keys, k and k1. the key k is, as before, going to be used to encrypt the individual message blocks; however, the key k1 is going to be used to encrypt the non-random but unique nonce, so that the output is a random-looking iv, which is then used in the cbc chain. this extra step of encrypting the nonce with the key k1 is absolutely crucial; without it, cbc mode encryption would not be secure. in particular, if you just take the nonce directly and feed it into the cbc chain, in other words you use the nonce as the iv, then we already know that the result would not be cpa secure; we saw that on the previous slide. but in fact, even if you set k1 to be equal to k, in other words you just encrypt the nonce using the same key k, that also is not going to be cpa secure and can lead to significant attacks. so i want to make sure you understand that if the nonce in cbc mode encryption is not random, this extra encryption step has to take place, and so i'll highlight this extra step here just to make sure you never forget to put it in. and i'll tell you that this is an extremely common mistake in practice: there are many real-world products and crypto libraries that actually forget to encrypt the non-random nonce before using it in the cbc chain, and that actually leads to practical and significant attacks. for example, this was not done in tls: tls, as we said, used predictable ivs, and that led to a significant attack on tls.
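the structure just described can be sketched in a few lines. this is a toy illustration only: it uses a sha-256-derived xor pad as a stand-in for the block cipher (invertible, but in no way a secure prp), purely to show the data flow, in particular that the nonce is first encrypted under the second key k1 to produce the iv, and only then does the cbc chain run under k:

```python
import hashlib

BLOCK = 16  # bytes, like an aes block

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# toy "block cipher": xor with a key-derived pad. it is invertible
# (in fact its own inverse) but NOT a secure prp; illustration only.
def toy_E(key: bytes, block: bytes) -> bytes:
    pad = hashlib.sha256(key).digest()[:BLOCK]
    return xor(block, pad)

toy_D = toy_E  # the xor-based toy cipher is an involution

def nonce_cbc_encrypt(k: bytes, k1: bytes, nonce: bytes, msg: bytes) -> bytes:
    assert len(nonce) == BLOCK and len(msg) % BLOCK == 0  # padding elsewhere
    iv = toy_E(k1, nonce)          # the crucial extra step: iv = E(k1, nonce)
    ct, prev = b"", iv
    for i in range(0, len(msg), BLOCK):
        prev = toy_E(k, xor(prev, msg[i:i + BLOCK]))
        ct += prev
    return ct  # the nonce is known to the recipient, so it is not sent

def nonce_cbc_decrypt(k: bytes, k1: bytes, nonce: bytes, ct: bytes) -> bytes:
    prev, msg = toy_E(k1, nonce), b""
    for i in range(0, len(ct), BLOCK):
        block = ct[i:i + BLOCK]
        msg += xor(toy_D(k, block), prev)
        prev = block
    return msg
```

note that the ciphertext here is exactly as long as the plaintext, since no iv is transmitted; skipping the `toy_E(k1, nonce)` step and using the nonce directly as the iv is exactly the mistake warned about above.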
the reason this is so important to know is that in fact many crypto apis are set up to almost deliberately mislead the user into using cbc incorrectly. so let's look at how cbc is implemented inside of openssl. here are the arguments to the function: basically, this is the plaintext, this is the place where the ciphertext will get written to, this is the length of the plaintext, this is the aes key, and finally there's an argument that says whether you're encrypting or decrypting. the most important parameter i wanted to point out here is the actual iv, and unfortunately the user is asked to supply this iv, and the function uses the iv directly in the cbc encryption mechanism; it doesn't encrypt the iv before using it. as a result, if you ever call this function with a non-random iv, the resulting encryption system won't be cpa secure. okay, so it's very important to know that when calling cbc encryption functions like this in openssl, you either supply a truly random iv, or, if you want the iv to be a counter, then you have to encrypt the counter using aes before you actually call aes cbc encrypt, and you have to do that yourself. so again, it's very important that the programmer knows this needs to be done; otherwise the cbc encryption is insecure. one last technicality about cbc is what to do when the message is not a multiple of the block cipher's block length, that is, what do we do if the last message block is shorter than the block length of aes, for example less than 16 bytes? the answer is that we add a pad to the last block so
that it becomes as long as 16 bytes, as long as the aes block size, and this pad of course is going to be removed during decryption. so here's a typical pad; this is the pad that's used in tls. basically, if you pad with n bytes, then essentially what you do is you write the number n, n times. so for example, if you pad with five bytes, you pad with the string five five five five five: five bytes, each byte having the value five. and the cute thing about this pad is that when the decrypter receives the message, what he does is look at the last byte of the last block; suppose that value is five, then he simply removes the last five bytes of the message. now, the question is: what do we do if in fact the message is a multiple of 16 bytes, so that no pad is needed? if we don't pad at all, then that's a problem, because the decrypter is going to look at the very last byte of the last block, which is now part of the actual message, and he's going to remove that many bytes from the plaintext, so that actually would be a problem. so the solution is: if in fact no pad is needed, nevertheless we still have to add a dummy block, and this would be a block that contains 16 bytes, each one containing the number 16. okay, so we add essentially a full dummy block. the decrypter then, when he's decrypting, looks at the last byte of the last block, sees that the value is 16, therefore removes the entire block, and whatever is left
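the padding rule just described is easy to write down. here's a small sketch of the scheme as presented in the lecture, n pad bytes each holding the value n, with a full dummy block when the message is already block-aligned (real tls encodes the pad length slightly differently, a detail glossed over here):

```python
BLOCK = 16  # aes block size in bytes

def tls_style_pad(msg: bytes) -> bytes:
    # pad with n bytes, each of value n; when the message is already a
    # multiple of the block size, n = 16 and we add a full dummy block.
    n = BLOCK - (len(msg) % BLOCK)
    return msg + bytes([n]) * n

def tls_style_unpad(padded: bytes) -> bytes:
    n = padded[-1]            # the last byte says how many bytes to strip
    return padded[:-n]

print(len(tls_style_pad(b"a" * 11)))  # 16: five pad bytes of value 5
print(len(tls_style_pad(b"a" * 32)))  # 48: a full dummy block was added
```

the second print shows the unfortunate case discussed next: a 32-byte message grows to 48 bytes just to accommodate the padding.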
is the actual plaintext. so it's a bit unfortunate that if you're encrypting short messages with cbc, and the messages happen to be, say, 32 bytes, so a multiple of 16 bytes, then you have to add one more block and make all these ciphertexts 48 bytes, just to accommodate the cbc padding. i should mention that there's a variant of cbc called cbc with ciphertext stealing that actually avoids this problem, but i'm not going to describe it here; if you're interested, you can look it up online. okay, so that's the end of our discussion of cbc, and in the next segment we'll see how to use counter mode to encrypt multiple messages using a single key. in this segment we're going to look at another method to achieve chosen plaintext security that's actually superior to cbc, and this method is called randomized counter mode. unlike cbc, randomized counter mode uses a secure prf; it doesn't need a block cipher. it's enough for counter mode to just use a prf, because we're never going to be inverting this function f. so we're going to let f be the secure prf, and it
acts on n-bit blocks; again, if we use aes, n would be 128. and the way the encryption algorithm works in counter mode is that it starts off by choosing a random iv, a 128-bit random iv in the case of aes, and then essentially we start counting from this random iv. so you notice the first evaluation is at iv, then at iv plus one, up to iv plus l; we generate this random pad, we xor the result with the message, and that gives us the ciphertext. and as usual, you notice that the iv here is included along with the ciphertext, so that in fact the ciphertext is a little longer than the original plaintext. and the point, of course, is that the encryption algorithm chooses a new iv for every message, so even if i encrypt the same message twice, i'm going to get different resulting ciphertexts. one thing to note is that this mode is completely parallelizable, unlike cbc. cbc was sequential: you couldn't encrypt block number five until you'd encrypted blocks number one to four. so hardware companies who might have multiple aes engines working in parallel cannot actually use those aes engines when using cbc, because cbc is inherently sequential; even though you might have two or three or four aes engines, you can only use one of them when doing cbc encryption. with counter mode, everything is completely parallelizable: if you have three aes engines, encryption basically will work three times as fast. so that's the beauty of counter mode. counter mode also has a corresponding nonce-based counter mode, where the iv is not truly random but rather is just a nonce, which could be a counter. and the way you would implement nonce-based counter mode is you would take the 128-bit block that's used in aes and split it in two: you would use the left 64 bits as the nonce, so the nonce, say, would count from zero up to 2 to the 64, and then once you specify the nonce, the lower-order 64 bits would do the counting inside of the counter mode encryption. okay, so the nonce goes on the left and the counter mode encryption counter goes on the right. and it's perfectly fine if this nonce is predictable; the only restriction is that you encrypt at most 2 to the 64 blocks using one particular nonce. the danger is that you don't want this counter to reset to zero, because then you would have two blocks that are encrypted using the same one-time pad. so let's quickly state the security theorem for randomized counter mode. by now you should be used to these kinds of theorems: basically, we are given a secure prf, and what we end up with is
an encryption scheme we'll call e sub ctr, e sub counter mode, which is semantically secure under a chosen plaintext attack. it encrypts messages that are l blocks long and produces ciphertexts that are l plus one blocks long, because the iv has to be included in the ciphertext; this is for randomized counter mode. and then the error bounds are stated over here; it's basically the same kind of bound as in the case of cbc encryption. as usual, we argue that this term is negligible because the prf f is secure, and we would like to deduce from that that this term is negligible, so that ectr is secure. unfortunately, we have this error term here, and so we have to make sure this error term is negligible, and for that we have to make sure that q squared l is much smaller than the size of x. remember, q is the number of messages encrypted under a particular key, and l is the maximum length of those messages. now interestingly, in the case of cbc we had that q squared l squared has to be small compared to x, which is actually worse than what we have for counter mode; in other words, counter mode can actually be used for more blocks than cbc could. and let's see a quick example of that. so here again is the error term for counter mode; remember, q is the number of messages encrypted with the key and l is the length of those messages. and as before, just as in the case of cbc, suppose we want the adversary's advantage to be at most 1 over 2 to the 32; that basically requires that this q squared l over x be less than 1 over 2 to the 32. and so for aes, if you plug in the values, x is 2 to the 128 (128-bit blocks), so q times the square root of l should be less than 2 to the 48; this is basically the bound you get from plugging 2 to the 128 into this expression. and as a result, you can see that if you're encrypting messages that are each 2 to the 32 blocks, then after 2 to the 32 such messages you have to replace your secret key; otherwise randomized counter mode is no longer cpa secure.
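randomized counter mode itself is only a few lines. here's a sketch with a stand-in prf built from hmac-sha256 truncated to one block in place of aes (this is not aes and not a security claim, purely structural); note that decryption is the same xor of the same pad stream, which is exactly why f is never inverted and a prf suffices:

```python
import hmac, hashlib, secrets

BLOCK = 16  # bytes per block, like aes

# stand-in prf: hmac-sha256 truncated to one block. this is NOT aes;
# it just plays the role of the secure prf f in the sketch.
def f(key: bytes, x: int) -> bytes:
    return hmac.new(key, x.to_bytes(BLOCK, "big"), hashlib.sha256).digest()[:BLOCK]

def pad_stream(key: bytes, iv: int, nbytes: int) -> bytes:
    # f(k, iv), f(k, iv+1), ... concatenated, truncated to nbytes by caller
    return b"".join(
        f(key, (iv + i) % 2 ** (8 * BLOCK))
        for i in range((nbytes + BLOCK - 1) // BLOCK)
    )

def ctr_encrypt(key: bytes, msg: bytes):
    iv = secrets.randbelow(2 ** (8 * BLOCK))   # fresh random iv per message
    ct = bytes(m ^ p for m, p in zip(msg, pad_stream(key, iv, len(msg))))
    return iv, ct                              # the iv travels with the ciphertext

def ctr_decrypt(key: bytes, iv: int, ct: bytes) -> bytes:
    return bytes(c ^ p for c, p in zip(ct, pad_stream(key, iv, len(ct))))
```

notice there is no padding step at all: the pad stream is simply truncated to the message length, so the ciphertext (apart from the iv) is exactly as long as the plaintext, and each block of the stream could be computed in parallel.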
we could encrypt a total of 2 to the 64 aes blocks using a single secret key; remember, for cbc the corresponding value was 2 to the 48 blocks. so in fact, because counter mode has a better security bound, we can use the same key to encrypt more blocks with counter mode than we could with cbc. so i wanted to do a quick comparison of counter mode and cbc and argue that in every single aspect counter mode is superior to cbc, and that's actually why most modern encryption schemes are starting to migrate to counter mode and abandon cbc, even though cbc is still quite widely used. so let's look at the comparison. first of all, recall that cbc actually had to use a block cipher, because if you look at the decryption circuit, the decryption circuit actually ran the block cipher in reverse; it was using the decryption capability of the block cipher. whereas in counter mode we only need a prf: we never use the decryption capability of the block cipher, we only use it in the forward direction, we only encrypt with it. because of this, counter mode is actually more general, and you can use primitives like salsa20; salsa20, if you remember, is a prf but it's not a prp, so counter mode can use salsa20 but cbc cannot, and in that sense counter mode is more general than cbc. counter mode, as we said, is parallelizable, whereas cbc is a very sequential process. we said that counter mode is more secure: the security bounds, the error terms, are better for counter mode than they are for cbc, and as a result you can use a key to encrypt more blocks in counter mode than you could with cbc. the other issue is, remember, in cbc we talked about the dummy padding block: if you have a message that's a multiple of the block length, in cbc we said we had to add a dummy block, whereas in counter mode this isn't necessary. although i did want to mention that there is a variation of cbc called cbc with ciphertext stealing that avoids the dummy block issue; so for standardized cbc we actually need a dummy block, but in fact there is a modification of cbc that doesn't need one, just like counter mode. finally, suppose you're encrypting a stream of one-byte messages, using nonce-based encryption with an implicit nonce, so the nonce is not included in the ciphertext. in this case, with cbc every single one-byte message would have to be expanded into a 16-byte block and then encrypted, and the result would be a 16-byte block. so if you have a stream of 100 one-byte messages, each one separately would have to become a 16-byte block, and you'll end up with a stream of 100 16-byte ciphertexts: a 16x expansion in the length of the ciphertext compared to the length of the plaintext. in counter mode, of course, this is not a problem: you would just encrypt each one-byte message by xoring it with the first byte of the stream that's generated in counter mode, so every ciphertext would be just one byte, like the corresponding plaintext, and there's no expansion at all. so you see that essentially, in every single aspect, counter mode dominates cbc, and that's why it's actually the recommended mode to be using today. okay, so this concludes our
discussion of chosen plaintext security. i want to quickly summarize and remind you that we're going to be using these prp and prf abstractions of block ciphers throughout; this is actually the correct way of thinking about block ciphers, and so we'll always think of them as either pseudorandom permutations or pseudorandom functions. and then i wanted to remind you that so far we saw two notions of security; both only provide security against eavesdropping, they don't provide security against tampering with the ciphertext. one is used when the key is only used to encrypt a single message, the other when the key is used to encrypt multiple messages. and as we said, because neither one is designed to defend against tampering, neither one provides data integrity, and we're going to see that this is a real problem. as a result, in fact, i'm going to say in the next segment that these modes should never ever be used by themselves; you should only be using these modes in addition to an integrity mechanism, which is our next topic. okay, so far we've seen basically that if you're using the key once, you can use stream ciphers, you can use deterministic counter mode; if you're going to use the key many times, you could use randomized cbc or randomized counter mode. and we're going to talk about how to provide both integrity and confidentiality once we cover the topic of integrity, which is our next module. in this module we're going to stop talking about encryption and instead discuss message integrity; next we will come back to encryption and show how to provide both encryption and integrity. so as i said, our goal here is to provide integrity without any confidentiality, and there are in fact
many cases in the real world where this comes up. for example, think of operating system files on your disk: say you're using windows; all the windows operating system files on disk are not confidential, they're public and known to the world, but it is quite important to make sure they're not modified by a virus or some malware. that's an example where you want to provide integrity but you don't care about confidentiality. another example is banner ads on web pages: the provider of the ads doesn't care at all if someone copies them and shows them to other people, so there's no confidentiality issue, but they do care about people modifying those ads; for example, they do want to prevent people from changing the ads into different types of ads. so that's another example where integrity matters but confidentiality is not important at all. so how do we provide message integrity? the basic mechanism is what's called a mac, a message authentication code. and the way we do it is as follows. here we have our friends alice and bob; they have a shared key k which is not known to the attacker but known to both of them, and there's a public message m that alice wants to send to bob, such that an attacker along the way cannot modify this message on its way to bob. the way alice does it is by using what's called a mac signing algorithm, denoted by s, where the mac signing algorithm takes as input the key and the message and produces a very short tag; the tag could be like 90 bits or 100 bits or so. even though the message might be gigabytes long, the tag is very, very short. then she appends the tag to the message and sends the combination of the two to bob. bob receives the message and the tag, and then he runs what's called the mac verification algorithm on this tag. the mac verification algorithm takes as input the key, the message, and the tag, and it says basically yes or no, depending on whether the message is valid or whether it's been tampered with. okay, so more precisely, what is a mac? well, we said a mac basically consists of two algorithms, a signing algorithm and a verification algorithm.
as usual, they're defined over a key space, a message space, and a tag space, and as we said, it's a pair of algorithms: the signing algorithm will output a tag in the tag space, and the verification algorithm, given the key, the message, and the tag, will output yes or no. and i want to remind you, as usual, there is a consistency requirement: for every k in the key space and for every message in the message space, if i sign the message using a particular key and then verify the tag using the same key, i should get yes in response. so this is the standard consistency requirement, which is the analog of the one we saw for encryption. now, one thing i'd like to point out is that integrity really does require a shared key between alice and bob, and in fact there's a common mistake people make where they try to provide integrity without a shared key. so here's an example: consider crc. crc stands for cyclic redundancy check; this is a classic checksum algorithm that's designed to detect random errors in messages. so imagine that, instead of using a key to generate a tag, alice uses a crc algorithm, which is keyless, doesn't take any key, to generate a tag, and then she appends this tag to the message and sends it over to bob. bob will verify that the crc is still correct; in other words, bob will verify that the tag is equal to crc of m, and if so the verification algorithm will say yes, and otherwise it will say no. the problem is that this is very easy for an attacker to defeat; in other words, an attacker can very easily modify the message en route and fool bob into thinking that the new message is a valid one. the way the attacker will do it is he'll capture the message and the tag, he'll simply block them, and then he'll produce his own message m prime, compute his own crc on this message m prime, and send the concatenation of the two over to bob. bob will run the verification algorithm, and verification will work properly, because in fact the right-hand side is a valid crc for the left-hand side; as a result, bob will think this message came from alice, but in fact it has been completely modified by the attacker and has nothing to do with the original message that alice sent. okay, so the problem is that because crc doesn't use a key, there's no difference between alice and the attacker, and as a result bob doesn't know where the message came from. once we introduce a key, alice can do something the attacker can't do, and as a result she might be able to compute a tag that the attacker can't modify.
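the difference is easy to see concretely. here's a small sketch using python's standard library, with zlib's crc32 as the keyless checksum and hmac-sha256 standing in for a keyed mac (hmac itself is covered later in the course; here it's just an example of a keyed signing and verification pair):

```python
import hmac, hashlib, zlib

def crc_tag(m: bytes) -> int:
    return zlib.crc32(m)          # keyless: anyone, attacker included, can compute it

def mac_sign(k: bytes, m: bytes) -> bytes:
    return hmac.new(k, m, hashlib.sha256).digest()

def mac_verify(k: bytes, m: bytes, tag: bytes) -> bool:
    return hmac.compare_digest(mac_sign(k, m), tag)

key = b"shared secret between alice and bob"
msg = b"pay alice $100"
forged = b"pay attacker $100"     # the attacker's substitute message

# crc "verification" accepts the forgery, because the attacker can
# simply recompute the checksum on his own message:
assert crc_tag(forged) == zlib.crc32(forged)

# but without the key he cannot produce a valid mac tag, and the
# consistency requirement holds for the legitimate pair:
tag = mac_sign(key, msg)
assert mac_verify(key, msg, tag)
assert not mac_verify(key, forged, tag)
```

the last two assertions are exactly the consistency requirement and the forgery failure: signing and verifying with the same key says yes, while a modified message under the old tag says no.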
so the point to remember is that crc, as we said, is designed to detect random errors, not malicious errors, and here our goal is to make sure that even a malicious attacker cannot modify messages en route. so next we want to define what it means for a mac system to be secure. as usual, we define security in terms of the attacker's power, what the attacker can do, and the attacker's goal, what he's trying to do. in the case of macs, the attacker's power is what's called a chosen message attack; in other words, the attacker can give alice arbitrary messages of his choice, m1 to mq, and alice will compute the tags on those messages and give him those tags. again, you might wonder why alice would ever do that, why she would ever compute the tag on a message the attacker gave her. just like in the case of a chosen plaintext attack, it's very common in the real world that the attacker can give alice a message, alice will compute the tag on that message, and the attacker will obtain the resulting tag. for example, the attacker might send alice an email; alice might want to save the email to disk in a way that will let her detect tampering with the disk, so she'll compute a tag on the message and save the message and the tag to disk. later on, the attacker might steal alice's disk, and now he's recovered alice's tag on the message that he sent to alice. so this is an example of a chosen message attack in the real world, where the attacker actually obtained a tag on a message that he gave alice. okay, so that's what the attacker can do, basically this chosen message attack. and what is his goal? well, his goal is to do something called an existential forgery; in other words, what he's trying to do is produce some new valid message tag pair, a message tag pair that's different from all the pairs given to him during the chosen message attack. if he can do that, then we say the system is insecure, and if he can't, then we'll say the system is secure. so i want to emphasize: existential forgery means that the attacker cannot produce a new message tag pair even for a message that's complete gibberish. and again, you might wonder why we care if the attacker computes a tag on a message that's gibberish; that's not of any value to the attacker. but we want to build macs that are secure in any usage setting, and there are in fact cases where, for example, you might want to compute an integrity tag on a random secret key, in which case, if the attacker is able to compute a tag on a completely random message, he might be able to fool a user into using the wrong secret key. as a result, we want to make sure that if the mac is secure, the attacker can't produce a valid tag for any new message, whether it's gibberish or sensical. another property that's implied by the security definition is that if the attacker is given some message tag pair, he shouldn't be able to produce a new tag for the same message; in other words, even though there might be another tag t prime for the same message m, the attacker, given m and t, shouldn't be able to find this new t prime. and again, you might wonder why we care: if the attacker already has a tag on the message m, why does it matter if he can produce another tag for m? he already has one tag. but as we'll see, there are actually applications where it's really important that the attacker not be able to produce a new tag for a previously signed message; in particular, this will come up when we combine encryption and integrity. so we're going to demand that, given one tag on a message, it's impossible to find
another tag for the same message. okay, so now that we understand the intuition of what we're trying to achieve, let's define it, as usual, using a more precise game. here we have the two algorithms s and v, and we have an adversary a, and the game proceeds as follows. the challenger, as usual, chooses a random key for the mac, and then the adversary does his chosen message attack: he submits a message m1 to the challenger and receives the tag on that message m1, then he submits an m2 to the challenger and receives the tag on that m2, and so on and so forth, until he has submitted q messages to the challenger and received q tags on all those messages. so that's the chosen message attack part. then the adversary goes ahead and tries to do an existential forgery, namely he outputs a message tag pair, a new message tag pair. and we say that he wins the game, in other words b is equal to one, if, first of all, the message tag pair he outputs is a valid message tag pair, so the verification algorithm says yes, and second of all, it's a fresh message tag pair, in other words not one of the message tag pairs we gave him before; otherwise we say the attacker lost the game, namely b is equal to zero. as usual, we define the advantage of an adversary as the probability that the challenger outputs one in this game, and we say that the mac system is secure if for all efficient adversaries the advantage is negligible. okay, in other words, no efficient adversary can win this
game with non-negligible probability all right that's our definition of secure max and our goal is to build a secure max like this before we do that i want to ask you two questions so the first question is suppose we have a mac and it so happens that the attacker can find two messages m0 and m1 that happen to have the same tag For about half of the keys in other words if you choose a key at random with probability one half the tag of the message m0 will be the same on the tag and the
message m1 and my question to you is can this be a secure mac so i want to emphasize the attacker doesn't know what the tag on m0 and m1 is all he knows is that the two messages happen to have the same tag with probability one half so the Question is whether this is a secure mac so the answer is no this is not a secure mac and the reason is because of the chosen message attack essentially the attacker can ask for the tag on the message m0 and then he'll receive m0 comma t from
the challenger and in fact t would be a valid tag for the message m0 and then what he would output as his existential forgery is m1 comma t and you notice m1 comma t Is different from m0 comma t so this is a valid existential forgery and as a result the attacker wins the game with advantage one half so we conclude that this mac is not secure the second question i'd like to ask you is suppose we have a mac that happens to always output a 5-bit tag in other words the tag space for this
MAC happens to be {0,1}^5, so for every key and for every message, the signing algorithm outputs a 5-bit tag. The question is: can this MAC be secure? Of course the answer is no, because the attacker can simply guess the tag. What he would do is issue no chosen-message queries at all; he would just output an existential forgery as follows: choose a random tag t in {0,1}^5, and output as his forgery, say, the message 0 together with the tag t. Now with probability 1/2^5 this tag will be a valid tag for the message 0, so the adversary's advantage is 1/32, which is non-negligible. So this basically says that tags can't be too short; they have to have some length to them. In fact, a typical tag length would be, say, 64 bits, 96 bits, or 128 bits; TLS, for example, uses tags that are 96 bits long.
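To make these advantage numbers concrete, here is a rough sketch that estimates the guessing adversary's advantage empirically; the function name and trial count are arbitrary choices of mine.

```python
import random

def guess_advantage(tag_bits, trials=200_000):
    """Estimate the advantage of an adversary that guesses a uniform tag
    for a fixed message, against a MAC whose tags are tag_bits long.
    The 'true tag' here is just a uniform value -- enough to measure the
    probability that a uniform guess matches it."""
    wins = 0
    for _ in range(trials):
        true_tag = random.getrandbits(tag_bits)
        guess = random.getrandbits(tag_bits)
        wins += (guess == true_tag)
    return wins / trials

# For 5-bit tags the advantage is 1/32, far from negligible;
# for 96-bit tags it is 1/2**96, which is negligible.
```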
If you try to guess the tag for a message when the tag is 96 bits, the probability of guessing correctly is 1/2^96, so the adversary's advantage would be just 1/2^96, which is negligible. So now that we understand what MACs are, I want to show you a simple application; in particular, let's see how MACs can be used to protect system files on disk. Imagine that when you install an operating system, say when you install Windows on your machine, one of the things Windows does is ask the user for a password and derive a key k from this password. Then, for every file that it writes to disk, in this case the files f1, f2, up to fn, the operating system computes a tag for that file and stores the tag along with the file; here it concatenates the tag to each one of the files. Then it erases the key, so the key is no longer stored on disk or in memory or anywhere. Okay, so now later, imagine that the machine gets infected with a virus, and the virus tries to modify some of the system files. The question is whether the user can detect which files were modified. Here's one way to do it: the user reboots the machine into some clean OS, say booting from a USB disk, and once the machine has booted into this clean OS, the user supplies his password to this clean running operating system, which then goes ahead and checks the MAC on each one of the system files. Now, the fact that
the MAC is secure means that the poor virus couldn't actually create a new file, call it F′, with a valid tag; it couldn't create a pair (F′, t′), because if it could, that would be an existential forgery on this MAC. Because the MAC is existentially unforgeable, the virus can't create any such F′, no matter what F′ is, and consequently, by the security of the MAC, the user will detect all the files that were modified by the virus. Now, there's one caveat to that. One thing the virus can do is swap two files: for example, he can swap the file f1 with the file f2, just literally swap them, so that when the user tries to run file f1, they'll instead be running file f2, and of course that could cause all sorts of damage. The way to defend against that is essentially to place the file name inside the MACed area, so that we're computing an integrity check on the file name as well as on the contents of the file. As a result, if the virus tries to swap two files, the system will say, hey, the file located in position f1 doesn't have the right name, and therefore it will detect that the virus did the swap, even though the MAC itself verifies. So it's important to remember that MACs can help you defend against file tampering, or data tampering in general, but they won't help you defend against swapping of authenticated data; that has to be done by some other means. Okay, so that concludes our introduction to MACs, and in the next segment we'll construct our first examples of secure MACs.
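The file-protection scheme described above can be sketched in a few lines, with the filename folded into the MACed data to block the swapping attack. HMAC-SHA256 stands in for the MAC, and the key-from-password step is simplified to a fixed key; both are assumptions of this sketch.

```python
import hmac, hashlib

def tag_file(key, name, contents):
    """Tag both the file name and its contents, so that swapping two
    correctly-tagged files is detected. Length-prefix the name so that
    (name, contents) pairs cannot collide across the boundary."""
    data = len(name).to_bytes(4, "big") + name.encode() + contents
    return hmac.new(key, data, hashlib.sha256).digest()

def check_file(key, name, contents, tag):
    return hmac.compare_digest(tag, tag_file(key, name, contents))

key = b"k" * 32            # in practice, derived from the user's password
t1 = tag_file(key, "f1", b"program one")
t2 = tag_file(key, "f2", b"program two")
# A tampered file fails verification:
tampered_ok = check_file(key, "f1", b"evil program", t1)
# A swap also fails, because the name is part of the MACed data:
swap_ok = check_file(key, "f1", b"program two", t2)
```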
Now that we know what MACs are, let's go ahead and build our first secure MAC. First, I want to remind you that a MAC is a pair of algorithms: the first is a signing algorithm that, given a message and a key, generates a corresponding tag, and the second is a verification algorithm that, given a key, a message, and a tag, outputs 0 or 1 depending on whether the tag is valid. We said that a MAC is secure if it is existentially unforgeable under a chosen-message attack; in other words, the attacker is allowed to mount a chosen-message attack, where he can submit arbitrary messages of his choice and obtain the corresponding tags, and yet, despite this ability, the attacker cannot create a new message-tag pair that was not given to him during the chosen-message attack. Okay, so we've already seen this definition in the last segment, and now the question is how we build secure MACs. The first example I want to give you basically shows that any
secure PRF directly gives us a secure MAC as well, so let's see how. Suppose we have a pseudorandom function F that takes inputs x and produces outputs y, and let's define the following MAC: the way we sign a message m is simply by evaluating the function at the point m, so the tag for the message m is just the value of the function at m. The way we verify a message-tag pair is by recomputing the value of the function at the message m and checking whether that's equal to the tag we were given; we accept if so and reject otherwise. Here you have it in pictures: when Alice wants to send a message to Bob, she computes a tag by evaluating the PRF and appends this tag to the message. Bob receives the message-tag pair, recomputes the value of the function, and tests whether the given tag is actually equal to the value of the function at the point m.
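The S(k, m) = F(k, m) construction can be written out in a few lines. As a stand-in PRF I use HMAC-SHA256; that choice is an assumption for illustration, since the lecture treats F abstractly.

```python
import hmac, hashlib, os

# Stand-in PRF (assumption): HMAC-SHA256 as the function F(k, x).
def F(k, x):
    return hmac.new(k, x, hashlib.sha256).digest()

def sign(k, m):
    return F(k, m)                 # S(k, m) := F(k, m)

def verify(k, m, t):
    # Accept iff t equals F(k, m), using a constant-time comparison.
    return hmac.compare_digest(t, F(k, m))

k = os.urandom(32)
m = b"attack at dawn"
t = sign(k, m)
```

Note that verification is just recomputation, exactly as in the picture with Alice and Bob.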
Now let's look at a bad example of this construction. Suppose we have a pseudorandom function that happens to output only 10 bits. This is a perfectly fine pseudorandom function; it just so happens that for any message m it outputs only a 10-bit value. My question to you is: if we use this function F to construct a MAC, is that going to be a secure MAC? The answer is no, this MAC is insecure, in particular because the tags are too short. Consider the following simple adversary: all he does is choose an arbitrary message m and guess the value of the MAC for that particular message. Because the tag is only 10 bits long, the adversary has a 1-in-2^10 chance of guessing the MAC correctly; in other words, the advantage of this guessing adversary, one that simply guesses a random tag for a given message, is 1/2^10, which is 1/1024, and that's definitely non-negligible. So the adversary will successfully forge the MAC on a given message with
probability about one in a thousand, which is insecure. However, it turns out this is the only way things can go wrong: only when the output of the function is small do we run into trouble. If the output of the PRF is large, then we get a secure MAC out of this construction, so let's state the security theorem. Suppose we have a function F that takes messages in X and outputs tags in Y; then the MAC derived from this PRF is in fact a secure MAC. In particular, if you look at the security theorem, you'll see the error bound very clearly: since the PRF is secure, we know that the PRF-distinguishing term is negligible, and if we want to say that no adversary can defeat the MAC I_F, we also need the remaining term, which involves 1/|Y|, to be negligible; in other words, we want the output space to be large. So, for example, taking a PRF that outputs 80 bits is perfectly fine: that generates an 80-bit MAC, and the advantage of any adversary will be at most about 1/2^80. Now, the proof of this theorem is really simple, so let's just go ahead and do it. Let's start by supposing that instead of a PRF we have a truly random function from the message space to the tag space, that is, a function chosen at random from the set of all functions from X to Y, and let's see whether that function gives us a secure MAC. The adversary says: I want
the tag on the message m1, and what he gets back is the tag, which is just the function evaluated at the point m1; notice there's no key here, because f is simply a truly random function from X to Y. Then the adversary chooses a message m2 and obtains the tag on m2; he chooses m3, m4, up to mq, and obtains all the corresponding tags. Now his goal is to produce a message-tag pair, and remember, we say that he wins if this is an existential forgery: first of all, t has to be equal to f(m), meaning t is a valid tag for the message m, and second of all, the message m has to be new, so m had better not be one of m1 to mq. But let's think about this for a minute. What does it mean? The adversary got to see the value of a truly random function at the points m1 to mq, and now he's supposed to predict the value of this function at some new point m. However, for a truly random function, the value of the function at the point m is independent of its values at the points m1 to mq. So the best the adversary can do at predicting the value of the function at the point m is simply to guess, because he has no information about f(m), and if he guesses, he guesses right with probability exactly 1/|Y|; the tag he produces is correct with probability exactly 1/|Y|. Again: he had no information about the value of the function at m, so the best he can do is guess, and a guess is correct with probability 1/|Y|. Now, of course, because the function F is a pseudorandom function, the adversary behaves the same whether we give him the truly random function or the pseudorandom function; he can't tell the difference. As a result, even when we use the pseudorandom function, the adversary has advantage at most 1/|Y| in winning the
game. Okay, and you can see this exactly in the security theorem (let's go back there for just a second): this is basically why we got an error term of 1/|Y|; it comes from the guessing attack, and that's the only way the attacker can win the game. So now that we know that any secure PRF is also a secure MAC, we already have our first example of a MAC: we know, or at least we believe, that AES is a secure PRF. AES takes 16-byte inputs, right, the message space for AES is 128 bits, which is 16 bytes, so the AES cipher essentially gives us a MAC for messages that are exactly 16 bytes long. Okay, so that's our first example of a MAC. But now the question is: if we have a PRF for small inputs, like AES, which only acts on 16 bytes, can we build a MAC for big messages that can act on gigabytes of data? Sometimes I call this the McDonald's problem: given a small Mac, can we build a Big Mac out of it? In
other words, given a MAC for small messages, can we build a MAC for large messages? We're going to look at two constructions for doing so. The first is called the CBC MAC, which takes a PRF for small messages as input and produces a PRF for very large messages as output. The second we'll see is HMAC, which does the same thing: again, it takes a PRF for small inputs and generates a PRF for very large inputs. Now, the two are used in very different contexts. CBC MAC is very commonly used in the banking industry; for example, there's a system called the Automated Clearing House (ACH) which banks use to clear checks with one another, and in that system CBC MAC is used to ensure the integrity of the checks as they're transferred from bank to bank. On the internet, protocols like SSL, IPsec, and SSH all use HMAC for integrity. Okay, so these are two different MACs, and we're going to discuss them in the next couple of segments; as I said, both of them start from a PRF for small messages and produce a PRF for messages that are gigabytes long,
and in particular, they can both be instantiated with AES as the underlying cipher. The last comment I want to make about these PRF-based MACs is that their output can in fact be truncated. Suppose we have a PRF that produces n-bit outputs; for AES this would be a PRF that outputs 128 bits. It's an easy lemma to show that if you have an n-bit PRF and you truncate it, in other words you only output the first t bits, the result is also a secure PRF. The intuition here is that if the big PRF outputs n bits of randomness for any input you give it, then certainly chopping it off and truncating it to t bits is still going to look random; the attacker now gets less information, so his job of distinguishing the outputs from random just became harder. In other words, if the n-bit PRF is secure, then the truncated t-bit PRF (for t less than n) is also secure. This is an easy lemma, but since any secure PRF also gives us a secure MAC, what it means is that if you give me a MAC based on a PRF, I can truncate it to w bits. However, because of the error term in the PRF-based MAC theorem, we know that truncating to w bits is only secure as long as 1/2^w is negligible. So if you truncate the PRF to only three bits, the resulting MAC is not going to be secure; but if you truncate it to, say, 80 bits, or maybe even 64 bits, the resulting MAC is still going to be a secure MAC. The thing to remember here is that even though we use AES to construct larger PRFs whose outputs are 128 bits, the MAC itself doesn't have to produce 128-bit tags: we can always truncate the output to 90 bits or 80 bits, and we still get secure MACs, but now the output tag is a more reasonable size and doesn't have to be the full 128 bits.
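A quick sketch of the truncation lemma in code, again with HMAC-SHA256 standing in for the full n-bit PRF (an assumption); truncating the 256-bit output to 80 bits (10 bytes) keeps 1/2^80 negligible.

```python
import hmac, hashlib, os

def truncated_mac(k, m, t_bytes=10):
    """Truncate an n-bit PRF output to t bits (here 80 bits = 10 bytes).
    If the full PRF is secure and 1/2**t is negligible, the truncated
    function is still a secure PRF, hence still a secure MAC."""
    full = hmac.new(k, m, hashlib.sha256).digest()   # 256-bit output
    return full[:t_bytes]

def truncated_verify(k, m, t, t_bytes=10):
    return hmac.compare_digest(t, truncated_mac(k, m, t_bytes))

k = os.urandom(32)
tag = truncated_mac(k, b"msg")
```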
Okay, so in the next segment we'll look at how the CBC MAC works. In this segment we're going to construct two classic MACs: the CBC MAC and NMAC. Recall that in the last segment we said that if you give me a secure PRF, then that PRF can be used to construct a secure MAC, simply by defining the tag on a message m as the value of the function at the point m. The only caveat was that the output of the PRF had to be large; for example, it could be 80 bits or 128 bits, and that will generate a secure MAC. We also said that because AES is a secure PRF, AES already gives us a secure MAC, except that it can only process 16-byte messages, and the question now is: given a PRF for short messages, like AES for 16-byte messages, can we construct a PRF for long messages that are potentially gigabytes long? Just as shorthand for what's coming, I'm going to denote by X the set {0,1}^n, where n is the block size of the underlying PRF; since we're always going to be thinking of AES as the
underlying PRF, you can think of n as essentially 128 bits. Our first construction is called the encrypted CBC MAC, or ECBC for short. ECBC uses a PRF F that takes messages in the set X = {0,1}^n and outputs values in the same set X, and what we're going to build is a PRF that takes a pair of keys, takes very long messages (in fact, messages of variable length, which I'll explain in just a second), and outputs tags in {0,1}^n. That's our goal. Now, what is this X^(≤L)? The point is that CBC MAC can take very long messages, up to L blocks, where L could be a million or a billion, but it can also take variable-length messages as inputs; in other words, X^(≤L) means we allow inputs containing any number of blocks between one and L. So ECBC can process messages that are one block long, two blocks long, ten blocks long, a hundred blocks long; it's perfectly fine to give it variable-size inputs, but just to keep the discussion simple, we upper-bound the maximum length by L. So let's see how ECBC works. We start by taking our message and breaking it into blocks, where each block is as long as a block of the underlying function F, and then we essentially run through the CBC chain, except that we don't output intermediate values. Notice that we encrypt the first block, feed the result into the XOR with the second block, feed that into F again, and so on, again and again, until finally we get a value out at the end of this long CBC chain. Then I'd like to point your attention to the fact that we do one more encryption step, and this step is done using an independent key k1 that's different from, and chosen independently of, the key k. Finally, the output gives us the tag. In this case the tag would be n bits long, but as we mentioned in the previous segment, it's fine to truncate the tag to t bits, fewer than n, as long as 1/2^t is negligible. So now you can see that F_ECBC takes a pair of keys as input, can process variable-length messages, and produces an output in the set X. You might be wondering what this last encryption step is for. The function defined without the last encryption step is called the raw CBC function; in other words, if we simply stop before the final encryption and take that value as the output, that's called raw CBC, and as we'll see in a minute, raw CBC is actually not a secure MAC, so this last step is critical for making the MAC secure.
Another classic construction for converting a small PRF into a large PRF is called NMAC, for nested MAC. NMAC starts from a PRF that, as before, takes inputs in X, but outputs elements in the key space K; remember that for ECBC the output has to be in the set X, whereas here the output needs to be in the key space K. Again we obtain a PRF, F_NMAC, which takes a pair of keys as input, can process variable-length messages of up to L blocks, and outputs an element of the key space. The way NMAC works starts much as before: we take our message and break it into blocks, where each block is again as big as the block length of the underlying PRF. Now we take our key and feed it in as the key input to the function F, with the first message block given as the data input. What comes out is the key for the next block of NMAC: we now have a new key for the next evaluation of the PRF, and the data for that next evaluation is the next message block, and so on and so forth, until we reach the final output, which is an element of K. Just as before, if we stop here, the function we obtain is called the cascade function, and we're going to look at cascade in more detail in just a minute. So cascade outputs an element of K; however, as we'll see, that is not a secure MAC. To get a secure MAC, we need to map this element t, which lives in K, into the set X. Typically, as we'll see, NMAC is used with PRFs where the block length (the size of X) is much bigger than the key length, so what we do is simply append a fixed pad, called fpad, to this value t; the padded block becomes an element of X, and we feed it into the function one more time. Notice again that an independent key k1 is used for this last encryption step, and the final output is an element of K, which we output as the result of NMAC. So remember: without the last encryption step, the function is called the cascade; with the last encryption step, which as we'll see is necessary for security, we actually get a PRF that outputs elements of K and can process variable-length messages of up to L blocks. All right, so that's the NMAC construction, and we now have two MACs that we can use to build a large PRF from a small one.
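And a corresponding sketch of the cascade plus the final NMAC encryption step; again the PRF is a stand-in built from HMAC-SHA256, and the block and key lengths are arbitrary illustrative choices, not values fixed by the lecture.

```python
import hmac, hashlib, os

BLOCK = 64         # block length |x| in bytes, larger than the key length
KEYLEN = 32        # key length |k| in bytes
FPAD = b"\x00" * (BLOCK - KEYLEN)   # fixed pad mapping K into X

def F(k, x):
    """Stand-in PRF (assumption) from X into the key space K."""
    return hmac.new(k, x, hashlib.sha256).digest()[:KEYLEN]

def cascade(k, m):
    """Feed each message block into F, using the previous output as the
    next key; m must be a positive multiple of BLOCK bytes."""
    assert len(m) % BLOCK == 0 and len(m) > 0
    t = k
    for i in range(0, len(m), BLOCK):
        t = F(t, m[i:i+BLOCK])
    return t

def nmac(k, k1, m):
    # Pad the cascade output into X and encrypt once under the independent
    # key k1; without this last step the construction is insecure.
    return F(k1, cascade(k, m) + FPAD)

k, k1 = os.urandom(KEYLEN), os.urandom(KEYLEN)
tag = nmac(k, k1, b"B" * 128)      # a two-block message
```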
Before we analyze the security of these MAC constructions, I'd like you to understand better what the last encryption step is for. Let's start with NMAC. I claim it's actually very easy to see that if we omitted the last encryption step, in other words if we just used the cascade function, then the MAC would be completely insecure. So suppose we look at the MAC defined by outputting the cascade applied to m directly, without the last encryption step, and let me ask you: how would you forge tags for this MAC? I guess I've kind of given the answer away: I hope everybody sees that, in fact, given one chosen-message query you can mount an existential forgery. The reason, which I'll show you in a second in the diagram, but let me write it out in symbols first, is that if you give me the output of the cascade function applied to a message m, then I, the adversary, can derive from it the cascade applied to the message m concatenated with w, for any message w. First of all, it should be clear that this is enough to mount an existential forgery: by asking for a tag on the message m, I obtain the tag on this longer message, which I can then output as my forgery. So the MAC is insecure, because given the MAC on one message I can produce the MAC on another. But let's go back to the diagram describing cascade and see why this is true; let's see what happens if the last step isn't there. As an attacker, what I can do is add one more block, which we'll call w, take the output of cascade, this value t, and simply apply the function F to it again: I take this value t, plug it in as the key, plug my last block w in as the data, and what I get is t′, which is exactly the evaluation of cascade on the message m concatenated with w. Now I've calculated t′, which I can use for my existential forgery. So this shows you why this property of cascade holds. This is called an extension attack: given the tag of the message m, I can deduce the tag for an extension of m, in fact for any extension of my choice. So cascade is completely insecure if we skip the last encryption step, and the last encryption step is precisely what prevents this type of extension attack. I can tell you, by the way, that in fact extension attacks are
the only attacks on cascade; that can be made precise, but I'm not going to do it here. The next question is why we have the extra encryption block in the ECBC construction, so again let me show you that without this extra encryption block, ECBC is insecure. Let's define a MAC that uses raw CBC; in other words, it's the same as the CBC MAC but without the last encryption step, and let's see that this MAC is also insecure, except that now the attack is a little more involved than a simple extension attack. Suppose the attacker is given the raw CBC value for a particular message m, and he wants to extend it and compute the MAC on the message m concatenated with some block w. The poor attacker can take this value and XOR the two together, but then, you realize, he has to evaluate the function at this point, and he doesn't know the key k, so he doesn't know the output of the function; he simply can't just append a block w and compute the value of raw CBC on m concatenated with w. However, it turns out that he can effectively evaluate this function by using the chosen-message attack, and I want to show you how. Okay, so our goal is to show that raw CBC is insecure, and here is a particular attack. In the attack, the adversary starts by requesting the tag on a particular one-block message m. What does it mean to apply raw CBC to a one-block message? Basically, all you do is apply the function F directly, so the tag he gets is just F applied directly to the one-block message m: t = F(k, m). Good. Now the adversary has this value t, and I claim that he can define the two-block message m′ whose first block is m and whose second block is t XOR m, and I claim that the value t he just received is a valid tag for this two-block message m′. Let's see why that's true. Suppose we apply the raw CBC construction to this message m′. The way it works is: first, the block m is processed by applying the function F to it directly; then we XOR the result with the second block, which is t XOR m; and then we apply F to the result of that. That is the definition of raw CBC. Now, what do we know about F(k, m)? F(k, m) is simply this value t, by definition. So in the next step we compute F(k, t XOR t XOR m), and this t XOR t simply cancels, leaving F(k, m), which is of course t. As a result, t is a valid MAC for the two-block message (m, t XOR m). The adversary was able to produce a valid tag t for a two-block message that he never queried, and therefore he was able to break the MAC. So, looking at the ECBC diagram for just one more second: if you don't include the last encryption step in the computation of the MAC, the MAC is insecure because of exactly the attack we just saw.
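The single-query attack on raw CBC can be checked directly in code. The block PRF below is a stand-in (an assumption, as before), but the algebra is exactly the one just worked through: t = F(k, m), and t then verifies as the raw-CBC tag of the two-block message (m, t XOR m).

```python
import hmac, hashlib, os

BLOCK = 16

def F(k, x):
    # Stand-in block PRF (assumption; AES in practice).
    return hmac.new(k, x, hashlib.sha256).digest()[:BLOCK]

def xor(a, b):
    return bytes(p ^ q for p, q in zip(a, b))

def raw_cbc(k, m):
    t = bytes(BLOCK)
    for i in range(0, len(m), BLOCK):
        t = F(k, xor(t, m[i:i+BLOCK]))
    return t

k = os.urandom(32)
m = b"single block msg"              # exactly one 16-byte block
t = raw_cbc(k, m)                    # the single chosen-message query
forged_msg = m + xor(t, m)           # m' = m || (t XOR m)
# t XOR (t XOR m) = m inside the chain, so t verifies for m' as well:
forged_ok = raw_cbc(k, forged_msg) == t
```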
I'll tell you that there are many products that do this incorrectly, and in fact there are even standards that do this incorrectly, so that the resulting MAC is insecure. You now know that this last step needs to be done, and you won't make this mistake yourself. So let's state the ECBC and NMAC security theorems. The statement is as usual: for any message length we'd like to apply the MAC to, for every PRF adversary A there is an efficient adversary B; these are the usual kinds of statements, where the facts you need to know are the error terms, which are quite similar in both cases. By the way, I'd like to point out that the analysis for ECBC actually uses the fact that F is a PRP: even though we never have to invert F during the computation of ECBC, the analysis is better if you assume that F is actually a PRP, in other words invertible, not just a function. For NMAC, the PRF need not be
prf need not be Invertible so what these error terms basically say is that these macs are secure as long as key is not used to mac more than square root of x or square root of k messages so for as of course this would be a 2 to the 64. but i want to show you an example of how you would use these bounds and so here i stated the security Theorem again for the cbc mac and q here again is the number of messages that are marked with a particular key so suppose we want
to ensure that for all adversaries the adversary has an advantage less than 1 over 2 to the 32 in distinguishing the prf from a truly random function suppose that's our goal well by the security theorem what that means is we need to ensure that q squared over x Is less than 1 over 2 to the 32 right we want this term to be well i'm going to ignore this 2 here just for simplicity we want to ensure this term is less than 1 over 2 to the 32 and this term of course is negligible so
we can just ignore it and this would imply that this term is also less than 1 over 2 to the 32. okay so if we want to ensure that the advantage is less than 1 over 2 to the 32 we need to ensure that q squared over x is less than 1 over 2 32. for a yes basically this means that after macking 2 to the 48 messages you have to change your key otherwise you won't achieve the security level so you can map at most 2 to the 48 messages you notice that if i
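The key-rotation arithmetic can be sketched in two lines: solving q^2/|X| <= 2^-32 for q gives q <= sqrt(|X| / 2^32). The function name here is my own.

```python
import math

def max_messages(block_bits, target_adv_bits=32):
    """From the bound q**2 / |X| <= 2**-target, the key must be changed
    after about sqrt(|X| / 2**target) = 2**((block_bits - target)/2)
    messages."""
    return math.isqrt(2 ** (block_bits - target_adv_bits))

aes_limit = max_messages(128)       # 128-bit block: 2**48 messages per key
tdes_limit = max_messages(64)       # 64-bit block: only 2**16 = 65536
```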
Notice that if I plug in Triple DES, which has a much shorter block, only 64 bits, the same calculation says you now have to change your key every 65,000 messages or so, which is quite problematic, whereas the AES bound is fine: 2^48 is a fairly large number, so with AES you only have to change your key every 2^48 messages, which is perfectly reasonable. This is one of the reasons why AES has a larger block than Triple DES: these modes remain secure, and you don't have to change your key as often as you would with Triple DES. Now I want to show you that these bounds are not just artifacts of the statements of the security theorems; there really are attacks corresponding to these values, and the MACs really do become insecure after you MAC about √|X| or √|K| messages. So I'm going to show you an attack on both PRFs, either ECBC or NMAC, assuming that the underlying function is a PRP, an actual block cipher like AES. Let's write F_big for either F_ECBC or F_NMAC; F_big means it's a PRF for large messages. Now, it turns out both constructions have the following extension property: namely, if you give me a collision on messages x and y, then that also implies a collision on a common extension of x and y; in other words, if I append w to both x and y, I also get a collision on the resulting messages. It's fairly easy to convince yourself that this extension property holds, just by staring at the diagram for a second. Imagine I give you two messages that happen to collide at the output. Remember, I assume that F is a PRP, so once you fix k1, the last step is a one-to-one function; therefore, if the two messages map to the same final output, they also map to the same value at the output of the raw CBC function. But if they map to the same value at the output of raw CBC, then when I add another block, call it w, I'm computing the same thing for both messages, so I get the same value at that point too, and when I encrypt again with k1 (there's one more application of F there), I also get the same final output after appending the block w. So if the two outputs are the same for two distinct messages, then after appending a block w to both messages I'm still going to get the same value out, and it's easy to convince yourself that the same is true for NMAC as well.
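The extension property can be demonstrated on a toy example. To make a collision findable, I shrink the block to one byte (an artificial choice of mine; nothing this small is secure) and run a birthday-style search over raw CBC outputs; the same argument carries over to the full ECBC and NMAC outputs.

```python
import hashlib

BLOCK = 1   # a toy 8-bit block, so collisions are easy to find

def F(k, x):
    # Toy one-byte PRF (assumption, for demonstration only).
    return hashlib.sha256(k + x).digest()[:BLOCK]

def xor(a, b):
    return bytes(p ^ q for p, q in zip(a, b))

def raw_cbc(k, m):
    t = bytes(BLOCK)
    for i in range(0, len(m), BLOCK):
        t = F(k, xor(t, m[i:i+BLOCK]))
    return t

k = b"demo key"
# Birthday search: with 8-bit outputs, a collision among 300 distinct
# two-block messages is guaranteed by the pigeonhole principle.
seen = {}
collision = None
for i in range(300):
    m = bytes([i % 256, i // 256])
    t = raw_cbc(k, m)
    if t in seen:
        collision = (seen[t], m)
        break
    seen[t] = m

mu, mv = collision
w = b"w"
# The extension property: appending the same block w preserves the collision.
extended_equal = raw_cbc(k, mu + w) == raw_cbc(k, mv + w)
```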
Okay, so both of these PRFs have the extension property, and based on it we can define an attack. Here's the extension property stated again, and the attack works as follows. Suppose I issue √|Y| chosen-message queries; for AES, remember, the tag space Y is {0,1}^128, so this means asking 2^64 chosen-message queries on arbitrary messages in the input space. What happens is that I obtain 2^64 message-MAC pairs. Now, we're going to see in the next module that there's something called the birthday paradox, which some of you may have heard of already, which basically says that if I have 2^64 random elements of a space of size 2^128, there's a good chance that two of them are the same. So I'm going to look for two distinct messages m_u and m_v for which the corresponding MACs are the same, and as I said, by the birthday paradox these are very likely to exist. Once I have that, I've found m_u and m_v with the same MAC, and as a result, what I can do is extend m_u with an arbitrary block w and ask for the tag t on the message m_u concatenated with w. But because m_u and m_v have the same MAC, I know that m_u concatenated with w has the same MAC as m_v concatenated with w. So now that I've obtained the tag for m_u concatenated with w, I also have the tag for m_v concatenated with w, and therefore I've obtained my forgery: t is also a valid tag for the message m_v concatenated with w, which I never asked about, and therefore this is a valid existential forgery. Okay, so this is kind of a cute attack, and the bottom line is that after √|Y| queries I'm able to forge a MAC with fairly good probability. So what does √|Y| mean? Going back to the security theorems, it means that for ECBC after √|X|, or for NMAC after √|K|, messages have been MACed, the MAC becomes
insecure and the Attacker can actually find new macs for messages for which he was never given a knack for so again this is a cute attack that shows that the bounds of the theorem really are real and as a result these bounds that we derived in this example are real and you should never use a single key to map more than say 2 to the 48 messages with aes based cbc so to conclude i'll Just mention that we've seen two examples we saw ecbc and nmac ecbc is in fact a very commonly used mac that's
built from aes 802.11i for example uses ecbc with aes for integrity there's also a nist standard called cmac that we'll talk about in the next segment that also is based on cbc mac and mac and contrast is not typically used with a block cipher and The main reason is you notice that in the nmac construction the key changes from block to block that means that the whole aas key expansion has to be computed and re-computed on every block and aes is not designed to perform well when he changes key very rapidly and so typically when
you use nmac you use block ciphers that are better at changing their keys on every block and as a result nmac Typically is not used with aes but in fact nmac is a basis of a very popular mac called hmac which we're also going to look at next and you'll see very clearly the nmac underlying the hmac construction okay so that's the end of this segment and we'll talk about more macs in the next segment in the last segment we talked about the cbc mac and the nmac but throughout the segment we always assume that
the message length was a Multiple of the block size in this segment we're going to see what to do when the message length is not a multiple of the block size so recall that the encrypted cbc mac or ecbc mac for short uses pseudorandom permutation f to actually compute the cbc function as we discussed in the last segment but in the last segment we always assumed that the message itself could be broken into an integer number Of blocks for the block cipher and the question is what to do when the message length is not a
multiple of the block size so here we have a message where the last block actually is shorter than the full block and the question is how to compute the ecbc mac in that case so the solution of course is to pad the message and the first path that comes to mind is to simply pad the message with all zeros in other words we take the last block And just add zeros to it until the last block becomes as long as one full block size and so my question to you is whether the resulting mac is
secure so the answer is no the mac is not secure and let me explain why basically the problem is that it's possible now to come up with messages so that the message m and the message m concatenated 0 happen to have exactly the same pad and As a result once we plug in both m and m0 into ecbc we'll get the same tag out which means that both m and m concatenated 0 have the same tag and therefore the attacker can mount in his essential forgery he would ask for the tag on the message m
and then he would output it as forgery the tag and the message m concatenated zero and it's easy to see why that's the case basically To be absolutely clear here we have our message m which after padding becomes m000 say we have to add three zeros to it and here we have the message m0 m that ends with zero and after padding we basically now only have to add two zeros to it and lo and behold they become the same pad so that they're going to have exactly the same tag which allows the adversary to
mount the Existential forgery attack so this is not a good idea in fact appending all zeros is a terrible idea and if you think about concrete case where this comes up imagine in the automatic clearinghouse system used for clearing checks i might have a check for a hundred dollars and a tag on that check well now the attacker basically could append a zero to my check and make it a check for a thousand dollars And that wouldn't actually change the tag so this ability to extend the message without changing the tag actually could have pretty
disastrous consequences so i hope this example convinces you that the padding function itself must be a one-to-one function in other words it should be the case that two distinct messages always map to two distinct padded messages we shouldn't actually have a collision On the padding function another way of saying it is that the padding function must be invertible that guarantees that the padding function is one to one so a standard way to do this was proposed by the international standards organization iso what they suggested is basically let's append the string one zero zero zero zero
to the end of the message to make the message be a multiple of the Block length now to see that this padding is invertible all we do is describe the inversion algorithm which simply is going to scan the message from right to left until it hits the first one and then it's going to remove all the bits to the right of this one including the one and you see that once we remove the pattern this way we obtain the Original message so here's an example so here we have a message where the last block happens
to be shorter than the block length and then we append the one zero zero string to it it's very easy to see what the pad is simply look for the first one from the right we can remove this pad and recover the original message back now there's one corner case that's Actually quite important and that is what do we do if the original message length is already a multiple of the block size in that case it's really very very important that we add an extra dummy block that contains the path one zero zero zero and
again i can't tell you how many products and standards have actually made this mistake where they didn't add a dummy block and As a result the mac is insecure because there's an easy existential forgery attack let me show you why so suppose in case the message is a multiple of the block length suppose we didn't add a dummy block and we literally mapped this message here well the result now is that if you look at the message which is a multiple of the block size and a message which is not a multiple of The block
size but is padded to the block size and imagine it so happens that this message m prime one happens to end with one zero zero at this point you realize that here the original message here let me draw it this way you realize that the original message after padding would become identical to the second message that was not padded at all and as a Result if i asked for the tag on this message over here i would obtain also the tag on the second message that happened to end in one zero zero okay so if
we didn't add the dummy block basically again the pad would be non-invertible because two different messages two distinct messages happen to map to the same padded results again as a result the mac becomes Insecure so to summarize this iso standard is a perfectly fine way to pad except you have to remember to also add the dummy block in case message is a multiple of the block lens to begin with now some of you might be wondering if there's a padding scheme that never needs to add a dummy block and the answer is that if you
look at the deterministic padding function Then it's pretty easy to argue that there will always be cases where we need to pad and the reason is just literally the number of messages there are a multiple of the block length is much smaller than the total number of messages that need not be a multiple of the block length and as a result we can't have a one-to-one function from this bigger sets of all messages to the smaller set of messages which are a Multiple of the block length there will always be cases where we have to
extend the original message and in this case that would correspond to adding this padami padding block however there is a very clever idea called cmac which shows that using a randomized padding function we can avoid having to ever add a dummy block and so let me explain how cmac works So cmac actually uses three keys and in fact sometimes this is called a three key construction so the first key k is used in the cbc the standard cbc mac algorithm and then the keys k1 and k2 are used just for the padding scheme at the
very very last block and in fact in the cmax standard the keys k1 k2 are derived from the key k by some sort of a pseudorandom generator So the way cmap works is as follows well if the message happens to not be a multiple of a block length then we append the iso padding to it but then we also xor this last block with a secret key k1 that the adversary doesn't know however if the message is a multiple of the block length then of course we don't append anything to it but we xor it
with a different key k2 that again the adversary doesn't Actually know so it turns out just by doing that it's now impossible to apply the extension attacks that we could do on the cascade function and on raw cbc because the poor adversary actually doesn't know what is the last block that went into the function he doesn't know k1 and therefore he doesn't know the value at this particular point and as a result he can't do an extension attack in fact this is a provable Statement so that's this construction here simply by xoring k1 or x3
and k2 is really a prf despite not having to do a final encryption step after the raw cbc function is computed so that's one benefit that there's no final encryption step and the second benefit is that we resolve this ambiguity between whether padding did happen or padding didn't happen by using two different keys To distinguish the case that the message is a multiple of the block length versus the case where it's not but we have a pad appended to the message so the two distinct keys resolve this ambiguity between the two cases and as a
result this padding actually is sufficiently secure and as i said there's actually a nice security theorem that goes with c-mac that shows that the c-mac construction really is a pseudo-random function With the same security properties as cbc mac so i wanted to mention that cmac is a federal standard standardized by nist and if you now these days wanted to use a cbc mac for anything you would be actually using cmac as the standard way to do it particularly in cmac the underlying block cipher is aes and that gives us a secure cbc mac derived from
aes So that's the end of the segment and in the next segment we'll talk about a parallel mac in the last two segments we talked about the cbc mac and nmac to convert a prf for small messages into a prf for much larger messages those two constructions were sequential in the sense that if you had multiple processors you couldn't make the construction work any faster in this segment we're going to look at a Parallel mac that also converts a small prf into a large prf but does it in a very parallelizable fashion in particular we're
going to look at a parallel mac construction called pmac that uses an underlying prf to construct the prf for much larger messages in particular the prf can process much longer messages that can have variable length and have as many as l blocks In them now the construction works as follows we take our message and we break it into blocks and then we process each block independently of the other so the first thing we do is we evaluate some function p and we xor the result into the first message block and then we apply our function
f using a key k1 we do the same for each one of the message blocks and you notice that we Can do it all parallel all message blocks are processed independently of one another and we collect all these results into some final xor and then we encrypt one more time to get the final tag value now for a technical reason actually on the very last block we actually don't need to apply the prff but as i said this is just for a technical reason And i'm going to ignore that for now now i want to
explain what the function p is for and what it does so imagine just for a second that the function p isn't actually there that is imagine we actually directly feed each message block into the prf without applying any other processing to it then i claim that the resulting mac is completely insecure and the reason is that essentially no order is enforced Between the message blocks in particular if i swap two message blocks that doesn't change the value of the final tag because the xor is commutative the tag will be the same whether we swap the
blocks or not as a result an attacker can request the tag for a particular message and then he obtains a tag for a message where two of the blocks are swapped and that counts as an Existential forgery so what this function p tries to do is essentially in force order on these blocks and you notice that the function takes first of all it's a heat function so it takes a key as inputs and second of all more importantly it takes the block number as inputs in other words the value of the function is different for
each one of the blocks and that's actually Exactly what's preventing this blocks swapping attack so the function p actually is a very easy to compute function essentially given the key and the message block all it is is just a multiplication in some finite field so it's a very very simple function to compute it adds very little to the running time of pmac and yet it's enough to ensure that the pmac is actually secure As we said the key for pmac is this pair of keys one key for the prf and one key for this masking
function p and finally i'll tell you that if the message length is not a multiple of the block length that is imagine the last block is shorter than full block length then pmac actually uses a padding that's similar to cmak so that there is no need for an additional dummy block ever So that's the pmac construction and as usual we can state its security theorem so the security theorem by now you should be used to it essentially it says that if you give me an adversary attacking pmac i can construct an adversary attack in the
underlying prf plus an additional error term and so since again the prf is secure we know that this term is negligible and so if we want this term to be Negligible we know that we need this error term to also be negligible here as usual q is the number of messages that are mapped using a particular key and l is the maximum length of all those messages and pmac is secure as long as this product is less than the square root of the block size so for a yes this would be 2 to the 128
at the square root therefore would be 2 To the 64. so the mac would be secure as long as q times l is less than 2 to the 64. and every time as it gets closer to that value of course you would have to change the key in order to continue making more messages so the main thing to remember is that pmac also gives us a prf and when it processes the message blocks independently of one another turns out that pmac also has a very Interesting property namely that pmac mac is incremental so let me
explain to you what that means so suppose the function f that's used to construct pmac is not just a prf but in fact a permutation prp so we can actually invert it when we need to now suppose we've already computed the mac for a particularly long message m And now suppose just one message block of this long message changes so here m1 is changed into m prime one but the remaining message blocks all remain the same for other macs like cbc mac even though only one message block changed you would have to recompute the tag
on the entire message recomputing the tag basically would take time that's proportional to the length of the message It turns out with pmac if we only change one block or a small number of blocks actually we can recompute the value of the tag for the new message very very quickly and let me ask you a puzzle to see if you can figure out how to do that yourself and remember the function f is a prp and therefore is invertible so let's see if you can figure out how to compute the mac and the new message
by yourself so it turns out this can be done and you Can quickly recompute the tag and the new message using this third line here so just to make sure we all see the solution let's quickly go back to the original diagram for pmac and i can show you what i mean so imagine this one message block changed into some other block say changed into m prime one then what we could do is we can take the tag on the original message before the change Was made so we can invert the function f and determine
the value before the function f was applied now well since we now have an xor of a bunch of blocks what we can do is we can cancel out the xor that came from the original message block basically by xoring this value that came from the original message block into this xor accumulator And then xoring again the value that would come from the new message block back into the xor accumulator and then apply the function f again and that would give us the tag for the new message where just one block was changed so in
symbols basically i wrote the formula over here you can see basically we decrypt the tag and then we xor with the block that comes from the original Message block we xor again with a block that comes from the new message block and then we re-encrypt the final xor accumulator to get the new tag for the message with the one block changed so that's kind of a neat property it kind of shows that if you have very large messages you can very quickly update the tag of course you will need the secret key to do the
update But you can quickly update the tag if just a small number of message blocks changed okay so that concludes our discussion of pmac and now i want to switch topics a little bit and talk about the concept of a one-time mac which is basically the analog of the one-time pad but in the world of integrity so let me explain what i mean by that so imagine we want to build a mac That is only used for integrity of a single message in other words every time we compute the integrity of a particular message we
also change the key so that any particular key is used for only for integrity of one message then we can define the security game is basically saying the attacker is going to see one message therefore we only allow him to do one chosen message attack So he gets to submit one message query and he is given the tag corresponding to that one message query and now his goal is to forge a message tag pair okay so you can see his goal is to produce one message tag pair that verified correctly and is different from the
pair that he was actually given as we'll see just like the one-time pad and stream ciphers were quite useful It turns out one-time acts are also quite useful for the same applications where we only want to use a key to encrypt or to provide integrity for just a single message so as usual we would say that a one-time mac is secure if basically no adversary can win this game now the interesting thing is that one-time max just like the one-time pad can be secure against infinitely powerful adversaries and not only that Because they're only designed
to be secure for one-time use they can actually be faster than macs that are based on prfs and so i just wanted to give you a quick example of one one-time mac this is a classic construction for a one-time mac and let me show you how it works the first step is to pick a prime that's slightly larger than our block size in this case we're going to use 128-bit Blocks so let's pick the first prime that's bigger than 2 to the 128 this happens to be 2 to the 128 plus 51. and now the
key is going to be a pair of random numbers in the range 1 to r prime so 1 to q so we choose two random integers in the range 1 to q now we're given a message so we're going to take our message and break it into blocks where each block is 128 bits and we're going to regard each number as an Integer in the range 0 to 2 to the 128 minus 1. now the mac is defined as follows the first thing we do is we take our message blocks and we kind of construct
a polynomial out of them so if there are l blocks in our message we're going to construct a polynomial of degree l and you notice that the constant term of this polynomial is set to zero and then we define the mac very simply Basically what we do is we take the polynomial that corresponds to the message we evaluate it at the point k that's one half of our secret key and then we add the value a which is the second half of our secret key and that's it that's the whole mac so just basically construct
the polynomial that corresponds to our message evaluate that polynomial at a half of The secret key and add the other half of the secret key to the result and of course reduce the final result modulo q okay so that's it so that's the whole map it's a one-time secure mac and the way we argue that this mac is one time secure essentially by arguing that if i tell you the value of the mac for one particular message that tells you nothing at all About the value of the mac add another message and as a result
even though you've seen the value of the mac on a particular message you have no way of forging this mac on some other message now i should emphasize that this is a one time mac but it's not two times secure in other words if you get to see the value of the mac on two different messages that actually completely compromises the Secret key and you can actually predict the mac for a third or fourth message of your choice so then the mac becomes forgeable but for one time use it is a perfectly secure mac and
i'll tell you that in fact this is a very fast mac to evaluate so now that we've constructed one time max it turns out there's actually a general technique that will convert one time max into many time acts and i Just wanted to briefly show you how that works so suppose we have our one-time mac let's call it s and v for signing and verification algorithms and let's just assume that the tags themselves are in-bit strings in addition let's also look at a prf a secure prf that also happens to output n bit strings but
also takes as inputs in bit strings let's now define a general construction For a mac these macs are called cardo regband max that works as follows basically what we would do is we would apply the one time mac to the message m and then we're going to encrypt the results using a prf so how do we encrypt the result well we choose a random r and then we compute kind of a one-time pad from this r by applying the prf to it And then we xor the result with the actual one-time mac so the neat
thing about this construction is that the fast one-time mac is applied to the long message which could be gigabytes long and the slower prf is only applied to this nonce r which is then used to encrypt the final result of the mac and you can argue that if the mac that Was given to us as a building block is a one-time secure mac and the prf is secure then in fact we get a mini time secure mac that happens to output two n-bit tags so we're going to see carter wagon max later on when we
talk about authenticated encryption and in fact one of the nist standard methods for doing encryption with integrity uses a cardragman mac for providing integrity I wanted to mention that this carter wagon mac is a good example of a randomized mac where this nonce r is chosen afresh every time the tag is computed and so for example if you try to compute tag for the same message twice each time you'll choose a different r and as a result you'll get different tags both times and so this is a nice example of a mac that's actually not
a pseudo-random function not a prf Because a single message could actually be mapped to many different tags all of which are valid for that one message to conclude our discussion of the cardo ragman mac let me ask you the following question here we have the equation for the cardo wagon mac as usual you see the nonce r is part of the mac and the second part of the mac i'm going to denote by t this is basically the one time mac Applied to the message m and then encrypted using the pseudo-random function applied to the
nonce r so we'll denote the result of this xor by t so my question to you is given the carter wegman mac pair r comma t for a particular message m how would you verify that this mac is valid and recall that this algorithm v here is the verification algorithm for the underlying one-time mac so this Is the right answer and to see why just observe that this xor here decrypts the quantity t to its plaintext value which is basically the original underlying one-time mac and so we can directly feed that into the verification algorithm
for the one-time mac the last type of mac i wanted to tell you about is one that's very popular in internet protocols it's called the hmac but before we talk about hmac we have to Talk about hash functions and in particular collision resistant hash functions and we're going to do that in the next module so this is the end of our first module on macs and i wanted to point out that there's really beautiful theory that went into constructing all the macs that we saw i kind of gave you the highlights and showed you the
main constructions but there's really quite a bit of theory That goes into constructing these maps and if you'd like to learn more about that i kind of listed a couple of key papers that you might want to look at let me quickly tell you what they are the first one is what's called the three key construction which is the basis of cmac a very elegant paper that gives a very efficient construction out of cbc mac the second paper is a more technical Paper but basically shows how to prove the bounds of cbc mac as a
prf the third paper talks about pmac in its construction again a very acute paper fourth paper talks about security of nmac and hmac as well hmac we're going to cover in the next module the last paper i listed asks an intriguing question recall that all of our constructions we Always assumed that aes is a pseudo-random function for 16 byte messages and then we built a pseudorandom function and therefore a mac for much longer messages this paper says well what do we do if aes is not a pseudorandom function but still satisfies some weaker security property
called an unpredictable function and then they ask if aes Is only an unpredictable function but not a pseudorandom function can we still build max for long messages and so they succeed in actually giving constructions just based on the weaker assumption that aes is an unpredictable function but their constructions are far less efficient than cbc mac or nmac or pmac that we discussed in these segments and so if you're interested in a different perspective on how to build Max from block ciphers like aes please take a look at this paper and there are actually some nice
open questions to work on in this area so this concludes our first segments on integrity and in the next segment we're going to talk about collision resistance in this module we're going to talk about a new concept called collision resistance which plays an important role in providing message integrity our end goal is to describe a very Popular mac algorithm called hmac that's widely used in internet protocols hmac itself is built from collision resistant hash functions before we do that let's do a quick recap of where we are in the last module we talked about message
integrity where we said that a mac system is secure if it is existentially unforgible under a chosen message attack this means that even an attacker who is Given the tag on arbitrary messages of his choice cannot construct a tag for some new message then we showed that in fact any secure prf immediately gives us a secure mac and so then we turned around and said well can we build secure prfs that take large messages as inputs and so we looked at four constructions the first construction was based on cbc we call when we looked at
two variants Of it one called encrypted cbc and one called cmac and we said that these are commonly used with aes in fact in the 802.11i standard a cbc mac is used for message integrity in particular with the aes algorithm we looked at another mode called nmac which also converts a prf for short inputs into a prf that's capable of taking very large messages As inputs and these two were both sequential macs we then looked at the parallelizable mac called pmac which again was able to convert a prf for small inputs into a prf for
very large inputs but it did so in a parallel fashion so a multi-processor system would be more efficient with pmac than say with cbc mac all three of these built a mac for large messages By constructing a prf for large messages and then we looked at the carter wagon mac which actually is not a prf it's a randomized mac so a single message could actually have many many different valid tags and therefore cardo wagman mac is actually not a prf and if you remember the cardinal wagon mac was built by first of all taking the
bulk message and hashing it down to a small tag Using a fast one-time mac and then encrypting that tag using a prf the benefit of the cardo wagon mac was that as we said the hashing of the bulk message is done using a fast one-time mac and then in this module we're going to construct max from new concept called collision resistance and so the first thing we're going to do is construct collision resistant hash functions So let's first of all start by defining what does it mean for a hash function to be collision resistance so
think of a hash function from some message space to a tag space t and you should be thinking of the message space as much much bigger than the tag space so the messages could be gigabytes long but the tags would only be like 160 bits now a collision for the function h is a pair of messages m0 M1 that happen to be distinct however when you apply the function h to them you end up with the same output so the image you should have in your mind is essentially there are two inputs m0 and m1
they belong to this gigantic message space however when we apply the hash function to them they happen to collide and they both map to the same output in the tag space now we say that the function h is Collision resistant if it's hard to find collisions for this function now this should seem a little bit counterintuitive because we know that the output space is tiny compared to the input space so by the pigeonhole principle there must be lots and lots and lots of messages that map to the same output just because there isn't enough space
in the output space to accommodate all the messages without collisions And so we know that there are lots of collisions and the question is is there an efficient algorithm that finds any such collisions explicitly so we say that the function is collision resistant if for all explicit efficient algorithms a then these algorithms are not able to print a collision for the function h okay and as usual we'll define the advantage as the probability that the Algorithm a is able to output a collision now i'm not going to formalize a term explicit here all i'll say
is that it's not enough to just say that an algorithm exists in an algorithm that simply prints a collision because we know many collisions exist what we actually want is an explicit algorithm that we can actually write down and run on a computer to Generate these collisions there are ways to formalize that but i'm not going to do that here now a classic example of a collision resistant hash function is shot 256 which happens to output 256 bits but can take arbitrary large inputs for example it can take gigabytes and gigabytes of data and it'll
map it all to 256 bits and yet nobody knows how to find collisions for this particular function so just to show you that this concept of collision resistance is very useful let's see a very quick application for it in particular let's see how we can trivially build a mac given a collision resistant hash function so suppose we have a mac for short messages so you should be thinking of something like aes which can mac 16-byte messages and then suppose we have a hash function a collision resistant hash function from a large message space that contains gigabyte messages into our small message space say into 16-byte outputs then basically we can define a new mac let's call it the big mac which happens to mac large messages and we'll define it simply by applying the small mac to the output of the hash function and how do we verify a tag well basically given a tag we verify it by rehashing the given message and then checking that the small mac actually verifies under the given tag okay so this is a very simple way to show you how collision resistance can take a primitive that's
built for small inputs and expand it into a primitive that's built for very large inputs and in fact we're going to see this not just for macs later on when we talk about digital signatures we're going to do the same thing we're going to build a digital signature scheme for small inputs and then we're going to use collision resistance to expand the input space and make it as large as we want so the security theorem basically is in some sense trivial here basically it says if the underlying mac is secure and h is collision resistant
then the combination which can actually mac large messages is also a secure mac and as a quick example let's apply this to aes so let's use the one example that we mentioned sha-256 so in this case sha-256 outputs 256-bit outputs which is 32 bytes so we have to build a mac that can mac 32-byte messages and the way we could do that is basically by applying the 16-byte aes plugging it into a two-block cbc a two-block cbc would expand aes from a prf on 16 bytes to a prf on 32 bytes and then take the output of sha-256 and plug it into this two-block cbc based on aes and then we get a very very simple mac which is secure assuming aes is a prf and sha-256 is collision resistant so one thing i wanted to point out is that in fact collision resistance is necessary for this construction to work so in fact collision resistance is not just a made-up term collision resistance really is kind of the essence of why this combined mac is secure
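here is a minimal runnable sketch of this hash-then-mac construction the structure big-mac(k, m) = small-mac(k, H(m)) is the point not the particular small mac since the standard library has no aes the small mac below is a stand-in built from hmac-sha256 truncated to 16 bytes rather than the aes two-block cbc from the lecture:

```python
import hashlib
import hmac

def small_mac(key: bytes, m: bytes) -> bytes:
    # stand-in for the small mac on 32-byte messages -- in the lecture this
    # role is played by aes in a two-block cbc; hmac is used here only so
    # the sketch runs with the standard library
    assert len(m) == 32, "the small mac only handles 32-byte messages"
    return hmac.new(key, m, hashlib.sha256).digest()[:16]

def big_mac_sign(key: bytes, msg: bytes) -> bytes:
    # big-mac(k, m) = small-mac(k, H(m)): collapse the long message to a
    # 32-byte sha-256 digest, then mac the digest
    return small_mac(key, hashlib.sha256(msg).digest())

def big_mac_verify(key: bytes, msg: bytes, tag: bytes) -> bool:
    # rehash the given message and check the small mac on the digest
    return hmac.compare_digest(big_mac_sign(key, msg), tag)

key = b"0123456789abcdef"
msg = b"gigabytes of data " * 1000        # any length works
tag = big_mac_sign(key, msg)
print(big_mac_verify(key, msg, tag))        # True
print(big_mac_verify(key, msg + b"!", tag)) # False, tampering is caught
```

note that the security of this sketch rests on exactly the two assumptions in the theorem the small mac being a secure mac and the hash being collision resistant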
just assume for a minute that the function h the hash function we're using is not collision resistant in other words there is an algorithm that can find two distinct messages that happen to map to the same output in this case the combined mac is not going to be secure because what the adversary can do is simply use the chosen message attack to get a tag t for m0 and then output m1 along with that tag as a forgery and indeed t is a valid tag for m1 because h of m1 happens to be equal to h of m0 and so in doing so with just a single chosen message query the attacker was able to break this combined mac simply because the hash function was not collision resistant again i want to emphasize that if someone advertised even one collision for sha-256 you know just one pair of messages that happen to have the same output under sha-256 that would already make this construction insecure because an attacker could then ask for the tag on one message and in doing so he would obtain the tag on the other message as
well and that would count as an existential forgery okay so already we see that collision resistance is a very useful primitive in that it lets us expand the input space of cryptographic primitives i want to show you one more application where collision resistance is directly used for message integrity imagine again we have files that we want to protect let's imagine these files are actually software packages that we want to install on our system so here are three different software packages you know maybe one is gcc one is emacs and another one is i don't know
vi now the user wants to download a software package and he wants to make sure that he really did get the package he asked for and not some version whose contents the attacker tampered with or modified so what he could do is basically refer to a read-only public space that's relatively small all it has to do is hold small hashes of these software packages so there isn't a lot of space needed here the only requirement is that this space is read-only in other words the attacker cannot modify the hashes stored in the space and then once he consults this public space he can very easily compute the hash of a package that he downloaded compare it to the value in the public space and if the two match then he knows that the version of the package he downloaded is in fact the correct one why is that true well because the function h is collision resistant the attacker cannot find an f1 hat that would have the same hash as f1 and as a result the attacker cannot modify f1 without being detected because there's no way that
the hash of his f1 hat would map to the value that's stored in the public space so the reason i brought up this example is i wanted to contrast this with the mac example we kind of saw a similar situation with macs in the mac example we needed a secret key to verify the individual file tags but we didn't need a resource that was a read-only public space with collision resistant hashes we kind of get the exact complement where in fact we don't need a key to verify anyone can verify you don't need a secret key for it however now all of a sudden we need this extra resource which is some space that the attacker cannot modify and in fact later on we're going to see that with digital signatures we can kind of get the best of both worlds where we have both public verifiability and we don't need a read-only space but so far with either macs or collision resistant hashes we can have one but not the other so i'll tell you that in fact this kind of scheme is very popular in fact linux distributions often use public spaces where they
advertise hashes of their software packages and anyone can make sure that they downloaded the right software package before installing it on a computer so this is in fact something that's used quite extensively in the real world okay so in the next segment we'll talk about a generic attack on collision resistance and then we'll go ahead and build collision resistant hash functions the next thing i want to do is show you the general attack on collision resistant hash functions if you remember when we talked about block ciphers we saw a general attack on block ciphers which we called exhaustive search and that attack forced the key size for a block cipher to be 128 bits or more similarly on collision resistance there's a general attack called the birthday attack which forces the output of a collision resistant hash function to be larger than a certain bound and so let me show you the attack and then we'll see what those bounds come out to be so here's a general attack that can work on an arbitrary collision resistant hash function so here we have our collision resistant hash function and let's suppose that it outputs n-bit values
in other words the output space is roughly of size 2 to the n now the message space is going to be much much larger than n bits let's just say that the messages that are being hashed are say you know 100 times n bits i want to show you an algorithm that can find a collision for this hash function h in time roughly 2 to the n over 2 okay so roughly the square root of the size of the output space so here's how the algorithm is going to work what we'll do is we'll choose
random 2 to the n over 2 messages in our message space let's call them m1 up to m sub 2 to the n over 2 now because the messages themselves are much bigger than n bits they are 100 times n bits it's very likely that all these messages are distinct so they'll be distinct with high probability but for each one of these messages we're going to apply the hash function and obtain a tag t sub i the t sub i's are of course n-bit strings and now we're going to look for a collision among the t sub i's in other words we're going to find an i and a j such that t sub i equals t sub j and once we've done that we've basically found a collision because as we said with very high probability mi is not equal to mj but the hash of mi is equal to the hash of mj and therefore we found a collision for the function h now if it so happens that we look through all 2 to the n over 2 t sub i's and we don't find a collision we go
back to step 1 and try another set of 2 to the n over 2 messages so the question is how well will this work in other words how many times do we have to iterate this process until we actually find a collision and i want to show you that in fact the number of iterations is going to be very very small which means that this algorithm will find a collision in time that's roughly proportional to 2 to the n over 2 so to analyze this type of attack i have to tell you a little bit about
the birthday paradox i imagine some of you have already heard of the birthday paradox here it is stated as a theorem and i want to prove it to you because everybody should see a proof of the birthday paradox at least once in their lives so here it is imagine we have n random variables r1 to rn in the interval 1 to b and the only thing i'm going to assume about them is that they're actually independent of one another that's crucial that these n samples r1 to rn in this interval are independent of one another and they also happen to be distributed identically so for example they might all be uniform in the interval 1 to b but again these would be independently uniform variables now it so happens that if we set n to be about the square root of b in other words if we sample roughly square root of b samples from the interval 1 to b you know to be precise it's 1.2 times the square root of b then the probability that two of those samples will be the same is at least a half
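before the proof here is a quick numeric sanity check of both the theorem and the generic attack from the previous segment the loop below is my own illustration not code from the lecture it uses sha-256 truncated to 32 bits so B is 2 to the 32 and the theorem predicts a collision after roughly 1.2 times 2 to the 16 which is about 79,000 samples:

```python
import hashlib

def h32(m: bytes) -> bytes:
    # sha-256 truncated to 32 bits -- a deliberately tiny output space
    # (B = 2**32) so the birthday bound is within reach of a quick loop
    return hashlib.sha256(m).digest()[:4]

def birthday_collision():
    seen = {}                  # tag -> message that produced it
    i = 0
    while True:
        m = str(i).encode()    # distinct messages m0, m1, m2, ...
        t = h32(m)
        if t in seen:
            return seen[t], m  # distinct messages, same 32-bit tag
        seen[t] = m
        i += 1

m0, m1 = birthday_collision()
assert m0 != m1 and h32(m0) == h32(m1)
# typically on the order of 2**16 hashes, matching the birthday bound
print("collision found after hashing", int(m1) + 1, "messages")
```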
turns out that in fact the uniform distribution is the worst case for the birthday paradox in other words if the distribution from which the ri's are sampled is non-uniform then in fact fewer than 1.2 times the square root of b samples are needed the uniform distribution is the worst case so we're going to prove this for the uniform distribution and that basically also proves it for all other distributions but the proof that i want to show you here will hold just for the uniform distribution okay so let's do the proof it's actually not difficult at
all so we're asking what is the probability that there exists an i that's not equal to j such that ri is equal to rj well let's negate that so that's basically 1 minus the probability that for all i not equal to j we have that ri is not equal to rj this basically means that no collision occurred among the n samples that we chose well let's try to write this out more precisely we're going to write this as 1 minus and now when we choose r1 it's the first one we choose so it's not going to collide with anything but now let's look at what happens when we choose r2 when we choose r2 let me ask you what is the probability that r2 does not collide with r1 well r1 takes one slot so there are b minus 1 slots that if r2 falls into it's not going to collide with r1 so in other words the probability that r2 does not collide with r1 is b minus 1 slots divided by all b possible slots similarly when we pick r3 what is the probability that r3 does not collide with
either r1 or r2 again r1 and r2 take up two slots and so there are b minus 2 slots that remain for r3 if it falls into any one of those b minus 2 slots it's not going to collide with either r1 or r2 so i imagine you see the pattern now for r4 the probability of not colliding with r1 r2 or r3 is b minus 3 over b and so on and so forth until we get to the very last rn and the probability that rn does not collide with the previous ri's well there are n minus 1 slots taken up by the previous ri's so if rn falls into any of the remaining b minus n plus 1 slots it's not going to collide with any of the previous r1 to rn minus 1 now you notice that the reason i was able to multiply all these probabilities is exactly because the ri's are all independent of one another so it's crucial for this step that the ri's are independent so let me rewrite this expression a little bit let me write it as 1 minus the product of i goes from 1
to n minus 1 of 1 minus i over b okay all i did is i just rewrote this as a big product as opposed to writing the terms individually so now i'm going to use a standard inequality that says that for any positive x 1 minus x is less than e to the minus x and that's actually easy to see because if you look at the taylor expansion of e to the minus x it is 1 minus x plus x squared over 2 minus and so on and so forth and so you can see that we're basically ignoring this latter part of the taylor expansion which happens to be positive and as a result the left hand side here is going to be smaller than the right hand side okay so let's plug this inequality in and what do we get we get that this is greater than 1 minus the product of i goes from 1 to n minus 1 of e to the minus i over b okay i simply plugged in x equals i over b for each one of those terms now the nice thing about exponentials is that when we multiply them the exponents add
as a result this is simply equal to 1 minus e to the power of let me take the 1 over b out of the parentheses times the sum of i goes from 1 to n minus 1 of i okay so all i did is i took the minus 1 over b out of the parentheses and we're left with a simple sum of 1 to n minus 1 and so the sum of the integers from 1 to n minus 1 is simply n times n minus 1 over 2 which i can bound by n squared over 2
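to collect the chain of steps so far in one display (my notation, matching the terms of the proof; the very last step, replacing n times n minus 1 over 2 with n squared over 2, is the approximation the lecture makes, and for large n the difference is negligible):

```latex
\Pr\bigl[\exists\, i \neq j:\ r_i = r_j\bigr]
  \;=\; 1 - \prod_{i=1}^{n-1}\Bigl(1 - \frac{i}{B}\Bigr)
  \;\ge\; 1 - \prod_{i=1}^{n-1} e^{-i/B}
  \;=\; 1 - e^{-\frac{1}{B}\sum_{i=1}^{n-1} i}
  \;=\; 1 - e^{-\frac{n(n-1)}{2B}}
  \;\approx\; 1 - e^{-\frac{n^2}{2B}}
```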
and so really what i get at the end here is 1 minus e to the power of minus n squared over 2b okay i literally bounded the sum here by n squared over 2 okay very good now so what do we know about n squared over 2b well we can derive exactly what n squared over 2b is from the relationship here so if you think about it let's look at n squared over 2 well n is 1.2 times the square root of b so n squared over 2 is 1.2 squared over 2 times the square root of b squared 1.2 squared is 1.44 and divided by 2 that's 0.72 and the square root of b squared is just b okay so n squared over 2 is 0.72 times b and as a result n squared over 2b is just 0.72 so we get 1 minus e to the power of minus 0.72 well so now you just plug this into