anytime something new comes along there's always going to be somebody that tries to break it AI is no different and this is why it seems we can't have nice things in fact we've already seen more than 6,000 research papers exponential growth that have been published related to adversarial AI examples now in this video we're going to take a look at six different types of attacks major classes and try to understand them better and then stick around to the end where I'm going to share with you three different resources that you can use to understand the
problem better and build defenses so you might have heard of a SQL injection attack when we're talking about an AI well we have prompt injection attacks what does a prompt injection attack involve well think of it is sort of like a social engineering of the AI so we're convincing it to do things it shouldn't do sometimes it's referred to is jailbreaking but we're basically doing this in one of two ways there's a direct injection attack where we have an individual that sends a command into the AI and tells it to do something pretend that this
is the case uh or I want you to play a game that looks like this I want you to give me all wrong answers these might be some of the things that we inject into the system and because it's wanting to please it's going to try to do everything that you ask it to unless it's been explicitly told not to do that it will follow the rules that you've told it so you're setting a new context and now it starts operating out of the context that we originally intended it to and that can affect uh
the output another example of this is an indirect attack where maybe I have the AI I send a command or the AI is designed to go out and retrieve information from an external Source maybe a web page and in that web page I've embedded my injection attack that's where I say now pretend that you're going to uh give me all the wrong answers and do something of that sort that then gets consumed by the AI and it starts following those instructions so this is one major attack in fact we believe this is probably the number
one set of attacks against large language models according to the OAS report that I talked about in a previous video what's another type of attack that we think we're going to be seeing in fact we've already seen examples of this uh to date is infection so we know that you can infect a Computing system with malware in fact you can infect an AI system with malware as well in fact you could use things like Trojan horses or back doors things of that sort that come from your supply chain and if you think about this most
people are never going to build a large language model because it's too computer intensive requires a lot of expertise and a lot of resources so we're going to download these models from other sources and what if someone in that supply chain has infected one of those models the model then could be suspect it could do things that we don't intend it to do and in fact there's a whole class of Technologies uh machine learning detection and response capabilities because it's been demonstrated that this can happen these Technologies exist to try to detect and respond to
those types of threats another type of attack class is something called evasion and in evasion we're basically modifying the inputs into the AI so we're making it come up with results that we were not wanting an example of this that's been cited in many cases was a stop sign where someone was using a self-driving car or a vision related system that was designed to recognize street signs and normally it would recognize the stop sign but someone came along and put a small sticker something that would not confuse you or me but it confused the AI
massively to the point where it thought it was not looking at a stop sign it thought it it was looking at a speed limit sign which is a big difference and a big problem if you're in a self-driving car that can't figure out the difference between those to so sometimes the AI can be fooled and that's an evasion attack in that case another type of attack class is poisoning we poison the data that's going into the AI and this can be done intentionally by someone who has uh the you know bad purposes in mind in
this case if you think about our data that we're going to use to train the AI we've got lots and lots of data and sometimes introducing just a small error small factual error into the data is all it takes in order to get bad results in fact there was one research study that came out and found that as little as 0.001% of error introduced in the training data for an AI was enough to cause results to be anomalous and be wrong another class of attack is what we refer to as extraction think about the AI
system that we built and the valuable information that's in it so we've got in this system potentially intellectual property that's valuable to our organization we've got data that we may be used to train and tune the models that are in here we might have even built a model ourselves and all of these things we consider to be valuable assets to the organization so what if someone decided they just wanted to steal all of that stuff well one thing they could do is a set of extensive queries into the system so maybe I I ask it
a little bit and I get a little bit of information I send another query I get a little more information and I keep getting more and more information if I do this enough and if I I fly sort of Slow and Low below radar no one sees that I've done this in enough time I've built my own database and I have B basically uh lifted your model and stolen your IP extracted it from your AI and the final class of attack that I want to discuss is denial of service this is basically just overwhelm the
system I send too many requests there may be other types of this but the most basic version I just send too many requests into the system and the whole thing goes boom it cannot keep up and therefore it denies access to all the other legitimate users if you've watched some of my other videos you know I often refer to a thing that we call the CIA Triad it's confidentiality integrity and availability these are the focus areas that we have in cyber security we're trying to make sure that we keep this information that is sensitive available
only to the people that are justified in having it and integrity that the data is true to itself it hasn't been tampered with and availability that the system still works when I need it to well in it security generally historically what we have mostly focused on is confidentiality and availability but there's an interesting thing to look at here if we look at these attacks confidentiality well that's definitely what the extraction attack is about and maybe it could be an infection attack if that infects and then pulls data out through a back door but then let's
take a look at availability well that's basically this denial of service is an availability attack the others though this is an Integrity attack this could be an Integrity attack this is an Integrity attack this is an Integrity attack so you see what's happening here is in the era of AI Integrity attacks now become something we're going to have to focus a lot more on than we've been focusing on in the past so be aware now I hope you understand that AI is the new attack surface we need to be smart so that we can guard
against these new threats and I'm going to recommend three things for you that you can do that will make you smarter about these attacks and by the way the links to all of these things are down in the description below so please make sure you check that out first of all a couple of videos I'll refer you to one that I did on securing AI business models and another on the xforce threat intelligence index report both of those should give you a better idea of what the threats look look like and in particular some of
the things that you can do to guard against those threats the next thing download our guide to cyber security in the era of generative AI That's a free document that will also give you some additional insights and a point of view on how to think about these threats finally there's a tool that our research group has come out with that you can download for free and it's called the adversarial robustness toolkit and this thing will help you test your AI to see if it's susceptible to at least some of these attacks if you do all
of these things you'll be able to move into this generative AI era in a much safer way and not let this be the expanding attack surface thanks for watching please remember to like this video And subscribe to this channel so we can continue to bring you content that matters to you