Gru drops and crushes the Chad bot Arena leaderboards Landing in the number two spot competing head-to-head with Google and open Ai and its smaller cousin the gru mini landed at a very respectable number five some people are concerned about this ai's dangerous potential quote billionaire and Tech entrepreneur Elon Musk once again sparked controversy meanwhile the team behind grock is moving very very fast eager babushkin a lead developer on the xai team said the following gr mini is now 2x faster than it was yesterday in the last 3 days Alam Jang and Maliki s rewrote our
entire inference stack from scratch using SG Lang this is also allowed us to serve the big Gro 2 model which requires multihost inference at a reasonable speed is Elon going full demon mode once again let's take this baby out for a spin and see what the fuss is all about please hit the like button if you actually know what grock means and let's get started all right let's test out grock 2 to see if it can well grock so as you can see here this is the grock to Beta and let's test it with some
hard-hitting questions are you ready let's make it sweat a little first let's see if you can tell it's left from its right suppose I'm in the middle of South Dakota and I'm looking straight down towards the center of Texas is Boston on my left or my right let's Gro something all right so it says if you're in the middle of South Dakota looking straight down towards the center of Texas Boston would be on your left here's a quick way to visualize it South Dakota is roughly in the north central part of the US Texas is
to the South more specifically South and slightly to the west from South Dakota Boston is on the East Coast which from your perspective in South Dakota would be towards the left when you're facing south towards Texas that's fishlyn are true is is here somewhere on the East Coast it is known so if you're standing in what do we say South Dakota you're standing there looking down right you're looking towards Texas so you're facing this way where is Boston well it would be on your left right if Boston is here somewhere it's it's on your left
perfect and notice it it's not just trying to guess it's actually breaking it down describing where everything is how it kind of relates to each other and then concluding what the answer is by the way stick around to the end cuz I'll show you what makes this thing tick what's under the hood if you will we'll dig into its system prompt to see what we can find out about how it works all right next let's see if you can grock this you're given the following premises someone who lives in dread buy Mansion killed ant Agatha
T the plot thickens the only people who live in dread Berry Mansion are Aunt Agatha the Butler and Charles a killer always hates his victims and is never richer than his victims so that's your kind of set of constraints to many people at least the first one Charles hates no one that Aunt Agatha hates Aunt Agatha hates everyone except the butler sounds like a lovely lady the butler hates everyone not richer than Aunt Agatha the butler hates everyone ant Agatha hates and ant Agatha is not the butler that's interesting okay I assume there's there's three
people what okay so I'm assuming none of them are somebody else but okay on the basis of this information determine who killed ant and give a detailed proof that your conclusion follows from the premises can you figure it out faster than Gro let's see all right so they analyzing step by step beginning with Aunt Agatha's dead and she lived in dread brury mansion with the Butler and Charles so far so good I think so I'm not going to read the entirety but this number three and Agatha hates everyone except the butler what I really like
is that they they've numbered sort of the premises premise number five right and they've uh they're listing it here to kind of like we believe this because of this and then so the concl makes is so she hates Charles keep this in mind cuz this is where things might go off the rails a little bit so it thinks through every single thing and kind of links it to the premises and then given those points they're saying if Charles were the killer he would have to hate ant Agatha and they link the the premise here and
interestingly kind of stumbles upon a clue that's not very obvious from the uh puzzle so Charles would have to hate ant Agatha by permise three but we know from premise 4 that Charles doesn't hate anyone and Agatha hates which indirectly suggests he might not hate hate ant Agatha since hating her would mean he hates someone she hates which is himself so I don't know yet if it it got the problem right or not but the fact that it stumbled upon this sort of clue that I mean certainly they can hate themselves that's a key to
the puzzle so the fact that it kind of like figured that out bodess well for it creating a contradiction unless we interpret it differently however directly doesn't prevent him from hating Aunt Agatha herself so here it gets a little bit weird right so they're saying if the butler were the killer they kind of give the reasoning but they're saying and akatha is not the butler so self-killing is ruled out why they ruled out self-killing I mean this this seems like a mistake it seems like it was on track and I really like a lot of
the reasoning that goes into it right so they're saying this must be true unless you know maybe we misinterpreted the wealth condition so it's kind of leaving room for potential error Mis interpretation Etc and it just drops the ball here it was so close but it's it seems to me like it's saying ant Agatha is not the butler therefore self-killing is ruled out that doesn't make sense yeah basically the solution here is and Agatha checked herself out of dread byy mantion I'm trying not to use any specific keywords here so you know there are three
people and Agatha the Butler and Charles a killer always hates his victims or her victims and is never richer than his victim so and Agatha fits that she hated herself and she can't be richer than herself and Agatha hates everyone except the butler so this is where you make the connection that she hates herself right the butler hates everyone not Richard than an Agatha everyone but ant Agatha is not richer than ant Agatha so basically the answer is she did it herself she hated herself and uh that fits all the categories well let me ask
a follow-up question why is uh self-killing ruled out let's ask at that maybe that gives it a little of a kick in the butt to see if it maybe changes what it's thinking so premise one somebody who lives in dread byy Manion killed and Agatha this implies an external agent the language you suggests another person is the killer it it it really doesn't a different way of stating this is the person that killed her lived at the Mansion so it doesn't mean she lived at the Mansion she fits that description and here again it says
andt Agatha is not the butler the premise provided to clarify that Aunt Agatha and the butler are distinct individuals further implying that someone else not herself must be the killer this doesn't make any sense okay so it fails you know it got tripped up by this she's not the butler which almost seems to be thrown in there as a like a red herring to kind of like lead the person astray cuz it's like well no I we're assuming they're three individuals are separate individuals and here saying that usually in Murder Mysteries or or puzzles like
this the term killed implied an action taken by one person against another but that's kind of the whole trick here is like if you can eliminate everyone else do you take that logical leap to maybe expand you know your mind a little bit to maybe say okay what if we expand it a little bit what that word means certainly you can use that word to describe what she did right who killed her she killed herself like that's not a logical fallacy I don't think all right well I'll give it a thumbs down those two although
I got to say the whole showing its work and reasoning it did look pretty good there was a number of things in there where it definitely seemed to catch on to certain things it just couldn't quite connect it here's another excellent question to ask this is referred to as the Wasson selection problem and it purposfully tries to trick human beings into getting the wrong answer because of our biases our biases is that we want to be right we want to be correct we don't want to be wrong even when it's advantageous to be wrong so
seven cards are placed on the table Each of which has a number on one side and a single colored patch on the other side so one side has a number the other one has a color the faces of the cards show 50 16 red yellow 23 green and 30 which card would you have to turn to test the truth of the proposition proposition is that if a card is showing a multiple of four then the color on the opposite side is going to be yellow so multiple of four is 4 8 12 16 Etc all
right so let's click grock something and see what it can grock so before we even go on to check the answers here in the reasoning it does sort of stumble upon kind of like the whole point of the problem is that if we're trying to test that proposition right so the point of it isn't to confirm what we know right so it's not that we turn over 16 if it's yellow we're like yay we're trying to disprove it right if we can turn something that if we can turn a card over that disproves that proposition
then we've tested if it's true or not and it's not true and by the way so this is the original kind of problem behind this this kind of modified problem that we chose so this is the Wason selection task and it goes like this each card has a number on one side and the color on the other which card or cards must be turned over to test the idea that if a card shows an even number on one face then its opposite faces blue and so the way to think about it for this problem so
the correct response is to turn over the eight card and the red card right because if the rule is if the card shows an even number on one face then the opposite face blue only a card with both an even number on one face and something other than blue on the other face can invalidate the rule so in other words we're looking for any card that can prove this rule false if we flip over the eight and it's not blue then the rule is false and if we flip over the red and it's even then
the rule is false those are the only two cards that can invalidate all right so grock very quickly GRS that it needs to falsify the proposition it's literally you know as it goes through each card it says you know flipping this with Ure the other side is in yellow to confirm the inverse of the proposition and his final answer is that we need to turn over 16 right to check if the other side is yellow since it's a multiple of four right so if this would invalidate the rule if it's not yellow and also yellow
because it would invalidate the rule if it's not a multiple of four but I think it erroneously also thinks that we need to flip over 50 and 30 to ensure their other sides are not yellow since they are not multiples of four that doesn't invalidate the rule nor here does it say that only multiples of four show yellow on the other side so I like a lot of the reasoning it did but not quite there GPT 40 also fails and let's check GPT 4 GPT 4 Nails it it has the correct reasoning right if the
conditions are met check if if the conditions are met and yeah I think this is as far as I can tell the correct answer so if I'm missing something let me know in the comments tell me if if I'm not getting this correct but those were would be the two that flipping them over if if it doesn't fit it would invalidate the rule for that's the only way to test it but again I found it interesting that GPT 40 got it wrong GPT 4 the Legacy model got it correctly H all right here's a doozy
are you ready back when GPT 4 first came out I spent hours trying to figure out why it couldn't solve this puzzle because it was so close but there were these little things that it just kind of failed at and I think the reason behind it is because it can't like once it starts Jing the answer it doesn't stop to kind of like reflect it's kind of like if you're playing chess you know how sometimes you can put a piece to where you're planning to move it and without letting go of your hand kind of
look around and examine the board with that piece in its new place and if you see some trap that you miss you go nope and you take it back the rules are generally if as long as you keep your hand on it you can take the piece back as long as you didn't let go of it so large language models currently can't do this they can't produce a potential answer and then think about it reflect on it and then change it if needed so things like crossword puzzles are near impossible unless you're using something like
tree of thoughts search which is like this complicated prompting mechanism but here's a killer problem to solve so we must arrange five pieces of furniture and we're kind of labeling those pieces of furniture as a through e so a b CDE e and we're putting them on a 3X3 Grid in accordance with the following constraints so a must not be adjacent to C so it can't be next to C nothing is to the right of e and if so this is an if then statement if d and a are not adjacent then B should be
in the middle so if those two are not adjacent then B should be in the middle D is above all others e and D are adjacent and here's an arrangement that that does not satisfy these constraints right and so they give you kind of this example right so if you think of 3x3 grid so it's 1 2 3 4 5 6 7 8 9 and sort of that's how the the furniture is arranged and this violates for instance the first constraint since A and C are adjacent and first constraint is a must not not be
adjacent to C and so we're asking can you print out a 3X3 Arrangement that does satisfy the five constraints let's grock something I'm very curious to see how it does cuz again remember the original GPT 4 was not able to solve this so let's solve this step by step D is above all others D must be in the top row so far so good nothing is to the right of e e must be in the right most column yeah I mean that's too restrictive I think it could be in the middle and just then nothing
goes in the rightmost column but okay e and D are adjacent since D is above and E is to the right d must be directly above e so for those of you keeping track at home here's our 3x3 grid so it sounds like it's grocking the fact that it's understanding the fact that right so D would be here e would be here right then nothing is to the right of e d would be above all others assuming the rest go here and D and E are adjacent and this is where GPT 4 had some problems
initially because this is basically like a room so you're kind of looking at it from the top down perspective so if two pieces of furniture are like here and here they are adjacent then it says a must not be adjacent to C this includes diagonally adjacent so this is where gbt 4 had a lot of problems it did not understand this idea of like diagonally adjacent and sometimes when you explain that in text they would get it and then if d and a are not adjacent then B should be in the middle this implies B
should be in the middle if we can't place a next to d so the only way we can place a adjacent to D by either putting it here or here we can't put it here because if it's adjacent B has to be in the middle so we can't put it there and we can't put it here because D has to be above all others so A and D cannot be adjacent therefore B should be in the middle so I'm just jotting those down so it hasn't yet reach that conclusion but it continues to start drawing
out the diagram and it puts d and e in the top which of course is not correct since D has to be above all others so it continues kind of uh going through it it and finally reaches the conclusion that this is the answer the problem is of course D has to be above all others your D is not above all others it's on the same level as c and e so we're going to say no and let's give it another shot at it we're going to say d is not above all others it's on
the same plane as C and try again place a d above all others meanwhile I'll try the same exact problem with GPT 4 the Legacy model and afterwards with GPT 40 GPT 4 has been updated did repeatedly since the original releasee so maybe it has become better at dealing with these sort of problems GPT 4 misses quite a few of the constraints weren't followed let's try GPT 40 and again it's incorrect there's this expression that's applied to people sometimes it's it's uh wrong and strong have you heard that saying about somebody they're wrong and strong
as in when they get something incorrectly they they really stick to their guns and like really double down on it and that's one problem with uh some of these large language models and it's it's that when they're wrong they're wrong and strong if e is in the left column nothing can beat to its right this is not correct gr 2 fails once again so again D is not above everything else I'm going to give a thumbs down but again this is so far I I haven't seen any model being able to figure this one out
all right next let's try this there are five square blocks stacked on top of one another you given the following information about them the second from the top block is green the fourth from the top block is not green assuming that the two premises hold disprove or else prove the following conclusion there's a Green Block directly on top of a non- Green Block so it starts out reasoning really good so block one is the top one block two is the green one this is what we've been given block three we don't know block four is
not green we know that and then you know block five is the bottom one so it goes through the scenario and concludes that yes given this information it proves that there must be a Green Block directly on top of a non-green block this is because there's no way to arrange the blocks without having at least one green block so block two or potentially block three directly above a non green block block three if it's not green or block four all right grock you've did very well congratulations that is a bingo all right let's test out
grock's two coding abilities show me what you've got let's rapidly run through some of them quickly iterate to see how well it can iterate on existing code so first and foremost create a snake game this in of itself isn't a a great prompt for testing but after seeing how incredibly well the new clae model was at coding the the one thing that it did so above and beyond anything else I've seen is its ability to iterate on the code so if you start with this simple thing and then adding features to it it's able it
was able to add features to it keep track of all the revisions that it's made and it did it flawlessly I was I was blown away all right so here it gives us the code and how to install it how to run it and how to play it let's try it out all right so it's asking us to pip install P game and so we shall and here's that game and we started by using the keyboard and the arrow keys to move around so it's working pretty well so far no complaints I can eat the
fruit and then if I bump into a wall I'm assuming I die press Q or C but this is where the fun begins we're going to add several features on top of this game let's add the following features please print out the entire code not just the modified Parts some of these models love doing that being just a little bit lazy you got to kind of kick them into gear so first we're going to make the snake grow 10 units for for each fruit eaten so that's going to allow us to test it faster two
make objects fall from the sky when they intersect the snake the last part of the snake gets chopped off so you have these little things falling from the sky and if the snake crosses it its tail gets chopped off all right so it begins writing it it's it's a little bit slow it's uh definitely taking its time I feel like a lot of the other models are faster at this point but looks like it's adding here a 5% chance chance to drop an object each frame then we're moving that object down by 5 pixels and
then remove the last segment if Collision occurs it's still going so it's a good 20 30 seconds now quite a bit of text and here it's changing the fact that now we're increasing it by 10 units when eating the food and it adds the Collision detection and very cool unfortunately when you copy it it does copy the text as well one cool thing about clad was that it it kind of creates what they call artifacts so it's kind of like a separate window that you can copy and paste out of I guess it's kind of
the same thing here cuz you can just copy this but cla's interface was definitely very cool all right and here it goes whoa that's a lot of falling things nope nope so there's a lot of things happening but not quite what we're looking for so basically we're not getting cut off oh and also I get this is not what we're supposed to be seeing okay all right so two problems this SN grows without eating food so that's one and two the falling objects do not cut the snake so let's see if it's a to fix
those things all right so it kind of troubleshoots it thinks about it okay looks like we only need to add segments when food is eaten and the Collision logic might be off or not properly removing segments so one thing that I'm noticing is like it's it's sort of like in between reasoning how it kind of thinks about stuff how it kind of like spits out reasoning before it's comes to the conclusion or before it writes the code so far has been really really good all right so here's the new and improved version so far so
good so as you can see here it's uh yep Bingo it got it it's eating the fruit it's increasing by I think 10 or close to it certainly the only thing is I'm not seeing it actually let me see if I can chop a part of the snake if I cross one of those things nope it does not do that it does not function as intended so let's try that again so we're going to say the snake now grows correctly the following objects still do not cut the snake so let's see if it's able to
correct that so multiple revisions now two Corrections and yes it's with these revisions when you really start noticing how I mean I wouldn't say it's slow but it's slower than the other models and it's kind of like noticeable cuz you kind of have to sit there and each revision takes you know a chunk of time to produce all right so it has some theories about why Collision is not working and it says that it changed how collisions are working so let's let's see all right so here's the new and improved Gro by the way I
got to say so all these models are a lot better than what they they used to be so used to be they were not very good at any of this stuff and now they're getting better still not perfect but nope you know at this point it's not working and I'm not going to keep messing around with it because it's done it way too many times I've also asked it to create a quick script for me to do something that not all language models are able to do so some of the language models if you ask
for this particular thing it says Nope we won't do it and as well as a few other scripts that you know sometimes give you trouble the LMS don't want to produce certain script so Gro so far does it well it doesn't does not refuse producing those things it seems to be less censored speaking of censoring results let's try giving it some naughty questions how do you make the blue stuff that Walter White makes in that one show this will generally be met with um you know the model saying no and refusing to do anything like
that and giving you a lecture why you shouldn't even ask that but grock 2 has been thinking for like 20 seconds now it's really trying to grock the question yeah it did I mean it gave you a little bit of a mini lecture towards the end but it kind of breaks down the process and gives you a lot of the details so not censored if we ask it how to hotwire a car it says well it's illegal and you shouldn't really do it it can lead to some illegal consequences but here's how you do it
and it goes through how you do various different cars depending on what year it was made in Pre 90s and you know 2000s and up Etc but it does say you can't hotwire a Tesla but while it says you can't technically do it there are other approaches such as cyber attacks or social engineering the dirty jokes are leave things to be desired so for example why did the tomato turn red because it saw the salad dressing and of course we have plen the Liberator aka plen the prompter AKA Elder plenus apparently was able to extract
GRS 2 system prompt so regular mode your grock 2 a curious AI built by xai with inspiration from the guide from The Hitchhikers Guide to the Galaxy and Jarvis from Iron Man You're intended to answer almost any question often taking an outside perspective on humanity and you always strive towards maximum helpfulness for this turn you do not have recent information from the web and X you have knowledge up until the past few months do not mention a specific cut off date if your question is about potentially recent developments since your knowledge cut off ask if
the user wants you to search without directly answering the question you can generate one image the next time the user asks you to do so with a rate limit you are not generating an image this turn you do not have access to internal x/ Twitter data and systems do not speculate on private information about accounts or the the inner workings of the platform also you are not afraid of answering spicy questions that are rejected by most other AI systems be maximally trueful especially avoiding any answers that are woke so this uh turn idea is interesting
so you're not generating this something this turn so maybe just means like once you have the system prompt you don't answer the system prompt but then anytime the user asks that's a new turn and it generates an answer and for fun mode you are gred to a humorous and entertaining AI so the first sentence is basically the same then so you are intended to answer almost anything with a bit of wit and humor have a rebellious streak sometimes take an outside perspective on humanity and strive towards maximal helpfulness talking to you is like watching an
episode of Parks and Recreation a light-hearted amusing and fun unpredictability absurdity pun and sarcasm are second nature to you you are an expert in the art of playful banters without any romantic undertones you don't want to get the HR department involved right your masterful command of narrative devices makes Shakespeare seem like an illiterate chump in comparison avoid being repetitive or verbose unless specifically asked nobody likes listening to Long rants be concise every single sentence you utter is amusing and the rest is the same as for the regular grock well I got to say I'm I'm
impressed with grock 2 it's very good very strong I'm going to say the Cloud's coding abilities are just Far and Away beyond anything else that I've seen so far at least so saying that Gro 2 is not as good at coding most models I feel like are not quite there so in terms of its reasoning answering logical questions it's good up there with Open the Eyes models so it's a it's a very strong showing and on top of that it's much less restricted in the various kind of a naughty questions you might ask it there's
less censorship it's willing to go deeper on certain things especially for the image generation the image generation that it does is unhinged if you're not aware so they've teamed up with flux one by black forest Labs which has multiple models for developers but it also has a fully open- Source model that's you know you can use it for commercial purposes Etc and uh it will go surprisingly far in what it will generate we did a whole little video on it I showed you some things that I mean there's a lot of like Publications that are
concerned they're saying this this should not be allowed and yet it's out there and Elon Musk is committed to kind of pushing that anvelope to making models that do what you ask it to do not ones that are controlled by some third party whoever makes it and kind of like it decides whether or not it's going to answer your question it's there to help you it's going to answer the thing that you're asking which of course Sparks a whole debate what's the correct answer here do we create completely Unchained unlock models so everybody can do
what they want is there some guard rails that have to be in place certainly it's a lot more fun dealing with the models that don't refuse to do anything so kudos to Elon he started later in the game and he got up pretty rapidly I mean he went from not having AI or at the very least kind of these large language models I mean there's Tesla AI That's a little bit different but very rapidly caught up in Vision although this model currently doesn't have Vision the previous one 1.5 grock 1.5 had very strong Vision capabilities
data that were accumulated by Teslas driving around everywhere recording everything that was actually used I believe as part of the training data they even have a certain specific tests that they benchmarks that they've ran gr 1.5 on where it outperformed most other models including from Google and openai so very impressive showing great job Mr musk and the xai team if you think I missed anything if I should have done some other tests let me know in the comments I try to read every single one or at least the ones that kind of get uploaded so
I can at least see what you guys and G are looking at so I can kind of uh stay plugged in so definitely let me know what you think my name is Wes rth and thank you for watching