Twitter has been on absolute fire for the last few days with strawberry memes. The hype is real, the memery is strong, and I know a lot of you come to this channel for the more serious stuff, and of course that's going to be my bread and butter, but every once in a while I like to join in on the memery and the hype. So I'm having a little fun here, and hopefully you're having fun too, but in this video I'm going to actually break down everything we really know about Strawberry, what it could be, and all of the research papers it could be built on top of. So we're going to be taking things a little bit more seriously right after this intro. Just a quick catch-up on the memes and the hype: Sam Altman just yesterday posted "i love summer in the garden" featuring a picture of strawberries, and the point is, OpenAI has this project internally that has been rumored, leaked, whatever you want to call it. It was called Q*, then it was renamed to Project Strawberry, and that is where all the memes come from. So from there, we
have the user @iruletheworldmo basically creating this intense hype train around Project Strawberry. He posts every other minute, and he's gone from just a few thousand followers to now almost 12,000 in a matter of days, and for what it's worth, the memes are awesome and everybody seems to be getting in on the fun. Jimmy Apples, of course, replying to Sam Altman's post with "less recommending, more schizo strawberry posting," Sam Altman having said "patience Jimmy, again patience Jimmy," and of course Jimmy replying "Sam, please." This is in reference back to the last time he said "patience Jimmy," really telling us to hold our horses, they're cooking. We have the CEO of Perplexity tweeting out a strawberry yesterday night. We have MC Hammer: "the entire world would take notice," strawberry emoji. We have Roon, the famous OpenAI leaker, posting "SF is so back" with a picture of strawberries at the center of a get-together in San Francisco. And of course Jimmy Apples again: "Google's corporate marketing can't compete with the online anon strawberry religion." Yes. So we have @iruletheworldmo, we have Sam Altman at the top, we have Jimmy Apples right there, and a dead Google, and we have Garry Tan, the president of Y Combinator, even getting in on the fun. But now, what we're here to talk about: Chubby on Twitter has posted a thorough breakdown of everything we know about Strawberry, and if you don't follow Chubby, I really recommend it. He is a fantastic follow, he posts really technical stuff and really well-researched stuff, so please follow him. That's what we're going to be doing today: we're going to be going through this very lengthy post, and we're going to be
breaking down everything we know about Strawberry. So, Q* has not been published or made publicly available, there are no papers on it, and OpenAI is holding back information about it. Sam Altman, in this video, says "we are not ready to talk about that," so let's just watch that video quickly: "Can you speak to what Q* is?" "We are not ready to talk about that." See, but an answer like that means there's something to talk about. "It's very mysterious, Sam." "I mean, we work on all kinds of research. We have said for a while that we think better reasoning in these systems is an important direction that we'd like to pursue. We haven't cracked the code yet." Since the first hints, the community has been trying to figure out what Q* might be. "In this article I compile" (I being Chubby) "all of the information I could find to paint a possible picture of Q*." And of course a lot of this is rumors, a lot of this is hypotheticals, but it's what we know so far, so let's find out. All right, so the beginning of this entire Q*/Strawberry
drama: about half a year ago, The Information and Reuters learned from OpenAI employees that a scientific breakthrough had been achieved at the research facility, although the first rumors emerged in December 2023. And the nice thing is, he also gives references for everything that he posts, so really well done. Basically, for the first time, a model had succeeded in learning by itself, with the help of a new algorithm, and acquiring logical and mathematical skills without external influence, something that Transformer architectures cannot normally do due to their characteristics, as their results are outputs of probabilities. Logical thinking and self-learning are a necessity that many thinkers believe is a prerequisite for artificial general intelligence (AGI), and AGI therefore requires absolute correctness in its output in order to be transferred to all human processes, something that OpenAI itself repeatedly emphasizes in blog posts. Here's a quote: "In recent years, large language models have greatly improved in their ability to perform complex multi-step reasoning. However, even state-of-the-art models still produce logical errors, also known as hallucinations. Mitigating hallucinations is a critical step in building aligned AGI." And he also references my video, thank you Chubby for that. By the way, I'll drop all of the links to my Q* videos and my Strawberry videos in the description below if you want to check those out. So let's read what Reuters wrote, and this is back in November 2023: "Some at OpenAI believe Q* could be a breakthrough in the startup's search for what is known as artificial general intelligence. Given vast computing resources, the new model was able to solve certain mathematical problems. Though only performing math on the level of grade-school students, acing such tests made researchers very optimistic about Q*'s future success."
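Since tabular Q-learning (the rumored "Q" in Q*) comes up repeatedly in the rest of this breakdown, here is a minimal sketch of it on a made-up toy environment: a five-state corridor where reward only comes from reaching the right end. To be clear, this is purely illustrative; the environment, hyperparameters, and code are invented for the example and assume nothing about whatever OpenAI actually built.

```python
import random

# Toy "corridor" environment: states 0..4, reward only at the right end.
# Actions: 0 = move left, 1 = move right. An episode ends at state 4.
N_STATES = 5
GOAL = N_STATES - 1

def step(state, action):
    """Apply an action; reward 1.0 only on reaching the goal state."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: no labels, no human feedback, only reward."""
    random.seed(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit the table, occasionally explore.
            if random.random() < epsilon:
                action = random.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # The core update: nudge Q(s, a) toward r + gamma * max_a' Q(s', a').
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q = train()
# The learned values end up preferring "right" in every state, a policy
# the agent worked out entirely on its own from the reward signal.
policy = ["right" if q[s][1] >= q[s][0] else "left" for s in range(GOAL)]
```

The thing to notice is that no human ever labels a correct answer here: the agent improves purely from the reward signal and its own exploration, which is the flavor of self-learning the reporting above is describing, just on a vastly smaller scale.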
But "conquering the ability to do math, where there is only one right answer," and that's really important, "implies AI would have greater reasoning capabilities resembling human intelligence. This could be applied to novel scientific research, for instance, AI researchers believe." And this is a really important quote from Sam Altman. I remember watching this clip for the first time where he said this, and it's kind of mind-blowing just hearing it, so listen to this: "Four times now in the history of OpenAI, the most recent time was just in the last couple weeks, I've gotten to be in the room when we sort of push the veil of ignorance back and the frontier of discovery forward, and getting to do that is the professional honor of a lifetime." Now, what he's talking about here, of course, we can't be sure, but a lot of people think it is Q* and this logical and mathematical ability of large language models. At the same time, we know a lot of other things happened around this time and after it: Ilya Sutskever tried to get Sam Altman fired, the board fired him, there was a mutiny, then Sam Altman came back. We know that multiple superalignment researchers have left, stating that they're not getting the resources they need, and there's a lot of memes around "what did Ilya see." So there's just a lot going on, where maybe they discovered something and they're a little bit afraid of it. The Information around that time also reported "OpenAI made an AI breakthrough before Altman firing, stoking excitement and concern": one day before he was fired by OpenAI's board last week, he alluded to a recent technical advance the company had made that allowed it to
"push the veil of ignorance back and the frontier of discovery forward," which we just covered. And "Jakub Pachocki and Szymon Sidor, two top researchers, used Sutskever's work to build a model called Q* that was able to solve math problems it hadn't seen before, an important technical milestone." Here's a small note from Chubby which I think is super relevant to what we're talking about: "The great fear and concern about Q* also stemmed from imagining that, if Q* could already teach itself math without prior training, only elementary-school level at first but certainly more than that with enough compute, it could be a risk in the foreseeable future, due to exponential development, to all data encryption. What would stop an AI that teaches itself math from finding a solution to encryption, if you just give the model enough time and compute?" So basically what he's saying is, if all of a sudden AI is able to teach itself math, and we throw enough compute at it, it's going to get really good at math, and encryption is just math, that's all it is. And so if a model gets good enough, then all of a sudden it might actually be able to crack encryption, putting everything at risk: banks, nuclear codes, all of your passwords, of course, and really so much more. And it's interesting, I was just rewatching the show Silicon Valley, by the way, fantastic show if you haven't seen it, and in the show, and this is going to be a spoiler, towards the end of the entire show they basically discover artificial intelligence mixed with a unique compression algorithm that is able to crack every encryption algorithm, and that's kind of how the show ends. So it's interesting to see
that even years ago the creators of the show were thinking about this problem. But it's not just about reasoning and math. A lot of people, including myself, think that Q* is more about long-term thinking. Right now, when you give a prompt to a large language model, you get back the first thing that it, quote-unquote, thinks of: essentially it predicts what the next token is and tells it to you instantly. There's no longer processing behind it, it just gives it to you. So what Q* and Strawberry really could be is the ability for large language models to spend time coming up with a better solution, to think about it more, akin to how a human thinks about things. When we're asked a question or we're assigned a research task, we don't just answer with the first thing we think of: we write things down, we research, we have deep thoughts about it, and then finally, when we're ready, we present our answer. Making large language models, and AI in general, perform like that could be a huge unlock. So let's read a little bit more about what Chubby says: "One can say
that Q* is a method, an algorithm, for bringing language models closer to human thinking and its conclusions. It is a method for mapping step-by-step thinking, iterative thinking, and thinking in process subdivisions, and adapting them to large language models." According to Kahneman, there are two thought processes, two systems in which people think: System 1 thinking and System 2 thinking. System 1 thinking is intuitive thinking, thinking that takes place automatically and intuitively. At the moment, large language models can only think in a System 1 way, by outputting a probability result based on their training data; this corresponds to intuitive thinking. System 2 thinking, however, is complex thinking. It is the kind of thinking that involves thinking in steps and process divisions. If we want to solve difficult mathematical problems, we cannot arrive at the results intuitively but have to approach the solution step by step. This is what we need to teach language models: to approach results slowly. So how do you actually teach language models to think in a System 2 way? To start, he quotes himself, so let's read a little bit of what he says: instead of laboriously training LLMs using RLHF, reinforcement learning from human feedback,
a model with Q* is able to learn on its own. This is very similar to how the AlphaGo system works: we didn't give it any feedback, and it is the best in the world at the game of Go, by far, simply because it was able to play the game against itself over and over and over again, and it just got better and better and better by itself; we gave it no instruction. So here, in this example, GPT with Q* taught itself math without any external intervention. He also references A*, which is a classic search algorithm, so let's read a little bit about it: "By dividing thinking into subprocesses, the model is given a sense of security in solution finding. A* is a search algorithm that finds the best solution. A* is a graph traversal and pathfinding algorithm, used in many fields of computer science due to its completeness, optimality, and optimal efficiency. Given a weighted graph, a source node, and a goal node, the algorithm finds the shortest path from source to goal. The combination of Q-learning and A* search teaches the model to think, to find solutions independently, and to correct itself." So think about this: it thinks about something, it moves forward, it predicts what's next, it kind of checks its work, and it is able to go back and try different paths to find the most optimal solution. "This means that hallucinations would stop and correct solutions would be output as a result, because the solution is not simply taken from the training data and based on probability. However, Q* is likely to be very computationally intensive, and this is where GPT-4o mini comes into play. My hypothesis is that OpenAI will use GPT-4o mini to reduce energy requirements and save compute, and perhaps will incorporate a Q* variant into the small model. This is just speculation." So because Q* and A* have to go back and forth and do this longer-term compute, they are going to be incredibly compute-intensive, and that is why it's going to be so costly to run these models. So if you have a smaller, much cheaper model using this technique, that might be the right way to approach it. System 2 thinking is explained in more detail in a research paper by
OpenAI, "Let's Verify Step by Step," co-authored by Ilya Sutskever and Jan Leike, both of whom are ex-OpenAI at this point. The idea is already used today in prompts, by telling a model to think step by step, and if you've watched any of my LLM test videos, you know that I put that in most of my prompts, because we tend to get a much better response when you say "think step by step" or "divide the task into subsections," which is of course only a superficial attempt to apply System 2 thinking. So we can somewhat apply some of these techniques simply by structuring our prompts, using some of these nudges to think in the right direction. He also talks about tree of thoughts, which is another version of this that you can accomplish with prompts alone, and of course there's tree of thought, there's chain of thought, there's a lot of different methods that we've talked about on this channel that you can apply strictly through a little bit of programming and a little bit of prompt engineering. So, just a reminder, the tree-of-thoughts framework consists of four main components: there's thought decomposition, basically breaking the problem into smaller, manageable steps; there's thought generation, so generating a bunch of options to choose from; state evaluation, evaluating those options; and then the actual search algorithm, systematic exploration of the thought tree using algorithms such as breadth-first search or depth-first search. And referencing that paper: in experiments with tasks such as Game of 24, creative writing, and mini crossword puzzles, tree of thoughts showed significant improvements over conventional methods. And again, this is simply prompt engineering, so imagine if these techniques are built natively into the model; that's when
it becomes really powerful. "So Q* is probably a combination of Q-learning and A* search. Q* combines elements of Q-learning and A* search, which leads to an improvement in goal-oriented thinking and solution finding. This algorithm shows impressive capabilities in solving complex mathematical problems without prior training data and symbolizes an evolution towards artificial general intelligence. It is a fusion of Q-learning and A* search." There is also an element of self-play, which we've already talked about; that is how AlphaGo works: "Self-play is the idea that an agent can improve its gameplay by playing against slightly different versions of itself, because it will progressively encounter more challenging situations. In the space of LLMs, it is almost certain that the largest portion of self-play will look like AI feedback rather than competitive processes." Then we have another term, look-ahead planning: "the idea of using a model of the world to reason into the future and produce better actions or outputs." And we have a nice summary at the bottom: the Q* algorithm combines Q-learning and A* search for improved goal-oriented thinking; process supervision provides step-by-step feedback to improve model performance; the STaR approach generates rationales with each prediction; and overcoming hallucinations, basically through process supervision and STaR, makes decision-making more accurate and reliable, and what we get from all of this is a more robust and reliable AI system. A couple of last things: "OpenAI executives told employees that the company believes it is currently on the first level, according to the spokesperson, but on the cusp of reaching the second, which it calls Reasoners." So that's what we have for today: a breakdown of what Q*, Strawberry, A* search, what all of these things could be, and it's super fascinating to me. Hopefully you enjoyed this. If you enjoyed this video, please consider giving a like and subscribe, and I'll see you in the next one.