So you recently posted a paper, STaR: Bootstrapping Reasoning With Reasoning. Can you explain chain of thought and that whole direction of work? How useful is it?

Chain of thought is a very simple idea: instead of just training on prompt and completion, what if you could force the model to go through a reasoning step, where it comes up with an explanation and then arrives at the answer, almost like the intermediate steps before the final answer? By forcing models to go through that reasoning pathway, you're ensuring that they don't overfit on extraneous patterns and can answer new questions they haven't seen before. The high-level fact is that they seem to perform way better at NLP tasks if you force them into that kind of chain of thought, with "let's think step by step" or something like that.

It's weird, isn't it?

It's not that weird. Such tricks really help a small model compared to a larger model, which might be better instruction-tuned and have more common sense.
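As a concrete illustration of the trick being described, zero-shot chain-of-thought prompting is nothing more than appending a trigger phrase to the question. A minimal sketch (the function names are hypothetical, and no model call is shown):

```python
def direct_prompt(question: str) -> str:
    """Plain prompt: the model is asked for the answer directly."""
    return f"Q: {question}\nA:"

def cot_prompt(question: str) -> str:
    """Append the zero-shot chain-of-thought trigger phrase so the
    model emits intermediate reasoning before its final answer."""
    return f"Q: {question}\nA: Let's think step by step."

# The only difference between the two prompts is the trigger phrase;
# empirically, that alone elicits much better accuracy on reasoning tasks.
print(cot_prompt("If I have 3 apples and buy 2 bags of 4 apples each, how many apples do I have?"))
```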
so these tricks matter less for the let's say gbd4 compared to 3.5 uh but but the key Insight is that there's always going to be proms or tasks that your current model is not going to be good at mhm and how do you make it good at that uh by bootstrapping its own reasoning abilities mhm um it's not that these models are unintelligent but it's almost that we humans are only able to extract their intelligence by talking to them in natural language but there's a lot of intelligence they've compressed in their parameters which is like
trillions of them but the only way we get to like extract it is through like exploring them in natural language and it's one way to to uh accelerate that is by feeding its own Chain of Thought rationals to itself correct so the idea for the star paper is that you take a prompt uh you take an output you have a data set like this you come up with explanations for each of those outputs and you train the model on that now there are some imprs where it's not going to get it right now instead of
just training on the right answer, you ask it to produce an explanation: if you had been given the right answer, what is the explanation you would have provided? And you train on that. For whatever it got right, you just train on the whole string of prompt, explanation, and output. This way, even if it didn't arrive at the right answer, given the hint of the right answer it's trying to reason out what would have produced that answer, and then you train on that. Mathematically, you can show this is related to a variational lower bound, with the rationale as a latent, and I think it's a very interesting way to use natural language explanations as a latent. That way you can refine the model itself to be the reasoner for itself. And you can think of it as constantly collecting a new dataset of things you're currently bad at, trying to arrive at explanations that make you good at them, training on those, then seeking out harder data points and training on those too. If this can be done in a way where you
can track a metric, you can start at, say, 30% on some math benchmark and get to something like 75 or 80%. So I think it's going to be pretty important. And the way it transcends just being good at math or coding is if getting better at math or coding translates to greater reasoning abilities on a wider array of tasks outside them too, and could enable us to build agents using those kinds of models. That's when I think it's going to get pretty interesting. It's not clear yet; nobody's empirically shown that this carries over to the space of agents. But it's a good bet that if you have a model that's pretty good at math and reasoning, it can probably handle all the corner cases when you're trying to prototype agents on top of it.
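The loop described in this exchange — sample a rationale, keep it if the answer checks out, otherwise rationalize with the gold answer as a hint, then fine-tune and repeat on harder data — can be sketched as follows. This is a control-flow sketch only: `generate` and `finetune` are hypothetical stand-ins for the model's sampling and training steps, not a real API.

```python
def star_iteration(generate, finetune, dataset):
    """One iteration of the STaR outer loop (sketch).

    generate(prompt, hint=None) -> (rationale, answer): stand-in for
    sampling a chain of thought from the current model; passing the
    gold answer as `hint` is the "rationalization" step.
    finetune(examples) -> model: stand-in for fine-tuning the base
    model on (prompt, rationale, answer) examples.
    """
    train_examples = []
    for prompt, gold in dataset:
        rationale, answer = generate(prompt)
        if answer != gold:
            # Rationalization: regenerate with the gold answer as a
            # hint, then train as if the model had reasoned its way there.
            rationale, answer = generate(prompt, hint=gold)
        if answer == gold:
            train_examples.append((prompt, rationale, gold))
    return finetune(train_examples)

# Toy stand-ins, just to show the control flow:
def toy_generate(prompt, hint=None):
    if hint is not None:
        return (f"the answer must be {hint} because ...", hint)
    return ("a first attempt ...", "wrong")

model = star_iteration(toy_generate, lambda ex: ex, [("2+2=?", "4"), ("3*3=?", "9")])
```

Running this iteration repeatedly, while tracking a held-out metric, is the outer loop the speaker describes for climbing from roughly 30% toward the 75-80% range on a benchmark.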