My goodness, AlphaGeometry, an amazing Google DeepMind paper fresh out of the oven. And I think this paper might be history in the making. So, how is an AI that learned from 100 million mathematical problems able to compete in the International Mathematical Olympiad?
That is perhaps the most prestigious math competition in the world. So, did they achieve a breakthrough? Well, let’s see together.
Dear Fellow Scholars, this is Two Minute Papers with Dr Károly Zsolnai-Fehér. Now, competing on such problems is trouble. This requires not just a huge glorified calculator for adding numbers, but it requires planning your solution, logic and reasoning.
And more. But, of course, GPT-4 can do all of that, so just give the problems to it and off you go. End of the video, right?
Just as a demonstration, GPT-4 can ace the bar exam, would be hired at Amazon for its coding skills, and it is better than almost all humans at the Biology Olympiad. So, it can sweep all of these, easy, right? Well, let’s see, if you try to use GPT-4 and give it 30 of these tasks, it can solve exactly…wow.
It can solve zero of them. I hope this gives you a feel of how difficult these problems are. Devilishly difficult.
And scientists at DeepMind just proposed a system that is about 100 times smaller than GPT-4? I can’t believe it. But to start, how do you even solve such a problem?
Let’s see an easy example. For instance, if we ask a human mathematician to prove that there are infinitely many primes, we could try to prove it by listing them. But no one can list infinitely many numbers, that is impossible.
No. So what do we do instead? How do we do the impossible?
Well, instead we start with the assumption that there is a finite number of primes, and then find a contradiction that says this assumption cannot possibly be true. This part of the proof requires thinking outside the box. It requires a moment of brilliance.
And without it, the problem is intractable. It is akin to pulling a rabbit out of your hat. And when you have the rabbit, mechanically finishing the problem is relatively easy.
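To make the rabbit for the primes example concrete, here is a minimal Python sketch of the classic contradiction step: take any list of primes that is claimed to be complete, multiply them together, add one, and look at a prime factor of the result. The function names and the sample list are made up for illustration, they do not come from the paper.

```python
from math import prod


def smallest_prime_factor(n: int) -> int:
    """Return the smallest prime factor of n (n >= 2) by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # no divisor up to sqrt(n), so n itself is prime


def prime_missing_from(primes: list[int]) -> int:
    """Given a list of primes claimed to be complete, return a prime not in it.

    The product plus one leaves remainder 1 when divided by every prime
    in the list, so none of them divides it. Any prime factor of that
    number is therefore a prime the list missed.
    """
    candidate = prod(primes) + 1
    return smallest_prime_factor(candidate)


# Hypothetical "complete" list of primes, chosen only for the demo.
claimed_complete = [2, 3, 5, 7, 11, 13]
new_prime = prime_missing_from(claimed_complete)
print(new_prime, new_prime in claimed_complete)
```

Whatever finite list you feed in, the function hands back a prime that is not on it, so no finite list can be complete. The clever part is the "multiply and add one" idea; once you have it, the rest is mechanical.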
And the strategy could be that if the rabbit didn’t work, try to pick a new one. We can do that. But, can an AI, can a machine pull a rabbit out of a hat?
And the initial answer is no. Mostly not. But let’s try anyway.
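The "if the rabbit didn't work, pick a new one" strategy can be sketched as a toy loop in Python. This is only an illustration under loud assumptions, not the system from the paper: the rules, the facts, and the fixed list of candidate ideas are all hypothetical, and the list simply stands in for whatever component proposes ideas in a real system. Each idea here is an extra construction, like adding a midpoint, that we are always allowed to make.

```python
def forward_chain(facts, rules):
    """Mechanical part: apply (premises -> conclusion) rules until nothing new appears."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                changed = True
    return known


def solve(facts, rules, goal, candidate_ideas):
    """Return (solved, idea_used). Try plain deduction first, then one idea at a time."""
    if goal in forward_chain(facts, rules):
        return True, None  # no rabbit needed
    for idea in candidate_ideas:
        if goal in forward_chain(set(facts) | {idea}, rules):
            return True, idea  # this rabbit worked
    return False, None  # every candidate failed


# Hypothetical toy problem: in a triangle with AB = AC, show angle B = angle C.
rules = [
    (frozenset({"AB = AC", "D is midpoint of BC"}), "triangles ABD and ACD are congruent"),
    (frozenset({"triangles ABD and ACD are congruent"}), "angle B = angle C"),
]
facts = {"AB = AC"}
ideas = ["E is midpoint of AB", "D is midpoint of BC"]  # stand-in for a proposer

print(solve(facts, rules, "angle B = angle C", ideas))
```

In this toy run, plain deduction gets stuck, the first idea leads nowhere, and the second one unlocks the proof. The loop itself is simple. The hard part, which this sketch hides behind a hard-coded list, is coming up with good candidates.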
This is how a human would do it, and here is their proposed AI that could hopefully try to do it. So, does it work? Well, let’s have a look.
When given a problem, with blue, it first creates the key ideas, the rabbit, and then, the green part is the remainder of the calculation that leads to the solution. Okay, but this was easy peasy. Now give me some proper problems.
Oh my, now we’re talking! So, little AI, can you solve this? Whoa, it pulls the blue rabbit out of the hat, and then, runs the green calculations until it solves it.
And make no mistake, this is just an excerpt of the solution. Now hold on to your papers Fellow Scholars, because the full solution looks more like this. My goodness, look at that.
More than a hundred steps have been concealed here. And it has done all of this correctly. And that is not even the longest proof it is capable of writing.
Not even close. Wow. So, how good is it?
Well, a previous technique was this good, and the new technique without the rabbit is this good. So it is almost as good as the average mathematical olympiad contestant. Note that these are really smart people, so the average of those is also really smart.
And it can compete with that. But, wait a second. This can run the mechanical calculations, but it only takes you so far.
You also need the brilliance to pull the rabbit out of the hat. And as soon as we add the rabbit part of the solution, what happens? What?
Are you seeing what I am seeing? It is nearly as good as the smartest of these super smart people. And, Fellow Scholars, if you think that is impressive, hold on to your papers because we are just getting started.
Here are 2 mind-blowing facts about the paper: One, it learned from scratch by itself, without any human demonstrations. Yes, the proposed system does all this without human intervention. This is essentially an AI implementation of the two modes of human thinking, and that is thinking fast and slow.
Thinking fast is about quick, instinctive responses, like reading something, while thinking slow involves deliberate, logical, and calculated decision-making. This can do both, about as well as some of the smartest humans can. So when we are worried that this cannot possibly get any better, because there are no humans good enough to teach it anymore, now we know that it only needs synthetic training data, so it can learn by itself.
And it has already found more general, more elegant solutions for some of the tasks than humans did. Two, this project is open source from day 1. Every piece of the solution is out there for you, for free.
Yes, you Fellow Scholars can run your own experiments with it. And all this with a model that is about 100 times smaller than GPT-4. An absolute slam dunk of a paper.
We are still early, but I think it might be fair to say that this is a breakthrough. Now, as incredible as this AI is, note that it is still relatively narrow. It can do geometry, but it cannot play StarCraft or do anything else.
However, the ideas and concepts described in the paper are general enough that they can be applied to other problem domains as well. And that, Fellow Scholars, is going to be a series of incredible breakthroughs. By the way, it is a possibility that I will visit San Francisco around mid-April.
For the first time ever. If this is the case, if you are a local lab like DeepMind or OpenAI and you would like me to visit, or if there is someone who I should really meet, please let me know on Twitter/X.