Currently, AI systems are in many ways very stupid. We are fooled into thinking they are smart because they can manipulate language very well.

Professor Yann LeCun is a vice president at Meta, where he oversees the development of one of the world's most powerful AI systems.

So, one of the things that my colleagues and I have been working on is designing a new type of AI system that would be capable of understanding the physical world, have persistent memory, be able to reason and plan. The systems will have emotions: fear, or excitement, or elation.

He received the Turing Award, computer science's highest honor, and the Queen Elizabeth Prize for Engineering.

Musk said that Tesla would reach level-five autonomy within the next five years. He has been saying this for the last eight years. He has said "this is going to happen next year" for the last eight years, and obviously it hasn't. You clearly have to stop believing him on this, because he's been consistently wrong. Either he thought he was right and turned out to be wrong, or he was just lying.

His work has earned nearly 400,000 citations, and his 2015 deep learning paper with Nobel laureate Geoffrey Hinton is among the most frequently cited in scientific history.

Are you surprised when you look at AI development today, the progress day after day, night after night?

Not really, no.

The host of the interview is Dr. Maciej Kawecki, a science popularizer and a former digital ambassador of the European Union.

Professor, it's a great honor having you here. My first question to you is about your research. You've been cited half a million times on Google Scholar. What made your deep learning work with Geoffrey Hinton
such a game-changer?

So, you're probably referring to a paper that Jeff Hinton, Yoshua Bengio and I published in Nature in 2015. This was not new work; it was basically a bit of a manifesto, if you want, or a review paper, to tell the wider community of science and researchers: there is this new set of techniques that works really well, here is a list of things where it works well, and here is where the future is going. So it sort of marked the public beginning, if you want the popularization, of deep learning. But there was no new result in that paper, really. The new results, and most of the other citations, go back to the work that I did in the 1980s and '90s.

Do you remember the moment when that popularity was beginning, the moment in history when you saw that, Jesus Christ, it's one of the most cited research works in history?

There were two waves, really; it happened twice. The first one was in the late '80s,
when we started to have really good results using multi-layer neural networks, what we now call deep learning, for tasks like image recognition. At the time we could not recognize complex images; it was more like simple images, like handwritten characters and things like this. But this was working really well, and I was really excited at the time when we started getting those results, because I thought this might completely change the way we do pattern recognition, and eventually computer vision, and perhaps AI more generally.

So there was a wave of excitement between the late '80s and mid '90s, and then the interest kind of disappeared in the mid '90s, because the techniques we had developed required a lot of data for training, and we could only get good data, this was before the internet, for a few applications: things like handwriting recognition, character recognition and speech recognition, but that was about it. And it required computers that were at the time really expensive, so it was a big investment. So interest in this kind of disappeared in the mid '90s. Then interest went up again slowly in the late 2000s, and it totally exploded around 2013. 2013 is really the key year, when the research world realized that deep learning really worked well and could be applicable to a lot of different things. It has been growing really quickly since then, and 2015 was another wave.

We push AI to match human capabilities. Today,
will AI pick up human flaws as well: anger, something like that? Do you believe that could happen?

No. So, currently AI systems are in many ways very stupid. We are fooled into thinking they are smart because they can manipulate language very well, but they don't understand the physical world, they don't really have any persistent memory of the type that we have, they can't really reason and they can't plan. And those are essential characteristics of intelligent behavior. So one of the things that my colleagues and I have been working on, at FAIR and at NYU actually, is designing a new type of AI system, still based on deep learning, that would be capable of understanding the physical world, have persistent memory, be able to reason and plan. And in my opinion, once we succeed in building those systems around this blueprint, those systems will have emotions. They'll have emotions like maybe fear or excitement or elation, because those are anticipations of outcomes. Those systems will basically work by having a goal that we set them to fulfill. We will give them goals to accomplish, and then they will try to figure out: what kind of actions can I take so that I fulfill that goal? If they can predict in advance that the goal will be fulfilled, it will kind of make them happy, if you want; and if they predict that they can't, it will not make them happy. So to some extent they will have emotions, because they'll be able to anticipate the outcome of sequences of actions they might take. But we will not hardwire into them anything like anger or jealousy or anything like that.

Or consciousness?

Or consciousness. But consciousness is something else; we don't really know what it is. There's no real definition of it, no measurable thing that can tell us whether something is conscious or not. Even if we observe animals: we would probably all agree that apes and monkeys are conscious, and maybe elephants, and maybe animals of that type.

That's what Penrose said in our interview. So you fully agree with him?

Probably, yeah. But is a dog conscious? Is a rat conscious? Where is the barrier? Because we don't have a good definition for it, we really can't tell.

About a year ago you said "machine learning sucks." Has something changed?

That's what we're working on. We're working towards new ways of building machine learning systems, so that they can learn as efficiently as humans and animals,
because currently that's not the case. I can tell a little bit of the history of how machine learning has progressed over the last couple of decades. There are really three paradigms of machine learning.

One is called supervised learning, which is the most classical one. The way you train a supervised learning system, let's say a system that is meant to recognize images, is that you show it a picture, let's say of a table, and you tell it "this is a table." It's supervised because you tell it what the correct answer is. The system computes its output, and if it says something other than "table," it adjusts its parameters, its internal structure, so that the output it produces gets closer to the output you want. If you keep doing this with lots of examples of tables and chairs and cars and cats and dogs, eventually the system will find a way to recognize every image you trained it on, but also images it has never seen that are similar to the ones you trained it on. This is called generalization ability.

There's another paradigm, which people thought was closer to the way animals and humans learn, called reinforcement learning. In reinforcement learning you don't tell the system what the correct answer is; you only tell it whether the answer it produced was good or bad. And to some extent that can explain some types of human and animal learning. You try to ride a bike, you don't know how to ride it, and after a while you fall, so you know you did something bad,
so you change your strategy a little bit, and eventually you learn how to ride the bike. Now, it turns out reinforcement learning is extremely inefficient. It works really well if you want to train a system to play chess, or Go, or poker, because you can have the system play millions and millions of games against itself and basically fine-tune itself. But it doesn't really work in the real world. If you want to train a car to drive itself, you're not going to do it with reinforcement learning; it's going to crash thousands of times. If you want to train a robot to grab things, reinforcement learning can be part of the solution, but it's not sufficient.

So there is a third form of learning called self-supervised learning, and this is what has enabled the recent progress in natural language understanding and chatbots. In self-supervised learning you don't train the system to accomplish any particular task; you just train it to capture the structure of its input. The way this is used for text, for language, is that you take a piece of text, you corrupt it in some way, for example by removing some words, and then you train a big neural net to predict the words that are missing. A special case of this is that you take a piece of text where the last word is not visible, and you train the system to predict that last word. This is the way large language models are trained; every chatbot is trained this way. Technically it's a little different, but that's the basic principle. So that's called self-supervised learning: you don't train the system for a task, you just train it to learn the internal dependencies of the input. And the success of this has been astonishing. It works amazingly well. You end up with systems that seem to really understand language, and that can answer questions if you fine-tune them to answer properly, using supervised learning or reinforcement learning. So this is what everybody has been working on in the industry.
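The "predict the missing word" training signal described above can be sketched in a few lines. This is an illustrative toy only: a bigram frequency table stands in for the big neural net, and the tiny corpus is made up; it is not how production LLMs are implemented, but it shows how raw text alone provides the training examples.

```python
from collections import Counter, defaultdict

# A made-up miniature corpus; in self-supervised learning the raw text
# itself supplies the labels, so no human annotation is needed.
corpus = (
    "the cat sat on the mat . "
    "the dog sat on the rug . "
    "the cat chased the dog ."
).split()

# Every adjacent pair (word, next_word) is a free training example.
bigrams = defaultdict(Counter)
for w, nxt in zip(corpus, corpus[1:]):
    bigrams[w][nxt] += 1

def predict_next(word):
    """Return the most frequent follower of `word`, a crude stand-in
    for the probability-over-dictionary a real language model outputs."""
    counts = bigrams[word]
    return counts.most_common(1)[0][0] if counts else None

# "the dog sat ___" -> condition on the last visible word
print(predict_next("sat"))      # -> "on"
print(predict_next("chased"))   # -> "the"
```

A real model replaces the count table with a neural network that scores every word in the vocabulary, which is exactly the "probability for every word in the dictionary" idea mentioned later in the conversation.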
But that model does not work if you want a system to understand the physical world. Something is missing?

Yes. It's just that the physical world is much more difficult to understand than language. We think of language as the apex of intelligence, because only humans can manipulate language, but it turns out language is simple. And it's simple because it's discrete: it's a sequence of discrete symbols, and there's only a finite number of possible words in a dictionary. So you can never train a system to exactly predict what word is going to come next, but you can train it to produce something like a score for every word in the dictionary, or a probability for every word in the dictionary to appear at that location, and you can handle the uncertainty in the prediction that way. But you cannot train a system to predict what's going to happen in a video. People have tried to do this, I've tried to do this for 20 years, and a lot of people have had this idea that if you could train a system to predict what's going to happen in a video, then that system would implicitly understand the underlying structure of the world: intuitive physics, everything that any animal, and any of us as babies, learns.

Physical reality?

Yeah, physical intuition. You know that if I take an object and let it go, it's going to fall. You've learned that gravity basically attracts objects toward the ground. Human babies learn this by the age of nine months, roughly; it takes about nine months to learn.

Maybe the limitation of AI development today
is our knowledge about reality? We cannot replicate more than we know. We have no idea how gravity was born; we have no idea how the quantum world is transformed into the classical one.

Yeah, but it's a simpler problem than that, because your cat or your dog can learn about gravity in just a few months, right? And cats are really good at this: they can plan complex actions, climb on all kinds of stuff, jump. So obviously they have a very good understanding of what we call intuitive physics, and we don't know how to reproduce this with computers yet. The reason is, it's another example of what AI researchers have called Moravec's paradox. Hans Moravec was a roboticist, and he made the point: how come we can have computers play chess and solve mathematical puzzles and things like this, but we can't get them to do physical things, like manipulate objects or jump, that animals can do? So it's another example of this paradox: the space of discrete objects and symbols is easily manipulated by computers, but the real world is just too complicated, and the techniques that work in one case don't work in the other.

A good way to visualize this is that the amount of information that gets to us through our senses, say vision or touch, is absolutely enormous compared to the amount of information we can get through language. And this may explain why we have LLMs, chatbots that can pass the bar exam, solve mathematical problems, or write essays that sound good, yet we still don't have domestic robots. We still don't have robots that can accomplish tasks a cat or a dog can accomplish. We still don't have completely autonomous, level-five self-driving cars, and we certainly don't have self-driving cars that can train themselves to drive in about 20 hours of practice, like any 17-year-old. So clearly we're missing something big, and what we're missing is how to train a system to understand complex sensory input, like vision. And this is necessary...

...if we want machines to learn as efficiently as humans and animals?

Yeah. If you want machines that have intelligence similar to that of animals and humans, that have common sense, perhaps at some point have consciousness and everything, that are capable of dealing with the really complex structure of the complex world, we need to crack that problem. That's what we've been working on. Let me give
you a very simple calculation. A typical large language model is trained on something on the order of 20 trillion tokens: 20,000 billion tokens. A token is like a word, more or less, and a token is typically represented on three bytes. So 20 or 30 trillion tokens, each on three bytes: that's about 10 to the 14 bytes, a one with 14 zeros behind it. This is the totality of all the text available publicly on the internet. It would take any of us several hundred thousand years to read through that material. So it's an enormous amount of information.

But then you compare this with the amount of information that gets to our brain through the visual system in the first four years of life, and it's about the same amount. In four years, a young child has been awake a total of about 16,000 hours, and the amount of information getting to the brain through the optic nerve is about 2 megabytes per second. Do the calculation, and that's about 10 to the 14 bytes. It's about the same: in four years, a young child has seen as much information, as much data, as the biggest LLMs. What that tells you is that we're never going to get to human-level AI by just training on text. We're going to have to get systems to understand the real world, and understanding the real world is really hard.

On your LinkedIn and Facebook you are linking AI and entropy. What's the link between them? It's very difficult to understand what you've written, so it would be great if you could explain it to us a little more simply.

OK, in simple words:
it's been a bit of an obsession of mine. There's a big question at the root of a lot of problems in computer science, in physics, in information theory, in a lot of different fields, which is the question of how you quantify information: how much information resides in a message? And the point I've made multiple times is that the amount of information in a message is not an absolute quantity, because it depends on the person interpreting the message. The amount of information you can extract from sensors, from a message, from language that someone tells you, or whatever, depends on how you can interpret it. So the idea that you can measure information in absolute terms is probably false. Every measure of information is relative to a particular way of interpreting it. That's the point I was making, and it has very far-ranging consequences, because if there is no absolute way of measuring information, that means a lot of notions in physics don't really have objective definitions. Like entropy: entropy is a measure of our ignorance of the state of a physical system, and of course that depends on how much you know about the system. So I've been sort of obsessed with this idea of trying to find good ways of defining entropy, complexity or information content that are relative.

Don't you think the global database for training AI models is running out? We have digitized almost all of our data today; in 2000 it was only 25%.

No, we're not even close. There's a huge amount of textual
knowledge that has not been digitized. In a lot of the developed world, a lot of it has been digitized, but most of it is not public. There's a lot of medical data, for example, that is not public. And then there is a lot of cultural data, historical data, in many regions of the world that is not accessible in digital form, or if it is in digital form, it's in the form of scanned documents, not text. So it's not true; I think there's still a lot of data out there.

And there are questions about the nature of reality. For example, we have no idea how matter is transformed into consciousness in the human brain, so we have no data about it. But maybe in the future we will.

Well, I don't think we should be obsessed with the question of consciousness.

The world is obsessed.

Some parts of the world are obsessed by it. Frankly, I think it's a
bit of an epiphenomenon, and I think the reason why we can't find a good definition of consciousness is that we're not asking the right question. Let me give you an example. In the 18th century, the 17th century actually, people discovered that the image on the retina, you know, light comes in through the lens, and the image that forms on the retina is upside down. And people at the time were completely puzzled: how is it that we see the world right side up, even though the image is formed upside down on the retina? That was a puzzle for them. And now we realize the question makes no sense: given the way your brain interprets images, it's irrelevant in what direction the image forms on your retina. I think consciousness is a bit like this. It's something that we can't define, that we think exists, but we can't put our finger on it.

But it would make us individuals. So maybe that's what's different?
No, obviously there are a lot of things that make us all different from each other. We have different experiences, so we learn different things; we grow up in different environments; but also our brains are wired slightly differently. All of us are slightly different, and that's a necessity for evolution, to make sure that every individual human is different, because we are a social animal. There is a big advantage when different people in the same tribe are slightly different, because that means they can combine their expertise. If every one of us were identical, there would not be strength in numbers; but because we're different, we're stronger, because we're diverse. So that's a result of evolution, and it can be done by slightly different wiring of the brain, slightly different tuning of neurotransmitters and hormones, whatever it is that makes us different.

What about free reasoning, abstraction, thinking models such as o1? And can we expect something like this from your laboratory?

So,
the question of elaborating abstract representations from observation is key to deep learning. Deep learning is all about learning representations. In fact, one of the main conferences on deep learning is called the International Conference on Learning Representations, which I co-created with Yoshua Bengio. That tells you how central this question of learning abstract representations is to AI generally and to deep learning in particular.

Now, if you want the system to be able to reason, you need another set of characteristics. The act of reasoning or planning, classically in AI, not just in machine-learning-based AI but since the 1950s, consists in having a way of searching for a solution to a problem. For example, if I give you a list of cities and ask you, "give me the shortest circuit that goes through all those cities," you're going to think about it and say, well, I should go between cities that are near each other, so that my total circuit is as short as possible. Now, there is a space of all possible circuits, which is the set of all permutations of the cities, all the orders in which you can go through them. It's an enormous space, and the way algorithms, in your GPS and things like this, search for paths is that they search among all possible paths for one that is the shortest. All reasoning systems are based on this idea of search: in a space of possible solutions, you search for one that matches the objective you want. So, the way current systems
are doing this, current LLMs like o1, like R1, a bunch of those things, is very primitive. They do it in what's called token space, which is a space of outputs. They basically have the system generate lots of different sequences of tokens, more or less randomly, and then they have another neural net look through all of those hypothesized sequences for the one that looks best, and output that. It's extremely expensive, because it requires generating lots and lots of outputs and then selecting the good ones, and it's not the way we think. We don't think by generating lots and lots of actions, looking at the results, and figuring out which one is best.

If I ask you, for example: imagine a cube floating in the air just in front of you. Now take that cube and rotate it by 90 degrees around a vertical axis. Now picture that cube and tell me if it looks like the original cube before you rotated it. The answer is yes, because you know that a cube rotated by 90 degrees is still a cube, and you're still seeing it from the same viewpoint.

You mean that is an illusion of free reasoning?

Well, what you're doing is reasoning in your mental state. You're not reasoning in your output space, your action space in the physical world, or whatever your output state is. You're reasoning in an abstract space.
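The city-circuit problem mentioned a moment ago is the classic traveling-salesman problem, and the "search a space of possible solutions" idea can be sketched directly. This brute-force version (the city names and coordinates are made up for illustration) enumerates every permutation, which is exactly the enormous space described above; practical planners obviously need something smarter than exhaustive search.

```python
from itertools import permutations

# Hypothetical cities laid out on a plane as (x, y) points.
cities = {"A": (0, 0), "B": (3, 0), "C": (3, 4), "D": (0, 4)}

def tour_length(order):
    """Total length of the closed circuit visiting cities in `order`."""
    total = 0.0
    for here, there in zip(order, order[1:] + order[:1]):
        (x1, y1), (x2, y2) = cities[here], cities[there]
        total += ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5
    return total

# Search the entire solution space: every permutation of the cities.
best = min(permutations(cities), key=tour_length)
print(best, tour_length(best))  # the rectangle's perimeter: 14.0
```

With n cities there are n! permutations, so the space explodes almost immediately; this is why all practical reasoning and planning systems search the space cleverly instead of exhaustively.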
And so we have these mental models of the world that allow us to predict what's going to happen in the world, to manipulate reality, to predict in advance what the consequences of our actions are going to be. And if we can predict the consequences of our actions, like rotating a cube by 90 degrees or whatever it is, then we can plan a sequence of actions so as to arrive at a particular goal. Whenever we accomplish a task consciously, all of our mind is focused on it, and we think about what sequence of actions we have to take to, say, assemble this piece of IKEA furniture, or build this thing out of wood, basically everything we do every day that we use our mind for. For tasks of this type we need to plan, and most of the time we plan hierarchically.

For example, you're going to go back to Warsaw at some point, right? If you decide right now to go back to Warsaw from New York, you know that you have to go to the airport and catch a plane. Now you have a sub-goal: getting to the airport. This is what hierarchical planning is about: you define sub-goals on the way to your ultimate goal. Your ultimate goal is "go to Warsaw"; your sub-goal is "go to the airport." How do you get to the airport? We're in New York, so you go down to the street and hail a taxi. How do you get down to the street? You have to get out of this building: go to the elevator, take the elevator down, walk out. How do you get to the elevator? You have to stand up, go to the door, open the door, and so on. And at some point you get down to a goal that is sufficiently simple that you don't need to plan, like standing up from your chair. You're so used to doing it that you can just do it, and you have all the information necessary for it. So this idea that intelligent systems will need to do hierarchical planning is crucial, and we have no idea how to do this with machines today. That's a big challenge for the next few years.

That's why you spend so much time at Davos talking about robotics. You spoke about the coming decade of robotics, but robotics has had endless winters. Why is this time different? Is it cheap sensors, better
simulators, or what?

Well, robots are used a lot today, but in tasks that are relatively simple and can be automated in a very simple way, where the sensing doesn't need to be hard. So you have manufacturing robots that paint cars in factories, assemble parts and things like that, as long as everything is in the right place; those robots are basically just automata. But then take another task, like driving. A self-driving car is a robot, or a car that has driving assistance is also a robot, and we still don't have self-driving cars that are as reliable as humans. I mean, we do, there's Waymo and companies like that, but they cheat a little bit: they use sensors that are much more sophisticated than human sensing.

Musk said that Tesla would reach level-five autonomy within the next five years.

He has been saying this for the last eight years. He has said "this is going to happen next year" for the last eight years, and obviously it hasn't. So you clearly have to stop believing him on this, because he's been consistently wrong, either because he thought he was right and turned out to be wrong, or he was just lying. I think it's a way for him to inspire his team, to reach for an unattainable goal year after year. But I think it's actually very difficult for an engineer or scientist to be told by their CEO: the problem
you've been devoting your entire career to solving, we're going to solve it next year.

So do you think the biggest challenge of our era is to integrate AI with robotics and sensors?

If we are able to build AI systems that understand the physical world, that have persistent memory, that can reason and plan, then we'll have the basis for AI that can power robots that are much more flexible than the robots we have today. A lot of robotics companies have been formed over the last year or two; they build humanoid robots and things like this, and all the demos are really impressive, but those robots are very stupid. They cannot do what a human can do, not because they lack the physical ability, but because they're just not smart enough to deal with the real world. So a lot of those companies are counting on AI making fast progress over the next three to five years, so that when they are ready to sell those robots at large scale, and build them at large scale, the robots will be smart enough, because AI will have made progress. It's a big bet. I can't tell you whether it's going to happen within the next three or five years, but it's very likely that we're going to make significant progress in AI that will enable more flexible robots within the next decade, which is why I've said the next decade is the decade of robotics.

Are you surprised when you look at AI development today, the progress day after day, night after night?

Not
really, no. What surprised me was that it was highly discontinuous: there was a lot of progress in the 1980s and '90s, and then nothing; then some more progress during the 2000s, but it was under the radar, most people didn't realize we were making progress. And then, as soon as that progress became visible, around 2013 or so, the whole field exploded. All of a sudden a lot of smart people started working on it, a lot of companies started investing, and there was a lot more interest. So now progress has been accelerating, just because there is more investment and more smart people working on it. But I would have thought the progress since the 1980s would have been much more continuous.

Today the whole world is talking about the new Chinese model, DeepSeek: open source, much cheaper than the American ones. Don't you think the horses have left the barn? What do you make of it?

OK, there's something that needs to be explained extremely clearly. If a
piece of research or development is published, meaning the techniques used to produce it are published in a paper, or a white paper, or a technical report of some kind, and if the code is open source, then the entire world profits from it, not just whoever produced it. The person or group that produced it gets prestige, recognition, and perhaps investment or whatever, but the entire world profits from it. This is the magic of open research and open-source software. Meta, I mean I myself, and Meta more generally, have been extremely strong proponents of this idea of open research and open source, and whenever an entity that practices open research and open source produces something, the entire open-source community profits from it as well. So people are formulating this as if it were a competition, but it's not; it's more like a cooperation. The question is: do we want this cooperation to be worldwide? And my answer is yes, because there are good ideas coming from everywhere in the world.

Llama, for example, the first LLM that Meta put out, well, it wasn't the first LLM we put out, but the earlier ones were a little bit under the radar, was produced in Paris, in our lab, FAIR Paris, which I created 10 years ago. FAIR Paris has over 100 researchers, and a lot of really good stuff came out of that lab in Paris. A lot of good stuff came
out from our laby in Montreal um so the research Community is really worldwide everybody contributes no entity has a monopoly on good ideas which is why open collaboration makes the field progress faster that's why we Are big proponents of open research and open source is because the entire field progresses faster when you communicate with other scientists now there are some people in the industry who used to kind of practice open research and clammed up that's the case for open Ai and thropic was never open so they they they keep everything secret Google kind of went
from being partially open to being open Because of us tuna being partially closed uh they're not revealing all the techniques behind jini for example they're still doing a lot of open research but it's more kind of fundamental long term um so um I think it's sad because a lot of people are kind of basically putting themselves outside of the of the world research community and not participating not contributing toh to progress the reason why progress in AI has been so fast in the last 10 years is because of open research and you have to realize
that. Everybody believes it? Oh, absolutely. No, this is a fact, I'm not the only one; it's not a belief, it's a fact. Let me give you an example. Practically the entire AI industry builds, or at least at the research and development stage uses, a piece of software called PyTorch. PyTorch is open source. It was produced initially by my colleagues at Meta, at FAIR, and then by a bigger population of contributors. A few years ago the ownership of PyTorch was transferred to the Linux Foundation, so Meta does not own it anymore. It's still the main contributor, but it doesn't control it; it's controlled by a community of developers, essentially. The entire industry uses it: that includes OpenAI, it includes Anthropic; Google has their own thing, but it includes Microsoft, it includes Nvidia, it includes everybody. Everybody uses PyTorch, and the entire academic research world uses PyTorch. I think among all the papers that appear in the scientific literature, PyTorch is mentioned in something like 70% of them. So what that tells you is that progress in AI builds
on each other's work, and that's how you make science and technology progress. If not DeepSeek, then maybe the American Stargate project will change everything? No, no. Don't you agree that it's the biggest project in the history of humanity? Okay, so let me say one more thing about DeepSeek. It's good work. The people working on this have had really good ideas; they did some really good work. This is not the first time that very good, innovative work comes out of China; we've known this for a long time, particularly in areas like computer vision. The contribution from China in large language models is more recent, but in computer vision there is a long tradition. You look at the top computer vision conferences: half the attendance is Chinese. They're very good scientists, very smart people. So neither the US nor Europe nor any region of the world has a monopoly on good ideas. The ideas from DeepSeek will soon be reproduced, probably within weeks, and perhaps integrated
into future versions of what comes out of entities in the US, in Europe, in the Middle East, wherever. Now it's part of the world's knowledge. That's the beauty of open source and open research: it's a competition at the level of products, but at the level of basic methods it's not, it's a cooperation. Okay, now let's talk about Stargate. All the companies that are involved in AI are seeing a future, a pretty near future, where billions of people will want to use AI assistants on a daily basis. I'm wearing a pair of glasses now, I don't know if you can see it, but it's got cameras on it. These are the Ray-Ban Meta glasses; they're built by Meta, and you can talk to them. There is an assistant they are connected to, and you can ask it any question; you can even ask it to recognize a species of plant from the camera. So we see a future where people will be wearing smart glasses, or maybe using their smartphone or their smart devices, and basically using AI assistants all the time in their daily lives. Now, that means there are going to be billions of users of those AI assistants, using them multiple times a day, and for this you need a very big infrastructure of compute, because running an LLM, or an AI system, whatever it is, is not cheap. So you need a lot of compute power, and most of that investment... Meta is investing
this year on the order of 60 to 65 billion dollars in infrastructure, mostly for AI. Microsoft has announced they're investing 80 billion, and Stargate is 500 billion, but that's over five or ten years, and we don't know where the money is coming from. So it's on the same order of magnitude of investment; it's really not that different from what Microsoft and Meta are already doing. And most of it is for inference: it's for running AI assistants to serve billions of people. It's not for training large models; that is actually relatively cheap. So I think the reaction of the financial markets that we've seen in the last few days to the appearance of DeepSeek, saying oh, now we can train systems cheaper, so we don't need all those computers anymore, is just false. So training may come back to normality? Well, I mean, training will just become a little more efficient, but the result is that we are just going to train bigger models, and in the end most of the infrastructure and most of the investment goes into actually running the models, not training them. That's what the investment is in.

I have a question from one of our viewers. You propose an alternative to the Transformer architecture, which is the most important piece of LLMs. How does the JEPA world model differ from Transformers, and why do you think world models are the future? You mentioned a little bit about it, but mostly focused on GPTs.

Okay, so there is this architecture, which really should be called a macro-architecture, called JEPA. That means Joint Embedding Predictive Architecture, and it is not an alternative
to Transformers; you can have Transformers inside of JEPAs. JEPA is kind of a macro-architecture within which you arrange different modules, and those modules could be Transformers, or they could be other things if you want. So those are orthogonal concepts; they're not in opposition. What JEPA is an alternative to is something that doesn't have a common name, but basically the current crop of large language models. In the business they are called auto-regressive decoder-only architectures, or Transformers, or, as OpenAI calls them, GPTs: Generative Pre-trained Transformers. So a GPT is just a particular architecture, and by the way it doesn't need to be a Transformer, that is trained using this self-supervised learning technique I was describing earlier, where you take a sequence of symbols, let's say text, a sequence of words, and you train a system that is organized in such a way that to predict a particular word it can only look at the ones that are to the left of it. It's called a causal architecture. And if you feed it a text and you just train it to reproduce that text on its output, then implicitly you train it to predict the next word in the text. So then, once it's trained, you can use that system to just produce one word after the other, auto-regressively, and that's what large language models are.
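The train-then-generate loop he describes can be sketched with a toy model. This is only bigram counting, not a Transformer, and the corpus and function names are invented for illustration; but the shape is the same: learn to predict the next word from the words to the left, then produce one word after the other.

```python
from collections import Counter, defaultdict

# Toy corpus, invented for illustration.
corpus = "the cat sat on the mat the cat ate the fish".split()

# "Training": for each word, count which word follows it. This plays the
# role of learning to predict the next word from the context to its left
# (here the context is just the single previous word).
follow = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follow[prev][nxt] += 1

def generate(start, n_words):
    """Produce one word after the other, auto-regressively: each new
    word is the most likely successor of the previous one."""
    out = [start]
    for _ in range(n_words):
        nxt_counts = follow.get(out[-1])
        if not nxt_counts:
            break
        out.append(nxt_counts.most_common(1)[0][0])
    return out

print(generate("the", 4))  # ['the', 'cat', 'sat', 'on', 'the']
```

A real LLM replaces the count table with a deep network and the single-word context with a long one, but generation works the same way: feed the output back in and predict again.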
Now try to apply this to the real world, because you want to train a robot to plan things, or to predict what's going to happen in the world: it doesn't work. If, instead of words, you take frames from a video, turn those frames into something like tokens, like the words, and you try to train a system to predict what's going to happen in the video, it doesn't work, it doesn't work very well. And the reason it doesn't work is that there are a lot of things that happen in the world that you simply cannot predict, and representing the fact that you cannot exactly predict what's going to happen is essentially a mathematically intractable problem in a high-dimensional space like video. It's possible in a discrete space like text: you cannot predict exactly what word comes after a text, but you can predict a probability distribution over all the possible words. We don't know how to do this with videos; we don't know how to represent a distribution over all possible video frames. And so the techniques that are used for text, which work really well for text and for DNA sequences and proteins, do not work for video or other natural signals.
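The discrete case he refers to is easy to make concrete: over a finite vocabulary, a softmax turns arbitrary model scores into a normalized probability distribution covering every possible next word. The vocabulary and scores below are made up for illustration.

```python
import math

# Hypothetical vocabulary and model scores (logits) for the next word;
# both are invented for illustration.
vocab = ["cat", "dog", "mat", "fish"]
logits = [2.0, 1.0, 0.5, 0.1]

# Softmax: exponentiate and normalize, giving a probability for every
# possible next token. Uncertainty is fully represented by this table.
exps = [math.exp(z) for z in logits]
total = sum(exps)
probs = [e / total for e in exps]

assert abs(sum(probs) - 1.0) < 1e-9  # a complete distribution

# There is no analogous finite table for video: the set of possible next
# frames is a continuous, astronomically high-dimensional space, so we
# cannot enumerate and normalize over it the same way.
print(dict(zip(vocab, (round(p, 3) for p in probs))))
```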
So JEPA is an answer to this. The main idea is that instead of making the prediction in the space of the inputs, you train the system to learn an abstract representation of the input, and then train it to make the prediction in that representation space. That turns out to be a much better way of formulating the problem. Because, you know, if I take a video of the room we are in right now, any room, and I point the camera at one location, then slowly turn the camera and stop, and I ask the system, tell me what happens next in the video, the system might predict that the camera is going to keep turning, but there's no way it can predict all the details of what's going to be in the field of view after the camera rotates. There is a plant; there might be a painting on the wall; there might be people sitting. It cannot predict what those people are going to look like, it cannot predict what the species of plant is, or what the texture of the floor is going to be, or things like that. It's just impossible to predict. So if you train a system to make those predictions, it spends a huge amount of resources trying to predict things it cannot predict, and it fails.
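A tiny numerical illustration of the point (the numbers and the "representation" are invented): when the future contains an unpredictable detail, the best input-space predictor is forced to average over it and still pays its full variance, while a representation that keeps only the predictable part can be predicted exactly.

```python
import random

random.seed(0)

# Each possible future "frame" = a predictable part plus an
# unpredictable detail (both made up for illustration).
def sample_future():
    predictable = 1.0                    # e.g. "the camera keeps panning"
    detail = random.choice([-1.0, 1.0])  # e.g. what exactly comes into view
    return (predictable, predictable + detail)  # (representation, raw input)

futures = [sample_future() for _ in range(1000)]

# Input space: the best fixed prediction under squared error is the mean
# of the raw inputs, and it still pays the variance of the detail.
raw = [x for _, x in futures]
mean_raw = sum(raw) / len(raw)
input_space_error = sum((x - mean_raw) ** 2 for x in raw) / len(raw)

# Representation space: the representation dropped the unpredictable
# detail, so the future representation is predicted exactly.
reps = [r for r, _ in futures]
rep_space_error = sum((r - 1.0) ** 2 for r in reps) / len(reps)

print(input_space_error)  # close to 1.0: effort wasted on the unpredictable
print(rep_space_error)    # exactly 0.0
```

This is why, in input space, such models produce blurry averages of all possible futures, while predicting in an abstract representation space sidesteps the unpredictable details.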
The greatest achievement of the Yann LeCun laboratory is...? There is no LeCun laboratory. It's hard to put a finger on it. I mean, what I'm known for is something called convolutional neural networks, a particular architecture inspired by the architecture of the visual cortex, designed to handle natural signals like images, video, audio, speech, things like this. And those systems are used everywhere. If you have any kind of driving-assistance system in your car, and all cars sold in the EU now have to have that, at least a system that brakes the car automatically when there is an obstacle in front of it... That comes from your laboratory? It's using convolutional nets, all of them. That's my invention from 1988; it goes back a long time. That's what I'm most famous for. The first applications were character recognition, handwriting recognition, reading zip codes, reading the amounts on checks, things like that; that was in the early 90s. And then, since roughly 2010, there's been a very quickly growing set of applications for this. When you talk to your phone, the first few layers of the neural net that does the speech recognition usually use convolutional nets. And when you have an application on your phone where you can take a picture of, I
know, a plant, and ask your app what the species of that plant or that insect is, or it listens to the song of a bird and tells you what species it is: that's a convnet. You are European. Where is Europe's place in that AI race between the US and China? So, I think Europe has a very important role to play, because Europe has... The most difficult thing: implementing regulations? Well, there are issues of that type in the EU, that's for sure. For example, the glasses I'm wearing right now: one application of this is interpreting the images that go through the camera, so you could look at a menu in Polish, or you could be speaking to me in Polish, and there would be a kind of translation feature. Actually, that's available today in those glasses, but while the glasses are available in Europe, the vision feature is not available, because of uncertainty about regulation. It's not even clear the regulation would make it illegal; it's just that it's unclear. But let me say that Europe has big assets, big advantages, and the first one is talent: our programmers, mathematicians, computer scientists, engineers more generally, physicists, etc. A lot of top scientists in AI, regardless of where they work in the world, come from Europe. I come from Europe; I've been in the US for a long time. You are a
European. You are still living in Paris? No, I live in New York, but I spend a lot of time in Paris. A final question I need to ask. I remember the Nobel press conference, when I asked Geoffrey Hinton the question: if you could turn back time, would you do this again? Is there something that you regret when you look at your research on AI development? I would like to ask you the same question. I don't know what Geoff answered to that question, but I can guess what he answered, probably. Okay, let me give you my answer first. My answer is: for the longest time I was not interested in what we now call self-supervised learning, because I thought it was badly formulated as a problem. In fact, I had those discussions with Geoff Hinton for many years, where I was pushing supervised learning, and he told me, ultimately we need to figure out how to do what he calls unsupervised learning, which is now a particular form of self-supervised learning. I only changed my mind about this in the mid-2000s, and that was probably 10 years too late; I should probably have gotten interested in that problem earlier. But the thing is, between the mid-90s and the early 2000s not much happened in neural nets and deep learning, because the whole world was completely uninterested in it. So we had to do something else. I worked on image compression, a system called DjVu, which I heard was pretty popular in Poland
actually, and in Eastern Europe more generally. So I think that's one thing I would have done differently. Other than that, I've been pretty happy with the way things have been going. I would also have been a little more forceful at keeping the community's interest in neural nets and machine learning in the late 90s than I was, so that there wouldn't have been a kind of winter of deep learning, if you want. I'm guessing perhaps one thing that Geoff might have answered is that he had a bit of a change of mind two years ago. The quest of his career was to figure out the learning algorithm of the cortex, of the brain. He always thought that back-propagation, which is the main technique that we use to train neural nets today, and which he had something to do with, and I had something to do with as well, was not what the brain used, and that the brain must be using something else, because back-propagation is not really biologically plausible. And so he kept coming up with new ways of doing machine learning every two years for the last forty years. And two years ago he just gave up. He said, well, maybe the brain doesn't use back-propagation, but back-propagation works really well, and maybe that's what we need; maybe it works even better than whatever it is the brain uses. So he had his epiphany, and then retired, basically; he could declare victory. And my last question to you: why are you
supporting Ataraxis, a Polish-American startup from New York University working on breast cancer prediction using AI? You are on the board, you are an adviser, right? So, first of all, medical applications of deep learning are extremely promising. There have already been deployments of deep learning methods for diagnosis, including for things like breast cancer from mammograms. And I have a young colleague who was a postdoc in our lab and is now a faculty member, a professor at the medical school in the radiology department, Krzysztof Geras, who is brilliant. Recently he said, there are too many opportunities, I'm going to co-found a startup with a few friends. So they came to me and said, would you like to be an adviser? I knew their scientific work was really good, so I thought this company was really promising, and I was really curious to see what they could do with it. Their broad spectrum of applications is basically diagnosis using deep learning, particularly for imaging, but more generally than that. In fact, they want to go directly from measurements to treatment, not just diagnosis. I found that really promising and fascinating; that's why.

Mr. Professor, thank you very much for your time. Having you here is a great honor. Thank you, my pleasure.