Hi, and welcome to The Information Bottleneck. I have to say this is a bit weird for me. I've known you for almost five years and we've worked closely together, but this is the first time that I'm interviewing you for a podcast, right? Usually our conversations are more like, "Yann, it doesn't work. What should I do?" [laughter] Okay. So even though I'm sure all of our audience knows you, I will say: Yann LeCun is a Turing Award winner, one of the godfathers of deep learning, the inventor of convolutional neural networks, founder of Meta's Fundamental AI Research lab and still their chief AI scientist, and a professor at NYU. So welcome.
>> Pleasure to be here.
>> Yeah, and it's a pleasure for me to be anywhere near you. I have been in this industry for a lot less time than either one of you, and doing research for a lot less time. So the fact that I'm able to publish papers somewhat regularly with Ravid has been an honor, and to be able to start hosting this podcast has been even more of one. So it's really a pleasure to sit down with you.
>> Awesome.
>> Yeah, so congratulations on the new startup, right? You recently announced that after 12 years at Meta you're starting a new startup, Advanced Machine Intelligence, that will focus on world models. So first of all, how does it feel to be on the other side, going from a big company to starting something from scratch?
>> Well, I co-founded companies before. I was involved more peripherally than in this new one,
but I know how this works. What's unique about this one is a new phenomenon where there is enough hope on the part of investors that AI will have a big impact that they are ready to invest a lot of money, essentially. That means you can now create a startup where the first couple of years are essentially focused on research. That just was not possible before. The only place to do research in industry before was in a large company that was not fighting for its survival, basically had a dominant position in its market, and had a long enough view that they were willing to fund long-term projects. So from history, the big labs that we remember, like Bell Labs, belonged to AT&T, which basically had a monopoly on telecommunication in the US. IBM had a monopoly on big computers, essentially, and they had a good research lab. Xerox had a monopoly on photocopiers, and that enabled them to fund PARC. It did not enable them to profit from the research going on there, but that profited Apple. And then more recently Microsoft Research, Google Research, and FAIR at Meta. And the industry is shifting again. FAIR had a big influence on the AI research ecosystem by essentially being very open, right? Publishing everything, open-sourcing everything, with tools like PyTorch but also research prototypes that a lot of people have been using in industry. So we caused other labs like Google to become more open, and other labs to also publish much more systematically than before. But what's been happening over the last couple of years is that a lot of those labs have been clamming up and becoming more secretive. That was the case for OpenAI several years ago, and now Google is becoming more closed, and possibly even Meta. So yeah, it was time for the type of stuff that I'm interested in to be done outside rather than inside.
>> So to be clear then, does AMI, Advanced Machine Intelligence, plan to do their research in the open?
>> Yeah, the upstream research. I mean, in my opinion, you cannot really call it research unless you publish what you do, because otherwise you can easily get fooled by yourself. You come up with something, you think it's the best thing since sliced bread. Okay, if you don't actually submit it to the rest of the community, you might just be delusional. [laughter] And I've seen
that phenomenon many times in lots of industry research labs, where there is sort of internal hype about some internal projects, without realizing that other people are doing things that actually are better. [laughter] So if you tell the scientists, "Publish your work," first of all, that is an incentive for them to do better work, where the methodology is more thorough and the results are more reliable. The research is more reliable. It's good for them, because very often when you work on a research project, the impact you may have on product could be months, years, or decades down the line. And you cannot tell people, "Come work for us, don't say what you're working on, and maybe there is a product you will have an impact on five years from now." In the meantime, they can't be motivated to really do something useful. So if you tell them that, they tend to work on things that have a short-term impact, right?
So if you really want breakthroughs, you need to let people publish. You can't do it any other way. And this is something that a lot of the industry is forgetting at the moment.
>> What products, if any, does AMI plan to produce or make? Is it research or more than that?
>> No, it's more than that. It's actual products. Okay. But things that have to do with world models and planning. And basically we have the ambition of becoming one of the main suppliers of intelligent systems down the line. We think the current architectures that are employed, LLMs or agentic systems that are based on LLMs, work okay for language. But even agentic systems really don't work very well. They require a lot of data to basically clone the behavior of humans, and they're not that reliable. So we think the proper way to handle this, and I've been saying this for almost 10 years now, is to have world models that are capable of predicting what would be the consequences of an action or sequence of actions that an AI system might take. And then the system arrives at a sequence of actions or an output by optimization, by figuring out what sequence of actions will optimally accomplish a task that I'm setting for myself. That's planning. Okay. So I think an essential part of intelligence is being able to predict the consequences of your actions and then use them for planning. And that's what I've been working on for many years. We've been making fast progress with a combination of projects here at NYU and also at Meta. And now it's time to basically make it real.
>> And what do you think are the missing parts, and why do you think it's taking so long? Because you've been talking about it, as you said, for many years already, but it's still not better than LLMs, right?
>> It's not the same thing as LLMs, right? It's designed to handle modalities that are high-dimensional, continuous, and noisy. And LLMs completely suck at this. They
really do not work, right? If you try to train an LLM to learn good representations of images or video, they're really not that great. Generally, vision capabilities for AI systems are trained separately. They're not part of the whole LLM thing. So yeah, if you want to handle data that is high-dimensional, continuous, and noisy, you cannot use generative models. You can certainly not use generative models that tokenize your data into discrete symbols. Okay, there's just no way. And we have a lot of empirical evidence that this simply doesn't work very well. What does work is learning an abstract representation space that eliminates a lot of details about the input, essentially all the details that are not predictable, which includes noise, and making predictions in that representation space. And this is the idea of JEPA, right? Joint embedding predictive architectures, which you are as familiar with as I am. [laughter] You worked on this.
>> Also, Randall was hosted in the past.
>> Randall was on the podcast and probably talked about this at length. So there's a lot of ideas around this. And let me tell you my history around this. Okay. I have been convinced for a long time, probably the better part of 20 years, that the proper way to build intelligent systems was through some form of unsupervised learning. I started working on unsupervised learning as the basis for making progress in the early 2000s, mid-2000s. Before that I wasn't so convinced this was the way to go. And basically this was the idea
of training autoencoders to learn representations, right? So you have an input, you run it through an encoder, it finds a representation of it, and then you decode, so you guarantee that the representation contains all the information about the input. That intuition is wrong. Insisting that the representation contains all the information about the input is a bad idea. Okay, I didn't know this at the time. So what we worked on was, you have several ways of doing this. Geoffrey Hinton at the time was working on restricted Boltzmann machines. Yoshua Bengio was working on denoising autoencoders, which actually became quite successful in different contexts, for NLP among others. And I was working on sparse autoencoders. So basically, if you train an autoencoder, you need to regularize the representation so that the autoencoder does not trivially learn an identity function. And this is the Information Bottleneck podcast, this is about the information bottleneck, right? You need to create an information bottleneck to limit the information content of the representation. And I thought high-dimensional sparse representations were actually a good way to go. So a bunch of my students did their PhD on this. Koray Kavukcuoglu, who's now the chief AI architect at Alphabet and also the CTO at DeepMind, actually did his PhD on this with me. And a few other folks, Marc'Aurelio Ranzato and Y-Lan Boureau and a few others. So this was the idea. And then, as it turned out [laughter], the reason why we worked on this was because we wanted to pre-train very deep neural nets by pre-training those things as autoencoders. We thought that was the way to go. What happened though was that we started experimenting with things like normalization and rectification instead of hyperbolic tangent or sigmoids, like ReLUs, and that ended up basically allowing us to train fairly deep networks completely supervised. And this was at the same time that datasets started to get bigger, and so it turned out supervised learning worked fine. So the whole idea of self-supervised or
unsupervised learning was put aside. And then came ResNet, and that completely solved the problem of training very deep architectures, right, in 2015. But then in 2015 I started thinking again about how we push towards human-level AI, which really was the original objective of FAIR and my life mission. And I realized that all the approaches of reinforcement learning and things of that type were basically not scaling. Reinforcement learning is incredibly inefficient in terms of samples, and so this was not the way to go. And so the idea of world models, right? A system that can predict the consequences of its actions and can plan. I started really seriously playing with this around 2015, 2016. My keynote at what was still called NIPS at the time, in 2016, was on world models. I was arguing for it. That was basically the centerpiece of my talk: this is what we should be working on, world models, action-conditioned. And a few of my students started working on this, on video prediction and things like that. We had some papers on video prediction in 2016. And I made the same mistake as before, and the same mistake that everybody is making at the moment, which is training a video prediction system to predict at the pixel level, which is really impossible, and you can't really represent useful probability distributions on the space of video frames. And so those things don't work. I knew for a fact that because the prediction was non-deterministic, we had to have
a model with latent variables, okay, to represent all the stuff you don't know about the variable you're supposed to predict. And so we experimented with this for years. I had a student here who is now a scientist at FAIR, Mikael Henaff, who developed a video prediction system with latent variables. And it kind of solved the problem we were facing, slightly. I mean, today the solution that a lot of people are employing is diffusion models, which is a way to train a non-deterministic function essentially, or energy-based models, which I've been advocating for decades now, which is also another way of training non-deterministic functions. But in the end I discovered that the way to get around the fact that you really can't predict at the pixel level is to just not predict at the pixel level. [laughter] It is to learn a representation and predict at the representation level, eliminating all the details you cannot predict. And I wasn't really thinking about those methods early on because I thought there was a huge
problem of preventing collapse. So I'm sure Randall talked about this, but let's say you have an observed variable X and you're trying to predict a variable Y, but you don't want to predict all the details, right? So you run both X and Y through encoders. So now you have both a representation for X, Sx, and a representation for Y, Sy. You can train a predictor to predict the representation of Y from the representation of X. But if you want to train this whole thing end to end simultaneously, there is a trivial solution where the system ignores the input and produces constant representations, and the predictor's problem now is trivial, right? So if your only criterion to train the system is to minimize the prediction error, it's not going to work. It's going to collapse. I knew about this problem for a very long time because I worked on joint embedding architectures. We used to call them Siamese networks back in the 90s.
>> Those are the same? Because people have been using that term, Siamese networks, even recently.
>> That's right.
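The collapse problem described here can be illustrated with a toy sketch. The Python below is only an illustration under stated assumptions, not anything from the systems discussed: the data are random, the predictor is taken to be the identity, and the names `prediction_loss`, `constant_encoder`, and `linear_encoder` are hypothetical. It shows that an encoder which ignores its input drives the prediction error to exactly zero, which is why prediction error alone cannot be the only training criterion.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumed setup): y is the target, x is a noisy view of it.
y = rng.normal(size=(8, 16))
x = y + 0.1 * rng.normal(size=y.shape)


def prediction_loss(s_x, s_y):
    """Mean squared distance between two batches of representations.

    The predictor is assumed to be the identity to keep the sketch short.
    """
    return float(np.mean(np.sum((s_x - s_y) ** 2, axis=1)))


def constant_encoder(batch, dim=4):
    """Collapsed encoder: ignores its input and always outputs zeros."""
    return np.zeros((batch.shape[0], dim))


# A fixed random linear map stands in for an encoder that keeps information.
weights = rng.normal(size=(16, 4))


def linear_encoder(batch):
    """Non-collapsed encoder: a linear projection of the input."""
    return batch @ weights


collapsed_loss = prediction_loss(constant_encoder(x), constant_encoder(y))
informative_loss = prediction_loss(linear_encoder(x), linear_encoder(y))

print("collapsed encoder loss:  ", collapsed_loss)
print("informative encoder loss:", informative_loss)
```

The collapsed encoder wins on prediction error while carrying no information about the input, so an extra term is needed to keep the representations informative.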
I mean, the concept is still up to date, right? So you have an X and a Y, and think of the X as some sort of degraded, transformed, or corrupted version of Y. Okay. You run both X and Y through encoders and you tell the system, look, X and Y really are two views of the same thing, so whatever representation you compute should be the same, right? So if you just train two neural nets with shared weights to produce the same representation for slightly different versions of the same object, view, whatever it is, it collapses. It doesn't produce anything useful. So you have to find a way to make sure that the system extracts as much information from the input as possible. And the original idea that we had, it was a NeurIPS paper from 1993 with Siamese nets, was to have a contrastive term, right? So you have other pairs of samples that you know are different, and you train the system to produce different representations. So you have a cost function that attracts the two representations when you show two examples that are identical or similar, and repels them when you show it two examples that are dissimilar. And we came up with this idea because someone came to us and said, can you encode signatures of someone drawing a signature on a tablet? Can you encode this in less than 80 bytes? Because if you can encode it in less than 80 bytes, we can write it on the magnetic stripe of a credit card and we can do signature verification [laughter] for credit cards, right? And so I came up with this idea of training a neural net to produce 80 variables that would be quantized to one byte each, and then training it to do this thing.
>> And did they use it?
>> So it worked really well, and they showed it to their business people, who said, "Oh, we're just going to ask people to type PIN codes." [laughter]
>> There's a lesson there about how you integrate the technology.
>> Right. And I knew this thing was kind of fishy in the first place, because there were countries in Europe that were using smart cards, right, and that was a much better solution to the problem [laughter], but they just didn't want to use smart cards for some reason. Anyway, so we had this technology. In the mid-2000s I worked with two of my students to revive this idea. We came up with some new objective functions to train those. So those are what people now call contrastive methods. It's a special
case of contrastive methods. We have positive examples and negative examples, and on positive examples you train the system to have low energy, and for negative samples you train it to have higher energy, where energy is the distance between the representations. So we had two papers at CVPR in 2005 and 2006 by Raia Hadsell, who is now the head of DeepMind Foundation, the sort of FAIR-like division of DeepMind if you want, and Sumit Chopra, who is actually a faculty member here at NYU now, working on medical imaging. And so this gathered a bit of interest in the community and sort of revived a little bit of work on those ideas. But it still wasn't working very well. Those contrastive methods were producing representations of images, for example, that were relatively low-dimensional. If we measured the eigenvalue spectrum of the covariance matrix of the representations that came out of those things, it would fill up 200 dimensions, never more, even training on ImageNet and things like that, even with augmentation. And so that was kind of disappointing. And it did work, okay, there was a bunch of papers on this and it worked okay. There was one paper from DeepMind, SimCLR, that demonstrated you could get decent performance with contrastive training applied to Siamese nets. But then about five years ago, one of my postdocs, Stéphane Deny, at Meta, tried an idea that at first I didn't think would work, which was to essentially have some measure of information quantity that comes out of the encoder
and then try to maximize that. Okay. And the reason I didn't think it would work is because I'd seen a lot of experiments along those lines that Geoffrey Hinton was doing in the 1980s, trying to maximize information. And you can never maximize information, because you never have appropriate measures of information content that are a lower bound. If you want to maximize something, you want to either be able to compute it, or you want a lower bound on it so you can push it up, right? And for information content we only have upper bounds. So I always thought this was completely hopeless. And then Stéphane came up with a technique which was called Barlow Twins. Barlow is a famous theoretical neuroscientist who came up with the idea of information maximization. And it kind of worked. It was, wow. [laughter] So then I said, we have to push this, right? So we came up with another method with a student of mine, Adrien Bardes, co-advised with Jean Ponce, who's affiliated with NYU too, a technique called VICReg, variance-invariance-covariance regularization. And that turned out to be simpler and worked even better. And since then we made progress, and recently I discussed an idea with Randall that can be pushed and made practical. It's called SIGReg, and the whole system is called LeJEPA. Okay. He's responsible for the name. [laughter]
>> That means latent Euclidean JEPA, right?
>> Yeah. And SIGReg has to do with making sure that the distribution of vectors that come out of the encoder is an isotropic Gaussian.
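As a rough illustration of how a regularizer can rule out collapse without negative pairs, here is a minimal NumPy sketch of a VICReg-style objective with its three terms (invariance, variance, covariance). It is a sketch under assumptions, not the published implementation: the function name `vicreg_style_loss`, the weights, and the random test data are illustrative choices. It is also not SIGReg, whose details are not given in this conversation.

```python
import numpy as np


def vicreg_style_loss(z_a, z_b, sim_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    """VICReg-style loss on two batches of embeddings of shape (n, d).

    The weights and eps are illustrative assumptions.
    """
    n, d = z_a.shape

    # Invariance: the two views of the same sample should embed close together.
    invariance = np.mean((z_a - z_b) ** 2)

    def variance_term(z):
        # Hinge that pushes each dimension's standard deviation up towards 1.
        std = np.sqrt(z.var(axis=0) + eps)
        return np.mean(np.maximum(0.0, 1.0 - std))

    def covariance_term(z):
        # Penalize off-diagonal covariance so dimensions carry distinct information.
        centered = z - z.mean(axis=0)
        cov = centered.T @ centered / (n - 1)
        off_diagonal = cov - np.diag(np.diag(cov))
        return np.sum(off_diagonal ** 2) / d

    variance = variance_term(z_a) + variance_term(z_b)
    covariance = covariance_term(z_a) + covariance_term(z_b)
    return float(sim_w * invariance + var_w * variance + cov_w * covariance)


rng = np.random.default_rng(0)
collapsed = np.zeros((64, 8))        # constant embeddings: the collapsed solution
spread = rng.normal(size=(64, 8))    # roughly unit-variance, decorrelated embeddings

collapsed_loss = vicreg_style_loss(collapsed, collapsed)
spread_loss = vicreg_style_loss(spread, spread)

print("collapsed embeddings loss:", collapsed_loss)
print("spread embeddings loss:   ", spread_loss)
```

With identical views the invariance term is zero in both cases, so the difference comes entirely from the variance term, which penalizes the constant embeddings heavily.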
That's the I and the G in SIGReg. So, I mean, there's a lot of things happening in this domain which are really cool. I think there's going to be some more progress over the next year or two, and we'll get a lot of experience with this. And I think that's a really promising set of techniques to train models that learn abstract representations, which I think is key.
>> And what do you think are the missing parts here? Do you think more compute will help, or do we need better algorithms? Do you believe in the bitter lesson, right?
>> Well, and furthermore, what do you think about the data quality problems with the internet post-2022, right? I've heard people compare it to low-background steel now, to refer to all that data from before LLMs came out, like low-background tokens.
>> Okay. I think I'm totally escaping that problem. Okay, here's the thing, and I've been using this argument publicly over the last couple of years. Training an LLM, if you want it to have any kind of decent performance, requires training on basically all the freely available text on the internet, plus some synthetic data, plus licensed data, etc. So a typical one, like Llama 3, going back a year or two, is trained on 30 trillion tokens. A token is typically three bytes. So that's 10^14 bytes for pre-training. Okay, we're not talking about fine-tuning. 10^14 bytes. And for the LLMs to be able to really
exploit this, they need to have a lot of memory storage, because basically those are isolated facts. There is a little bit of redundancy in text, but a lot of it is just isolated facts, right? And so you need very big networks, because you need a lot of memory to store all those facts and regurgitate them. Okay. And I compare this with video. 10^14 bytes, if you count 2 megabytes per second for video, for relatively compressed video, not highly compressed but a bit, that would represent 15,000 hours of video. In 15,000 hours of video you have the same amount of data as the entirety of all the text available on the internet. Now, 15,000 hours of video is absolutely nothing. It's 30 minutes of YouTube uploads. Okay? It's the amount of visual information that a 4-year-old has seen in his or her entire life. Waking time is about 16,000 hours in four years. It's not a lot of information. We have video models now, V-JEPA, V-JEPA 2 actually, that just came out last summer, that was trained on the equivalent of a century of video data, and it's all public data. Okay, much more data, but much less than the biggest LLM, actually. Because even though it's more bytes, it's more redundant. So you say, okay, it's more redundant, so it's less useful. Actually, when you use self-supervised learning, you do need redundancy. You cannot learn anything with self-supervised learning, or with anything by the way, if the data is completely random. Redundancy is what you can learn. And so there's just much richer structure in real-world data like video than there is in text, which led me to claim that we are absolutely never going to get to human-level AI by just training on text. It's just never going to happen. Right? So there's this big debate in philosophy of whether AI should be grounded in reality or whether it could be just in the realm of symbolic manipulation and things like this.
>> Well, and when we talk about world models and grounding, I think,
you know, there's still a lot of people who don't even understand what the idealized world model is, in a sense, right? So, for example, I'm influenced by having watched Star Trek, which I would hope you've seen a little bit of, thinking of the holodeck, right? I always thought that the holodeck was like an idealized, perfect world model, right? Even with so many episodes of it going too far, right, and people walking out of it. But it even simulates things like smell and physical touch. So do you think that something like that is the idealized world model, or do you think a different model or way of defining it would be?
>> Okay, this is an excellent question. And the reason it's excellent is because it goes to the core of what I think we should be doing, which I'm doing, and how wrong I think everybody else is. Okay. [laughter] So people think that a world model is something that reproduces all the details of what the world does. They think of it as a simulator.
>> Yeah.
>> Right. And of course, because deep learning is the thing, you're going to use some deep learning system as a simulator. A lot of people also are focused on video generation, which is kind of a cool thing, right? You produce those cool videos and, wow, people are really impressed by them. Now, there's no guarantee whatsoever that when you train a video generation system, it actually has an accurate model of the underlying dynamics of the world and that it's learned anything particularly abstract about
it. And so the idea that somehow a model needs to reproduce every detail of reality is wrong and hurtful. And I'm going to tell you why. Okay. A good example of simulation is CFD, computational fluid dynamics. It's used all the time. People use supercomputers for that, right? So you want to simulate the flow of air around an airplane. You cut up the space into little cubes, and within each cube you have a small vector that represents the state of that cube, which is velocity, density or mass, and temperature, and maybe a couple of other things. And then you solve the Navier-Stokes equations, which are partial differential equations, and you can simulate the flow of air. Now, the thing is, this does not actually necessarily solve the equations very accurately. If you have chaotic behavior like turbulence and stuff like that, the simulation is only approximately correct. But in fact, that's already an abstract representation of the underlying phenomenon. The underlying phenomenon is molecules of air that bump into each other and bump on the wing and on the airplane, right? But nobody ever goes to that level to do the simulation. That would be crazy, right? It would require an amount of computation that's just insane, and it would depend on the initial conditions. I mean, there's all kinds of reasons we don't do this. And maybe it's not molecules. Maybe at a lower level we should simulate particles and do the Feynman diagrams and simulate all the different paths that those particles are taking, because they don't take one path, right? It's not classical. It's quantum, right?
>> Yeah.
>> So at the bottom it's like quantum field theory, and probably already that is an abstract representation of reality. [laughter] So everything that takes place between us at the moment in principle can be described through quantum field theory. Okay, we would just have to measure the wave function of the universe in a cube that contains all of us. And even that would not be sufficient, because there are entangled particles on the other side of the universe. So it wouldn't be sufficient. But let's imagine, okay, for the sake of the argument. First of all, we would not be able to measure this wave function. And second of all, the amount of computation we would need to devote to this is absolutely gigantic. It would be some gigantic quantum computer that is the size of the Earth or something. So there's no way we can describe anything at that level. And it's very likely that
our simulation will be accurate for maybe a few nanoseconds, and beyond that it will diverge from reality. So what do we do? We invent abstractions. We invent abstractions like particles, atoms, molecules. In the living world it's proteins, organelles, cells, organs, organisms, societies, ecosystems, etc. Right? And basically every level in this hierarchy ignores a lot of details about the level below. And what that allows us to do is make longer-term, more reliable predictions. Okay, so we can describe the dynamics between us now in terms of the underlying science and in terms of psychology. Okay, that's a much, much higher level of abstraction than particle physics, right? And in fact, every level in the hierarchy I just mentioned is a different field of science. A field of science is essentially defined by the level of abstraction that you allow yourself to use to make predictions. In fact, physicists have this down to an art, in the sense that if I give you a box full of gas, you could in principle simulate all the molecules of the gas, right? But nobody ever does this. But at a very abstract level we can say PV = nRT, right? Pressure times volume equals number of particles times temperature, blah blah blah. And so you know that at a global, emergent, phenomenological level, if you increase the pressure, the temperature will go up, or if you increase the temperature, the pressure will go up, right? Or if you let some particles out, then the pressure will go down, and blah blah blah. So all the time we build phenomenological models of something complicated by ignoring all kinds of details that physicists call entropy. [laughter] But it's really systematic. That's the way we understand the world. We do not memorize every detail of what we perceive. We certainly don't reconstruct it.
>> So world models don't have to be simulators at all.
>> Well, they are simulators, but in an abstract representation space, and what they simulate is only the relevant part of reality. Okay. If
I ask you where is Jupiter going to be 100 years from now? I mean we have a an enormous amount of information about Jupiter right but within this whole information that we have about Jupiter to be able to make that prediction where Jupiter is going to be 100 years from now you need exactly six numbers three positions and three velocities and the rest doesn't matter. >> So you don't believe in synthetic data sets? >> I do. No, it's useful [clears throat] u You know data from games. I mean there's certainly a lot of things that
you learn uh from synthetic data from you know from from games and things like I mean you know children learn a huge amount from from play which basically are kind of simulations of you know the the world a little bit big right >> but but in conditions where they can't kill themselves but I I worry at least for video games that for example the green screen like actors doing the Animations they're doing extremely it's designed to look good you for like an often >> badass, I guess, for an action game. But these often don't correspond
very well to reality. And so I worry that a physical system that's been trained through, or with the assistance of, world models might get similar quirks, at least in the very short term. Is this something that worries you? >> No, it depends on what level you train them at. So for example, sure, if you use a very accurate robotic simulator, right, it's going to accurately simulate the dynamics of an arm. You know, when you apply torques to it, it's going to move in a particular way. There's
dynamics, no problem. Now, simulating the friction that happens when you grab an object and manipulate it, that's super hard >> to do accurately. Friction is very hard to simulate. >> Okay? And so those simulators are not particularly accurate for manipulation. They're good enough that you can train a system to do it, and then you can do sim-to-real with a little bit of adaptation, so that can work. But I mean, the point is much more important. Like, for example, there's a lot
of completely basic things about the world that we completely take for granted, which we can learn at a very abstract level, but it's not language related. Okay, so the fact, for example, and I've used this example before and people made fun of me for it, but it's really true. Okay, I have those objects on the table, >> and the fact that when I push the table, the objects move with it. >> Like, this is something we learn. It's not something that you're born with. Okay? The fact that most objects will fall when you let
them go, right, with gravity. Babies learn this around the age of nine months. >> And the reason people make fun of me for this is because I said, you know, LLMs don't understand this kind of stuff, right? And they absolutely do not, even today. But you can train them to give the right answer when you ask them a question. You know, if I put an object on the table and then I push the table, what will happen to the object? It will answer that the object moves with it,
but because it's been fine-tuned to do that. Okay, so it's more like regurgitation than real understanding of the underlying dynamics. >> But if you look at, I don't know, Sora, or like Nano >> Nano Banana, they have good physics of the world, right? It's not perfect physics, yeah, >> they have some physics. So do you think we can push it further, or do you think it's one way to learn physics? >> So all of those models actually make predictions in representation space. They use diffusion transformers
and the computation of the video snippet at an abstract level is done in representation space. Okay. Not always autoregressively, by the way. Sometimes it's just in parallel. And then there's a second diffusion model that turns these abstract representations into a nice-looking video, and that might be mode collapsed. We don't know, right, because we can't really measure the coverage of reality by such systems. But to the previous point, here is another completely obvious
concept to us that we don't even imagine that we learn, but we do learn it. A person cannot be in two places at the same time. >> Okay, we learn this because very early on we learn object permanence: the fact that when an object disappears, it still exists. Okay. And when it reappears, it's the same object that you saw before. >> How can we train an AI system to learn this concept? So, object permanence, you know, you just show it a lot of videos where objects go behind a screen
and then reappear on the other side, or where they go behind a screen and the screen goes away and the object is still there. And when you show four-month-old babies scenarios where things like this are violated, their eyes open super big and they're super surprised, because reality just violated their internal model. Same thing when you show a scenario of a little car on a platform: you push it off the platform and it appears to float in the air. They also look at it. You know, 9-month,
10-month-old babies look at it really surprised. Six-month-old babies barely pay attention, because they haven't learned about gravity yet. So they haven't been able to incorporate the notion that every object is supposed to fall. So this kind of learning is really what's important, and you can learn this from very abstract things, you know, the same way babies learn about social interactions by being told stories with simple pictures. It's a simulation, an abstract simulation of
the world, but it sort of teaches them, you know, particular behaviors. So, you could imagine training a system from, let's say, an adventure game, like a top-down 2D adventure game, >> where you tell your character, you know, move north, and he goes to the other room, and he's not in the first room anymore because he moved to the other room, right? Of course, in an adventure game, you also have Gandalf that you can call and he just appears, right? So, that's not physical. But when you pick
up a key from a treasure chest, you have the key, no one else can have it, >> and you can use it to open a door. Like, there's a lot of things that you learn that are very basic, even in sort of abstract environments. >> Yeah. And I just want to observe that, among those adventure games that they try to train models on, one you might know about is NetHack, right? Sure. And NetHack is fascinating because it is an extraordinarily hard game.
Like, ever ascending in that game without cheats, without going to the wiki, takes like 20 years. People still don't do it from playing, >> and my understanding is that AI agents, the very best agent models we have, or even world models, are pathetic. [laughter] >> Yeah. Yeah. To the point that people have come up with a sort of dumbed-down version of NetHack. >> MiniHack. Exactly. MiniHack. They had to dumb it down just for AI. So some of my colleagues have been working
with this. Actually one of my masters you don't see the talk [laughter]. And, you know, Mikael Henaff, who I mentioned earlier, has also been doing some work there. Now, what's interesting there is that there's a type of situation like this where you need to plan, okay, but you need to plan in the presence of uncertainty. The problem with all games, and adventure games in particular, is that you don't have complete visibility of the state of the system. You don't know the map in advance, you need to explore, >>
blah blah blah, you can get killed every time you do this. But the actions are essentially discrete. >> Yes. >> Okay, a finite number of possible actions. It's turn-based, >> and so in that sense it's like chess, except chess is fully observable. Go also is fully observable. >> Stratego isn't, though. >> Stratego isn't. You know, poker is not. >> And so it makes it more difficult if you have uncertainty, of course. But those are games where the number of actions you
can take is discrete, and basically what you need to do is tree exploration. Okay. And of course the tree of possible states grows exponentially with the number of moves. And so you have to have some way of generating only the moves that are likely to be good, and basically never generate the other ones, or select them down. And you need to have a value function, which is something that tells you: okay, I can't plan to the end of the game, but even though I'm planning
only, say, nine moves ahead, I have some way of evaluating whether a position is good or bad, whether it's going to lead me to a victory or a solution. Right? So you need those two components, basically: something that guesses what the good moves are, and then something that essentially evaluates positions. And if you have both of those things, you can train those functions using something like reinforcement learning, or behavior cloning if you have data. I mean, the basic idea for this
goes back to Samuel's checkers player from 1964. It's not recent. But of course the power of it was demonstrated with AlphaGo and AlphaZero and things like that. So that's good, but that's a domain where humans suck. Humans are terrible at playing chess, right? And playing Go. Machines are much better than we are, >> because of the speed of tree exploration and because of the memory that's required for tree exploration. We just don't have enough memory capacity to do breadth-first tree exploration.
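The two components described above, something that proposes only the promising moves and a value function that scores positions when the search cannot reach the end of the game, can be sketched in a few lines. This is only an illustrative sketch, not how AlphaGo or AlphaZero is implemented: the toy game (take 1 to 3 stones, whoever takes the last stone wins) and the names `legal_moves`, `value`, `propose_moves`, and `search` are assumptions made for this example, and the hand-written rules stand in for functions that would be learned with reinforcement learning or behavior cloning.

```python
from typing import List, Optional, Tuple

# Toy game (an assumption for illustration): a pile of stones, players
# alternate taking 1 to 3 stones, and whoever takes the last stone wins.
State = int  # stones left, seen from the player whose turn it is


def legal_moves(state: State) -> List[int]:
    return [m for m in (1, 2, 3) if m <= state]


def value(state: State) -> float:
    """Stand-in for a learned value function.

    Scores a position for the player to move: +1 looks winning, -1 looks
    losing. Here it is a hand-written rule (multiples of 4 are lost).
    """
    return -1.0 if state % 4 == 0 else 1.0


def propose_moves(state: State, k: int = 2) -> List[int]:
    """Stand-in for a learned policy: keep only the k most promising moves.

    A move is promising if it leaves the opponent in a bad position.
    """
    ranked = sorted(legal_moves(state), key=lambda m: value(state - m))
    return ranked[:k]


def search(state: State, depth: int) -> Tuple[float, Optional[int]]:
    """Depth-limited negamax over the pruned tree; returns (score, best move)."""
    if state == 0:
        return -1.0, None  # the previous player took the last stone and won
    if depth == 0:
        return value(state), None  # cannot plan further: ask the value function
    best_score, best_move = float("-inf"), None
    for move in propose_moves(state):
        child_score, _ = search(state - move, depth - 1)
        score = -child_score  # what is good for the opponent is bad for us
        if score > best_score:
            best_score, best_move = score, move
    return best_score, best_move


if __name__ == "__main__":
    print(search(5, depth=3))  # (1.0, 1): take one stone, leaving four
```

With b legal moves per position and a lookahead of d moves, an unpruned search visits on the order of b^d positions; keeping only k proposed moves per position cuts that to roughly k^d, and the value function is what lets the search stop at depth d instead of playing to the end.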
So we suck at it. Like, before AlphaGo came out, people thought that the best human players were maybe two or three stones of handicap below an ideal player that they call God. Turns out, no, humans are terrible. The best players in the world need like eight or nine stones [laughter] to kill. >> Well, I can't believe I get the pleasure of talking about game AI with Yann. I just have a few follow-up questions on this. The
first one is about this example that you gave of humans being terrible at chess. I'm a bit familiar with the development of chess AI over the years. I've heard this referred to as Moravec's paradox, and explained as: humans have evolved over billions, or millions, sorry, a large number of years for physical locomotion, and that's why babies and humans are very good at this, but we have not evolved at all to play chess. So that's one question. And then a second question that's related is,
a lot of people today who play video games, and I'm one of them, have observed that it feels like AI, at least in terms of enemy AI, has not really improved in 20 years, right? Some of the best examples are still Halo 1 and F.E.A.R. from the early 2000s. So when do you think the advancements that we've been making in the lab are going to actually have real impact on gamers, you know, in a non-generative-AI sense, right? >> Yeah. I used to
be a gamer, never an addicted one, but my family is in it, because I have three sons in their 30s and they have a video game design studio between them. So I was sort of embedded in that culture. But yeah, no, you're right. And it's also true that, despite the accuracy of physical simulators, a lot of those simulations are not used by studios who make animated movies, because they want control. >> They don't
necessarily want accuracy. They want control. And in games it's really the same thing. It's a creative act. What you want is some control over the course of the story, or the way the NPCs behave, and all that stuff, right? And with AI, it's difficult to maintain control at the moment. So I mean, it will come, but >> you know, there's some resistance from the creators, but >> I think, okay, Moravec's paradox is very much still in force. So Moravec, I
think, formulated it in 1988, if I remember correctly. >> Mhm. >> And he said, like, "How come things that we think of as uniquely human intellectual tasks, like chess, we can do with computers, or computing integrals or whatever, but the things that we take for granted, that we don't even think of as intelligent tasks, like what a cat can do, we can't do with robots?" >> Mhm. Mhm. >> And even now, 47 years later, we still can't do them well. Mhm. >> I mean, of course
we can train robots, you know, by imitation and a bit of reinforcement learning, and by training through simulation, to kind of locomote and avoid obstacles and do various things. But they're not nearly as inventive and creative and agile as a cat. It's not because we can't build a robot. We certainly can. It's just that we can't make them smart enough to do all the stuff that a cat or even a mouse can do, let alone a dog or a monkey, right? So
you have all those people bloviating about, you know, AGI in a year or two. It's just completely delusional, just [laughter] complete delusion, because the real world is way more complicated. You're not going to get anywhere by tokenizing the world and using LLMs. It's just not going to happen. >> So what are your timelines? >> When will we see, I don't know, AGI, whatever it means, or like >> and also where are you on the >> and where are you on
the optimist-pessimist side? Because, you know, there's some doomerism amongst, like, Gary Marcus and, I think, well, Gary Marcus is not a doomer, he's a critic. Sorry, the doomer would be, is it Yoshua? Yeah, there you go. Like, where do you fall in all these things? >> Okay, I'll answer the first question first. So first of all, there is no such thing as general intelligence. This concept makes absolutely no sense, because it's really designed to designate human-level intelligence. But human intelligence is super specialized. Okay,
we can handle the real world really well, like navigate and blah blah blah. We can handle other humans really well, because we evolved to do this. And chess, we suck. Okay? [laughter] And there's a lot of tasks that we suck at, where a lot of other animals are much better than we are. Okay, so what that means is that we are specialized. We think of ourselves as being general, but it's simply an illusion, because all of the problems that we can apprehend are the ones that we can think of, >> right? And
vice versa. And so we're general in all the problems that we can imagine. Okay? But there's a lot of problems that we cannot imagine. And there are some mathematical arguments for this, which I may not go into unless you ask me. But so this concept of general intelligence is complete BS. We can talk about human-level intelligence, right? So are we going to have machines that are as good as humans in all the domains where humans are good, or better than humans? And the answer is, you know,
we already have machines that are better than humans in some domains. Like, you know, we have machines that can translate 1,500 languages into 1,500 other languages in any direction. No human can do this, right? And there's a lot of examples in chess and Go and various other things. But will we have machines that are as good as humans in all domains? The answer is absolutely yes. There's no question that at some point we'll have machines that are as good as humans in all
domains. Okay? But it's not going to be an event. It's going to be very progressive. We're going to make some conceptual advances, maybe based on, you know, JEPA, world models, planning, things like that, over the next few years. And if we're lucky, if we don't hit an obstacle that we didn't see, perhaps this will lead to kind of a good path to human-level AI. But perhaps we're still missing a lot of basic concepts. And so the most optimistic view is that perhaps this, you know,
learning good models, and being able to do planning, and understanding complex signals that are continuous and noisy, if we make significant progress in that direction over the next two years, the most optimistic view is that we'll have something that is close to human intelligence, or maybe dog intelligence, within, you know, 5 to 10 years. >> Okay. >> Okay. But that's the most optimistic. It's very likely that, as has happened multiple times in the history of AI, there's some obstacle we're
not seeing yet, which will actually require us to invent some new conceptual things to go beyond, in which case that may take 20 years, maybe more. >> Okay, but no question it will happen. Do you think it will be easier to get from the current level to dog-level intelligence, compared to going from dog to human level? >> No, I think the hardest part is to get to dog level. >> Once you get to dog level, you basically have most of the ingredients, >> right?
And then, okay, what's missing from, like, primates to humans, beyond just the size of the brain, is language, maybe. Okay. But language is basically handled by Wernicke's area, which is a tiny little piece of brain that's right here, and Broca's area, which is a tiny piece of brain right here. Both of those evolved in the last, you know, less than a million years, maybe two. And it can't be that complicated, and we already have LLMs that do a pretty good job at, you know, encoding
language into abstract representations and then decoding thoughts into text. So maybe we'll use LLMs for that. So LLMs will be like the Wernicke and Broca areas in our brain. What we're working on right now is the prefrontal cortex, which is where our world model resides. >> Well, this gets me into a few questions about safety and the potential destabilizing impact. So I'll start this with something a little bit funny, which is to say, if we really get dog-level intelligence, then the AI of tomorrow has gotten profoundly better than
any human at smell. And something like that is just, you know, the tip of the iceberg for the destabilizing impacts of AI tomorrow, let alone today. I mean, we have Sam Altman talking about super-persuasion, because AI doxes you, so it figures out who you are through the multi-turn conversation, and it gets really good at customizing its arguments towards you. We've had AI psychosis, right? Like, people who have done horrible things as a result of believing in a sycophantic AI that is telling them to do things
they shouldn't do. >> Happened to me, by the way. >> Whoa. You've got to tell us about that, too. What? One day a few months ago, I was at NYU and I walked down to get lunch, and there's a dude who's surrounded by a whole bunch of police officers and security guards. And I walk past, and the guy recognizes me and says, "Oh, Mr. LeCun." >> And the police officer kind of whisks me away outside and tells me, like, "You don't want to talk to him." Turns
out the guy had come from, you know, the Midwest by bus to here, >> and he's kind of emotionally disturbed. He, you know, >> had gone to prison, blah, blah, blah, >> for various things. And he was carrying a bag with, you know, like a huge wrench and pepper spray and a knife. And so the security guards got alarmed and basically called the police. >> Oh. >> And then the police realized, okay, [laughter] you know, this guy is kind of weird. So they,
you know, took him away and had him examined, and eventually he went back to the Midwest. >> But I mean, he didn't feel threatening to me, but the police weren't so sure. [laughter] >> So, yeah, it happens. I've had, you know, high school students writing emails to me saying, "I read all of those pieces by doomers who said that AI is going to take over the world and either kill us all or take our jobs. So I'm totally depressed. I don't go to school anymore." And so,
I, you know, I answer them and say, like, don't believe [laughter] all that stuff, you know, humanity is still going to be in control of all of this. Now, there's no question that every powerful technology has good consequences and bad side effects that sometimes are predicted and corrected sufficiently in advance, and sometimes not so much, right? And it's always a trade-off. That's the history of technological progress, right? So let's take cars as an example. Okay, cars crash sometimes, and initially,
you know, brakes were not that reliable, and cars would flip over, and there were no seat belts and blah blah blah, right? And eventually the industry made progress and started putting in seat belts and crumple zones and, you know, automatic control systems so that the car doesn't sway and doesn't flip or whatever. So cars now are much safer than they used to be. Now, there's one thing that is now mandatory in every car sold in
the EU, and it's actually an AI system that looks out the window. It's called AEBS, automatic emergency braking system. >> It's basically a convolutional net, right? And it looks through the windshield and it detects, you know, all objects. And if it detects that an object is too close, it just automatically brakes, or if it detects that there's going to be a collision that the driver is not going to be able to avoid, it just stops the car or swerves, right? And
one statistic I read is that this reduces frontal collisions by 40%. And so it became mandatory equipment in every car sold in the EU, even low-end ones, because it saves lives. So this is AI not killing people, >> saving lives, >> right? I mean, same thing for medical imaging and everything. There's a lot of lives being saved by AI at the moment. And so, do you think, so you, Geoff, and Yoshua, right, all of you won the Turing Award together, and you have different
opinions about it, right? Geoff says he has regrets, and Yoshua works on safety, and you're trying to push it forward. So do you think we will get to some level of intelligence where you will say, oh, this has become too dangerous, we need to work more on the safety side? >> I mean, you have to do it right. I'm going to use another example: jet engines. Okay, I find it astonishing that you can fly halfway around the world on a two-engine airplane in
complete safety. And I really do mean halfway around the world, like, it's a 17-hour flight. [laughter] Yeah. Okay. From New York to Singapore, right? On an Airbus A350. It's astonishing. And when you look at a jet engine, a turbofan, it should not work, right? I mean, there's no metal that can stand the type of temperature that takes place there. And the kind of forces when you have a huge turbine rotating at 2,000, or I don't know what speed, like
the force that puts on it is just insane, you know, it's hundreds of tons. So it should not be possible, yet those things are incredibly reliable. So what I'm saying is, when you build something like a turbojet, the first time you build it, it's not going to be safe. It's going to run for 10 minutes and then blow up. [laughter] Okay? And it's not going to be fuel efficient, etc. It's not going to be reliable. But, you know, as you make progress in engineering, in
materials, etc., there's so much economic motivation to make this good that eventually it's going to reach the type of reliability we see today. The same is going to be true for AI. We're going to start making systems that have agency, can plan, can reason, have world models, blah, blah, blah. But, you know, they're going to have the power of maybe a cat brain, right, which is about 100 times smaller than a human brain. And then we're going to put guardrails in them to prevent
them from taking actions that are obviously dangerous or something. You can do this at a very low level. Like, if you have, I don't know, a domestic robot, right? So one example that Stuart Russell, for example, has used is to say, well, if you have a domestic robot and you ask it to fetch you coffee, and someone is standing in front of the coffee machine, if the system wants to fulfill its goal, it's going to have to,
you know, either assassinate or smash the person in front of the coffee machine to get access to the coffee machine. And obviously, you don't want that to happen. Now, it's like the paperclip maximization. It's kind of a ridiculous example, because it's super easy to fix this, >> right? You put in some guardrail that says, well, you're a domestic robot, you should stay away from people and maybe ask them to move if they are in the way, but not actually, you know, hurt them in any way or whatever. And you can do, like,
you know, you can put in a whole bunch of low-level conditions like this. Like, if you're a domestic robot, and it's a cooking robot, right, so it has a big knife in its hand and it's cutting a cucumber: don't flail your arms if you have a big knife in your hand and there are people around. [laughter] Okay, that can be kind of a low-level constraint that the system has to satisfy. Now, some people say, oh, but with LLMs we can fine-tune them to not do
things that are dangerous, but you can always jailbreak them. You can always find prompts where they're going to kind of escape their conditioning, you know, all the things that we stop them from doing. I agree. That's why I'm saying we shouldn't use LLMs. We should use those objective-driven AI architectures I was talking about earlier, where you have a system that has a world model, can predict the consequences of its actions, and can figure out a sequence of actions to accomplish a task, but also is subject to
a bunch of constraints >> that guarantee that whatever action is being pursued, and whatever state of the world is being predicted, does not endanger anybody or does not have negative side effects, right? So there, it's by construction. The system is intrinsically safe because it has all those guardrails, and because it obtains its output by optimization, by minimizing the objective of the task and satisfying the constraints of the guardrails, it cannot escape that. It's not fine-tuning, right? It's by construction. >> Yeah. And there's a technique, you
know, for LLMs, for constraining the output space, where you say that you ban all outputs except whatever you want, like maybe 0 to 10, and ban everything else, and they have that even for diffusion models. Sure. >> Do you think that tactics like that, as they exist today, significantly improve the utility of those kinds of models? >> Well, they do, but they're ridiculously expensive, because the way they work is that you have to have the system generate lots of proposals for an output and then have a filter that says, well, this
one is good, this one's terrible, etc., or rank them and then just put out the one that has the least toxic rating, essentially. So it's insanely expensive, right? So unless you have >> some sort of objective-driven value function that drives the system towards producing those high-score, low-toxicity outputs, it's going to be expensive. >> Yeah. And I want to change the topic just a tiny bit. We've been very technical
for a moment, but I think our audience and the world have a few questions that are maybe a little bit more social. You know, the person who appears to be trying to fill your shoes at Meta, Alex Wang: I'm curious, do you have any thoughts about how that will play out for Meta? >> He's not in my shoes at all. He's in charge of all the R&D
and products that are AI related at Meta. So he's not a researcher or scientist or anything like that. It's more, you know, overseeing the entire operation. So within Meta Superintelligence Labs, which is his organization, there are four kind of divisions, if you want. One of them is FAIR, which is long-term research. One of them is TBD Lab, which is basically building frontier models, which is almost entirely LLM focused. A third organization is AI infrastructure, software infrastructure; hardware is some other
organization. >> And then the last one is products. Okay, so people who take the frontier models and then turn them into actual chatbots that people can use, and disseminate them, and plug them into WhatsApp and everything else, right? So those are the four divisions. He oversees all of that. And there are several chief scientists. There is the chief AI scientist at FAIR; that's me. And I really have a long-term view, and basically, you know, I'm going to be at Meta for another, you know, three weeks. Okay. So
FAIR is led by our NYU colleague Rob Fergus >> right now, after Joelle Pineau left several months ago. FAIR is being pushed towards working on slightly shorter-term projects than it has done traditionally, with less emphasis on publication and more focus on helping TBD Lab with the LLMs and frontier models, >> and, you know, less publication, which means Meta is becoming a little more closed. And TBD Lab has a chief scientist also, >>
who is really focused on LLMs, >> and the other organizations are more like infrastructure and products. So there's some applied research there. So, for example, the group that works on SAM, Segment Anything, yeah, >> that's actually part of the product division of MSL. They used to be at FAIR, but because they worked on relatively outside-facing, practical things, they were moved to that department. >> And do you have any opinions on, like, some of the other
companies that are trying to move into world models, like Thinking Machines, or even, I've heard, Jeff Bezos and some of his >> It's not clear at all what Thinking Machines is doing. [laughter] >> Not at all. >> Maybe you have more information than me, but like >> Maybe not. Sorry, maybe I'm mixing it up here. It's >> Physical Intelligence. >> Physical, sorry. >> Yes, sorry, and then I mix them up with, like, SSI as well. They're all kind of like >> SSI, nobody knows what they're doing, including their own investors.
[laughter] >> Okay. At least, that's a [laughter] rumor I hear. I don't know if it's true. It's become a bit of a joke. But yeah, Physical Intelligence, this company is focused on, you know, basically producing geometrically correct videos, okay, where there is persistent geometry, and when you look at something and you turn around and you come back, it's the same object you had before, like, it doesn't change behind your back, right? So it's generative, right? I mean,
the whole idea is to generate pixels, >> okay, >> which I just spent a long time arguing was a bad idea. >> There are other companies that have world models. A good one is Wayve. >> Wave? >> W-A-Y-V-E. So it's a company based in Oxford, and I'm an adviser, full disclosure. And they have a world model for autonomous driving, >> and the way they're training it is that they're training a representation space by basically training a VAE or VQ-VAE and then
training a predictor to do temporal prediction in that abstract representation space. So they have half of it right and half of it wrong. The piece they have right is that you make predictions in representation space. The piece they have wrong is that they haven't figured out how to train their representation space in any other way than by reconstruction. And I think that's bad. Okay. >> But their model is great. Like, it works pretty well. I mean, among all the people who work on this kind of stuff, they're pretty far advanced. >> Um
there are people who talk about similar things at Nvidia. There's the company called SandboxAQ. The CEO of it, Jack Hidary, talks about quantitative models, you know, large quantitative models as opposed to large language models. So basically predictive models that can deal with continuous, high-dimensional, noisy data, right? Which is also what I've been talking about. And Google, of course, has been working on world models, mostly using generative approaches. There was an interesting effort at Google by Danijar Hafner. So he built models called Dreamer: Dreamer V1, 2, 3, 4. >> Yeah.
That was on a good path, except he just left Google to create his own startup. [laughter] >> So I'm interested: you have really criticized the Silicon Valley culture, that they are focusing on LLMs, and this is one of the reasons that the new company is starting in Paris, right? So do you think this is something that we will see more and more, or do you think this will be very unique, that only a few companies will be in Europe?
>> Well, the company I'm starting is global. Okay. It has an office in Paris, but it's a global company. There's an office in New York, too, and a couple of other places. So, okay, there is an interesting phenomenon in industry, which is that everybody has to do the same thing as everybody else, because it's so competitive that if you start taking a tangent, you're taking a big risk of falling behind, because you're using a different technology than everybody else, right? So basically everyone is trying to catch up with the others, and so that creates
this herd effect and a kind of monoculture, which is really specific to Silicon Valley, where, you know, OpenAI, Meta, Google, Anthropic, everybody is basically working on the same thing. And sometimes, like what happened a while back, another group, like DeepSeek in China, comes up with kind of a new way of doing things and everybody's like >> [laughter] >> Right? You mean, like, people other than in Silicon Valley are not stupid and can come up with original ideas? I mean, there's a bit of a, you know, superiority complex,
right? >> But you're basically in your trench, and you have to move as fast as possible because you can't afford to fall behind the other guys who you think are your competitors. But you run a risk of being surprised by something that's completely out of left field, that uses a different set of technologies, or maybe addresses a different problem. So what I've been interested in is completely orthogonal, because the whole JEPA idea and world model is really to
handle data that is not easily handled by LLMs. So the types of applications we're envisioning, and there are tons of applications in industry where the data comes to you in the form of continuous, high-dimensional, noisy data, including video, are domains where LLMs basically are not present, where people have tried to use them and totally failed, essentially. Right? Okay, so if you don't want to be... okay, so the expression in Silicon Valley is that you are LLM-pilled. [laughter] You think that the path to superintelligence is: you just
scale up LLMs. You train on more synthetic data. You license some more data. You hire thousands of people to basically school your system in post-training. You invent new tweaks on RL, and you're going to get to superintelligence. And this, I think, is complete [ __ ]. Like, it's just never going to work. And then you add a few, you know, kind of reasoning techniques, which basically consist in doing, like, super long chains of thought, and then having the system generate lots and lots of
different token outputs, from which you can select good ones using some sort of evaluation function, basically, right? I mean, that's the way all these things work. This is not going to take us there. It's just not. So yeah, I mean, you need to escape that culture, and there are people within all the companies in Silicon Valley who think, like, this is never going to work, I want to do JEPA and blah blah blah. I'm hiring them. [laughter] >> So, yeah. So, escaping the
monoculture of Silicon Valley, I think, is important. Yeah, this is part of the story. >> And what do you think about the competition between, like, the US, China, and Europe? Like, now that you are starting a company, do you see more... I know that some places are more attractive than others? >> We're in this very paradoxical situation where all the American companies up until now, not Meta, but all American companies, have been kind of becoming really secretive
to preserve what they think is a competitive advantage. And by contrast, the Chinese players, companies and others, have been completely open. So the best open source systems at the moment are Chinese, and that causes a lot of the industry to use them, because they want to use open source systems, and they hold their nose a little bit because they know those models are kind of fine-tuned to not answer questions about politics and stuff like that, right? But they don't really have a choice, and certainly
a lot of academic research now uses the best Chinese models, certainly everything that has to do with, like, reasoning and things like that, right? So it's really paradoxical, and a lot of people in the US in the industry are really unhappy about this. They really want a serious non-Chinese open source model. That could have been Llama 4, but Llama 4 was a disappointment for various reasons. Maybe that will get fixed with the new efforts at Meta, or maybe Meta will decide to go
closed as well. It's not clear. >> Hey, Mistral just had a model release, which >> is really cool for codegen. Yeah. >> Yeah. Yeah, >> that's right. No, it's cool. So, yeah, they maintain openness. >> No, it's really interesting, what they're doing. >> Yeah. >> Wow. >> Okay. Let's go to more personal questions. >> Yeah. >> So, you are 65 years old, right? You won a Turing Award. You just got a Queen Elizabeth Prize. Basically, you could retire, right? >> Yeah, I could.
That's what my wife wants me to do. [laughter] >> So why start a new company now? Like, what keeps you up? >> Because I have a mission, you know. I mean, I always thought that either making people smarter or more knowledgeable, or making them smarter with the help of machines, so basically increasing the amount of intelligence in the world, was an intrinsically good thing. Okay. Intelligence is really kind of the commodity that is the most in demand, certainly in, like, government. Okay. So, but, like, in every
aspect of life, we are limited, as a species, as a planet, by the limited supply of intelligence, right? Which is why we spend enormous resources educating people and things like that. So increasing the amount of intelligence at the service of humanity, or the planet more globally, not just humans, is intrinsically a good thing, despite what all the doomers are saying. Okay, of course there are dangers, and you have to protect against that, the same way you have
to make sure your jet engine is safe and reliable, and your car doesn't kill you with a small crash, right? But that's okay. That's an engineering problem. There's no, like, fundamental issue with that. Also a political problem, but it's not, like, insurmountable. So that's an intrinsically good thing, and if I can contribute to this, I will. And basically all research projects I've done in my entire career, even those that were not related to machine learning, and my professional activities were all focused on either making
people smarter, that's why I'm a professor. [laughter] That's also why I'm communicating publicly a lot about AI and science and things like that, and have a big presence on social networks and stuff like that, right? Because I think people should know stuff, right? >> But also on machine intelligence, because I think machines will assist humans and make them smarter. Okay. People think there is a fundamental difference between trying to make machines that are intelligent and autonomous and blah blah blah, and that it's a different set
of technologies from trying to make machines that are assistive to humans. It's not. It's the same technology. It's exactly the same. And it's not because a system is intelligent, or even a human is intelligent, that it wants to dominate or take over. It's not even true of humans. Like, it's not the humans who are the smartest that want to dominate others. [laughter] We see this on the international political scene >> every day. It's not the smartest among us who want to be the chief. And probably many
of the smartest people that we've ever met are people who basically want nothing to do with the rest of humanity, [laughter] right? They just want to work on their problems. >> Yeah. >> Like, I mean, you know, kind of [laughter] stereotyping, and that's what Hannah Arendt talks about, the vita contemplativa, right? Versus, like, the active life or the contemplative life, right? And her, like, philosophical analysis, and, like, making a choice kind of early on on what you work on, >> Right. Right. >> But you can be, you know, simultaneously kind of, >> you know, a
dreamer or contemplative but have a big impact on the world. >> Yeah. >> Right? By your scientific production. Like, think of Einstein or something. >> Yeah. >> Or even Newton. Like, Newton basically didn't want to meet anybody, >> famously. [laughter] >> Or Paul Dirac. Paul Dirac was kind of, you know, practically autistic. [laughter] >> Well, is there, like, a paper or idea you haven't written, or something else that's, you know, nagging at you, that you want to get to, or maybe that you don't have time for, or any regret? >> Oh,
yeah. A lot. My entire career has been a succession of me not devoting enough time to express my ideas and writing them down, >> and mostly getting scooped. [laughter] >> What are the most significant ones? >> I don't want to go through that. [laughter] >> But backprop is a good one. >> Okay. >> Okay. I published some sort of early version of some algorithm to train multilayer nets, which today we would call target prop. And I had the whole backprop thing figured out, except I didn't
write it before, you know, Dave Rumelhart and Geoff Hinton. They were nice enough to cite my earlier paper in theirs. But so there's been a few of those. Yeah, the current... it's, you know, various other things, and things that are perhaps more recent. But I have no regrets about this. Like, this is life. I'm not going to say, oh, I invented this in 1991 and I should >> [laughter] >> like some people. >> I don't know if I
should say the name. [laughter] We all know. >> Well, I mean, if you know, you know. >> The way ideas pop up is relatively complex. It's rare that someone comes up with an idea in complete isolation, and that nobody else comes up with similar ideas at the same time. Most of the time they appear simultaneously. But then there are various ways to... there's having the idea, and then there is kind of writing it down, but there's also writing it down in a sort of convincing way and a clear
way, and then there is kind of making it work on toy problems, maybe, okay, and then there is making the theory that shows that it can work, and then there is making it work on a real application, right? And then there is making a product out of it. Okay, so there's this whole chain, and some people, a little extreme, think that the only person who should get all the credit is the very first person who got the idea. >> I think that's wrong. There's a lot of really difficult steps to get this
idea to the state where it actually works. So this idea of world models, I mean, goes back to the 1960s. People in optimal control had world models to do planning. That's the way NASA planned the trajectory of the rockets to go to orbit: basically simulating the rocket and, by optimization, figuring out the control law to get the rocket to where it needs to be. So that's an old idea, very old idea. The fact that you could do some level of training or adaptation
in this is called system identification in optimal control. Very old idea too, goes back to the '70s. It's something called, yeah, system identification, or even MPC, where you adapt the model as it goes, while you're running the system. That goes back to the '70s, to some paper in France. And then the fact that you can just learn a model from data, people have been working on this with neural nets since the 1980s, right? And not just Jürgen, [laughter] like, a whole bunch of people who've been working,
people who came from optimal control and realized they could use neural nets as kind of a universal function approximator, and use it for direct control or feedback control or world models for planning, blah blah blah. And like a lot of things in neural nets in the 1980s and '90s, it kind of worked, but not to the point where it took over the industry. So it's the same for computer vision, speech recognition. There were attempts at using neural nets for that back in those days, but it started
really working well in the late 2000s, where it totally took over, right? And then early 2010s for vision, mid-2010s for NLP, and for robotics it's starting, but it's >> Why do you think it's only at this time that it started to take over the >> Well, it's a combination of having the right state of mind about it and the right mindset, having the right architecture, the right machine learning techniques, like, you know, residual connections, ReLUs, whatever, then having powerful enough computers and having access to data. And it's only
when those planets are aligned that you get a breakthrough, right? Which appears like a conceptual breakthrough, but it's actually just a practical one. Like, okay, let's talk about convolutional nets. Okay. Lots of people during the '70s, or even during the '60s actually, had the idea of using local connections, like building a neural net with local connections for extracting local features. And the idea that extracting local features is like a convolution, like in image processing, goes back to the '60s. So these are not new concepts. The
fact that you can learn adaptive filters of this type using data goes back to the perceptron and Adaline, which is early '60s. Okay, but that's only for one layer. Now, the concept that you can train a system with multiple layers: everybody was looking for this in the '60s. Nobody found it. A lot of people made proposals which kind of half worked, but none of them was convincing enough for people to say, ah, okay, this is a good technique. One technique that was adopted is what's called polynomial classifiers. So now we would turn
this into kernel methods, but it's, you know, basically you have a handcrafted feature extractor, and then you train basically what amounts to a linear classifier on top of it. That was kind of common practice in the '70s and certainly the '80s. But the idea that you could train a nonlinear, I mean, a system composed of multiple nonlinear steps using gradient descent, the basic concept for this goes back to the Kelley-Bryson algorithm, which is optimal control, was mostly linear, from 1962. And people in optimal
control kind of wrote things about this in the '60s, but nobody realized you could use this for machine learning, to do pattern recognition or to do natural language processing. >> That really only happened after the Rumelhart, Hinton, Williams paper in 1985, even though people had proposed the very same algorithm a few years before. Like, Paul Werbos proposed what he called ordered derivatives, which turns out to be backprop, but it's the same thing as the adjoint state method in optimal control. So, like, those
ideas... I mean, the fact that an idea or a technique is reinvented multiple times in different fields, and then only after the fact people say, "Oh, right. It's actually the same thing, and we knew about this before. We didn't realize we could use this for this particular stuff, right?" So all those, like, claims of plagiarism, it's just [ __ ]. It's just a complete misunderstanding of ideas. >> Okay. What do you do when you're not thinking about AI? >> I have a whole bunch of hobbies that I have very little
time to actually partake in. I like sailing, so I go sailing in the summer. I like sailing multihull boats, like trimarans and catamarans. I have a bunch of boats. I like building flying contraptions. >> So, a modern Da Vinci. >> [laughter] I wouldn't call them airplanes because a lot of them don't look like airplanes at all, but they do fly. Okay. I like the sort of, you know, concrete creative act of that. My dad was an aerospace engineer, a mechanical engineer working
in the aerospace industry, and he was building airplanes as a hobby, and, like, building his own radio control system and stuff like that. He got me and my brother into it. My brother, who works at Google, like Google Research, >> in France >> in Paris, and that became kind of a family activity, if you want. [laughter] So my brother and I still do this. And then in the COVID years I picked up astrophotography. So I have a bunch of telescopes
and take pictures of the sky. And I build electronics. Since I was a teenager, I was interested in music. I was playing Renaissance and Baroque music, and also some type of folk music. So, playing wind instruments, woodwinds. But I was also into electronic music, >> and my cousin, who was slightly older than me, was an aspiring electronic musician. So we had, like, analog synthesizers, and because I knew electronics, I would modify them for him, and I was still in high school at the time.
And now, in my home, I have a whole bunch of synthesizers, and I build electronic musical instruments. So these are wind instruments. >> You blow into them. There's fingering and stuff, but >> what they produce is control signals for the synthesizer. >> Oh, it's cool. [laughter] Very cool. >> I heard a lot of people in tech are into sailing, like in >> Yeah, I've gotten that answer a surprising amount. I'm going to start trying to sail now. >> Yeah. Okay. So I'll tell you something about sailing. It's very much like the world model
story: >> to be able to control the sailboat properly, to make it go as fast as possible and everything, you have to anticipate a lot of things. You have to anticipate the motion of the waves, like how the waves are going to affect your boat, whether a gust of wind is going to come and the boat is going to start heeling, and things like that. And you basically have to run CFD in your head, because you have to figure out
like, the fluid dynamics. You have to figure out what the flow of air around the sails is, and you know that if the angle of attack is too high, it's going to be turbulent on the back and the lift is going to be much lower. So, blah blah blah. So tuning sails basically requires running CFD in your head, but at an abstract level. You're not solving Navier-Stokes, right? You have really good intuitive... So that's what I
like about it, the whole thing that you have to build this mental predictive model of the world to be able to do a good job. >> The question is how many samples you need. >> Yeah, probably a lot, [laughter] but you get to it in only a few years of practice. >> Yeah. >> Yeah. Okay. You're French and you have lived in the US for many decades already. Do you still feel French? Does that perspective
shape your view of the world, of the American tech culture? >> Well, inevitably, yeah. I mean, you can't completely escape your upbringing and your culture. So I feel both French and American, in the sense that I've been in the US for 37 years, and in North America for 38, because I was in Canada before. Our children grew up in the US, and so from that point of view I'm American. But I have a view, certainly,
on various aspects of science and society that probably is a consequence of growing up in France. Yeah, absolutely. And I feel French when I'm in France. >> I'm curious. I did not actually realize that you had a brother that also worked in tech. I'm fascinated by this because Yoshua Bengio's brother also works in tech, and I always thought that that was the only Serena and Venus Williams situation in AI, [laughter] but you too also have a brother. So how many more AI researchers... like, is it that common, that it just runs
in families? >> I have no idea. [laughter] I also have a sister who is not in tech, but she's also a professor. My brother was a professor before he moved to Google. >> Wow. >> And he doesn't work on AI or machine learning. He's very careful not to. [laughter] He's my younger brother. He's six years younger than me. And he works on operations research and optimization, essentially, >> which now is actually also being invaded by machine learning. [laughter] >> Yeah. Okay,
one more question. So, if the world models work, in 20 years from now, what is the dream? What does it look like? How will our lives be? >> Total world domination. Okay. No, it's a joke. It's a [laughter] joke. >> It's a joke. I said this just because this is what Linus Torvalds used to say. You'd ask him, like, what's your goal with Linux? He'd say total world domination, [laughter] and I thought it was super funny, and it
actually succeeded. I mean, basically, to a first approximation, every computer in the world runs Linux. >> There's only a few desktops that don't, and a few iPhones, but everything else runs Linux. So really, like, pushing towards a recipe for training and building intelligent systems, perhaps all the way to human intelligence or more, and basically building AI systems that would help people, and humanity more generally, in their daily lives, at all times, amplifying human intelligence. We will be
their boss, right? It's not like those things are going to dominate us, because, again, it's not because something is intelligent that it wants to dominate. Those are two different things. In humanity, we are hardwired to want to influence other people, and sometimes it's through domination, sometimes it's through prestige. But we're hardwired by evolution to do this because we are a social species. There's no reason we would build those kinds of drives into our intelligent systems, and it's not like they're going to develop those kinds of
drives by themselves. So yeah, I'm quite optimistic. >> Me too. >> So am I. >> All right. >> Okay, so we have final questions from the audience. So yeah, let's start. If you were starting your AI career today, what skills and research directions would you focus on? >> I get this question a lot from young students or parents of future students. [laughter] So, I mean, I think you should learn things that have a long shelf life, and you should learn things that help
you learn to learn, because technology is evolving so quickly that you want the ability to learn really quickly. And basically that is done by learning very basic things. So in the context of STEM, right, science, technology, engineering, mathematics, I'm not talking about humanities here, although you should learn philosophy, this is done by learning things that have a long shelf life. So the joke I say is that, first of all, the things that have a long
shelf life tend to not be computer science. [laughter] Okay. So here's a computer science professor arguing against studying computer science. >> Don't come to study. >> And I have a terrible confession to make, which is I studied electrical engineering as an undergrad. So I'm not a real computer scientist. Okay. But what you should do is learn kind of basic things in mathematics, in modeling, mathematics that can be connected with reality. You tend to learn this kind of stuff in engineering,
in some schools that's linked with computer science, but sort of, you know, electrical engineering, mechanical engineering, etc., engineering disciplines. You know, in the US, when you learn Calculus 1, 2, 3, that gives you a good basis, right? In computer science, you can get away with just Calculus 1. That's not enough, right? Learning probability theory and algebra, all this stuff that is really kind of basic. And if you do electrical engineering, things like, I don't know, control theory or signal processing, like, all of those methods,
like all of those methods Optimization you know all of those methods are really useful for things like AI. Um and then uh you can you can basically learn similar things in physics um because physics is all about like what should I represent about reality to be able to make predictive models right and that's really what intelligence is about. So um so I think you can learn most of what you need to learn also if you go through a physics uh Uh curriculum. Uh but obviously you need to learn enough computer science to kind of program
and use computers. And even though AI is going to help you be more efficient at programming, you still need to know how to do this. >> What do you think about vibe coding? >> I mean, it's cool. It's going to cause a funny kind of thing where a lot of the code that will be written will be used only once, right? Because it's going to become so cheap to write code, right? You're going to ask your AI assistant, like, produce this graph, or
like, do this research, blah blah blah, and it's going to write a little piece of code to do this, or maybe it's an applet that you need to play with, a little simulator, and you can use it once and throw it away because it's so cheap to produce, right? So the idea that we're not going to need programmers anymore is false. The cost of generating software has been going down continuously for decades, and that's just the next step of
the cost going down, but it doesn't mean computers are going to be less useful. They're going to be more useful. >> Okay, one more question. What do you think about the connection between neuroscience and machine learning? There are a lot of times ideas that AI borrows from neuroscience, and the other way, right? Predictive coding, for example. Do you think that it's useful to use ideas from >> Well, there's a lot of inspiration you can get from neuroscience, from biology in general, but neuroscience in particular. I certainly was
very influenced by classic work in neuroscience, like, you know, Hubel and Wiesel's work on the architecture of the visual cortex is basically what led to convolutional nets, right? And I wasn't the first one to use those ideas in artificial neural nets, right? There were people in the '60s trying to do this. There were people in the '80s building locally connected networks with multiple layers. They didn't have ways to train them with backprop. You know, there was the Cognitron, the Neocognitron from Fukushima,
which had a lot of the ingredients, just not a proper learning algorithm. And there was another kind of aspect of the Cognitron, which is that it was really meant to be a model of the visual cortex. So it tried to reproduce every quirk of biology. For example, the fact that in the brain you don't have positive and negative weights, you have positive and negative neurons. So all the synapses coming out of an inhibitory >> neuron have negative weights. >> Yeah. >> Okay. And all the synapses coming
out of a non-inhibitory neuron have positive weights. So Fukushima implemented this in his model, right? He implemented the fact that neurons spike. So he didn't have a spiking neuron model, but you cannot have a negative number of spikes, and so his function was basically >> rectification, like a ReLU, except it had a saturation. >> Yeah. >> And then he knew from various works that there was some sort of normalization, and he had to use this because otherwise, there was no backprop, so the activations in
this network would go haywire. So he had to do, like, divisive normalization, which turns out to actually correspond to some theoretical models of the visual cortex that some of our colleagues at the center for neuroscience at NYU have been pushing, like David Heeger and people like that. So yeah, I mean, I think neuroscience is very much a source of inspiration. More recently, the sort of macro architecture of the brain, in terms of perhaps the world model and planning and things like this, like, how
is that >> reproduced? Like, why do we have a separate >> module in the brain for factual memory, the hippocampus, right? And we see this now in certain neural net architectures, that there is a separate memory module, right? Maybe it's a good idea. I think what's going to happen is that we're going to come up with new AI neural net architectures, deep learning architectures, and a posteriori we will discover that the characteristics that we implemented in them actually exist in the brain, [laughter] and
in fact that's a lot of what's happening now in neuroscience, which is that there is a lot of feedback now from AI to neuroscience, where the best models of human perception are basically convolutional nets today. >> Yeah. Okay. Do you have anything else that you want to add, to say to the audience? >> Whatever you want. >> I think we covered a lot of ground. I think you want to be careful who you listen to. >> So don't listen to AI scientists talking about economics.
Okay? So when some AI person, or even a business person, tells you AI is going to put everybody out of work, talk to an economist. Basically none of them is saying anything anywhere close to this. Okay. And the effect of technological revolutions on the labor market is something that a few people have devoted their careers to studying. None of them is predicting massive unemployment. None of them is predicting that radiologists are going to be all unemployed, [laughter] etc. Right? Also realize that actually fielding practical
applications of AI, so that they are sufficiently reliable and everything, is super difficult and very expensive. And in previous waves of interest in AI, the techniques that people had put big hopes in turned out to be overly unwieldy and expensive except for a few applications. So there was a big wave of interest in expert systems back in the 1980s. Japan started a huge project called the Fifth Generation Computer project, which was like computers with CPUs that were going to run Lisp and, you know, inference engines and stuff,
right? And the hottest job in the late 80s was going to be knowledge engineer. You were going to sit next to an expert and then turn the knowledge of the expert into rules and facts, right? And then the computer would be able to basically do what the expert would. This was manual behavior coding. Okay. And it kind of worked, but only for a few domains where it made sense economically and where it was doable at a level of reliability that was good enough. But it was not a path
towards human-level intelligence. And so the idea, the delusion that people have today, that the current mainstream AI fashion is going to take us to human intelligence, has happened already three times during my career and probably five or six times before, right? You should see what people were saying about the perceptron, right? There were New York Times articles. People were saying, oh, we're going to have super-intelligent machines within 10 years. Marvin Minsky in the 60s said, oh, within 10 years the best chess
player in the world would be a computer. It took a bit longer than that. And, you know, this happened over and over again. In 1956 or something, Newell and Simon produced the General Problem Solver, very modestly called the General Problem Solver. Okay. What they thought was really cool. They said, okay, the way we think is very simple. We pose a problem. There are a number of different solutions to that problem, different proposals for a solution, a space of
potential solutions. Like, you know, you do traveling salesman, right? There are a number of, you know, n factorial possible paths. You just have to look for the one that is the best, right? And they said every problem can be formulated this way, essentially as a search for the best solution. If you can formulate the problem as an objective, by writing a program that checks whether it's a good solution or not, or gives a rating to it, and then you have a search algorithm that searches through the
space of possible solutions for one that optimizes that score, then you've solved AI. Okay. Now, what they didn't know at the time is all of complexity theory, that basically every problem that is interesting is exponential or NP-complete [laughter] or whatever, right? And so, oh, we have to use heuristic programming, you know, kind of come up with heuristics for every new problem. >> Yeah. >> And basically, you know, the General Problem Solver was not that general. So this idea somehow that the latest idea is going to take you to, you know, AGI or
whatever you want to call it is very dangerous, and a lot of very smart people fell into that trap many times over the last seven decades. >> Do you think that the field will ever figure out continual or incremental learning? >> Sure. Yeah. That's not a technical problem. >> Well, I thought catastrophic forgetting, right? Because the weights that you spent so much money training get overwritten. >> Sure. So you train just a little bit of it. I mean, we already do this with SSL, right? We train a model,
like for video or something. Like V-JEPA 2, you know, produces really good representations of video. And then if you want to train the system for a particular task, you train a small head on top of it, and that head can be learned continuously, and even your world model can be trained continuously. That's not an issue. I don't see this as a huge challenge, frankly. In fact, Raia Hadsell and I and a few of our colleagues, back in 2005, 2006, were building a vision-based navigation system for mobile robots
that had this kind of idea. So it was a convolutional net that was doing semantic segmentation from camera images, and on the fly the top layers of that network would be adapted to the current environment, so it would do a good job. And the labels came from short-range traversability that was indicated by stereo vision, essentially. So yeah, I mean, you can do this, particularly if you have multiple modalities. Yeah, I don't see this as a big challenge. >> It's been a pleasure to have you.
>> All right. It was a real pleasure. >> Thank you so much. >> Thank you. [music] [music]