How about outside the United States, maybe even outside the largest, most developed economies? How do you think AI is going to affect the world more generally?
>> I hope that AI will have a very large democratizing effect, because one of the most expensive things in today's world is intelligence. It costs a lot to get a highly skilled specialist doctor to tell you what's going on, or to get a highly skilled tutor to mentor your child one-on-one. And whereas I don't see a path to making human intelligence cheap, it just costs a lot to train a skilled human being, there is a path to making artificial intelligence cheap. What this means is that today, only the relatively wealthy can hire certain types of staff to do certain things for them. But in the future, I would love it if every single person could have an army of smart, well-informed staff to do all sorts of things for them.
>> Their health advisor, their tutor, that kind of thing.
>> Yes. And I think that giving everyone an army of staff to help them, which today is available only to the relatively wealthy, will lift up a lot of people.

>> Andrew, welcome. It's so good to have you back.
>> Always great to see you, Astro.
>> Awesome. We're going to talk about a lot of different things, but I want to start with something that I remember fondly. We were in graduate school at roughly the same time. And when we were talking to you
about starting something at X that later got called Google Brain, you had done something really unusual and memorable for your graduate thesis. Can you tell the audience what your graduate thesis was, both what was technically interesting at a very high level, and also what you made happen?
>> For my PhD thesis at Berkeley, I built a little neural network that flew a helicopter. I guess it was unusual at the time because reinforcement learning, which is hot now, wasn't hot back then. I wound up asking some friends to let me use a helicopter, and then trained a little neural network, with a little algorithm that we invented, to keep it hovering in the air, and it stayed rock solid. You could watch a video of it and ask, is this really a video, or a still picture? And I thought that was cool. It actually got reinforcement learning a lot more attention than it was getting at the time. And it was fun to fly helicopters.
>> Exactly. People listening to this now might not remember, because now that they see all these vertical-takeoff-and-landing kinds of craft, they think that something that's rock steady in the air is pretty easy. But back then, something that learned how to do that was a bit of a kick in the head, in an exciting way, to the field.
>> It was fun. I think I've been fortunate, for a lot of the things I do, to go off the beaten path and do something weird. Sometimes it doesn't work out; that's part of what happens when you go off the beaten path. But when it works, as with the helicopter result, it can get a lot of attention, and it moved reinforcement learning forward at the time.
>> Yeah. I remember, as we were talking from the X perspective about starting what ultimately became Google Brain, here at X and later at Google, which you co-founded, you brought this piece of history: what you had written when you were a young professor at Stanford about what you thought we, or Stanford, or later X, should do. My memory, not having skimmed this recently (I'm very curious to go look at it again), is that there were two things in your thesis. One of them, which I and Sebastian, my co-founder and co-director of X at the time, very much bought into, was that scale matters: that Yann LeCun and others had sort of shown this academically, but that no one had actually managed to deliver scale, so it wasn't proved yet, even though everyone strongly suspected it.
>> I would say that at that time people hadn't proved it academically. In fact, I remember going to NeurIPS, at that time called the NIPS conference, and talking to people, saying we've got to scale up deep learning algorithms. And I was getting advice from very senior people saying, hey Andrew, why are you trying to build bigger neural networks? Go invent new algorithms.
>> This was 2010, 2011?
>> Yeah. I remember pitching what became Google Brain to Larry Page in 2010, but it was around 2008 that I was going around academic conferences, and at that time scale was actually controversial. People did not believe in it, and well-meaning, very senior people pushed back. I remember Yoshua Bengio saying, hey Andrew, this is not good for your career.
>> That's very ironic in retrospect, given not only how right you were, but how good for your career in particular it's been. The other theory that I remember, correct me if I'm wrong, that I felt was part of your thesis, was that in the human brain, and in other brains, a lot of different signals, the signals from our nose, the signals from our eyes, the signal
from our ears, even from our taste buds, go through similar parts of our brain.
>> And you were asking the question: maybe there's something useful about forcing a system to be overloaded in that way, and asking a system to do a lot of very different things might cause it to be less brittle, more intelligent. Am I remembering that correctly as part of the thesis you had at the time?
>> Yeah, there were kind of two parts. It was really the one-learning-algorithm hypothesis. I was inspired by these neural rewiring experiments, which show things like: if someone tragically has damage in part of the brain, other parts of the brain, the same physical brain tissue that was previously learning to hear, could learn to see. It really made me wonder, and I wasn't the only one: do we need totally different software or algorithms for seeing and hearing and doing all these different things, or is there one learning algorithm that, depending on what data you give it, text or images or audio or something else, will learn to figure out how to deal with that data? In hindsight, I think this one-learning-algorithm hypothesis turned out to be much more right than wrong. But also looking back, I think I overemphasized looking to neuroscience for inspiration. The specifics from neuroscience mostly turned out not to be helpful, but the high-level idea held up: maybe the human brain has one algorithm that does a lot of stuff, so we should try to get a computer to have one algorithm, rather than having 10,000 people invent a thousand algorithms. Maybe a small team can invent one algorithm and just feed it very different data. And that really worked out.
>> Right. Which at the time was heresy, but is now what everyone is doing.
>> Yes. I actually remember speaking at a National Science Foundation workshop where I was talking about the one-learning-algorithm hypothesis. I was younger at the time, and I was kind of poking fun at computer vision people hand-engineering features. I remember one very senior computer vision researcher stood up in public and kind of yelled at me. As a young professor, that was slightly traumatizing, but years later I can look back on it and smile and say it actually worked out okay.
>> Yeah. And people who are changing a field, as you did, generally do get yelled at along the way. So, as you were coming into X: you had this vision, and you could have tried it in different places and in different ways. Why come to X? What was your memory
of choosing to come to X with this idea, your initial experiences with me, with Sebastian, with Larry and Sergey?
>> Yeah. So I remember, at the time Sebastian Thrun was running X together with you. Sebastian deserves much more credit for the starting of Google Brain than I think he's received so far, candidly. Sebastian and I had offices at Stanford right next to each other; we shared a wall. If I banged on my wall, he'd hear it on the other side. And my students at Stanford, Adam Coates and others, had demonstrated that the bigger we built a neural network, the bigger we built these learning systems, the better they performed. So I felt like I had this secret data. It wasn't secret, I talked about it, but people just didn't believe me, which maybe was a positive thing, I don't know: data showing that the bigger we build neural networks, the better they perform. So I was talking to people, saying we should scale up these algorithms. And Sebastian pointed out to me: Google has a lot of computers. Why don't you pitch Google to let me use Google's massive compute infrastructure to build much bigger neural networks than anyone else had built? So Sebastian set up a meeting for me to pitch to Larry Page. And I remember I prepared slides on my laptop, came all prepared with slides, but we were eating at a Japanese restaurant, so it was inconvenient to pull out my laptop. So I wound up just talking to Larry, with Sebastian there as well. And fortunately, Larry happened to buy into whatever I was telling him, and then authorized letting me work with Sebastian and you and X to take forward the project that later became Google Brain. I still remember that dinner clearly. It was a very high-stakes conversation for me, and I'm really grateful to this day to Larry Page for buying into what was a pretty crazy vision at the time.
>> Neural networks: by 2012 things had changed, and people were starting to get pretty excited about neural networks. But even in 2010, neural networks were mostly
still out of vogue, and had been for a long time in the artificial intelligence world. What was your thinking? Scale is one thing: we can scale these things up, and scale may really matter. Separate question: what to scale up? Any thoughts on neural networks as a representation? That's now something everyone just takes for granted, as being sort of in the water. But back in 2008, neural networks were not what everyone was doing, and it was not at all taken for granted that they were going to be the breakthrough for artificial intelligence.
>> Yeah, neural networks had been out in the wilderness, kind of rejected by a lot of the AI community, for a long time. In fact, looking back, it was difficult to publish neural network papers in the leading conferences, which is why a lot of my early work was published in workshops rather than in the main conferences at the time. Back then, a lot of the intellectual excitement, the way you got a paper published at a top conference, was by doing really tricky mathematical work: have a really clever idea, maybe prove a theorem. That's how you got a research paper published and earned the respect of your peers, with your very clever ideas. And then I showed up and said, you know what, let's get a lot of computers and make this much bigger. And that was viewed as: gee, where's the intellectual rigor in that? You're just building stuff. Why do you want to do that? So it was really controversial. And frankly, as scaling up deep learning started to really take off, I saw that for some of the people who had spent 20 years of their careers tweaking algorithms, it was actually emotionally wrenching, because they had spent decades fiddling with these algorithms in very clever ways. And then a bunch of people like us said, you know what, let's build a really big computer and throw a lot of data at it, and when that started to outperform their decades of intellectual work, it was tough. Many of them adapted and kept on doing good work, but when a disruptive innovation comes along that obsoletes what someone's worked on for a long time, it sometimes takes a while to adjust. It turns out our first paper pushing the use of GPUs to scale up neural networks, another controversial idea, was published in a workshop because we couldn't get accepted at the conference at the time; and now it's so obvious that everyone knows you should use GPUs. I think one of the things I was seeing back then was that there was a small group of us, not just me, also at the meetings in Canada where Geoff Hinton and others hung out, a few of us generating data showing real momentum. A lot of times, when there's a disruptive innovation, at first it really doesn't compete with the incumbent technology. At that time, the neural networks we were training were definitely worse than traditional computer vision algorithms and traditional text processing algorithms, but we knew we were onto something, because while not yet competitive, it was rapidly getting better. And my students and I at Stanford saw that if only we could build bigger versions, then it would become competitive. And that was the bet to place.
>> And as a very young professor at Stanford, you were describing before how you got a lot of pushback on the idea that scale might really matter. What gave you the confidence, in those early days, to keep pushing for it, when there were people who not only disagreed with you but were angry at you for saying this was the way forward?
>> I had secret data.
Well, not really secret: we actually published it, but others didn't believe me, so it might as well have been secret. My students, Adam Coates and others, generated a chart where the horizontal axis was the size of the model and the vertical axis was how well it performed. We tried a ton of different models, and for every single model we tried, the curve just went up and to the right. So I knew, based on data, that the bigger we could build the models, the better they would perform. And I think as a scientist, or as an innovator, you don't get to do good work by just asking what everyone thinks and taking an average.
>> Makes sense.
>> It's fine to ask people what they think, but in the end, you have to have a hypothesis for what you believe in. And mine was shaped by data that we had generated at Stanford, that we published, but that somehow I struggled to get people to pay attention to. So we actually had a long head start on scaling before other teams jumped onto that too.
>> Once Google Brain was
starting to get set up, Jeff Dean kind of ended up as your partner in crime in building out Google Brain. How did you two meet? How did that work out? How did you divide up the work?
>> I think I was really fortunate that Jeff Dean joined the project. When Sebastian Thrun and I, under Larry Page's direction, were working to start the project, Larry asked me to speak with a lot of people in Google. So I remember chatting with Jeff Dean, Greg Corrado, Tom Dean, Jay Yagnik, and many others. And I remember pitching to Jeff this idea that if only we could build bigger neural networks, things would get better. And that was the idea that excited Jeff. Then, as the project got going, all of us working on the project at the time knew that if we could get Jeff more and more involved, he would be a tremendous value add,
>> A force multiplier for the team.
>> And so, I don't know if any of us ever told Jeff this, but we actually had some conversations, like Greg Corrado and I chatting: all right, what do we do to make sure Jeff is excited and keep him engaged? We always wanted to make sure he was excited and wanted to do more and more, and fortunately he did. When he became highly involved day-to-day, he wound up being the systems person. I mean, he built a lot of Google's infrastructure; he understands scaling at a very deep level. And I wound up being the machine learning person. I think it was that partnership, me bringing the machine learning expertise and Jeff bringing the computer systems expertise, that allowed us to use Google's infrastructure to scale up machine learning algorithms and deliver real results.
>> Yeah, there was an interesting analogy at the time. One of the things that Jeff had brought to Google, and to the world, was taking a really hard problem, looking at all of the potential search information in the world, finding what you're looking for, and returning it within milliseconds, which means
you have to split up the problem. And it turned out that that splitting up of the problem, and then recombining the results, which was central at the time to how Google solved its problems, turned out to be very similar to the training work you all did as you trained larger and larger neural networks, right?
>> Yeah. Jeff had invented this technology called MapReduce, which takes work, splits it up, does it on lots of computers, and brings the results back together. That was version one of how we did some of the training. And then, as we built more and more versions, which eventually led to things like TensorFlow, the technology stack kept on evolving. I would say one thing we were slower to do at Google was embrace GPUs, partly because Google had such a brilliant CPU compute infrastructure.
>> Yeah. So let's dive into that for a second, because our commitment at X, generally, was that we would be mostly focused on things that had some hardware component. So in the early days, we were excited about Google Brain being at X partly because we imagined there might need to be specialized hardware to go with the specialized software, a brain to go with the mind, as it were. And then over time, you and the team were having so much success that, at least for a while, the interest in specialized hardware went away. But then later, Google Brain ended up starting the whole TPU process, though that was after it left X. Any memories about how we got to both temporarily not doing hardware and then later getting back to hardware?
>> Yeah. So, you know, I think Google Brain made a ton of great decisions. One that I wish we'd made differently, even earlier, was the GPU, and maybe TPU, decision. I remember Jeff and I were actually speaking with a lot of the data center operators, the people building up Google's large clusters and so on. And there was a concern at the time, which was legitimate, that if we started sticking a few GPUs here and there, it would create a very heterogeneous compute environment that became hard
to use.
>> Can you unpack for people what a GPU is and what a TPU is, just so they recognize those terms?
>> Sure. Most computers are powered by CPUs, central processing units. GPUs, graphics processing units, were originally designed for producing computer graphics, but turn out to be fantastic for training very large AI systems, very large neural networks. And then TPUs, Google's invention, the Google Brain team's invention, are Google's take on specialized hardware for training these very large models.
>> Tensor processing units.
>> Tensor processing units, right. And so I think we saw that GPUs were working well. In fact, early on in Google Brain we were working on speech recognition, and we actually had a GPU server, one, maybe two. I can still picture it sitting under someone's desk with a nest of wires feeding into it. And with that one computer, I think we saw GPUs were promising. But the concern, from the Google data infrastructure point of view, was that Google at that time had a compute infrastructure where someone could write code and have it run pretty seamlessly almost anywhere, whereas a GPU is a very different type of hardware, and it would change the work programmers have to do to specialize for it. So we were looking at: boy, if we buy a lot of GPUs, could we use them also for YouTube transcoding? Are they good for anything other than training AI models? And I think because of those things, we stalled out a little bit and did not pursue that as aggressively here at Google as maybe I should have pushed for. I wound up actually doing demo work using GPUs with my Stanford group, because that was a scrappy team that could deal with very messy infrastructure and was okay with it. But having said that, I think we got quite far with CPUs, and then, as Brain started to move more into GPUs a little bit later, and then building TPUs, clearly that worked out just fine.
>> It was after Google Brain had left X that the transformer was formally invented, that the paper was published. Did you see little bits of transformer, or transformer-like, work? Tell us about maybe some of the snags you hit in the early days of Google Brain, and also some of the things that felt exploratory then but turned out to be really important.
>> You know, the brilliant thing about the transformer paper, and I think to this day only some people understand this, was that the authors grew up in that Google Brain tradition of scale. And so a lot of the decisions about how to architect the transformer network were all about designing a neural network that would scale well
on GPUs. So a lot of things, like the attention mechanism, which is a very clever way for a neural network to decide which parts of a sentence to pay attention to.
>> Explain to the audience what the transformer innovation was around attention.
>> So, before the transformer paper, there were algorithms where, if you wanted to translate a sentence from English to French, you would read the whole sentence, try to memorize the whole sentence, and then try to regurgitate the French translation. And it kind of worked okay.
>> That's pretty hard, right?
>> It was pretty hard, especially for a long sentence. And the transformer paper had this innovative architecture that keeps the English sentence around, and then, as you're writing the French sentence, depending on where you are in generating the output, you can pay attention to the specific part of the English sentence you're translating. It turned out that this requires a lot of computation, to be able to look at the entire sentence in English and the entire sentence in French and figure out what to look at as you generate each output. But because it scaled really well on parallel hardware, GPUs and TPUs, it worked really well. And this later became, as we all know, the foundation for modern foundation models, where instead of translating from English to French, we quote-unquote translate from a user prompt to the answer to whatever the user is asking. A large part of why the transformer paper worked so brilliantly, and got so much traction, was that the authors designed the neural network architecture very cleverly to make sure that every single step was highly parallelizable and could run well on the GPU. And that gave it a fantastic compute substrate to train on tons of data, and that made it work really well.
>> In the early days of Google Brain, when it was still at X, you could have worked on anything. You could have worked on translation, you could have worked on turning speech into text, or image recognition. How did you pick a few things to focus on? And did you throw some away because they weren't working as well, or because they
would be less commercially useful?
>> So, one of the first things I worked on when I started at X was actually to teach a class within Google on neural networks. I think Tom Dean and Greg Corrado and I worked closely on this. It turned out to be great; I think something just shy of 100 people came. We were meeting every week, sharing my weird ideas about neural networks and scale and what we were doing at Google Brain. Fortunately, this helped us make many friends and find many allies across Google. And so one of the first teams we wound up working with was the speech team, for two reasons. First, we felt there was a lot of potential for scale to improve speech.
>> By speech, you mean listening to the audio of speech and figuring out the words being said, so you could literally write down the text that comes from that audio.
>> Yes, speech recognition, right. At that time, I think voice search wasn't yet as mature as it is today.
But the idea that you could talk to your mobile app and use your voice to search on Google, that was really exciting. So we wanted to improve the accuracy of speech transcription. And at that time, the speech team was already looking a little bit into neural networks, and we felt that by helping them scale, we could help Google speech recognition improve. So it wound up being a little bit opportunistic, based on who wanted to work with us, and on who we thought we could work with to drive the scale hypothesis.
>> That gave you a team to work with and help. It also helped you understand whether you were becoming good enough to be useful, right? Because they had a very clear benchmark of what they considered hard, and so of what they would consider impressive in terms of progress.
>> Yeah. So we were fortunate that we could work on deep tech innovation, invent neural network architectures, and then also be held accountable for delivering real business results relatively quickly. So I remember the work on speech; working on Google Street View, where we were at the time using computer vision to look at Street View images and read house numbers, to more accurately geolocate houses in Google Maps, which turned out at the time to be a bigger-impact thing than speech recognition; and conversations on how to help advertising. I remember some early skepticism about web search: I struggled to convince the web search team at the time. Fortunately, the advertising team was much more open to this.
>> Do I remember correctly that you were also looking at YouTube videos and doing some filtering for inappropriate content?
>> Yes. Jay Yagnik's team at the time was running AI on YouTube, and did a lot of really good work helping tag YouTube videos based on their content, and also some of the moderation filtering. So, because of the class I led with about 100 Googlers, there was actually a lot of interest from different application teams, and we were fortunate to have a lot more people wanting to join Google Brain, even early on, than we had headcount for. But sometimes, if someone wanted to join us and we just couldn't bring them in full-time, we'd say, let's just work together, and that set up a lot of collaborations.
>> It was maybe just under two years, I think, from the day you started at X till Google Brain graduated from X and moved to Google. What did you think of that graduation? Was it about time, or we're being kicked out of the Garden of Eden,
or something in between? How did that process of moving over to Google feel? Or was it, well, it's all just kind of Google, so it doesn't matter?
>> It was a little bit of all of the above, to be honest. X was, and still is, a very special place. I remember, when I was working in the X building way back then, it was really nice that maybe ten feet from me would be the Chauffeur, now Waymo, team, and then the team working on balloons, and also the Glass team. Just several feet from my desk were all of these teams doing all of these wild, exploratory, insanely exciting things. So while leaving X was presented as a graduation and the next step, and ultimately it wasn't a bad thing that we moved to Google core, became closer to the business, and got more resources (no regrets at all; I think it helped set us up for success), it was also a little bit bittersweet to leave behind that wildly exciting X building, with crazy stuff happening just a few feet from where I was sitting every day.
>> What changed after the team moved? You stuck around at Google for another maybe year and a half after it moved.
>> I think so. Yeah. I think after we moved, we became more focused on one thing, which was neural networks and scaling. We probably spent less time hanging out with the Waymo people and getting free rides in the very early Waymo prototypes. And then I think we became more, I was going to say corporate, but not at all in a bad way. I think it was helpful for the Brain team that we became much more connected to a lot more Google businesses. One thing I believed back then, and still believe now, is that this technology is exciting, and we should work on deep tech, but in isolation it's completely useless; the value comes when we find applications for it. So when we moved into the main Google building, we were physically much closer to a lot of the important application teams. It was a minute's walk to talk to different teams building really important applications that we could collaborate on.
>> You wound up shifting gradually from Google Brain to running Coursera more day-to-day.
>> Yeah. I had started Coursera out of my machine learning courses at Stanford, and my co-founder Daphne and I were running it day-to-day. Partly because Google Brain was going well, and I was confident I could hand off the leadership of the team to Jeff Dean, who was a wonderful partner, whereas in contrast Coursera was a very early thing that needed much more day-to-day leadership, I talked to Jeff, and very gradually, over the course of about a year, handed over the reins to him. And I think that fortunately worked out well too.
>> Yeah, for sure. You're still on the board of Coursera, aren't you?
>> Yes, I'm still chairman of the board.
>> Congratulations.
>> Oh, thank you.
>> So I'd love to hear from you a little about where you think AI and machine learning is going, and
also where you've gone afterwards, what you're doing now.
>> Yeah. So these days, including taking some early lessons I learned by watching you, Astro, and X operate, I spend a lot of my time running AI Fund, which is a venture studio; we build an average of about one new startup per month. I also continue to do a lot of AI education through DeepLearning.AI and Coursera. But I think AI is wildly exciting. Companies like Google are doing a fantastic job training foundation models. With the latest version of Gemini, I think the team really did a great job. And I'm wildly excited about the number of applications that will be built on top of these foundation models. It feels like, when I go to work every day, there are so many cool applications where I think there's clear market demand, things that would make people's lives better, and it's just that no one's gotten around to building them yet. So I find that very exciting.
>> You said one new startup per month. That's the rate at which they're coming out of your pipeline.
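The numbers Andrew gives in the next exchange (six months idea to launch, three months of that with a hired CEO, roughly a 75% graduation rate) imply a certain intake rate. A back-of-the-envelope sketch, using only the figures quoted in the conversation; this is my arithmetic, not AI Fund's own model:

```python
# Back-of-the-envelope pipeline arithmetic from the figures quoted in the
# interview. Nothing here comes from AI Fund itself; it is a rough model.

launches_per_year = 12      # "one new startup per month"
graduation_rate = 0.75      # ~75% of CEO-led projects launch
ceo_phase_months = 3        # time each hired CEO spends in the studio

# To launch 12 companies a year at a 75% graduation rate, roughly
# 12 / 0.75 = 16 CEO-led projects must start each year.
ceo_starts_per_year = launches_per_year / graduation_rate

# With each CEO phase lasting 3 months, that implies about
# 16 * 3 / 12 = 4 projects running in the CEO phase at any given time.
concurrent_projects = ceo_starts_per_year * ceo_phase_months / 12

print(ceo_starts_per_year)   # 16.0
print(concurrent_projects)   # 4.0
```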
>> Yeah.
>> How long do they spend in your pipeline?
>> About six months from idea to launching a startup, but about half of that time is spent getting up to hiring a CEO. Once we hire a CEO, they spend three months with us, after which, with about, let's say, a 75% graduation rate (for the other 25%, we, or they, decide not to move forward), we launch a startup. And I think one of the things that's changed in AI is that the cost of prototyping has gone down dramatically. If you have an idea, it's so inexpensive now to build a prototype, take it to users, and validate or falsify it. And if it's falsified, great: you just wasted, you know, two days and $5,000 or something. This is really picking up the pace of innovation, especially in the application layer, where we take AI and build applications, as opposed to the AI technology, foundation model layer, which still needs these massive billion-dollar budgets and massive data center buildings.
>> Yeah. I think about it as the difference between, say, electricity and transistors, the foundational layers of what by the end of the 20th century was the computer industry, the internet, the infrastructure. All of that was profoundly enabling, but we also then had tens of thousands of things to build on top of it to realize its value. In that same way, the foundation models, the machine learning and these large models that are now available to everybody in the world, are like the electricity. It's like the transistors.
It enables an incredible number of things, but you have to go do things with it. >> Yeah. In fact, if you look at the electrification of the United States and other countries, building electric power plants was a great business. Lots of people built electric power plants, and they did very well. But if you look at the consumer electronics industry, the things built using electricity, that's far bigger than the power plant industry, and I think it will be like that for AI. Building AI models will be huge. It will be massive, but it won't be nearly as massive as the collective set of things built on top, the tons of applications. >> I love your passion about artificial intelligence, and we'll come back to that. You also have a passion for education. You mentioned Coursera. You were a professor for a while, but I'm betting that your passion for education is much bigger than the time you spent actually teaching. Tell me about your passion for education. >> I grew up, you know, trained by my parents and so on to realize it's not
about me. It's always about setting others up for success. Then at Stanford, I remember teaching machine learning, walking into the same room, delivering the same lecture year after year, even telling the same jokes, and after a while I asked: is this really the best use of my time in terms of setting students up to be successful? So over a span of a few years, I wound up getting the videos recorded and posted online, free for anyone to access, and prototyping things like auto-graded quizzes. We learned from Sal Khan's Khan Academy that we should do shorter videos. And it turns out that before Coursera, which, you know, suddenly went viral, there were I think five other versions that you would never have heard of, some of which had like 20 users or something, but that allowed me to learn important lessons about how to build a scalable online education platform. When that worked out, I felt there was an opportunity to take training to a very large audience. So I invited Daphne Koller to join me, and we wound up building up
from there. >> And tell me about artificial intelligence and machine learning. When did you get passionate about that? How did that bug start with you? At what age? >> I remember it was in high school. I did an internship as an office admin, and I just remember doing so much photocopying. Not my favorite, and frankly, boy, it was boring. And I remember as a teenager thinking, oh boy, if only there were some sort of automation that could do all this photocopying for me, maybe I could do something more fun. So since a very young age, I was really excited about automation and how it could free up people's time. I was fortunate also that my father, a doctor, was at the time experimenting with very rudimentary AI algorithms for medical diagnosis. So it was my distaste for the amount of photocopying I had to do as an office admin, plus learning about neural networks as a teenager. I think since then I've been passionate about AI as a form of automation on steroids. >> So since we started talking about this
vision that you had, which at the time, literally, people were getting up and yelling at you about: what do you think are some of the things coming down the pipe for us as humanity with respect to artificial intelligence and machine learning that people don't see coming clearly yet? I'm not trying to suck you into some unproductive AGI conversation, but how do you think the world's going to be in, call it, 10 years, in ways that might surprise listeners? >> There's one thing that I'd love to see. I feel like I'd love to
see everyone learn to code in this new style of coding, which is much more AI-assisted. And the reason is, in my professional life, obviously, I do a lot of coding, but in my personal life I write applications for my kids. You know, a week ago I was writing an application to print out flashcards for my daughter to practice her multiplication tables. It took me less than a day to build a prototype that I could call up on my phone and talk to, you know, to get a custom prompt and have it talk to me about topics. A lot of these prototypes that used to take weeks or months to build can now be built in hours, or maybe less than a day, without writing that much code, because AI can write the code for you. And I think the demand for software engineering is massive. So many of us would love for so many more programs to be written, but it's just been too expensive. I think only four states out of 50 in the United States really require some computing education to get a high school diploma. I hope it will someday be 50 out of 50, because if we can get everyone to learn how to use computers to build things, not just be users of computers but build alongside computers, I think every human can be much more powerful. It turns out that one of the most important skills going forward will be the ability to get computers to do what you want them to do, because computers are becoming more and more powerful. And I feel like a world where we teach all children to code in this new way will set up the next generation to be much more powerful than the current one. >> And how about outside the United States, maybe even outside of the larger, most developed economies? How do you think AI is going to affect the world more generally? >> I hope that AI will have a very large democratizing effect, because one of the most expensive things in today's world is intelligence. It costs a lot to get a highly skilled specialist doctor
or to tell you what's going on, and it costs a lot to get a highly skilled tutor to mentor your child one-on-one. I don't see a path to making human intelligence cheap; it just costs a lot to train a skilled human being. But there is a path to making artificial intelligence cheap. And what this means is that today, only the relatively wealthy can hire certain types of staff to do certain types of things for them. But in the future, I would love it if every single person could have an army of smart, well-informed staff to do all sorts of things for them. And I think this would >> Their health advisor, their tutor, that kind of thing. >> Yes. I think that giving everyone an army of staff to help them, something that today is available only to the relatively wealthy, will lift up a lot of people. >> I'm really curious to hear your definition. This is a little tongue-in-cheek, but I've always felt, as sort of a longtime practitioner, and I know you are as well, obviously, that AI has traditionally
been this sort of receding frontier, where as things start to work and become an everyday part of our lives, we stop calling them artificial intelligence. So my favorite working definition of artificial intelligence is: the things that computers do in the movies. Which in a certain sense is totally unfair, but you remember that moment when computers started being better than people at chess, and all of a sudden that didn't count as intelligence anymore, by definition, because computers were better than people at it. And I was like, well, okay, that's not a very useful definition of intelligence
then. How would you define artificial intelligence? >> One of the things that I think has contributed to AI's success is that, while on one hand it sometimes feels like it's always far off, as a field we've been pretty embracing of whoever wants to enter our field and call their work AI. So myself, if someone is doing something that makes a computer demonstrate some semblance of intelligence and they want to call it AI, fine with me, I will agree with them. I think the fact that we're quite embracing, that if you want to call your work AI, it's okay, as opposed to too many of us going around saying, "No, that's not really AI," just lets our field keep growing. >> So maybe as a build on that: we would call it AI, in a very general sense, to the extent that if a person did the same thing, we would call that behavior intelligent. >> Yes. And the criticism is, you know, very simple programs use an if statement to make simple decisions; is that intelligence, and is that really artificial intelligence? I'd
be inclined to say yes: if you think it's intelligent, call it AI, and I will fully support that. I find that disciplines tend to be more successful when we just embrace whatever works, rather than being too defensive and saying, no, you're not one of us. I think AI has avoided that. >> I agree. So the culmination, in a way, of Google Brain's time at X was a paper that was published to quite a bit of fanfare in the New York Times, about cats and
cat videos. Can you tell us a little bit about what led up to the work that was featured there? What specifically was being highlighted? Because this was sort of the coming-out moment for >> Google Brain. >> Yeah, when we announced Google Brain, it was through that now slightly infamous Google cat paper. >> Yeah. >> I remember we had this idea that, to get enough data to learn from, we wanted to learn from unlabeled data. Labeled data means you had someone look at pictures and say, this is a dog, this is a cat, this is a person. Those labels are very labor-intensive. We wanted to learn from unlabeled data, and specifically we built a very large neural network, quite likely the largest in the world at that time, which would go to YouTube and just watch tons of YouTube videos, to see what it could learn from that. And it was my then Stanford PhD student and Google Brain team intern, who's still here at Google and doing great work. I remember Quoc one day calling me over and saying, "Hey, Andrew, take a look at what I have on my computer." I walked over, and he showed me this ghostly, slightly fuzzy black-and-white picture of a cat that the algorithm had discovered all by itself by watching YouTube videos, because, stereotypically, there are a lot of cat videos on YouTube. But the fact that an algorithm, just by looking through tons of data, without any human intervention to tell it there's even such a thing as a cat, had, you know,
quote-unquote discovered this cat face by itself. That was an incredible breakthrough moment. >> Yeah, exactly. You have a quote that's somewhat well-known. I'd love for you to share that quote about AI and work, what's led you to believe it, and how you think humanity can work best with artificial intelligence going forward. >> I think every knowledge worker right now can get a significant productivity boost from AI, but AI is still far from automating everything that most people can do. And what that means is, I don't
think AI will replace people, but people that use AI will replace people that don't. I'm paraphrasing my friend Curt Langlotz, who first said this about radiologists. But more generally, today I can't imagine hiring an employee for most roles who doesn't know how to do a Google search; it's just strange in a knowledge economy to not know how to search on Google. In the future, I think for most roles, we just won't hire anyone that doesn't know how to use AI in a really effective way. >> Yeah, totally. Right. >>
But having said that, it turns out that salaries often get adjusted over time to productivity. So AI will make people much more productive, and I think a lot of people will actually do much better financially, will get paid a lot more, by becoming effective users of AI as well. >> Yeah, I agree. There's a lot to be excited about and hopeful for. >> Yeah. >> Any thoughts on either some fun stories from your time at X, or takeaways for what it takes to successfully drive
a moonshot? >> One thing that was really precious and rare in the early days of X, under your and Sebastian's leadership, was the cross-fertilization of ideas. I remember one day someone from the Waymo team just came up and said, "Hey, Andrew, want to ride in a driverless car?" And I said, "Sure." I hopped into one of the early Waymo prototypes and rode around downtown Mountain View in a driverless car. And I think that degree of openness and idea sharing, and the willingness to just do weird things, some of which really worked out brilliantly, that's really rare and precious. >> Oh, yeah. And thank you. And I'm sure it went in both directions. You probably got inspiration and ideas from Waymo, from that ride even, but I guarantee you it went in the other direction as well. Fast forward to today, and Waymo uses large foundation models for a lot of what they do. >> Right. So that cross-fertilization is literally cross: it goes in both directions and helps both parties. >> Yeah. And I really like that everyone back then, and I'm sure even now, felt like they were doing work that matters, right? You're not showing up to do some boring thing. I remember Larry used to go around and ask people: if what you're doing succeeds beyond your wildest dreams, will anyone care? The obvious implication being, go work on something that you can answer yes to. And that felt really good: even back then, throughout X, people felt like, this may never work, but if it works, boy, a lot of people are going to
care. >> Yeah, exactly. >> I want to share one thing I personally tend to obsess over, which is speed, because when you're innovating, almost by definition you don't really know what you're doing, and your ability to execute really quickly and try lots of different things is, I think, an important component of success. What I find is that when I interview people for jobs, everyone will say they know how to move fast, but there are dramatic differences, easily 10x, maybe 100x, between the paces at which people execute. I've spoken with leaders that will have a 15-minute conversation and just make a decision, and I've also spoken with leaders that, in a similar situation, will say, great, let's study this and reconvene in three months. There are these dramatic differences. And part of the key to innovating: for a big company like Google, you don't want one random engineer to do something that takes down the Google website; it's just not acceptable. But X, by for the most part creating a safe environment, let us move really quickly and try things out. We could do whatever we wanted on Google Brain; there was no risk that I would accidentally take down Google web search, because I didn't have the authority to do so anyway. And I think that speed of execution, with the sandboxing and the guardrails to make sure no one can do something crazy to take down the mother ship, that combination is hard to create. And I think X managed to do that. >> Thanks, Andrew. I agree. My
version of that same mantra is the tightness of the learning loops. I don't care how long it takes for us to get to greatness, or to figuring out that we're on the wrong track. What I care about is the length of time between a hypothesis and results we can assess. >> Yeah. >> If that takes an hour versus a month, you're in a different universe in the second case relative to the first. >> Yeah. Well said. A lot of innovation is about learning, because if only we knew what to build, we could have built the whole thing in a week. So it's about figuring out what to build and how to build it. >> Exactly. Exactly. >> Well said. >> Absolutely wonderful. Thank you for doing this with me, Andrew. >> Thank you, Astro.