This week we chat with Julian Schrittwieser, a staff software engineer at DeepMind, the AI lab acquired by Google in 2014. He is an author of a recent paper on MuZero, an AI program that mastered "Go, chess, shogi and Atari without needing to be told the rules, thanks to its ability to plan winning strategies in unknown environments."
You can find the paper on MuZero here.
He blogs at Furidamu and can be found on Twitter here.
The story on drug discovery powered by AI can be found here.
Julian Schrittwieser I love this, "Oh, you know, you can never do X," because this gives me a great list of, okay, what should we do next? [Cassidy & Ben laugh] So, you know, if you have any of these, please send them to us.
Ben Popper JetBrains Space, a unified platform for the entire software development pipeline and team collaboration. Combine Git hosting, code reviews, CI/CD, packages, planning tools, issues, documents, chats, and blogs all in one place. Bring your software teams together to communicate and deliver high-quality code faster. Get started for free at jetbrains.com/space.
BP Hello, good morning, and welcome to the Stack Overflow Podcast, a place where we chat about all things software programming, engineering, and technology. I'm Ben Popper, Director of Content here at Stack Overflow. And I'm joined this morning by Cassidy Williams. Hi, Cassidy.
Cassidy Williams Hey, how are you?
BP Great. So Cassidy has helped us write the newsletter since we launched, and she was an early guest on the podcast; she now works at Netlify. But she's also a big Go player, so we thought it'd be great to have her on the episode today, because we have a guest from DeepMind who has worked on some very cool algorithms that have mastered games like chess and Go, and most recently, Atari. So video games is the important one, obviously. Welcome to the show, Julian. Julian, introduce yourself, say your last name, because I would butcher the pronunciation, and tell us who you are and what you do.
JS Hi, Ben. Hi, Cassidy. Thanks for having me. I guess my last name is pronounced Schrittwieser. I'm from Austria, so you can tell it's German. I've been at DeepMind for several years now, starting with, as you mentioned, AlphaGo, the first program that managed to defeat professional human Go players, many years before many people expected this. And ever since then, I've been working on pushing this line of algorithms to be more general, and to be applicable to more and more real-world problems. And the most recent manifestation of this was MuZero, as you already hinted at.
BP Awesome. And so tell us just a little bit more background. How did you get introduced to computer science? You know, what's your background, self-taught or formal education? And then how did you make your way to this particular area of work at a place like DeepMind?
JS Yeah, I think since I was a kid, I've been pretty interested in computers. Initially, I started like many people: I was very interested in games, and especially making my own games. So I think this is really how I got hooked. First it was BASIC, Visual Basic, and then eventually I got my hands on some C++ book. But along the way, I think I was always more interested in building the engine, building the backend. So through all of this time, I never actually managed to build any game at all. But I did learn a lot about, you know, computers and programming. And I remember, you know, the first time wrangling with pointers in C, and it just exploding my head.
CW Oh, man, good time.
JS So I guess over time, I became pretty sure that I wanted to work in computing, in software engineering. And so this is why I went to university to study computer science. But at this time, I didn't know a lot about AI; I didn't have a particular interest in AI and machine learning. I was just broadly interested in software engineering in general, maybe especially in computer security. I went to a lot of security conferences, like the Chaos Communication Congress in Germany, or conferences in the US like DEF CON. But the way I got into AI is more of an accident. After university, I joined Google as a software engineer, and about one year into my job, around the time that DeepMind was acquired by Google, Demis came to our office to give a talk about, you know, what was different about what they were doing. And it was actually my day off, I was on holiday. But for some reason I was checking my email, and I saw this email from this Demis person. And he was talking about, you know, learning to play Atari and all these cool algorithms, and I was just like, oh, I cannot miss this talk, right? So I made it to the office just in time for the talk. And this is when I really decided, you know, I have to work on this, I have to move to this team. And, you know, I'm very glad I did. I think it's been a really exciting time.
CW That is so awesome. I geeked out when AlphaGo came out, like, watched the documentary, watched the games live and everything. I personally play Go, I try to play it every day. And I remember my father-in-law, he's been playing Go for like 20 years, and when the announcement came that AlphaGo was going to be playing against Lee Sedol and all of these great players, he stayed up all night watching the games, and it blew his mind that a bot could do this. It was really fun to experience that with him. But just in general, the work that you've done is so interesting. And it's just so cool, both seeing it applied to games, games that are really interesting and loved by so many people, but also just the math behind it, the AI behind it, is so interesting.
JS That's really awesome to hear that he enjoyed it so much. Yeah, I think it was a really magical moment to see this machine, this algorithm, play moves and do things that nobody would have expected beforehand. I think, yeah, it was very, very special for us.
CW It was so entertaining, because people were just like, "Huh, okay, the computer went there, so let's see how this could play out." And then they would just, like, have a whiteboard, mapping it out and everything. And it was so fun to see many people kind of be startled by it, but also, like, "Oh, this is smart." And yeah, it's the coolest thing. I'm geeking out here thinking about it again.
BP Yeah, I remember, I was working at The Verge at the time, and we were covering it. And right, there was sort of the confusion of the announcers, or the whole Go community being like, this doesn't seem like a good strategy, and, it's unclear what the strategy is at this point, like, we're way off the map. And then, you know, it comes around to end up winning those games. So Julian, I guess, just quickly, for people who work in the world of software engineering at big companies, what's it like to make that switch? Do you have to put in a request to change teams? Do you have to learn, like, a new set of basics? Or were they busy recruiting for DeepMind, so they were willing to, like, take people on from any part of the company? What's it like to make that transition?
JS So I guess it depends on the company. But inside Google, inside Alphabet, I think there is already a lot of flexibility to transfer between teams. And so often, you know, teams that are looking for people will put out an internal announcement or a post about, you know, what they're looking for. And so initially, since I didn't really have much of a background in AI, I just transferred in a software engineer role, and did a lot of engineering work. But over time, I just, you know, read all the papers, watched all the courses, and read all the books I could find. And I think deep learning, machine learning, is this exciting field that is still incredibly young, so you can catch up to the state of the art relatively quickly, and really make impactful contributions if you put in the effort. So this is how I transferred; I sort of changed over time to be more and more machine learning heavy, splitting my efforts.
BP Do you think of yourself differently now, you know, when you're working from a different perspective? Which is to say, a paper gets published in Nature, so you're sort of a scientist now, as opposed to, like, a software engineer. Do you still think of yourself as a software engineer?
JS I think it's different every day, or every hour of the day. One day it might just be, you know, reverse engineering code and trying to get something to compile. Another day, I might be spending my whole day in JAX or some machine learning framework, staring at losses or trying to come up with a new architecture. So I think, depending on which project you're working on, there is always this frontier where a different trade-off of machine learning and engineering gives you the maximum speed forward. And depending on this, you know, I try to choose: what am I going to do today? Will I try to make our models better? Will I try to make our code faster? Will I, you know, maybe refactor things and just delete code?
CW I think it's good to have that balance. Honestly, it keeps you sharp on both ends. But then I also think it gives you a more holistic view of what you're trying to accomplish as well.
JS Yeah, I think it's fun. And it also allows you to make progress without having to wait for somebody else, or having to excite somebody else about your idea. I think often when you're working in research, you know, initially you think you have a great idea, and everybody else is like, "Well, that sounds great, man," but they're not necessarily very enthusiastic about it. And so if you need somebody else to do some foundational work for you, or to prepare it for you, then it can be very tough to prove it out and to try it. Whereas if you have flexibility, and, you know, maybe you can hack it together yourself, or maybe you know the necessary basics, this can give you a good advantage. So I think, you know, for any of our listeners, if you are a student, if you're interested in this, this can be very helpful for you.
BP Yeah, whenever I need something from engineering, and they can't do it for me, I just learn it and do it myself. That's how you get rid of the blockers.
CW Sure, Ben. [Cassidy & Ben laugh]
JS I think you know, it depends on the scope of what you need. But sometimes you may find that, Oh, this is actually relatively small.
BP Let's dive a little into, yeah, what's happened since AlphaGo, and sort of the path to MuZero. Reading up on the blog post, one of the big differences is how much human data, domain knowledge, knowledge of the rules goes in ahead of time. Can you walk us through sort of the steps that have taken us from that original AlphaGo, through AlphaGo Zero, to where we are now with MuZero?
JS Yes, totally. We actually have a little graphic for this as well, in the DeepMind blog post, if you want to check it out later.
BP Yeah, I'll put it in the show notes for sure.
JS Yeah, basically, I guess if you start with AlphaGo, this was specialized to play the game of Go. It was trained initially from human Go games. But it already had this core component of the Monte Carlo Tree Search that was using the predictions of the neural networks and combining them to make better estimates. And so the next step that we took was AlphaGo Zero, and there is a "Zero" in the name to indicate that we've removed the need for the human data, so that we can start training from zero, from nothing, from scratch. And then the next step after that was AlphaZero. So there, we dropped the "Go," and extended it to chess and shogi, to sort of demonstrate that this tree search algorithm is actually pretty general, and it doesn't just work in Go, it also works in other games. And we chose chess in particular because, you know, it has a very long and rich history in game-playing computer science, and there, traditionally, alpha-beta search has been very successful. And so there was a lot of skepticism, you know: oh, this Monte Carlo Tree Search is great for Go, but it wouldn't really work in chess, or maybe not so well. So we really wanted to show, no, actually, you know, you can use this for pretty much anything. And so this is where we were with AlphaZero. But all of these previous algorithms, AlphaGo, AlphaGo Zero, AlphaZero, they all require you to know the rules of the game, what's going to happen when you play a move, because this is what you do inside a tree search. You roll the board game state forward, and you can tell the network: okay, you play this action, now this is the new state of the board, tell me what will happen. But of course, you know, for most real-world applications, this doesn't work. If you have a robot, if you have a car that is supposed to drive, and you want to think about possible future action sequences.
It's really hard, or sometimes maybe even impossible, to have an accurate simulator that will tell you what would happen if you, you know, turn left, or if you move the arm this way. And this was really the motivation behind MuZero: we want to keep this powerful tree search, but we want to remove the need for these rules, you know, remove the need for this domain knowledge, so that we can just apply it to any kind of RL problem. That was the motivation behind learning the model in MuZero, and using that learned model inside the search.
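The idea Julian describes, planning with a learned model instead of the real rules, can be sketched very roughly. This is not DeepMind's code: the three functions below are toy stand-ins for MuZero's learned networks (in the paper they are deep networks trained end to end), and the search is a plain depth-limited lookahead rather than the real Monte Carlo Tree Search.

```python
import math
import random

# Toy stand-ins for MuZero's three learned functions (real versions are neural nets):
#   representation: observation -> hidden state
#   dynamics:       (hidden state, action) -> (next hidden state, reward)
#   prediction:     hidden state -> (policy prior, value estimate)
def representation(observation):
    return tuple(observation)

def dynamics(state, action):
    next_state = state + (action,)  # the model's own notion of a "next state"
    reward = 0.0                    # no intermediate reward in this toy
    return next_state, reward

def prediction(state):
    priors = {a: 1.0 / 3 for a in range(3)}  # uniform prior over 3 actions
    value = random.random()                   # stub value estimate
    return priors, value

def plan(observation, depth=3):
    """Depth-limited search entirely inside the learned model:
    the game's real rules are never consulted."""
    root = representation(observation)

    def search(state, d):
        if d == 0:
            _, value = prediction(state)
            return value
        priors, _ = prediction(state)
        best = -math.inf
        for action in priors:
            next_state, reward = dynamics(state, action)
            best = max(best, reward + search(next_state, d - 1))
        return best

    priors, _ = prediction(root)
    scores = {}
    for action in priors:
        next_state, reward = dynamics(root, action)
        scores[action] = reward + search(next_state, depth - 1)
    return max(scores, key=scores.get)

chosen = plan([0, 1])  # an action in {0, 1, 2}
```

The point of the sketch is only the structure: the search operates purely on the model's own `dynamics` and `prediction` outputs, which is what removes the need for a rules-based simulator.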
CW So if it's tree search, and you don't have rules, this is where my computer science brain is a little bit confused. If you don't have the rules for which it can be applied, how does it know what to pick and how to learn things? Is it just, try something, this appears to be a failure, try something else? Or how does it know which branch of that tree to go down, and how does it know what to do?
BP In my non-computer-science brain, I sort of had the same question, which was, like, you know, there's reinforcement of some kind, but how does it know that the objective is not to lose the game as quickly as possible, as opposed to, like, figuring out the rules and winning the game?
JS So MuZero is still told the reward, you know, even if it's simply at the end of the game. So for example, if you play Go, after the game finishes, you are told: did you lose, or did you win? Or if you play Atari, then maybe, you know, you're told the score, how many points you got. This is what it uses to learn its value function, and the value function is then what it uses to estimate how well it is doing inside of the search tree.
CW And so is it rewarded along the way, or just kind of at the end, once it's gone through these decisions?
JS So this can depend on the environment. In some environments, like in the board games, you only know at the end of the game whether you won or lost. In other games, for example in Atari, you might get a reward at every step. Or maybe, you know, you only get a reward if you discover some certain object. So this can be flexible.
CW That's so interesting.
BP And so, yeah, I guess in a game like chess or Go, it's kind of binary: did you win or did you lose? Whereas in Atari, it's offering multiple rewards for things like extra points or the amount of time it took?
JS Exactly. In Atari, the reward is the actual score of the Atari game. So, you know, when you play, like, maybe pinball, you might get some high score, and this score is exactly the reward that MuZero gets and tries to optimize.
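Both reward settings discussed here, a single win/loss signal at the end versus score increments along the way, feed the same training target: the (discounted) sum of rewards, which is what the value function learns to predict. The function name, discount value, and numbers below are illustrative, not from the paper.

```python
def discounted_return(rewards, gamma=0.997):
    """Sum of discounted rewards over an episode: the quantity a
    value function is trained to estimate."""
    total = 0.0
    for r in reversed(rewards):  # fold from the last step backwards
        total = r + gamma * total
    return total

# Board-game style: zero reward until the final step, +1 for a win.
board_game_rewards = [0, 0, 0, 0, 1]

# Atari style: small score increments arrive at many steps.
atari_rewards = [10, 0, 25, 0, 5]

g_board = discounted_return(board_game_rewards, gamma=1.0)  # -> 1.0
g_atari = discounted_return(atari_rewards, gamma=1.0)       # -> 40.0
```

With `gamma` below 1, rewards that arrive sooner count for more, which is one common way the sum is kept well-defined for long or open-ended episodes.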
BP And so I guess the idea here is that you're sort of generalizing to unknown models; that was one of the subheads in the blog post. When AlphaGo came along, I do remember people, you know, talking a lot about what this means for artificial intelligence writ large, you know, are these systems going to be better than us at everything soon? And then, you know, the sort of most rational response was to be like, "It's great, it's amazing at beating people at Go, but it can't play chess, you know, it can't read poetry; what it can do is very domain-specific." So as we move into this area that's a little bit more generalizable, does that give you some sort of view on how we head towards more of that artificial general intelligence? Or is this still very much defined by the scope of gaming?
JS Yeah, I love this, "Oh, you know, you can never do X," because this gives me a great list of, okay, what should we do next? So, you know, if you have any of these, please send them to us. I really love this. Yeah, it certainly is not restricted to gaming. I think the whole motivation of it was that playing games is great, but really, what we want to do is solve practical real-world problems, because, you know, this is ultimately what's going to make our lives better. And so I think specifically, the way we approach this is with the framing of a reinforcement learning problem, which means there is some environment that you interact with, and you may get some reward at every step, and you try to maximize the sum of these rewards. And then if you can phrase the problem that you're interested in as a reinforcement learning problem, then you can apply these algorithms to solve it. So of course, you can phrase a lot of games as a reinforcement learning problem, but you can do the same, you know, with other practical problems that you might care about, right? If you have a car to drive, maybe, you know, you want to get there quickly, but also safely. So you might get some reward for driving quickly, but also a very large negative reward if an accident happens, or something like this.
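That reinforcement learning framing, an environment you interact with, a reward at every step, and a policy judged purely by the sum of rewards it collects, can be sketched in a few lines. The environment and reward numbers below are entirely hypothetical, loosely following the driving example: reward for speed, a large negative reward for a crash.

```python
import random

class ToyDrivingEnv:
    """Hypothetical environment: reward speed, heavily penalize a crash."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.steps = 0

    def step(self, action):
        # action is a chosen speed: 0 (stop) .. 2 (fast)
        self.steps += 1
        crashed = action == 2 and self.rng.random() < 0.5  # fast driving is risky
        reward = -100.0 if crashed else float(action)      # progress vs. accident
        done = crashed or self.steps >= 10
        return reward, done

def episode_return(policy, seed=0):
    """Run one episode and sum the rewards: the quantity RL maximizes."""
    env = ToyDrivingEnv(seed)
    total, done = 0.0, False
    while not done:
        reward, done = env.step(policy())
        total += reward
    return total

cautious = lambda: 1  # always drive at moderate speed
reckless = lambda: 2  # always drive fast

def avg(policy, episodes=200):
    return sum(episode_return(policy, s) for s in range(episodes)) / episodes
```

Averaged over many episodes, the cautious policy's steady small rewards beat the reckless policy's occasional big gains plus crash penalties; the behavior you get is exactly the trade-off the reward function encodes.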
CW Yeah, I read a really interesting quote once that helps me kind of frame my mind, because there was definitely a point where I was watching AlphaGo and watching things like this, and I was like, man, our human brains, we need to be able to be as good as these machines, someone needs to be able to defeat them. And, you know, I think quite a few people felt that way. But because of how machines learn, and how they really focus in on these things, because of how the algorithms are written and stuff, the quote that I read, I thought, summed it up really well: you don't see, like, a crane that can lift a giant thing on top of a building and say, "Humans should be able to do that ourselves!" It's a machine that was designed to be able to do that really, really well. And I think it's kind of the same thing with these reinforcement algorithms, and with these AIs that learn one thing really, really well; they just learn it really well, and they can understand it really well, and they can do that one thing better than the human brain sometimes.
BP Yeah, this is like kind of that classic John Henry, you know, sort of folktale; it's like the guy competing with the steam engine. You know, that's not the way for us to win this battle in the end. Not that it's a battle. Sorry, AI.
JS I think that's the thing, right? It's the opposite: we do want the algorithms to do these tasks better than us, because ultimately, they are our tools, right? We use these algorithms to do things for us. So the more things they can do, the more things they can do better than us, well, the better for us, right? I think, you know, if you look at the history of civilization, this is how we improve our lives, right? By improving the productivity of humans through, you know, tools, and increasingly sophisticated machines. You know, this is how we will be able to continue doing this in the future, I think.
CW Yeah, I think it's just all of the movies that are like, "The AI has come to life and will defeat us all." If you get out of that mindset and think of it like that, like, it's a tool we're going to be able to use to be better than we ever were at something, that's what's exciting about it. And that's the whole meat of it; that means we've got to move forward with it.
BP Yeah, I wonder why we're so hung up on that narrative. It's hard to find the counter-narrative; there aren't a lot of movies about, like, benevolent AI systems.
JS Yeah, it's interesting, right? Like, for example, with Go, you can see that after the Go matches with Lee Sedol, maybe almost all of the Go professionals actually started studying using AIs, you know, to analyze their moves or to be inspired. So actually, it helped them become much better Go players, and it really has increased interest a lot.
BP Yeah, I think it's that we just want to keep our status as the apex predator; we're fearful biological things. So Julian, I guess you were mentioning, yeah, you know, that this could be applied to other fields. And I remember reading in the blog that AlphaZero had been applied to stuff like chemistry. You know, I've read a lot about interesting things with drug discovery and this use of deep learning. One of the things I read that really made me happy was they had looked at all of these old, you know, drugs and medicines that have been developed, like hundreds of thousands, millions of patents, and asked, is any of this good for anything? And the machine was like, "Oh yeah, you should try drug X for disease Y." And the doctors, similar to Go, were just like, everything I've been taught in school says this is wrong; there's nothing about my training or modern science that would tell me to try this. But then we did, and it worked. And so, like, right, what a great benefit for everyone, you know; things that have already been invented and are sitting on the shelf can now be applied, you know, potentially to medicine. But talk a little bit about some of the range of complex problems that the current iterations of AlphaZero and MuZero are, you know, experimenting with, and then maybe a little bit about what's down the road five or ten years. Like, what gets you excited, you know, to think about in terms of potential?
JS Yeah, I guess the thing that we're looking at at the moment is extending the algorithm to sort of very complex action spaces, like, you know, you mentioned selecting from hundreds of thousands of possibilities. So I think that's one of the things we've been looking at recently; in a lot of real-world problems, you know, there are many possibilities or degrees of freedom in which you can act. You know, if you think of a robot arm, you can move the fingers, the arm, the wrist; there are all these degrees of freedom that can make it very complex for an algorithm. So this is one of the things that we have been looking at recently. And I think very well aligned with this, also, you know, you mentioned there are all these existing medicines, all this existing data; I think a critical part for an algorithm is that you can learn offline from such data. You know, maybe you don't even need to interact with an environment; you can just have this stored data and, you know, think about it and analyze it and improve as you learn offline from it. So this is another aspect we're looking at a lot recently. And hopefully, we can share some more details on this soon. Please stay tuned.
BP Oh, great. Yeah, the news peg for our podcast. Okay, I'm excited.
JS Yeah, I think it's pretty cool. Like, I think this will really make it much easier for many people to apply to their problems. Long term, I think there is a lot of potential for solving sort of abstract reasoning or optimization problems. If you take a look at what the tree search does: in some sense, what you're optimizing is a sequence of decisions; at each time step, you maybe have a bunch of options, and you're trying to find the best overall sequence of actions. And, you know, this works well in games. But I think there are also a lot of mathematical, and maybe sort of real-world organizational or optimization problems, where currently, you know, in the famous case of the NP-complete problems, right, it's incredibly hard to find a solution, but once you do have a solution, it can be very easy to verify. I think this kind of problem tends to work very well with our algorithms, MuZero and AlphaZero specifically, because this is exactly what happens in this training loop, right? The act of running the search and selecting actions, this is, you know, really trying to find the solution to your hard problem. And then at the end, the reward, well, this is where you look at the solution and see: did I get this right, or did I not get this right? So any problem that has this shape, I think you can really try to attack with AlphaZero or MuZero, right, and try to solve it. And you referenced the chemistry paper that was published by some other researchers; I think that's a very interesting application of it.
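The "hard to find, easy to verify" shape mentioned here can be illustrated with subset sum, a classic NP-complete problem. This example is mine, not from the conversation: brute-force search over subsets grows exponentially with input size, while checking a proposed answer is a single cheap pass.

```python
from itertools import combinations

def verify(numbers, subset, target):
    """Checking a candidate solution is cheap: sum it and confirm membership."""
    return sum(subset) == target and all(x in numbers for x in subset)

def find_subset(numbers, target):
    """Finding a solution is expensive: brute force over all subsets,
    which is exponential in len(numbers)."""
    for size in range(1, len(numbers) + 1):
        for subset in combinations(numbers, size):
            if sum(subset) == target:
                return list(subset)
    return None

numbers = [3, 34, 4, 12, 5, 2]
solution = find_subset(numbers, 9)  # -> [4, 5]
```

In the search-and-reward loop described above, the search plays the role of `find_subset` (proposing action sequences), and the reward plays the role of `verify` (a cheap check of whether the proposal actually solves the problem).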
BP You mentioned that, you know, you came from doing sort of general software engineering at Google, and then moved over to DeepMind, and you were able to read a lot on your own. If somebody wanted to get started with this, what resources would you recommend? What books or videos helped you? What, you know, languages or frameworks were useful as you moved more into this field?
JS So I think things have been changing really rapidly, so what I used when I started is probably not very useful anymore. But I think a great source is arXiv; you can find almost all machine learning papers there and, you know, read them for free. And usually, if you read the related work section, you might find other papers referenced that might be useful for you; the more they are cited, the more you should probably know them. If you see them referenced every time, you should probably read them; everybody's using them. In terms of actual frameworks, a lot of work is using Python with either JAX or PyTorch, and I think these are two very easy-to-use frameworks. Personally, we've been using JAX for all our work recently with MuZero, and it's actually been really fun to use, so I definitely recommend giving it a try.
CW I'm surprised you don't bring up TensorFlow. Is that one not as active or used in your area?
JS So we used to use TensorFlow as well previously, but then we migrated from TensorFlow to JAX. I guess you might not know JAX, but you might know NumPy. And JAX basically has the same API as NumPy, but then it compiles the whole network for efficient execution on an accelerator like a TPU or GPU. So it makes it very easy for us to experiment with things.
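To make the NumPy comparison concrete, here is a tiny two-layer forward pass written against the plain NumPy API. This is my own minimal example, not DeepMind code; the claim in the conversation is that `jax.numpy` mirrors this API, so the same function can be compiled for an accelerator by importing `jax.numpy` in place of `numpy` and wrapping the function in `jax.jit`.

```python
import numpy as np

def forward(params, x):
    """Two-layer network forward pass using only NumPy-style array ops."""
    w1, b1, w2, b2 = params
    hidden = np.maximum(0.0, x @ w1 + b1)  # ReLU layer
    return hidden @ w2 + b2                # linear output layer

rng = np.random.default_rng(0)
params = (
    rng.standard_normal((4, 8)), np.zeros(8),  # layer 1: 4 -> 8
    rng.standard_normal((8, 2)), np.zeros(2),  # layer 2: 8 -> 2
)
out = forward(params, rng.standard_normal((3, 4)))  # batch of 3 -> shape (3, 2)
```

Because the function body is just array operations, nothing about it is NumPy-specific; that API parity is what makes the migration path from prototype to compiled accelerator code attractive.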
BP I'm glad you brought that up, because I have to mispronounce some coding term on every episode and NumPy is one of my favorites. [Cassidy laughs]
JS Oh, in terms of pronunciation, I'm terrible, right?
CW Sure. My name is Cassidy Williams, and I'm a Principal Developer Experience Engineer at Netlify. You can find me on the internet @cassidoo, or you can Google "Cassidy Williams." But there's a Scooby-Doo character named Cassidy Williams; I am not the Scooby-Doo character. And that's how you can find me. [Ben laughs]
BP And Julian, do you want to let people know who you are, just to remind them as they leave the episode, and, if you want to be found on the internet, where should they look you up?
JS I am Julian Schrittwieser, Senior Staff at DeepMind. I sometimes blog at furidamu.org. It's been fun to be on the show. See you next time!