This is part two of our conversation with Chris Lattner, creator of Swift, Clang, and LLVM, and CEO and cofounder of Modular AI.
If you missed the first part of our conversation with Chris, listen to it here.
Modular’s new programming language, Mojo, is built for AI developers. Check out their docs or find them on GitHub.
Connect with Chris on LinkedIn.
Shoutout to user DanielGibbs, who earned a Lifeboat badge by answering “What is the difference between getType() and getClass() in java?”.
[intro music plays]
Ben Popper All right, everybody. Welcome back to the Stack Overflow Podcast. This is Episode 2 of our interview with Chris Lattner, an American computer scientist perhaps best known as one of the creators of Swift, LLVM, and the Clang compiler, among other projects. He is now the co-founder and CEO of Modular Inc., and today we’re chatting with him about Mojo, a new programming language from Modular that tries to bring together the best of Python and C and, specifically, put them to work in the new AI ecosystem. If you want to check out Part 1, it’ll be in the show notes. Here’s Part 2, and as always, thanks for listening.
[music plays]
Ryan Donovan We've talked about how Python developers often have some part of their code that's C++ and a bridge to it, and I know a lot of languages have that C++ bridge, like Rust and Go. Do you think there are perils to writing code that way?
Chris Lattner I think that it's a really super pragmatic way to do things, and so Mojo allows you to talk to external languages as well. The value of that is that good code should not be rewritten, and something once written doesn't get unwritten. People build other things on top of it. The challenge is when you're forced to do that to get performance. If you look at Python, for example, often what happens is you build a large-scale Python codebase, you get into production, and then you realize it's too expensive, or your latency is too long, or you have some other problem in a server environment. And in AI, often what you'll find is that you're talking to GPUs and you're bottlenecked in Python instead of keeping the GPUs busy, and so the GPUs don't get fully exercised because Python is taking too much time.
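[Editor's note: to make the bottleneck Chris describes concrete, here's a minimal Python sketch of our own (not Modular's code) that times a pure-Python hot loop against the same reduction in NumPy, which dispatches to compiled C. The exact ratio will vary by machine, but the gap is the kind that pushes teams toward rewrites.]

```python
import time
import numpy as np

def dot_pure_python(a, b):
    # Every iteration round-trips through the interpreter; loops
    # like this are what leave GPUs idle while Python catches up.
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total

n = 1_000_000
a, b = [1.0] * n, [2.0] * n

start = time.perf_counter()
dot_pure_python(a, b)
print(f"pure Python: {time.perf_counter() - start:.4f}s")

# The same dot product in NumPy runs as compiled C under the hood.
a_np, b_np = np.ones(n), np.full(n, 2.0)
start = time.perf_counter()
a_np.dot(b_np)
print(f"NumPy:       {time.perf_counter() - start:.4f}s")
```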
BP And those are expensive machines. You can't waste that.
CL That's exactly right. And so as a consequence, what that forces or has forced you to do is say, “Okay, well, I've made this huge investment in this Python codebase. Now I have to go rewrite chunks of it in C and C++ just for performance.” And that's the thing I find deeply unfortunate, because now you've gone from a beautiful Pythonic world to this hybrid world with a lot of complexity. Now you have to know the CPython internals and all the stuff that goes with that, now you have to understand the memory management, and now you have to understand how C works, which many people know, but many people should not have to know anymore. What Mojo allows you to do is say, “Okay, well, you can start on that same journey.” And out of the box, Mojo is faster than Python. It's not 35,000 or 68,000 times faster on all things, but random people on the internet are saying it's a thousand times faster, which is still pretty good. And so what you can do is say, “Okay, well, I can code without fear of a rewrite. I can code without fear of having to split my world into this hybrid CPython world.” One of the other challenges is that in that hybrid CPython world, there are no good debuggers that really span both halves, so living in this hybrid world becomes super painful very quickly, and Mojo can help with that. And the key here is not that C is a bad thing, it's not that C++ is a bad thing, it's not that interop is a bad thing, it's that you shouldn't be forced into splitting your codebase and splitting your world into these two halves.
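[Editor's note: here's a minimal sketch of the hybrid CPython world Chris is describing: Python calling hand-written C through ctypes. The library and function names (fast_dot, libfastdot.so) are hypothetical, and the sketch assumes you've already compiled the C shown in the comment, e.g. with `cc -shared -O3 -fPIC -o libfastdot.so fast_dot.c`.]

```python
import ctypes

# Assumes a C file like this, compiled to a shared library:
#   /* fast_dot.c */
#   double fast_dot(const double *a, const double *b, long n) {
#       double total = 0.0;
#       for (long i = 0; i < n; i++) total += a[i] * b[i];
#       return total;
#   }
lib = ctypes.CDLL("./libfastdot.so")  # hypothetical library name
lib.fast_dot.restype = ctypes.c_double
lib.fast_dot.argtypes = [
    ctypes.POINTER(ctypes.c_double),
    ctypes.POINTER(ctypes.c_double),
    ctypes.c_long,
]

n = 1000
a = (ctypes.c_double * n)(*([1.0] * n))
b = (ctypes.c_double * n)(*([2.0] * n))

# The call is fast, but you now own the C source, the build step, the
# memory layout, and a debugging story split across two runtimes.
print(lib.fast_dot(a, b, n))  # 2000.0
```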
RD Yeah. A lot of them feel like an instant ecosystem, just add C.
CL Well, and in particular, this is something that the machine learning world has been struggling with because, again, GPUs and high performance and stuff like this, but I think it's quite common in many other domains. If Mojo can help with that, I think that'd be very helpful.
RD We talked about the Modular AI engine, the coolest thing that's been built in Mojo so far. Can you talk a little bit about how that works and how it replaces your PyTorch and TensorFlow?
CL Yeah. One of the things that's super interesting about the machine learning deployment problem, which I assume many of your listeners are not super familiar with–
RD We wrote a blog about it.
CL Awesome. So lots of people talk about the TensorFlow and the PyTorch part, and lots of people talk about the training part of building a model, and pretty much every computer science student coming out of university these days knows how to train a model. But there's this dark art, this unspoken truth that is super painful, which is: how do you deploy a model? On the one hand, the easy thing you can do is say, “I have a PyTorch model. I will take that thing, put it into production as is, and run it,” but now you have Python in production. And so now you run into scalability problems, you run into performance issues, you run into what-versions-of-my-packages-am-I-depending-on kinds of problems, and there's a variety of challenges that come with this. And so many of the AI shops that are actually deploying models at scale, the ones that are quite serious and have significant spend on AI, end up switching to a different system. This is where you get, for example, the ONNX Runtime translator thingy, or there's a thing called TorchScript– a whole bunch of these other systems that have been created over the years that handle specialized parts of this ecosystem. Also, every hardware vendor ends up making a specialized set of software optimized for their chip, and so what it means is that if you're deploying a model, you end up being exposed to this massive amount of complexity. When you're deploying to the cloud, you generally don't want to know how the hardware works. You don't want to know if you're on Intel chips or AMD chips. You want to be able to go where there's availability and where the instance prices are lower, and you want to be able to solve your product problems instead of having to micromanage all of this stuff. Now historically, this fragmentation and all this deployment tech has come from the sad reality that there's really no good unifying theory for this tech. And so what has happened is that a lot of the hardware folks, who are brilliant by the way, have had to build a vertical software stack for their chips. And so Intel has spent huge amounts of money in the space and has built a whole bunch of really good stuff, but their stuff doesn't run super well on ARM chips, for example. And this fragmentation comes not because Intel is mean or something, it's just that they're focused on their problems, which makes tremendous amounts of sense. And so what we're doing and what our AI engine does is provide this missing system, this thing for hardware folks to plug into. It provides really high performance, good generality, extensibility, and programmability, and it's the next step forward in AI deployment technology. And the major purpose is not even just performance, it's simplicity. It's the fact that you can have one thing that scales instead of having 42 different tools and all the complexity that comes out of that.
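[Editor's note: as a concrete taste of that fragmentation, here's a short sketch of ours, using standard PyTorch APIs, that exports the same toy model down two of the deployment paths Chris names, TorchScript and ONNX. Each path comes with its own file format, runtime, and supported-operator list.]

```python
import torch
import torch.nn as nn

# A toy model standing in for "a PyTorch model you want to deploy".
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
model.eval()
example_input = torch.randn(1, 16)

# Path 1: TorchScript, serialized for the TorchScript runtime.
scripted = torch.jit.trace(model, example_input)
scripted.save("model.pt")

# Path 2: ONNX, for ONNX Runtime and the vendor stacks built on it.
torch.onnx.export(model, example_input, "model.onnx")

# Two artifacts, two runtimes, two operator-coverage stories: the
# fragmentation a unified engine aims to collapse.
```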
RD Right, instead of having frameworks for your framework.
BP Yeah, I think Ryan and I have experienced this firsthand. We were recently working with a client, Qualcomm, that wanted to talk about their AI stack, and we have others coming up. One of the things Qualcomm wanted to emphasize was that if you use their AI stack, you can make the most of the multi-core architecture on the chip. I noticed one of the things you mentioned on the Mojo page was that you can work at a high level if you're familiar with Python, but you can also get close to the bare metal. You mentioned that's kind of how you grew up in the 1980s, and that some of what first attracted you to computers was that ability. So can you talk a little bit about what somebody could accomplish at that low level and how that works with what you build in Mojo?
CL Well, so one of the challenges of these AI stacks that have been built out is that there are kind of two different categories, and they're making very different tradeoffs. One is the category of systems that work for one very specific point solution on one specific set of hardware, and so it's, “This is the solution for transformers on this chip,” for example. And maybe they have really good performance, but they're really narrow. If your product evolves and you want to have transformers and recommenders or something like that, it becomes a big problem. The other side is you get these systems that try to provide generality or flexibility or portability or something like that. And to do that, what they end up doing is watering things down and giving you this lowest common denominator way of accessing hardware. These systems can be cool. One example is XLA, which I worked on previously. It has a very simplified RISC-like instruction set for machine learning. That's cool because you can map this RISC-like instruction set onto many different kinds of hardware, and hardware folks know how to do that. The challenge is that this hardware has unique capabilities. People are coming out with TF32 numerics, or they're coming out with some crazy new dot product or some new sparse kernel or machine learning models based on FFTs. The innovation in the space is just continuous, and so these lowest common denominator solutions prevent you from getting the full power of your hardware. And for a lot of folks, particularly these LLM companies, for example, they may sign a contract for $100 million worth of GPUs or something, or OpenAI is talking about $1 billion or $10 billion of GPUs.
BP Yeah, I was going to say that 100 million is nothing these days for GPUs. Maybe add a couple zeros to that.
CL I know. And so when I talk to these folks, if you're going to spend a billion dollars on H100 GPUs, you don't want the lowest common denominator solution, you want to get everything possible out of an H100. One of the key technology components of Mojo is compiler tech called MLIR. MLIR is a compiler stack that's part of the umbrella of compiler technology that LLVM includes, and so it's part of the LLVM open source project. But MLIR is a completely novel approach to building compilers. We created it back in 2018, so it's still quite new. And it allows you to talk to all these fancy accelerators, all the weird features, the tensor cores, all this kind of stuff, and express that in a way that is quite low level, so you get full access to the hardware. Mojo then builds on top of this, and Mojo is the first language actually designed for MLIR. People have been retrofitting MLIR into existing systems, but Mojo is the first one that really takes advantage of its full power and allows you to build things that go all the way down to the metal if you'd like to. And then with our AI engine, we can build higher-level abstractions on top, which gives you general portability without preventing you from diving into the details when you want to.
BP I know you said there's no magic in all this, but you did mention really early on that there's some alchemy when you go from human-interpretable code to machine language, and that's always fascinated me.
CL Well, that's super interesting. I hadn't thought about compilers as alchemy, but you are transmuting certain kinds of elements into others, so I can see that. Though there it's all about understanding the quantum mechanics that go into how these things work, I guess.
RD Maybe not alchemy, maybe just chemistry.
CL Yeah, exactly. It's alchemy until you learn chemistry, and then it becomes chemistry.
RD So are there things you can talk about for the future, in terms of how you're planning to handle the way AI is going to develop?
CL So on the AI side of things, it's a super exciting time, and we're working to get the AI engine out to more people. There's a whole community aspect of the machine learning world that is really exciting, and so much of it is driven by research and academics and things like this. By breaking down the complexity in the stack, what we're hoping to do is catalyze entirely new avenues of research. One of the things that people forget is that there are people who are insanely brilliant with differential equations. There are people who are insanely brilliant with domain-specific algorithms, like autonomous car driving thingies. There are amazingly smart hardware people, and there are amazingly smart compiler people sometimes, allegedly. But the trick is getting these people to work together. And so by driving out the complexity of this stuff, by making it more consistent, we're allowing more of the stack to fit in any one person's head. What's happened historically is that you have to become a CUDA expert, and being a CUDA expert while also understanding the domain problems, some people can do this, of course, but they're very different worlds, and so very few people can participate at the highest levels. We're hoping that by making things more accessible, again, eliminating the accidental complexity, which allows you to focus on the inherent complexity, we will allow many more people to participate and open new avenues of research, and hopefully the next transformer gets built on the Modular stack.
RD That sounds good.
BP So we've talked a lot about your company and the language you've created. If you don't mind, I'd like to go a little higher level for a minute. Since you're very excited about what's happening in AI and what's being built, how do you view what's happening in the world of AI when it comes to the capabilities of these generative systems, and where do you sit on some of the debates about how much these systems really understand versus just being stochastic parrots?
CL So to give you a broad view of AI, it's super interesting, because on the one hand there are the inherent truths and fundamentals in these systems, and then there's all the hype and BS. And so let's acknowledge that there's a lot of hype, a lot of BS, and a lot of things that maybe are not true. What I do is try to focus on what people are actually doing, when there are actual business models, and when they're actually impacting products. Maybe a better way to separate it is that there are three things: there's stuff that's BS, there's stuff that's research and so not in production yet, and then there's stuff that's in production. I won't talk about that first category, but often what's happening in research trickles down into production, though it can take a long time. And so we've seen LLMs, for example. They've been around for quite a long time, actually. They weren't called LLMs, but these language models have been getting bigger and bigger and bigger, and ChatGPT is what woke up the world. Many folks who are not focused on the AI part of the tech stack really woke up when they saw it, because there was a user interface innovation that turned plain token prediction into this chatbot-like thing. What I see happening is more of that research getting to production, more of that production ending up in products, and more of that product impact changing our lives in better ways. And I'll just say I'm an AI maximalist. I want to see more of that, because I think that what AI is, if you take a step back, is really the best way for computers to understand the human world and humans themselves. So much of what we're doing is unlocking a new compute paradigm where now you can be more personal, you can be more interactive, and yes, you can understand the difference between a dog and a cat. And so I see that as inherent. Now, what does it mean at scale? Well, I think the world will continue to get more interesting. I'm not personally too worried about AGI happening next year. I think that's just like how autonomous cars were supposed to be all solved by 2020. Some of these problems are a lot harder than you might think when you start out, but I'm very excited to see how it goes.
RD No fears of Skynet happening anytime soon?
CL So I have two opinions on that. One is that if it happens, there's not much we can do about it. But the bigger issue I'm worried about is the human side of this, not the tech side. If you look at AI as just a tool, it's the hands that wield it that are concerning, and those people already have a lot of war machines and other stuff going on, so there are a lot of other things to be afraid of in the world already. And so I think it's not categorically new, it's just more of the same.
BP Well, if Skynet arrives and you say there's nothing we can do about it, I'll have to check back on your next coding language. Maybe it'll be focused on time travel. I think time machines are the only option we have here.
CL There you go.
BP You mentioned quantum mechanics before, maybe it’s not out of the question.
CL Well, quantum computers will need a way to program them as they reach their own scale.
[music plays]
BP All right, everybody. It is that time of the show. Let’s shout out the winner of a Lifeboat Badge. Thanks to DanielGibbs for coming on and giving a great answer and putting some knowledge out in the Stack Overflow community, helping to save a question. “What is the difference between getType() and getClass() in java?” DanielGibbs has an answer for you, helped over 35,000 people, and earned himself a Lifeboat Badge, so thanks, Daniel. As always, I am Ben Popper, Director of Content here at Stack Overflow. You can find me on X @BenPopper. You can call it Twitter if you don't like X. You can email us, podcast@stackoverflow.com with questions or suggestions. And if you enjoy the program, give us a rating and a review, because it really helps.
RD I'm Ryan Donovan. I edit the blog here at Stack Overflow. You can find it at stackoverflow.blog. And I suppose I'm still on X @RThorDonovan.
CL I'm Chris Lattner, and you can find me on X/Twitter/whatever that thing is @CLattner_LLVM. We're building lots of cool stuff at Modular. You can visit us at modular.com and check out Mojo and check out everything else that's going on there.
BP As always, thanks for listening. And a reminder, this was Part 2 of our episode with Chris Lattner. Part 1 came out last week and you can find the link in the show notes if you want to check it out.
[outro music plays]