Code completion is part of every programmer’s working environment, but to plenty of people, it still feels like magic. On this episode, Meredydd Luff, founder and CEO of Anvil, joins the home team to discuss code completion: what it is and how it works, from first principles to best practices. Plus: Is 90% of biology attributable to magic gremlins?
Anvil is an open-source web framework for building full-stack applications entirely in Python.
Ready to dig deeper into code completion? Check out Meredydd’s talk at PyCon 2022 (he even built a code completion engine live on stage).
ICYMI: Listen to our previous episode with Meredydd about countering the complexity of web programming: Full-stack web programming with nothing but Python.
Connect with Meredydd on LinkedIn or Twitter.
The Lifeboat badge shoutout is back. Today’s badge goes to user Tomasz Nurkiewicz for their answer to Best performance for string-to-Boolean conversion.
Meredydd Luff Oh, this is cheating, but the point is that everything's cheating. You're always cheating. Cheating works, so you're allowed.
[intro music plays]
Ben Popper Gatsby is the fastest front end for the headless web. If your goal is building highly performant, content-rich websites, you need to build with Gatsby. Go to gatsby.dev/stackoverflow to launch your first Gatsby site in minutes and experience the speed. That’s gatsby.dev/stackoverflow. Head on over, use that link, let them know we sent you and help out the show.
BP Hello and welcome everybody to the Stack Overflow Podcast, a place to talk all things software and technology. I am Ben Popper, Director of Content here at Stack Overflow, joined as I often am by my wonderful co-hosts, Matt and Ryan. What's going on, y'all?
Ryan Donovan Hey!
Matt Kiernander Hello everyone.
BP We have a returning guest today, Meredydd Luff, who is the co-founder, CEO, and jack of all trades at Anvil. Welcome back.
ML Hello. Great to be back.
BP Apologies for butchering your name, but I do what I can. You reached out with a pitch about code completion. I want to hear it. Apparently you gave a presentation so let's start there. Actually, no. Just for a refresher for folks, who are you and what is Anvil?
BP Yes, and you came on once before and we had a good discussion about this, but you were working on a project and now have done a presentation on it. So give us the thesis and then I'm going to step aside for a minute and let Matt and Ryan respond.
ML So the talk you are referring to was one I gave at PyCon a couple of months ago about code completion, and I think code completion is a really interesting topic. It's one of those things that's part of every programmer’s toolbox. Almost everybody listening to this uses an editor that, as they type their code, pops up a little suggestion box offering to complete identifiers, variable names, all sorts. This is really great stuff, and I could go on at length about why it is great from a human-factors perspective, but everybody uses it every day and yet it feels like magic. I got interested in this because we ended up having to build our own code completion for Anvil, for reasons I will absolutely go into in a moment, which meant that I actually had to crack open the magic box and find out how it works. And it is this marvelous combination: it feels like magic when you are using it, it is actually incredibly simple, and it is really satisfying, because learning how it works teaches you so much about how your program runs in the first place.
MK I watched your talk yesterday, going over your story, how you got to code completion and everything else, and I was kind of blown away. I've been using these tools for however many years, completely taking them for granted and not wanting to peek behind the curtain, because I was like, “That's going to be far too complicated. All I know is that it works. I'm not going to touch it. I'm going to let the little magical elves run around and do their thing in the background. Out of sight, out of mind.” But I think the way you explained it made it very accessible for people to understand, “Oh, this is actually not as complex, at least initially, as a lot of people would think it is.”
ML Yeah. So this was a stunt at the talk. I'm not going to try to do it here, but I actually built a code completer live on stage in about five minutes just to show how simple it was, because the fundamental principles are simple. This is one of the things I really like about working with computers. My undergraduate degree was in biology, and when you're working in biology, 90% of what happens is magic gremlins and you just don't know exactly what's happening. Maybe someone has an idea of what gene is involved, maybe someone doesn't. It just happens, and you have to keep your focus tight if you want to make any progress at all. But with computers, it's fundamentally all a knowable system, so whenever you see a little box with the magic elves, it's worth thinking about cracking it open, because sometimes the answer inside is delightful.
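To make the "fundamentally simple" claim concrete, here is a minimal Python sketch of the core idea (not the code Meredydd wrote on stage): parse the file with the standard-library `ast` module, collect every name the program binds, and keep the ones that start with what the user has typed so far. It deliberately ignores scoping, so a local variable from one function is offered everywhere; `defined_names`, `complete` and the sample source are illustrative.

```python
from __future__ import annotations

import ast


def defined_names(source: str) -> set[str]:
    """Collect every name the module binds: defs, classes, imports, assignments, parameters."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                if alias.name != "*":
                    # "import os.path" binds "os"; "import x as y" binds "y"
                    names.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            names.add(node.id)  # assignment targets, for-loop variables, etc.
        elif isinstance(node, ast.arg):
            names.add(node.arg)  # function parameters
    return names


def complete(source: str, prefix: str) -> list[str]:
    """Return the bound names that start with the partially typed prefix."""
    return sorted(name for name in defined_names(source) if name.startswith(prefix))


SOURCE = """
import os

def greet(name):
    greeting = "hi " + name
    return greeting

greeter = greet
"""

print(complete(SOURCE, "gre"))  # ['greet', 'greeter', 'greeting']
```

Everything else in a real completer (scopes, types, ranking) is refinement on top of this parse, walk and filter loop.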
RD I'm curious more about the answer because I would see something like that and think that this is sort of the same as a search box that does autocomplete.
ML Uh-huh. And it really isn't.
RD And it really isn't. I think that's fascinating. So why isn't it like that search box?
RD That's fascinating. When you started talking about it, I was like, “Oh, is this interpreting or compiling the program on the fly?” It sounds like it almost is.
ML Certainly it's parsing every time, yes.
RD So when it's building out the AST, what happens if there are errors earlier in the program?
ML That is an excellent question. So in the talk, I dealt with a series of escalating complications: “Okay, here's the simple version. Here's the thing I've built in front of you. Now what are the complications that I can't build in five minutes?” And this is absolutely one of them. They all reduce to a central theme, which is that if you’re looking at an AST, you're looking at the program on paper, not watching it run, which means that you’re having to guess what it's going to do at run time. Depending on how dynamic your language is, you have to guess to a greater or lesser extent. There's a principle, the halting problem (Turing undecidability is another name for it), which says that if you are just looking at a program on paper, you can't in general tell for sure what it's going to do at run time. In Python, for example, you can put an if statement around a function definition, so you don't know whether that function is even defined. And obviously that means if you're autocompleting further down the file, you genuinely don't know what is and isn't in scope, which means that you are always guessing. And actually many people these days use something like VS Code, which uses quite loose code completion, because of a philosophical difference here. Sorry, wild tangent. The philosophical difference is that there are basically two types of code editors: text editors and IDEs, integrated development environments. An IDE is typically something like IntelliJ or WebStorm or PyCharm or whichever of the fantastic JetBrains products you use (not a paid endorsement), and Anvil is absolutely one of these. It knows about your project. It knows for sure where all the source files are, where all the imports are, and every edit you make is really in the context of a project. That means it can be fairly confident about where your source files are, what libraries you might import, and what environment the code runs in.
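The conditional-definition problem can be illustrated with a small sketch, assuming a completer that only looks at module-level statements: names bound inside an `if` are kept as candidates (the optimistic guess described here) but tagged as conditional, since static analysis cannot decide in general whether that branch runs. `module_names` and the `DEBUG` example are hypothetical.

```python
from __future__ import annotations

import ast


def module_names(source: str) -> dict[str, str]:
    """Map each module-level name to "definite" or "conditional".

    Whether a branch of an `if` runs depends on run-time values, so names bound
    inside one are kept as completion candidates but flagged as a guess.
    """
    result: dict[str, str] = {}

    def bind(name: str, conditional: bool) -> None:
        # An unconditional binding anywhere upgrades the name to "definite".
        if not conditional or name not in result:
            result[name] = "conditional" if conditional else "definite"

    def visit(stmts: list[ast.stmt], conditional: bool) -> None:
        for stmt in stmts:
            if isinstance(stmt, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                bind(stmt.name, conditional)  # bodies are a separate scope: don't descend
            elif isinstance(stmt, (ast.Import, ast.ImportFrom)):
                for alias in stmt.names:
                    if alias.name != "*":
                        bind(alias.asname or alias.name.split(".")[0], conditional)
            elif isinstance(stmt, ast.Assign):
                for target in stmt.targets:
                    if isinstance(target, ast.Name):
                        bind(target.id, conditional)
            elif isinstance(stmt, ast.If):
                # We can't know which branch runs, so optimistically take both.
                visit(stmt.body, True)
                visit(stmt.orelse, True)

    visit(ast.parse(source).body, False)
    return result


EXAMPLE = '''
import os

if os.environ.get("DEBUG"):
    def debug_dump(obj):
        print(repr(obj))

def main():
    debug_dump(42)
'''

print(module_names(EXAMPLE))
# {'os': 'definite', 'debug_dump': 'conditional', 'main': 'definite'}
```

A completer could then rank "conditional" names below "definite" ones, or show them with a marker, rather than hiding them.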
Something like VS Code is fundamentally a text editor with bits of IDE-type functionality, like code completion, bolted onto it, which means that if you just open a .py or .js file in VS Code, it has no idea what is around it. It can take an educated guess, but at a certain point you'll see VS Code fall back to autocompleting, “Well, I don't really know what's going on here, but at least this is a word you typed earlier in the file and maybe you are typing that word again.” There's a big philosophical difference here. You'll find programmers getting very opinionated about “I want a text editor that's just being a text editor, and I'll bolt it together with lots of other tools,” or “I want an IDE that knows everything about my code.” I'm an IDE person. I like that, generally speaking, the JetBrains suite is less likely to do that to you, because it can be quite disconcerting. It really comes down to how much you trust your autocompleter. If your code completer will just fall back to giving you random words you've typed before, then you have to think before hitting the tab key every time. If it's always giving you parsed results in which it has high confidence, then you get much more confident. But the point with all of these systems is that they are always guessing. It's not like proving a theorem. Building a code completer is like building the graphics for a computer game: your only job is to keep the human in front of the monitor happy, which means it's perfectly okay to guess. So if you see an if statement branching around that function definition, just make a heuristic guess that they probably mean to define that function and carry on. Maybe you'll occasionally give someone a slightly inaccurate code completion, but it's probably going to keep them productive and happy. So that's the simple version.
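The text-editor fallback can be sketched as a two-tier completer, assuming a simple policy: names the parser can see being bound come first and are tagged as high-confidence, and any other word already in the buffer that matches the prefix is appended as a low-confidence suggestion. If the file does not parse, only the word tier survives. This illustrates the idea only; it is not how VS Code or any particular editor implements it, and the function names are hypothetical.

```python
from __future__ import annotations

import ast
import re

WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def parsed_names(source: str) -> set[str]:
    """High-confidence candidates: names the parser can see being defined or assigned."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return set()  # half-written code: nothing reliable from the parser
    names: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            names.add(node.id)
    return names


def tiered_complete(source: str, prefix: str) -> list[tuple[str, str]]:
    """Parsed names first (tagged "parsed"), then other words from the buffer ("word")."""
    strong = sorted(n for n in parsed_names(source) if n.startswith(prefix))
    seen = set(strong)
    weak: list[str] = []
    for word in WORD.findall(source):
        # Skip the fragment being typed and anything the parser already vouched for.
        if word.startswith(prefix) and word != prefix and word not in seen:
            seen.add(word)
            weak.append(word)
    return [(n, "parsed") for n in strong] + [(w, "word") for w in weak]


CODE = "def render(template):\n    result = template\n    # TODO: restyle this\n    return res"
print(tiered_complete(CODE, "res"))  # [('result', 'parsed'), ('restyle', 'word')]
```

Note that `restyle` only appears in a comment: the word tier is exactly the "random words you've typed before" that makes users hesitate before pressing tab, which is why the tag is worth surfacing in the UI.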
Ambiguities like that are actually fairly rare in practice, but syntax errors, which is what your original question was about, happen all the time, because it's a half-written program; of course someone's left a syntax error four lines up. If you feed a program with a syntax error into your classic parser, it will go [popping noise] and give you a syntax error. If you call ast.parse in Python, it'll raise a SyntaxError, and similarly for your parser of choice. That means it's raised an exception rather than giving you an AST, so you now have nothing to go on. There are broadly two approaches you can take here. If all you have is that kind of parser, then you can go with the grotty little hacks: “Well, there's a syntax error on line 14. What if I just blanked out line 14 and tried parsing it again?” This is cheating, but the point is that everything's cheating. You're always cheating. Cheating works, so you're allowed. It's better style to do it the other way, but it does in fact get you an awfully long way just mangling the text. The really stylish way to do it is to have a parser with error recovery. This is a parser that can spot that there's a syntax error, that the text it's been given does not conform to its grammar definition, and instead of giving up and throwing an exception, it will give you an AST, usually with some kind of error node in it. So say I've got a function definition, then two valid lines, one line with a syntax error, and then another line afterwards. It should give you an AST that has all those other statements in it, but with an error marker instead of the assignment statement or whatever was supposed to be on line 14. What that means is you can walk over it and, sure, you're getting an approximate, partial read on the code, but at least you're getting some decent autocomplete.
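Here is one way the "blank out line 14 and try again" hack might look, as a sketch rather than Anvil's actual code: on each `SyntaxError`, use the exception's `lineno` attribute to replace the offending line with `pass` (keeping its indentation so the enclosing block still has a body) and retry, giving up after a fixed number of attempts. `parse_with_line_blanking` and `max_attempts` are illustrative names.

```python
from __future__ import annotations

import ast


def parse_with_line_blanking(
    source: str, max_attempts: int = 10
) -> tuple[ast.Module | None, list[int]]:
    """Parse `source`; on a SyntaxError, replace the offending line with `pass` and retry.

    Returns the AST (or None if we gave up) plus the 1-based numbers of the
    lines that were blanked out, so the caller knows where it is guessing.
    """
    lines = source.splitlines()
    blanked: list[int] = []
    for _ in range(max_attempts):
        try:
            return ast.parse("\n".join(lines)), blanked
        except SyntaxError as err:
            if not err.lineno or not lines:
                break
            # Errors reported at end of file can point past the last line; clamp.
            index = min(err.lineno, len(lines)) - 1
            line = lines[index]
            if line.strip() == "pass":
                break  # this line is already blanked, so the hack isn't making progress
            # Keep the indentation so the enclosing block still has a body.
            indent = line[: len(line) - len(line.lstrip())]
            lines[index] = indent + "pass"
            blanked.append(index + 1)
    return None, blanked


BROKEN = "import os\n\ndef greet(name):\n    total = = 1\n    return name\n\ncount = 1\n"
tree, blanked = parse_with_line_blanking(BROKEN)
print(blanked)  # [4]
```

After the retry, `greet` and `count` are both visible to the completer even though line 4 is still broken in the editor.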
Now, of course, the problem with that is that error recovery is again a very dicey business. This was a famous problem with old C++ compilers. I'm showing my age here, but old C++ compilers had this dreadful habit of failing to recover from errors. Once something went wrong, you'd get the original syntax error, and then it would try to recover, fail, and because it was no longer lined up with your code, it would give you a syntax error for every line after that in your program, because as far as it was concerned, none of it made sense. So error recovery is another of these approximate processes that's there to keep a human happy, and some parsers are better at it than others. It is the unfortunate lot of the Python autocomplete developer that the pgen parser used in Python is spectacularly bad at error recovery, because of the particular way that parser is constructed. So, I'm afraid to admit it on a podcast, but here I am doing it: Anvil's code completion still uses the grotty little hacks of mashing the text around when there's a syntax error, so that it can identify the syntax error and still code complete on other lines.
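A real error-recovering parser works from the grammar, but the effect described above (an AST with error markers in place of the broken parts) can be roughly approximated in Python by splitting the file at top-level statement boundaries, parsing each chunk separately, and substituting a placeholder `ErrorNode` for chunks that fail. Resynchronizing at the next top-level statement is also what stops one error from cascading through the rest of the file, the failure mode of those old C++ compilers. This is a hypothetical sketch, not Anvil's or CPython's approach: the chunking heuristic handles `else`/`elif`/`except`/`finally`, decorators and closing brackets at column 0, but unusual layouts such as multi-line strings can still fool it.

```python
from __future__ import annotations

import ast
from dataclasses import dataclass

# Lines at column 0 that continue the previous compound statement.
CONTINUATION_KEYWORDS = {"else", "elif", "except", "finally"}


@dataclass
class ErrorNode:
    """Placeholder for a chunk of source that failed to parse."""
    lineno: int  # 1-based line where the bad chunk starts
    text: str


def chunk_starts(lines: list[str]) -> list[int]:
    """Indices of lines that (heuristically) begin a new top-level statement."""
    starts: list[int] = []
    prev_top = ""
    for i, line in enumerate(lines):
        if not line.strip() or line[0].isspace():
            continue  # blank or indented: belongs to the current chunk
        keyword = line.split()[0].rstrip(":")
        continues = (
            keyword in CONTINUATION_KEYWORDS
            or line[0] in ")]}"  # closes a bracket opened on an earlier line
            or prev_top.startswith("@")  # a decorator owns the line after it
        )
        if not continues:
            starts.append(i)
        prev_top = line
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # anything before the first boundary is its own chunk
    return starts


def parse_with_recovery(source: str) -> list[ast.stmt | ErrorNode]:
    """Parse each top-level chunk separately; chunks that fail become ErrorNodes."""
    lines = source.splitlines()
    starts = chunk_starts(lines)
    result: list[ast.stmt | ErrorNode] = []
    for k, start in enumerate(starts):
        end = starts[k + 1] if k + 1 < len(starts) else len(lines)
        chunk = "\n".join(lines[start:end])
        try:
            tree = ast.parse(chunk)
        except SyntaxError:
            # Resynchronize: record the damage and carry on at the next chunk.
            result.append(ErrorNode(lineno=start + 1, text=chunk))
            continue
        ast.increment_lineno(tree, start)  # make line numbers file-relative
        result.extend(tree.body)
    return result


SOURCE = "import os\n\ndef broken(:\n    return 1\n\ndef fine():\n    return 2\n"
for node in parse_with_recovery(SOURCE):
    print(type(node).__name__, node.lineno)
# Import 1
# ErrorNode 3
# FunctionDef 6
```

The completer can then walk the returned statements as usual and simply skip the `ErrorNode` entries.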
MK So considering that you have quite a storied history with code completion, having gone through a variety of different parsers and compilers, I'm very curious: when you were designing Anvil’s code completion, what were the big problems or big things you wanted to have as part of Anvil, where you were like, “I hated how that was done. I'm going to do this so much better, it's going to fix everything, and it's going to be magical”? What were the big things you were excited to work on there?
ML All right. So I think we have to acknowledge here that there are really, really good code completers out there in the wild. There are ones built into commercial products like the JetBrains suite, great fan.
MK Yeah. JetBrains just produces quality stuff.
MK With your experience with code completion and everything around that, you must be aware of GitHub Copilot that has launched recently and I'm curious as to where you see code completion coming from that standpoint versus what GitHub Copilot is trying to achieve as well.
ML So Copilot is sort of a completely different beast to classic code completion. It's this honking great big neural network. It's treating your code more like English prose than like code. This has some advantages: it displays an astonishing amount of what looks like semantic understanding of what you’re doing, and the ability to help you with that creative step of what you’re writing next. But equally, as a result, it's firmly in that simulating-creativity space. It's not about, “What are my valid moves next?” It's about, “Make something up for me next.” It's the difference between asking what chess moves are available next on this board and asking what this poet's next line will be. It's an incredible achievement. I don't think we'll ever replace traditional parsing-based code completion, because when you’re using Copilot you're okay hitting the tab key, waiting a couple of seconds, reading what it produces, and seeing if it's a good idea. When you’re using code completion, it's kind of burrowed into your brain stem. I said you don't have more than a couple of milliseconds, because that's the length of the feedback loop you are using with your code completer. You just want to type x.i, hit tab, and have it type x.initialize for you. Copilot is not going to give you that; it's giving you something different, and I am looking forward to a world in which we all get to use both.
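The x.i to x.initialize case needs one step beyond name matching: working out what `x` is. A minimal sketch, assuming the receiver was assigned from a call to a class defined in the same file (`x = Widget()`), and that the half-typed line is excluded before parsing because it usually is not valid Python yet. `Widget`, `initialize` and the helper names are hypothetical, and real completers infer types far more thoroughly than this.

```python
from __future__ import annotations

import ast
import re

# "receiver.partial" immediately before the cursor, e.g. "x.i" or "x."
ATTRIBUTE_CONTEXT = re.compile(r"([A-Za-z_]\w*)\.(\w*)$")


def class_methods(tree: ast.Module) -> dict[str, list[str]]:
    """Map each top-level class name to the names of the methods defined in its body."""
    return {
        node.name: [
            item.name
            for item in node.body
            if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
        ]
        for node in tree.body
        if isinstance(node, ast.ClassDef)
    }


def variable_types(tree: ast.Module) -> dict[str, str]:
    """Guess a variable's class from assignments of the form `x = SomeClass(...)`."""
    guesses: dict[str, str] = {}
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Assign)
            and isinstance(node.value, ast.Call)
            and isinstance(node.value.func, ast.Name)
        ):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    guesses[target.id] = node.value.func.id
    return guesses


def complete_attribute(source: str, cursor: int) -> list[str]:
    """Suggest method names for a `receiver.partial` fragment ending at offset `cursor`."""
    match = ATTRIBUTE_CONTEXT.search(source[:cursor])
    if match is None:
        return []
    receiver, partial = match.groups()
    # The line being typed often isn't valid Python yet ("x."), so parse only earlier lines.
    line_start = source.rfind("\n", 0, cursor) + 1
    try:
        tree = ast.parse(source[:line_start])
    except SyntaxError:
        return []
    class_name = variable_types(tree).get(receiver)
    methods = class_methods(tree).get(class_name, []) if class_name else []
    return sorted(m for m in methods if m.startswith(partial))


WIDGET_SOURCE = (
    "class Widget:\n"
    "    def initialize(self):\n"
    "        pass\n"
    "\n"
    "    def render(self):\n"
    "        pass\n"
    "\n"
    "x = Widget()\n"
    "x.i"
)
print(complete_attribute(WIDGET_SOURCE, len(WIDGET_SOURCE)))  # ['initialize']
```

Everything here is cheap enough to rerun on every keystroke, which matters given the few-milliseconds budget for the feedback loop.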
BP All right, everybody. It is that time of the show. We're going to shout out the winner of a lifeboat badge– someone who came onto Stack Overflow and helped save some knowledge from the dustbin of history and shared some answers with the community. “Best performance for string-to-Boolean conversion,” awarded yesterday to Tomasz Nurkiewicz. Thank you so much, Tomasz, for coming on and answering this question. You've helped a lot of folks in the community. And if you're curious, we'll have this one in the show notes. All right, everybody. I am Ben Popper, Director of Content here at Stack Overflow. You can always find me on Twitter @BenPopper. Email us with questions or suggestions, firstname.lastname@example.org. And if you enjoyed the conversation, leave us a rating and a review on your podcast platform of choice.
RD I'm Ryan Donovan. I edit the blog here at Stack Overflow. You can find me on Twitter @RThorDonovan. And if you have a great idea for a blog post, email me at email@example.com.
MK I’m Matt Kiernander. I'm a Developer Advocate here at Stack Overflow. You can find me online in most of the places @MattKander.
ML And I am Meredydd Luff. I'm Founder and CEO at Anvil. You can find Anvil at anvil.works. I am at @Meredydd on Twitter, although I try to post as little as I can get away with, and Meredydd@anvil.works. It's been great being here. Thank you so much for having me.
BP Yeah, thanks for coming back on. We appreciate it. All right, everybody. Thanks for listening and we'll talk to you soon.
[outro music plays]