The Stack Overflow Podcast

Stress test your code as you write it

Episode Summary

Itamar Friedman, CEO and cofounder of CodiumAI, and Kyle Mitofsky, a Senior Software Engineer on Stack Overflow’s public platform, join the home team for a conversation about code integrity and how AI tools are changing the way developers work.

Episode Notes

CodiumAI plugs into your IDE and suggests meaningful test suites as you code. See what they’re up to on their blog or scope out their open roles. You can also follow them on Twitter.

Connect with Kyle on LinkedIn, Twitter, or GitHub.

Connect with Itamar on LinkedIn.

Today’s Lifeboat badge is awarded to Héctor M. for answering Convert a string to a Boolean in C#. Thanks for spreading some knowledge.

Episode Transcription

[intro music plays]

Ben Popper In case you missed it, Logitech just announced the new MX Keys S Keyboard with a superior low-profile typing experience, enhanced smart illumination, and 17 programmable hotkeys. The new Smart Actions in the Logi Options+ app gives you the power to skip repetitive actions by automating multiple tasks with a single keystroke. It’s like macros with a little magic. Go to logitech.com to find out more.

BP Hello, everybody. Welcome back to the Stack Overflow Podcast, a place to talk all things software and technology. I'm your host, Ben Popper, Director of Content here at Stack Overflow, joined by two of my wonderful colleagues, Ryan Donovan, who is the Editor over on our blog and newsletter, and my buddy Kyle Mitofsky, sometimes podcast co-host. We were together in San Francisco for Next.js last year. Kyle, give me your title again. 

Kyle Mitofsky I am a Senior Software Developer here at Stack Overflow. 

BP So today we're going to be chatting with Itamar Friedman, who's the founder and CEO of Codium AI, talking about code integrity, the way they're leveraging AI and other things to generate cool tests within the IDE. And of course, given that it's in the air, I think we're going to talk a bit about how tools like this and the wider swath of changes in the AI space are kind of redefining the world of software development and how some of that is even trickling out to the end user and consumer, the way people are going to be interacting with computers. The bicycle for the mind, as I like to say. I like to take it for a ride every morning. So Itamar, welcome to the program. 

Itamar Friedman I'm very excited to be here. It's great. Thank you for hosting. 

BP Of course, of course. So I'll ask a quick, easy question and then I'll get out of the way and let folks like Kyle who've played around with your stuff dig in. Just for our listeners, give them a quick background. How did you end up in the world of software and technology, and then more specifically, how did you end up as a sort of co-founder and CEO at the place you are now? 

IF Awesome. So I guess, like most geeks, I started as a teenager. I had my first company already back then, with around 30 clients and a dozen employees. Actually, my first ever employee is today my co-founder, Dedy Kredo. I did my bachelor's degree in electrical engineering, specializing in machine learning, and a master’s degree as well. I was also in the army in a relevant field. I worked at Mellanox, which was acquired by Nvidia. I actually worked on verification, and it relates to what we do today. I worked at another startup, also on verification, and then started my first startup. It was a great ride, amazing growth, but also a beautiful collapse. Then I had my third company, which was my second startup; we did quite well and it was eventually acquired by the Alibaba Group. I started the Alibaba Group R&D center in Tel Aviv, Israel, and those were four really satisfying years. I did a lot of machine learning, developing ML solutions as part of Alibaba Cloud, as well as building an app with a dozen AI capabilities that reached 10 million monthly active users. To sum it up, it's 20 years of managing R&D, which sounds crazy when I say it. And one of my biggest pains was dealing with code logic. There are tools for availability, tools for performance, but code logic is tough. Working on large language models and vision models at Alibaba, by the way, I realized in 2021 that we had come quite far, and that if you're able to frame a problem as language, then AI can take you very far in helping you mitigate or solve your pains. That’s when I left Alibaba at the beginning of 2022, and in the middle of 2022 I started Codium AI.

BP Very cool. 

Ryan Donovan Yeah, I think before Kyle goes ham on this, we get a lot of posts about code quality, but you want to talk about code integrity. Can you give us a sort of definition of what exactly code integrity is?

IF So I think code quality, roughly speaking, we can divide into functional and nonfunctional. Do you agree with that? And I think most code quality tools and products today are focused more on the nonfunctional, like performance, or whether you repeat yourself in your code. But for the functional side, I think we're really missing tools that, in hardware for example, we actually did have. When I worked in hardware verification, there were formal tools, et cetera. In software you have fuzz testing, mutation testing, but none of them really feels comfortable enough or useful enough, unless you're sending, I don't know, spaceships, and then you probably don't have a choice. But for most other software, you probably won't use them. So when we're talking about code quality, I think, roughly speaking, most people are actually referring to the nonfunctional. I'm talking about the functional part. And for that, when we talk about the code level, then for us the most basic part of code integrity is code logic testing– verifying that the code is working, behavior-wise, as expected. That’s the narrowest definition of code integrity. More generally speaking, you can also fold in part of vulnerabilities and developer experience, et cetera. For example, if you're able to verify code logic, then you can probably also help with automation of pull requests and verify that these are done according to best practices. So for us, code integrity is also the bigger envelope that includes that part.

KM Yeah, so code quality, of course we want it, and maybe it's hard to actually say, “This has quality 10, this has quality 7,” but there are paths to get there, and one of the paths is testing. We have test coverage around our code and it gives us confidence in that code, and I think that's something that Codium AI is looking to do. One of the challenges, of course, is that every team I'm a part of is like, “Yeah, we should test more. Yeah, we want to test. We have some tests, not a lot.” And it's this constant grind to kind of extract tests from the development workflow and actually get them in there. They run on every single build– assuming you have some CI service that's going to do it, they're there. You don't have to run them manually. You get them for free in perpetuity, but they're hard to get in there initially. And I guess that seems like where this is trying to add that nudge, to be like, “Hey, test this. Hey, test this edge case here.” And there are even– I was poking around with this a little– cool edge cases like, “Hey, have you passed null into here? That's a valid input. What happens when you do that?” You get to decide, when you write the test, what the valid output is. You might not have thought about that yourself, but having something automatically give you a test was kind of cool there. I guess I'm monologuing here. If you want to speak to the role of just adding tests into a codebase, the role of the software in ensuring code quality, and where you see that going.
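
As a concrete illustration of the kind of null edge case Kyle mentions, here is a minimal pytest sketch; the parse_bool function and the behavior it assigns to None are hypothetical examples, not output from Codium AI.

```python
import pytest

def parse_bool(value):
    """Convert a string like 'true'/'yes'/'1' to a bool; reject None."""
    if value is None:
        raise ValueError("value must not be None")
    return value.strip().lower() in ("true", "1", "yes")

def test_parse_bool_happy_path():
    assert parse_bool("True") is True
    assert parse_bool("no") is False

def test_parse_bool_rejects_none():
    # The edge case a generated test might surface: what should None do?
    # Here we decide the valid behavior is to raise, but that's our call.
    with pytest.raises(ValueError):
        parse_bool(None)
```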

IF For listeners who haven't yet tried us, I'll explain really quickly what we do. Codium AI is basically an AI coding assistant that focuses on helping developers reach zero bugs, and that's important– our vision and our focus are around helping verify code intent. For example, Tesla's vision is zero emissions; for us, it's zero bugs. Let's clean the internet of bugs. No more bugs in software. That's where we want to get to. It’s very hard, I know, but it’s important. And basically, what we did at Codium AI is develop our first product, which is an IDE extension that sits alongside your code. If you have one tab open on your code, then you have another tab open with Codium AI, and with Codium AI what you can do is basically create tests and get different challenges on your code. Tests are one way to do it, like you said, Kyle, but there are other ways to verify code intent. So definitely, like you said, Kyle, developers tend not to add many tests while coding because they hate it, or they do it because they think it's important, but they still hate it. And our role here in the world is to spread love and stop this hatred toward verifying your code logic while you write your code. The way to do it is to somehow help reduce the time it takes from 10 hours to 15 minutes, and to do it in an interactive manner. So, like you mentioned, Kyle, we try to automatically raise edge cases or happy paths, challenge you with a failing test, and explain why it might actually fail because of the code. I don't know if you bumped into that, but it happens often.

BP So there's a little layer of sort of gamification and discovery there that you're using to try to make it more enjoyable? Folks are giving you a codebase or repo to scan, and then you're coming back to them with a few, “Hey, have you tried this?” Like Kyle was kind of saying, “But what about null?” And as you walk through that, then maybe that's a less grueling process than, “Ugh, guys, we always need more test coverage.” Do we have enough? What is enough? When do we know enough is enough?  

IF So I want to speak to both the gamification and the code coverage. Gamification, yes, but we haven't taken it to the extreme yet, maybe not even close. For example, we don't even give badges yet, although we're thinking about it. There are many types of tests that you can use– property-based testing and a dozen other types– and usually when you do a bachelor’s degree in computer science, you learn so much about different software paradigms, but almost nothing about testing. And I guess most developers won't go take a testing course on Udemy or something like that. So part of our purpose and our idea is to get people more educated step by step, so adding badges for different types of tests, or for what you've covered, is something we're thinking about, but we haven't done it yet. But the core functionality is not gamification in the sense of games; the core functionality is interactive– making things smooth and fun. So definitely, yes, we might go a bit toward badges and things like that for education purposes. Now, about code coverage, I want to say something. I think we interviewed almost 100 directors or managers of R&D and engineers, programmers, et cetera, and we asked them how they measure code quality, or how they measure whether they have enough tests or not. One of the things we heard, of course, is code coverage, but we also heard that people are aware, in my opinion, that code coverage is actually a proxy metric for whether you sufficiently cover your code, and maybe even a vanity metric. And one of the things that we want to help with is to also explain a bit the importance or the value of different tests. When you're generating tests, basically what you make sure of is that you cover important behaviors of your software. So it's a good thing if you could, supposedly, map all the behaviors of your software and then verify that these behaviors are covered by tests. It's not how many lines you cover, rather how many important behaviors. Behaviors are almost like user stories, to some extent. So this is something that Codium AI also helps to autogenerate– by the way, also TDD style. You can define behaviors even if they don't exist yet in the code, and after you finish defining the behavior, you can generate the test that fails and, with that, generate the code that makes the test green.
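
For listeners who haven't run into property-based testing, here is a minimal sketch using the hypothesis library; the normalize_whitespace function and the properties checked are hypothetical examples rather than anything Codium AI generated.

```python
from hypothesis import given, strategies as st

def normalize_whitespace(text: str) -> str:
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return " ".join(text.split())

# Instead of hand-picking inputs, hypothesis generates many strings and
# checks that the stated behaviors (properties) hold for all of them.
@given(st.text())
def test_normalize_is_idempotent(text):
    once = normalize_whitespace(text)
    assert normalize_whitespace(once) == once

@given(st.text())
def test_normalized_has_no_double_spaces(text):
    assert "  " not in normalize_whitespace(text)
```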

KM Checking in here from a former life, maybe as a writer of bad tests: we had a codebase where we measured test coverage and you could not get below 80%, and if you did, that commit failed. So you had to chip in some tests somewhere, and whatever the minimum effort was to push out the feature, unrelated to the tests, that's what people did. And so I totally agree that sometimes the raw code coverage number is a vanity metric. I mean, it's certainly good to have, but pursuing it in its own right seems like the wrong way to end up with lots and lots of meaningful test coverage.
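
For context, a coverage gate like the one Kyle describes is usually enforced in the build pipeline. A minimal sketch using pytest with the pytest-cov plugin, where the package name myapp and the 80% threshold are placeholders:

```
# Illustrative CI step: fail the test run if line coverage drops below 80%.
# Assumes pytest and pytest-cov are installed; "myapp" is a placeholder.
pytest --cov=myapp --cov-fail-under=80
```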

RD Yeah. I mean, any metric used as a goal becomes worthless. It's really interesting that you all are not just generating tests, but you're trying to teach people how the tests work and how to understand them. Because I think a lot of the criticism leveled against the generative AI space is that it's just giving you the answer without you understanding it. How much of that goes into your thinking about this?

IF Okay, that's a very important aspect for us. Two different customers told me just today that they've been using code generation for a while. In the beginning there's a lot of excitement, and then when they really try to analyze the effectiveness, they think it kind of sums to zero. I'm not saying that this is true for everyone– there are a lot of reports– but I could totally get it. And by the way, I'm also using a code generator and I think it’s cool. But I think it’s basically like a very advanced search, and the magic might also create some selection bias: you got this piece of code, okay, it worked so many times, I'll select this and let's move on, and later I'll verify– and maybe we didn't put our entire thinking into it. At Codium AI, what we try to do is not the opposite, but a different direction, where we want to help you think more easily. Our first goal is not helping you generate more code, rather making it easier for you to think about your code and verify your code, and then as a byproduct you also generate more code, but of higher quality as well. So it's very important for us, and this is why, for example, one of the features that was introduced, I think, a few days ago is that if a test fails, we tell you, “Hey, we think the test itself is actually wrong; here is the reasoning and here is how to fix it. Or your code is wrong; here is why and here's how to fix it.” It forces you to think for a second, but it helps you– it maybe even lets you think about things you wouldn't think about otherwise. It helps you think about these things much faster and saves you the small notches. That's how we think about it. It's very important for us to challenge the developer and help the developer reduce the time it would take to challenge themselves. Think about it like an additional developer doing reviews for you as you go, but these reviews are not annoying– they're actually fun, and they help you cover the things you don’t want to do by yourself, as opposed to helping you do the things that you do want to do, which is coding.

BP Yeah. I was chatting the other day with some of the folks working on Google's generative models, before the I/O announcement, and they said exactly that. But the issue, to your point, is that when you have that, you can ask it to do a million things, and all of a sudden you've got tons and tons of code which you now have to review. Was it actually efficient? Or, to the point Ryan makes a lot, does this really meet the requirements for the project? Now I have something that could just endlessly generate code, and that's not necessarily a good thing– it can end up later as more headaches, with more test coverage that you have to spin out. So you wanted to look at this more broadly, and I guess one of the things I'm curious about, to your point, is where you use this judiciously. From that broader perspective, what are you excited about and what are you maybe a little trepidatious about? You mentioned wanting to chat about how the world of software development is going to change– what are you looking ahead to? And within your organization, you mentioned you have code generators, so how are you trying to use them well? Because, as you kind of said, we're seeing a lot of productivity gains being claimed, but maybe customers are coming back after a few weeks and saying, “Well actually, no, we've just got tons and tons of spaghetti that we don't really know what to do with now.”

IF So first of all, I'm very excited about it. Codium AI is very active in the Auto-GPT community repository– we're contributors there, as are a few of our developers. Actually, Toran, the maker of Auto-GPT, and I had a workshop just this week. It's recorded online, and we talked about the Auto-GPT roadmap as well as Codium AI’s vision. So we're very excited about the idea of agents and automation. As I see it, there are going to be two groups of tools: there will be assistants and there will be agents. For example, for me, Copilot is an assistant– it's almost like Gmail helping you continue writing an email. Agents are more like an additional team member with certain skills that can almost complete work end to end, and maybe you need to review it. By the way, maybe that additional team member reviews your code, but you still need to review the review, and this is how you almost always do it. So I think eventually we will see agents all over the place. We will see an agent giving you comments while you're doing the pull request. You'll see a Codium AI agent– right now it's an assistant and it's getting more and more automated, but soon you'll see a Codium AI agent that helps you: “Oh, hey dude, you did a pull request fixing a bug, but you didn't add a test that shows it failed before and passes after. Here's a suggestion.” And it's end to end– it tried it, it verified it, and now it's suggesting you do that. I think we'll also see agents in other places like deployment, helping you deploy to the cloud, so it's not only about coding.

KM So we've kind of talked about agents– something that smells like a PR review, somebody saying, “Hey, have you thought about this?” And I think if you push enough code to production, you get all this scar tissue of, “This thing went wrong and, oh, I need to go fix it in the future. I forgot to add caching. I forgot to add docs or something. I forgot to do this thing that would've helped teammates work on this problem.” And so sometimes you get a pull request and somebody chips in some comments– “Hey, have you thought about this?”– that reviewer mindset. And the challenge when you're trying to write code is that junior folks might not have as deep a toolkit– they might not have all of that muscle memory built up– and for senior folks, it's just a lot to remember. You start something from scratch and it doesn't have any of those things yet and you forget to do it. So for a long time we've had tools like ESLint that kind of give you those nudges, shifting left so that while you're in your IDE it says, “Hey, by the way, you never reassigned this variable. Maybe make it a const or something.” We've had those based on abstract syntax trees, and there's a lot we can do with those. My understanding is that now we're kind of moving into this new wave where the recommendation is not going to be based on an abstract syntax tree, an AST; it's going to be based on AI. The AI decides what imaginary rule you violated, explains the rule to you, and says, “I think you violated this rule and here's where to go from here.” Do you see us moving from these more static tools to these more LLM-based tools for those suggestions that get chipped in?
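
To make the AST-based side of this concrete, here is a minimal sketch of a lint-style check written with Python's built-in ast module (ESLint does the equivalent for JavaScript); the rule, the sample source, and the names are illustrative, not a real ESLint or Codium AI rule.

```python
import ast

# Toy rule: flag mutable default arguments, a classic static-analysis nudge.
SOURCE = """
def add_item(item, bucket=[]):
    bucket.append(item)
    return bucket
"""

def find_mutable_defaults(source: str):
    """Yield (function name, line number) for list/dict/set default arguments."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            for default in node.args.defaults:
                if isinstance(default, (ast.List, ast.Dict, ast.Set)):
                    yield node.name, default.lineno

for name, line in find_mutable_defaults(SOURCE):
    print(f"line {line}: '{name}' uses a mutable default argument")
```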

IF So it goes like this. I think the collaboration between the machine and the human is going to be much closer. What you said resonates for me, but let me suggest a small modification which I think is critical. I think the future is going to be a bit more iterative. For example, a machine would go over your entire repository, if it exists, and tell you all of the best practices that it sees– and it can actually do that not only on your own repository, it could do it on many. Then a few tech leads can go over this and say, “Oh my gosh, this is how we wanted to do it, actually. Not like this. And by the way, these are our best practices for testing, these are our best practices for calling an API. This is how we like to do queuing. This is how we like to do caching, and all that.” And it'll be configurable– the rules will actually be a mix of AI suggestions and human definition, like the best practices that a company or a group of developers wants to apply. And that's the amazing thing: the agent or the assistant connected to this configuration will be able to help junior developers raise their level really quickly. Not only raise their level, but raise it to fit the knowledge and best practices of the company. So this is how I think this aspect of the future is going to look.

RD So we see a lot of people worrying about AI taking coders' jobs, but it seems like the actual tools that are coming out are these sorts of assistants– things that are replacing the parts of the workflow that are a pain. Do you think there will be any real threat to coders in the future, or do you think it's all a bunch of hand-wringing?

IF So I think, first of all, predicting in general is really hard, especially the future. And I think there is a difference between 5 years, 10 years, and 15 years– it's really hard to predict the 15-year part. But here is how I see the 5-year part. I think people are actually overestimating the capabilities of the technology right now and underestimating its capabilities further out. Right now, if you actually play with these tools, there are those times where you're like, “Oh, wow. I can't believe it managed to do that, it’s so intelligent.” But then it's, “Oh my gosh, it's so stupid,” in two other cases. So I believe in the next few years, yes, it's going to advance really fast, crazily so. But still, I believe there will actually be at least the same amount of demand for developers as there is today– again, I'm talking about 5 years. I think for like 80% of software– and I'm talking really roughly, I don't have the exact number– you would need fewer developers. If these companies, these products, exploit generative AI technology, like the no-code/low-code frameworks are doing, they'll be able to do much more than they could before, so creators will be able to create software that is more flexible than before. Supposedly we should say, “Okay, there's need for fewer developers,” but at the same time, I think developers will be able, with this technology, to build much more sophisticated programs, much more complicated programs– satellites and whatever requires zero bugs–

BP Take us to Mars. 

IF Take us to Mars– and actually, demand for that software will even increase, for different reasons. So I think, at least for five years, many more people who are not developers will be able to develop, but developers will also be able to do much more than they could in the past with sophisticated software, and they will be less focused on building a simple PHP website or something like that. Sorry to the PHP devs.

BP Yeah, I love that idea. One of the things that stood out to me recently was that all teenagers now are their own video and audio editors– those used to be advanced skills that you needed certain technology or training for. Now everybody is their own multimedia producer, and to your point, maybe in the future everyone will also be, at least at a low level, their own software developer, creating their own website and their own app or whatever the project may be.

IF Exactly.

[music plays]

BP All right, very cool everybody. Today I want to shout out Héctor M., awarded a Lifeboat Badge on May 20th for coming in and rescuing a question with a score of -3 or less. “How to convert a string to Boolean in C#.” Héctor has got an answer for you, and we've helped over, ooh, 300,000 people, so appreciate you sharing the knowledge. I am Ben Popper. I am the Director of Content here at Stack Overflow. You can always find me on Twitter @BenPopper. Email us with questions or suggestions for the show, podcast@stackoverflow.com. And if you like what you hear, leave us a rating and a review. It really helps. 

RD I'm Ryan Donovan. I edit the blog here at Stack Overflow. You can find it at stackoverflow.blog. If you want to reach out to me on Twitter, you can find me @RThorDonovan. 

KM I am Kyle Mitofsky, a Senior Software Developer here at Stack Overflow on our Public Platform team. You can find me on Twitter @KyleMitBTV, and also on Stack Overflow under user ID 1366033. 

BP Oh yeah.

IF Thank you very much for hosting me. I'm Itamar Friedman, the CEO and co-founder of Codium AI. And you can find me personally on Twitter @itamar_mar. 

BP Alright, everybody. As always, thanks for listening and we will talk to you soon.

[outro music plays]