The Stack Overflow Podcast

Would you trust an AI bot to find the fix for vulnerabilities in your code?

Episode Summary

On this episode: Eitan Worcel, CEO and cofounder of Mobb, a company that uses AI to automate security vulnerability remediation, talks about how AI can help reduce security backlogs and free up developers’ time, what security risks emerge with GenAI, and why we still need a human in the loop.

Episode Notes

Mobb offers AI-powered technology that automates vulnerability remediations with a goal of helping development teams significantly reduce their security backlogs and free up more time for innovation. 

Check out their blog or dive into their docs.

Connect with Eitan on LinkedIn.

Shoutout to Konrad, who won a Stellar Question badge for “What is the difference between private and protected members of C++ classes?”

Episode Transcription

[intro music plays]

Ben Popper Maximize cloud efficiency with DoiT, the trusted partner in multi-cloud management for thousands of companies worldwide. DoiT’s innovative tools, expert insights, and smart technology make optimization simple. Elevate your cloud strategy and simplify your cloud spend. Visit doit.com. DoiT– your cloud, simplified. 

BP Hello, everybody. Welcome back to the Stack Overflow Podcast, a place to talk all things software and technology. I'm your host, Ben Popper, Director of Content here at Stack Overflow, joined by my colleague, Ryan Donovan. Ryan, you invited our guest today, so why don't you tee things off? Who are we going to be chatting with and what are we going to be talking about?

Ryan Donovan Sure. Today we're going to be talking to Eitan Worcel, co-founder of Mobb. They just released an automatic bug fixing tool that uses AI, goes through your repo, fixes all the bugs. It sort of seems a little bit like– oh, what do you call it– the virus checkers that automatically fix all your viruses. So it's interesting. I want to see what's going on under the hood here and how he creates security fixes with AI that you can trust. 

BP All right, then without further ado, Eitan, welcome to the program and thanks for joining us. 

Eitan Worcel Thanks for having me, guys. I really appreciate the opportunity here. 

BP So tell folks just quickly, how did you get into the world of software development and what led to you founding a security company with AI in the mix?

EW My journey into software development started in the previous millennium, during the dotcom era. I went to college to learn computer science with the plan to retire before the age of 30. That didn't happen. I found my way into the cybersecurity market in 2007, when I joined a company named Watchfire as a developer, working on the first automated DAST scanner, AppScan. After a few years, I switched sides and moved to product management. I moved to the US in 2016, and my last role before starting Mobb was Head of Product for the entire AppScan business, which is a mix of technologies for finding vulnerabilities within your applications. At the end of 2021, I left my company and joined forces with Jonathan Afek, who is my CTO, and we started Mobb with the goal of making a big change in the application security space and the security space as a whole. Instead of focusing on tools that find problems, we try to solve them. That's what we're doing today. 

RD So you've obviously been in the security space for a while, working on dynamic application security testing. How does AI change everything? I mean, not everything– the security. 

EW It changes a lot. I think that AI in general, not just Gen AI, is good at doing a lot of repetitive work in place of manual work. There is a lot of repetitive work in security in general and in application security in particular. We are focusing at the moment on a technology called static analysis, which scans the application source code, the first-party code that your developers are writing, in order to find vulnerabilities. What we want to do is take those reported findings and fix them automatically. Doing it manually is not working. For many years, that just didn’t work, and AI helps us significantly in how we are able to remediate the reported findings. 

BP Can you just give us a really simple example? Somebody has a mobile app or they have a cloud offering that people get to through the web. They agree to work with you, and so you have access to their sort of first party code. What would you go through and look for? And if it was found, how would your AI assistant go about solving that without introducing new complications?

EW So I do want to make one distinction. I know of other approaches to remediating vulnerabilities where you send the code to an AI model that you either tuned or created and expect things to be fixed. We tried that approach; it didn't work. Our approach is more of ‘we build a fix and we use the AI to enhance our coverage on that fix.’ And what I mean by that is, to your point, let's say I'm a developer and I'm writing my code and committing it to GitHub. I may already have some security scanner checking my code to see if there are vulnerabilities, and in most cases it will find and report some. As a developer, at this point I need to fix them or, sadly, ignore them. What Mobb does is take the results of that scanner and identify the problem, let's say SQL injection, which is a very well-known one. We know what the root cause of this problem is. We have patterns to find that root cause, and with a mix of our algorithms and Gen AI, which we call Hybrid AI, we will generate a fix for the developer and present that fix to the developer in their GitHub, so they don't need to go anywhere. They stay in what they know and love and can evaluate the fix and commit it. And to your question earlier about how we avoid introducing new problems, that is because we don't go the easy route of, “Hey AI, fix this for me.” It's more of, “Hey AI, here's the fix. I had 98% of it, I just need your help in locating this pattern,” where we give the AI simple atomic tasks like string manipulations and searches that AI is good at. No risk of contamination with IP that you don't own, no risk of hallucination, because the tasks we use it for are very, very narrow.
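For readers who want to see the shape of the fix being described, here is a minimal sketch assuming a Java/JDBC codebase (illustrative only, not Mobb's actual output): user input concatenated into a SQL string, rewritten as a parameterized query.

```java
// Hypothetical example of the SQL injection fix pattern discussed above.
// Not Mobb's actual output -- just the general before/after shape.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class UserLookup {

    // Vulnerable: the user-controlled value becomes part of the SQL text,
    // so input like  ' OR '1'='1  changes the meaning of the query.
    static ResultSet findUserVulnerable(Connection conn, String username) throws SQLException {
        Statement stmt = conn.createStatement();
        return stmt.executeQuery("SELECT * FROM users WHERE name = '" + username + "'");
    }

    // Fixed: the value is bound as a parameter, so it can never alter the query structure.
    static ResultSet findUserFixed(Connection conn, String username) throws SQLException {
        PreparedStatement stmt = conn.prepareStatement("SELECT * FROM users WHERE name = ?");
        stmt.setString(1, username);
        return stmt.executeQuery();
    }
}
```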

RD You're keeping the human in the loop. You're making sure there's always a person there making sure that AI is not making stuff up or applying the wrong pattern or something like that, right? 

EW We are keeping the human in the loop, but not for that; that's not our concern. Our concern is that we are touching your code. You're a developer. Psychologically, you shouldn't allow us to change your code. Second of all, we don't know everything about the code. We know how a fix should work, but if there is something else in your application that will now break because of it, you would know best. So far, touch wood, we haven't broken anything, but I don't want to claim that I know everything about the application.

BP So let me see if I can just give a layperson analogy. Let's say I write a long document. An AI then goes through that and suggests, “Hey, this might be a grammatical error, and this might be a spelling error,” and it also has suggestions for how I might fix those things. That often saves me a lot of time but it doesn't make the fixes automatically. It underlines them and leaves it for me. So this would be similar in that you've told your system, “Hey, look for these patterns,” and then restricted how it generates what it generates to avoid hallucinations. Here are the kind of solutions we want you to offer, or here is literally the solution. And then by identifying those areas, you free up a lot of developers’ time. They don't have to go through and do that huge scan themselves, and maybe nine times out of ten they can easily just accept your solution because they know it'll work. Does that sound right? 

EW It does. I want to correct one thing: we are not doing the detection ourselves, we're relying on your existing tools. I want to come to you and help you, and whatever program you already have today, whatever tool you're using, we will sit on top of that, take those results, and fix them. And as you said, all you need to do is review and click commit if you're happy with it. And if not, we will also point you to the areas that we are not a hundred percent sure of, where we needed to take some educated guesses, and show you the alternative answers, let's say, to those questions. So you as a developer know your application best, and we know security best. Let's join forces, and we will do the ugly work for you so you can continue to innovate. 

RD And like you said, those security issues follow known patterns. Are there specific security issues that you see time and time again that are just the sort of easy stuff? 

EW Sadly, we see things repeated. SQL injection, even 20 years after it made the OWASP Top Ten, still shows up a lot. XSS, cross-site scripting, shows up a lot. There are things that we see that we still don't know how to fix automatically, and we're not sure that we will ever be able to; we will do a lot of the work, but the developer will still need to do some. Just so it's easy for the audience to understand: if a tool reports that you're using weak encryption, changing it to strong encryption is easy, but we also need to find all the places where you decrypt the data and change those too. And if the SAST solution doesn't tell us about those, we can't fix it. So we are very careful when we are suggesting fixes not to break your application, and AI is not going to be the solution for everything, but it's going to save a lot of time.
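To make the encryption example concrete, here is a hypothetical Java sketch (again, not Mobb's actual output) of why that finding resists a purely automatic fix: upgrading the algorithm where data is encrypted is the easy half, but every decrypt call site that still expects the old ciphertext has to be found and changed as well.

```java
// Hypothetical illustration of the "weak encryption" scenario described above.
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class EncryptionUpgrade {

    // Before: DES in ECB mode -- a textbook weak-cryptography finding.
    static byte[] encryptOld(byte[] plaintext, byte[] desKey) throws Exception {
        Cipher c = Cipher.getInstance("DES/ECB/PKCS5Padding");
        c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(desKey, "DES"));
        return c.doFinal(plaintext);
    }

    // After: AES-GCM -- the easy half of the fix.
    static byte[] encryptNew(byte[] plaintext, byte[] aesKey, byte[] iv) throws Exception {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(aesKey, "AES"), new GCMParameterSpec(128, iv));
        return c.doFinal(plaintext);
    }

    // The hard half: this call site (and any others like it) still decrypts with DES
    // and will fail on data produced by encryptNew(). If the SAST tool never reports
    // these locations, an automated fixer cannot safely update them.
    static byte[] decryptOld(byte[] ciphertext, byte[] desKey) throws Exception {
        Cipher c = Cipher.getInstance("DES/ECB/PKCS5Padding");
        c.init(Cipher.DECRYPT_MODE, new SecretKeySpec(desKey, "DES"));
        return c.doFinal(ciphertext);
    }
}
```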

BP I want to let you talk your book here. Obviously you work at this company, and people can fact check this on their own, but give me your favorite example of, “We worked with company X, they have 50 developers, 500 developers, their code base is so large, and we're able to save them how much time.” Paint me a picture, here.

EW So there are two interesting stories. One is a large Fortune 500 software company we were working with. They showed me the massive backlog that they have, a massive security backlog. Now, it's very common for large organizations to have a huge security backlog, because it takes a lot of time, and do you want to use your developers to advance the business or to reduce the backlog? That's always a balance. They saw how with Mobb they can save over a million hours in the next year of using it, and they are onboarding it now and seeing the progress. Just two days ago I spoke with another very large organization that shared that between 5 and 10% of their developer time is spent on these exact issues, and they are eagerly working with us now to see the value. So we started with smaller companies, but for the bigger ones, we're talking about saving a significant amount of time, because they care about security. If you don't care about security, honestly, don't waste your time on using Mobb. It will only help you in an area that you're not worried about. We want to help those that care. 

RD You said you're using a hybrid AI approach. Can you share some details about what are the ingredients going into that hybrid recipe? 

EW Sure. We just filed two patents on it, so I'm happy to share. The idea is that, as I said, we build an algorithm for how to fix a problem. Let's go back to SQL injection, because that's the one that most developers know. You have your SQL injection reported by, let's say, Checkmarx, one of the SAST tools that we support and one of the leading tools in the market. They will tell us that there is a problem. We know that the problem is that the developer used simple string concatenation to combine the user input with the SQL query, let's say. And the fix should be changing it to a prepared statement. We can do that, but string concatenation can come in many shapes and forms. It can be a string plus a value, but it could be a StringBuilder, it could be String.concat, it could be various different things. If we now go and spend a huge amount of time capturing all those patterns, that's going to be a problem for us. It will make us slower. Instead, we have the algorithms, but we use the AI to help us find the different patterns and do that small thing of replacing the value with a question mark. I'm dumbing it down just for the sake of explanation, but that is the idea. We do the bulk of the work. We do the security research and all that. We let the AI do the string manipulations and the searches that it does really well. So far, by the way, the results are awesome for us. It saves a lot of time and it's accurate. It's surprisingly accurate. 
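As an illustration of that "many shapes and forms" point, here is a hypothetical Java sketch (these are not Mobb's actual detection patterns): each variant below builds the same injectable query, so a purely hand-written rule set would need one rule per form, while the target fix is always the same parameterized statement.

```java
// Hypothetical variants of the same string-concatenation root cause.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class ConcatVariants {

    static String plusConcat(String username) {
        return "SELECT * FROM users WHERE name = '" + username + "'";             // plain +
    }

    static String builderConcat(String username) {
        return new StringBuilder("SELECT * FROM users WHERE name = '")
                .append(username).append("'").toString();                         // StringBuilder
    }

    static String methodConcat(String username) {
        return "SELECT * FROM users WHERE name = '".concat(username).concat("'"); // String.concat
    }

    // The common target shape: the user value is bound as a parameter instead of
    // being spliced into the query text.
    static PreparedStatement parameterized(Connection conn, String username) throws SQLException {
        PreparedStatement stmt = conn.prepareStatement("SELECT * FROM users WHERE name = ?");
        stmt.setString(1, username);
        return stmt;
    }
}
```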

RD I mean, a million hours. You say ‘surprisingly accurate.’ Are you surprised how accurate it is? 

EW Yes. The first few experiments that we did around AI were underwhelming in the extreme. We got about a 30% success rate with fixes, and even for those, sometimes the fixes were really strange. It fixed the problem, but in a way that no one should do it. Sometimes it actually introduced new vulnerabilities, as you mentioned earlier. So we needed to figure out an approach where we put guardrails around the AI, using it for what it does really well and not letting it go outside of those guardrails and hallucinate stuff. 

RD And obviously you're looking at other people's code. What kind of security and privacy measures do you have in place to make sure that you are trusted, but verified? 

EW So I'll start by saying that, because of all these years in cybersecurity and because I knew what my future customers would ask me, we started working on our SOC 2 while we were working on our alpha. We had our first report when the product was in beta, and now we are in our second year of SOC 2. My plan is to be big enough one day to be a target, and if you're a target, you will be breached at some point, no matter what you do. We don't store customer code. We take it, we fix the issues, we show it to you, but after we cache it for a while, we delete everything. So we don't save it in our database; you don't need us to save it in the database. We will keep a pointer to your GitHub commit so you can see the fix there if you want, and just the metadata for statistics, so you will be proud of what Mobb is doing for you. So that's one thing. The second thing is that we don't use your code to train our AI, because, as I said, we don't need to. We are asking the model we're using to do some atomic string manipulation things that it's good at from the get-go. So there is no privacy or security risk.

BP It's interesting what you say; I want to get your take on this. It almost feels like the thing that a lot of companies are coming away with as they explore how to put Gen AI into production or what to do with it is that right now the best way to use it is to have a ton of guardrails. And so like you said, when you first started out, you were underwhelmed, but as you narrowed down on what it was able to do, and you said, focus on known areas, known patterns, work with tools that this customer already uses, it can end up saving folks a lot of time. In our experience at Stack Overflow, we found it’s most useful from a perspective of retrieval-augmented generation. Don't try and write me new code or come up with an answer on your own when you might hallucinate– go look at this database we have, come back to me with a summary of what you think is the best answer and show me the links to the sources so I can validate it myself. As you look out 5 or 10 years from now, do you see that changing or do you think, at least for the foreseeable future, you'll head down the same path? Obviously you want the company to grow. You said you want to make yourself so big you're a target, hats off to you. But at a certain point, you have a big enterprise customer and they come in and say, “I have this big security backlog,” and you respond, “Okay, we're going to go in and find all the known issues. But not only that, we're going to help clean up tech debt. Every time we find a memory leak, we're going to patch it or we're going to do X or Y.” Do you think we're going to move in that direction or do you think, at least for your foreseeable future, the quality of what comes out of an AI system is too unpredictable for it to be allowed that kind of free rein? 

EW So I know of three approaches to doing this automatic remediation. One is completely pattern-based, and I know of other companies doing everything from that all the way to fully AI. If you do fully AI, I want to believe that five years from now it will change, but we also need to remember that the AI is now being trained on new data that is itself generated by AI. I'm sure you guys have interesting stats on how much new code is being generated by AI, and there’s no reason for us to think that that code will be better than what came before, because it was trained on what came before. At a certain point, the industry will need to learn how to feed the AI with trusted data, let's say that. I think that if you're using a pattern-based approach or hybrid AI, people will be able to trust it to fix automatically. In the end, there should be a developer approving the merge request. Even today, if you are writing code, someone needs to review and approve your PRs, so that shouldn't change. The AI or the auto-remediation will generate the code and the developer will approve it. That saves millions of dollars.

RD I've read that of course you shouldn't trust AI code, but you should also not trust code written by other people or by yourself. You should basically treat all code as a little bit suspect. Are there ways that AI can make any code a little less suspect? 

EW It's interesting, because how do you know what is suspicious? Sometimes it's business logic. You need to tell the AI, “This is what the application is supposed to do; now tell me if it doesn't.” You're getting into interesting philosophical questions here. By the way, all these backdoors can always be explained away as business logic. So yes, some person will need to review, and to be honest, I'll be modest here: I have no clue. I can't even imagine how to do that automatically. 

BP And what about on the opposite side of the fence? From a black hat perspective, what opportunities does Gen AI open up and what would a security company or somebody who's mindful of security, as you said you hope your customers are, do about that? 

EW I will try to focus on our area and not on the amazing deepfake that was recently reported, where someone got 25 million dollars from a company in Hong Kong, which was an amazing story and I'm sure your audience is aware of it. I think that part of what AI can help attackers do is increase their velocity. Once a vulnerability is disclosed, it's very easy for them to write exploits for that vulnerability using AI and just automatically generate many attacks with it. I think the time to remediate without being breached is going to get shorter and shorter, and companies will need to address that, because the other side will use AI, not necessarily for zero days or anything, just to quickly create new exploits for known vulnerabilities and execute them.

RD Like you said, there's still SQL injection stuff out there unpatched. 

BP And I think one of the things that I'm always excited and happy to see is that right now you can have AI brute force a million possible good ideas for a new antibiotic or a new battery material or whatever, and then somebody else can come in, sometimes an AI system or sometimes a human, and say, “All right, these 10 ideas look like they might actually work.” Unfortunately, as you point out, that probably also works for novel ideas for known exploits.

EW It's a very strong tool, and we shouldn't expect the other side not to use it, because it is part of their toolbox now and will help them be stronger and faster, and faster is the key here. You can no longer wait three months to remediate a high-severity vulnerability in your application layer. It doesn't work. 

RD You've obviously been thinking about the security space for a long time. I want to get your take on what the new vulnerabilities are that generative AI and LLMs open up. What are the new attack surfaces?

EW I think the new attack surfaces are more around the LLMs themselves. You may feed the LLM with data that is untrusted, and then the results that you get are untrusted. There was very interesting research done, I think by Vulcan Security, and if I got it wrong, I apologize. A startup in Israel wrote about AI hallucination, where the AI can suggest a fix, to the point that we mentioned earlier, and include a library that doesn't exist, just because it believes that library exists. And then a hacker, once he sees it, can create that library. The next time the AI suggests it, that library now exists, and someone is being fed it. In the same way, malicious actors can feed the AI with malicious information over and over again until it starts using it in its suggestions. That's one big problem that I see. There is also a Top Ten for LLM vulnerabilities that discusses specifically what the problems are around LLMs. After all, it's mainly going to be in the data it's being fed, and prompt engineering, but I talked about that already. 

BP I wonder how many people are out there now putting together sites that they hope will be scraped by the AI in the future and then called upon to pull something out of its training data and then there's that security vulnerability buried in there.

RD The new honeypots. 

EW Scary.

BP Exactly. All right, well, before we go to the outro, Eitan, is there anything you feel like we missed that you wanted to touch on? 

EW I do have one question, because maybe you see this too. In our journey with AI, we heard a lot that if you tell people it's Gen AI, they will have more patience with you and more tolerance for hallucination, because developers are used to it by now. I always felt that this is true if you make one or two suggestions, but if you have a backlog of hundreds and you start giving out crazy ideas, and let's say 7 out of 10 of them are wrong, no one will have the tolerance for that. Do you agree? 

RD I think if you're going to be offering solutions at scale, like you said, this helps with velocity. So I think if you have to double-check every single one every single time, that's not really going to help your velocity. 

EW Especially when developers don't know the security side, then they won't be able to evaluate that. 

BP I think ‘we don't know yet’ is my response. Ryan and I have been looking into this trying to learn as much as we can about code gen from AI, and some of it’s LLMs, maybe some of it’s older systems or a hybrid system, whatever it may be. We don't have good benchmarks for evaluating code quality, and we don't have good benchmarks for evaluating, let's say there is a certain level of quality to the code, does that add a certain level of productivity? Developers feel more productive, they have more PRs pushed. Does that equal out to, let's say, the number of new security vulnerabilities introduced or the potential licensing issues that come from not writing all your own code? So a lot of the research that's coming out now is all greenfield. We don't know a lot about AI code gen and so we're all learning on the job, I guess.

EW By the way, one of the interesting things that we achieve with this solution is for folks that are using static analysis, which is notorious for reporting everything it finds, whether it's exploitable or not. And while we are not focusing on figuring out whether a finding is exploitable, if we see bad code, we can fix it faster than you can triage it and figure out if it's exploitable. Our approach is that this is about productivity. We will save your developer the 30 minutes to triage and then the two hours to chase security and tell them, “Hey, this is not a real issue.” Instead of that, we will make the developer's code better. Within a minute, they review it and approve it, and that's it. So it took us some time to figure out that this is the approach, instead of just trying to figure out whether everything is true or not. 

RD Give them the fix, let them figure it out. 

BP Very cool.

[music plays]

BP All right, everybody. It is that time of the show. We want to shout out someone who came on Stack Overflow and shared a little knowledge with the community. Thank you to Konrad, who was awarded a Stellar Question badge. This question was saved by 100 users: “What is the difference between private and protected members of C++ classes?” If you're curious, there's an answer there, and thanks to Konrad for coming on and asking. Almost 600,000 people have checked out this question, so a lot of people have learned from you. As always, I am Ben Popper. I am the Director of Content here at Stack Overflow. You can find me on X @BenPopper. Email us with questions or suggestions for the program at podcast@stackoverflow.com. And if you like what you hear, leave us a rating and a review. 

RD I'm Ryan Donovan. I edit the blog here at Stack Overflow. You can find it at stackoverflow.blog. And if you want to reach out to me, you can find me on X @RThorDonovan. 

EW Eitan Worcel here, CEO and co-founder of Mobb. You can learn more about Mobb at mobb.ai. You can find me on LinkedIn– Eitan Worcel. I'm not very strong at X yet. And I would love to hear from people what they think about what we are building and how we can help them be more productive and more secure at the same time. 

BP Thank you so much for listening, and we will talk to you soon.

[outro music plays]