The Stack Overflow Podcast

“Are AI agents ready for the enterprise?”

Episode Summary

Deepak Singh, VP of Developer Agents and Experiences at AWS, helps Ryan break down the hype around agentic AI in software development. They cover the definition and real-world functionality of AI agents, how developers can integrate them into existing workflows, and the importance of establishing guardrails to ensure trust and security in agentic AI.

Episode Notes

Deepak works on Amazon Q Developer, a GenAI-powered coding assistant that includes autonomous agents.

Thinking, Fast and Slow by psychologist Daniel Kahneman is one of those books that’s a classic for a reason—and it’s more relevant to today’s AI landscape than you might think.

Connect with Deepak on LinkedIn

Congrats to Stack Overflow user Morten Zilmer, who earned a Lifeboat badge by explaining Multiplication of two different bit numbers in VHDL.

Episode Transcription

[intro music plays]

Ryan Donovan Brain computer interfaces are transforming human communication. Join Raymond Yin, host of the Tech Between US Podcast, and Dr. Dan Rubin, Critical Care Neurologist at Massachusetts General Hospital, as they discuss the latest advancements. Listen now on your favorite podcast platform, or visit us at mouser.com/empowering-innovation. 

RD Hello everyone, and welcome to the Stack Overflow Podcast, a place to talk all things software and technology. I am Ryan Donovan, your host for this episode, and today we are talking about AI agents. I know everybody's talking about it, but we are going to be talking about whether these are ready for the enterprise, whether this is primetime or just another toy. And I'm here with Deepak Singh, VP of Next Generation Developer Experience at AWS. So welcome to the program, Deepak. 

Deepak Singh Thanks, Ryan. Thanks for having me. 

RD Of course. So at the top of the show we like to get to know our guests, ask a little bit about how you got into software and technology. 

DS My journey into software and technology is a little different from most people's. I have a PhD in chemistry. I was a theoretical chemist, worked a lot on protein folding and quantum chemistry and drug discovery, so I wrote software because I was writing algorithms. Along the way I started using the cloud, right around the time EC2 was released, and I got so fascinated by it that I ended up at AWS 17 years ago, very early in the life of AWS, and I've been here ever since. I started off on EC2, then ran the containers organization for a long time, including some of our work in the serverless area and infrastructure as code. I was increasingly working with developers more than infrastructure, so a couple of years ago, when it became clear that artificial intelligence was going to be a really important part of how developers build and operate software, I switched gears and started this new organization, where our time is spent figuring out how to make developers more efficient and give them tools and capabilities that change the way they work.

RD And one of those tools and capabilities that everybody's talking about, and that we're here to talk about today, is AI agents. It's become a bit of an overloaded term, so let's take a step back and talk about what AI agents are before we get into how they work.

DS The interesting thing is that people are talking a lot about agents now, and rightfully so, but on the AWS side we started talking about and thinking about agents a while back. For example, in July 2023 we introduced Bedrock Agents, which allow customers to create their own agents. In November of that same year, we launched our first agents on the Q side– Q Developer is the product that I work on– and one of the first agents we released helps you write software. The way it worked, and I think it's a good definition of what an agent is, is this: let's say you have an existing piece of software and you tell it, “Hey, I want to add this capability to my software.” The Q Developer agent for software development will go look at the code base and come back to you with a plan for how it wants to implement it. You may go back and forth with it, but once that plan is set in place, you say, “Okay, go do your thing,” and the Q software development agent goes ahead and implements that code. Since then, we've taken it quite a bit further, notably through our partnership with GitLab. There, you can write a user story or an issue and assign it to the Q Developer agent, and it goes ahead and creates your merge requests, your pull requests, for you, and as a human, you then accept it or not. So this idea of giving it instructions, a goal, and having that agent go and implement it autonomously, or semi-autonomously– it may come back to you for help every now and then– that's where we started. If you ask me what an agent is, I'll usually say that agents are goal-seeking, they know how to use tools, and they work mostly autonomously on the task they're given. That's the world we are increasingly moving towards, and the underlying technology is getting that much better, so it becomes much more powerful to use and build these agents.
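
To make that definition concrete– goal-seeking, tool-using, mostly autonomous– here is a minimal sketch of the plan-then-execute loop in Python. The scripted `fake_llm` stub, the action format, and the tool registry are illustrative stand-ins, not the actual Q Developer or Bedrock Agents implementation.

```python
# Minimal sketch of a plan-then-execute agent loop. fake_llm is a canned
# stand-in for a real model call; the tools and action format are
# hypothetical, not the actual Q Developer APIs.

SCRIPT = [
    {"type": "plan", "text": "1) read config 2) add feature flag 3) finish"},
    {"type": "tool", "tool": "write_file", "args": ["flag.txt", "enabled"]},
    {"type": "finish", "summary": "feature flag added"},
]

def fake_llm(history: list) -> dict:
    """Stand-in for an LLM: returns the next scripted action."""
    return SCRIPT[min(len(history) - 1, len(SCRIPT) - 1)]

TOOLS = {
    "write_file": lambda path, text: open(path, "w").write(text),
}

def run_agent(goal: str, max_steps: int = 10) -> str:
    history = [f"Goal: {goal}"]
    plan = fake_llm(history)                  # first, propose a plan...
    history.append(f"Plan: {plan['text']}")   # ...which a human could approve
    for _ in range(max_steps):
        action = fake_llm(history)            # pick the next action
        if action["type"] == "finish":        # goal reached, report back
            return action["summary"]
        result = TOOLS[action["tool"]](*action["args"])  # use a tool
        history.append(f"{action['tool']} -> {result}")
    return "step budget exhausted"

print(run_agent("add a feature flag to my project"))
```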

RD The more I talk to folks, the more I see it as sort of another level of automation for things. It just feels like it's a next step forward because of the AI aspect where it can sort of do human-level things autonomously.

DS I would say the thing that makes agents particularly interesting is the fact that they don't just follow instructions, they go and solve a problem. Your instructions don't have to be step by step. They can be higher level. Obviously, the better the instructions, the better they are at getting to the goal. You started off by mentioning the Java upgrade. There we had a very specific goal: upgrade this Java package, figure out what it takes to upgrade it, and then go do it for us. The underlying agents are the ones that figure out what they need to do, and their output is, “Here's some code ready for code review,” which is a natural part of the software development process. This idea that, given a set of goals, the agent is able to think through, “What do I need to accomplish this goal?”– that's where, as the underlying models get better and better at reasoning, they're able to do that more effectively, and that's where a lot of our work goes. I think that's why so many people are excited, because they can see where this is headed, and it's quite exciting.

RD That stat about the Java upgrade came up on another podcast. I think it was 4,500 years saved on the Java upgrade. As somebody who's been at organizations who have put off Java upgrades for multiple versions, I understand that that's a painful process, but 4,500 years, that almost seems absurd. How do you measure that? 

DS That's a function of how much code we have at Amazon. The way to think about it is, there are actually three stats that came out of that effort that are very, very interesting. The first one is the developer time saved, which is where the 4,500 years comes from. That's how much time it would have taken to not just do the upgrade, but look into it, do all the code reviews, et cetera, and it only comes about because of the scale of the problem we have at Amazon: well north of 30,000 packages. We have a process that has historically taken us several years. I mean, you've done this. You have this burndown chart that you track for years as you go package by package, and here, with a very small team, we were able to just ship code reviews to all the teams that own those packages. So that's where the 4,500 years came from: knowing, because you've done it before, what you are saving. The second part was $260 million in annualized infrastructure savings, and that comes from knowing that a Java 8-to-17 migration makes your code that much more efficient. What is the CapEx savings from going from one version of Java to the other, again, because of efficiency? Again, something we are able to measure. And the third one, one of my favorite stats, is that 79% of those code reviews were accepted without any change. The amount of savings you get from that, just from a time perspective, is significant. I think that just shows you how good modern AI systems are at doing things like that. But the other part, and this goes to there being no free lunch, is that the reason it was so effective for us is the work we did upfront with our Software Build Experience team at Amazon to integrate this agent into the way we do things at our company, into our day-to-day processes. It's not like we had to build some bespoke process to do it. It was built into how the processes already run, which makes it that much more effective. So one of the discussions we often have with customers is how you integrate these agents and capabilities into how you're operating today, because that's important.
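
For a sense of how a headline like 4,500 developer-years can be assembled, here is some back-of-envelope arithmetic. Only the ~30,000-package scale and the headline figure come from the episode; the per-package effort below is a made-up input chosen so the toy numbers line up, not Amazon's actual methodology.

```python
# Back-of-envelope reconstruction of the 4,500 developer-years figure.
# The per-package effort is a hypothetical input picked to reproduce the
# headline; Amazon's real measurement approach is not public.

packages = 30_000           # "well north of 30,000 packages"
days_per_package = 37.5     # hypothetical: investigation + upgrade + review
work_days_per_year = 250

dev_years = packages * days_per_package / work_days_per_year
print(f"~{dev_years:,.0f} developer-years saved")  # ~4,500 with these inputs
```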

RD When you say integrating into it, do you mean it knows the code, it has style guides, it knows the sort of Java upgrade path you're looking for? 

DS It has access, it knows your build system, it knows how you store your code, it has access to your repos, it knows how your repos are constructed, and it knows how our ticketing system and our issue system work– those happen to be home-grown. The fact that it has awareness of all of these things makes the process much more integrated into how things already work. Otherwise, you're moving stuff from one place to another, and that's inherently inefficient.

RD So I feel like this story almost moots my next question, but I'm going to ask it and then reframe it. Are AI agents ready for the enterprise? I say it moots it because obviously you did this huge upgrade, but what is it ready for in the enterprise, and how can people feel good about it?

DS I'll actually answer that in two ways. One is that enterprises are obviously using agents today. There are a number of enterprises– for example, Novacomp, which is an IT provider, did a Java upgrade project with over 10,000 lines of code in minutes that would normally have taken them two weeks before they used an upgrade agent. That's just one example. Actually, one of my favorite examples uses our Bedrock Agents capability. Genentech worked with our Gen AI innovation center– a team we have that helps you adopt Gen AI– to build a biomarker validation agent. Having worked in that industry before, I know you can give a team tons of biomarkers, but you still have to validate them, and historically that validation process takes years. They were able to shave months and years off this process by building an agent for biomarker validation. So it is happening, but there are, I think, a few things you need to put in place for these agents to be successful. Some of these may be measures you already have in place; you just have to think about how they apply to agents. The first is access controls. Whether you have humans or agents, access control is critical, because you have to decide what data or systems these agents have access to and how they are going to have access to them. You do that with humans; you have to do that with agents as well. Auditability is also very important. For example, the code review process is a critical part of how things work inside Amazon, and we have an agent that does code reviews. Very often teams get backed up because you have to find the code reviewers– most companies require two code reviewers– and by the time you get to it, that's almost a rate-limiting step. But we've had customers work with AWS to use our code review agent, where the first pass of the code review is done by the agent. It finds the bad code, the unsafe code, the places where the style guide has been violated, the vulnerabilities that may be showing up, et cetera, and it suggests a set of patches. The second reviewer, in theory, can just go and say, “Yep, all of these look good, you're done,” but you need that second level of auditability to build that level of trust. So at an enterprise, you need to build up the systems that say, “Okay, this is what we need to do,” and you need to define them. The good news is those systems probably already exist for your existing processes. You're just augmenting them or modifying them a little bit to incorporate AI systems.
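
As a sketch of what applying those existing access controls and auditability requirements to agents might look like in practice– the policy shape, principal names, and action names here are hypothetical, not an AWS API:

```python
# Toy sketch: every agent tool call is checked against a per-principal
# allow-list and logged for audit, mirroring what you'd do for humans.
# The policy format and action names are illustrative placeholders.

import json, time

POLICY = {
    "code-review-agent": {"read_repo", "post_review_comment"},
    "docs-agent": {"read_repo"},
}

def invoke(principal: str, action: str) -> None:
    allowed = action in POLICY.get(principal, set())
    # Auditability: record every attempt, allowed or denied.
    print(json.dumps({"ts": time.time(), "who": principal,
                      "action": action, "allowed": allowed}))
    if not allowed:
        raise PermissionError(f"{principal} may not {action}")
    # ...dispatch to the real tool here...

invoke("code-review-agent", "post_review_comment")  # allowed, audited
invoke("docs-agent", "read_repo")                   # allowed, audited
```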

RD It seems like the TL;DR is, “Don't give every agent admin access. Keep a human in the loop and have them do specific tasks.” 

DS And that's what you would do with your engineers as well– a new engineer versus a more senior engineer.

RD Absolutely. And I think a lot of the trust issues, it's not about the agent itself, it's about the LLMs. When we do surveys on it, we find that a lot of developers are using AI, but not everybody trusts it. Do you have guardrails outside of the human review to make sure it's not hallucinating new packages or anything?

DS So the good news is those LLMs keep getting better and better, but I like saying ‘context is king.’ The better the context you provide to an LLM, the better the quality of your output, whether in terms of code quality, hallucinations, pick whatever you want. Great context makes for better results. So one of the things we spend a lot of our time on is making sure the LLMs are getting good context, which could be context from your projects, specific instructions you provide, or a certain pattern to follow– this is where the code review projects come in– but then we also provide additional guardrails. Within the system, we use the guardrails that the underlying foundation models may have, but we also put guardrails on top of them to help make sure the results at the end are what you want them to be. At AWS we have one other unique piece of technology, an area of expertise we think is very important in this space, especially when you're talking about industries with compliance requirements or information that needs to be factually correct, and that's called automated reasoning. Essentially, automated reasoning is mathematically verifiable proof– it's a solver. The best example anybody uses for showing what automated reasoning does is your homeowners association. If you've ever seen HOA guidelines, they're pages and pages of stuff you don't understand. You can extract all the important rules from them using automated reasoning and then verify at the end that a given statement is valid. So, for example, in Q Developer we already apply automated reasoning where the information needs to be factually correct, like the pricing of something. That is an absolutely verifiable thing, so we have verifiers that make sure the output of the LLM is verifiably correct, and you can apply this in many areas. We use automated reasoning a lot in our security domain, for example, for verifying things like whether your permissions are correct– are they actually going to do what you said they're supposed to do? Adding these layers on top of what the LLMs are doing makes the results that much more trustworthy, but you also have to be judicious, because sometimes you don't want to be in the way, saying no all the time. You have to find the balance between what the LLMs are doing, which is why I think context is so important and why I spend so much time there, and providing the safeguards and validation at the other end.
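
A toy illustration of that “verify at the other end” idea: the model's factual claim is checked against a source of truth before the answer ships. Real automated reasoning uses formal solvers rather than a lookup table, and the instance names and prices below are placeholders, not live AWS pricing.

```python
# Toy verifier: an LLM's stated EC2 price is accepted only if it matches
# an authoritative table. This stands in for automated reasoning, which
# uses mathematical solvers, not lookups; the figures are placeholders.

KNOWN_PRICES = {"t3.micro": 0.0104}  # hypothetical source of truth

def verify_price(instance: str, claimed: float) -> bool:
    truth = KNOWN_PRICES.get(instance)
    return truth is not None and abs(claimed - truth) < 1e-9

llm_answer = {"instance": "t3.micro", "price": 0.0104}  # model's output
if verify_price(llm_answer["instance"], llm_answer["price"]):
    print("verified:", llm_answer)
else:
    print("rejected: claim does not match the source of truth")
```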

RD That's really interesting, the sort of mathematical proof of factual accuracy. I think when LLMs started, everybody was drawn to the RAG paradigm because it showed you the sources, and the mathematical proof is sort of the next level of verifying the sources. Do you think this will become a standard part of RAG, or is RAG on its way out?

DS RAG is context. It goes back to it being all about providing good context. Bedrock is a good example. We have a concept of knowledge bases in Bedrock, and knowledge bases are effectively RAG systems built on the embeddings you provide, which everyone uses. What RAG does is provide context that helps you verify that the answer has come from a source you can trust. That's one of the things you can do with RAG. At the last re:Invent, we launched a preview of automated reasoning knowledge bases: verifiable knowledge bases built on automated reasoning. It's another tool in your arsenal that not everybody needs to use. These are for things you just can't get wrong, like a compliance regime. And there are going to be many, many others. Take web search, for example. Can I do three web searches to make sure this is actually the truth? These are techniques people use to provide the right context to the LLM so it can get to the right place. I keep harping on this because I think, in the end, success comes from how good you are at providing that context. RAG is a way of doing it, automated reasoning is a way of doing it, search is a way of doing it. There are others.
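
To ground the “RAG is context” point, here is a dependency-free sketch: retrieve the most relevant snippet and prepend it to the prompt. Production systems like Bedrock knowledge bases use learned embeddings and a vector store; simple word overlap stands in for that here, and the documents are made up.

```python
# Minimal RAG sketch: score documents by word overlap with the query
# (a crude stand-in for embedding similarity) and prepend the best
# match as context. The documents here are illustrative.

DOCS = [
    "Amazon Q Developer includes agents for code review and unit tests.",
    "Bedrock knowledge bases let you attach your own data as context.",
]

def retrieve(query: str) -> str:
    words = set(query.lower().split())
    return max(DOCS, key=lambda d: len(words & set(d.lower().split())))

question = "What do knowledge bases do in Bedrock?"
prompt = f"Context: {retrieve(question)}\n\nQuestion: {question}"
print(prompt)  # this grounded prompt is what would be sent to the LLM
```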

RD The mathematical fact-checking of it– have you thought about releasing that into the wild to verify that the stuff that comes up in web search is also factually accurate?

DS Right now, you could build your own knowledge base using Bedrock to do that. We haven't, to the best of my knowledge, done that yet. On the Q Developer side, we use it where we believe we should be giving you factually correct answers, like the price of an EC2 instance, for example. That's where we bring those technologies in.

RD Something that has a very verifiable source of truth. 

DS Yes, correct. 

RD So we talked about these Bedrock Agents. If I got a hold of one of these right now, what could I do? If I need to add a new feature, could I say, “Check out these issue tickets. Get me some prototypes.” 

DS This is actually an area where the underlying tech is evolving really quickly. For example, we just launched a new agent for the Q command line interface. It's a new conversational agentic chat, and it's quite magical. You use it and you're like, “How is it able to do all these things? I said I wanted to do this, and it figured out it needed to make a directory or create a file.” What it's actually doing is using a tool. Through the agent framework that underlies things like Bedrock Agents, agents have access to tools, and one of the tools we made available to this agent is all of your shell commands. Anything you can do in a shell, the agent is able to do; it can invoke all the shell commands. The science has gotten to a point where it's able to figure out which shell command to use, but the fact that we made all these tools available to it– even vi-type things, opening a file and closing it– is what makes it really powerful. So in many ways, what Bedrock Agents and the like do is give you access to knowledge bases, tools, and guardrails– you can attach guardrails and a guardrail library to it. It's like a construction kit for building these things, and the construction kit keeps getting better as you go along.
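
Here is a sketch of that “shell commands as a tool” pattern, with a human confirmation step in the loop. The `propose_command` stub stands in for the model choosing a command; none of this is the actual Q CLI implementation.

```python
# Toy version of a CLI agent whose tool is the shell: a (stubbed) model
# proposes a command, a human confirms it, and the output is captured so
# it could be fed back to the model. Not the actual Q CLI internals.

import subprocess

def propose_command(goal: str) -> str:
    """Hypothetical stand-in for the LLM picking a shell command."""
    return {"make a directory named demo": "mkdir -p demo"}[goal]

def run_shell_tool(goal: str) -> str:
    cmd = propose_command(goal)
    if input(f"Run `{cmd}`? [y/N] ").strip().lower() != "y":
        return "skipped by human reviewer"    # human in the loop
    done = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return done.stdout or done.stderr or "ok (no output)"

print(run_shell_tool("make a directory named demo"))
```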

RD It seems like agents are sort of pushing everything to have an API now. I've been joking that every SaaS company is going to be the API company in a couple years, but I think that's going to become less and less of a joke.

DS I mean, if you think about it, an API is a tool that the agent gives the LLM access to. It could be an API, it could be a resource, it could be another agent, because with Bedrock we launched multi-agent collaboration. Effectively, you're saying, “I have a parent agent, or superagent, that's able to use other agents for domains where it may not itself be an expert.” That's a very interesting paradigm, but one way to think about it is that it's just another tool the agent can use to accomplish the goal you've given it.
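
A minimal sketch of that multi-agent idea: a parent agent routes sub-tasks to specialist agents it treats as just more tools. The keyword routing rule and the specialists themselves are toy stand-ins, not Bedrock's multi-agent collaboration API.

```python
# Toy parent/specialist setup: the parent agent delegates each task to a
# domain specialist, treating other agents like tools. The routing rule
# and specialists are illustrative placeholders.

def frontend_agent(task: str) -> str:
    return f"[frontend specialist] handled: {task}"

def database_agent(task: str) -> str:
    return f"[database specialist] handled: {task}"

SPECIALISTS = {"ui": frontend_agent, "db": database_agent}

def parent_agent(task: str) -> str:
    domain = "ui" if "react" in task.lower() else "db"  # toy routing rule
    return SPECIALISTS[domain](task)

print(parent_agent("Build a React settings page"))
print(parent_agent("Add an index to the orders table"))
```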

RD I've talked to folks who talk about this AI-created code as sort of reducing code base maintainability. Have you seen anything along those lines? 

DS I think they're framing it incorrectly. At any corporation, you have a certain amount of code that's generated and you maintain it. What's happening, or at least the concern about what will happen, is that more code is going to be written, and it doesn't even matter whether that code is good quality or bad quality. If you're writing more code as a company, you have to do more code reviews and more documentation. All the mechanics of what it means to write that software and maintain it expand with the amount of code being written. It's just a function of there being more code; the code itself is almost irrelevant to that. That's why I think some of the most interesting things are the tools we're providing, the validations, and the tools that make it easier for people to do the auditing. I think that's the important part. But everybody likes what this makes possible– I'll give you an example. You could be a back end developer who needs to write a React front end for something, and you're not an expert in React. Now you've written a piece of React code, and you have to find a code reviewer who can verify it. But in the past you would not even have written that code– it would have gotten backlogged, et cetera– so it's a good thing. The reason it's a little tricky right now is that people are in a middle ground where they still want to maintain their own methods– they want to do everything the way they used to– and the amount of software being generated has gone up. But what I'm starting to see, with things like our code review agent, our documentation agent, and our unit testing agents, is that in some ways the best practices of writing code are better handled by AI agents, because that's what they do, and so the downstream maintenance impact should go down. I think there'll be an inflection point where people get very comfortable with that, and this concern will go away. Right now people are like, “We have the capacity to do a thousand code reviews a week, but the amount of code being generated has gone up 5x, so how do we do that?” That's explicitly why we built the code review agent: to start addressing some of these problems.

RD It's interesting you say the agents can sometimes be better. There's a bit in Thinking, Fast and Slow where Kahneman talks about how we think we have these great insights into things, but sometimes a formula is just better than we are at making those judgments and decisions.

DS It's a great book, by the way. I read it about three or four months ago. There's no reason to be afraid, because if you have good mechanisms in place, you are going to be successful, as long as you're setting your expectations right. If you're expecting an agent today to write you a complete distributed system from scratch, you're probably going to be disappointed. If you're using it to add features and capabilities and to augment what you're doing, you're going to be very successful, and most good software organizations already have good mechanisms in place to manage that, because you're hiring new developers every year who may or may not be as good as your senior developers at writing software. How do you make sure their code is good? So as long as you have those mechanisms, you're going to be good. The tools that will help you do that are going to get better– they are getting better. We have a customer called Genesis. They were able to use our documentation agents, our code review agents, and our unit testing agents to get more than a 30% productivity boost, because their developers were onboarding four times faster to code bases they didn't know. All you have to do is say, “Hey, document all this old code that has no readme file,” and now the AI is generating the readme file. You suddenly make developers more effective at working with that code base. So net net, it's always going to be positive, but yes, you're going to generate a lot more code, because you're freeing people's time up.

RD Right. And with that there's a fear that relying on AI to generate code is going to make some of those code-writing skills atrophy. How do you feel about that? How do you assuage people's fears there?

DS AI agents are going to make humans more effective. I have seen this firsthand with our CLI, for example. I use it a lot– I'm a command line interface person– and you get much more productive. And I just talked about Genesis and the benefits they get from it. If you have to write a small app to solve a problem, it's a conversation away. That's a net positive thing. I know we have customers who are invoking agents to address problems they're not experts at, or to do things, as you said, that people never get to because they're blocked, like Java migrations and transformations. You're making onboarding easier, you're unblocking people, and you're also making the work more fun. One of the most common things I see is that people have a set of backlog tasks they have to do but would rather not, because they want to do something more creative and more challenging. They can just assign the backlog parts to an AI system, which overall makes you better. You can do more research, because research takes time and you can now get insights from 15 places at the same time by having the right conversation with your AI system. So because you assign the busywork to AI agents, and use AI agents to help you get unblocked and get information, this is powerful and empowering for the individuals themselves. I actually think you get more rigor in what you're doing. Maybe you're not typing as much, but the rigor doesn't go away, which is why these are exciting times for a lot of people. And I'll go back to something you started off with, which is whether agents are ready for the enterprise, and the answer is yes. Very often I see enterprises work with AI systems a little differently than they would work with other systems. Some of that is understandable, because they're not familiar with them, but I think the ones who are more successful are the ones that are able to build the right processes around them. For example, you want to give people freedom to experiment and innovate, because a lot of ideas will come from that, but then you have to put the guardrails in place to make sure things are happening the right way. Most of those guardrails already exist– they're related to data access permissions and things like that– and the enterprises that are more successful, like Amazon was with the Java transformation, are the ones that build this into existing processes and make sure the right auditability is built in, while giving people the freedom to go do things with AI, because it's improving so quickly that you're only going to benefit in the long term.

RD All right, the power of process. I like it.

[music plays]

RD Well, thank you for listening everybody. We are at the end of the show here. I want to shout out somebody who came onto Stack Overflow, dropped a little knowledge, won a badge. Today's badge is a Lifeboat Badge– somebody found a question that was at a -3 or less, gave an answer that boosted it up to 20. The badge was awarded to Morten Zilmer for answering: “Multiplication of two different bit numbers in VHDL.” If you're curious, we'll have it in the show notes. I am Ryan Donovan. I edit the blog, host the podcast here at Stack Overflow. If you want to give us feedback, flattering comments, et cetera, et cetera, email us at podcast@stackoverflow.com. And if you want to reach out to me directly, you can find me on LinkedIn.

DS Thank you for having me. I'm Deepak Singh. I'm the Vice President of Next Generation Developer Experience at Amazon Web Services. I'm easy to find on LinkedIn, and if you care about photography and youth soccer, you can find me on Threads. 

RD All right, everyone. Thanks for listening, and we'll talk to you next time.

[outro music plays]