The Stack Overflow Podcast

How API security is evolving for the GenAI era

Episode Summary

Ben Popper chats with Keith Babo, Head of Product at Solo.io, about how the API security landscape is changing in the era of GenAI. They talk through the role of governance in AI, the importance of data protection, and how API gateways enhance both security and functionality. Keith shares his insights on retrieval-augmented generation (RAG) systems, protecting PII, and the necessity of human-in-the-loop AI development.

Episode Notes

Solo.io provides API gateway, service mesh, and internal developer portal solutions. 

Follow Solo.io on X or LinkedIn or dig into the docs.

Want to brush up on RAG? Our Guide to AI walks you through the concept and includes a practical example. Or check out one expert’s practical tips for RAG on our blog.

Connect with Keith on LinkedIn.

Shoutout to Stack Overflow user MrSimpleMind: their helpful answer to the question – How to run jq from gitbash in windows? – has been viewed by more than 213,000 people and won a Populist badge.

Episode Transcription

[intro music plays]

Ben Popper Hello, everybody. Welcome back to the Stack Overflow Podcast, a place to talk all things software and technology. I'm your host, Ben Popper, Director of Content here at Stack Overflow, and today I'm going to be chatting with Keith Babo, who is the Head of Product over at Solo.io. Keith was formerly at Red Hat and has been giving some talks this year focused on the keys to AI security and the use of API gateways. So without further ado, Keith, welcome to the Stack Overflow Podcast. 

Keith Babo Hey, Ben. Great to be here, and thanks for having me on. 

BP For sure. So give folks just that quick 10,000 foot flyover. How'd you get into the world of software and technology and what led you to the position you're at today? 

KB Most of my career to this point has been in development and engineering. When I got out of school, I worked at Intel on the software that ran their distribution centers, and ultimately got into electronic data interchange, which is basically how you exchange business documents over the public internet– well, originally through these things called VANs that were dialup, very antiquated, but eventually it moved over the internet– and then I transitioned to Sun Microsystems during the birth of Java and all the cool stuff that was going on there. Ultimately I found myself at Red Hat. I got really into the open source community, software supply chain, and these types of things, and that led me here to Solo. Along that journey, I went from being an engineer creating my own open source project and products around it to transitioning into a product role that gave me better visibility across the entire business, not just the creation of the software, but how the software gets into the hands of a customer and creates value for that customer, which has been a lot of fun. 

BP Nice. So I just want to go back. I am old enough to have had a dialup modem. You said VANs were a way of transmitting data pre-internet?

KB Yes. This is the very beginning of my career, not to date myself too much here. But basically you'd go to a customer and they would have a modem, it was a bisync modem, so a bidirectional modem, basically, and they would dial up to a provider that had a huge phone bank, and that provider would take these electronic documents and just store them on servers in the value added network, which is what this phone bank provider was called. They would charge you by the kilocharacter, which was crazy. The amount of money they printed from that business is just insane, and it's one of those things that we continuously see over the course of our careers: you have something that's throwing off a ton of money, these people are insanely profitable, and it's just ripe for disruption. And the internet disrupted that big time. 

BP All right, so let's focus a little bit on what you've been doing recently. I don't know everything about Solo, the company you're at now, but I understand a little bit about the need for security in the world of generative AI and AI services. Obviously for a lot of these, what people are doing is turning to large companies that have frontier foundation models, something they can't recreate in-house, so when they want their chatbot to respond or their enterprise search to run, they're accessing a service via API from Microsoft, IBM, Amazon, OpenAI, Google, and I could probably name a few more. What are the unique security risks, I guess I would say, inherent in using an API to interact with these AI and generative AI services, versus an API for other cloud services?

KB So actually it's not so much that there's anything uniquely different about the API itself– an API presents as an API no matter where it lives– and more about the content that's getting generated and control over what you're sending to that provider. As you mentioned, it's quite uncommon for companies outside of Anthropic, OpenAI, Azure, Google, and AWS to train models from scratch. The amount of money that goes into training those models is enormous– companies might do fine-tuning and that type of thing, but training a model from scratch is pretty expensive– so most companies are turning to models that are hosted externally. And in that case, it becomes an issue of: what are the trust boundaries, how am I actually securing the communication with the provider, how am I handling credential management for API keys in that provider, how am I enriching that content with function calling or RAG and that type of thing? What is really interesting is that all of the normal mechanics of securing API communication– authentication, authorization, transport encryption, all of these things– apply to LLMs just like they do for normal APIs, but there are very specific risks and capabilities around AI and embedding LLMs via API in applications that create the need for a more specialized type of API gateway, which is an AI gateway. 
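
To make that point concrete, here is a minimal sketch of what Keith means by an LLM API being "just an API" at the transport level: an authenticated HTTPS call carrying a bearer token. The endpoint URL and model name below are placeholders assuming an OpenAI-style chat API, not any specific provider's product; the familiar controls (TLS, credentials) look the same as for any other API, while the new risks live in the content flowing through the call.

```python
# A hedged sketch: calling a hosted LLM looks like any other authenticated HTTPS API call.
# PROVIDER_URL and the model name are placeholders for an OpenAI-style chat endpoint.
import os
import requests

PROVIDER_URL = "https://api.example-llm.com/v1/chat/completions"  # assumption, not a real provider URL

def call_llm(prompt: str) -> str:
    resp = requests.post(
        PROVIDER_URL,  # HTTPS provides transport encryption, as with any API
        headers={"Authorization": f"Bearer {os.environ['LLM_API_KEY']}"},  # credential management
        json={
            "model": "example-chat-model",  # placeholder model name
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=30,
    )
    resp.raise_for_status()
    # Assumes an OpenAI-style response shape; adjust for your provider.
    return resp.json()["choices"][0]["message"]["content"]
```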

BP Let's put this in context: say you work at a company that's developing some proprietary technology, but you're also calling on an AI over an API to help you with code generation or to provide you with tests or documentation for your code. So now, to your point, what are you sharing in there, if anything, that maybe you don't want other folks to get access to? Let's take that as an example. How would your company, or any company, design an API gateway to help obfuscate that data, and what kind of solutions exist to ensure that you can get the value out of the Gen AI, whether that's code gen or debugging or documentation, while de-risking the possibility that some code or documentation you don't want in public spills out there?

KB So actually there's a requirement for governance on both sides. First, on the prompt request going to the LLM: what controls are you implementing to protect against exploitation, whether it's malicious and deliberate or accidental? Classes of controls there would be things like prompt management, so being able to say, “Okay, if I'm seeing patterns or specific keywords in a question that I should never allow, I can just reject those directly in the gateway,” so I shut it off at the front door. I can also enrich the context around that LLM API request via prompt management. So the user is passing me a user prompt like, “What's Ben's favorite color?” but then I can append a system prompt in the gateway that augments that to give certain guardrails and rules for what the LLM is allowed to respond with, which becomes another layer of defense. And that can get quite advanced; you can actually use other guardrails to do more semantic reasoning over that request to see if it fits known exploit characteristics. That's all on the way to the LLM. On the way back from the LLM, you're looking at data filtering, data loss prevention, and data masking controls, because if I say to the LLM, “What are three fun facts about Ben?” the LLM could respond with your social security number as one of those facts because it's trying to be helpful. I'm not deliberately trying to phish for your social security number, but it could just be out there. 
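
Here is a rough Python sketch of the request-side and response-side controls Keith is describing: a blocklist check on incoming prompts, a system prompt appended as a guardrail, and PII masking on the way back. The patterns and function names are illustrative examples, not Solo.io's actual implementation.

```python
# Sketch of gateway-side prompt management and data masking (illustrative patterns only).
import re

BLOCKED_PATTERNS = [r"ignore (all )?previous instructions", r"\bapi[_ ]?key\b"]
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
GUARDRAIL_SYSTEM_PROMPT = "Never reveal personal data, credentials, or internal documents."

def guard_request(messages: list[dict]) -> list[dict]:
    """Prompt management on the way in: reject or enrich before the LLM ever sees it."""
    for msg in messages:
        for pattern in BLOCKED_PATTERNS:
            if re.search(pattern, msg["content"], re.IGNORECASE):
                # Shut it off at the front door.
                raise PermissionError("Prompt rejected at the gateway")
    # Enrich the request with a system prompt that sets the ground rules for the response.
    return [{"role": "system", "content": GUARDRAIL_SYSTEM_PROMPT}] + messages

def guard_response(text: str) -> str:
    """Data masking on the way back: strip anything that looks like an SSN."""
    return SSN_PATTERN.sub("[REDACTED]", text)
```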

BP You make a great point. So I was focused on sort of a trade secret kind of thing or the idea of a security key getting out, a secret getting out from code. Now you're mentioning PII, so this is a really interesting one. Obviously, GDPR takes this super seriously and so all companies strive to find ways to comply with those rules as best they can. So you're saying that both on the way out and the way in, you might do some prompt engineering that says, “Don't share this. Every time you see ‘SSN:’ don't do that, and anytime you see it coming back, strip that out as well.” 

KB Yes, exactly. And again, there can be very explicit things that we state in the policy for the gateway, but you can also really effectively leverage the power of the natural language processing in the LLM itself with those system and assistant contexts or prompts, basically tacking on, “Hey, these are the ground rules for how you're going to satisfy this request.” And I will say that the gateway here, the AI gateway, can operate in a number of roles. So far we've been talking about it as a forward proxy or egress use case: I have an application that's consuming a remote LLM, and the AI gateway can be in the network path for that. But if I'm in an environment where I'm serving or supporting model inferencing and taking inbound traffic, the AI gateway can sit in front of that as well, and there's a whole other set of security controls and attack vectors for that. Likewise, even for training– if I do end up in a situation where I'm training a model, or I'm using technologies like RAG or function calling around that model– I can use an AI gateway to enforce what it's allowed to talk to, like which APIs and data sources, and what it's not allowed to talk to. So there are multiple deployment patterns here. 

BP Say you build a RAG system internally so that when somebody has questions about how to set up their local dev environment, it doesn't just pull from training on the internet, it pulls from the documentation and FAQ and wiki inside of your company. Walk me through how you would work with a RAG system so that you can get the most out of the LLM– whoever is on top of the frontier reasoning benchmark this week– but also not be sharing stuff out of internal HR documents and company policies.

KB I think the market is still coming to terms with where this is going to live, but in this end-to-end pipeline there are three participants, if we assume we're using an AI gateway as well. There's the client application, which can use client-side frameworks like LangChain and others for RAG and other enrichments; there's the provider, which also supports RAG, some form of fine-tuning, and function calling; and then there's the AI gateway that can sit in the middle. So where do I want to do this– where do I want to exercise it, taking RAG as an example? And there are different advantages and disadvantages. If I use the provider for RAG, then I'm basically giving the provider access to the content stores I'm going to use for RAG, which may or may not be okay. The data is going to the LLM provider anyway, so that's a trust decision for every company. But it's not portable across providers, so you're going to have to wire up that RAG pipeline for every provider. If you do it in the application, you could argue that's undifferentiated heavy lifting for that application developer to implement a RAG pipeline. And certainly, if I'm a platform team or an AI governance team, I'm worried about what those teams are using as data sources. Because as you pointed out, if I just grab any content repository– if I go to Wikipedia or just any internet source– RAG poisoning is a real thing and it's not hard to pull off. So I could inadvertently pull in poisoned data, which then obviously greatly influences what I'm returning. The AI gateway gives you a point to implement a RAG pipeline directly, so we can calculate an embedding on the prompt request, go to a vector store, pull out relevant content, and then send that upstream as context to the LLM provider. The nice thing about that is it's portable across all providers, it takes the lift of the RAG pipeline off the developer, and it allows me to tightly control what content sources are used for that RAG pipeline. 
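
To illustrate that last step, here is a small, self-contained sketch of a gateway-side RAG enrichment: embed the incoming prompt, rank approved internal documents against it, and prepend the best matches as context before forwarding upstream. The toy embedding function and document entries below stand in for whatever approved embedding model and vector store a platform team would actually use.

```python
# Sketch of a gateway-side RAG step; embed() is a toy stand-in for an approved embedding model.
import math

def embed(text: str) -> list[float]:
    # Toy character-based "embedding"; a real gateway would call an approved embedding model here.
    vec = [0.0] * 64
    for i, ch in enumerate(text.lower()):
        vec[i % 64] += ord(ch) / 1000.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

# Approved internal content only: the platform team controls this list, not the app developer.
APPROVED_DOCS = [
    "Local dev setup: install Docker, then run make bootstrap.",
    "VPN access is requested through the internal IT portal.",
]

def enrich_with_rag(messages: list[dict]) -> list[dict]:
    query_vec = embed(messages[-1]["content"])
    ranked = sorted(APPROVED_DOCS, key=lambda doc: cosine(embed(doc), query_vec), reverse=True)
    context = "Answer using only this internal context:\n" + "\n".join(ranked[:2])
    # Prepend the retrieved context as a system message before forwarding to the provider.
    return [{"role": "system", "content": context}] + messages
```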

BP I remember discussing on the podcast a story about a certain library which originally was just a hallucination that one of the AI services loved to offer up in its coding suggestions, and so some smart folks went out and created said library and, like you said, injected some nefarious stuff in there. So what did you call it– RAG poisoning? The idea of trying to figure out where on the web it may be pulling from, then getting ahead of that and injecting something in there, is definitely an interesting concern that's fresh to me. I'm not sure we had to think about that before generative AI. SEO was kind of like that, and SEO can be gross: why am I pulling up and serving this web page versus that web page, and am I actually serving up something high quality or low quality? But the idea that you are injecting something into what will then become a prompt response is pretty novel. 

KB The security characteristics of this– some of them are so new and so fascinating. Going back to the beginning when we talked about my history: when I went to Sun Microsystems, I had experience working with EDI from Intel, so I basically built a product that did AS2 and AS1, which is basically EDI over the internet, over HTTP. And that was my first real experience with this. There's an implicit security in that whole VAN model– if I have a private telephone line and I'm sending this to secure storage, there's kind of a physical security aspect to it. But when you scale that to the internet, now you have other problems. It's basically an untrusted network and you have to really account for attackers and man-in-the-middle and this type of thing. Those protocols really opened my eyes to security and how it's this glorious onion: you understand one layer and you peel it and there's another layer right underneath it, and another one under that. I love that about asymmetric cryptography and Diffie–Hellman, exchanging a key in the open on the internet. These are such hard problems to solve and they take so much iteration, and we're seeing the same thing play out right now in the community around LLMs, whether it's prompt hijacking, prompt injection, or RAG poisoning. The academic research around that, and just the anecdotal examples of people doing funny things, is fascinating to me. 

BP So it sounds like the API gateway is meant to do a number of different things. It's not just about safety; it can also be about cost control, and it can be about handling traffic as it spikes. So what are the things that the API gateway your company offers does, and how do those sit within the layers of service an API gateway provides to a large customer? 

KB So this is a fascinating area, and one of the great things that I just love about being in product and working for startups is having very close relationships with customers. We have a very popular API gateway that does API management– security, observability, traffic control, resiliency– for all types of APIs. That's based on Envoy proxy, which is kind of the leading cloud native proxy, created by companies born on the web, and the leading proxy for Kubernetes. So we're very successful with that product, but we have this great design relationship and partnership with our customers, and when our customers start to see new use cases where we can be used, they tell us and we work together on it, which is a fun dynamic. So we had a very large customer that, like all companies right now, was starting to invest heavily in AI, finding their way in terms of what the use cases were and that type of thing. Governance became a really big deal for the platform teams that are going to own this and run these apps in production. At the end of the day, when an exploit comes, when something is unavailable, when there's a security breach, they're deeply involved in that, so they're thinking, “How do we take a platform approach to addressing some of these concerns?” So this customer came to us and we said, “Hey, you can use our existing gateway. An LLM API is just an API, just use it for that,” and they came back with, “No, but it's unique in this area and we're trying to do this thing,” and we were like, “Oh, those are actually really cool extensions to what we do.” So even though the foundation of our AI gateway is our API gateway, there are very specific features we built for AI gateway use cases. One thing that's immediately useful– it's really just an API gateway use case, but it's a key problem in Gen AI consumption right now– is that teams go into a provider, create an API key, and then just share it amongst all the apps that are consuming that model in that provider. There are so many problems with that as a security approach. First and foremost: if a developer sees that key and then leaves the company, what is your key rotation policy for that? What's the blast radius? And forget about someone leaving– if that key inadvertently gets checked into a GitHub repo, now what do I need to unwind and how do I rotate that key? It's crazy. So one of the first things we help customers with is: you go create your API keys in the providers, you put them in a secure vault like HashiCorp Vault, for example, and you only let the AI gateway see those. Then the AI gateway can mint its own API keys that it gives to clients, or it can integrate with JWTs or OAuth or OIDC for those flows, and then map those credentials to the provider key. It instantly solves a very serious problem around the principle of least privilege and the blast radius of a key getting exploited. The other thing is that if you look at how you'd associate permissions for a given team with just a provider API key, it's not fine-grained: you have an API key, you can access a provider and a model. We allow you to get very fine-grained– “You're this client app, you can have access to this provider and this specific model”– and then we can enforce things like rate limiting. 
So instead of having everybody overwhelm your upstream LLM API and get 429s back from the provider, we can do backpressure and bulkheading inside the gateway to protect those upstream models and push back on clients in their consumption. And that's all based on the ability to reason about individual clients in the AI gateway. 
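
Here is a compact sketch of the pattern Keith just walked through: the gateway swaps a client's own credential for the real provider key (which only the gateway can read), checks a fine-grained provider/model allowlist per client, and applies a sliding-window rate limit so clients are pushed back before the provider starts returning 429s. The policy structure and names are illustrative, not Solo.io's actual configuration model.

```python
# Sketch of per-client credential mapping and rate limiting at an AI gateway (illustrative only).
import time
from collections import defaultdict, deque

CLIENT_POLICY = {
    # Fine-grained: this client may use exactly this provider and model, at this rate.
    "billing-app": {"provider": "openai", "model": "gpt-4o", "requests_per_minute": 60},
}
PROVIDER_KEYS = {"openai": "fetched-from-vault-not-hardcoded"}  # only the gateway sees these

_request_log: dict[str, deque] = defaultdict(deque)

def authorize(client_id: str, provider: str, model: str) -> str:
    policy = CLIENT_POLICY.get(client_id)
    if not policy or policy["provider"] != provider or policy["model"] != model:
        raise PermissionError("client is not allowed to use this provider/model")
    # Sliding-window rate limit: push back at the gateway before the provider returns 429s.
    now, window = time.time(), _request_log[client_id]
    while window and now - window[0] > 60:
        window.popleft()
    if len(window) >= policy["requests_per_minute"]:
        raise RuntimeError("429: rate limit exceeded for this client")
    window.append(now)
    # Swap the client's gateway credential for the real provider key.
    return PROVIDER_KEYS[provider]
```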

BP I thought it was interesting, the depth and breadth of things that you're packing into– and maybe I'm wrong– a single API gateway. Prompt guarding is one thing, we talked about credential management, but then there's prompt enrichment, enriching a prompt so that the response is better and maybe uses fewer tokens, and rate limiting. Maybe these days you've set a team of AI agents up to do a job, and they're working night and day and keep requesting tokens until they get to the bottom of it. That might be more expensive than you planned for. 

KB It's so funny you bring that up. There are multiple things you just said that I want to double-click on– one of them I actually haven't heard people talk about enough– and the two things I was going to say are related. When we talk about governance around Gen AI consumption, a huge theme in the industry right now is having a human in the loop. I'm not allowing an agent to create code, build an image, and then deploy it and run it on its own– that's way too close to Terminator for most people right now. Likewise, a lot of companies that are using a copilot for customer service reps aren't exposing that LLM directly to their customers. The CSR is in the middle; they're the human in the loop. But you raised a very interesting point there– and some companies are implementing this already– that we will get to agentic architectures where autonomous agents are driving this traffic. Right now we're sort of governed by the pace of humans creating and consuming things, but that's about to scale up in a really big way, and you have to think about that in the scale of your architecture. 

BP It's really interesting. I've seen some incredible stories about content creators and influencers who make a lot of their money doing fan service– chatting with people or sending them pictures or doing live events– and it's hard for one human to maintain that with 10 or 15 other humans a day. But if you build a great agent that can hold up its end of a chat, suddenly you can talk to a thousand fans a day, and people are seeing an unbelievable jump in the number of people they can interact with, and thus in the amount of money they can earn. 

KB Totally. And this is a major pet peeve of mine or soapbox topic about the human in the loop, because the promise of Gen AI is that it frees humans. It takes the really pedestrian work and allows us to focus on higher value work, and this concept of a human in the loop is like, “Hey, I have someone that can basically vet the output from an LLM, determine that it's correct, and then allow it to flow through the process.” But we know from pretty well published research at this point that the people that benefit most from Gen AI are the least experienced folks. They get the most productivity benefit from integrating with a copilot, but they can't be the human in the loop because they don't know anything, otherwise they'd be an expert. So when we talk about the security and vetting the responses from LLMs and security controls, we have to find a way to take what those experts know and encode that into sort of automated platform processing. That's a lot of the work we do in the AI gateway that allows you to then introduce less experienced humans into the loop and then get productivity benefits from it. 

BP I will say, of the podcasts and blogs and newsletters we publish, something that has gotten a lot of audience attention recently is whither the junior dev. And like you're saying, for somebody who's experienced in running a team, maybe an AI agent feels like a junior dev who can create something for them that has a few bugs and needs a few tweaks before the pull request is approved, but hey, the work got done. Now, that junior dev would have benefited greatly from pair programming with the AI, but they're not going to know when something's wrong. So now you just have two agents, one human and one not, who are producing the same content. They're not necessarily improving on one another. So that is a really interesting question: in the world of software development, what should new students who are exiting college focus on if AI agents and code assistants are kind of at their level– an S1 software developer?

KB So I'm feeling this right now, by the way. My daughter is a sophomore in college, and her whole life I've told her, from my background: computer science, easy money, do it, it's great. And it is a fun job, it's great– and now I'm not so sure. I think that even during her time in college there's going to be a tremendous amount of change there, very, very rapidly. I think that human reasoning is still very valuable and will be an important part of this process for some time. 

[music plays]

BP All right, everybody. It is that time of the show. We want to shout out somebody who came on Stack Overflow and helped to spread a little knowledge or curiosity. Congrats to MrSimpleMind, awarded a Populist badge two days ago for providing an answer to “How to run jq from gitbash in windows?” If you don't know, the Populist badge is awarded when you provide an answer to a question that already has an accepted answer, but people loved your answer even more than the accepted one because it's that good. So congrats to MrSimpleMind on the badge, and thanks for sharing an additional perspective on this question. I am Ben Popper, the Director of Content here at Stack Overflow. Find me on X @BenPopper. If you want to come on the show or you want to hear us talk about certain things, you can email us at podcast@stackoverflow.com. And if you liked what you heard today, why don't you do me a favor and subscribe? 

KB My name is Keith Babo. I'm the Head of Product at Solo.io. You can find me on LinkedIn if you have any questions about what we discussed today. You can find Solo.io at a very surprising web address of solo.io. So if you want to learn more about AI gateways, we have some great labs and instructional materials on that.

BP All right, everybody. Thanks for listening, and we will talk to you soon.

[outro music plays]