The Stack Overflow Podcast

“AI has been the wild west”: Creating standards for agents with Sean Falconer

Episode Summary

Ryan is joined on the podcast by Confluent’s AI Entrepreneur in Residence, Sean Falconer, to discuss the growing need for standards for AI agents, the emerging Model Context Protocol and agent-to-agent communication, and what we can learn from early web standards while AI continues to evolve.

Episode Notes

Confluent is an all-in-one, real-time platform that allows you to stream, connect, process, and govern your data.

Connect with Sean on LinkedIn, and listen to the Software Engineering Daily podcast, which he co-hosts.

Do you know the answer to "Loadrunner lr_get_attrib_string always returns null"? If so, you can help out Trncvs and receive the bounty offered on the question.

Episode Transcription

[Intro music]

RYAN DONOVAN: Hello, and welcome to the Stack Overflow Podcast, a place to talk all things software and technology. I'm Ryan Donovan, your host, and I'm joined today by another podcast host: Sean Falconer, who hosts the Software Engineering Daily podcast and is an AI entrepreneur in residence at Confluent. Today we're gonna be talking about the need for standards for AI agents and what they can learn from the original web standards. So welcome to the show, Sean. 

SEAN FALCONER: Thank you. Thanks for having me. Excited to be here. 

RYAN DONOVAN: Yeah. I should say welcome back. You've been on the show before. 

SEAN FALCONER: Yeah, it's been a while, but it's good to be back. Thanks for having me again. 

RYAN DONOVAN: Yeah. So this is a topic that I'm excited about– I saw the MCP protocol coming through and it was like, oh yeah, I've been talking about this with folks, that AI has been the wild west for a while and that standards will probably develop. But you're coming here to talk about what they can learn from the OG web standards. So what's your take on that? What do you think is the most important thing that they can learn? 

SEAN FALCONER: Well, I think there's a bunch of things that go into it when you look back at standards. So if we look at the broad strokes of what's happening in the industry right now around AI, I think it's very encouraging that standards are starting to happen, because standards are essentially a signal of the maturity of a market or the maturity of a technology. I think we're not totally out of the wild west era of AI right now, right? 

RYAN DONOVAN: Yeah. 

SEAN FALCONER: But maybe we're progressing a little bit beyond just chicken wire and duct tape to starting to do things that are more standard, or people proposing standards. And I think that is a healthy sign of people moving beyond just science experiments and demos to things that they actually want to start to productionize. And for all the concern that's out there about what the ROI around AI is, what does this really mean for my business, is this all hype or is this real?

I think that's a really good signal that there are real things happening in the industry. So that's great. In terms of looking back at the history of various standards, if you look at things like HTTP, we take for granted, or assume, that HTTP has been there forever, standardizing the web. But there was a pre-HTTP web era where it was really difficult for servers to know how to talk to clients and so forth, and everybody did things differently. And that's kind of–

RYAN DONOVAN: Everybody telnetting in back then. 

SEAN FALCONER: Yeah, exactly. And standards came along, and that gave a universal way for clients and servers to essentially communicate. It abstracted away the messy details and let any browser talk to any server, anywhere. It didn't matter what operating system you used or how the backend was built. If it spoke HTTP, it could essentially be part of the web. And that's the whole point of these standards. MCP gives you a standard for how my agent talks to tools, and the Agent2Agent protocol from Google is like, how do I have agents that are developed using completely different systems communicate in some fashion? 

So I think they're focused on solving very similar fundamental problems. And when you look back at the standards that have been successful, there are certain criteria for a standard being successful overall. I think we can learn from that history, and it can help us understand and gauge which of these might be successful given where we are right now. 

RYAN DONOVAN: Yeah, I feel like there's almost– we're doing standards in reverse for the AI era because a lot of the sort of Cambrian explosion of the web came after standards were invented, right?

And now we had all the AI companies come out, and then it was like, wait, they need to talk to each other now. Do you think the communication that AI agents need is the impetus for this, or do you think standards would've developed anyway? 

SEAN FALCONER: I think standards would've developed anyway, but I think it's very natural that you have fragmentation at the beginning. There's just a lot of people experimenting, and there's a potential risk with introducing standards from the get-go where it can actually slow down innovation. It's like, oh, if I have to adhere to the standard, but I want to do this thing that's completely new and doesn't fit within the standard, how do I make that thing happen?

That can be a barrier to innovation in early stages. So it makes sense that kind of everybody at the beginning is just like going all over the place and trying things and you don't really know what's gonna be successful. The other challenge with early standardization is there's a certain level of abstraction that you're introducing and in the early days it's always hard to know what is the right abstraction to use, which abstraction is gonna be successful. And the pain point that you're solving at the very moment might not be the pain point that people are acutely feeling six months from now.

Like even if you look at something like Betamax versus VHS, they came out around the same time. Betamax was arguably a better technology in terms of video quality and performance, but the big key advantage that VHS had was that you could record an entire movie on a VHS tape without having to flip it over and use the other side of the tape. So it solved something that was an acute pain point for users of tape at that point, and as a result it became more popular. 

And essentially, I think, adoption and ecosystem are really key to the success of any sort of standard or dominant product: who's actually using it. You could have the best technical standard in the world, but if no one uses it, who cares? 

RYAN DONOVAN: And for a lot of companies, it's not always in their interest to support broad open standards if they have a proprietary standard that sort of gets people to use their particular formats, right? 

SEAN FALCONER: Yeah. I think you see that with the history of, maybe, the storage wars of the warehouse and the data lake. And now we've gotten to a place where we've had the decomposition of the warehouse, where we're moving to open table formats like Iceberg and Delta Lake and so forth, and sort of decoupling the data tier and the object storage from the actual query engines.

And now I think the industry's going that way and companies like Snowflake are supporting it, even though Snowflake had its own proprietary format. But I think it's because the industry, for whatever reason, has really gotten behind these standards and companies are demanding that they want to be able to use these things in a vendor agnostic way. So then the key players also have to adapt their strategies and adapt to what's happening in the market and what they're hearing from their customers. 

RYAN DONOVAN: So on these pathways to adoption: right now we see at least two competing agent standards, MCP, the Model Context Protocol, and Agent2Agent. Do you think there's benefits to either one? I see MCP being the one talked about more. Do you think it is winning because of habit, or is it a better protocol? 

SEAN FALCONER: So I think they solved– they're focused on solving different problems. So MCP, which has been around longer, as far as I know, it was kind of the first proposed standard in the space and was proposed by Anthropic, I think in November of last year.

And it's really focused on trying to solve the problem of standardizing tool and resource integration. So if I'm building an agent, a key characteristic of an agent typically is that it can access tools, so it can go and perform certain actions, maybe hit an API endpoint, or go and update a record somewhere, or execute bespoke code, or it could go and gather data out of a database and so forth.

And what MCP did, or the standard did, was propose this sort of client-server protocol for people to expose tools and resources through the standard, so that not everybody has to go and essentially build their own tool integration to the same APIs. If you're building one agent with three tools to integrate, you could probably do whatever you want to support that. But if you're an enterprise business that's presumably going to build thousands of these things with thousands of integrations, then you don't want everybody just wild-westing it and doing a different integration. So that was the problem that MCP was really focused on: how do we start to put some standardization around tool integration and resource use within agents? 
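As a concrete illustration, here's a minimal sketch of what exposing a tool through MCP can look like, assuming the shape of Anthropic's official Python SDK (the FastMCP helper); the "orders" service and the get_order_status tool are hypothetical placeholders, not anything from the conversation.

```python
# Minimal sketch of an MCP tool server, assuming the shape of Anthropic's
# Python SDK (the mcp package). The "orders" service and get_order_status
# tool are hypothetical examples.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("orders")

@mcp.tool()
def get_order_status(order_id: str) -> str:
    """Look up the shipping status for an order ID."""
    # In a real server this would query a database or an internal API.
    fake_orders = {"A-1001": "shipped", "A-1002": "processing"}
    return fake_orders.get(order_id, "unknown")

if __name__ == "__main__":
    # Runs the server so any MCP-capable client or agent can discover and
    # call get_order_status without a bespoke integration.
    mcp.run()
```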

And what Google came out with four to six weeks ago at Google Next was a way to standardize agent-to-agent communication. The problem they're trying to solve is potential fragmentation around agent ecosystems. So if I'm a business and I have Glean agents, I have Salesforce agents, I have Cortex agents in Snowflake, and I'm also building my own agents, and each department's using independent frameworks, one's using LangGraph and another one's using AutoGen, how do I have all these different AI silos actually be able to communicate with each other? 

And I had actually written an article about this sort of AI silo problem before the Agent2Agent protocol came out, where I talked a little bit about how, if agents continue to grow in adoption, enterprises will have thousands of these. We're gonna run into this problem where, rather than having data silos, we're creating sort of intelligent silos within the enterprise. And that's what Agent2Agent is focused on: how do you break down these silos between ecosystems of agents? How do you create the data mesh, or the agent mesh, for agents? 
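As a rough sketch of what that agent mesh can look like on the wire, here's one agent discovering another through its agent card and handing it a task. The /.well-known/agent.json discovery path follows Google's A2A announcement, but the JSON-RPC method name and payload fields below are simplified assumptions rather than a faithful copy of the spec, and the claims-agent URL is made up.

```python
# Sketch of cross-vendor agent-to-agent communication in the spirit of A2A.
# Discovery via /.well-known/agent.json follows Google's announcement; the
# method name and payload shape below are simplified assumptions.
import uuid
import requests

REMOTE_AGENT = "https://claims-agent.example.com"  # hypothetical peer agent

# 1. Discover the remote agent's capabilities from its agent card.
card = requests.get(f"{REMOTE_AGENT}/.well-known/agent.json", timeout=10).json()
print("Remote agent skills:", [s.get("name") for s in card.get("skills", [])])

# 2. Send it a task as a JSON-RPC request (illustrative method and params).
task = {
    "jsonrpc": "2.0",
    "id": str(uuid.uuid4()),
    "method": "tasks/send",
    "params": {
        "id": str(uuid.uuid4()),
        "message": {
            "role": "user",
            "parts": [{"type": "text", "text": "Summarize open claims for account 42"}],
        },
    },
}
response = requests.post(card.get("url", REMOTE_AGENT), json=task, timeout=30).json()
print(response)
```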

So they're focused on different things. I think predominantly MCP has a lot more traction right now. It now has industry support from all the biggest players: AWS, Google, Microsoft, and OpenAI are all supporting MCP.

And I think one of the reasons it's the leading standard at the moment is that it's solving an acute problem right now. It doesn't take long before you start to feel the pain of having to do your own bespoke tool integrations when you're building agents. And then it was also very well timed.

So it was early, but it was also timed to a point where the frontier models could actually reliably figure out which tools to use if you gave it the right prompts. If they had to try to do that a year earlier, I don't know that it would've been as successful because I think it would've been hard for a model to actually reliably make a determination about which tool to use.

And then Agent2Agent I think is fresher. It's newer and is solving, I would say, a problem that's probably more of a future problem, right, than necessarily a problem right now. 

RYAN DONOVAN: Yeah. The MCP seems like it's the web browser for using the sort of SaaS tools and A2A is Slack for agents to talk to each other.

SEAN FALCONER: Yeah. And then IBM has a standard as well called ACP, the Agent Communication Protocol, I believe, and it's also focused on agent communication, but for when you're building a multi-agent system that is all supposed to work in collaboration. How do I standardize the communication between those agents?

So it's less about this ecosystem or agentic mesh, and more about, how do I make a multi-agent system that's supposed to carry out some sort of agentic workflow, I don't know, claims processing, how do I make those things work together in a standardized way? 

RYAN DONOVAN: So it seems like a lot of the standards we're talking about, the ones I've heard about, are for agents. Do you think other standards will arise in the AI ecosystem? Is there gonna be, like, an inferencing standard or something like that?

SEAN FALCONER: That's a good question. I definitely think that the privacy and security side is a big area of concern for many companies and also an area of active research. A lot of the people on the privacy side come from the history of data systems, groups like the IAPP, that have proposed various standards and so forth. There's a lot of movement there. The EU was also the first part of the world to put forth a privacy-focused AI act a couple of years ago.

So I think there's gonna be more and more standardization around, you know, what are the expectations? What guardrails should be put in place? What responsibilities do those who are building AI systems have to make sure that they're secure, and that they're not potentially compromising an individual's private information or doing something nefarious in some way?

RYAN DONOVAN: To go back to the original question: for these new protocols and new standards to be successful, besides just the right timing, delivery, and winner-takes-all adoption, what else can they learn from the early protocols? 

SEAN FALCONER: Yeah, so I think there's a couple of key characteristics of the standards that have won. I mentioned this before, but they need to solve a clear, urgent pain. Winners tend to fix something everyone feels, like I mentioned with VHS versus Betamax. But also, if you look at REST versus SOAP, they solved similar problems, integration standards for APIs, but SOAP was very bloated. So it became essentially a barrier to adoption and created its own pain point; it was solving a pain point, but it created a new one…

RYAN DONOVAN: Right.

SEAN FALCONER:…which was that it's really hard to work with and really bloated. So simplicity, I think, a lot of times beats power. You could argue that SOAP was more powerful because it's more expressive, but REST is just easy, right? JSON's just easier to write than XML, so everyone got behind it. I also mentioned that the momentum around the ecosystem really matters. If people have a strong ecosystem behind them, and key players in the industry are promoting them and integrating them, it starts to become this sort of self-fulfilling prophecy. And even if some better idea comes along, it can be really hard to disrupt it.

It's not impossible. Even if you look at the browser wars, if you went back to the early 2000s, it'd be hard to imagine a world where Internet Explorer and Firefox weren't the dominant players, but then Google Chrome came along, did something new, and became like 90% of the market. So it's not impossible, but it's hard. I think one of the other keys with these standards is that openness really matters. If you have an open standard, then you can create an entire ecosystem of people who are contributing to it.

And everyone kind of gets a say. Even if it's really only the hyperscalers that have the power to, you know, push some of these standards at the beginning, the fact that they're open standards, and this is the key of things like open source, means anybody can contribute, anybody can have a voice, and that also really helps.

And then of course, the last thing that really matters is timing and strategic backers. If you look at the people who are proposing some of these standards in the AI industry, it's the Googles of the world. It's OpenAI. It's Anthropic. Some of the biggest names with the most market share, or the loudest voice in the market, are the ones that are helping really promote the standards and get adoption. 

RYAN DONOVAN: Yeah. And they're the ones working at the edges of what these things can do.

SEAN FALCONER: Yeah. 

RYAN DONOVAN: And I think, if you look at some of the later web standards, like SSL and TLS or WebAssembly, they came about because new problems had developed. If you look into your crystal ball, what do you think the next new things that will need standards will be for AI?

SEAN FALCONER: That's a good question. I think there's still a lot of fluidity when it comes to some of the internal workings of agents as well. Usually agents have some form of short-term memory and long-term memory. A lot of times they need shared state or shared context across interactions, and right now most of that complexity is left to the developers to figure out. That's where some of the central challenges of actually running this stuff in production really are, and it's essentially all the same challenges from a distributed systems standpoint that we've been solving for a long time.

But there isn't necessarily a clear reference architecture for how you actually build these things at scale. Everyone's still figuring that stuff out. So I think at some point, as the market starts to mature and more people have built these systems at scale, those will become the reference architectures.

It's just like the early uses of something like Apache Kafka. When large players in the industry, like a Netflix, come out and say, hey, this is how we're running our 10,000 microservices at scale within our company, we're using Apache Kafka, we're using these other technologies to essentially make that work, those start to become sort of the reference architectures for, oh, okay, this is how you do this type of thing.
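To make the short-term versus long-term memory split mentioned above a bit more concrete, here's one minimal way a developer might structure it today; the class names and the naive recall logic are hypothetical stand-ins, since, as Sean notes, there's no standard interface or reference architecture for this yet.

```python
# Illustrative sketch only: one way to separate short-term and long-term agent
# memory while that interface remains unstandardized. All names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class ShortTermMemory:
    """Rolling window of recent turns, fed back into the next prompt."""
    max_turns: int = 20
    turns: list = field(default_factory=list)

    def add(self, role: str, content: str) -> None:
        self.turns.append({"role": role, "content": content})
        self.turns = self.turns[-self.max_turns:]

class LongTermMemory:
    """Durable facts shared across sessions; a stand-in for a DB or vector store."""
    def __init__(self) -> None:
        self._facts = []

    def remember(self, fact: str) -> None:
        self._facts.append(fact)

    def recall(self, query: str, k: int = 3) -> list:
        # Naive substring match; a real system would use embeddings or search.
        return [f for f in self._facts if query.lower() in f.lower()][:k]

if __name__ == "__main__":
    stm, ltm = ShortTermMemory(), LongTermMemory()
    stm.add("user", "My order A-1001 never arrived.")
    ltm.remember("Customer 42 prefers email follow-ups.")
    print(stm.turns, ltm.recall("customer 42"))
```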

RYAN DONOVAN: Right. Yeah. Specifically in reference to Kafka, I'd heard that there were other hyperscalers that had their own internal SQS messaging or event handlers that just didn't get adopted because they weren't open, right?

SEAN FALCONER: Yeah. It goes back to what I was saying: with the standards that win, openness is often a characteristic of the standard.

RYAN DONOVAN: Yeah. So I think a lot of people interact with AI through natural language prompting, and there's always prompt engineering. Do you think there'll be a sort of standards point there, where it'll come back around to where people will no longer use natural language, and there'll be some sort of specific prompting language built in?

SEAN FALCONER: That's a good question. There has been some work in that space. There's a few different tools out there that allow you to do prompt engineering in more of a structured, engineering sort of approach, because it is challenging to know, like, am I actually creating a better prompt?

RYAN DONOVAN: Right (laughs).

SEAN FALCONER: Or if I do something where I get feedback that the outcome is not the expected outcome, or the outcome is not as good as I want, how do I factor that feedback in? And usually that requires some change of the prompt. So how do I do that in a way that's going to lead to success?

And I think that's another area where it's a little bit difficult right now. We mentioned other areas for improvement; I think testing and evals is another place where there's not really a standard. There's a couple of different companies that have product offerings in that space, but when I talk to customers that are doing some of this stuff in production, I would say the majority of them are rolling their own testing and eval suites to be able to do it. And currently, so much of building things like agents, or even any kind of gen AI application, is a lot of writing a prompt, testing, iterating, and seeing how it performs.

It's essentially fast test cycles of iterating. And if I make a change, I need to know: is that change leading to something better, or is it actually degrading performance? That's where evals and testing are so important and critical, but again, there's not really an industry standard built around that.

RYAN DONOVAN: Yeah. In terms of testing and evals, do you mean LLM evaluation or some sort of glance into the internals of the LLM? 

SEAN FALCONER: More, I would say, use case-specific interaction with an LLM. So if I'm building some sort of agent, and let's say that agent is supposed to do something like a first draft for loan underwriting approvals, then presumably I'm probably gonna break that agent up into a bunch of different subagents that are responsible for different parts of that process. And I'm going to write some prompt, or several different prompts, that tell the model what its role is and what my expectations are in terms of processing the input and producing the output.

The advantage there from a testing perspective is that I am closing the world of potential responses significantly, which is really, really key. Going back to "what are the key architectural patterns for this?", I would say one of the keys is that you really need to limit the scope of any particular inference call if you want to be able to get reliability out of it and also be able to test it.

So you need to essentially put limitations on what the potential inputs and outputs are, as much as possible, so that you can actually build a test set that gives you some confidence that, given inputs representative of the types of inputs I can expect in production, these are the types of outputs I should get. It's similar to essentially testing traditional predictive ML models, where you use precision and recall.

The advantage that you have with purpose-built models is that you have a very good understanding of what the input and output expectations should be. The challenge with foundation models, especially if you're solving these very open-world, prompt-based problems, is that you have no idea what someone could enter into the model, and you can't build a test set to cover every single variance that could come up. But I think most business use cases are much more constrained. You can break the problem down and put more constraints around what the inputs and outputs are, and that's where testing is really key.
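Here's a bare-bones sketch of the kind of homegrown eval suite Sean describes: a fixed set of representative inputs with expected labels, run against whatever constrained inference call the agent step makes. The classify_claim function and the loan-claims examples are hypothetical stand-ins for your prompted model call and your production data.

```python
# Minimal homegrown eval harness sketch. classify_claim() is a stand-in for a
# prompted LLM call whose output space you've deliberately constrained to a
# small set of labels; swap in your real inference call.
from collections import Counter

LABELS = {"approve", "deny", "review"}

def classify_claim(text: str) -> str:
    """Placeholder for the constrained inference call (returns one of LABELS)."""
    # Trivial keyword rules so the harness runs standalone; replace with the model call.
    lowered = text.lower()
    if "lapsed" in lowered:
        return "deny"
    if "conflicting" in lowered:
        return "review"
    return "approve"

# Representative inputs with expected outputs, curated from production-like data.
EVAL_SET = [
    ("Water damage, policy active, receipts attached", "approve"),
    ("Claim filed after policy lapsed", "deny"),
    ("Conflicting damage estimates from two adjusters", "review"),
]

def run_evals() -> None:
    results = Counter()
    for text, expected in EVAL_SET:
        got = classify_claim(text)
        results["pass" if got == expected else "fail"] += 1
        if got != expected:
            print(f"FAIL: {text!r} -> {got!r}, expected {expected!r}")
    total = sum(results.values())
    print(f"{results['pass']}/{total} passed")

if __name__ == "__main__":
    # Re-run after every prompt change to see whether it improved or degraded results.
    run_evals()
```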

RYAN DONOVAN: You know, talking about the predictive ML models and the guardrails, it's interesting, because at what point is it just too much of a hassle, too much of a risk, to use generative AI, and you just go and get a traditional ML model or a stack of conditionals to handle whatever you're looking for?

SEAN FALCONER: Well, I think it depends on what you're trying to accomplish. I do think that we're in a place where these models are super powerful, everybody's really interested in them, and we sometimes end up applying them without really thinking about alternative things that we could be using. You don't necessarily need a fully dynamic, autonomous agent where I give it a prompt and it loops essentially unbounded, performing inference until it solves whatever task. I think that's gonna be a very hard thing for you to productionize. But I see a lot of people doing that because they're attracted by the fact that, oh wow, these models can do really powerful things. But sometimes it's a little bit like you're cutting your steak with a chainsaw.

RYAN DONOVAN: (laughs)

SEAN FALCONER: Like, you don't necessarily need the superpower model to do something like sentiment analysis or some sort of classification problem. You don't need these heavyweight models, which are also sometimes expensive to run. So you have to pick and choose. And I think when it comes to building agents, or even compound AI systems, a lot of times we might be combining a generative model with applications of predictive models, and you need to know when it makes sense to do one or the other. But I do think there are definitely real use cases for using the large language models or other generative models; it depends on what you're trying to accomplish.

And there are ways to, I think, also get reliability out of them, but you need to know what the limitations are and also understand, for the particular problem you're trying to solve, what your tolerance for errors is. Can you get reliability out of the model at a level that you're comfortable with, and can you accept whatever downside risk you have if something goes awry?

RYAN DONOVAN: Yeah, maybe that's a new standard that needs to be developed: a standard of use cases. When do you do this? What are the conditions? What's the solid programming practice for generative AI? 

SEAN FALCONER: I think– like, my advice usually for businesses today, especially if you're just starting, is that you obviously want to target something that's high ROI but least effort. You want to balance those two axes, and ideally you're also starting with something that's probably not customer facing but is augmenting some sort of existing human process internally. That could be something that's just in charge of essentially creating the first draft of some sort of email, or some sort of approval process where today you're paying someone to go and stitch together multiple pieces of information just to get to the point where they understand what's going on and can make some sort of determination.

If you can take something that maybe takes a person two days of work today, stitching together multiple systems, and you already have a workflow and playbook for it because you have humans doing it day in, day out, well, that's a great target for automating, at least to some level, where within a few minutes of running this more agentic workflow you can send that first draft to somebody and they can edit it. Because a lot of times, editing something is much easier than solving the blank page problem. 

And I think that's why we see AI being so successful with engineering right now: there's a lot of checks and balances in any product that is being engineered before it hits users. If you're using a copilot to generate code, presumably it's not just shipped directly into production; there's compilation cycles, there's unit tests, there's integration tests, there's staging servers, there's progressive rollouts, and all this type of stuff that goes into it.

I would assume the engineers, hopefully, are looking at the code in the first place and assessing whether it's functional or not. So there's a lot of checks and balances before anything surfaces into something that's going to touch a person or a production system. And I think we also have to be thinking about similar opportunities outside of engineering, within the business, augmenting existing knowledge workers to be able to do their jobs more efficiently. 

RYAN DONOVAN: And you talked earlier about the sort of fall of SOAP to REST. With these standards, should we be concerned about future-proofing them, should they be flexible, or is that something that doesn't matter because another, better standard will come along if this one runs into a pain point? 

SEAN FALCONER: I think it's hard to know, of course, a hundred percent what's gonna come along with any standard. Presumably the people who are behind the standards are trying to think about making them somewhat adaptable. But I think we still feel challenges even with things like HTTP, where there are certain limitations to the protocol that people have to try to program around. Maybe that's why we have so many different ways of building web applications. We ebb and flow between, hey, we're gonna do everything on the server and have a really thin client, to having a really fat client and doing everything on that side, or to some sort of in-between, and every few years that kind of changes. And I think sometimes that has to do with some of the limitations of the underlying protocols that we're trying to essentially engineer around.

I think we might eventually be in a similar state with some of these things, but my advice for businesses that are looking at building any sort of gen AI application today is that they really have to architect for adaptability. Even if something like MCP seems to be becoming the de facto standard, there's no guarantee it's going to be there a year from now, there could be other competing standards that come along, and not everything is even in a state where it supports MCP.

 So you have to essentially build for flexibility because a lot of this stuff that we're building, we're basically trying to build these like really beautiful castles on top of a really brittle, shaky foundation that sometimes has earthquakes.

So you need to be like engineering with that perspective from the get go and understand that I might need to swap out a model at some point. I might need to change a standard or support multiple standards. I don't wanna get too locked into one way of thinking or one particular vendor, or it's going to prevent me from necessarily taking advantage of the latest innovations.

When it comes to building and engineering these things, you have to really be building for adaptability and flexibility from the get go, and understand that it's a very shifting market, but you also can't wait for stabilization or you're just going to be so far behind that no one at that point cares what you're doing, right?

It's a little bit like the early days of the home PC, waiting for chipsets to stabilize. It's, oh well, the Pentium 133 just came out and I was on the 100 before; I'm gonna wait until the chips stabilize before I buy a computer.

Fast forward 30 years, you're still waiting. So I don't think you wanna get into a place where you're waiting for stabilization. You have to essentially be architecting for sort of adaptability from the beginning.
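One common way to architect for the adaptability Sean describes is to keep a thin interface between the application and any particular model or vendor so either can be swapped later. A rough sketch, where the provider classes are illustrative stand-ins rather than real vendor SDK calls:

```python
# Illustrative sketch of isolating the model/vendor choice behind a thin
# interface so it can be swapped later. No real vendor SDK calls are used here.
from typing import Protocol

class TextModel(Protocol):
    def complete(self, prompt: str) -> str: ...

class HostedModelA:
    """Hypothetical stand-in for one vendor's API client."""
    def complete(self, prompt: str) -> str:
        return f"[vendor A response to: {prompt}]"

class LocalModelB:
    """Hypothetical stand-in for a self-hosted model."""
    def complete(self, prompt: str) -> str:
        return f"[local model response to: {prompt}]"

def draft_email(model: TextModel, details: str) -> str:
    # Application code depends only on the TextModel interface, so swapping
    # providers (or adding a new protocol adapter) doesn't touch this logic.
    return model.complete(f"Write a first-draft email about: {details}")

if __name__ == "__main__":
    print(draft_email(HostedModelA(), "a delayed shipment"))
    print(draft_email(LocalModelB(), "a delayed shipment"))
```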

[Outro music]

RYAN DONOVAN: Thank you very much, ladies and gentlemen, for listening. It's the time of the show where we shout out somebody who came to Stack Overflow, helped out the community, dropped some knowledge, and shared their curiosity. Today we're going to shout out a Promoter badge, awarded to somebody who offered the first bounty on their own question.

We’re going to shout out Trncvs for offering a bounty on their question, “Loadrunner lr_get_attrib_string always returns null.” Always a pain.

 I have been Ryan Donovan, host of the podcast, editor of the blog here at Stack Overflow. If you wanna reach out to us with topics, suggestions, comments, concerns, you can email us at podcast@stackoverflow.com. And if you wanna reach out to me directly, you can find me on LinkedIn. 

SEAN FALCONER: Awesome. And I'm Sean Falconer. Again, thanks for having me. I'm the AI entrepreneur in residence at Confluent. If you want to learn more about Confluent, you can check us out at Confluent.io. And if you want to connect with me, you can find me on LinkedIn. Just look up my name, Sean Falconer. There's not a lot of us out there, so you should be able to find me. 

RYAN DONOVAN: All right, good luck finding him and we'll talk to you next time.

[Outro music]