What challenges do organizations face when adopting AI, and why is understanding its limitations key to success? In this episode of Leaders of Code, Ellen Brandenberger, Senior Director of Product for Knowledge Solutions at Stack Overflow, sits down with Dan Shiebler, Head of Machine Learning at Abnormal Security, to explore the complexities of AI adoption, the importance of understanding its limitations, and the ethical considerations involved.
The discussion also covers how to define AI agents, guardrails for inherently stochastic systems, the security and privacy implications of giving large language models broad data access, and how Abnormal uses AI internally.
Connect with Dan Shiebler on LinkedIn and learn more about Abnormal Security.
(Intro music plays)
Ellen Brandenberger: Hi everyone, welcome to the Stack Overflow podcast. Another episode of Leaders of Code, where we chat with tech leaders about all things security, leadership, technology and innovation. My name is Ellen Brandenberger. I'm the Senior Director of Product for our knowledge solutions here at Stack Overflow. And I'm here talking to Dan from Abnormal Security. Dan, could you tell us more about you, your role, and the company you work for?
Dan Shiebler: Yeah, absolutely. My name is Dan Shiebler, I'm the Head of Machine Learning at Abnormal Security. So, we're a cybersecurity company. We protect over 20% of the Fortune 500 from a mixture of different kinds of cyberattacks, including email attacks and account takeovers. Our detection engine is AI-based. It utilizes a combination of more traditional statistical machine learning systems, as well as newer large language model types of approaches, in order to identify attacks. I've been in the security industry really only during my time at Abnormal over the last three and a half years, but I've been in the AI field for about ten years. I did my PhD in applications of category theory to machine learning, including embedding algorithms, manifold learning, and optimization. I spent about five years at Twitter working on vector databases, large language models, recommender systems, and ad tech. And I built some sensor data analytics systems before Twitter at a company called True Motion in the car insurance space; we were acquired by a larger company. So it's really been a great experience seeing a lot of the different ways that machine learning systems can be applied in the security industry and really bring good to the world in doing so.
Ellen: Yeah. Well, today I want to talk a little bit about AI implementation in organizations, how that relates to cyber, and the emergence of AI within the space. There's certainly a lot going on in the industry right now, so maybe we start with that: could you talk to me about the challenges that you're seeing with AI implementation in organizations, whether that's in software development or cybersecurity? How are things going? What are organizations struggling with? And where does your team succeed, and what are you doing well?
Dan: Yeah, absolutely. So I would say that this really splits into two very different categories of implementation, which have really different elements in terms of what the challenges are, what the objectives are, and what the ways are in which organizations can be most successful. One is building AI into products. This also decomposes into things that are utilizing more new-age large language model systems, which tend to be a little bit more amenable to teams that haven't spent as much time utilizing these kinds of systems in the past, versus building larger-scale machine learning systems that require lots of internally owned data that you're able to optimize and tune for your particular application. And so that's all within the bucket of incorporating AI into your products. But then there's incorporating AI into your teams and your teams' operations, and I think there's a great deal of that as well that we see, both in terms of improvements in software development, improvements in go-to-market structures, and improvements in knowledge discovery.

In the first category, implementing AI in products, there are a number of different types of challenges that really are unique to reasoning about systems that are inherently stochastic. AI systems generate different outputs based on different inputs, sometimes in very unpredictable ways. It's not always easy to understand how different kinds of changes will actually affect the outputs, and a lot of the things that you need to do have to be inherently empirical. When you're working in an inherently empirical space, a lot of problems become data science problems. How do you analyze and evaluate the performance of your system? How do you benchmark that performance? How do you make improvements to it? How do you track those improvements? How do you reason about how your data shifts because you want to onboard a new kind of customer, with the different types of data they're going to have in their systems? Or, if you want to make trade-offs in terms of what kinds of system shortcuts you can apply to save costs or reduce latency, how will those different things impact the actual performance of your product and what the customer experience will be? Those are very, very difficult questions to answer when you have AI in the loop, and it requires a lot of data science and empirical approaches to how you optimize these systems. That's common whether you're utilizing newer large language model out-of-the-box types of approaches, or you're fine-tuning or building your own machine learning models on a particular task that you have a lot of data on. That's both those cases, in products.

In workflows, a lot of it is around upskilling people and enabling teams to understand the limitations of the tools that they're utilizing, as well as how to optimally meet their tools in the middle. Two teams of software developers experimenting with AI coding tools, at exactly the same level, with the same degree of experience and ability, may have very different experiences in terms of how effective these tools are, just based on whether the code bases they're working with happen to be well-suited for these tools or not, or whether the kinds of tasks that they're going to be doing tend to be well-suited to how things work out of the box.
And so, there's a two-component process of both understanding where limitations are and educating teams in terms of how to work around those, as well as modifying your workflows, modifying your code bases, modifying your documentation stack, modifying the integration points between your systems, and the tools that you utilize in order to get the most out of them.
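To make the empirical evaluation point concrete, here is a minimal Python sketch of the kind of benchmark harness a team might build around a stochastic component. Everything in it (the Example type, the callable model, the accuracy metric) is illustrative rather than anything Dan describes; the idea is simply to score the system repeatedly against a fixed benchmark set and report the spread, not a single run.

```python
import statistics
from dataclasses import dataclass
from typing import Callable


@dataclass
class Example:
    input_text: str
    expected_label: str


def evaluate(model: Callable[[str], str], benchmark: list[Example], runs: int = 5) -> dict:
    """Benchmark a stochastic component: run it several times over the same
    fixed examples and report mean accuracy plus the run-to-run spread."""
    accuracies = []
    for _ in range(runs):
        correct = sum(1 for ex in benchmark if model(ex.input_text) == ex.expected_label)
        accuracies.append(correct / len(benchmark))
    return {
        "mean_accuracy": statistics.mean(accuracies),
        "stdev": statistics.stdev(accuracies) if runs > 1 else 0.0,
        "runs": runs,
    }
```

A harness like this also makes the data-shift question testable: onboarding a new kind of customer just means adding a new benchmark set and comparing the numbers.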
Ellen: One of the observations that I've had in the industry right now, particularly from a more product-layperson perspective, is that it's interesting to watch systems that are non-deterministic, like AI, being implemented in a world that's used to SaaS, where everything is deterministic and linear and maybe works out of the box. Understanding those variable use cases, benchmarking them, evaluating their performance, and then improving on them is a much more iterative process. And making that shift, whether it's in how you build products that are driven by AI or in how you implement those tools within organizations, is something both sides have struggled with, I think, across the industry. So there are certainly those challenges across the board. With that in mind, I think there's a lot here and in the industry more broadly. You have deep experience in machine learning, as you articulated earlier. How do you differentiate between what's the next hype cycle in tech versus a really strong reality that can unlock doors for innovation and the next wave of products or efficiencies for teams or organizations? How do you think about the different pieces of that?
Dan: Yeah, I mean, I think that there's two things to think about. One is whether a system's strengths are really effectively tackling what the bottlenecks are on certain problems. And the other is whether the limitations that exist right now are limitations that we naturally expect to fall away with time, or whether we expect them to be somewhat immutable in their structure.

For the first one, one of the biggest limitations in machine learning throughout my entire career has been the availability of data to train models. If you want to build a machine learning model that accomplishes a task, you need to have a lot of data, examples of that task, to be able to train your machine learning model. And your model is very unlikely to perform well on tasks that are substantially different from what it's been trained on, both because there's a fixed feature input space and the model doesn't really work with anything outside of that space, and because there's a single task that it's performing. If you wanted a model to perform a different task, you'd need to train a new model to do that. Large language models are a complete paradigm shift in terms of what at one point was called their zero-shot generalization capabilities, and what people now just refer to as the capabilities of large language models: they can perform tasks that they were not trained on. If you can think up some task, you can try a large language model out on it, and a lot of the time it will do pretty well, if you present the information to it in the right way and set the appropriate expectations for what its output will be. That's a complete paradigm shift, because it takes something that's just a fundamental bottleneck and decomposes it, right? It breaks it down entirely and essentially removes it. It unlocks an entirely new set of products that you can build. It completely changes the equation of whether a product is something you can build by yourself from scratch in your room, or something that requires you to have a tremendous number of customers and data before you're able to build it.

And when we think about the other component here, what are the limitations in what we're building, and are those limitations things that we naturally expect to fall away? There's a blog post that was written in 2019, I think, called The Bitter Lesson. It presents a description of how a lot of tricks that people apply to artificial intelligence and machine learning systems, tricks that don't naturally cause the system to perform better with increased compute and increased scale, will often be outperformed by systems that can perform better with increased compute and increased scale. So when we think about what innovations are happening in the industry and how things are improving: any innovation that opens up the capacity for substantial improvements as we increase scale, or any limitation where we expect that increased scale can help us break through it, is not a limitation to think too deeply about, because over the course of a couple of years it will likely fall away.
Whereas things that are more foundational, things that at their core are less likely to be simply removed by increasing scale, increasing adoption, or increasing the things that we expect to naturally have tailwinds behind them, those are the things to pay a little more attention to as foundational limitations.
Ellen: Do you think within the AI space, there are... You know, you alluded to some, but are there foundational limitations in the AI space that you think are inherent risks? Or do you expect certain ones to fall away with scale?
Dan: Ultimately, any kind of probabilistic model is sometimes going to be wrong. It's sometimes going to generate what's known as hallucinations, outputs that are logically inconsistent in their structure. This isn't something that naturally falls away with scale. It's something that can improve with scale a little bit, but ultimately these kinds of inconsistencies, which are drawn from the absence of a well-structured world model, are always going to be present at the edges and at the core of a lot of the systems that we're working with and reasoning about. The way that systems today are effective at operating even when those kinds of problems are in place is by building checks, guardrails, and non-AI components into systems that are AI-centric, so that you have a mixture of more traditional approaches and large language model approaches in order to deliver experiences that have higher degrees of predictability for end users.
Ellen: What are some good examples of those checks? You mentioned they can sit at the system level or the people level. Could you give me a more specific example, maybe from your own work? I'd love to hear if there are examples coming out of Abnormal that are relevant here.
Dan: Yeah, absolutely. So I can give two examples: one general example that's very relevant for operations and operational teams, and then one from the products that we specifically build at Abnormal. For the operational example, it's very simple to just look at code, and at AI models that generate code. Code has a natural structure to it, and when it deviates from that structure it's often very easy to spot. Sometimes it will cause syntax issues, and sometimes it will cause runtime issues, and you can spot these by simply running your code through a linter or trying to run the code itself. A lot of AI systems that are really effective at writing code and doing multi-stage programming tasks have within them calls to systems that perform linting or runtime checks, or even run unit tests or integration tests as part of their execution. By sticking together these more structural automations with large language model generation, you have a system that's much more reliable in terms of its infrastructure. And at Abnormal, we have similar principles for attack detection. We have large language models that perform attack detection, but these sit alongside more rule-based systems that look at the structural components of the messages we're processing, things like indicators of compromise and behavioral traits that are extracted. These work in concert: the decisions of large language models are checked by larger rule-engine systems against other components, and then those outputs feed back into other large language models, building a system that has checks on both sides and concordance all throughout.
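As an illustration of the structural guardrails Dan describes around code generation, here is a minimal sketch in Python. The generate_code function is a hypothetical stand-in for whatever LLM call a team uses; the checks rely only on Python's built-in ast module and the ruff linter's command line, and a real pipeline would typically layer unit or integration tests on top.

```python
import ast
import subprocess
import tempfile


def generate_code(prompt: str) -> str:
    """Hypothetical placeholder for whatever LLM call your stack uses."""
    raise NotImplementedError


def passes_structural_checks(source: str) -> bool:
    """Cheap, deterministic guardrails around stochastic code generation."""
    # 1. Syntax check: reject anything that is not even parseable Python.
    try:
        ast.parse(source)
    except SyntaxError:
        return False

    # 2. Lint check: write the candidate to a temp file and run a linter over it.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    lint = subprocess.run(["ruff", "check", path], capture_output=True)
    return lint.returncode == 0


def generate_with_guardrails(prompt: str, max_attempts: int = 3) -> str | None:
    """Retry generation until the deterministic checks pass, or give up loudly."""
    for _ in range(max_attempts):
        candidate = generate_code(prompt)
        if passes_structural_checks(candidate):
            return candidate
    return None  # surface failure instead of shipping unchecked output
```

The design choice is the one Dan points to: the stochastic generator never ships output directly; a deterministic layer decides whether its output is acceptable.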
Ellen: How successful have those been? And, like, what challenges did you and the team kind of have with implementing something like that? I'd love to hear.
Dan: Yeah, I mean... There are a lot of challenges, as you might imagine. But in general it's very effective. We've seen that the combination of more traditional approaches with agentic systems is really good at enabling us to sweep up attacks that have any degree of maliciousness present in them, while also filtering out the types of obvious false positives where an analyst looking at an individual message might say, oh, I can see why this is a false positive, but I can also see why it got flagged. Large language models are very good at understanding that, integrating the context of why something was flagged and why something was recognized, and then making those decisions. So you have two different patterns that work very well together. One is where you have a more rule-based approach for picking up things that might be malicious, and then an agentic approach for investigating and making decisions on those. The other is a more machine learning model approach for picking up things that have elements of suspiciousness and anomaly to them, and then more rule-based approaches that check for known patterns in an organization's environment. These two fit together very, very well.
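Here is a hedged sketch of the first pattern Dan describes, not Abnormal's actual detection engine: a cheap, deterministic rule pass flags candidates, an LLM stage (stubbed out here) triages only what was flagged, and its verdict is constrained to a fixed set of outcomes so the stochastic component cannot invent new actions. The indicator lists and verdict names are assumptions for the example.

```python
from dataclasses import dataclass, field

# Hypothetical known-bad indicators; in practice these come from threat intel feeds.
KNOWN_BAD_INDICATORS = {"suspicious-payments.example", "urgent-invoice.example"}

ALLOWED_VERDICTS = {"allow", "quarantine", "escalate"}


@dataclass
class Message:
    sender: str
    body: str
    indicators: list[str] = field(default_factory=list)  # e.g. domains, hashes


def rule_based_flag(msg: Message) -> bool:
    """Deterministic first pass: cheap structural checks for known patterns."""
    return any(ind in KNOWN_BAD_INDICATORS for ind in msg.indicators)


def llm_triage(msg: Message) -> str:
    """Hypothetical placeholder for an LLM call that weighs the flagged context."""
    raise NotImplementedError


def detect(msg: Message) -> str:
    if not rule_based_flag(msg):
        return "allow"
    # Only messages the rules flag are escalated to the more expensive LLM stage,
    # and its answer is still forced into a fixed set of outcomes.
    verdict = llm_triage(msg)
    return verdict if verdict in ALLOWED_VERDICTS else "escalate"
```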
Ellen: Yeah.
Dan: I would say a lot of the core challenge comes down to evaluation of interlocking components. Anytime you have multiple pieces of a system that need to fit into each other, you need to evaluate them both individually and as a system as a whole. And it can be very complicated to think about how somebody working on one component of the system is able to reason about how their work contributes to the performance of the whole, especially if somebody else is working on the other component at the same time. It takes a lot of coordination. People problems are always the hardest kind of problems.
Ellen: Exactly. I mean, if you think about even the example you just gave of a false positive with an analyst, there's also a human in there, right? Asking questions like, okay, maybe I'm getting a signal that this is a false positive, but I still have to independently decide whether that answer is right and evaluate it as I do my own analysis. But then, more broadly, to your other point, contributing to machine learning and AI systems going forward might be the new version of contributing to a larger code base. So how do we think about building teams that build AI in coordinated ways? Cool. So with that, I want to shift gears a little bit. You mentioned AI agents, and before we go into this, there are certainly folks in the industry who know a lot about agents, but I think it's one of those topics where maybe there's not a shared definition. So before we talk about how agents impact Abnormal and the work that you're doing: how would you define an AI agent? I'm happy to share how I've been thinking about it as well, after that.
Dan: I guess, in order to build a definition that's actually useful, we need to acknowledge that, for over a decade now, every kind of large software system of any size has had components in it that are mediated by machine learning systems. Any kind of decision that has any degree of relationship to revenue or user experience has, at some point in its processing pipeline, a machine learning system that's making a call, classifying something or predicting something. So if we just wave our hand and say an agent is anything that involves a machine learning system or an AI system anywhere in it, then we're saying everything is an agent, and that's not very useful. If we're going to bring in the definition of agent, we need to be a little more specific than that. I would say it's a system where decisions about how execution is going to be routed are based on the decisions of a large language model, not just a single binary or multi-class classification decision that's pre-baked in advance, but something that actually has a true degree of control flow mediated by a large language model at its core. And there are a lot of different forms of that, from more unstructured approaches, where there's a high-level objective fed into a system that has lots of tools, to more structured approaches, where agents are given individual subtasks and sent out in swarms to put together overall components. But ultimately, I think once you reach a certain point of control flow mediated by large language models, your system sort of graduates from software that has a large language model in it somewhere to an agentic system.
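A minimal Python sketch of that definition, with hypothetical tool names and an llm_choose_next_step stub that stands in for any model call (this is not a particular framework's API): the loop and the tools are ordinary deterministic code, and what makes the system agentic in Dan's sense is that the branch taken at each step is chosen by the model rather than fixed in advance.

```python
def llm_choose_next_step(goal: str, history: list[str], tools: list[str]) -> str:
    """Hypothetical LLM call that returns the name of the next tool to run, or 'done'."""
    raise NotImplementedError


# Illustrative tools; each is plain deterministic code.
TOOLS = {
    "search_logs": lambda q: f"log results for {q}",
    "fetch_ticket": lambda q: f"ticket details for {q}",
    "summarize": lambda q: f"summary of {q}",
}


def run_agent(goal: str, max_steps: int = 5) -> list[str]:
    """Control flow mediated by the model: at each step the LLM picks the branch."""
    history: list[str] = []
    for _ in range(max_steps):
        step = llm_choose_next_step(goal, history, list(TOOLS))
        if step == "done" or step not in TOOLS:
            break
        history.append(TOOLS[step](goal))
    return history
```

By contrast, a pipeline that always calls the same classifier at the same point, no matter what it returns, would stay on the "software with an LLM in it" side of the line.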
Ellen: Yeah, I really like that definition. I'd previously been talking to my own team about this through the lens of three things: some computational logic, to your point; some non-deterministic system that might make decisions based on the output of a large language model or some form of machine learning; and then some data set that can help inform those systems. But to your point earlier, that's probably a little bit too broad, so I like narrowing in on the outputs of the large language model and having a higher level of agency baked into the word. With that in mind, could you talk to me a little bit about, whether at Abnormal or in broader trends you've seen in the industry, what key challenges the space is facing in ensuring AI agents can accurately assess, access, and use data to reason effectively?
Dan: So, I think there are both security challenges and performance challenges that are top of mind for a lot of people. From a performance perspective, a lot of this is just the traditional challenges that have always been associated with machine learning systems; they're very difficult to reason about. AI and large language models make it possible for larger groups of software engineers to build AI into many more products much more easily. When you're able to just take a product and stick it out there without ever needing to actually train a model, it's easy to forget that it's an inherently stochastic system and that you still need to reason through all of these performance elements. So I think we're taking a problem that used to only exist within systems where you had to do a whole bunch of pre-work in order to launch, and now sticking it into everything. The performance problems, performance management, performance optimization: I think it's very much the same old machine learning problems, just scattered all over everybody's codebase that touches large language models.

The security issues I actually think are quite unique, though, because there are a number of different elements here, starting with the degree of access that people want to give large language models. A lot of the time, in order to give your large language model access to the data you want it to have, and access to the systems that will make it most effective as a tool that can do a lot of work for you, you need to reason about how it's going to touch all of those different systems and all of those different pieces of data in a safe way. The role-based access controls that you've implemented for your workforce, you now need to propagate out to all of the agents that the different people in your workforce are utilizing. You have to reason about the fact that data fed into a large language model is essentially accessible by anybody who touches a model that has that data in its context; it's very easy to prompt large language models into spilling anything out. So any data that's touched by an LLM is basically totally public, with very little effort, to anybody who is interfacing with it at all. That's not necessarily the mindset many people have had while developing these systems, that degree of openness in terms of data access. But it really is necessary if we want to maintain the kinds of security and privacy that we need from enterprise systems.
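To make the access-control point concrete, here is a minimal, hypothetical sketch of the propagation Dan describes. The users, tools, and permission sets are purely illustrative; the design choice is that every tool call an agent makes is authorized against the requesting user's existing permissions, on the assumption that anything the model is allowed to read can be coaxed back out by that user.

```python
# Per-user permissions, mirroring whatever RBAC already governs human access
# (hypothetical users and permission names).
USER_PERMISSIONS = {
    "analyst@example.com": {"read_alerts"},
    "admin@example.com": {"read_alerts", "read_hr_records"},
}


def fetch_alerts() -> str:
    return "recent alerts..."


def fetch_hr_records() -> str:
    return "sensitive HR data..."


TOOLS = {
    "read_alerts": fetch_alerts,
    "read_hr_records": fetch_hr_records,
}


def call_tool_as(user: str, tool_name: str) -> str:
    """Authorize every agent tool call against the requesting user's permissions,
    because anything the model reads should be treated as visible to that user."""
    if tool_name not in USER_PERMISSIONS.get(user, set()):
        raise PermissionError(f"{user} is not allowed to use {tool_name}")
    return TOOLS[tool_name]()
```

Usage would look like `call_tool_as("analyst@example.com", "read_hr_records")`, which raises rather than quietly handing the model data the analyst could then extract.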
Ellen: And I know for many enterprises, like, that security piece is sort of a table stakes feature in terms of adopting AI systems. And so, there's quite a bit of anxiety out there right now. Can you talk to me a little bit about any ethical considerations that might come into play? Whether that be, you know, how that data is used, or how things interact with users? And, you alluded to this earlier, but like, what safeguards make the most sense to help enterprises as they start thinking about AI agent adoption?
Dan: Yeah. I'll say one thing, which I think is one of the largest ethical concerns we think about at Abnormal Security, which is malicious actors utilizing large language models and AI systems to improve the quality of the attacks they're sending out. This is a very serious problem. We've seen a substantial increase in threat actors utilizing artificial intelligence systems to craft more realistic phishing emails, or to automate OSINT workflows, to simply uplevel the effectiveness of their malicious campaigns. The reality is that AI tools enable people who are less technically skilled to operate as well as people who are more technically skilled. That has positive effects for the vast majority of people who utilize these tools, and it has negative effects in terms of enabling bad actors to send more malicious things to the victims they're targeting. So that's one very, very large element, certainly. There are also substantial privacy considerations, of course, and the question of how a bad actor could potentially turn large language models into weak points in a company's security infrastructure once large language model-mediated systems are introduced. If you're not careful to propagate the same kinds of access controls, with the perspective that anything a large language model touches is completely open to anybody who touches it on the other side, if you don't have that baked into the way you've designed your system, then you open yourself up to user data being leaked to people who are interfacing on the other side, and that itself raises a number of concerns as well.
Ellen: So one last question on agents before we shift gears a little bit. So, how do you see agents evolving? Do you see them becoming more popular? Do you see their adoption rising? What does the future look like, essentially?
Dan: I certainly think automation is the theme today. There's a tremendous number of workflows that have already been automated by computers, by the introduction of really nice tools that people have been able to utilize to take processes that would require a lot of manual steps and automate them using code and computers and APIs and workflow orchestration systems. And I think there are a lot of steps in those processes that, before large language models and agentic systems, were almost impossible to automate in a reliable way, due to the degree of ambiguity in what would need to be performed at different stages. Now that's no longer the case. At one point it was very difficult to automate anything. Then we were able to automate quite a lot, with a sort of concentric sphere of things that we couldn't automate yet. And now we can automate a lot more; there's just a lot more stuff that we're able to automate that we weren't able to before. So I think that's a huge thing we're going to see across every industry, as people figure out what the things are that today, or a year or two ago, were almost impossible to automate and are now completely possible to automate to an acceptable level of performance. I don't think anyone knows exactly where those lines are, what the things are that are and aren't possible to automate, but I'm sure everybody who works in any field can probably think of a couple of candidates that they see. Certainly at Abnormal there are quite a lot of workflows that in the past have required humans in the loop, and now we've been able to automate them and really increase productivity across our organization and the performance of our products through those additional automations that we're introducing.
Ellen: Yeah.
Dan: I think, in particular, the ability of AI systems to write code really is one of the most compounding effects, because of the fact that the code itself can then do things like produce automation and improve the performance of various systems and be able to fill gaps.
Ellen: Yeah, or find bugs, or commit code, or whatever it may be. Exactly. As my team builds products, I often ask them: what are some things that are possible with this technology that weren't possible two years ago, or six months ago? Being in the AI space can be challenging, but it can also be very exciting to ask ourselves that question and have a really good answer, and we're able to automate away some of those things that ultimately aren't to our benefit in the short term. So, let's change gears a little bit. Are your teams using AI right now? I know you build AI systems, so... (Laughs) I suspect the answer to that is probably yes. But what are some of the benefits that you and your teams have seen there? And ultimately, what really excites you about the industry overall?
Dan: Yeah, we're using AI extensively. We actually track utilization of lines of AI-generated code very religiously across our organization. We have a number of internal hackathons based around trying to build as much as we can without writing a single line of code ourselves, just prompting systems. One of the things I think has been most effective within our organization has been the generation of artifacts that enable AI systems to be more effective. These are things like rules files for AI systems that give them context on our organization, on some of the sharp edges in our code base, on the structure of our team, and on the ways our information is organized, in order to make them more effective. We also have internal systems that give us access to all of the organizational context that's out there, so that anybody with a question can easily surface context that might be hidden inside a Slack channel somewhere, or inside a conversation somewhere, or in code. The net effect of a lot of this is a compounding productivity that comes from making these kinds of investments to make AI systems work for our organization. These things don't necessarily work out of the box. It takes a little bit of investment and a few key decisions to shape our organization, our artifacts, and the tools that we use so that they match our environment optimally. But the compounding effects are very real.
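As a hypothetical illustration of that kind of artifact, not Abnormal's internal tooling, a team might keep small, versioned rules files alongside the code and concatenate them into the context handed to a coding assistant. The directory name and file layout below are assumptions made purely for the example.

```python
from pathlib import Path


def build_assistant_context(rules_dir: str = "ai-rules") -> str:
    """Concatenate team-maintained rules files (e.g. ai-rules/codebase-sharp-edges.md,
    ai-rules/team-structure.md) into one context block for a coding assistant."""
    sections = []
    for path in sorted(Path(rules_dir).glob("*.md")):
        sections.append(f"## {path.stem}\n\n{path.read_text()}")
    return "\n\n".join(sections)


# Usage: prepend the result to whatever prompt or system message your tooling sends.
# context = build_assistant_context()
```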
Ellen: So last but not least, you know, what's exciting for you in the AI industry right now? Whether, you know, it's in the space around Abnormal or more broadly? Sounds like you certainly have a passion for this space. But, what maybe gets you out of bed in the morning?
Dan: I think the pace of innovation is incredibly exciting, the fact that people are able to do so many different things with these tools and think of creative new ways to utilize them. But what really gets me out of bed in the morning is our mission. I love stopping attacks. I really feel like we're providing something very crucial. Not many people get to feel like they are on the front lines of the war against existential threats. Cybercrime is an existential threat, and fighting it with all the tools at our disposal, and AI is a very powerful tool, is a very exciting thing, and I love being able to do that. And there's a lot of advancement coming from every direction. Tools that both enable people to become more productive and enable the products you build to be more effective at the same time are something really worth spending a lot of time thinking about.
Ellen: That's similar for me as well. Like, the idea that we can both build better products for developers and technology companies, and then also, build better ways for our teams to work. It really makes me excited to be at Stack Overflow as well. Although we're not in the cyber space, we certainly power developers and cyber teams across the world. So, good to share that goal, albeit indirectly. Dan, anything else you want to add about topics of discussion that we've had today about the AI space, or about your work at Abnormal, before we wrap up?
Dan: Yeah, I mean, I would just reiterate that AI systems are a bit of a double-edged sword, in terms of the complexities they introduce and the difficulty of being appropriately responsible, so that the data flows required to get AI systems to work can be handled in a safe, secure, and private fashion. They require a lot of upfront investment in really mapping how they'll fit into an overall organizational structure in order to build that buy-in. This is a people problem more than anything else, and an education problem, and an investment problem. But the results are truly transformative in every area they touch, and they're self-compounding as well. I would encourage organizations and people who want to utilize these tools to take the time to think through all of these kinds of challenges and all the hurdles to jump over, but to not lose sight of the very, very real benefits that lie on the other side.
Ellen: Yeah, I love that metaphor. I think as organizations think about adopting a lot of these technologies, unlike SaaS or previous technology waves, the upfront adoption cost is probably a lot higher, but the benefits could also far outweigh it. You're starting to see that within Abnormal, as you pointed out, and as we work with partners at Stack Overflow in the space, we've certainly seen that as well.
(Music plays)
Ellen: So thanks so much, Dan. Appreciate everyone for listening. I'm Ellen Brandenberger and you can find me on LinkedIn under my name. Or, if you'd like to learn more about Stack Overflow, you can go to stackoverflow.com to see our millions of questions and answers, or stackoverflow.co, c-o, if you're interested in learning more about the company, Stack Overflow. And we are hiring as well, so if you're interested in taking a job with us, take a look at our site. And Dan, I would love to hear where we can find you.
Dan: Yeah, absolutely. So you can learn more about Abnormal Security at abnormalsecurity.com. You can find me on LinkedIn. My name is Dan Shiebler. Please feel free to reach out.
Ellen: Thanks so much, Dan. We appreciate having you on.
(Outro music plays)