The Stack Overflow Podcast

Boots on the ground: Holistic AI and AudioShake at HumanX

Episode Summary

Two interviews for the price of one, direct from HumanX 2025! Ryan sits down with Raj Patel, AI transformation lead at Holistic AI, and then chats with AudioShake co-founder and CEO Jessica Powell.

Episode Notes

Holistic AI is an AI governance platform that helps enterprises adopt and scale AI.

AudioShake uses AI to mix, master, and separate music and other audio content.

Learn more about HumanX here. Feeling the FOMO? The event takes place again on April 7-9, 2026 in San Francisco. Early birds can register here.

Connect with Raj on LinkedIn.

Connect with Jessica on LinkedIn.

Episode Transcription

[intro music plays]

Ryan Donovan Hello everyone, and welcome to the Stack Overflow Podcast, a place to talk all things software and technology. I am your host, Ryan Donovan, and today we're bringing you a special episode, two interviews for the price of one, recorded at HumanX. Today's guests are Jessica Powell of AudioShake and Raj Patel of Holistic AI. Please enjoy, and I'll talk to you after the show. Welcome to the show, Raj. 

Raj Patel Thanks. Pleasure to be here. 

RD Governance, as I've encountered it, covers a pretty wide range of things for AI. What are the sort of things that you're thinking about and you're covering? 

RP As our name implies, AI governance needs to be thought about holistically. It has to cover risk appetite from the board all the way down to the people on the ground who are deploying AI. People need to feel empowered in order to deploy AI at scale safely, and AI governance really is the vehicle to make that happen. That covers everything from risk appetite by department and by vertical, all the way through to people training.

RD So the risk appetite is managing how many hallucinations you're comfortable with, how much explainability and attribution you want, what the legal ramifications are, that sort of thing?

RP Exactly that. It can go as granular as the hallucination level or the amount of toxicity that you're comfortable with in a particular output, which can vary greatly, from a client-facing chatbot where you'd expect it to be near zero, to something internal where you might have a bit more flexibility. But the key driver of this entire conversation is really how much trust you put into your AI use cases and how much visibility you have around that. As we see AI becoming more pervasive in business decision-making, companies are asking, can I trust this, and what are the metrics around this that allow me to have confidence in that trust?
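Raj's idea of a per-use-case risk appetite can be sketched as a simple threshold check. Everything below, the use case names, the metric names, and the numbers, is an illustrative assumption, not Holistic AI's actual schema:

```python
# Hypothetical risk-appetite thresholds per use case. A client-facing
# chatbot tolerates near-zero toxicity; an internal tool gets more slack.
RISK_APPETITE = {
    "customer_chatbot": {"max_toxicity": 0.001, "max_hallucination_rate": 0.01},
    "internal_search":  {"max_toxicity": 0.02,  "max_hallucination_rate": 0.05},
}

def within_appetite(use_case: str, measured: dict) -> bool:
    """Check measured evaluation metrics against the declared appetite."""
    limits = RISK_APPETITE[use_case]
    # "max_toxicity" limit is compared against the measured "toxicity", etc.
    return all(measured[key.removeprefix("max_")] <= limit
               for key, limit in limits.items())
```

A deployment gate could then call `within_appetite("customer_chatbot", eval_results)` before promoting a model out of sandbox; the point is that the thresholds differ per department and per vertical, exactly as Raj describes.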

RD That trust gap is something we've seen in our surveys. We did some AI surveys where 70% of the people who responded were using AI, but only 40% trusted it. Do you think that's just hallucinations, or is there something else to it? 

RP Hallucinations are a big part of it. There's a fair amount of scrutiny around bias, and the interpretability and explainability of AI models are also key. I think the statistic you put forward is one we resonate with as well. When we look at some of the conversations we've been having in the market, 80% of the CTOs we speak to have either an active AI program or one that's in sandbox, but 60% of them see trust as the main barrier to scale. You spend a lot of time and effort setting up robust sandbox environments in which you can test increasingly autonomous systems, and then when you get to the point of execution, you find different roadblocks getting in your way, whether that's a clear roadmap for understanding ROI, or collaboration from different departments that should have been brought in together. So legal will have an opinion on this, compliance has an opinion on this. It's a multifaceted, complex problem that requires collaboration across the entire business, not just looking at some of the statistical foundational questions. 

RD One of the things we've been pushing for on top of the sort of explainability is attribution of sources. When we first started talking about AI, everyone was talking about RAG– retrieval augmented generation– because that showed its work, showed where that response came from. Do you think that's something that the AI industry has embraced or is there more work to do? 

RP A lot more work to do. I think the RAG architecture has certainly helped with the interpretability of AI outcomes, but it's not the be-all and end-all. There are still questions around tokenization and how that impacts your output, and around properly understanding the attention mechanism. This is not just a complex problem for the models we have out in the market right now; as models become increasingly sophisticated, the question remains. Interpretability of outcomes has a big question mark over it. At the moment, a lot of resources are being put into output testing. So as long as you're comfortable with the output, are you okay with putting that into production? That's an approach based on large-sample testing, where you have a probability distribution of what you're comfortable with. Now, I come from a financial services background. I know that the tail risk is where things can really go wrong. When you're looking at that very low probability instance of something going wrong, it has a large impact, whether that be financial or reputational damage. 
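Raj's point about large-sample output testing and tail risk can be made concrete with a confidence bound: even a test suite that observes zero failures cannot rule out a small residual failure rate. A minimal sketch, using the Wilson score upper bound (my choice of interval, not a method Raj names):

```python
import math

def failure_rate_upper_bound(failures: int, trials: int, z: float = 1.96) -> float:
    """Wilson score upper bound (95% by default) on the true failure rate
    of a model, estimated from sampled output testing."""
    if trials == 0:
        return 1.0  # no evidence at all: anything is possible
    p = failures / trials
    denom = 1 + z**2 / trials
    centre = p + z**2 / (2 * trials)
    margin = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2))
    return (centre + margin) / denom

# Zero observed failures in 1,000 samples still leaves a plausible
# failure rate of about 0.38% -- the tail risk Raj is describing.
print(failure_rate_upper_bound(0, 1_000))  # ≈ 0.0038
```

The takeaway: passing a large sample test bounds the risk but never eliminates it, which is why Raj argues for active guardrails in production rather than relying on pre-deployment testing alone.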

RD It only has to happen once. 

RP It only has to happen once and the models will say it happens only once every thousand years, some black swan event that occurs, but those can have really damaging impacts to business and that's why it's very important to have guardrails that are active in your AI ecosystem. 

RD We've seen a lot of reasoning models. My understanding with a lot of those is that they're pulling in search results and inference-time data. What are the additional complications with governance of that sort of real-time data? 

RP It's similar to the ones we experience with non-real-time data. I mean, we're talking about black box models where the data is available to be looked at, but whether it's available to companies like ours or to anybody who wants to deploy this in anger, it's a visibility problem from providers. Even with some of the open source models, getting the weightings out is not trivial. Now, when you come to real-time data, you have the additional complexity of inherent bias within new data feeding into your models, and what's potentially more worrying is that people could be maliciously injecting data, knowing that there are certain discrepancies within it. There was a lot of press when DeepSeek was released; people were trying to inject malicious content into the database so that it would output something that wasn't appropriate for that model. That's a very specific, geopolitically charged example, but it applies to every large language model being put out at the moment. People want to jailbreak it as soon as it comes out so they can show that it's not infallible.

RD And I think there were some other headlines around Grok, I believe, or xAI, which could easily be made to produce some uncomfortable things. 

RP Yeah, a hundred percent, and this is something the co-founders of Holistic AI, Emre and Adriano, realized: there was a play here in being able to test and benchmark these large language models as quickly as possible when they come out. So DeepSeek, we managed to red team that within a week of its release, so we could see how easy it was to jailbreak and how safe it was in terms of vectors like prompt leakage. I think with Claude 3.7 Sonnet, Holistic AI was the first company to audit that. And if you go to ChatGPT and ask, “Has anyone audited this model?” it actually references our research. So we're really active in trying to understand how well a model is performing, and we use existing benchmarks from previous models to put this forward. Now, when we're talking about benchmarking these models and their performance, that is an open question for society at large, because we need to define what is acceptable in terms of biased output and toxic output. There's a trade-off between responsible AI and model performance in a lot of cases, so there is a need for a standardized setting for this, potentially a regulatory body, or something built into a regulatory context that says, “For a chatbot working in the healthcare industry, these are the acceptable guardrails for toxicity, for hallucination, for bias, and you need to adhere to them in order to put this out to market.”

RD On the regulation question, as we've seen with privacy, when these are worldwide software companies, they're subject to a fragmented regulatory system. The GDPR is a pretty big one and everybody goes after that, but California has one, and other countries and states have their own. Do you think there's a benefit to that regulatory fragmentation, or do you think there's harm in having multiple different regulations? 

RP I don't see direct harm in having different regulations. I think there's a benefit to having consolidated regulation that goes cross-boundary. The main issue people are going to have is identifying where their use cases are actually going to be deployed and making sure they're compliant there. The Holistic AI platform builds that in natively, so we have regulation built in at the core of the platform: when you say, “I'm deploying this in Colorado and in the EU,” it flags the regulations you're subject to. There needs to be some sort of baseline, which I think the communities in different jurisdictions have adopted by default, but if we look at the EU versus the UK, where I'm based, the UK is still following principles-based guidelines and looking to specific industry regulators to set up the rules. It's a balance between fostering innovation and hindering it, and I would err on the side of caution here. We can see how pervasive that data can be and how impactful AI can be on a society, and I would like to see a more prescriptive approach like the EU AI Act, or things like NIST, where it's voluntary but at least it's guidelines, and it's instructive rather than principles-based. 

RD Yeah. I mean, as we've seen with a lot of these governance slip-ups, the PR fallout is costly; it costs real money. So even if it's a loose coalition acting on their own agreement, I think it could be helpful. 

RP Yeah, a hundred percent. The worrying thing about that is, I was on a panel yesterday talking about an AI governance playbook for businesses, and one of my co-panelists was speaking to her experience working with Microsoft and seeing these governance slip-ups. Now, we see some in the market, like ChatGPT going rogue, and I won't name specific names, but she was saying that 90% of the slip-ups never see the light of day. They're either suppressed by marketing teams and legal teams, or they're lost in the news and never really brought to light. That was worrying, because every time someone makes a mistake, there's a benefit of learning there. I think there needs to be a bit more transparency about what's actually going wrong in the market as well. 

RD And you all do actual testing for slip-ups in your audits. Are there holes that almost everybody has? Are there prompt injections where you're like, “I can't tell other people this one”?

RP It does vary by use case and by foundation model. We see a lot of models struggling with being jailbroken so that they output what they shouldn't, and with prompt leakage. We're constantly on the front foot, always doing research into the latest techniques for making these models trip up; I think we've submitted three or four academic papers this year in that space. So unfortunately, there's no one thing I can point to. Prompt leakage is probably the one I see the most, but what's worrying is that the more we test, the more we find that these things break, and then there's a whole new realm of testing that needs to be built around it. 

RD Yeah, I mean because it's a sort of creative generative model, there's almost infinite ways to break it. 

RP Exactly right. I mean, gone are the days of rule-based systems where you know where it's going to trip up if you test it enough times. Now, because of the probability-driven output, and because of the different parameters and levers you have to play with in these models – the tokenization, the coverage you apply to documents in a RAG setting, the top-k and top-p parameters, the temperature settings – all of those have an impact and create an n-dimensional cube of testing. What you essentially need to do is focus on what's important for you as a business and what you really want to mitigate against. Now, as we get further along the complexity chain, the complexity of the testing system increases as well. So this is where we get into the realm of effective tooling for AI governance, which is eventually going to lead to AI policing AI. You're going to have some sort of governance agent that looks after your bots and keeps them in line, with built-in circuit breakers or kill switches, audit trails so you can see different model development pathways, and live monitoring to check for drift, hallucination, and other parameters important to you. 

RD Yeah. I already hear about LLM benchmarking and LLM evaluation. Something I heard yesterday, and I don't know if this is a real thing in the field: explainability by reading individual neurons in the neural net. Is that something you're aware of, that's possible? 

RP It’s not something I’m massively aware of. Is it something like an attention mechanism on steroids, where you want to try and understand the interpretability of the model, so each node of the output has a certain weight attributed to it?

RD I'm not super sure. My understanding of a neural net is that it's essentially a giant sum function, so the way I would guess at it is that it's almost a breakpoint within that sum function. 

RP No, this is not something I'm massively aware of, but definitely something I'll be reading up about afterwards, so I appreciate you bringing that to my attention. 

RD Is there anything else you want to talk about before we hit the outro? 

RP Trust is key. I think trust takes years to build, seconds to break, and then a fair bit to recover. So businesses that want to put AI into their ecosystem at scale really need to think about tooling that can help them do this in a standardized way that doesn't introduce roadblocks in itself, that enables the business to think about each of these use cases realistically in terms of risk and allows different teams to collaborate together. My name is Raj Patel. I'm the Head of AI Transformation at Holistic AI, and you can find us at holisticai.com. Please reach out to me on LinkedIn, and I’d be happy to answer any AI governance questions that you have.

RD All right. Thanks for listening, everyone. All right. Hi, Jessica. So your models deal with audio and audio manipulation. Are they using the same sort of transformer architecture or something else? 

Jessica Powell We use transformers [inaudible].

RD Obviously, I imagine you have a large data set you’ve fine-tuned on. Do you train your own models, or is it all fine-tuning? 

JP No, maybe I should take a second to explain. So we’re in an area called sound separation. Essentially, if you think of a piece of music, you might talk about the vocal stem or the drum stem; in film, the dialogue, the music, the effects. On a podcast, maybe the different speaker tracks are stems. So what we do is take the full mix of audio, meaning the thing you hear on the radio, the thing you’re listening to right now, and separate it into its different components. We’re not using any generative models or anything like that. We’ve built our own proprietary, patented models to separate sound. These are pretty large deep learning models, not as large as generative models, but you face some similar challenges in the sense that you're dealing with really large models, different kinds of latency constraints, and trying to fit them sometimes on device.

RD There are a few things I want to touch on, but on the training side, do you have to label? Do you have to go through Led Zeppelin songs and label what's the guitar stem and what's the drum stem, or is it unsupervised? 

JP Yeah, I remember our CTO in the early days, when it was just three of us at the company, and he was just like, “I’m an overpaid [inaudible].” Basically, we don't train on Led Zeppelin. Our area is a little bit easier than generative in the sense that, yes, you need to train on thousands of audio stems, but we don’t have to train on millions or tens of millions or billions. Already that’s a little bit easier on the training data side. Also, the kind of training data we have is very different from generative. In generative, the thing that pretty much everyone would do is they’d go out and they [inaudible]. What we do is separate sound, so a full mix of audio is useless to us; we need separated sets. So we do things like license from production libraries and sound effects libraries, and then we do a lot of things with our customers where they're also providing data to train models. Generally it's stem-separated, and these different sounds are pretty hard to find. So the whole training data aspect of what we do is itself still a pretty big operation in finding that data, and then the data is usually totally mislabeled. We built our own technology to label, so we do much less human labeling now, but in the early days it was all by hand.

RD You didn’t think of putting together a house band to just produce separate stems? 

JP I think for a while we used to joke about that, because I remember when I worked at Google, in those early days, someone asking me if we had people sitting behind the computer, so that when someone did a query, there was someone searching for the answer. So I used to say the same thing about stem separation: that we had a band that would just perform to return the different tracks to people. But if you're trying to train something in video or image and you need a bunch of trees, for example, for the most part a tree is a tree. I can think of some outliers where it might not clearly be a tree, but generally speaking, it's not hard to label, either by machine or human. But [inaudible] for those of your listeners who are into music and might know what an 808 is: is a tonal 808 bass, or is it drum? What is synth? Is synth piano? Are you classifying it as synth or as piano? So there's an extra element in what we do in sound. A Hans Zimmer film: what of that is music and what of that is effects? So there's an extra element that gets a little bit difficult when you're working in sound, in that some things are very subjective, and so, from a learning perspective, it's deciding what the rules are. 

RD Yeah. And you all have been very kind to give me access to the tool for the recording of this conference here. The noise removal is an interesting one because, like you said, what is noise? How do you train on the difference between noise and dialogue? 

JP Well, and what is noise? Because sometimes noise is very rich content that contributes to [inaudible]. When we're training, you get the best-performing models in sound separation when you have a specific target and you're not trying to build a model that just generalizes. [inaudible]. So rather than thinking about whether we're training for noise, a lot of times we're training to extract specific voices, and then there's the residual [inaudible]. It depends on what the workflow is: if we're talking about ASR and transcription workflows, they don't want any of that noise, they just want the voice. They want you to isolate the voice, which they're then going to take through ASR. Versus if you're working with broadcast, some of the largest broadcasters in the US, they want that noise, because it would be super weird to be filming the governor talking at a construction site and, sure, you don't want the jackhammer super loud in the background, but you don't want it not there at all. You want to at least be able to turn it down. So again, it depends on the context what people want to do, but fundamentally what we're doing is breaking the sound into its parts, or separating it into its parts, and then you can write the rules for [inaudible]

RD If you have the stem for the noise, you can turn it up and down as you go. 
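That "turn the noise stem up and down" idea is just a per-stem gain applied at mix time. A toy sketch with synthetic stems (the signals, sample rate, and gain values here are made up for illustration, not AudioShake's pipeline):

```python
import numpy as np

# Hypothetical stems: one second of a "voice" tone and a "jackhammer" noise burst.
sr = 16_000  # sample rate in Hz
t = np.linspace(0, 1, sr, endpoint=False)
voice = 0.5 * np.sin(2 * np.pi * 220 * t)
noise = 0.5 * np.random.default_rng(0).standard_normal(sr)

def mix(stems_and_gains):
    """Sum each stem scaled by its gain: turning a stem down, not off."""
    return sum(gain * stem for stem, gain in stems_and_gains)

broadcast_mix = mix([(voice, 1.0), (noise, 0.25)])  # jackhammer quieter, still there
asr_input     = mix([(voice, 1.0), (noise, 0.0)])   # isolated voice for transcription
```

Once separation has produced the stems, the broadcast and ASR workflows Jessica contrasts are just different gain settings on the same components.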

JP Right, and you could do that at scale, or a human could do it in a super creative way too. 

RD You mentioned earlier about doing it on device. Do you actually do sound separation on devices? 

JP Yeah, we run on device. It depends on the model. [inaudible] So for example, we started off doing music, and then speech and background separation. That can all run on device, and that's [inaudible] DJ applications, for transcription. 

RD What are you most excited about for the future of AI and this sort of technology? 

JP I mean, there's so much. There are a million really tedious, boring things that I'm so glad different people are working on, and I can't even imagine necessarily what they are. But I think about just how easy things like DocuSign and LastPass [inaudible]. As someone who is very passionate about sound and has creatively loved working with it, but who also has a lot of sound [inaudible], it's really hard for me to go to a bar and pay attention when people are talking next to me. The idea that you can make sound customizable, accessible, editable for people is really, really exciting. And beyond what we’re doing, just the idea that AI can creatively act as this spark and co-collaborator is so exciting from a creativity point of view. There are so many creative workflows. I write a lot, I play music, and so many times you'll get blocked on something. I don't want someone to do it for me. I like doing these things or else I wouldn't be doing them, but I love that something could show me different options or be in communication with me, so to speak, and open different doors and then spark ideas. That's the best, that's so exciting. The idea that you essentially have [inaudible] helping you along, I think that’s such a [inaudible].

RD Yeah, I talked to somebody yesterday who put it as like in the Matrix, the ‘I know Kung Fu’ moment where you step in with AI and you spend a bunch of time researching, figuring out, and then you know-ish Kung Fu.

JP [inaudible]

RD Thank you very much. 

[music plays]

RD Well, thank you very much everyone for listening. I hope you enjoyed our two interviews today. Instead of a Lifeboat Badge or something like that, I would like to ask you a question: What sort of novel uses for AI have you seen or are you hoping to see? Email us at podcast@stackoverflow.com and we'll feature you in a future blog post. I am Ryan Donovan. I edit the blog, host the podcast here at Stack Overflow. If you liked what you heard, drop a rating or review, and if you want to reach out to me, you can find me on LinkedIn.

[outro music plays]