The Stack Overflow Podcast

Grab bag! On the floor at HumanX

Episode Summary

Today’s episode is a roundup of spontaneous, on-the-ground conversations from HumanX 2025, featuring guests from CodeConductor, DDN, Cloudflare, and Galileo.

Episode Notes

Paul Dhaliwal is the founder and CEO of CodeConductor.

Priya Joseph is the AI field CTO at DDN.

Lizzie Siegle is a developer advocate at Cloudflare.

Erin Mikail Staples is a developer experience engineer at Galileo.

This episode was recorded at HumanX last month. Next year’s event will be April 6-9, 2026, in San Francisco. Register today.

Episode Transcription

[intro music plays]

Ryan Donovan Hello everyone, and welcome to the Stack Overflow Podcast, a place to talk all things software and technology. My name is Ryan Donovan. I edit the blog and host the podcast here at Stack Overflow, and today I have a grab bag of interviews I found while walking the floor at the HumanX Conference. These were all spontaneous conversations and the quality is not as good as I would like, but hopefully you can hear the conversation and enjoy. First up is Priya Joseph, Field CTO at DDN. Tell me a little bit about DDN. I'm looking at the demo up there and it looks like you covered all of the data for AI. 

Priya Joseph So DDN is a data intelligence platform company; we cover the entire stack. People have always known us for our very optimized file system. We now offer DDN Infinia, a software-based product which covers every step in the end-to-end machine learning lifecycle. xAI uses us for data prep, Nvidia uses us in all their supercomputers, and Jensen Huang actually, at the DDN Infinia announcement event on Feb 20th, said that Nvidia is powered by DDN, so there you go. 

RD Pretty good. And as you were saying, the storage layer and the data layer are two separate things. Do you do anything with the storage layer or just the data layer?

PJ Yeah. So the way to think about us is that we are in the throughput and latency optimization business; we want to get you out of the data movement. And so anytime you have data prep or data transformation, people are loading weights from storage, people are moving data and metadata. We have a metadata layer where we allow unlimited tags, and that lets you be really extensive in terms of how you tag things so you're able to pull them later. The canonical example we like to give is: say you want to select a tissue sample for a breast cancer patient under 25 in a certain demographic, with the image taken by a certain machine, like GE Healthcare versus Siemens, in a certain format. In order to be able to do that, you have to have extensive annotation in some ways. So we give you the ability to do that, and we do the data movement based on the metadata, and that gives you a lot of optimization. That's how the file system and this data layer come together. 
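The metadata-driven selection Priya describes can be sketched in a few lines. This is a hypothetical illustration of tag-based filtering, not DDN's actual API; every name below is made up:

```python
# Hypothetical sketch of metadata-tag-based selection (not DDN's actual API):
# each stored object carries an open-ended set of tags, and queries filter on them.

from dataclasses import dataclass, field

@dataclass
class StoredObject:
    path: str
    tags: dict = field(default_factory=dict)  # "unlimited" key/value tags

def select(objects, **criteria):
    """Return objects whose tags satisfy every criterion (equality or a predicate)."""
    def matches(obj):
        for key, want in criteria.items():
            have = obj.tags.get(key)
            if callable(want):
                if have is None or not want(have):
                    return False
            elif have != want:
                return False
        return True
    return [o for o in objects if matches(o)]

samples = [
    StoredObject("scan_001.dcm", {"tissue": "breast", "patient_age": 23,
                                  "machine": "GE Healthcare", "format": "DICOM"}),
    StoredObject("scan_002.dcm", {"tissue": "breast", "patient_age": 41,
                                  "machine": "Siemens", "format": "DICOM"}),
]

# Priya's canonical query: breast tissue, patient under 25, imaged on a GE machine.
young = select(samples, tissue="breast", machine="GE Healthcare",
               patient_age=lambda age: age < 25)
```

Because matching is pure tag lookup, a query like this never has to touch the underlying image data, which is the optimization she is pointing at.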

RD Interesting. I mean, from what we've talked about, it almost sounds like a very sophisticated ETL pipeline for AI. Is that reductionist or is that accurate? 

PJ We definitely play in the data prep space, but also in data transformation, in any data pipeline; we could be helping accelerate that. xAI uses us for the data prep, but we also work in the training space and in the inference space. You're trying to load the weights at inference time, and if you're doing RAG, if you're doing chunking, if you're doing indexing, we can speed things up there as well. We have a store which understands both vector and other data types, and that also helps with optimizing for latency and throughput. 
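The RAG steps mentioned here, chunking, embedding, indexing, and retrieval, can be sketched end to end with a toy bag-of-words "embedding" standing in for a real model. This is illustrative only; a production pipeline would use a real embedding model and vector store:

```python
# Toy sketch of the chunk -> embed -> index -> retrieve steps of a RAG pipeline.
# The "embedding" is a word-count vector, a deliberate stand-in for a real model.

import math
from collections import Counter

def chunk(text, size=5):
    """Split text into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    return Counter(text.lower().split())  # toy stand-in for an embedding model

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

corpus = "weights are loaded from storage at inference time and latency matters a lot"
index = [(c, embed(c)) for c in chunk(corpus)]  # build the "vector index"

def retrieve(query, k=1):
    """Return the k chunks most similar to the query."""
    scored = sorted(index, key=lambda item: cosine(embed(query), item[1]), reverse=True)
    return [c for c, _ in scored[:k]]
```

Every step here, chunking, embedding at index and query time, and similarity search, is a data-movement-heavy operation, which is where a throughput/latency layer would sit.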

RD With loading the weights, are you helping with fine-tuning and the like? Are you doing sort of custom, on-the-fly brainwashing of the LLMs, changing the weights like that?

PJ You could do anything you want. So you run on top of us, so you could have a RAG pipeline where you're doing all of those things and we provide you the data layer and the storage layer where you can do this. 

RD So obviously the data is super important to the AI. Do you think data is more important than the foundation model? 

PJ I may be biased here, but you are only as good as the data that you have. Even the foundation models are only as good as– there are so many things that go into a foundation model, but data is a big part of it. You are Stack Overflow, you've played a key role in it. 

RD Absolutely. And I hear a lot about the so-called reasoning models, which as I understand, they use a lot of inference time data. What are you seeing with inference time data on your platform? 

PJ Yeah. So I mean definitely in this conference you can see there's a lot of focus on inference. There are a lot of players in the inference space who are here, and we definitely help you with throughput and latency in the inference layer and I'm going to leave it at that. 

RD Fair. And how about the vector databases? Is that something that you manage directly or are you sort of agnostic to whatever vector database the client has?

PJ Yeah, we work with all kinds of vector stores that are available out there. We talk vector. 

RD Do you think somebody needs a separate vector database to do AI well?

PJ There are benefits to it. You might have written the vector store in a certain language and that might be giving you an advantage, versus others. Even if you take all the embedding models, if you look at the benchmarks in LLMs, there are so many different players. Gemini just announced another embedding model.

RD Yeah, because we've talked to folks who built their name on vector databases, and we also talk to folks who are adding vector capabilities. We had somebody who said Postgres, with pgvector on top, is or should be the default AI database. Do you think there is a possibility of having one database to rule them all, or is having a multi-data store menu a better way to go about it? 

PJ Variety is good. Choice is always good for the customer.

RD And what do you think about some folks saying we're running out of data for training and improving models? Do you think that's a possibility? 

PJ So if you use us, there are ways to get around the data wall. You're able to use the metadata to create subsets of the data, and you're able to create synthetic data from the annotation that you've done. In a way, that helps you with the continual training and pre-training process. So that's a way to get around the data wall. 

RD Say somebody has a pile of data. Somebody was talking about how companies have decades of data, sometimes a hundred years of data. How do they start getting it ready for AI? 

PJ Yeah. So the quality of the data is quite important, so curation is quite important. So understanding the metadata definitely is a way to go about it. So you have to get your data quality to be good for the models to be good.

RD And is part of that adding the metadata? Do you have to go in and tag it, or can you do it all unsupervised? 

PJ The jury's still out on this. I mean, you could go both ways. There are benefits to being unlabeled, and there are benefits to being aware of what it is. Good category management is always good. If you come from the ontology world, you definitely appreciate that, especially for domain-specific models. So if you're doing ICD models in healthcare, for example, it's good to know what those different things mean, and there is already a rich knowledge of that. I think what you're pointing out is, if there is already documentation, if there are notes, can the model already learn? And we know that they do learn, but there are hallucination possibilities, so we want to be careful about the guardrails and be responsible. We are trying to serve customers, and so we really want to be accurate from that perspective.

RD Yeah. Speaking of hallucinations, do you think there are ways to transform data, use data to minimize hallucinations? 

PJ Yeah, I mean you want to do all you can to reduce hallucinations, and RAG is one way to do it. You're trying to enhance the context, and there are different models with different context lengths, so all of that can help. And guardrails are the other way. We have some levers that we can play with. That doesn't mean you're not going to get hallucinations. There are people who've said hallucinations are a feature and not a bug, but if you're in regulated industries, that's not going to fly, and that gets in the way of adoption. So we want to work with the specific customer use case and work through RAG to achieve that. 
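One guardrail pattern from this exchange can be sketched as follows: refuse to answer when retrieval finds no supporting context, rather than letting the model guess. `retrieve_context` and `call_llm` are hypothetical stand-ins, not any vendor's API:

```python
# Hedged sketch of a refusal guardrail for a RAG pipeline. The retrieval here is
# a toy word-overlap match; a real system would use embeddings and a vector store.

def retrieve_context(question, knowledge_base, threshold=1):
    """Toy retrieval: keep passages sharing at least `threshold` words with the question."""
    q_words = set(question.lower().split())
    return [p for p in knowledge_base
            if len(q_words & set(p.lower().split())) >= threshold]

def answer(question, knowledge_base, call_llm):
    context = retrieve_context(question, knowledge_base)
    if not context:
        # Guardrail: with no supporting passages, refuse instead of hallucinating.
        return "I don't have enough information to answer that."
    prompt = "Answer ONLY from this context:\n" + "\n".join(context) + "\nQ: " + question
    return call_llm(prompt)
```

The refusal branch is the lever Priya alludes to: in regulated industries, a grounded "I don't know" is far safer than a confident fabrication.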

RD Yeah. We hear a lot about RAG and I think we at Stack Overflow are very much interested in attribution and having LLMs show their sources. Do you think that's something that is well considered in the industry or is that somewhere where we have room to grow? 

PJ Yeah, perhaps it's been an afterthought. Citations are important. You could argue we could have started with that. I mean, it's always good to get there, and there are so many companies in this space trying to do it. There are techniques to do it, so I think it's good to adopt citations. For us, we want to serve customers in all verticals, all industries. We want enterprise customers. And so if you are using hundreds of terabytes of data, if you are heading into the petabyte space, if you want to have your applications on GPUs, with DDN you will have no idling GPUs. The other place is we are super efficient. We are green. Where you might have needed a floor of GPUs, you would need a rack of GPUs with us. And we speed up data prep and the entire machine learning life cycle, so you're going to see speedups in throughput and latency at a quarter of the scale. So it's 10x, and multiple X's in some cases. 

RD And how do you achieve that sort of efficiency, that increase in throughput, the lowering of latency?

PJ Yeah, so it's really a function of the optimization that we've done across the stack. So you have the fully optimized file system. On top of that, you have the metadata, which is enriched, and then you are relying on the metadata to be doing the data movement and you're not moving the data all across, so that helps as well. And then we have a unique indexing, so all of these combined together give you efficiencies across the board. 

RD Can you talk a little bit about that indexing? What's sort of special about it? 

PJ It's a way for you to optimize. It's a very well-known data structure: the B-epsilon tree is a way to accelerate the tree indexes so you're able to index faster. 

RD Okay. Well, I'll give you an opportunity to say your name again, your title, and any links where you want to be found. 

PJ So if you go to ddn.com you will find a lot of industry specific use cases. You'll find all the work that we've done. We started in HPC, so you will see that we have very science-based origins which we are now leveraging in the AI space. If you go to Google Cloud and AWS, we are in the marketplace. You will soon see us as a first party offering as well. Listen to us on our podcast. You can see a lot of content on our Beyond Artificial event. If you go and type ‘beyond artificial’ on YouTube, you're going to find us. You will find our CEO, Alex Bouzari’s, talks all over the place and a lot of our leadership. 

RD All right. Thank you very much. Talk to you next time. 

PJ Thank you. 

RD Next up, Lizzie Siegle, who is a developer advocate at Cloudflare. Enjoy. 

Lizzie Siegle Hi, I'm Lizzie Siegle. I'm a developer advocate at Cloudflare. 

RD Hey, Lizzie. So what is the role of APIs in AI? 

LS I'm so excited about using AI as a developer because I think APIs make AI more accessible, programmable, and fun for developers. I feel like now is a great time to be a developer. You can build so much in Cursor with AI-generated code, but also AI-generated tools, and there's just so much happening so quickly. It's really fun to play around with AI APIs and make AI applications using LLMs programmatically. 

RD We had somebody from Postman on and they said the future of the internet is going to be robots building robots in a robotics factory. What do you think about that in terms of APIs? 

LS I don’t know if I agree with that. I like to think I'll still have a job writing AI demos, but it's really cool to see the progress of AI with Waymo. I love biking next to Waymos now. I trust them more than humans because I know what to expect. Last year when I was using AI to write code, [inaudible] and now AI can write more of my code from scratch, and it's really cool to see that progress, so we'll see.

RD We'll see. So with that progress, do you think AI is enterprise ready? 

LS I do not think that AI is enterprise ready yet. I think it's close. We're seeing AI in enterprise applications in companies, but LLMs are still nondeterministic. No matter how often I use LLMs for demos, something can still go wrong when I give the demo. I practice it a lot. I'm like, “It's going to do this,” and then it surprises me, and I think other people are surprised too. Therefore, not quite enterprise ready. You can check out Cloudflare Devs on YouTube. I like to plug my favorite New York tech events newsletter, [inaudible]. You can find me online and also on the Cloudflare Devs Twitter channel. 
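Lizzie's point about nondeterminism is why demo code often wraps LLM calls defensively. A minimal sketch of one such pattern, validate the output and retry a bounded number of times, with `model` as a hypothetical callable standing in for a real LLM client:

```python
# Sketch of a defensive wrapper for nondeterministic LLM output: parse, validate,
# and retry a bounded number of times. `model` is a hypothetical stand-in for a
# real LLM client call; no actual vendor API is assumed here.

import json

def call_with_retries(model, prompt, validate, max_attempts=3):
    """Call `model` until `validate` accepts the parsed output or attempts run out."""
    last_error = None
    for attempt in range(max_attempts):
        output = model(prompt)
        try:
            parsed = json.loads(output)
            if validate(parsed):
                return parsed  # got a well-formed, valid response
            last_error = ValueError(f"validation failed on attempt {attempt + 1}")
        except json.JSONDecodeError as exc:
            last_error = exc  # malformed output; try again
    raise last_error
```

The bounded retry is the key design choice: it acknowledges that the same prompt can produce different outputs, without letting a flaky call loop forever.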

RD Amazing, thank you. Finally, we have a conversation with Erin Staples, a Senior Developer Experience Engineer at Galileo. 

Erin Mikail Staples My name is Erin Mikail Staples. I'm a Senior Developer Experience Engineer at Galileo. I oversee a lot of our new projects as well as helping developers be successful with whatever AI apps they're building, kind of making sure the robots don't go off the rails.

RD So when you're making sure they don't go off the rails, do you think about improving the product, especially as there are better and better and more complicated robots?

EM Yeah, so the world of AI has kind of taken off by storm as we speak. I always laugh because we made printed materials for a conference a few weeks back, and then three days later we had a new model released. Thanks, Anthropic, you're doing really great at releasing them, but we did just print those materials. So one thing is just making sure that you're constantly testing or improving or iterating. I think sometimes we get into this habit of set it and forget it, but in the case of AI, especially with the pace of innovation, it's really not the best practice. You've got to make sure that– I feel like someone said it best earlier today: the new version of ‘works on my machine’ is, ‘well, it works for my prompt.’

RD Speaking of innovation, a lot of that innovation is coming from open source models. Do you have a take on the open source vs closed source model debate?

EM This is a hot topic debate, something we see probably day in and day out. When it comes to open source versus closed source, personally I'm a big fan of open source models for the access and the availability of information, especially if they are publishing their training data and how their model was created and generated. That's always a better [inaudible]. However, I totally understand that sometimes companies have proprietary models trained on things like personally identifiable information or company resources, and that is just probably not best in the open source realm. So to use the standard developer answer, it depends on the use case, but if I were to default, I would probably lean towards open source. 

RD And how important do you think the attribution of the source is in a prompt response? Showing your work.

EM I think it becomes more important than ever, and not just from a showing-your-work perspective: where did you get your training set, where did you get your resources, how did you train or clean the data, but even the step-by-step of what prompts you used, even what day you retrieved it. One cool thing we're starting to see from AI tools, I think Perplexity does this and Grammarly does this now, is they have a reference where you can check the sources, or when the output comes out it has, “This was generated on this date with this version of the model,” which I do think is more important, especially as things are changing so rapidly, given the nondeterministic nature of these models. 
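The provenance pattern Erin describes, stamping an answer with its sources, model version, and generation date, can be sketched like this. The field names are illustrative, not any product's actual schema:

```python
# Illustrative sketch of attaching provenance to a generated answer, in the spirit
# of the behavior described above. All names here are hypothetical.

from dataclasses import dataclass
from datetime import date

@dataclass
class AttributedAnswer:
    text: str
    sources: list          # where the supporting passages came from
    model_version: str     # which model produced the answer
    generated_on: str      # when it was generated, since model behavior changes

def attribute(text, sources, model_version):
    """Wrap raw model output with the provenance a reader needs to judge it."""
    return AttributedAnswer(text, sources, model_version, date.today().isoformat())
```

Carrying the date and model version alongside the text is what lets a reader later ask "would this answer still come out the same way today?"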

RD All right, thank you very much. And you can shout out your name again, socials, links, whatever. 

EM Awesome. So my name is Erin Mikail Staples. You can find me at Erin Mikail on most platforms, or if you want to check out what we're up to at Galileo, it’s Galileo.ai. If you’re in New York City, I am a stand-up comedian running AI comedy shows on the side, so follow us @insidejokesnyc. 

RD All right. Thank you very much.

[outro music plays]