At HumanX 2025, Ryan chatted with Rodrigo Liang, cofounder and CEO of SambaNova, about reimagining 30-year-old hardware architecture for the AI era.
SambaNova makes a full-stack AI platform and an "intelligent chip" capable of running models of up to five trillion parameters, allowing developers to run state-of-the-art open-source models without the time-consuming work of tuning and mapping. See what developers are building with the tech.
Find Rodrigo on LinkedIn.
This episode was recorded at HumanX in March. Next year’s event will be April 6-9, 2026 in San Francisco. Register today!
[Intro music]
Ryan Donovan: Welcome everyone to the Stack Overflow podcast, a place to talk all things software and technology. I'm Ryan Donovan, and today we're going to be talking about inferencing, training, finetuning, all the good stuff you like about AI, talking with Rodrigo Liang, the co-founder and CEO of SambaNova Systems. Welcome to the show, Rodrigo.
Rodrigo Liang: Yeah, thanks Ryan. Thanks for having me.
Ryan Donovan: Yeah, of course. So we're here at the HumanX Conference with all these companies that do AI and you have a software stack and a custom chip that does all of the AI stuff. Is that right?
Rodrigo Liang: Yeah, so here's what we did at SambaNova. We're a spin-out from Stanford University, and we've been at it for seven years. Our bookends are: we built chips that compete with NVIDIA on one end, and then we completed the entire stack so that you can actually run the state-of-the-art open-source models on that hardware without having to do all the work of tuning and mapping that people normally have to do. We eliminated 80, 90% of the effort most people would spend trying to map those models onto the hardware; SambaNova took care of it. So now you can spend your time and energy building the solutions and applications you want using the models, instead of wrestling the model down to make it do what you want it to do.
Ryan Donovan: Right. So you started with the chip, yeah?
Rodrigo Liang: Well, it's been a journey.
Ryan Donovan: Yeah? (laughs).
Rodrigo Liang: It's been a journey. When the research was happening at Stanford through my co-founders, both of whom are professors there, it really started as a software project. Here's the way the thesis goes: we thought, well, AI is going to be such a big transformation for the planet. Is it possible that there's only going to be one chip vendor?
Ryan Donovan: Right.
Rodrigo Liang: Is it possible that we're going to use an architecture that's 30 years old? Right? So that's the thesis: there are probably opportunities for us to make it better. And that's kind of the brilliance of what my co-founders did; they started thinking about this in a completely different way. And the answer here is dataflow. The architecture we use is called Dataflow, and it allows a significantly more efficient way of computing these neural nets and, it turns out, a much simpler way as well. So what we've done is taken that and mapped it from the software models they had down to a brand-new architecture, now on generation four in hardware, that completely eliminates the need for any NVIDIA GPUs. It's a tenth of the power, and you're inferencing at ten times the speed. That's the level of innovation we've been able to deliver in the seven years the company has been around.
Ryan Donovan: Yeah, I know a lot of the GPUs, TPUs, NPUs, whatever, do a lot of parallel processing of small calculations. The focus on dataflow is interesting. What's the special sauce, if you can share it, that dataflow provides?
Rodrigo Liang: So the way that I talk about it is whether it's a CPU, GPU, TPU, all those architectures, they're core-based architectures. And so, by that I mean you have a neural net...
Ryan Donovan: Right.
Rodrigo Liang: ...That wants to pass information from one kernel to the next, to the next. It's like a map, right? You're taking one operator, then passing it to the next, passing it to the next. And it's this network that is computing all of those weights. What happens in traditional architectures, these core-based architectures, is you carve those neural nets into tiny little pieces and spoon-feed one core at a time, and then you put the results together and figure out, oh, what do I have? Right.
Ryan Donovan: Yeah
Rodrigo Liang: And then you do it again.
Ryan Donovan: Right.
Rodrigo Liang: And so basically you're chopping up what wants to be a single-flowing neural net into pieces, dividing it up, having all these independent workers do it, and then putting it back together to see what you got. Right? And that constant chopping, calculating, reassembling, analyzing, chopping again, that's where a lot of the work goes. Whereas with a Dataflow architecture like SambaNova's, we just said, "Well, why don't we compute it the way the neural net wants to operate?" So we have these cores that pass the output of one kernel in real time to the next hardware function that operates the next kernel, right? We call it a Reconfigurable Dataflow Unit because the hardware reconfigures its mapping to match the neural net you're trying to operate. But if you do that, now I can size the hardware to match each kernel and make the output of one kernel flow straight into the input of the next, without ever having to chop up the neural net in unnatural ways.
Ryan Donovan: Yeah, yeah. That's interesting. When I first took an AI course and saw what a neural net was, it's basically a big sum function, right? So it's interesting to essentially pass the pieces of that sum function along. Is that right?
Rodrigo Liang: Yeah. So you have these operators and sometimes you're doing matrix multiply, sometimes you're doing sum functions, you're filtering. Those are well understood operators...
Ryan Donovan: Sure.
Rodrigo Liang: …But the key is they all want to stay together. Those operators want to stay together. And when you don't have the ability to compute certain matrix multiplications all at once, you start chopping this thing up in unnatural ways, which means you've got to decompose it, find a way to fragment it across these cores. They compute only part of the answer. Then you've got to reassemble to figure out, well, what have I got? Okay, now I've got to go and recompute some part of it again. Right? A lot of that is just because the hardware is rigid and you can't map it to the operator you're trying to run. With Dataflow, we're able to resize these cores, these computational units, to fit the neural net, because sometimes the neural net is long and skinny, and sometimes it's short and wide. Right? You want to be able to map the hardware to compute what you're trying to compute...
Ryan Donovan: Right.
Rodrigo Liang: ...Without having to chop the neural net in unnatural ways.
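The kernel-at-a-time versus dataflow contrast Rodrigo describes can be sketched in miniature. This is purely an illustrative toy, not SambaNova's actual implementation: the "core-based" path materializes a full intermediate buffer between every operator, while the "dataflow" path streams each value through the whole fused pipeline.

```python
# Illustrative sketch only (not SambaNova's implementation): contrasting
# kernel-at-a-time execution, where each operator's full output is written
# out before the next one runs, with a dataflow style, where each element
# flows straight from one operator into the next.

def scale(xs, k):          # one "kernel": multiply every element by k
    return [x * k for x in xs]

def add_bias(xs, b):       # next "kernel": add a bias term
    return [x + b for x in xs]

def relu(xs):              # final "kernel": clamp negatives to zero
    return [max(0.0, x) for x in xs]

def core_based(xs):
    """Kernel-at-a-time: three passes, two intermediate buffers."""
    t1 = scale(xs, 2.0)        # full intermediate result materialized
    t2 = add_bias(t1, -1.0)    # another full pass, another buffer
    return relu(t2)

def dataflow(xs):
    """Dataflow-style: one pass; each value traverses the fused
    pipeline of operators with no intermediate buffers."""
    return [max(0.0, x * 2.0 + -1.0) for x in xs]

inputs = [0.5, -2.0, 3.0]
assert core_based(inputs) == dataflow(inputs)  # same math, different schedule
```

The two functions compute identical results; the difference is the schedule, which is the point Rodrigo is making about efficiency.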
Ryan Donovan: Interesting. And you also mentioned using open-source AI. Do you have a preference for open-source AI? Do you think they're better?
Rodrigo Liang: Well, I mean, look, on the hardware end we're agnostic. We run any neural net they'd want, so long as it's written in PyTorch, which is this beautiful language...
Ryan Donovan: Sure.
Rodrigo Liang: ...That's abstracted all the hardware-specific nature away. Right? And so, we're proud to say we're a CUDA-free zone. Right?
Ryan Donovan: (laughs)
Rodrigo Liang: No manual writing of any CUDA; it's all just PyTorch. Take PyTorch models from Hugging Face, which is our partner, or any of these places, and you run. Right? But what we like about open-source... I tell people, I think this is a Linux moment for AI. Right? You can see the velocity of innovation in the open-source community accelerating.
Ryan Donovan: Yeah.
Rodrigo Liang: Right. It's accelerating because now, suddenly, it's not one company or four companies that are innovating, it's thousands of companies.
Ryan Donovan: And we've seen the closed-source players get a little bit rattled by some open-source models lately, with DeepSeek coming through.
Rodrigo Liang: Yeah. Well, I think what happens here with open-source…you have two forces that are happening because it's open, the innovation is building on itself. Right? So the DeepSeek team puts something out, well, everybody will learn. Right?
Ryan Donovan: Right.
Rodrigo Liang: So now the technique they used is being applied across the planet, and somebody else is going to invent something, put it out, and that learning spreads across the world. Whereas if you're an individual closed-source company, you're trying to learn it by yourself, reading others' papers, but you can't get the leverage of the entire world working together. So this is what's exciting about it. But here's the other thing I really like about open-source. We focus on enterprises. Enterprises want to own. Governments want to own.
Ryan Donovan: Right.
Rodrigo Liang: Like there's the concept of sovereign AI. The inventors want to own, right? I don't want my nation's national model...
Ryan Donovan: You don't want it running on somebody else's server, yeah.
Rodrigo Liang: Somebody else in another country (laughs).
Ryan Donovan: Right.
Rodrigo Liang: So people want to own. Well, it's hard to own if it's not your IP.
Ryan Donovan: Right.
Rodrigo Liang: And so what the open-source community does is it gives people that choice for the things they want to own. It could be a government, could be a nation, could be a company. Like, I'm a bank, I want to own the model. If I train my data into that model, I want to own it in perpetuity. No one should be able to have access to my data...
Ryan Donovan: Right.
Rodrigo Liang: ...That was trained into the model. And so open-source gives that autonomy, right, whether for nations or for companies, in a way that closed-source can't. Because in closed-source, your IP is the weights; if I show them to you, then what is the IP? But the open-source weights are already out there. And so we like it because we can take that, make it run really, really efficiently, collapse the cost of operating the open-source models, collapse the cost of turning those into agentic workflows. And then you can take ownership of the open-source models, train your private data into them, and go at it with full control. If one day you don't like SambaNova, you can take those models you trained and finetuned, they're open-source, and take them somewhere else.
Ryan Donovan: Right.
Rodrigo Liang: Right? No, it really does give the end model owner a level of autonomy and sovereignty and security that is hard to achieve if you don't have that level of control.
Ryan Donovan: Right. Yeah, and we've been talking on the podcast how important data is, right? Data is sort of going to be the differentiator for any AI use case.
Rodrigo Liang: Yeah.
Ryan Donovan: Do you think the folks who are using their custom data on a closed-source model are missing out on the full capabilities?
Rodrigo Liang: Well, here's what's going to happen. I do think most of the enterprises in the world today have all this data they've collected over decades, if not centuries in some cases. And of all this data they've collected, they have no idea what it says. So I'll go into places and say, "Look, your business is grossly underleveraging its most valuable asset. You have all this data on customers, on your contracts, on your commercials, decades of it. You don't know what it says. And there was no easy way to find out until now." Right? Until now, where I can come in with a single rack, with open-source models preoptimized, and just read 90 years of data. Suddenly you can chat with that model privately and securely, the way you chat with ChatGPT, but all about your own private, secured data.
Ryan Donovan: And it's secure.
Rodrigo Liang: It's secure because you put it wherever your data is. Good for the data, good for the model. Right?
Ryan Donovan: I want to talk about the inferencing. I hear a lot about the reasoning models, and my understanding is they use a lot of inference-time compute. What do you think the best practices are around using that inference-time compute and sort of real-time prompting? What are the best practices, the difficulties around it?
Rodrigo Liang: Well, I think there are two pieces on this reasoning model. And there's a lot of confusion out there because the market's noisy, right? People go, "DeepSeek. I have DeepSeek." No, well, DeepSeek reasoning, R1, is a 671 billion parameter model.
Ryan Donovan: Okay.
Rodrigo Liang: That's the reasoning model, the good model that everyone's talking about: 671B. Okay?
Ryan Donovan: Yeah.
Rodrigo Liang: A lot of people have taken that model and distilled it to some other version and called it DeepSeek. No, that's a Llama model running a DeepSeek weight.
Ryan Donovan: (Laughs)
Rodrigo Liang: Right, or a Mistral model running DeepSeek weights. Those aren't DeepSeek models. The reasoning model is the 671B, which SambaNova is really proud of. We've offered the full-precision 671B on SambaNova Cloud.
Ryan Donovan: Yeah.
Rodrigo Liang: Use it at will. Just go log on and use it. It's free. Most providers aren't willing to do that because it's an expensive model to run; because SambaNova hardware is so efficient, we put it out free. So you run the 671B, and what happens here is... the journey was, when ChatGPT showed up, I asked ChatGPT one question, I got one answer back. Right? Write a prompt once, get one result back.
Ryan Donovan: Right.
Rodrigo Liang: One for one. When we did RAG, I would ask one question, maybe activate a RAG model, go fetch it. And then I got two things back. Right?
Ryan Donovan: Right (laughs)
Rodrigo Liang: I got the prompt and the data back.
Ryan Donovan: Right.
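The "one question, two things back" RAG pattern Rodrigo describes can be sketched as a toy. The keyword retriever and corpus below are invented for illustration; a real system would use embeddings, a vector store, and an actual LLM call at the end.

```python
# Toy RAG sketch (illustrative only): the prompt triggers a retrieval
# step, and the response is assembled from the prompt plus fetched data.

CORPUS = {  # hypothetical in-memory "knowledge base"
    "revenue": "Q3 revenue grew 12% year over year.",
    "headcount": "Headcount is flat at 480 employees.",
}

def retrieve(question):
    """Naive keyword retriever standing in for a vector search."""
    return [text for key, text in CORPUS.items() if key in question.lower()]

def answer(question):
    context = retrieve(question)          # fetch supporting data
    prompt = f"Context: {' '.join(context)}\nQuestion: {question}"
    # A real system would now send `prompt` to an LLM; here we just
    # return both pieces to show the "prompt + data" shape Rodrigo means.
    return prompt, context

prompt, context = answer("How is our revenue?")
```

The point of the sketch is the shape of the flow: one question fans out into a retrieval plus a generation, so you get "two things back."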
Rodrigo Liang: With reasoning, if I ask a very simple question, "How's my business?" Well, the reasoning model has to think, oh, is it my stock price...
Ryan Donovan: Right? What's your business?
Rodrigo Liang: Is it my revenue? Is it my people? Is it my products? Is it my geography? Like, it's reasoning its way through that very simple question, "How's my business?"
Ryan Donovan: Yeah. So, but I think saying it's reasoning is obscuring a little bit of what's going on underneath. I'm curious what that reasoning is. Is it building its own prompts to figure out what's next? Is it bringing in other data?
Rodrigo Liang: Exactly. So these models-- and what DeepSeek did that is really nice is it created these, you know, it's called a mixture of experts, MoE. You've got these experts that are able to do a variety of different things, and so they can go and actually pick out the right types of models to be able to then provide a different view of what that prompt could be. So you're kind of thinking your way through it in a way that humans would do. Right?
Ryan Donovan: Right.
Rodrigo Liang: Or, in some cases, teams of humans would do.
Ryan Donovan: Right.
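The mixture-of-experts idea Rodrigo mentions can be sketched as a tiny router. The expert names, keywords, and keyword-overlap gate below are all invented for illustration; in a real MoE like DeepSeek's, the gate is a learned layer scoring experts per token, not a keyword match.

```python
# Minimal mixture-of-experts routing sketch (illustrative only, not
# DeepSeek's architecture): a gate scores each expert for the incoming
# text and only the top-k experts run, so most of the network stays idle.

EXPERTS = {  # hypothetical experts and the topics they "know"
    "finance": ["revenue", "stock", "price"],
    "people":  ["hiring", "team", "morale"],
    "product": ["roadmap", "launch", "specs"],
}

def gate_scores(text, experts):
    """Toy gate: score each expert by keyword overlap with the text."""
    return {name: sum(word in text for word in keywords)
            for name, keywords in experts.items()}

def route(text, k=1):
    """Pick the top-k experts for this input; only they would compute."""
    scores = gate_scores(text, EXPERTS)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

For a question like "How's my business?", a gate of this shape is what lets the model fan the one prompt out to several relevant "experts" instead of a single monolithic pass.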
Rodrigo Liang: But in this particular case, it takes you three seconds to actually generate a very complex answer on, "How's my business?" Right?
Ryan Donovan: Right.
Rodrigo Liang: So I think that's the beauty of it: it's not just single-faceted. It's not giving you just a sliver of the answer; it's gone through and tried its best, it's not perfect, but its best, to give you a broad view, and based on that broad view, come to some conclusions for you. So I do this thing in the company. I'll ask people, "Okay, just go to Cloud.SambaNova.ai, bring up that model and ask, 'Who would win in a battle between a bear and a shark in shallow waters?' Okay, well, who would win?" You and I could debate this for hours.
Ryan Donovan: Shark.
Rodrigo Liang: So if you ask that question, it starts thinking, okay, well the bear is strong here. Its weakness is here. Well, the shark's strong here, weakness here, but the condition is shallow water. It's not deep water. It's not no water. We call it reasoning, but it's actually doing the steps of thinking. Right?
Ryan Donovan: Right. So you're saying AI is going to ruin bar conversations? (laughs)
Rodrigo Liang: (Laughs) No, I think the debates will be even more abstract and even more passionate.
Ryan Donovan: Right.
Rodrigo Liang: The less data-driven, I find, the more passionate we get.
Ryan Donovan: Absolutely. The less proof, the more passionate the arguments. So I see a lot of pieces of the AI stack maturing and commodifying. What do you think the frontiers of the AI stack are?
Rodrigo Liang: Well, I think the hardware infrastructure is going to continue to be a battleground. Frankly, we're far from the efficiency the world needs. If you really think about it, the cost structure is way too high. The acquisition cost is too high, the operational cost is too high, and now the new thing everyone's talking about is the energy cost. Like, I've got to go find data centers, gigawatt data centers, nuclear power plants, all of that.
Ryan Donovan: Right.
Rodrigo Liang: I mean, that's...
Ryan Donovan: Got to get that cold fusion.
Rodrigo Liang: Yeah, exactly. That's not included in the acquisition cost.
Ryan Donovan: Right.
Rodrigo Liang: The acquisition...buying a rack. Oh, okay, well, that comes with a nuclear power plant. Right? And so those costs all need to collapse. We are maniacally focused on that on the hardware side. Then if you go up the stack, I still believe the accuracy of these models is not yet where you're ready to say, "I can trust it." You look at these models, they're doing some very sophisticated things. And I tell people, if you haven't started with AI, you need to, because it does some amazing things. And for many companies it's good enough to deploy in production; we have enterprises, you know, Saudi Aramco is in production. Right?
Ryan Donovan: Right.
Rodrigo Liang: We can deploy these things in production already, even though in my eyes, as somebody in the space, there's still so much more to do before you can truly run mission-critical applications. But in the second half of last year, we entered the zone where these models are already good enough for you to start doing something with them.
Ryan Donovan: But they're still...you have to check their work, right?
Rodrigo Liang: Well, here's the beauty. How do we check our work? Think back decades, right?
Ryan Donovan: Sure.
Rodrigo Liang: You had operators doing some function in the field for some machinery. Right? Say a machine was loading boards, electronic components getting soldered onto a board. For a long time humans did it; then machines did it. Well, how did we check the work? We had a human inspect it.
Ryan Donovan: Right.
Rodrigo Liang: Right. What are we doing now? We test it with robots. So now you've got machines testing what machines build, machines checking. And I think you're going to have the same thing: AI checking AI. Right? Now, you've got to make sure those are coming from different places and doing different things, and we as a society have to get comfortable with the two pieces to make sure the checks are trustworthy. But you already have a model: decades ago we didn't trust machines to both build and check, but today that's exactly how we do it. Look at cars and computers. As they go through the manufacturing line, you have some machine building and some machine checking and testing, and you may have to do multiple rounds of tests until you're comfortable. I think you're going to see us as a society evolve and figure out what we can do to get to the comfort level and say, "Okay, this thing is trustworthy. It's responsible, it's reliable." So I think we're going through those conversations now.
Ryan Donovan: Somebody else said that agentic AI was just going to be robots building other robots in a robotic factory. And I think you're right. We are sort of hitting that trust gap where it's like I'm not quite sure how comfortable I am having robots making decisions. Right?
Rodrigo Liang: Right.
Ryan Donovan: I've seen Terminator, I know what happens...
Rodrigo Liang: (Laughs)
Ryan Donovan: ...But how do we overcome that trust gap? How do we make sure people understand that we're not being cut out of this?
Rodrigo Liang: So here's my forecast, and who knows what'll happen, but I'll tell you what I think. I think agentic workflows will accelerate into production faster than AGI will. Why? Because these agentic workflows are tiny little tasks. They're very easily provable. Right? I can say, "Summarize this document for me," and for that summarization agent, I can put enough college grads or interns on reading enough documents to check whether the summarization is correct. It's provable. It's much easier for us to get comfortable with an agent doing what we think it's supposed to be doing. It's much harder when it goes into a black box. So this is why I think you're going to see these agentic workflows really take off in production: they're straightforward to validate, straightforward to verify, straightforward to observe and confirm they're doing the right set of things, because the tasks are simple and then you're creating a flow out of them. You're putting five, six things together and creating a workflow, each individual piece simple enough for us to understand. Right?
Ryan Donovan: Yeah.
Rodrigo Liang: And so I think this is going to be the next wave that you're going to have these things go into production. And frankly, that's stuff that none of us want to do anyway.
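An agentic workflow of small, individually verifiable steps, as Rodrigo describes, can be sketched like this. The step names and stub functions are hypothetical; in a real pipeline each step would call a model, but the structural point is that every step ships with its own checker.

```python
# Sketch of an agentic workflow as a chain of small, verifiable tasks
# (hypothetical step names; real steps would call models, not stubs).

def summarize(doc):
    """Stub agent: take the first sentence as the 'summary'."""
    return doc.split(".")[0] + "."

def check_summary(doc, summary):
    """Each step has its own verifier, so the flow stays auditable.
    Here: the summary must literally appear in the source document."""
    return summary in doc

def workflow(doc):
    """Run steps in order, validating each before moving on."""
    steps = []
    summary = summarize(doc)
    if not check_summary(doc, summary):
        raise ValueError("summarize step failed verification")
    steps.append(("summarize", summary))
    return steps

result = workflow("First point. Second point.")
```

Because each step is tiny and has an explicit pass/fail check, the whole flow is easy to observe and test, which is exactly why Rodrigo expects these to reach production before anything resembling AGI.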
Ryan Donovan: Yeah, it almost seems like as every tech industry matures, observability becomes the thing du jour. What you're talking about is observability of AI and agentic workflows. Right?
Rodrigo Liang: Yeah.
Ryan Donovan: And I think for us to trust it, we have to be like, what are you doing? What are these small pieces?
Rodrigo Liang: Yeah, exactly. Observability is a start. You've got testability, right? You've got to be able to test regularly. You've got to have...
Ryan Donovan: Explainability.
Rodrigo Liang: Explainability. I mean, there will be organizations regulating. Look at our drug discovery and drug creation process, right? You've got to observe, because you have to do all those tests and make sure, but then there's still a regulating organization ensuring there's ongoing monitoring. Right? And so I think all those things are going to be part of what we as a society use. But what's undeniable is the technology is here.
Ryan Donovan: Yeah.
Rodrigo Liang: Right? And it's changing everything that we do. And if you aren't on it, you aren't going to be able to be part of the conversation. And so we have--
Ryan Donovan: At least to build the AI literacy.
Rodrigo Liang: Well, exactly. You have to know what it is that it's doing for us to be able to converse intelligently about what is a real issue, what's a fear? What is something that we can do something about? What is something we have to stay away from, right? Those are nuanced.
Ryan Donovan: Right.
Rodrigo Liang: It's not all AI is good or all AI is bad.
Ryan Donovan: Right.
Rodrigo Liang: It's where can we use this technology? I remember when the internet showed up, what were all the fears? Oh my God, privacy's all gone. No more privacy. We're all going to be hacked. We're all going to be... and guess what, 20 years later, many of those things actually came true.
Ryan Donovan: We've all been hacked.
Rodrigo Liang: Right and so what is one of the biggest industries that exists right now? Cybersecurity...
Ryan Donovan: (Laughs)
Rodrigo Liang: ...To protect against hacking, right?
Ryan Donovan: Right.
Rodrigo Liang: I mean, those are the things... but the technology's here. What we have to do is work together and figure out how to marry the technology's capabilities to things we can actually do, so it does what we want it to do, right? Not just be binary: either we don't use it at all, or we use it and let it do whatever it wants. No, there's...
Ryan Donovan: I mean, the cat's out of the bag, right? It has a bunch of power. We have to figure out how to make it work for us.
Rodrigo Liang: Well, in our history we have many data points, starting with the internet, and then the advent of new manufacturing lines, where we trusted machines to do what humans used to do. Again and again through history, we've figured out how to take advantage of technology without as much of the downside as initially feared.
Ryan Donovan: I think that some people fear that this may be a technology that's a little too powerful to control. I read something about an LLM playing chess, cheating at chess, and creating a program to hack the other bot so it would surrender. So being able to sort of amorally solve its goals outside of the rules of the game.
Rodrigo Liang: Yeah. Yeah. I mean, I haven't seen that, but it's possible. But again, I do think that this is where I think you're going to have this convergence of us as humans trying to figure out what are the things that we're going to put boundaries around. I mean, here's the other thing. I remember the first time that I found my information on the internet where I had not put it. Right? Same thing. It's like, oh my God, this has gotten out of hand. Right? I didn't store that there. Somehow, somewhere it just kind of got out there. And so, there is, to some extent, there's norming that we're going to have to feel out what are we comfortable with, what are we not comfortable with? The things that we aren't comfortable with, guess what? We are innovators. Right? The cybersecurity industry is thriving.
Ryan Donovan: Yeah.
Rodrigo Liang: (Laughs) Because we decided those things are not acceptable to us. We're not okay with people hacking us and misleading us. So it's not a fight that's won; it's a fight we're fighting every day. And I think you're going to have the same types of things. With new technology, there are some things we're just going to deem we don't want, and industries are going to come along to help solve that.
Ryan Donovan: And onto your FDA and drug discovery metaphor, do you think we're going to need government regulation to put in some guardrails around AI?
Rodrigo Liang: Well, some of that we're doing already. I do think there's a human aspect that we, as a society, have comfort with. And so I think some things, in the end, we'll still trust to the human eye, if for no other reason than that we are a society of humans. And so we...
Ryan Donovan: Last, the last I checked (laughs).
Rodrigo Liang: (Laughs) We trust the human eye and the processes we think have gotten us this far. So there are some things where I do think the human remains the final sign-off, like who's the accountable person for signing off on a particular thing. Now, in drug discovery, I think what you'll find is there's a lot of work through those seven years of discovery that is just incredibly human-intensive, just sorting through things. The machine can do those early stages really, really well: significantly reduce the time to discover, significantly reduce the cost to discover. And yet for the end of the process, my guess is that, as a society with so many different opinions, we'll want what we feel has worked for us before. So that will likely be the last to change.
Ryan Donovan: Yeah. And I could see, you know, I think a lot of people's discomfort is not having the possibility of that human making the decision, right?
Rodrigo Liang: Yeah.
Ryan Donovan: That final sign-off.
Rodrigo Liang: Yeah
Ryan Donovan: And I see a lot of companies where their AI solution puts a PR out for other people to approve, where it's still like somebody's got to sign their name to this.
Rodrigo Liang: Right, exactly. In a lot of currently regulated environments, I think you're going to see that continue to be true for a while, right? Because it's hard to put accountability on a machine. Like, who knows, right? In a world where we've proven that the handshake between one organization and another, or in some cases companies and the public, or governments and the public, works, we've shown over decades that a sign-off is not perfect, but works well. It's predictable. So my guess is that as a society we're going to lean into that until we're ready to make a change. But there are things before that point that can significantly reduce the time, the cost, the energy, and in many cases cover for the lack of humans we can find to do that work, right? The machines can take that over. But for many of these, the sign-off is still going to sit with a human, someone saying, yeah, I've reviewed it all and I believe this is true.
Ryan Donovan: To bring it down a little more, for the AI of the future how important is data and then the sort of attribution of sources going to be?
Rodrigo Liang: Well, I think data is going to be the differentiation. What you're going to see is the generic model, trained on the world's data... I mean, it's silly to say "the world's data" (laughs); it's really all the world's public data, and that's getting commoditized. Every model is going to have access to all that data for training, so over time their knowledge bases are going to become roughly the same. Right? And so you're basically going to get a model where we all know more or less what the world knows.
Ryan Donovan: Right.
Rodrigo Liang: OK. So we're in a world where industry wants to compete. Industries want to generate value by differentiation. We want to generate competitive advantage by having some secret sauce.
Ryan Donovan: Right.
Rodrigo Liang: Well, what is that secret sauce?
Ryan Donovan: Data.
Rodrigo Liang: It's the data that you have and nobody else has. And so this is where that accumulation of data that people and companies have comes in. This is why I started this conversation with how most companies have decades of data...
Ryan Donovan: Yeah, if not centuries.
Rodrigo Liang: ...Grossly underleveraged, grossly. So I think data's going to be at the center of it. Now, how do you do it?
Ryan Donovan: Yeah. Right.
Rodrigo Liang: How do you make it do the right set of things? There are many different things you can do to leverage it the most. We actually have a whole recipe for how we help people get going on it. There's a whole journey, and you don't have to figure it all out first. You don't have to know all the thousand things you want to do with your data before you get started. There are three or five things you should just do immediately that get you a huge return just by interpreting the data you already have; you don't really have to do much. Those are the things we encourage people to do, just to get their hands in there. Why pay 80% of the cost before you see 5% of the return? How about I show you how to put in 5% of the cost and get 50% of the return? Let me tell you where the high-value stuff is in your data, get that out for you, and you start using it. And then you tell me: is AI good for you or not?
Ryan Donovan: Yeah, no, I mean, I've talked to companies that are doing what I call the sort of dumbest version of AI where they're just taking their data and vectorizing it, and they're seeing huge returns just from having that. No language models, nothing.
Rodrigo Liang: Yeah, just like that. I would say that's the first 5% of the benefit. Right, but one of the biggest concerns people have is: that data's mine, and I don't know what it says. Issue number one is, I'm not even sure I'm allowed to put it in the cloud, because I don't know what it says. Are there regulatory things, or secret things, or private things, or regulated things that mean I can't put it there? I don't want that risk, so I don't even want to move that data. The second is, there's so much data. I spent 30, 40, 50 years convincing myself that data's safe right there. I put up firewalls and firewalls, and tools, and cyber protections, and I got it all protected. Now you're telling me I've got to move it to the cloud, move it somewhere else. I don't know if the security is good enough. The people who did all that work left. And so here's what SambaNova did: we said, "Look, we created a single-rack solution that lets us deploy the AI where your data sits. Don't move your data; we'll bring the AI to you. We'll bring a 671B model or a Llama 405B, some of the largest models that exist out there, into the data center where your data sits. Don't move the data, and we'll train on that data right there, in situ." And so for a lot of companies, there's all that risk of: I don't know what it says, I don't want to move it, I haven't moved it, it's been there forever, it's been there forever for a reason, I don't even know what the reason is, but it's been there. I take all of that risk out by saying, "Okay, why don't I train it and unlock the data for you? Now you interrogate it, see what it says. Chat with it like you would with ChatGPT, but you can talk to it about your contracts, all your product specs, all your customer complaints, whatever it is that you don't want to disclose to the world. You can do it in the privacy of your own firewalls."
And so that's what we do for a lot of people: we can actually go in and unlock it, and you start seeing, okay, what do I have? What can I use out of this?
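The "just vectorize it and see returns" pattern Ryan mentions can be sketched in miniature. This is an illustrative bag-of-words example, not SambaNova's or any production system's method; real deployments would use a learned embedding model, and all document strings and function names here are made up:

```python
import math
from collections import Counter

def vectorize(text, vocab):
    """Turn a document into a term-frequency vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocab]

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Toy stand-ins for the contracts, specs, and complaints discussed above.
docs = [
    "customer complaint about late shipment",
    "product spec for the new chip",
    "contract renewal terms for customer",
]
vocab = sorted({w for d in docs for w in d.lower().split()})
doc_vecs = [vectorize(d, vocab) for d in docs]

def search(query):
    """Rank documents by cosine similarity to the query vector."""
    q = vectorize(query, vocab)
    ranked = sorted(zip(docs, doc_vecs), key=lambda dv: cosine(dv[1], q), reverse=True)
    return [d for d, _ in ranked]
```

Even this crude version lets you "interrogate" a pile of text: `search("customer complaint")` surfaces the complaint document first, with no language model involved.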
Ryan Donovan: Yeah. Is there anything we didn't cover that you wanted to talk about?
Rodrigo Liang: Well, I think, look, AI is moving very quickly into inferencing. Ninety percent of the world's AI costs will be in inferencing, which is basically a fortune. It's everybody's problem.
Ryan Donovan: Right.
Rodrigo Liang: Training is maybe the top 20 companies in the world, but inferencing is everybody. It's like Google search. So that problem has a big ticket to it, right? Starting with power. This is one of the things I tell people: your power will be the limiter, so you have to think about the power issue. At SambaNova, we've been very focused on how to drive the highest performance at the lowest power, because at scale, most companies will need to do this 10 or 100 times more than they are today, and you can't have the cost continue to skyrocket the way it has the last two, three years.
Ryan Donovan: So what's the thing everybody can do now to lower the cost on power? Do you quantize models? Do you get smaller models?
Rodrigo Liang: Well, number one thing is get off NVIDIA hardware.
Ryan Donovan: (Laughs)
Rodrigo Liang: I'd start there, right? Use some other hardware: a 10-kilowatt rack versus a 140-kilowatt rack. Get off of that, and the inferencing is 10 times faster. So it's 10 times the inference speed at one-tenth the power. So I start there. But I do not recommend quantization for most things, because when you do that, you narrow the applicability of the model, and as you go forward, people forget what it was narrowed to.
Ryan Donovan: (Laughs)
Rodrigo Liang: And so you start forgetting, okay, well, where is it...it becomes a brittle model. So the world's actually moving towards what I believe will be more generalized models. You want the accuracy to be good across a more general model, because over time, we won't remember what not to ask that model. Say I distilled it down so it's for science, not for legal, not for anything else. Over time, I won't remember that I can only ask it science questions because it's been distilled down. So we think the world's going to move towards more robust models, but you've still got to run that robust model efficiently. I can tell you, if I give you a robust model that runs at the same price and cost and power as the distilled model…
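For context on what quantization does mechanically, here is a minimal sketch of symmetric int8 weight quantization, a common scheme though not necessarily the one Rodrigo has in mind. Weights get mapped onto 256 integer levels and back, which is where precision (and, per his argument, some generality) is traded for memory and power:

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric int8 quantization: map floats onto [-127, 127] integers."""
    scale = np.max(np.abs(weights)) / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 representation."""
    return q.astype(np.float32) * scale

# Toy "weights" standing in for a model layer.
rng = np.random.default_rng(0)
w = rng.standard_normal(1000).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# Round-trip error is bounded by half a quantization step (scale / 2).
max_err = float(np.max(np.abs(w - w_hat)))
```

The int8 tensor takes a quarter of the memory of float32, but every weight now carries a small rounding error, which is the accuracy trade-off being debated above.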
Ryan Donovan: I would use that one, yeah.
Rodrigo Liang: Exactly, right. And so I think as a society, we're going to drive towards robust models that cover a broad range of things, and then drive the cost down, drive the power down, drive all the things down so that people can use it. That's the direction SambaNova is going, and I think everybody else should be focusing on that, not on trying to maximize the profits of the infrastructure.
Ryan Donovan: (Laughs) That's a good note to end on, I think.
Rodrigo Liang: Yup.
[Outro music]
Ryan Donovan: Well, we're at the end of the show, ladies and gentlemen. I've been Ryan Donovan, editor of the blog, host of the podcast here at Stack Overflow. If you have questions, concerns, ratings, and reviews, send them to podcast@stackoverflow.com. And if you want to reach out to me, you can find me on LinkedIn.
Rodrigo Liang: Great. This is Rodrigo Liang, co-founder and CEO of SambaNova. You can find me on LinkedIn and X under Rodrigo Liang. And SambaNova, you can try our tech at Cloud.SambaNova.ai. It's a free service and you just try it and see if you like it.
Ryan Donovan: Alright, thank you very much everyone. We'll talk to you next time.
[Credits music]