The Stack Overflow Podcast

Are long context windows the end of RAG?

Episode Summary

The home team is joined by Michael Foree, Stack Overflow’s director of data science and data platform, and occasional cohost Cassidy Williams, CTO at Contenda, for a conversation about long context windows, retrieval-augmented generation, and how Databricks’ new open LLM could change the game for developers. Plus: How will FTX co-founder Sam Bankman-Fried’s sentence of 25 years in prison reverberate in the blockchain and crypto spaces?

Episode Notes

DBRX, an open, general-purpose LLM created by Databricks, reportedly outperforms GPT-3.5 and is competitive with Gemini 1.0 Pro.

Recent research found that large, complex LLMs use a simple mechanism to retrieve stored knowledge in response to a user prompt. These mechanisms can help researchers reveal what the model knows and potentially even correct false information it has stored.

FTX co-founder Sam Bankman-Fried, whose downfall began in late 2022, was sentenced last week to 25 years in prison for conspiracy and fraud. 

Find Michael on LinkedIn.

Find Cassidy on her website.

Stack Overflow user Bucket received a well-deserved Lifeboat badge for rescuing How to calculate decimal(x, y) max value in SQL Server from an ocean of ignorance.

Chapters (please note that these timestamps may not be exact): 

00:00 Introduction and White Paper Discussion

02:01 Long Context Windows and Retrieval Augmented Generation

05:56 Models' Ability to Recall Relevant Information

07:18 Models' Creativity and Thinking Outside the Box

09:41 Advantages and Limitations of Models' Knowledge

15:09 Databricks' Open Language Model

22:25 Sam Bankman-Fried’s Sentence and the Effects on Crypto/Blockchain

31:28 Closing Remarks and Lifeboat Badge

Episode Transcription

[intro music plays]

Ryan Donovan Hello, everyone, and welcome to the Stack Overflow Podcast, the place to talk about all things software and technology. I'm Ryan Donovan, hosting while Ben Popper is away in my motherland, Iceland. Today, we're having a home team episode with a couple of special guests– special-ish. We’ve got Cassidy Williams–

Cassidy Williams Hello!

RD And Michael Foree, who is Director of Data Platform here at Stack Overflow.

Michael Foree Hi, Ryan. 

RD Today, we're going to be talking about a little bit of news that's out. So Michael, you dropped an interesting white paper in the AI chat about long context windows possibly making retrieval augmented generation obsolete. Is that right? 

MF It would definitely transform the way that retrieval augmented generation happens. The paper is from Google about their recently released model, Gemini 1.5, and it specifically looks at the biggest version of that model, which, to date, hasn't actually been released to the public, so it's research only. They're pushing very hard to support an absurdly large context window. Typical open source models might allow you to paste 500-1,000 words into the context and ask for information about that. Recently, we've seen some models get published that can handle a little bit more text or video or audio. Gemini's model boasts– what is it, 700,000 words– which is the equivalent of War and Peace 10 times, or 3 hours of video, or 22 hours of audio. There's no existing practical purpose for it, but it's an interesting signal on where Google thinks the market is shifting, and I think we can speculate as to what you might want to do with this gigantic context window that perhaps isn't quite possible with other models with a more normal or reasonable context window.

RD I think the interesting stat you dropped was that Gemini 1.5 can handle 10 million tokens and stackoverflow.com is 10 billion tokens, so you can fit a pretty good chunk of the site in there. Now, I think we've talked with a lot of AI folks about retrieval augmented generation as being the thing that everybody's doing. How does this affect that? 
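As a rough aside on those numbers: you can estimate how much of your own content fits in a context window by counting tokens with a tokenizer. This is a minimal sketch using OpenAI's tiktoken encoding purely as a stand-in (Gemini uses its own tokenizer), so treat the counts as approximations; the corpus file name and the 10-million-token limit are just illustrative figures from the conversation.

```python
# Rough token-count check: how much of a corpus fits in a given context window?
# Uses the tiktoken package as a stand-in tokenizer; Gemini tokenizes differently,
# so treat these counts as estimates. The file name and limit are illustrative.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # GPT-style BPE encoding

CONTEXT_WINDOW = 10_000_000  # the 10M-token figure discussed in the episode

with open("corpus.txt", encoding="utf-8") as f:  # hypothetical content dump
    corpus = f.read()

n_tokens = len(enc.encode(corpus))
print(f"{n_tokens:,} tokens; fits in the window: {n_tokens <= CONTEXT_WINDOW}")
```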

MF With your traditional retrieval augmented generation, what you have is a retrieval step where you retrieve your content. You can do this with a vector database, those are all the rage these days. You can do it with a traditional relational database, you can do it with graph databases, you can do it by just having the user say, “This is the thing that I care about. This is the content,” and then you have your language model respond to the prompt. And typically what happens is that you have to filter down the thing that you care about in the retrieval step pretty precisely. You’ve got to narrow down and be like, “Hey, these are the 500 words that I want to play around with, that I want to do something with. This is the picture of this size that I want to understand what's going on here.” With a much larger context window like what Gemini is offering now, you don't have to be as precise or prescriptive at getting down to the narrow window. You can do things like, in their paper, they took the entire play of Les Mis and pasted it into the context window and then started asking questions about who is this main character, what's the conflict between these two different characters, and it was able to traverse entire scenes to talk about how these two different characters are interplaying. With your traditional RAG-style and normal context window, you have to first go and say, “Oh, I’ve got to figure out who this character is, then I’ve got to find each of the different scenes where these characters are talking, and then narrow into just the key information and put that into the context window.” With this very large context window, you don't have to pick and choose. You can be less choosy, if you will. I would imagine it comes with some drawbacks. I'm not able to play with the model. I speculate that one of the drawbacks that you would find is that traditional RAG is the opposite of one-shot prompting. So if you provide useful information as a prompt into your context, it helps the model figure out the right answer. If you provide distracting information, it hurts, it can get in your way. And so I would imagine that there's a scenario where you have just too much content and it distracts the overall model from getting to the right useful, helpful answer. But of course Google released the paper and they didn't talk about those drawbacks, so it's hard for me to do more than just speculate as to the drawbacks.
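For reference, the traditional RAG flow Michael describes usually looks something like the minimal sketch below: embed the documents, retrieve only the few passages most relevant to the question, and paste just those into the prompt. The embedding model, the in-memory “vector store,” and the prompt template here are illustrative assumptions, not any particular product.

```python
# Minimal retrieval-augmented generation sketch (illustrative only).
# Assumes the sentence-transformers and numpy packages; the model name,
# sample documents, and prompt template are placeholders.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "Scene 1: Javert confronts Valjean about his past as a convict...",
    "Scene 2: Valjean promises Fantine he will care for Cosette...",
    # ...the rest of the corpus, chunked into passages
]
doc_vectors = embedder.encode(documents)  # this is the "retrieval index"

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k passages most similar to the question (cosine similarity)."""
    q = embedder.encode([question])[0]
    scores = doc_vectors @ q / (
        np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q)
    )
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

question = "What is the conflict between Javert and Valjean?"
context = "\n\n".join(retrieve(question))

# With a normal context window you prompt with only the retrieved passages;
# with a very large window you could paste the whole text and skip retrieval.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)  # this prompt would then be sent to whatever LLM you use
```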

CW It's kind of like talking to a person who knows a lot about something. If you try to throw in something that's distracting, a percentage of the time they will be distracted by it, and I think that's kind of what this Gemini paper is saying, at least in my perspective. It's not so much, “Who knows what you could do?” There's so many things. I think they're trying to get closer to what it would be like to talk to a person who has all the context and what you can do with that. If someone really knows deeply the Les Mis play, you could say, “Okay, could you tell me about what's Javert's deal? Why is he so into Jean Valjean?” Someone who really knows the play will be able to do it, and some plays, some books, some things are longer than others, and so it kind of gives that context. But at the same time, you can distract a person, so I imagine, like you said, you can probably distract a model in some way. 

MF Do you think that models would have an advantage over humans, vis-à-vis, we might know the play as a human, but we forget things that are perhaps relevant, we just choose not to talk about them? Do you think that models could get to a spot where they would be better than us at recalling relevant information? 

CW I think so, because they don't have the other context of life. Models know what they're told and that's just it. Humans know a lot of what they're told and what they study, but they also know what they ate for breakfast that morning and stuff. And so I do think that when you know something very specifically, you just happen to know it really well, and I think that's where models know things very well. Where if you look at something like AlphaGo, AlphaGo plays Go really well, to the point where they kind of had to retire it. It knows Go better than any human ever could, but that's because that's the only thing it knows, at least in my perspective. If you have a wide context but it's the only thing you know, you're going to be an expert at that context. 

RD And my understanding of AlphaGo is it comes up with strategies that no human would do, and they're super effective. They're sort of bafflingly effective. 

CW Yeah, true. Humans like comfort. And I think that there are always innovative people doing innovative things and stuff, but there is something to be said for a machine being less afraid to try something based on the limited context of the world that it has. I took Go lessons at one point for a little while and I remember I was talking to my teacher about what's the best move in this kind of scenario, and she said, “Okay, for literally hundreds of years, this was the best move, but AlphaGo came up with a move that blew everyone's mind to the point where multiple people wrote papers on why this move blew people's minds after hundreds of years.” And it's just one stone away that changes how an entire game could be played. And it's kind of like Grace Hopper famously said, “The most dangerous phrase in the language is, ‘We've always done it this way.’” I do think that by giving machines the freedom to kind of do whatever they want, there are pros to that. There are cons to that, too, that are scary. But all this being said, yes, you can distract a machine, but to answer your question, Michael, I do think that with it being limited to what it's supposed to know, a machine will probably be pretty good at knowing something fairly deeply. 

MF Can I pull on that thread just a little bit, Cassidy, if that's okay? 

CW Please!

MF One of the things that we talk about in my circles is that models only know the thing that you teach them, and they're not good at not knowing what they don't know, and they're not good at branching out. You need that human creativity to drag it away from its center. But the experience that you're talking about with AlphaGo is that it knew something that no human had ever thought of or suggested. Presumably it was rifling somewhere around in its training. What are your thoughts on a model's ability to think outside of the box that the human that trained it would expect it to work in? 

CW Great question, because I do think that that's something that I go back and forth on. Because with an example like AlphaGo, it is thinking outside of the box, again, within this very specific context. I doubt that it would say, “You know what? Instead of playing this stone, I'm going to say ‘moo’ like a cow.” A child who's exploring the world might say ‘moo’ like a cow instead of playing a game because it's a different type of learning. And so I think there is a level of a machine being creative within a specific context given what it knows, versus what a human might do that's completely weird, honestly. I've yet to see a machine– yes, they produce weird things, but it's weird given what they've been taught to be weird about or what they haven't been taught; they're just kind of trying something. You'll see a machine generate an image where a human has six fingers. It's probably not because it was just like, “I'm going to give this person six fingers.” It's probably just like, “This seems right.” I don't know, I'm kind of rambling at this point, but I do think that there are limits. 

RD I think the six finger one is interesting because we sort of humanize and we say, “Well, it's six fingers,” and the model doesn't know what fingers are. It's just like, “Here's an image where this is sort of what fingers do, and there's another one of these things stacked on it.” So sometimes it stacks too many, but I think the needle in the haystack-ness of this is really interesting because that's not how people work. People don't have all of the facts. If you're like, “What color were Javert's shoes?” And it's like, “I don't know, who cares? This is a story about justice and poverty and all that.” And I wonder if almost knowing too much will get in the way. Like you said, it's all distractions. 

CW It's kind of interesting that you say that because it's almost like how people compare answering questions on Stack Overflow versus having an AI answer a question. An AI might answer the question right away, but a lot of times when you get an answer on Stack Overflow, it's someone saying, “Are you sure this is the right question you want to ask?” And there's pros and cons to that, but sometimes it's good to be questioned on the context rather than being just given the answer.

MF Absolutely, and something that someone on my team is looking at right now is the back and forth, the 20 Questions part of asking a question. “Here's my initial question.” “Well, but why, and did you try this, did you try that?” And one of the things that I want us to kind of challenge is that LLMs will take a guess at what they know based on what they've been taught, but it's the humans that know what they should be going after, what that gold standard really is. And if we're able to identify the key steps between a bad question and a good answer, then I want to try and help craft the narrative of how to help human beings get what they really want by leveraging what human beings are really good at and leveraging what machines are good at, and taking a step back and saying, “You sure you want to go down this route? You don't see 20 steps in the future. I've tried this before, I've done this 10 times, and every single time your boss is going to come back and say, ‘Uh-uh, there's this thing that you forgot to consider,’ and you need the human experts to step in and say, ‘Let's do it this way.’” 

CW I do think there's something to be said for just trying it and then just giving an answer, but like you said, sometimes you need to be told to kind of say, “Ah, are you sure about that?” And there's so many examples that we could talk about where LLMs are slightly wrong but they did give an answer, but it's not the right answer. 

RD So let's move on to the next. You mentioned Databricks in the initial conversation before we started recording. Databricks is our data platform. Recently, they released a state-of-the-art open LLM. Now, Michael, you shared this. What is interesting about this to you? 

MF There's a couple of things that I take note of. One is that, since GPT-3.5 launched, OpenAI has been viewed as the leader to beat for language models or multimodal models. And you see a lot of other companies come and try to compare themselves to OpenAI, and Databricks is doing just that with their latest model. They benchmarked it against GPT-3.5 and also GPT-4, and then also a handful of other models. I'm trying to connect some dots, and maybe they don't belong together, but I view this as Databricks' way of saying, “If you use our platform, you, too, can have a language model that's better than the best out there.” Come and we'll give you this open source model for ‘free,’ free in air quotes. Use our ecosystem, connect it with your data, pay for our software, and then we'll help you be the very best. And I'm starting to see a similar narrative pop up in different places, and it's almost as though it's really convenient that GPT-3.5 was seen as the best until GPT-4 came out, and everyone can point and say, “Hey, remember all that great publicity and all this great stuff on how great GPT-4 is and all the things? Well, we're just as good. If you jump through our hoops and use our software and our platform and our product, you can be amazing too. Just pay us per GPU.” And no shame in the game. That's kind of where everyone's trying to circle and they're trying to find out where their piece of the pie is. And I'm trying to read the tea leaves, and I think that's how the rest of the market is trying to form around OpenAI, saying, “Yeah, you can be almost as good and have it all in your own ecosystem.”

RD Yeah. It almost seems like the LLM in this case is not the product, it's the sweetener for the product. 

CW They call it an open LLM, but that doesn't mean it's an open source LLM. Why do you think they call it an open one? 

MF Cassidy, I think that's a very good question. 

CW Cool.

MF I've heard so many different definitions– “Oh, it's not open source unless you do this or you do that”– and we don't know what data Databricks trained their model on. We don't know what transformations they did, how they shaped the data to prepare it to do this. We have a little bit of insight into the mixture-of-experts model. Okay, cool, thanks. I think they allow you to see the final weights. You can download it on your own laptop and run it on your own machine. I think somebody did this on an M2 quite recently, which is phenomenal because they just released it. What does open source even mean? I think that there's a lot of wordplay going on in the industry right now and I think in five years, we're going to look back and we're going to be like, “Ah, good for you. Ah, nice try.”

CW You called it ‘open.’ It does seem exciting. I'm all for competition because then that can help us in general as consumers of all of these models and stuff. But at the same time, I'm curious about it. Once again, the word 'open' is the thing that I'm kind of stuck on because I'm just like, “Okay, so do I have to pay for something to play with it?”

RD Well, they published the weights of the base model and the fine-tuned model, and that seems to be what's open. That's available under an open license. 

CW Oh, okay. That’s the ‘open.’

RD So I wonder if you have the weights for everything, what does it take to actually have a model that you can run? Do you have to do some more work?

MF I'm going to guess at that question, Ryan. It's a guess because I just have not looked too much into it. I would imagine that it's brand spanking new, so there are probably a lot of libraries that don't yet support it, but will soon. But I would imagine that you can get your favorite GPU, go to your favorite cloud provider or your own home computer and install it there, and with a little bit of elbow grease, probably start getting some inference, and not pay Databricks a dime, probably, and they're probably okay with that. And if you know how to do that and you want to do all that, I'm sure that they're going to be happy to be like, “Yeah, go right ahead.” The next tier of people that they want are the people that are like, “Well, surely there's an easier way.” “Oh, yes, there is actually. Use Databricks to deploy it and to fine-tune it,” which is very close to Google's, AWS's, and Azure's stance, where they have models that probably aren't open source or open at all, but if you use their ecosystem, you can fine-tune them. Can you download them on your computer? No, but you can run inference if you pay some money. So Databricks is one step further in a good direction. 
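As a rough illustration of the “download the weights and run inference yourself” path Michael is guessing at, this is roughly what that looks like with the Hugging Face transformers library. The model ID is written as Databricks' instruct variant, but treat the exact name, the license-acceptance step, and the hardware assumptions as things to verify before running; a model this size needs multiple high-memory GPUs or aggressive quantization.

```python
# Sketch of running an open-weights model locally with Hugging Face transformers.
# The model ID and hardware assumptions are illustrative; check the model card,
# license, and memory requirements before trying this for real.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "databricks/dbrx-instruct"  # assumed Hub ID; verify before use

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to cut memory; still very large
    device_map="auto",           # spread layers across available GPUs (needs accelerate)
    trust_remote_code=True,      # newly released architectures may require this
)

inputs = tokenizer("Explain retrieval-augmented generation in one sentence.",
                   return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=80)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```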

CW Speaking of the Googles and Amazons and all those, I know that they even have consulting that they'll do for ‘free’ to you where they'll say, “Okay, yes, you can't run this on your machine but we'll help you get fully set up so you can have your own model that's perfect for you and then you can pay us later.” 

MF Yes, of course, of course. Which is a great business model if you want to make money.

CW I mean, it works. People want AI and you can give them AI. 

MF Yeah.

RD The richest person in a gold rush is the person selling shovels. 

CW Yeah, that's true. I am curious about this Databricks model because I have seen enough people shouting on Twitter being just like, “Oh, dang. This is actually pretty nice.” So add them to the pile and see what comes of it, I guess, and we'll see in a few years if this is something where, “Oh, yeah, that was a big one,” or, “Eh, that was just yet another model you could attempt.” 

RD Let's shift gears a little bit. Eira, you gave us a story from The Verge about FTX co-founder Sam Bankman-Fried being sentenced to 25 years for fraud.

Eira May Yeah, that news broke today. So if you were following the collapse of FTX, the crypto exchange started by Sam Bankman-Fried, he's just been sentenced to 25 years in prison for seven counts of conspiracy and fraud stemming from the collapse of that crypto exchange that he started. So not a huge surprise to anybody who's been kind of following the almost two-year long now– I guess more like an 18-month long saga of this highest profile crypto collapse. It looks like we finally have the coda to that, which is that he is going to prison for a quarter century. I'm not ever going to be excited about somebody going to prison, but certainly for the fraud to be recognized as such and termed as such legally, I think is something that needed to happen for the industry to move forward.

CW I think if anything, it kind of sends a message about, “Hey, fraud is bad.” I guess they wanted to sentence him to over a hundred years in prison and stuff, which is kind of similar to how Bernie Madoff was technically sentenced to 150 years, even though he went to prison when he was 70 years old or something.

EM Yeah, exactly.

CW It's sending a message. Fraud is bad. Hey, everybody listening, don't do fraud. 

EM Right. I think a lot of the conversation that I've seen actually has been comparing this to the Bernie Madoff sentence. So he was sentenced to 150 years for operating a Ponzi scheme, but he was 71 when he was sentenced, so in a lot of ways that was a symbolic sentence, whereas SBF, I think, is in his early 30s.

CW So it's actually like, “Oh, this is going to be your life.”

EM Yes, this is not a symbolic sentence. This is going to be a big chunk of your life so it's kind of sobering I think to see that today.

MF I'm sad about the lost opportunity. All of the chit-chat about blockchain and Bitcoin and Ethereum– I think it's a neat technology that could actually do beneficial stuff, and it got wrapped up in shillsters trying to make a buck. And they didn't have to commandeer the narrative. They could have been like, “Yep, that's cool. I'm not going to try and use it to scam anybody. I'm just going to sit back and go and do something else and let the people that want to play with the technology just have some fun and maybe actually produce something useful.” And there might actually be some legitimate, genuine benefits of blockchain technology out there. Well, actually I know one particular company that's doing something kind of cool, but no one cares because everyone is like, “Oh, Bitcoin, is it going up or down? Do I buy or sell?” like it's an investment. And Sam Bankman-Fried? Come on. You didn't have to change the narrative about this.

CW I agree with you, because from the little I've played with in the crypto space, just to kind of experiment, I'm just like, “Oh, I get it. This actually is particularly cool,” but why is it all scammers right now, or just people spamming my mentions or something? There's so much of that that it's just a disincentive to ever mess around with the technology unless you're really quiet about it, because you don't want your mentions to be destroyed. 

EM Yeah, absolutely. 

MF Some people compare the boom/bust of Bitcoin to the boom/bust of AI and LLMs. Do you ever sit back and think, “Oh, man. All this chit-chat, all this hype, all this is just a bunch of hot air,” or, “No, I've seen some things and, yes, it's actually going to stick. And it's getting a lot of hype, but it deserves at least a fraction of the hype”? 

CW I think it's both similar and different: just like there are some bad actors and annoying things with crypto, same with AI, even just minor things. I noticed this morning there's this person who was responding to a ton of my LinkedIn posts and I was just like, “Why are you asking all these questions suddenly?” Then when I looked at his profile, I saw he had posted something like hundreds of questions in the span of a minute, and all of them were basically the same. He was probably using AI and ran a script on all of his LinkedIn mentions to try to generate traffic and relevancy and stuff. And you see that type of behavior range from the more mundane, like, “I just want more people to look at my LinkedIn,” to, “I want people to listen to me and pay me a lot of money, and then maybe I'll also do a deepfake video and maybe people will listen to that, or maybe I can drive certain change in my direction.” I feel like there's always going to be bad actors, unfortunately. And the AI/LLM wave does feel like it's a similar vibe in that everybody's excited about this one thing and it'll probably die down, but at the same time, I do think a lot of the good innovation will bubble up over time, but there has to be a lot of regulation against the bad actors if it's going to succeed. 

MF Well said. I keep reminding myself that the internet had a boom/bust cycle back in the '90s, and I'm quite certain that there's a bubble going on with AI and LLMs. And there's a bunch of people throwing a lot of money around and it's like, “Good try, but I'm sorry.” And that doesn't mean that the underlying technology is not useful or beneficial. It doesn't mean that all the companies involved are going to go belly-up. It might take some work to separate out all the, “Hey, I've got an idea. Let's try this. Let's try that,” from the companies that are actually going to deliver something worthwhile with it. And I'm really curious what's going to happen in the next five years. I think that a lot of technology is going to change because of what's going on with LLMs. I think there should be a lot of regulation that comes out of this, as you mentioned, Cassidy. And it's going to be interesting to see how companies respond and stay nimble with the regulation that's coming out, with regulation in the European Union already on the way.

RD I think this judgment against Sam Bankman-Fried could actually benefit blockchain technology.

CW I agree.

RD It shows that the hucksters and the folks just using it for speculation will get punished in the end. We had Ryan Dahl of Deno on the podcast the other week, and he was talking about an NPM replacement that uses blockchain for cryptographic signature verification. That's a legit use that doesn't have money attached to it. 

MF That's kind of cool.

CW I like anything that would let me play with a new technology that doesn't cost me so much money to just make a mistake. And I think anything like that that is a really viable use of crypto, of AI, of any of these things, I see that there's value in it. But once again, there's some regulation that needs to happen and good actors outshouting the bad ones.

EM Yeah, exactly.

[music plays]

RD Well, we've come to the end of the show. And as we do a lot of times, we're going to shout out somebody who came and saved a question from the dustbin of history. A Lifeboat Badge was awarded two hours ago to Bucket for answering “How to calculate decimal(x, y) max value in SQL Server.” So if you're curious about that, we've got the answer. I've been Ryan Donovan. I edit the blog here at Stack Overflow. You can find it at stackoverflow.blog. And if you want to reach out to me, my DMs are open on X @RThorDonovan. 

EM My name is Eira May. I'm on the Editorial Team also at Stack Overflow. You can find me online @EiraMaybe.

CW I'm Cassidy Williams. You can find me on the internet @Cassidoo on most things. I'm CTO at Contenda, and you can play with our tool, Brainstory, at brainstory.ai. 

MF My name is Michael Foree. I'm not on social media. I'm the Director of Data Platform and Data Science here at Stack Overflow. You can find me on LinkedIn, though.

CW Ooh. 

RD All right. Thank you, everybody, and we'll see you next time.

[outro music plays]