The Stack Overflow Podcast

The framework helping devs build LLM apps

Episode Summary

Ben and Eira talk with LlamaIndex CEO and co-founder Jerry Liu, along with venture capitalist Jerry Chen, about how the company is making it easier for developers to build LLM apps. They touch on the importance of high-quality data for improving accuracy and relevance, the role of prompt engineering, the impact of larger context windows, and the challenges of setting up retrieval-augmented generation (RAG).

Episode Notes

LlamaIndex is a data framework for building LLM applications. Check out the open-source framework or get started with the developer community, LlamaHub.

Looking for a deeper understanding of RAG? Start with our guide.

Wondering how to import `SimpleDirectoryReader` from LlamaIndex? This question has you covered.
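For reference, a minimal sketch of that import, assuming the newer llama-index (0.10+) package layout; older releases exposed it as `from llama_index import SimpleDirectoryReader` instead.

```python
# Assumes the llama-index 0.10+ package layout; older releases used
# `from llama_index import SimpleDirectoryReader`.
from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader("./data").load_data()
print(f"Loaded {len(documents)} documents")
```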

Jerry Chen is a partner at Greylock. Connect with him on LinkedIn.

Read Jerry Liu’s posts on the LlamaIndex blog or connect with him on LinkedIn.

Episode Transcription

[intro music plays]

Ryan Donovan Monday Dev helps R&D teams manage every aspect of their software development lifecycle on a single platform– sprints, bugs, product roadmaps, you name it. It integrates with Jira, GitHub, GitLab, and Slack. Speed up your product delivery today, see it for yourself at monday.com/stackoverflow.

Ben Popper Hello, everybody. Welcome back to the Stack Overflow Podcast, a place to talk all things software and technology. I am Ben Popper, Director of Content here at Stack Overflow, and I'm joined today by my wonderful co-host, my content team collaborator, Eira May. Hi, Eira. 

Eira May Hey, how's it going? 

BP It's going well. How are you?

EM No complaints here. 

BP So today we are going to be chatting with a pair of Jerrys– Jerry Chen and Jerry Liu. Jerry Liu is someone I had reached out to over Twitter because I had been seeing some really interesting stuff about a product called LlamaIndex: a way to connect LLMs to your data. How do you get LLMs familiar with the stuff you have inside of your organization, so that when it's answering questions it's not just the generic base model that's learned from the internet, but maybe it knows about proprietary stuff, maybe it knows specifics about your industry, maybe it knows specifics about your code base? That's fundamentally the thesis at Stack Overflow too. If you have Stack Overflow for Teams as your knowledge base, there's going to be a lot of stuff in there that's relevant that the AI will be able to feed to your employees when they have questions, and it's going to be a lot more accurate than its best guess based on having read all the internet. So without further ado, Jerry and Jerry, welcome to the Stack Overflow Podcast.

Jerry Liu Thanks for having me. And thanks for having us. 

Jerry Chen Thanks for having me as well. 

BP So Jerry Chen, I'll start with you. Just give folks a quick flyover. How did you get into the world of software and technology and to the role you're at today? 

JC Gosh. My career started way back, almost 30 years ago. After college, Netscape went public in ‘95-‘96, so I think I got into the internet like a lot of young engineers who were really curious about this next-generation platform. And in the past 30 years, I've been lucky enough to be part of some great companies, including a long career at VMware as a product manager where I helped build a bunch of their cloud technologies and their platform technologies. And then a decade ago, I got lucky enough to join Greylock as a VC, a venture capitalist, where I got to be part of amazing stories investing in seed and Series A companies like LlamaIndex. And so it's been a joy working with Jerry and Simon for the past year or so.

BP Cool. I saw you were the Series B first partner on Docker, a company we discuss a lot on our show, and Instabase has come up a few times as well, so those are some companies you've invested in that the Stack Overflow audience is familiar with. And so Jerry Liu, tell us a little about yourself. How'd you get into the world of software and technology and what led you to become the CEO and co-founder over at LlamaIndex?

JL So I've been interested in programming since I was a kid and majored in CS in college. Actually, I got into AI a little bit later than I probably should have. I started around my senior year of college in 2017, when I began discovering some of the latest advances around ML and deep learning at the time. For those of you who are familiar with deep learning research, there was a lot of hype then around these things called GANs, or Generative Adversarial Networks. They were just starting to pop up, and they could generate some semi-realistic pictures of faces, bedrooms, different types of scenery and art. I was very fascinated by that, and it actually kick-started my interest in diving a little bit deeper into machine learning and deep learning, both on the applied side and the research side. Fast forward to 2022, I had been working in both research and industry across startups– Robust Intelligence– and bigger companies– Uber, to start, and other companies like Quora– and then I started diving a little bit deeper into GPT-3, or Davinci. This was around the time when a lot of new AI startups were being formed and there was a lot of emerging excitement about generative models. I started playing around with it, and one of the first use cases I really wanted to look into was how to apply the reasoning capabilities of these language models on top of private sources of data that the model did not have knowledge about beforehand. As I got into this process, I realized there were some pain points. There were limited context windows, and I had a lot of files or sales transcripts that I wanted the LLM to understand, so I started building a basic toolkit to enable you, as a developer, to hook up arbitrary sources of data, get over the context window limitation, and connect it with your language model. That unexpectedly took off on social media and kick-started this whole feedback loop of working on what was then an open source project, which eventually turned into a company in April 2023. And it started right before, or at the cusp of, the whole ChatGPT hype wave that kick-started this entire interest and development in Gen AI applications.

BP Good for you. If you were in at Davinci, you were in early. That's like crypto 2012 or something, so you had the chance to be building this before the wave really crashed and everybody jumped in, so that's cool that you were ahead of the game. So maybe fundamentally, the first thing I should ask is, are we competitors somehow? I don't mind doing this interview and having fun, but what we're trying to sell at Stack Overflow now is this idea that you're going to build a knowledge base internally using Stack Overflow for Teams, which is just a private instance of Stack Overflow. Then we're going to help you bring in your Confluence, your GitHub, your Jira. We're going to use the wisdom of the crowds to make sure that all the knowledge is inside of this knowledge base, is accurate, that it's as fresh as can be, that if it's outdated or inaccurate, folks let you know. You get these extra metadata signals from the votes and from the tags and from other things, and then you're going to build essentially a RAG system where you use whatever Gen AI model you want, but instead of asking it a basic question, you say, “Go look at my knowledge base and bring me back the answer from there.” Is that what LlamaIndex does? 

JL So taking a step back, I think there are a lot of different companies building RAG systems over different sources of data, and there are also a lot of different data types. As you mentioned, there are all the workplace apps and developer tools that you might use. Another popular data source is just buckets of unstructured files, for instance– PDFs, PowerPoints– and then there's also, of course, structured data like data warehouses. And for each of these, there are probably multiple companies in different segments, even if you just slice by data type. There's also the horizontal layer, and I think that's probably the biggest difference. This is basically how we started– we care a lot about developers. We started off as an open source developer toolkit that grew into a framework of tools for different developers to use and compose their own applications. Of course, there are a lot of general, high-level, out of the box search tools, whether it's internal company search or code search or anything, that have emerged and are targeted directly at the end user. But our entire belief is that at the developer level, a lot of developers are going to be leading the charge of enabling and spreading Gen AI adoption throughout the enterprise. And we also think there's going to be a new data stack that emerges that encompasses the data infrastructure these developers need to actually deploy LLM applications in production, and we want to build that platform and infrastructure.

BP Gotcha. So as you look out at the ecosystem right now, what are some of the trends you're seeing? You mentioned that you started in the Davinci days. Eira and I were doing some research yesterday just trying to understand what the price per token economics have been like, and I think we found that when Davinci first launched, it was something like $2-5 for a million tokens across the various providers, and that price has now fallen to 50 cents or less, and so it's almost been halving every six months. What does that mean from your perspective, happy to hear from both of you, and what does it allow as those costs race towards zero? 

JL Just from a pretty technical perspective on the LLM application development side, as costs are coming down, context windows are also getting bigger, and so what this means is that you can basically stuff more and more data within a single prompt call to the LLM. People have debated whether or not this kills RAG or makes it obsolete. I think at least in the current state, our belief is that retrieval is still generally going to be needed, but this actually allows you to bypass some of the almost dumb decisions that people are making these days to set up a RAG system. So finer-grained parameters like your chunk size, or specific tweaking of whether you split by sentences or paragraphs– I think that stuff will probably go away. Because as these contexts get bigger, you can just put a little bit more into the LLM prompt window and then you basically don't have to deal as much with those minute decisions.
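For concreteness, here is a minimal sketch of the kind of chunking knobs being described, using llama-index's SentenceSplitter; the chunk size and overlap values are illustrative rather than recommendations, and the package layout assumes the 0.10+ releases.

```python
from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter

documents = SimpleDirectoryReader("./data").load_data()

# The fine-grained decisions being discussed: chunk size and overlap
# (values here are illustrative, not recommendations).
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=64)
nodes = splitter.get_nodes_from_documents(documents)
print(f"Split {len(documents)} documents into {len(nodes)} chunks")
```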

JC I think in general, cost coming down is a good thing for the ecosystem because it reduces the friction for developers to create new applications. At four or five dollars per million tokens you can do X, but at 50 cents– whenever the price goes down– the creativity of developers is unleashed. So we're looking forward to what developers can do or will do when the marginal cost goes close to zero. But I think Jerry's right– a larger context window is kind of a coarse tool, if you will. It's kind of a brute force tool to interact with your data. Especially when you think about the enterprise context, or even a personal context, different data has different provenance, different requirements, different access controls that you want, and also performance aspects like Jerry Liu just mentioned. And so having some more nuance around how you output the data, from both a developer point of view and a user point of view, makes more sense. So I see reducing the cost as a good thing because it increases experimentation. And when anyone builds something serious in production, they realize they kind of have to undo all the stuff they did in their demo apps and say, “Okay, let's do a more sophisticated RAG architecture using LlamaIndex or something else, because that's what it means to be a production-level Gen AI app.”

BP Gotcha. 

EM I was wondering if you could talk a bit more about the decisions that people are making in setting up their RAG systems and wanting to give folks a little bit better infrastructure or guidance for that. What are the mistakes that you see people making or that you anticipate people making as more and more people get into this market and how do you want to help developers mitigate that?

JL We talk about this pretty extensively across basically all the content we put on socials and YouTube, as well as our documentation guides, but there's this entire sequence of steps that you generally should consider when going from prototype to production. This applies to your RAG application or, more generally, any LLM application– whether it's an agent, structured extraction, those types of things. Typically, what we see is that setting up an initial application is very straightforward, and when I say ‘setting up something,’ I just mean you have some test input and some test output and it works. This might be the fault of our quick start tutorial as well, but it's also a value prop: it takes about 5-10 minutes to set up a RAG pipeline if you just follow the starter tutorial of LlamaIndex. So that gets you something, and you're able to ask a question and get back an answer over your data. But then what ends up happening, inevitably, is that for your specific use case there's going to be a performance bar that you want to hit, and the POC you whipped up in 10 minutes is not going to achieve that level of accuracy. So then you go into this whole iteration process to understand all the components that are contributing to the end performance and see which knobs to tweak to try to increase it. If you look at, for instance, RAG, the thing about building any sort of LLM software system is that you're basically adding a bunch of free parameters, and every parameter contributes to the final accuracy. So the more complex your application is, the more knobs you have to tweak to try to improve the final accuracy. In RAG, this includes all your data decisions– data parsing, chunking, indexing– and then on the query side, all your retrieval and prompt decisions– your retrieval algorithm, the prompt that you're using. And specifically for RAG, we see a mix of both. We see people struggle with the data setup– being able to connect your data sources in the first place, being able to parse a complex document like a PDF– and with figuring out the right prompt to answer the question, extract structured output, and respond in a way such that it can say, “I don't know,” for instance, when it doesn't have enough information. So we see a little bit of both, and the typical overall development process is that you first set up something initial and then you go and iterate on all the different parameters, potentially with the help of some evaluation tools, to get to something that's more production ready.
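As a point of reference, the starter flow Jerry describes looks roughly like this– a minimal sketch assuming the llama-index 0.10+ package layout, a local `./data` folder, and an OpenAI API key in the environment for the default embedding model and LLM.

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Load local files, embed and index them, then ask a question over them.
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine()
response = query_engine.query("What does this project do?")
print(response)
```

This is the "works in 10 minutes" proof of concept; the iteration Jerry describes is mostly about swapping out and tuning the defaults hidden behind these few lines.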

BP So you mentioned these different levers that you can pull, one of which being the algorithm and another being the prompting. Prompt engineering is now its own whole discipline. We were working on a piece internally recently asking the question, “If you have a 1 million, 10 million, infinite context window, what can you do with that?” Well, maybe you can do a little bit of learning that isn't zero-shot– add a million or 10 million tokens about this very specific medical literature, give it some great examples of what a good question and answer looks like– and then, without fine-tuning this generalized foundation model, suddenly you can get results that are strikingly accurate. What do you think the role of prompt engineering is today and what will it be a year or two from now?

JL It's an interesting question. I've kind of gone back and forth on this. I think some people think that prompt engineering will completely go away, and some people still think it'll exist. No matter how intelligent an LLM agent is, even if GPT-5 or GPT-6 comes out, in the end you still need to communicate with it, so there needs to be some communication protocol. And generally speaking, right now the most intuitive way to do that is through natural language. You can call that prompt engineering because you're talking to it through natural language, but I think what's going to go away is trying to figure out how to format the instructions in a very precise way to have the LLM output something in exactly the way that you want. I think the level of specificity you need– basically really formatting that Python string– is going to go down as models get more intelligent over time. You bring up a good point, though, in that as models get a bigger context window, you're going to be able to not just manually tweak some words, but programmatically insert a bunch of stuff and make the LLM call a lot more expressive. The example you mentioned is basically few-shot examples, which we have seen generally boost performance. I think that would be an interesting study– I don't know the answer off the top of my head– to see what the limit is. Does performance asymptotically cap at a certain level the more few-shot examples you put into a prompt? The whole idea of RAG is just context, so if you have a longer context window, you can just set your top-K to 10x and hopefully get back more context so that you can better answer the question. There's also memory. I think that's another very interesting component, where the level of personalization you can create for the user goes up a lot the more context you can put into the prompt, because you can imagine you just allocate a buffer space to insert your prior conversation history, high-level concepts, so on and so forth.
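The "set your top-K to 10x" idea maps onto a single retrieval parameter; a minimal sketch, again assuming the llama-index 0.10+ layout and default OpenAI-backed models (the value 20 is illustrative; the default top-K is typically a small number like 2).

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

# With a bigger context window you can afford to retrieve many more chunks
# per question instead of just the top 2 or 3.
query_engine = index.as_query_engine(similarity_top_k=20)
response = query_engine.query("Summarize the key findings across these documents.")
print(response)
```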

JC Just zooming out, Ben, how you think about software engineering, software development, and software testing will change dramatically. I was talking to some professors about how we're going to teach CS in universities in the next two, three, four years, even. It's going to be very different. And so I think regardless of what happens with prompt engineering– whether it's here to stay or goes away or evolves into something else like Jerry was saying, call it prompt engineering, call it something else– the way we think about programming is going to be very different, because it's going to become less deterministic and more probabilistic with these LLMs.

BP Right. We had a conversation recently with someone from Flatiron Software who's building a product called Snapshot Reviews where the AI is reviewing developer productivity based on what it sees in a GitHub pull request, a Jira ticket, the standup notes, and it goes in and evaluates complexity versus time and says, “Well, this person hasn't had too many PRs, but the ones they do are really meaningful. This person is constantly submitting stuff, but most of it's kind of table stakes, and so this developer gets this score, this developer gets that score.” And that guy was saying that there will be no more junior engineers. You'll come in at the level of, “I'm going to be managing these workflows for these AI agents that know how to write this code better than I do because they've been working and fine-tuned on this specific code base, and it's my job to make sure that nothing is inaccurate or that we figure out how to best fit this with our upcoming changes in architecture or the road map,” for example. To sort of push on this, by ‘prompt engineering,’ I think I meant two things. There are two halves of this. I would agree with you that the way I have to speak to the AI hopefully will become simpler and simpler. There are mega prompts out there now and whole guides on how to write the prompt. Nobody wants to have to do that. But there's also the prompt engineering that goes on behind the scenes. There's the system prompt, and maybe, if there's a set of agents doing chain of thought, a whole set of instructions for how to take the input, read it, evaluate it, critique it, and send it back before you give it back to the user. So do you think that kind of stuff will still exist, and do you call that prompt engineering?

JL I think the short answer is yes– it's just a programmatic way of doing it. In the end, RAG is just prompt engineering. You're just programmatically feeding context from a vector database into the prompt. I think over time, generally speaking– and this is true for software in general– things tend to move up the stack in level of abstraction. Right now, or at least last year, when people were fiddling around with the f-strings of a Python prompt, that was a pretty low level of abstraction, because you were just trying to figure out what these models were capable of. As models get better and as people figure out more complex emergent applications, they're going to want to offload some of the pure prompt engineering work to Python modules or library modules. And there's a separate question of whether humans will write these modules or another agent will write them. Who knows. But basically there's going to be some higher-level library that you can just import, and it will just do chain of thought for you so that you don't have to tweak it as much, ideally. There are going to be some general libraries for helping you insert few-shot examples. Even today there are libraries– there's us, there's DSPy, for instance– that help you do that programmatically. There's all this push in that direction. And so it's very interesting, because I think in general, programmers are going to be using these higher-level modules to compose more complex workflows as the lower-level modules on prompt engineering get solved.
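To make the abstraction concrete, here is a purely hypothetical sketch of what such an importable module might look like– `answer_with_cot` is invented for illustration, not an API from LlamaIndex, DSPy, or any other library, and it assumes an LLM client object with a simple `.complete()` method.

```python
# Hypothetical sketch only: `answer_with_cot` is not a real library function,
# just an illustration of pushing the prompt string behind an importable module.
def answer_with_cot(llm, question: str, context: str) -> str:
    """Hide the low-level prompt formatting so callers never touch the f-string."""
    prompt = (
        "Use the context below to answer the question. "
        "Reason step by step, then give a final answer.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Reasoning:"
    )
    # Assumes an LLM client exposing a `.complete(prompt)` call.
    return llm.complete(prompt)
```

The point is the direction of travel: the caller imports a module like this (or a chain-of-thought class from a framework) instead of hand-tuning the string itself.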

BP So you're going to say, “This is my foundation model. I'm using NPM RAG system, NPM few-shot to medical literature, NPM,” and then take it from there, right? 

JL I think so. As a framework, it's always tough to strike a good balance between something that's too high-level of an abstraction and something that's too low-level. Because I think the interesting thing is that right now a lot of end use cases are still very custom, and because they're very custom, when developers want to build RAG for different sources of data, they tend to want to make relatively custom decisions. And so if you're too high level, then that leads to frustration because they're not able to do the things that they want to do. However, over time, everything moves higher level up the stack. I think people start figuring out the best practices for how to deal with some of the micro-decisions. Even if you think about RAG right now, data parsing, chunking, generally good retrieval over unstructured data, I think a lot of that's going to just become almost commoditized modules over time that people can just import and use in their broader application.

BP You mentioned at the beginning the idea of PDFs. I don't remember what the context was, but one of the interesting things that's happened in the last year is that multimodal is now table stakes for most things. And for LlamaIndex, if the idea is to turn your enterprise data into something that's production-ready on the LLM side– at Stack Overflow one of the things we find really fascinating is, “Okay, there might be all this great knowledge or information inside of your company in text format or in code format, but what about all the slideshow presentations and the videos and webinars where the information you're looking for is not going to be that easy to find with a file name and date? It's what somebody said inside of this stuff.” And so the idea that these multimodal Gen AI agents can now peer into that and extract it for you is pretty fascinating. From your perspective, both as an investor and then at LlamaIndex, how do you adapt to the advent of multimodality and how do you make the most of it going forward?

JL So I think multimodality is super interesting. We're starting to see more people leverage multimodal capabilities even in RAG systems. I think you basically hit the nail on the head: if you have a document with a lot of text but some pages have images or diagrams, representing that document or that page as a native multimodal image is going to preserve all the information, whereas trying to do text extraction is going to lead to information loss. We have a lot of the core capabilities in the open source library to support building multimodal RAG systems, and one interesting thing we're doing on the enterprise side is really pushing to help users centralize and process all this knowledge, whether it's text or multimodal data, with our enterprise offering, LlamaCloud. One of the parsing capabilities that we offer, LlamaParse, enables users to parse both text and tables– so tabular information, which tends to be a little bit cheaper to parse– but it will also, for instance, if it recognizes a diagram or image on the page, let you parse that out as a pure image and represent that piece of the document as an image chunk as opposed to a text chunk. And it's interesting because you don't necessarily always want to represent the document as an entire sequence of images, especially if it's just a bunch of text. That ends up being pretty wasteful and can actually lead to hallucinations. However, with an almost hybrid mix of the two, you can figure out which objects or embedded objects are better represented in a multimodal fashion. You can basically create a more interesting, advanced RAG system that combines both text and images, and so that is something we're pushing a lot towards.
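For reference, basic LlamaParse usage looks roughly like this; it assumes the separate `llama-parse` Python package and a LlamaCloud API key in the environment, and the image-extraction options Jerry mentions are left out because their parameter names vary by release.

```python
# Assumes the `llama-parse` package and a LlamaCloud API key
# (LLAMA_CLOUD_API_KEY) in the environment.
from llama_parse import LlamaParse

parser = LlamaParse(result_type="markdown")  # "text" is the other common option
documents = parser.load_data("./quarterly_report.pdf")  # path is illustrative
print(documents[0].text[:500])
```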

BP Jerry, I want to let you get to the investor perspective, but before I forget, you mentioned before this idea of, “Oh, well as the context window expands, you might have a buffer that's for the prior conversations.” Listening to folks talk, it's pretty clear that the assistant from Her is kind of the platonic ideal of where they want to go with the consumer-facing product or even the work-facing product that ingests email and slide decks and all that stuff and is ready to help you out. And so I think what you're saying is super interesting. If I'm the kind of person who sometimes writes, but sometimes takes handwritten notes, but sometimes draws a sketch, but sometimes records a voice memo, I want all that information in my system so that when I'm talking to my assistant, it has that full context.

JL And this has implications not just for the model capabilities, but also for the surrounding components, especially if people are building applications with it. It has implications for the data storage and the data processing pieces, and a lot of what we're trying to solve is helping with that data orchestration so you can combine all these different sources of data and make them LLM-ready.

JC And LlamaCloud and Llama Enterprise– the productized version we're building– are kind of making that easier for developers so you don't need to think about it. Multimodal is now kind of table stakes for any application going forward, and the key for all products is to make the hard easy and the impossible possible. The hard stuff is thinking about context around video, images, and text, and the enterprise offering will make that easy for developers so you don't even have to think about it– you just point it at your data, be it text, video, embedded videos, music, etc. And that frees up developers to be creative with both inputs and outputs– both ingesting all the different data you have, and then what you do with it. Because not only do you want to ingest your data from handwritten notes, voice notes, or your PDFs, the output from your applications will be multimodal as well. I think what's going to be very cool in the next two to three years is what these generative AI applications do– not just respond in text, but voice, video, images, etc.

BP There was a cool demo– I forget which company did it– but basically someone said, “I want to see a party plan for my daughter's graduation,” and it came back with something, and then the person said, “Could you organize this not as just a series of paragraphs, but more like a spreadsheet with priority and a hierarchy of which comes first?” And so then it did it. So it's fun to imagine, Jerry, to your point, this J.A.R.V.I.S. thing where you ask it a question and it's like, “Oh, well the best way to show this to you is using a three-dimensional graph,” or, “The best way to show this to you is with a quick animation.” And so it will decide on the fly what modality best expresses the information.

JC It's tool use. It's clearly all the agents and agentic workflows around RAG that Jerry and Simon are working on: all of a sudden, not only are you collecting the data, but then in your own Stack Overflow world– be it a Kanban system or Jira or whatever– what tools do you want to use to output the expression? Not just a video or a tabular format, but tool use going forward. So I think the combination of data context and agentic workflows is kind of the kernel of what next-gen apps will look like. And we're seeing that today. You kind of squint and you see the future, and we're pretty excited, because we've seen stuff that developers are working on, and I think it's going to be mind-blowing in the next year or two what these applications can and will do.

BP So I was checking out the website, and it's interesting because it feels like there's this evolution of the product. You see the open source on the left, and then LlamaParse for getting stuff out of documents, and then Enterprise, which is meant to be a turnkey solution. And so the data, the embedding, the LLM, the vectors, the evals all go into the Llama, and on the other side you get what I think most people think about when they think about Gen AI, which is Q&A, structured extraction, chat, semantic search, agents. Can you give us a bit of a peek behind the curtain of how that happens? What is the system that allows you to do that?

JL So that system, at least as a common framework, is RAG, but it's moving into more agentic territory. And at a very high level, we basically, as a company, want to build the tools to enable any developer to build LLM apps over their data. So that graphic you saw on the website basically just shows that you shove all your data in, you combine it with LLMs, maybe combine it with some supporting modules, and you get back all these simple to advanced applications– not just the applications you can build today, but also the applications you can build tomorrow as these models get better. We really have two main product offerings, so to speak, which are both key components of this strategy. The first is the open source project. It's completely free, MIT-licensed. The framework is designed to give developers the tools to orchestrate, prototype, and productionize LLM application flows over their data. An example of this is the RAG setup– set up some data connectors and indexing, set up some retrieval and prompt modules, and build an application. Another example is an agentic flow– define a set of tools, define an agent reasoning loop, and basically create this overall assistant that can take actions for you, synthesize knowledge, and also personalize to your preferences over time. In complement to that, the motivation for developing our enterprise offering as it exists today, which we call LlamaCloud, is that we noticed that as people are building these LLM applications, there is this new data stack forming around what you actually need as infrastructure to power all the data for your LLM app. And so LlamaCloud is meant to help centralize, process, and enhance that knowledge for your LLM application. It brings together all your unstructured data– and in the future, structured data as well as general API interfaces that you want to use– puts it all into one place, applies good processing algorithms to make sure it's good quality, and ensures that you have good data interfaces to build LLM apps on top of. So LlamaCloud really is aimed at building that new data stack, at building the orchestration around your data to make it good quality, because an underrated and oftentimes overlooked aspect of building any machine learning application is that data quality is one of the most important pieces. We see room there to build a fully-managed service so that, as a developer, you don't have to think about that as much and can focus on the fun stuff– building the agents and all the fancy flows you want that capture the business logic or application logic you want to write.
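To illustrate the agentic flow described here– define tools, hand them to an agent reasoning loop– a minimal sketch using the llama-index 0.10-era agent API (later releases moved agents toward workflow-based classes); the tool, the model name, and the question are all invented for illustration.

```python
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool
from llama_index.llms.openai import OpenAI  # requires llama-index-llms-openai

def lookup_order_status(order_id: str) -> str:
    """Toy stand-in for a real internal API the agent could call."""
    return f"Order {order_id} shipped yesterday."

# Wrap a plain Python function as a tool, then give it to an agent reasoning loop.
status_tool = FunctionTool.from_defaults(fn=lookup_order_status)
llm = OpenAI(model="gpt-4o-mini")  # model name is an assumption
agent = ReActAgent.from_tools([status_tool], llm=llm, verbose=True)

response = agent.chat("Has order 42 shipped yet?")
print(response)
```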

BP All right, I have to challenge you. This is my one challenge, again, seeing as I see a lot of overlap here. And again, this is no problem. We talk to people on the podcast all the time where our products are overlapping and we're all trying to move as fast as we can in the Gen AI space and whatever. But you have this great parsing, so basically you're taking semi-structured or structured data and you're extracting it, and now you've added metadata to it, or at least where it ends up in the vector database and where it ends up in the embedding is in the right context and it's understood within the context of the company. But does the LLM or the parsing know which data is accurate or which data is best if there's conflicting answers or which data is out of date? 

JL I think those are good points. There are a few things you basically have to do.

BP Because this is what the wisdom of the crowd solves. If there's room for people to do this one thing, it's to manage this sort of artifact of data in this library so that when the LLM goes in, it has these signals to say, “Well, there are three answers here, but this one clearly has the most upvotes,” or, “There are four answers here, but this one is two years out of date and this one is more recent,” and so it has those signals to understand how to put priority on the information that it will then share.

JL Generally speaking, I think you're kind of touching on a more advanced data flow that we're trying to help capture with LlamaCloud as well, which is basically the ability to define and annotate metadata on top of your documents and then having some sort of feedback loop between, say, a human or even another machine and the data itself so that the data can basically update over time to reflect the user needs and preferences. Even in a really basic case, if you set up a naive RAG system, you chunk it up and then you just have a bunch of random floating text chunks. It's going to have no context awareness of where it fits into the broader article or broader knowledge base. And so I think one of the flows that we are investing in is really thinking carefully about how do you basically properly attach the right metadata onto each of these documents with humans in the loop so that, in the end, the goal is to give the LLM enough context to solve the task or answer the question. That really is the goal, and this act of defining this data flow will help you go towards that goal. 
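In llama-index terms, attaching that kind of metadata to a document looks roughly like this; the field names and values are illustrative, not a fixed schema– the point is carrying freshness and quality signals alongside the text so the retriever and LLM can see them.

```python
from llama_index.core import Document

# Illustrative field names only; attach the kind of signals Ben describes
# (recency, votes, acceptance) as metadata on each document or chunk.
doc = Document(
    text="Our deployment guide recommends blue-green deploys for the API tier.",
    metadata={
        "source": "internal-wiki",
        "last_updated": "2024-03-01",
        "upvotes": 12,
        "accepted": True,
    },
)
```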

BP That makes sense. You could add metadata about recency, or about the number of authors, or about how often a document was accessed. Another thing we did when we were working with Prosus, which acquired Stack Overflow, on this sort of Slack bot was that you get feedback when you get the answer: “This answer was helpful. This answer was not helpful. This answer had a hallucination.” And then it can digest that and go back and use it as a bit of metadata on that particular chunk, so that it knows for the future.

JL For sure.

[music plays]

BP It is that time of the show. Usually I shout out the winner of a Lifeboat Badge– someone who came on Stack Overflow and saved a question– but today let's do a little special callout. “How do I import SimpleDirectoryReader from LlamaIndex?” Asked September 2nd, 2023. This question has an accepted answer and 240 views, and is tagged with Python 3, large language model, and LlamaIndex. So if you're using it and you have a question, there's an answer for you on Stack Overflow. And Jerry, I’ve got to let you know there are a few questions here with no answer, so if you want to go in and help the crowd, there are some people using LlamaIndex on Stack Overflow that don't have answers to their questions. No, I'm just teasing. As always, I am Ben Popper. I'm the Director of Content here at Stack Overflow. You can find me on X @BenPopper. You can email me with questions or suggestions at podcast@stackoverflow.com. We bring listeners on, we take suggestions for topics, we try to answer questions. I'm going on sabbatical for a month so you won't be hearing from me, but Ryan and Eira will be running the show. And as always, if you enjoyed today's episode, the nicest thing you could do for us is leave a rating and a review. It really helps.

EM Hey, I'm Eira May. I am a Senior Writer and Editor at Stack Overflow. You can always email us if you have ideas for the podcast, just at podcast@stackoverflow.com. And you can also DM me with questions or thoughts on Twitter @EiraMaybe. 

JL I'm Jerry, co-founder and CEO of LlamaIndex. You can find me at my Twitter handle, JerryJLiu0. 

JC I'm Jerry Chen, Partner at Greylock. I do early-stage venture capital. Please give me a shout– @JerryChen on X/Twitter– or find me on LinkedIn, Jerry Chen.

BP Awesome. Thanks for listening, and we will talk to you soon.

[outro music plays]