The Stack Overflow Podcast

We chat search from both sides now

Episode Summary

In this episode, Ben chats with Elastic software engineering director Paul Oremland along with Stack Overflow staff software engineer Steffi Grewenig and senior software developer Gregor Časar about vector databases and semantic search from both the vendor and customer perspectives. They talk about the impact of GenAI on productivity and the search experience, the value of structured data for LLMs, and the potential for knowledge extraction and sharing.

Episode Notes

Stack Overflow and Elastic are collaborating to improve the search experience using vector search and generative AI. Learn more about the new AI features for Stack Overflow for Teams, including Enhanced Search.

Learn more about the Elastic platform, including vector search. Developers can start building here.

Connect with Paul, Steffi, and Gregor on LinkedIn.

Stack Overflow user chepner won a Lifeboat badge for answering “How do I use __repr__ with multiple arguments?”

Episode Transcription

[intro music plays]

Ryan Donovan Monday Dev helps R&D teams manage every aspect of their software development lifecycle on a single platform: sprints, bugs, product roadmaps– you name it. It integrates with Jira, GitHub, GitLab, and Slack. Speed up your product delivery today, see it for yourself at monday.com/stackoverflow.

Ben Popper Hello, everybody. Welcome back to the Stack Overflow Podcast: Elastic Edition. I am Ben Popper, Director of Content here at Stack Overflow. I've been working here for about five years, and I think one of the very first pieces I did was with two fine folks from Elastic who were talking about how they powered some of our search in the background. Now, five years later, that was in 2019, I have a podcast for you with Paul Oremland from Elastic as well as Steffi and Gregor from Stack Overflow, to chat a bit about how our two organizations work together to power some of the new search features that are going to be available as part of our OverflowAI offerings, and how Gen AI and things like vector databases are being integrated into our more traditional search– the semantic elements of search. So without further ado, let's start with you, Paul. Why don't you introduce yourself to the audience? Just quickly tell them a little bit about who you are and what it is you do at Elastic.

Paul Oremland My name is Paul Oremland, like you mentioned. I'm a Director of Software Engineering at Elastic within the search group. Elastic has three main solutions. We have a security solution, we have an observability solution, and we have a search platform which we expose to developers like yourselves. I've been at Elastic for just over a year now, but I've been in the software industry for coming up on almost 25 years, and I've kind of been all over the place. Most of my work has been in large scale distributed systems, but I've done everything from mobile applications to client applications.

BP Cool. Steffi, why don't you say hello to our audience and tell them a little bit about what it is you do here at Stack.

Steffi Grewenig Hi, I'm Steffi. I'm a Staff Engineer at Stack. I've been here for about three years, and I'm part of the team that works on content creation usually at Stack, so everything that's around posts or articles, draft articles, and in the Teams context, so for our Teams product.

BP Last but not least, Gregor, do you want to say hi to our audience and just tell them what you do at Stack?

Gregor Časar Hi, folks. I'm Greg. I'm based in Slovenia. I'm the Tech Lead of the content pod here at Stack working on our Teams offering. And in our last months, we've been tasked to integrate and bring in external data into our product.

BP So Steffi and Greg, maybe you can help the audience understand. You're on the content team, but I know today we're going to be talking a bit about search. What does content and search have to do with one another? And if you can, talk a little bit about what it is we worked on with Elastic.

SG I think that's actually a bit of a wild ride. We actually did a three-day hackathon where we just wanted to refocus a bit, and one of the ideas for that three-day hackathon was ingesting content and also making it available via search. But we also wanted to ingest that and generate new content from that, so that's kind of why we as content creation tackled that within a hackathon. And that gained so much traction that we actually got asked to build the hackathon proof of concept into something that is now a feature that we want to roll out to our enterprise customers. And we used Elastic during that three-day hackathon and it went so nicely that, actually our team without prior extensive Elastic knowledge or knowing especially a lot about infrastructure and deploying things on Elastic or search, we were able to do a proof of concept for enterprise search.

BP Greg, do you want to talk a little bit about what the MVP was at the hackathon and why you think it had such traction that we decided to turn it into a fully-fledged feature on our roadmap?

GC Actually, Steffi would be more equipped to talk about that. During that time, she was actually the one that pitched the idea.

SG I think at the time, we just used a bunch of connectors from Elastic and the Elastic application search feature that they provide, and we weighed Slack messages, actually podcast transcripts, and I think some of our GitHub repositories. We ingested those and made them available for our internal Stack Overflow for Teams instance. So internally, after three days, people could search our public Slack channels, some of our GitHub repositories, and the podcast transcripts, and that was the scope of the hackathon that then turned into an MVP for our users. And currently, we are invested mostly in GitHub for now, making GitHub repositories searchable.

BP Nice. I know one of the big goals with enterprise search and with us trying to be a great knowledge base is like, “Look, maybe you've invested time in Stack Overflow for Teams. You got a lot of great Q&A there, but your company has also built up incredible knowledge and sources of data other places like GitHub or Jira or Confluence, whatever it may be. How can we ingest those and make that all part of a single database that search can do something really smart with when you just want to chat with it in natural language?” I have a question about how we do APIs. Am I using the right setup for this new function I'm building? Paul, from your side, once we decided to go from hackathon to actual product, what was the interaction like between Stack and Elastic? What was the reachout like and how did we begin building together?

PO What's really interesting about this is that I think this is actually a pretty common evolution. I know Stack has used Elasticsearch in various contexts throughout the history of Stack Overflow and Elasticsearch, and what we actually see is that this type of evolution happens pretty commonly. Folks start with regular search, then they start looking at how we get a little bit more knowledge-based type search or some more private data content. We want to expose that. We want to do maybe some semantic search with that because the way that those documents are structured and the content in them is not necessarily always structured. And so it usually starts with meeting with you all and sitting down and looking at what is your use case and what are the types of workflows. With search, it's an interesting space because there is no one-size-fits-all solution. What you're trying to do is very different than what some of our ecommerce customers are trying to do with search. So we really want to start with understanding what is the workflow and what are some of the key decision points in that workflow. For instance, a lot of the data that you're ingesting is coming from a lot of different data sources, so we want to understand how we get that data in in a way that's efficient, that is cost effective, but actually puts the data into Elasticsearch in a way that you can then retrieve it for whatever your search needs are. And so it usually starts by sitting down and thinking about what does that overall architecture look like?

BP That's interesting. So you've seen not just Stack Overflow, but other folks evolving in this direction, and I think that makes a lot of sense. Everybody is looking at a lot of Gen AI as a greenfield that maybe is going to improve productivity or allow for some new functionality inside of an enterprise that wasn't available before. What are the steps you have to take in order to make that possible? Do you have to clean up the data? Is there an ETL pipeline? Do people need to add metadata? And then once you have it, are you talking about working with things like embedding and chunking and vector databases, or am I on the wrong track there?

PO You're exactly on the right track. I think it really actually starts with just understanding where we use generative AI and where we use RAG in our workflow, and usually that does start on the ingestion side. We have a bunch of connectors like you all mentioned that you can use, but it's really important that as you bring that data in that you do think about things like chunking, you do think about your ETL pipelines. For instance, one of the things that Elastic offers –I'm not actually sure if you all are using this right now because it's a relatively recent offering– is the ability to store multiple vectors within a single document. And so what that allows you to do is it allows you to chunk the document efficiently as passages, generate vectors for each of those passages, and then when you do your search and you return the result, you get the actual whole document rather than just the individual passage context, because that's at the end of the day, what the person is actually searching for. So you have to think about that when you're bringing the documents in. That's not just a search time thing, that's an ingestion time thing. So typically when folks are building generative AI into their applications, they first have solved, “What is our use case?” Then they've sat down and they've thought about what are the chunking strategies. There's two main ways you can do chunking with Elastic. One is that you can use a third party library, something like a LangChain or a LlamaIndex where it's going to have its chunking strategy built into that framework. You can also set up a pipeline within Elastic where you can have the pipeline using our scripting language go through and generate individual passages and store them back within documents. And then you have to figure out what you want to do once you have that in there. 
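The passage-level approach Paul describes (chunk a document, keep one vector per passage inside the same document, return the whole document at search time) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Elastic's API: `toy_embed` is a bag-of-words stand-in for a real text embedding model, and every function name here is hypothetical.

```python
import math
import re
from collections import Counter


def chunk_passages(text, max_words=40):
    """Split text into passages of roughly max_words words, breaking at sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    passages, current = [], []
    for sentence in sentences:
        words = sentence.split()
        # Close the current passage if this sentence would push it over the limit.
        if current and len(current) + len(words) > max_words:
            passages.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        passages.append(" ".join(current))
    return passages


def toy_embed(text):
    """Assumption: a sparse word-count vector standing in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def index_document(doc_id, text, max_words=40):
    """Build one record per document that holds a vector for every passage."""
    return {
        "id": doc_id,
        "text": text,
        "passages": [{"text": p, "vector": toy_embed(p)} for p in chunk_passages(text, max_words)],
    }


def search(index, query, k=3):
    """Score each document by its best-matching passage and return whole-document ids."""
    query_vector = toy_embed(query)
    scored = []
    for doc in index:
        best = max((cosine(query_vector, p["vector"]) for p in doc["passages"]), default=0.0)
        scored.append((best, doc["id"]))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in scored[:k] if score > 0]
```

Scoring a document by its best passage is one reasonable choice among several; the point is that chunking and vector generation happen at ingestion time, while the search result is still the parent document.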
There's three main ways to search, and I think right now the big trend in the industry is that people don't really understand which is best for their model and so it does usually take a little bit of trial and error, but you've got your straight lexical search, or your BM25 is the algorithm that we use. This is your traditional keyword search. This is how everybody has been doing search for the last couple decades. You've got now, since I think 2019 in Elastic, you've got KNN search, or the ability to do vector search, which essentially is doing vector math. We're using approximate nearest-neighbor algorithms for that. And then you have hybrid search, and that's actually where we're seeing that people are finding the most benefit, where you can take lexical search, you can take vector search, you can then combine the results and take the best of both of those. And so what we're seeing is the evolution as people are starting to think about this, they're thinking about the chunking, they're getting their chunking strategies, they're getting their inference set up right, and then they're really focusing on how I get the most relevant results out, and that's where folks, we're seeing them really end on hybrid search.
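One common way to combine lexical and vector results in hybrid search is reciprocal rank fusion, which Paul mentions later in the conversation. The sketch below is a plain-Python illustration, not Elastic's implementation: the function name and the sample document ids are hypothetical, and the constant `k=60` is a commonly used default rather than anything stated in this episode.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one list.

    Each document earns 1 / (k + rank) from every list it appears in,
    with rank starting at 1; documents are returned by descending total.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Ties are broken by document id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a keyword (BM25) query and a vector (kNN) query.
lexical_hits = ["a", "b", "c"]
vector_hits = ["c", "a", "d"]
fused = reciprocal_rank_fusion([lexical_hits, vector_hits])
```

Because the fusion only looks at ranks, it does not need the BM25 scores and the vector similarities to be on the same scale, which is part of why it is a popular starting point for hybrid search.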

BP Nice. From the Stack Overflow side, Steffi, Greg, I would love to hear your thoughts on this. How did we learn to work with these new approaches? What tooling did you help to set up or use now? And what do you think about it is interesting in terms of delivering more value to folks, whether that be on a public platform for Stack Overflow, an enterprise teams customer, et cetera?

GC For me, what Elastic enabled was moving the responsibility of who owns integrations. So Paul, you mentioned how do you bring that data in? That's also on Elastic, that's your enterprise search offering, and it's part of your ETL pipeline. Because it's still your pipeline that is shared with our other teams, that meant for our content pod team that we didn't actually have to worry about this at this time. We knew that at a later time, or from a different team, people could invest time into that, and when we are ready to bring external data into it, that it'll scale and work just like with the data that they've been using for our OverflowAI, enhanced search and improved search, which is currently indexing our Stack questions and answers and articles. And as a developer, I'm very happy that I don't need to write 20-something integrations and to maintain them, and in a year, somebody brings me, “This is broken now.” I know that folks at Elastic, their solution architects that we've been working with are responsible for that. And for us, it's all presented behind a nice little facade.

BP Yes, don't build internally what you don't have the resources to maintain. A very important maxim to keep in mind. Steffi, how about on your side?

SG I think not much there. We are actually still kind of in our exploring phase, baby phase regarding search. Our team in particular never did anything regarding search, so we are using way more Elastic offerings for our internal content than we do right now for the externally ingested content. So that is, I'm afraid, just plain old boring lexical search.

BP Hey, don't knock it. We need them both. Let's stop for a second and look at this from the perspective of, “What are we building for whom and why?” Paul, I'll let you maybe speak to how this is happening not just with us, but across the industry in a minute, but for Steffi or Greg, I'd like to sort of ground the audience a little bit. What is the ultimate goal here? What do you hope to deliver, and why do you think that would improve productivity or search experience or collaboration? What is it that we're adding to the traditional experience that's new and that has some component via generative AI that we couldn't do before?

SG I think we knew for years that our customers have valuable knowledge in other tools, and the question was, how can we make that available within our platform, and also, how can we make our tools like voting, and the focus on community, being able to leave comments, available through that external content? Another part would be content health, that we flag outdated content, duplicated content, and those kinds of things, and how can we make this also available for external knowledge? So the first step for us was to bring in external knowledge into Stack Overflow, and then a foundational piece to this is also, of course, make it discoverable via search and then we can add on top of this. And our team in particular, right before we did that, invested into content generation. So we actually took external content and generated questions or articles out of this.

BP One of the old challenges for Stack Overflow or Stack Overflow for Teams who are building a knowledge base is, again as we said, folks have knowledge being built up in different places. It could be in a code comment, it could be in a Confluence wiki. Now, one of the things that Gen AI is really good at is, look at this text, synthesize it for me, pull out the salient points, and maybe even, “Hey, can you structure them like these Q&A couplets that I've got a thousand of to show you and train you on?” And so that's where knowledge ingestion can kind of happen. The idea there would be that you both seed this library of knowledge that the community can use, and you save a lot of manual labor in doing so, and then maybe over time, that process is almost automated. Two people have a great conversation in Slack or they resolve an issue in GitHub and they click a button, save this as an answer inside of Stack Overflow for Teams because we had a question, we worked it out, we got to the solution, so I want this ingested by the machine. Greg, from your perspective, anything to add?

GC This is a very exciting topic and we are currently in the sandwich phase. So we started with Gen AI and leveraging it, but you're thinking top-down or bottom-up. We started bottom-up. We took an article, for example, and chunked it out and used an LLM to then summarize or basically transform it, but we also used it to kind of do an experiment and say, “You mentioned a conversation.” So we use that pattern to say, “Please grade the importance or the relevancy of this in this context.” But that was a very one-dimensional or maybe too focused thing. It only had the context of a single article. Now with the solution that we're building, a holistic approach can be done. You first bring in all the content from various sources using enterprise search, and maybe you can also add some information at ingestion time, but importantly –this is the bit that I'm excited about– this will enable the holistic approach when you say, “Hey, this is the subject that I actually want to extract some knowledge out of.” And then you first do a search and then you do the same thing, but instead of only having chunks from a single article, now you have chunks that are very relevant from your whole organizational knowledge base.
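The search-first flow Greg describes (retrieve relevant chunks from across the knowledge base, then hand them to an LLM) can be sketched as a prompt-building step. This is a minimal sketch under stated assumptions: the function name, the `(source, text, score)` chunk shape, and the prompt wording are all hypothetical, and no particular LLM or Elastic API is implied.

```python
def build_grounded_prompt(question, chunks, max_chars=1200):
    """Assemble an LLM prompt from retrieved chunks, most relevant first.

    chunks is a list of (source, text, score) tuples, for example the
    output of a search across Slack, GitHub, and articles.
    """
    ranked = sorted(chunks, key=lambda chunk: -chunk[2])
    lines, used = [], 0
    for number, (source, text, _score) in enumerate(ranked, start=1):
        entry = f"[{number}] ({source}) {text}"
        # Stop once the context budget is exhausted.
        if used + len(entry) > max_chars:
            break
        lines.append(entry)
        used += len(entry)
    context = "\n".join(lines)
    return (
        "Answer using only the numbered context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

Keeping the source label next to each chunk lets the generated answer point back to where the knowledge came from, whether that is a Slack thread, a GitHub issue, or an existing article.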

BP I do think one of the wild dreams which has a dystopian flavor to it, an Orwellian flavor to it, but we're going to do it in the right way is, “Hey, every stand up we do now we get a video recording and a transcript automatically, so make sure we're pulling in the best points from that on XYZ subject.” Every conversation now the knowledge can be made into an artifact that is retrievable or that can become part of a collection or something like that. Paul, talk about this from your perspective. It can be focused on what Stack Overflow is doing in this or what you're seeing with other clients. How are folks trying to take advantage of some of these new capabilities, and what are they coming to Elastic and asking for? You could mention the tooling that you provided to us or some of the other things that you're working with.

PO Yeah, absolutely. I want to start by saying that when I actually heard of OverflowAI and some of the work you all are doing there, it made me sad that I'm not writing code every day as part of my job anymore, because I actually think it's really cool. Some of the possibilities, especially when you're bringing in things like GitHub issues and stuff like that, as software engineers, we always want our code to outlive the original purpose that it had, or we want it to have a legacy. And we don't want to change things– if it's not broke, don't fix it, kind of thing. And when you have the history of what's been going on, the history of your issues, maybe wikis or runbooks or things like that, you have so much context around this piece of code that you don't have when you're just saying, “Oh, okay, today I'm going to add this function to this class,” or something like that. So I'm excited for your customers to be able to utilize this functionality. And I think, honestly, when I look across the industry, that's the big thing. Gen AI is this huge buzzword right now. RAG and Gen AI, and RAG is really the process of taking results and augmenting what you're doing with that, and then the generative AI is taking those results and generating something new, like a summary. Summaries are great examples of ways to use this. Internally, we use RAG and Gen AI with our customer support team. So they take our wikis, they take our knowledge base, they take our code base, they take our tech docs, and they're able to provide a better summary over what an issue looks like or what's the end to end workflow. And so it's pretty exciting to see you all being able to offer that through the OverflowAI and the Teams work that you're doing. And I think in general, that's just going to become foundational. I've been talking with a lot of people recently. In fact, I was just at the Microsoft Build conference and I probably had this conversation 50 times over three days there.
But today we're just learning all that we can do and all of the new surface area that LLMs and text embedding models have given us, but 5 years from now, 10 years from now, this is all going to be table stakes. We're not going to be talking about this the way that we did today. And it's the pioneers in the industry that are really going to set the tone for how we think about this and how we use this. I give a silly example. I've been in the industry for 25 years and I've seen multiple evolutions of technology. The one that I think is the funniest is, I want to say it was in the mid-2000s, we were all talking about this thing called Ajax. It was the greatest thing since sliced bread. I see you all laughing at this because it's silly to even say that word right now, but we were talking about it then just like we talk about Gen AI today because we realized that there was so much foundational work that went into browsers, that went into JavaScript engines, that we unlocked new capabilities and new functionality that we just couldn't do before. Nowadays in 2024, if I walked into your stand up and I said, “I really think we need to implement Ajax in our thing,” you would probably walk me to the door because it's just silly to talk about. But everything that we do on the web today has some form of this dynamic JavaScript execution there. I really think Gen AI and RAG is going in that direction, and I think some of the products like what you all are working on, we see this with a lot of other customers, particularly in a similar scenario where they've got this plethora of knowledge just sitting around and they're trying to make things more efficient, they're trying to speed things up. A good example is one of the things that we use internally because we pull this data in, I can go and say, “Hey, I'm on the search team, but I don't have every API memorized. I don't have every piece of functionality memorized.
Sometimes I need to just go quickly figure out how to do something.” I can go ask our internal support tool. I can say, “Hey, I'm trying to use reciprocal rank fusion with this type of functionality. How do I generate a filter?” And it can go, it can look at our documentation, it can go look at the docs, it can go look at the examples that we have, and then it can actually give me this wonderful summary, and I think that's where we're going to start to see that just kind of built in everywhere, and it's going to be based on the knowledge that you're able to grab from throughout your organization. And the knowledge is not just general. LLMs are really wonderful, but where I think they're going to be the most powerful is in this kind of private data world where you've got specific context to your organization, to your workflows. How do you bring that context into helping you solve problems and then helping your customers solve problems quicker? Because at the end of the day, search really matters about two things. It's, “Did I get the thing that I expected to get, and did it come quickly?” People don't care about everything else under the covers.

BP Just out of curiosity, as you've been doing this, you mentioned you're doing it a lot for your own organization, different kinds of information that's flowing through. And yes, the dream is that the code will outlive you, so make sure that as we're going along we're creating this amazing blueprint, this map of all the discussions we had when we built it and what went wrong and why we changed this and when this person leaves what they were responsible for. All that stuff would be kind of wonderful for software developers to have if it happened automatically and it was fairly accurate. Does the Q&A format matter? Does putting data into that structure help LLMs perform better or do you think it's not really relevant?

PO I think it does matter. What I actually think matters the most is the instructions that you give the LLMs, at least right now, and that's where we're seeing this RAG aspect of it being the most helpful. You see articles all the time about different LLMs hallucinating on different things and so the more grounded you can give from a context perspective, the better and more truthful the results you're going to get. So yes, the Q&A matters, but the grounding I think matters a little bit more.

BP The better the ground truth, I got you. The higher quality of the data, the better the results. Makes sense. Steffi, Greg, anything else you want to touch on before we hop off?

GC I also wanted to highlight how Elastic enabled our team to be more autonomous. I know that's a bit of a tangent, but all the solutions are hosted in the cloud, which enabled us as a team to basically move the SRE in-house into our team and provision everything we needed ourselves while we were at this phase, which was of tremendous help.

BP Great. Paul, you can feed that soundbite to your marketing and comms team.

PO Oh, yeah.

BP Steffi, anything you want to add before we hop off?

SG I think I said this before, but we were the four devs having three days of spare time and we built that kind of into production, into our internal team, the search, and none of us had prior particular knowledge about search in general or Elastic. And I think that's really telling for a product, I feel. That's a really cool thing.

BP Cool.

[music plays]

BP All right, everybody. It is that time of the show. As usual, we want to shout out a community member who came on and shared some knowledge, helped to save a question from the dustbin of history and thus gave everybody on the internet a chance to find what they need. They've got a problem, we've got a solution. A Lifeboat Badge was awarded May 30th to Chepner: “How do I use __repr__ with multiple arguments?” Represent with multiple arguments. This is in Python. This question was asked 7 years ago and 10,000 people have benefited, so thanks again to Chepner for your answer and congrats on your Lifeboat Badge. As always, everybody, I am Ben Popper. I'm the Director of Content here at Stack Overflow. Find me on X @BenPopper. Hit us up with questions or suggestions for the program, podcast@stackoverflow.com. Email us there if you want to come on and be a guest or have suggestions for what we should talk about. And if you liked the program today, leave us a rating and a review. It really helps.

PO Paul Oremland, Director of Software Engineering at Elastic, and I really just want you to go check out Search Labs. It's elastic.co/search-labs. Learn everything you want to know about search and more.

GC Greg, the Tech Lead at the content pod in the Teams part of our organization.

SG I'm Steffi. I'm a Staff Engineer at Stack Overflow, same as Gregor on the content pod. And I'm on LinkedIn, at Steffi Grewenig.

BP All right, everybody. Thanks so much for listening, and we will talk to you soon.

[outro music plays]