The Stack Overflow Podcast

Want better answers from your data? Ask better questions

Episode Summary

Tim Tutt, CEO and cofounder of Night Shift Development, tells the home team about his work in deploying large-scale search and discovery analytics, why he’s working to help nontechnical users understand and utilize their business data, and how GenAI is teaching people to ask better questions.

Episode Notes

The mission of Night Shift Development is to democratize data analytics to help organizations and users of all skill levels understand their data. Their flagship product, ClearQuery, is a data intelligence and analytics platform designed for nontechnical users.

ClearQuery has a free version that lets you try out the full array of features. Learn how it works and register here to get started, gratis.

Learn how Stack Overflow implemented semantic search to allow users to search using natural language.

Read about why self-healing code is the future of software development.

Tim is on LinkedIn.

Thanks and congrats to Lifeboat badge winner Boann, whose answer to Sort four numbers without an array has been viewed 23,000 times and counting.

Episode Transcription

[intro music plays]

Ben Popper Hello, everybody. Welcome back to the Stack Overflow Podcast, a place to talk all things software and technology. I am Ben Popper, Director of Content here at Stack Overflow. Despite what you may have heard, I'm not an AI-generated voice or personality. This is really me for now, still working out the bugs in our other systems. But I'm joined today by two of my favorite co-hosts: my colleague Ryan Donovan, who edits our blog and works on our newsletter, and Cassidy Williams, who contributes to our newsletter and frequently co-hosts the podcast with us. Hello to you both.

Cassidy Williams Hello!

Ryan Donovan Hey!

BP So today we're going to be chatting with Tim Tutt, he's the CEO and co-founder over at Night Shift Development, and we're going to be talking about some of the work he's done deploying large-scale search and discovery data analytics solutions in the public and private sector, and a passion he has for helping non-technical users understand and leverage some of the technical capabilities inside their organization or the data that they have inside. So Tim, thank you so much for joining us, and welcome to the Stack Overflow Podcast.

Tim Tutt Hey, Ben. Thanks so much for having me, and thank you, Ryan and Cassidy. Looking forward to the chat today.

BP So Tim, for folks who are listening, most of our audience are folks who are developers, engineers, or work in the world of software. What's your origin story? How did you get into this world and find yourself in the role you're at today?

TT Sure thing. So luckily my origin story isn't a villain story, so we'll have a good ending to this.

BP We'll decide, we'll see.

TT Not yet. So I started coding when I was really young and got into website development pretty young, then I kind of moved and started building web applications towards the high school timeframe. I did that in college for a while and studied computer science at Virginia Tech. While I was there, I wound up doing an internship that led me into building some large-scale search and discovery solutions for some of our government customers back in the day, and that's really kind of where this whole thing started for me and where I got deep into search and discovery. We had massive amounts of data that our government organizations were collecting that people needed to find these needles in the haystack. How do we get to the thing that matters the most, how do we find the things that matter quickly and rapidly? And it started off with the standard building Boolean search systems, making those things super scalable. How do we make sure that this operates in a distributed manner and it's easy enough for our end users to kind of run through and get the data that they need? I then started working at a company called Endeca that no longer exists. They were bought by Oracle, I think 2012-ish timeframe. I was working there doing the same thing, and they had a search and discovery solution that we would deploy and help implement across the board. One of the things that was great about it was that it was very easy to use. It reminded you inside of our government customer spaces of what a search experience looks like on the outside. So when you think amazon.com, you've got the, “Hey, I'm looking for a Samsung TV. I've got the facet navigation type things. How do we drill in and do the aggregations and analytics that I need to?” And that's really kind of where I started to hone in this love and desire to make these things a whole lot more accessible, make these systems as accessible as possible. 
Fast forward a bit and I've used a wide range of technologies, spent a lot of time playing with open source tech, specifically around this, Solr back in the day. And these days I spend a whole lot of time using Elasticsearch which is kind of the core of my company's platform, ClearQuery. And that’s kind of the beginnings of my origin story, happy to dive in where it makes sense though.

CW I feel like being bought at that time by Oracle is quite the jumping off point.

TT Yeah, it was a great jumping off point and it was a really interesting time, because this was actually my first job right out of school.

CW Pretty nice.

TT We got bought very quickly, and it was a nice bit of change. But I was very into working for a smaller company, so I wound up shifting around very quickly after that acquisition occurred.

RD So you spent a lot of time in search. What interests you about search and search algorithms and all that?

TT Yeah. I think for me, I have a knack for data. I am constantly, as I kind of roam through the world, looking at what types of data you can be gathering on an individual and how that data can be used to get at a person, whether that's through advertising, marketing, or even from a cybersecurity standpoint. And that was kind of another big side passion of mine and still is– looking at how data can be used for both beneficial purposes but also nefarious purposes, and then how do we defend against that and how do we operate in this world where so much data is being collected about us, even from your phone. I joke all the time with my friends that Instagram has my number because I get the targeted ads and I think I buy more things from Instagram than I buy from anywhere else. They've really honed in, “Hey, if we ship this ad to Tim, he's probably going to buy it or we at least have a highly likely shot of him looking at it versus other platforms in general.” So it's kind of a very interesting thing, and I know this, I know what those behaviors are, and sometimes I even try to do something weird just to defeat the system. But these systems and the algorithms are hyper-interesting. So the thing that's interesting to me is really more about how we can use data both for good and bad, just because it's hyper-interesting to me and how we dive in.

BP Right. Not to do a shameless plug for our own editorial, but we published a piece yesterday about how Stack Overflow is evolving from being a purely lexical search company, and we also, as you mentioned, worked with Elastic to try to do a hybrid of semantic and lexical search. And hybrid in that sense meaning, where we can, looking to semantic because it can have some advantages if you want to start to incorporate some of what you're doing with Gen AI. What’s your perspective on some of the changes that have been happening in the search world recently? Even at the major players like Google, 10 blue links may not be the future. As they said at I/O, they're thinking about other ways to show people information.

TT I tend to think these days that I'm glad everybody is starting to catch up and get to this area now, because one of the things that we've been working on for a while is this conversational analytics. When people are searching for data, they're not just searching for data. There was a stat a while back about how in the workplace people spend 29% of their workweek just searching for the data that they need, and that's before they even get to drilling into the actual answers. How do I find the record or the document that's going to have what I need, but now I've got to go and drill into that and find the answer or get the value that I need out of it. So I am glad to see all of these things start to change, and you even see them in Google today. As you search, it pops up with questions that people may have wanted to ask and the raw answers where it's extracting that. So I love this concept and it's actually a thing that I'm very passionate about, myself. I think semantic layers are things that have been around for a while, they've become a lot more efficient, the models have become a lot easier to deploy for these types of things. We still have the scalability issue of training on the right types of data so that you can find things at a relatively cheap cost, but if you can leverage existing models, and this is one of the things that OpenAI did with the launch of ChatGPT, we now see a lot more consumers getting into this space and understanding, “Oh, here's some interesting ways that I can come and actually ask questions and get answers to data.” So yes, I do think we're going to move away from just search to answer finding, and that semantic layer is going to be a key piece of that. I'd also say that we've seen a pretty big rise in the last several months of vector databases. 
That becomes a huge piece of all of this for how we find things that are similar to what we're looking for to actually dive in in a very effective and intuitive way from an engineering standpoint.
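
As a rough illustration of the "find things that are similar" idea behind vector search, here is a minimal sketch in Python. It is not how ClearQuery, Elasticsearch, or any particular vector database works internally: the three-number "embeddings" and document IDs are made up for illustration, and a real system would get its vectors from a trained model and use an approximate index rather than scoring every document.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest(query_vec, index, k=2):
    """Return the k (doc_id, score) pairs most similar to query_vec, best first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical hand-written vectors standing in for model-generated embeddings.
TOY_INDEX = {
    "sort-without-array": [0.9, 0.1, 0.0],
    "null-pointer-error": [0.1, 0.9, 0.1],
    "order-four-values": [0.8, 0.2, 0.1],
}

if __name__ == "__main__":
    query = [0.85, 0.15, 0.05]  # hypothetical embedding of the user's question
    for doc_id, score in nearest(query, TOY_INDEX):
        print(f"{doc_id}: {score:.3f}")
```

The two sorting-related documents come back ahead of the unrelated one even though no keywords are compared, which is the property that makes this kind of lookup useful for questions phrased in natural language.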

BP Yeah, that was one of the more interesting things that came out of talking to our engineers, that there are situations in which a vector database and semantic search is way more useful, and there are, often in a Stack Overflow context when somebody is dropping three keywords or the exact text of an error message, where lexical wins out. And so it's not like either/or is better, they each have their own advantages. But I'll have to talk to you after the podcast about how much time you said people spend each week looking for answers. We need you to come do some plugs for Stack Overflow for Teams for our knowledge base. That's exactly what we need, to talk about how frustrated people are looking for the right knowledge.
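
One common way to let lexical and semantic results each contribute is to merge the two ranked lists with reciprocal rank fusion. The sketch below is an assumption-laden illustration, not a description of how Stack Overflow or Elastic combine results: the document IDs are hypothetical, the two input rankings are hard-coded rather than produced by real search engines, and `k=60` is simply a commonly used smoothing constant.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of doc IDs into one list, best first.

    Each document earns 1 / (k + rank) from every list it appears in,
    so a document ranked well by both searches rises to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


if __name__ == "__main__":
    # Hypothetical results: lexical search favors the exact error text,
    # semantic search favors conceptually related posts.
    lexical = ["q-exact-error", "q-stack-trace", "q-related-concept"]
    semantic = ["q-related-concept", "q-exact-error", "q-paraphrase"]
    print(reciprocal_rank_fusion([lexical, semantic]))
```

Because the fusion uses only rank positions, it does not need the lexical and vector scores to be on the same scale, which is one reason this approach is popular for hybrid search.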

CW It reminds me, something I've noticed in Discord channels and amongst different chat applications where I talk with friends, people have started adding keywords around links that they share and stuff because they know that they're going to be searching for it later. They'll be just like, “Oh, this is the one where it touches on Twitter turning into X and how it affects Google search results,” or something like that. And they add all those keywords so that they can look for it later.

TT It's interesting because it makes me think about how if you think about metadata for search results and things like that, that metadata can be used for that semantic layer for that future searching, and especially with Stack Overflow, avid user, and I think every developer in the world will tell you that. But especially when you're thinking about Stack Overflow, you're looking for an answer, and usually it's because I ran into this weird error message, but every now and then it's because you've hit this weird edge case, and those weird edge cases are always the hardest ones to find. I know that I've responded to a couple of posts before, or I've even actually authored a couple of posts on Stack Overflow where it was something so obscure that I was like, “You know what? I've been here before. I couldn't find the answer, but I found a solution on my own. Let me post what I did in case somebody happens to come across this, in case they happen to see this thing.” Because it's so frustrating when you can't find it.

BP That's truly when you're getting karma. That's when you're adding to the universe a little bit of knowledge that wasn't out there or wasn't public. And I do think it's interesting –we'll see the next couple of months as some of our messaging evolves– there’s all this hullabaloo about Stack Overflow traffic, a lot of which is false. But I think there may be an interesting situation in which the edge cases, the new questions, the things that have to do with languages and technologies that are evolving, those are always the kinds of questions we wanted on Stack Overflow, not the duplicate question about homework or something that's been asked a million times and you didn't find it. And so in some ways semantic search almost solves a certain problem, which is like, “If there is an answer, it'll be easier to find it and to understand it. You'll get it back in this conversational format. And for the edge cases or the undiscovered problems, the things that are brand new, now we can spend more time focusing on that.”

TT Absolutely. Also it has me wondering, because if you think about the number of answers you get on a given post, sometimes it'll be a simple question and you've got the right selected answer, but then there's all the other answers of, “Here's another way to do it, or here's another way to optimize the solution,” or years later, “That worked a while back, but this is actually the more relevant solution now.” I also look at that and think about Gen AI and some of these things. Could that data be used for inventing new and creative solutions in a Gen AI fashion? That becomes a hyper-interesting potential application of that type of data.

BP Oh yeah, for sure. We chatted a little bit about this and we published a piece called Self-Healing Code, but the idea that the system is scanning the answers every couple of weeks, and if something is out of date or there's a better answer that it finds somewhere else, it can go in and do that process of updating or pointing to the right thing, whereas a lot of times people do have frustrations with finding an accepted answer that's stale and three answers down is the one that's like, “Wait! As of version 3.4 you actually need to do this.”

TT Absolutely.

RD So you're a big data nerd, that's how you got into search, is that right?

TT That's right.

RD Yeah. So you want to help organizations do more with their data. What's the more that organizations can do? Because I know a lot of people are treating data as the new oil.

TT Yep. And that's been the old saying, and it's a really interesting saying. I've given a couple of talks on this before, but if you really look at it, people are collecting mass amounts of data and everyone claims they want to be data-driven but no one's actually using that data to make real decisions. And when they are actually trying to use that data, it's usually hard for them to get from this raw data to an accurate business decision. And a lot of that boils down to this big gap you have in who has access to the data and who has access to ask the right questions of that data. So you'll have these subject matter experts that know the business, know how to operate, know what they’re doing, and then you've got your team of data scientists or your team of data engineers that don't always have the business context. And that's one of the things that, myself, my team, we've all been very keen that every time we get involved in an engagement, I need to understand your business before I even start talking tech, before I even start coming up with a solution, because if I don't understand your business I'm going to build the wrong solution. So what we're trying to do with ClearQuery is really help democratize this ability for people to ask the right questions of their data so that you really get the subject matter experts asking those questions instead of having to go through a team of engineers to go and build those things. What that does for us, and this is really why we started the company, myself and my co-founder found ourselves in this role where I was literally playing the middleman. We had analysts coming to us asking questions. I’d go run queries against a supercomputer, massage that data, come back, get them answers, and it became this repetitive cycle of answering a lot of the same questions, and I think every developer listening to this will tell you the same thing– we are all lazy as hell. If I have to do something more than once, I'm going to automate it. 
I'm going to write a script and I'm going to use that script from now until the end of eternity to not have to do that thing again. But that was the thing we kept running into, and even with these scripts, it was great that we could do that but I still had to be the one to go and press those buttons. So we really took this big step back and said, “Well, what if we didn't have to be in the middle? What if people could get the simple answers to the simple things that they want to know very rapidly on their own, as simple as asking questions in natural language?” Think of it as Siri meets your data type thing, and this is where we are now with Gen AI. Everyone is starting to see it a lot more than it's been seen before, but this is where we started six years ago with, “How do we enable this conversational analytics thing for non-technical users?” And what that does for us as engineers is it frees our time up so that we can go work on harder, more interesting problems and find more creative ways to leverage that data to drive things forward overall.

RD Yeah, that seems like the path for a lot of things in computer science; you just get rid of the boilerplate. I think it's what we do with Stack Overflow and Stack Overflow for Teams. It's like, “Somebody has answered this question somewhere. Make it searchable.”

TT Absolutely. And this is the other thing, once you get data into the hands of the right people, humans can be a lot more creative. I think there's always been this constant fear of, “Oh my God. Computers are going to take my job, machines are going to take it away. If they can do these things, what am I going to do?” Well, what it does is it frees you up to be more creative and have a little bit more intuition about what it actually means. One of the things that I always see is that people tend to think correlation equals causation, and that's not necessarily true. You need a subject matter expert to really understand what's going on there to really determine what this correlation actually means. Is this the cause or is there something else that we can drill into? And if you give that to the right people, they can really help find that a bit more and move that along the way. So I really think democratizing analytics, democratizing data in general, just gives us more room to be creative at a faster pace so that we can get to those answers faster.

BP Right. And I think another big thing we're thinking about at Stack Overflow, which we talked about in an announcement at We Are Developers recently, is the accessibility of the data as it pertains to the training of AI systems. If the AI is training on Wikipedia or Stack Overflow or Reddit, now people are becoming concerned. Should that be something we're charging for? Is this actually licensed for you to ingest the data, learn the AI, and then charge for what the AI puts out? And I think, to your point, Tim, our ethos has been that the data needs to continue to be public like Stack Overflow data dumps have always been, and if you're going to get an answer from the AI, you should be able to understand the attribution. Who put this knowledge in there, where did it come from, can you cite the source, can I go look at the post and see how this code is licensed, et cetera? That prevents this kind of black box in which we might end up with less and less training data. More and more the AI is just telling people what they need to know, and less and less original material, like you pointed out, your edge cases, are being created, and that could be a tragedy of the commons.

TT Yeah, that's an interesting point and it almost raises the question of whether we wind up with new licenses for data in a weird way where, kind of like you just mentioned here with this attribution piece, yes, you can use these Stack Overflow posts for training, but you have to cite which posts, or which authors, were behind the content that this AI generated, which also means we now have to build AI that isn't so much of a black box. We have to understand exactly what it's doing and how it's getting there, and that's a big gap that I think we've had for the last 10 years of this development in machine learning and AI algorithms.

RD I mean, if I remember correctly, I think Stack Overflow data is Creative Commons version 4, which requires attribution, it just hasn’t been.

TT Gotcha.

CW I think we're seeing that with a lot of AI tools where people are, for example, using GitHub Copilot and they're like, “What license do I have with this? Am I allowed to just use this, or what should I do?” And I feel like that is a very big thing that businesses in particular have to figure out with what we are allowed to use in our paid products.

TT Absolutely. That is such a nightmare. Licensing is always terrible. Every time you pull in an open source product, one of the first things we have to look at is what license it is under before we even go down the path of seeing if it would work, because we sell a paid product, and if we're using something that doesn't have a permissive license, then we need to go find another solution or create something else ourselves, which is always a big challenge in and of itself.

CW Yeah, it gets hairy.

TT Very hairy.

BP So as you look out, are there things that you're working on either on the search side, the AI side, the data side, that you're excited about for the next year? What are you hoping to accomplish over the next year? And if there are ways that people can come check out some of that work, where should they go?

TT Yeah, absolutely. So a couple things that we're working on and we are looking at are how we can leverage language models to make our stuff more effective. One of the biggest questions that we always get is, “How well do you handle unstructured text?” And our answer is, “Look, unstructured text is great. We can bring unstructured text in. However, we're really going to need to do some data engineering work on that to do entity extraction and identify core concepts and themes so that you can do analytics on it.” We're starting to use language models now to do that automatically on the fly for bodies of text for users so that we're going to automatically generate that so you can start asking questions. But one of the other ways that we're starting to use language models is for that standard question answering. So if you're asking a question that is not necessarily an analytic-based question but, “I'm looking for a pure fact that's buried somewhere here in this data. How do I get that Q&A response back from ClearQuery in a more effective way?” So those are some of the things that I'm really excited about. I'd also say one of the other big ones is really this decision intelligence thing, so how do we help you optimize the decisions that you're making? How do we help you determine, “I want to reach a certain goal, I want to increase our revenue by 30%. What things should I change to make sure that we're able to do that? Do I need to sell more extra large t-shirts or is it that people like the ones that have a SpaceX X on it or something like that, or they love the Twitter Blue logo. Whatever that is, how do I make the right decisions? How do I change the right factors in my business to impact and get me towards that particular goal?” So those are some of the big things that we're looking at in the next quarter or two that I'm really excited about launching in particular. One of the great things about ClearQuery is that we have a free tier. 
If you go to clearquery.io, you can sign up right there. That free tier is purely a free tier. It has all of our features, the only limitation is how much data you can put in and the number of users that can touch it. But other than that, you get access to every capability and you can try it out there. And then we have a variety of tiers on our SaaS version. We also deploy behind firewall, and that's really where the bread and butter of our business comes from– how do you deploy this behind firewall so it's safe and secure and we're not using third party APIs and having to ship off our data? That's been one of the key things for us, and I think differentiates us a lot in this market space. We've been doing this conversational AI thing for a while and I’ve started to see some of our competitors start to add it in, but they're using OpenAI APIs which means that my data's getting shipped off, questions are getting shipped somewhere else, and sometimes that's more damaging than even the data itself because now you're giving up some of your competitive advantage. So a couple of different ways for you to try that out. Definitely happy to chat with anyone if they're interested in exploring that further.

CW Awesome.

BP Terrific.

[music plays]

BP All right, everybody. It is that time of the show. Let's shout out the winner of a Lifeboat Badge: someone who came on Stack Overflow, answered a question, saved a little knowledge from the dustbin of history. Awarded two days ago to Boann, “How do I sort four numbers without an array?” If you're interested, we've got an answer for you. And thanks to Boann, you've helped over 23,000 people. I am Ben Popper, the Director of Content here at Stack Overflow. You can always find me on X @BenPopper. You can always email us, podcast@stackoverflow.com. And if you like the show, leave us a rating and a review, because it really helps.

RD I'm Ryan Donovan. I edit the blog here at Stack Overflow. You can find it at stackoverflow.blog. And if you want to reach out to me, I probably still check my DMs on X.

CW My name is Cassidy. You can find me @Cassidoo on most things, and I'm the CTO over at Contenda.

TT My name's Tim Tutt. I am the CEO and co-founder at Night Shift Development. Like I said, if you're interested in checking out ClearQuery, www.clearquery.io. You can find me @TimTutt on most things, with the exception of X where I am still @TimFTutt. I have not been able to steal the original yet, but everywhere else you can find me @TimTutt.

BP All right, everybody. Thanks for listening.

[outro music plays]