The Stack Overflow Podcast

Information foraging: the tricks great developers use to find solutions

Episode Summary

We chat with Austin Henley, assistant professor of computer science at the University of Tennessee in Knoxville. Prof. Henley has been studying the ways in which developers seek out the information they need to solve problems, debug code, or write new applications.

Episode Notes

You can check out more of Henley's work on his blog, austinhenley.com.

How much time does the average developer spend typing in their editor versus researching, exploring, and pondering? Henley believes that 15 to 30 minutes of actually editing code a day counts as productive, despite what you've heard about the 10X developer in your area.

Episode Transcription

Austin Henley There are some data sets out there of, like, basically keylog data from developers. And you know, there's really not much editing going on for most developers. We hear about these 10x developers that are just extremely productive. But it seems like if you have 15 to 30 minutes of actual editing in a day, you're actually being pretty productive, just because, like you said, you're spending so much time tracking down a bug, trying to understand whether you're going to break something else, or, you know, being interrupted and such.

[intro music]

Ben Popper This episode is sponsored by CircleCI. Designed for modern software teams, CircleCI's continuous integration and delivery platform helps developers push code with confidence. Trusted by thousands of companies, from four-person startups to Fortune 500 businesses, CircleCI helps teams take their software from idea to delivery quickly, safely, and at scale. Visit circleci.com/overflow to learn why high-performing DevOps teams use CircleCI to automate and accelerate their CI/CD pipelines.

BP Hello, everybody! Welcome to the Stack Overflow Podcast, a place to talk all things software technology, knowledge, question and answer, sharing community online content platform creator. Did I get all the buzzwords? I got most of them. I am Ben Popper, Director of Content here at Stack Overflow. And I'm joined today by my colleague, Ryan Donovan. Hi, Ryan.

Ryan Donovan Hey, Ben, how you doing? 

BP Good. Thanks. I think you're the one to set the stage today. Our episode today is about information foraging. People do a lot of that on Stack Overflow: they forage, they copy, they paste, they prosper. How did you come across this topic? And how does it connect to the guest we have on today?

RD So I think one of our engineers posted this in one of our Slack channels. It's a really interesting way to look at how engineers go about getting information out of whatever tool, whatever website they have. I've definitely, you know, when looking for information on Stack Overflow, treated it like a hunting ground. So I'm interested to see how that applies to us, you know, as a budding narcissist.

BP Hunting, not foraging, hunting and gathering? Little bit of both.

RD Information predators.

BP Exactly. So our guest today is Austin Henley, a Professor of Computer Science at the University of Tennessee. Welcome, Austin.

AH Hello. Thank you for having me.

BP Oh, yeah, definitely. So Austin, I guess, yeah, tell us a little bit about how you got into computer science and into, you know, the world of academia, teaching bright young minds about this stuff. And then where the idea for information foraging came from, and to what degree that's, I guess, a focus for you when it comes to research.

AH Yeah, so I got into computer science quite young. I just wanted to make websites for my favorite video games and TV shows. I was quite young, and actually really, really had a hard time learning even HTML. And so I started diving in headfirst by trying to build my own compiler. And so, like, you know, that took me eight years, maybe, but I learned a lot along the way. And yeah, I really got into computer science. And then eventually, of course, I went to college for it. And at graduation, I didn't know what to do, like most people, so I went to grad school. And I was always really into productivity and decision making, and how to just optimize everything, even, you know, what's the fewest number of clicks on the microwave that I can do to get the desired outcome? [Ben chuckles] I started getting into research and started collaborating with some people who were working on this theory called information foraging theory. And they were applying it to understand how software developers make decisions and how they find information, whether it be on Stack Overflow, inside of their code, or inside of their Slack channel: just all the decision making that goes into finding the relevant information.

BP So just to back up, I want to know what games and TV shows you built websites for, if you ever got that far.

AH Oh, no. [Austin laughs] Yeah, so the very first one, I think it was in fifth grade, was Dragon Ball Z. So this was, I guess, I don't know, 1998 or so, and Dragon Ball Z and Pokémon were hitting the US and were quite popular.

BP It's interesting. We had a guest on the other week who's doing some work on community building tools, and he was saying that his entree into the world of technology was a forum for Tony Hawk's Pro Skater 4; that was his first experience doing online community management. So I think for a lot of people who grew up in the late '80s and into the '90s, you know, these were the things you were passionate about, the things you did with your friends, the things that were social. And then that was an on-ramp if you wanted to go deeper and learn about technology. So obviously, yeah, we care a lot about productivity here, and knowledge sharing. You know, we have Stack Overflow, the public platform, and we also have Stack Overflow for Teams, to let people do that inside of a company with their proprietary code. What are some of the things you've discovered when it comes to information foraging? Are there approaches to it that work or don't work? Are there ways that people in the world of engineering seem to do it that are different from other careers?

AH Yeah, so we really studied it in the context of source code. Developers spend a large portion of their time just looking for information in code or in documentation. And so we were trying to model how they go about doing that. When, say, they're on Stack Overflow and they see a couple of search results that are relevant to them, how do they decide to go to one over the other? You know, are they gonna search all of it? That's probably not going to happen; you'll spend forever, and you get into the madness of opening too many tabs, and so forth. So yeah, we ran a lot of studies where we brought professional engineers into the lab and had them debug open source code. And we would try to tease out the decision making that's going on while they're looking at code or documentation. One of the things we would do is actually play back the video of them working: we would jump to certain points in the video and ask them, what were you looking for at this point in time? And we'd try to get the rationale as to, you know, why did they click on this hyperlink instead of this other one? And then afterwards, we would ask them, did you find what you thought you were going to find at that location? One of the really big results that we found was that developers are often very confident about what they're going to find with their next click, but they oftentimes are led down the wrong path. [Ben laughs] And then they're actually disappointed.

BP We know this, because we're the... what's that old purple link thing, Ryan?

RD Yeah, that's something we did with last year's developer survey, where people would search and find that the link was already purple: they'd already clicked on it.

BP Yeah, 25% of the time, the link that they felt best about was one that they had written themselves and had already forgotten the answer to.

AH Yeah. Yeah. That's a good proxy.

RD Not finding the information that they thought they would: is this a question of, like, you know, it being open source code that they weren't familiar with?

AH Yeah, it certainly can be. Studies have been done on developers working on their own code, and on, you know, code that they're not familiar with, and they kind of run into different issues depending on that. But it's really hard to remember what does what in code and where it is, especially when you're working on large code bases and, you know, teammates have been editing the code. And so there's this idea called information scent, which is kind of the clue of what you're going to get when you click on something. So on Stack Overflow, if you go to the homepage and see the listing of questions, the clues that you get are the title of the question and the tags. And, you know, there are some other helpful things, like the number of votes and number of answers. So the scent comes from, you know, the text, the keywords that are in that title, and you have to make a decision: is this relevant to me, yes or no, based on that. A lot of times that's all you need; you click on it and you find what you want. But there's also a huge problem that we've all run into, where we click on something, start reading the question, and realize no, this is the same area, but it solves some other problem, and you have to start backtracking. And that's when a lot of problems arise.
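As a rough illustration (a sketch under stated assumptions, not something from the episode), information scent can be pictured as a score for each search listing based only on the cues visible before clicking: the title and the tags. The keyword-overlap score below is an assumption chosen for simplicity; it is not the model used in Henley's research, and `Result`, `scent`, and `rank_by_scent` are hypothetical names.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Result:
    """A search-result listing: the cues a developer sees before clicking."""
    title: str
    tags: list[str] = field(default_factory=list)


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def scent(query: str, result: Result) -> float:
    """Toy scent score: fraction of query words that appear in the title or tags."""
    query_words = tokenize(query)
    if not query_words:
        return 0.0
    cue_words = tokenize(result.title) | tokenize(" ".join(result.tags))
    return len(query_words & cue_words) / len(query_words)


def rank_by_scent(query: str, results: list[Result]) -> list[Result]:
    """Order listings from strongest to weakest scent, as a forager might."""
    return sorted(results, key=lambda r: scent(query, r), reverse=True)


if __name__ == "__main__":
    listings = [
        Result("Destroying GUI widgets safely", ["tkinter"]),
        Result("How do I clear a buffer in Python?", ["python", "buffer"]),
    ]
    for r in rank_by_scent("clear buffer python", listings):
        print(f"{scent('clear buffer python', r):.2f}  {r.title}")
```

A listing that reuses the searcher's exact words wins, while one that might be relevant but is worded differently scores zero, which is where the synonym trouble discussed next comes in.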

RD Is this foraging a skill that people can build, like getting better at recognizing which titles, which tags, which pieces of information are relevant?

AH Yeah, that's a really good point, because we've interviewed a lot of developers, and every developer kind of has a different strategy that they use for foraging. And some of them are quite elaborate about how they go about it and what details they start picking up on. So one of the really big problems is what some people call the vocabulary problem. You know, it's really hard to come up with good descriptive variable names and function names, especially in certain domains. I see this a lot in GUIs, graphical user interfaces: there are so many layers of abstraction that the names are just so meaningless to us. And so we start using a lot of synonyms, and the English meaning of those words is probably very different from their technical meaning. And so people have a hard time differentiating them when they're looking for answers or looking for the code that's relevant. In one case we saw, developers were looking for a way to, like, clear a buffer. So they're looking for remove, but then they would see a word like destroy, and remove and destroy are pretty similar. So they start going down this rabbit hole: they see this kill object function, so they go down that path. And yeah, they really, really had a lot of issues with synonyms. Some of the more senior developers had some really good strategies where they basically don't trust the names, and they rely on some other methods. So I definitely think it's a skill that can be learned. I've tried to figure out how to teach undergrads these skills, and I haven't figured out how to do so yet.
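To make the vocabulary problem concrete, here is a hypothetical sketch of name-based code search. It splits identifiers such as `destroyBuffer` and `kill_object` into words and matches a query either literally or through a small hand-written synonym table (the table and function names are assumptions for illustration; a real code base would need its own vocabulary). Literal matching misses relevant names, while synonym expansion pulls in more candidates, some of which may be the wrong path, which is one reason names alone can't be trusted.

```python
import re

# Hypothetical synonym table for illustration; a real project would need its own.
SYNONYMS = {
    "remove": {"remove", "delete", "destroy", "kill", "clear", "erase"},
}


def split_identifier(name: str) -> list[str]:
    """Split camelCase, PascalCase, and snake_case identifiers into lowercase words."""
    words = []
    for part in re.split(r"[_\W]+", name):
        # Match capitalized words, all-caps acronyms, lowercase runs, or digits.
        words.extend(re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", part))
    return [word.lower() for word in words]


def find_candidates(query: str, identifiers: list[str], expand: bool = False) -> list[str]:
    """Return identifiers containing the query word, or any synonym if expand=True."""
    query = query.lower()
    targets = SYNONYMS.get(query, {query}) if expand else {query}
    return [name for name in identifiers if targets & set(split_identifier(name))]


if __name__ == "__main__":
    names = ["removeItem", "destroyBuffer", "kill_object", "clearBuffer", "renderFrame"]
    print(find_candidates("remove", names))               # literal match only
    print(find_candidates("remove", names, expand=True))  # synonyms widen the net
```

The expanded search finds `clearBuffer`, but it also surfaces `kill_object`, the kind of plausible-looking lead that sent developers down the wrong path in the studies described above.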

RD I think a lot of companies have their own sort of naming convention style guides and such, but it's not consistent across companies. So, like you said, you know, destroy, remove, erase: which one am I looking for?

BP So, yeah, have you classified them? I mean, you're saying you've tried to sort of impart this to undergrads, but are there two, three, four important styles of foragers, and do some prove to be more effective than others in the sort of simulations you ran?

AH Certainly. There's a lot of evidence that novices take a bottom-up approach, whereas more senior experts take a top-down one. So novices, like students, will often jump into the weeds of the code and start reading it line by line, which is just not very effective for getting a mental model of how some code is organized. Experts will look at it at a higher level; they have a lot of preconceived ideas, like design patterns and architectures that they know about. And so they can start by looking at the folder and file names to get an initial idea, and then work down towards the problem. Whereas the novices, yeah, they'll jump into the main function and start stepping from there, even though, you know, the bug in question might be 100,000 lines of code away.
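A minimal sketch of the top-down first step described above, assuming a local checkout of the code: list the folder and file names a couple of levels deep before opening any single file. The `outline` function, the depth limit, and skipping hidden entries are illustrative choices, not a tool mentioned in the episode.

```python
from pathlib import Path


def outline(root: Path, max_depth: int = 2) -> list[str]:
    """Return an indented outline of folders and files, at most max_depth levels deep.

    Hidden entries (names starting with '.') are skipped to reduce noise.
    """
    lines = []

    def walk(directory: Path, depth: int) -> None:
        if depth > max_depth:
            return
        entries = sorted(
            (p for p in directory.iterdir() if not p.name.startswith(".")),
            key=lambda p: (p.is_file(), p.name),  # folders first, then files
        )
        for entry in entries:
            suffix = "/" if entry.is_dir() else ""
            lines.append("  " * (depth - 1) + entry.name + suffix)
            if entry.is_dir():
                walk(entry, depth + 1)

    walk(root, 1)
    return lines


if __name__ == "__main__":
    print("\n".join(outline(Path("."), max_depth=2)))
```

Starting from this outline mirrors the expert habit of forming a hypothesis from the structure and names first, then drilling down, rather than stepping through `main` line by line.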

BP I had another question. As you, you know, talk to these developers trying to solve a particular puzzle and unpack what the code is, maybe you're trying to get familiar with an open source project you want to work on, or you're a new joiner at a company and you have to dive into the code base and understand what all the spaghetti means. Did you get a sense from them of how long, in a typical, average, productive workday, a coder might spend typing code versus untangling or researching or pondering? I think it's such an interesting question. I've seen a couple of videos on it. There's this myth of somebody who goes in and codes for 12 hours a day and spits out, you know, what they need. But for most people, if they're being honest, it's more like, I feel good if I had a three-hour block of productivity in the day, maybe three hours in the morning or three hours in the afternoon, and the rest of the time is spent doing exactly what you're saying: poking around trying to figure things out, researching, looking at documentation, talking to somebody who was there when this was originally done. So much of the work is that archaeology and foraging, you know, that sort of unearthing work before you can go in and try to build something new.

AH Yeah, so it's really hard to get data on professional developers in the workplace, because large companies often don't like us recording this type of thing. But I have tried to get at it by talking to them. And there are some data sets out there of, like, basically keylog data from developers. And you know, there's really not much editing going on for most developers. We hear about these 10x developers that are just extremely productive. But it seems like if you have 15 to 30 minutes of actual editing in a day, you're actually being pretty productive, just because, like you said, you're spending so much time tracking down a bug, trying to understand whether you're going to break something else, or, you know, being interrupted and such.

BP The four-day workweek is real, y'all. It's real. [Ryan laughs]

RD It's just spread across five days. So I was curious, also reading this: Stack Overflow is an information source, so I'm curious if, on the fly, you can talk about how information foraging theory applies to Stack Overflow.

AH Sure, yeah, it definitely applies. So information foraging theory came out of Xerox PARC back in the '90s. They were looking at web search results, what makes people click on a certain link over another, and trying to see whether they could predict exactly what trail or path someone is going to take through a website to answer some question. And so Stack Overflow is actually kind of the ideal scenario to study that sort of thing. I haven't worked on it directly, but Stack Overflow actually already uses a lot of the design ideas that have come out of information foraging theory. And I can actually give you an example of where I think information foraging theory could help me use Stack Overflow even more effectively.

RD Yeah, that'd be great. 

AH So I use Stack Overflow a lot for my problems. Here's a recent one from a couple weeks ago: I was trying to install React Native on my new laptop. I got a new work machine and was trying to set up my dev environment. Usually it's pretty easy to install React Native, but at every step I was running into a bug or some kind of error. So I would just copy and paste the error from my terminal into a search engine and see whether Stack Overflow could help me, and I had to do this at every single step. I was just working through the Getting Started guide for React Native, and at every single step I would have multiple errors. I'd fix one and then I'd get another, and I would go to Stack Overflow. It would solve the problem, but it was this back and forth. And I've seen this in other domains too. So one idea that comes from information foraging is this: Stack Overflow is solving the immediate problem that I have, but what it's missing is that I'm probably not the only person who's had to go through this sequence of related problems. And so something that I think could help me is if Stack Overflow, using its data, were able to say, you know, 60% of the people who had the problem you're looking at also had this other problem. Right now, the sidebar contains recommendations for questions and answers that are very similar to the one that I'm looking at: maybe alternate perspectives, or just duplicates. But what it doesn't get at is what problem am I going to have in five minutes?
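The "60% of the people who had this problem also had this other problem" idea can be sketched as simple co-occurrence counting over visit sessions. Everything here is hypothetical: the session data, the problem labels, and `next_problems` are made up for illustration, and this is not how Stack Overflow's sidebar actually works.

```python
from collections import Counter

# Hypothetical visit sessions: each list is one developer's problems, in the order they hit them.
SESSIONS = [
    ["install-cli-error", "android-sdk-not-found", "metro-port-in-use"],
    ["install-cli-error", "android-sdk-not-found"],
    ["install-cli-error", "emulator-not-starting"],
    ["install-cli-error", "android-sdk-not-found", "emulator-not-starting"],
    ["metro-port-in-use"],
]


def next_problems(sessions: list[list[str]], current: str, top_n: int = 3) -> list[tuple[str, float]]:
    """Rank the problems people ran into after `current`.

    Returns (problem, share) pairs, where share is the fraction of sessions
    containing `current` that later visited that problem.
    """
    with_current = [s for s in sessions if current in s]
    if not with_current:
        return []
    later = Counter()
    for session in with_current:
        after = session[session.index(current) + 1:]
        # Count each follow-up problem at most once per session.
        later.update(set(after) - {current})
    total = len(with_current)
    return [(problem, count / total) for problem, count in later.most_common(top_n)]


if __name__ == "__main__":
    for problem, share in next_problems(SESSIONS, "install-cli-error"):
        print(f"{share:.0%} of people with this problem also hit: {problem}")
```

Counting each follow-up at most once per session keeps the percentages readable as a share of people, and looking only at problems visited after the current one matches the "what problem am I going to have in five minutes" framing rather than plain similarity.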

BP You want the nested thread of what's been done before.

AH Yes!

RD Oh, I think our private Teams product has something like that with Collections. But it would definitely be interesting to have something like that on the public site.

AH Yeah. It's almost like a playlist.

BP Yeah, exactly. It could say, like, you know, lots of people have been having issues with this when it comes to configuring proxy servers; here are the six questions that people turn to most often when they're going through this stuff. You kind of get a better idea from that. Or actually, I think the one you brought up is an even better example, because this is typically the most popular question on an internal Stack Overflow Q&A: how do I set up my developer environment? And then everything that, you know, stems from there.

RD Right.

AH Yeah, so that idea comes from a problem we found in our studies of information foraging. We call it the scaling-up problem, where tools often help you get one step away from the information you're looking for, but they're kind of stuck in the weeds: they don't help you get to that higher-level problem, or to some problem that's five or six steps away.

[music]

BP Alright, today we're going to mix things up. We're going to shout out a Necromancer badge, awarded for answering a question more than 60 days after it was asked, with a score of five or more. So, raising a question from the dead. Awarded 42 minutes ago to conmak for an answer to "Creating a dictionary from a csv file?" Thanks, conmak. Alright, everybody, we're gonna say who we are and where we can be found on the internet, if we want to be found. I am Ben Popper, Director of Content here at Stack Overflow. You can always find me on Twitter @BenPopper. You can always email us at podcast@stackoverflow.com, and if you liked the show, please do leave a rating and review. It really helps. Austin, who are you and where can folks find you on the interwebs?

AH I'm Austin Henley, professor at the University of Tennessee. You can find me on Twitter, where my handle is @AustinZHenley, or on my website, austinhenley.com.

RD I'm Ryan Donovan, editor of the blog and newsletter. I lurk on Twitter at @RThorDonovan. And if you have blog post ideas, you can always email me at pitches@stackoverflow.com.

BP If you want to know more about Ryan, don't bother with Twitter; just go to our blog and look up his byline. There's lots of good stuff there. Well, thanks, everybody, for coming on. I'm gonna go forage. I've got about six acres here, so: berries, mushrooms, maybe a wild turkey. We'll see.

RD Delicious!

[outro music]