Research scientist John Flournoy sits down with Ryan and Eira to dive into the recent research around developer experience, including the nuances of measuring productivity, the potential reasons for variability in developer performance, and the impacts of collaboration and competition on developer efficiency.
Read more findings from the Developer Success Lab here.
Find John at his website and connect with him on LinkedIn.
Congrats to user Matthieu M. who won a Populist badge for answering the question struct with 2 cells vs std::pair?.
[intro music plays]
Ryan Donovan: Hello everyone and welcome to the Stack Overflow Podcast, a place to talk all things software and technology. I am Ryan Donovan, and I'm joined today by my co-host Eira May. How are you doing Eira?
Eira May: Not too bad. Not too bad.
RD: Yeah. So today we are gonna be talking about developer productivity and experience, but backed by science, by research. This is a guest you brought in. Can you give us a little introduction?
EM: Yeah. So John Flournoy is a social science researcher and consultant. He's here to talk with us about some of the research that he's been doing with the Developer Success Lab at Pluralsight and some of the takeaways from that research for developers and for their managers.
RD: Excellent. Well, John, welcome to the show -
John Flournoy: Hi. Thank you so much for having me.
RD: So, top of the show, we'd like to find out how our guests got into software and technology.
JF: Yeah, well, it's kind of a roundabout story in a way. It also starts when I was pretty young. I think my dad gave me a book that had something to do with hackers, and I got really interested in shadowy characters and went to, back in the day, maybe a Borders bookstore, and stood there and read a book about how to program in C++, or maybe even just C at the time. And from there, I got more and more interested in programming, met folks in high school who were interested in programming. They happened to have a high school programming course. And I ended up going to college doing a degree in cognitive science, which you always have to do a bit of programming in. You can also, you know, emphasize in that. I learned about the previous iteration of artificial intelligence, which was much more focused on specific algorithms than on a broad, deep learning approach. But I stayed in psychology for a while. And it wasn't until I was partway through grad school that I realized that more than psychology research, I was doing a lot of what I now think of as research computing infrastructure, and that became kind of the focus of my work post-grad school, working with other scientists. And then, when I met Kat, things kind of clicked into place for me. So I'd been working in an adolescent development research lab doing statistics, and this was a neuroscience lab, so a lot of big data stuff on high-performance computing systems, using like [inaudible] batch systems… and, yeah, go ahead -
RD: Oh, I was gonna say, Kat Hicks is the Kat in this case, who's been on this program before.
JF: That's right. Yeah. Kat Hicks from the Developer Success Lab, right -
EM: Friend of the show, Kat Hicks -
JF: So we met on Twitter back in the day and she and I started to talk about building out research infrastructure in her lab. And at first I was like, well, that's interesting. Total departure from adolescent neuroscience. But you know, when I thought back to my history and love of computing, and what I had actually been doing kind of on a day-to-day level in science, I was like, ‘oh, this kind of makes sense’. I actually would love to help learn how software developers and infrastructure workers can do their jobs better and be happier.
EM: Yeah. That was one of the things that really interested me in asking you about, not to just jump right in, but I guess sort of to jump right in -
JF: Yeah -
EM: Is that Kat had mentioned that you had a background in adolescent wellbeing and health, and that that had an influence, whether on the research directions that you wanted to focus on, or on the lens through which you interpreted some of these issues. I was just curious how that kind of shapes your approach.
JF: Yeah. So my understanding of what wellness and wellbeing means changed a lot throughout my career as an adolescent developmental neuroscientist. One of the things I realized was that pinning down the meaning of that term is hard by itself. We have this sort of intuitive understanding of what wellbeing means, but when you actually try to translate that into a set of questions that you ask a population, or an objective metric like cycle time or number of risky sexual encounters for adolescents, you start to understand that things are not so cut and dried. And that was a big realization I had in terms of: we need to think deeply about what this term means. And it might mean more than one thing.
RD: Yeah. I think "what is good in life" has been a classic problem for philosophy.
JF: This is a deep and longstanding philosophical problem, yes. And it actually is directly relevant to how you actually are well at work.
EM: Well, good thing we are gonna answer it definitively on this podcast today.
JF: Absolutely. By the end of the show we should totally have that answer with us.
RD: [Laughter] Well, yeah. Today we're talking about developer happiness and performance, and you have a pre-print that you've shared with us that says that a lot of folks who are doing the sort of simple solutions, the stack ranking, the cycle time stuff, they're doing it wrong. Can you tell us about that?
JF: Yes, that is a very definitive way to put it. I would say perhaps that they're using one of the most accessible tools that they have, which is just how long it takes, you know, from ticket open to ticket close. That's what we used in this case. There are some variations on it, like the DORA metrics and so on, but that's what we used. It's really accessible. It's something that you can track over time easily. You don't even have to ask anybody about it. We didn't necessarily even start with this in mind when we started this paper, but what we ended up finding with this data is that it's quite variable. And it's difficult to take one number from that kind of metric and use it to capture even the full picture of cycle time itself, much less a developer's overall productivity. Which is another complicated term to pin down, which we might talk about later; that would be fun. So I wouldn't say they're doing it wrong. I would say it's messy and people are probably working as best they can in that messy space.
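To make that metric concrete, here is a minimal Python sketch of cycle time as ticket open to ticket close, using made-up ticket records rather than anything from the paper; the point is that the spread across tickets matters as much as the average.

    from datetime import datetime
    from statistics import mean, median, stdev

    # Hypothetical ticket records: opened/closed dates from a tracker like Jira.
    tickets = [
        {"id": "T-1", "opened": "2024-01-02", "closed": "2024-01-05"},
        {"id": "T-2", "opened": "2024-01-03", "closed": "2024-01-31"},
        {"id": "T-3", "opened": "2024-01-10", "closed": "2024-01-11"},
    ]

    def cycle_time_days(ticket):
        """Cycle time as discussed here: calendar days from ticket open to ticket close."""
        opened = datetime.fromisoformat(ticket["opened"])
        closed = datetime.fromisoformat(ticket["closed"])
        return (closed - opened).days

    times = [cycle_time_days(t) for t in tickets]
    # A single summary number hides how variable individual tickets are,
    # so report the spread alongside the average.
    print(f"mean={mean(times):.1f}d  median={median(times):.1f}d  sd={stdev(times):.1f}d")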
RD: It's one of my favorite hobbies, asking PhD researchers a yes or no question [laughter] -
JF: Absolutely. I'll try to give you some yes or no answers, but "it's messy" is always the go-to answer.
EM: You said something about people focusing on cycle time because it's an accessible metric. It's just an easy thing to grab, and it kind of intuitively seems like it should map to developer productivity. Do you think that's why people are focusing so much on that? Why that's been kind of the default? -
JF: Yeah, that is my intuition. I would say it's speculation. We should probably ask people. It's also really directly tied to something that they experience every day, which is, it feels good to close a ticket, so if you close more tickets faster, that seems like a good thing.
RD: And I think people have made adjustments to that, doing things like story points. But story points are almost a vibe-based estimation.
JF: The thing that we ended up recommending in this paper is: you have cycle time, and as we know, software development is a creative profession, albeit one shaped by its technical nature. If we can bring in some of that vibe-based intuition in order to compare like with like, so take cycle time comparisons from tickets that people have grouped, maybe, and this is where you start getting into more manual labor, which is a bit, you know, unfortunate. But if we can group things, and story points seems like a fine way to do it, if we can start comparing like to like using those vibe-based categories, it might help people get a better handle on their own throughput.
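As a rough illustration of comparing like with like, here is a small Python sketch that groups hypothetical tickets by a size label (story points, or any vibe-based category) before averaging cycle times; the labels and numbers are invented, not from the study.

    from collections import defaultdict
    from statistics import mean

    # Invented tickets, each tagged with a coarse size/category label.
    tickets = [
        {"size": "small", "cycle_days": 1},
        {"size": "small", "cycle_days": 3},
        {"size": "large", "cycle_days": 14},
        {"size": "large", "cycle_days": 21},
    ]

    by_size = defaultdict(list)
    for t in tickets:
        by_size[t["size"]].append(t["cycle_days"])

    # Comparing a "small" ticket against other "small" tickets gives a more
    # meaningful baseline than comparing it against the pooled average.
    for size, days in by_size.items():
        print(f"{size}: mean cycle time {mean(days):.1f} days across {len(days)} tickets")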
EM: Did the findings surprise you?
JF: That's a good question.
EM: I know that's a big question.
JF: Yeah. There's always something new that you learn. So when I came onto this project, it was pretty clear that a lot of folks would expect things like giving developers more time to code should allow them to close tickets faster. And in that sense, we found this effect. This is, you know, kind of a validation that the less time you spend in meetings, up to an extent perhaps, and the more time you spend coding, the more tickets you're going to close. In that sense it's a strong effect in these data, meaning that we estimate it to be very precisely positive. However, it is not a huge influence on cycle time, in part because there's a huge spread in what any one ticket might consist of. So you have that variability, and that variability doesn't go away when you start to account for the number of days people have to code. So in that sense, that was not surprising. What was a little bit surprising was the size of that effect, that it was so small and that there was so much variability, not just between people, which is actually almost entirely swamped by the amount of variation within person, month to month. So your average cycle time one month might be four weeks, and the next month it goes down to one week, just depending on the nature of the work that you're doing from week to week. And then of course, this also varies quite a bit by organization. The surprising aspect was that there was so much variability when you break things out in those different ways. We expect obviously some differences individually and day to day. The really surprising thing was how much of that was within person, I would say, where you see these huge fluctuations in a person's workflow from month to month, and that that's really the biggest source. So it's not necessarily what an organization is doing. It's not necessarily any particular aspect of a particular person. It's really driven by, and this is speculation now, it seems to be driven perhaps by the context of the work that you're doing. Either how you're doing the work changes, or the kind of work that you're doing changes. And I think that's the thing this paper provides a lot of, to me: avenues for really rich future investigation, which is, let's really get to the bottom of what is going on day to day. Let's map these processes out. Let's get in the rooms with developers and start building up a model of what this work looks like day to day, so that we can account for as many of these external factors as possible. It has to be somewhat simplified, but a much more detailed process model than I think most people are using now. If we have that kind of process model, we can start to measure and observe these effects in a way that gives us higher fidelity on what really matters, and we can start finding the pressure points in a much more refined way.
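For readers who want a feel for the within-person versus between-person distinction, here is a simple Python sketch with invented monthly averages; it is not the paper's statistical model, just an illustration of how the same developer's months can vary more than developers differ from each other.

    from statistics import mean, pvariance

    # Invented numbers: each developer's average cycle time (days) per month.
    monthly_avg_cycle_time = {
        "dev_a": [7, 28, 10, 21],   # big month-to-month swings
        "dev_b": [14, 5, 30, 9],
        "dev_c": [12, 25, 6, 18],
    }

    person_means = [mean(months) for months in monthly_avg_cycle_time.values()]

    # Between-person: how much developers' own averages differ from each other.
    between = pvariance(person_means)

    # Within-person: how much each developer fluctuates around their own average,
    # averaged across developers.
    within = mean(pvariance(months) for months in monthly_avg_cycle_time.values())

    print(f"between-person variance: {between:.1f}")
    print(f"within-person variance:  {within:.1f}")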
RD: Yeah, the background is sort of like the mysterious developer experience that a lot of people are investing in. How does your developer feel today? Are they happy? Are they sad? Was it raining? And it seems like measuring a person is, again, one of those fundamental chores of psychology, philosophy -
JF: Absolutely -
RD: And the humanities. What are the sort of things that you think you could research in a sort of quantitative way?
JF: One thing to note about these kinds of metrics where you're asking for vibes, for lack of a better word, and this is, yes, something psychologists are famous for doing, and I would say are able to do. I think that there's research that shows that people are really good, actually, at summarizing across their experiences. You see this in personality research, for example; people are really good at perceiving other people's personalities just from a snapshot of their room. They're really good and accurate at being able to know who they are from their lived experience and report on that accurately, you know, whether someone is talkative or not talkative. We're really good at summarizing over our own experiences and picking up on cues in a really quick way. So I think that's where the strength of those kinds of developer experience metrics comes from: we ourselves, as humans, are pretty good instruments. However, we have our flaws and there are difficulties. Language is notoriously elastic. You can fit a lot of different meanings into the same word. And people also have their biases. So like you mentioned, somebody might be having a bad day for other reasons, and it really influences how they see their current productivity. So you're mixing a lot of noise into that signal. The power is that people are summarizing over a lot of experience, kind of like we do in this paper using average cycle time. But there's a place for it, because it does also capture, it mixes maybe the signal of productivity and general happiness, but it's difficult to know what part of that you're moving around when you make a change in the developer's environment or when you're asking the developer to make a change in how they work. You might be pushing around something other than what you actually want to change. If you want to be changing how fast the work is done, or the quality of work, and you're using one of those developer experience metrics, it's possible that you're not pushing the things that you care about, but rather pushing around something about their perceptions of their work. Which is fine if you care about developers' happiness, which you should. Those are great things to push around too, but it would be nice to know which is which.
RD: You mentioned the DORA metrics earlier, and we chatted about this a little bit beforehand. How do those relate to cycle time? How do they compare?
JF: Yeah, so my understanding of DORA metrics, and again, just to emphasize, I've only recently come to this broader field in terms of the content of it, so I know a bit about the ways in which folks have measured productivity in the past, but I'm by no means an expert. My understanding of DORA is that there are some definitions on the dora.dev website, and people tend to change these metrics around depending on their working environment, what works for them. We're measuring ticket open to ticket close, you know, through a ticketing system like Jira. And that means that in this paper, we're actually capturing not just software development tickets, not just code-commit kinds of tickets, but other kinds of work that software developers are doing. DORA is really focused on things that are specifically about code. So that's one difference. But really they're similar in that it's the time from when you start something to when you end something. The thing that's nice about one of the DORA metrics, I think it's deployment frequency, is it asks: what's the time between when something is changed and when it gets to the actual product? In our case, we're measuring very narrowly ticket open to ticket close, but that doesn't tell us anything about the effect downstream on an actual product release.
RD: I think that's change lead time, the one you're talking about. Deployment frequency is one of 'em, but it's how often application changes go in -
JF: Mm-hmm. Yeah -
RD: It's a similar -
JF: Yeah, they're both throughput, right? And I think that's actually part of that process model I was talking about. When we talk about productivity, one reason people don't like cycle time, and I'm totally on board with this, is that it's just a small part of the whole picture, and it doesn't quite get to the user experience, which is ultimately kind of what we care about. You know, depending on who the users are. But we wanna deliver value to the people who are paying for what we're doing, right? And that is what a process model would give you. It would give you all of these steps and the extent to which each is contributing to that end goal. And I know I've talked to software developers who are thinking about this in their minds. They're like, I'm being productive in a way that I think is going to benefit the user. Whether they're more or less in conversation with the business folks, who also have a read on that, that social piece is, I think, a really interesting part of that process.
RD: Yeah. Speaking of the business results of this, another thing mentioned was something like stack ranking, which gets at the business effects of productivity measurement, things like compensation, but also the up-or-out mentalities where you cut the bottom person.
JF: We chatted a little bit about this beforehand. I mean, socially that just sounds like a recipe for miserliness. That competitive mindset is useful in some contexts, but in a day-to-day business environment, and this is primarily my opinion as a social scientist, it would be really hard socially. And we actually have some evidence from the Developer Success Lab that these kinds of competitive environments lead to all sorts of practices that are not conducive to good outcomes in terms of the product. Interestingly, one of the other things that we see in the cycle time paper is that collaboration is useful for speeding things up. We measure this by how many people are involved in a particular ticket. And when more people are involved and there's collaboration, we see that cycle time is reduced, when more people are sharing kind of the burden of a particular PR, for example. We also see, conversely, that when there are more people involved in a discussion about a particular ticket, so they're not adding code, but they're leaving comments, we see much slower cycle times. And this is an interesting conundrum, because we can't be sure what's really going on here. The story I have in my head is that when people are collaborating, again in a non-competitive environment where they're not worried about being fired 'cause they're not working on their own ticket, when people are being collaborative in terms of the code commits, this speeds up work. When they're being particularly chatty over a PR, what I think that's indicating is that there is a problem to be solved and people aren't really sure how to solve it, so they're collaborating using language in this particular context. So that would be my guess. Going back to the idea of stack ranking, the other aspect of this paper that is really relevant is that if you actually wanted to implement stack ranking and get some kind of valid estimate of performance, you would have to be really careful about how you do that. Like I said, what we see in one of these figures, I think it's Figure 13, go to the paper, you know, listeners at home, is that people's variability from month to month makes them overlap. You do see people who tend to have lower or higher cycle times, but those folks with much faster cycle times, their cycle times will overlap with people who tend to have slower cycle times. And it's only over the course of a year that you can start to pick this up. Like I was saying before, we don't know, and I don't think engineering managers are necessarily always aware of these nuances, what that kind of work is and what's contributing to the quickness or the slowness of those cycles, whether it's the kind of work that is being done in those cases. So one of the big take-homes here is: if you're gonna do this kind of comparison, you really need to have a system in place for comparing like with like, which is really hard in a creative profession like this, where for the most part, there are exceptions of course, but there are very few rote tasks.
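As a loose illustration of how that collaboration signal might be operationalized, here is a hypothetical Python sketch that counts, per ticket, the people contributing code versus the people only commenting; the event schema and field names are assumptions, not the paper's actual measures.

    from collections import defaultdict

    # Hypothetical per-ticket activity events.
    events = [
        {"ticket": "T-1", "actor": "dev_a", "type": "commit"},
        {"ticket": "T-1", "actor": "dev_b", "type": "commit"},
        {"ticket": "T-1", "actor": "dev_c", "type": "comment"},
        {"ticket": "T-2", "actor": "dev_a", "type": "commit"},
        {"ticket": "T-2", "actor": "dev_b", "type": "comment"},
        {"ticket": "T-2", "actor": "dev_c", "type": "comment"},
    ]

    contributors = defaultdict(set)   # people sharing the coding work on a ticket
    discussants = defaultdict(set)    # people commenting/discussing on a ticket

    for e in events:
        if e["type"] == "commit":
            contributors[e["ticket"]].add(e["actor"])
        elif e["type"] == "comment":
            discussants[e["ticket"]].add(e["actor"])

    # Per the discussion: more contributors tended to go with faster cycle times,
    # while more discussants tended to go with slower ones.
    for ticket in sorted(set(contributors) | set(discussants)):
        print(ticket, len(contributors[ticket]), "contributors,",
              len(discussants[ticket]), "discussants")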
EM: I was just wondering if there were other highlights or things that you wanted to pull out of the paper to emphasize that you'd want folks to take away from the research? I mean, I know we talked quite a lot about the systems level thinking that needs to change and like rethinking how we approach these metrics, but what else in there do you wanna highlight for folks?
JF: So we do have Section 6.1 in this paper, where we've really tried to tailor part of the discussion to speak directly to practitioners. We hope this research gives something to the field academically, in the sense that somebody can go back in five or 10 years and see what this evidence tells us in a way that's durable over the long term, but it's also directly applicable. So I would say the main thing is this: if you're a practitioner, use this research to feel justified in seeking more shared and environmental explanations for the speed of your work, if you're getting input from managers on your cycle time or a metric like that. And managers, I would have the same message for you: try to really understand what is going into each ticket and what the influences are that might make the work go quicker or slower. And I think the other thing is, if you are going to start tracking this stuff, try to develop a system that compares like with like. I've said that a few times during this chat, but I think that is the hard problem here, and it's kind of the next step for people who are implementing this stuff.
RD: Will you be following up on this research yourself? Do you have next steps?
JF: The Kat Hicks Developer Success Lab was until recently at Pluralsight, which recently dissolved the lab. So we are all kind of looking for what the next thing is. If I'm in a position to follow up on this research, I would absolutely love to. One benefit of this research being in the public domain is that others can also follow up on it. So there are a lot of ways to go with this, and I think there's a lot of interest in developer experience and productivity, writ broadly. There are groups out there that are doing really good work on it, and I would love to help 'em out.
EM: Let me ask one more question. I'm just curious, if you did have the opportunity to do a sequel, as it were, what would be the next question that you'd wanna zero in on? I know there are probably a lot of follow-up questions, but if you had to pick the top few?
JF: This is a big challenge of the research process, because one only has time to do so many things. So in this case, I would want to start building out this more robust and detailed model of software development. And I think honestly it would start with some qualitative or mixed-methods research, observing software teams or interviewing software teams, to start to build up this model. When I say model, it might be, you know, boxes and arrows, but I think we could start developing something that's much more of a mathematical model, a formal model. But it would have to start there. I don't think we can just throw our intuitive theories at this; I think we need to start from some more detailed, empirical observations directly tied to the day-to-day of folks.
RD: Well, ladies and gentlemen, it's that time of the show where we shout out somebody who came on to Stack Overflow, dropped some knowledge, shared some curiosity, and won a badge. Today we're shouting out the winner of a Populist badge, somebody who dropped an answer that was so good, it outscored the accepted answer. Today's winner is Matthieu M. for answering "struct with 2 cells vs std::pair?". It's almost a question, but if you're curious, we'll have the answer for you. I am Ryan Donovan. I edit the blog and host the podcast here at Stack Overflow. If you have questions, concerns, topics, or trends to share with us, you can email us at podcast@stackoverflow.com. And if you wanna reach out to me directly, you can find me on LinkedIn.
EM: My name is Eira May. I am the B2B editor at Stack Overflow. You can also find me on LinkedIn, or you can reach me at the podcast email, podcast@stackoverflow.com.
JF: My name is John Flournoy, Social Science Research consultant. You can find me on the internet at johnflournoy.science
RD: All right. Thank you for listening, everyone, and we'll talk to you next time.