The Stack Overflow Podcast

The problem with the tech debt mindset

Episode Summary

Ryan chats with Jon Bevan, a software engineer currently building the cloud version of ScriptRunner, an Atlassian app, about the concept of tech debt. They explore how tech debt can arise from outdated technology choices, shortcuts, and the need for maintenance work. They also delve into the challenges of upgrading dependencies and the potential scope creep of requirements and features over time.

Episode Notes

Chelsea Troy defines technical debt and maintenance load in her blog post, “Stop saying ‘technical debt.’”

Learn more about technical bankruptcy in this blog post, “Monitoring debt builds up faster than software teams can pay it off.”

Joel Spolsky’s classic blog post on avoiding rewriting code from scratch – Things you should never do, part I.

Technical debt as explained by Ward Cunningham, who coined the term.

Code as an asset, a conversation from Hacker News.

Middleware is the “software glue” that provides services to applications beyond those available from the operating system.

Ratpack framework is a toolkit for creating high performance web applications.

React is a front-end JavaScript library.

jQuery is a JavaScript library designed to simplify HTML DOM tree traversal and manipulation.

Questions about functional programming.

User shout out! Nikoksr received the Lifeboat Badge after answering a question related to math.Pow.

Episode Transcription

[intro music plays]

Ryan Donovan Welcome to the Stack Overflow Podcast, a place to talk all things software and technology. I'm Ryan Donovan, I edit the blog here at Stack Overflow, and I'm joined today by a listener who wrote in and suggested a great topic. Jon Bevan, joining us today to talk all things tech debt. Hi, Jon. How are you doing today?

Jon Bevan Hey. I'm doing well, thank you. Thanks for having me.

RD Our pleasure. So Jon, at the beginning of the show we like to have the guests introduce themselves and talk about how you got into software and technology. What's your origin story?

JB What's my origin story? That sounds like a superhero.

RD How'd you get your superpowers?

JB How did I get my coding superpowers? Well, my dad was a software engineer so we always had computers around the house. I think he had one of the first laptops with a hard drive or something. This was a big deal back in the early ’90s, I suppose. So anyway, we had computers around. I think I started off with HTML and DHTML, as we used to call it before it became JavaScript, messing around with websites and iframes and just kind of went on from there. I studied computer science at university and went into web development and then kind of expanded that into building platform cloud products.

RD And what are you doing today?

JB I work for an organization called ScriptRunner. We're part of the Adaptavist Group and provide apps for the Atlassian ecosystem. I've been working on a product called ScriptRunner for Jira Cloud and ScriptRunner for Confluence Cloud for a good few years– eight years, I think– building the cloud version of those apps. So it's been an interesting journey going from the startup mentality right at the beginning of the product to a much larger customer base and all kinds of different sets of requirements as we grow in scale.

RD Right. And I'm sure that that growth and that scale has left you with a lot of tech debt, our topic for the day.

JB Yeah, you can understand why I might have emailed in asking to talk about it.

RD So tech debt is sort of a big category that I think people use in different ways. We had a great article about how maybe it's misused and people should use other terms. What for you is tech debt?

JB I see tech debt in several different categories. I think a lot of times we mean a technology choice that was good at the time but is no longer good, and that can be as straightforward as that Java 8 used to be a great technology choice, but it's now out of date, or maybe it's a technology choice that was a shortcut. We wanted to kind of get a loan from the technology bank, to ship things to the customer faster. I think that's often what we mean when we talk about technical debt. I think Chelsea Troy wrote a blog on the Stack Overflow Blog when she was talking about this idea of ‘maintenance work,’ kind of upgrading your dependencies or language versions and that kind of stuff, and I think those are separate because you have to do the maintenance to keep things running. You don't necessarily have to do it straight away, but eventually you are going to have to upgrade. And because everybody's upgrading everything all the time, we're kind of all in that same boat of upgrading everything all the time. And then there's a third aspect of this which I think is kind of what happens over longer periods of time potentially, which is when you end up rearchitecting or even replatforming the products you're building.

RD A sort of tech bankruptcy?

JB Kind of tech bankruptcy. Hopefully you don't have to do this, but eventually the requirements are so different or the technology maybe hasn't been maintained and it's not financially viable for your business to stay on unmaintained technology, or there's a different way, there's a new best practice and new technology has come in that you want to use and benefit from, and the technology you chose doesn't support that anymore.

RD I've been chatting with a developer friend of mine about performance and he shared a video that was like, “Performance really matters because all of these companies have basically rearchitected their software when they hit product market fit for performance.” Tiny bits of performance, but that feels to me like they hit this moment where their tech debt, they couldn't keep it up anymore. They had to sort of declare bankruptcy and start over.

JB Is it Joel Spolsky's famous blog post about don't rewrite everything from scratch? And that's a really big challenge when maybe your initial architecture design wasn't ever meant to scale, but of course it's the one that got you all the revenue that you've then used to build your company and build the extra features. And how do you not rewrite everything from scratch but also get the benefit of the new technologies, whatever it happens to be? And we've definitely seen that and we've been able to pivot and kind of migrate technologies with mostly success over the years, but as the code base gets bigger and as the technology changes are bigger, that becomes harder to sustain an existing product. And it's a cloud-based microservices architecture and there's a lot of moving parts and you don't want to introduce downtime or maintenance. You don't necessarily want to stop the roadmap to just rebuild everything to work exactly the same as it currently works, so that's a difficult challenge.

RD There's new paradigms that come up, new libraries, new frameworks, all that. When we were chatting before this, I said, “Isn’t every dependency basically tech debt?” And you said, “Well, every line of code. That's the old saw.”

JB People say that every line of code is technical debt and I kind of understand that, but I think it's a bit flippant. Those lines of code are also an asset. That's what's bringing in the money, but you do need to curate it and look after it and you can't just have a backlog filled with tech debt tickets that you're eventually going to get around to doing. You've got to have some balance of looking after this code base and maintaining it while also building the new features and doing the bug fixes and rearchitecting and replatforming. Some balance has to be there. I think what becomes difficult is choosing this low priority but important piece of work over a high impact customer feature or a high impact bug fix or something, and you're almost always going to choose the high impact customer-facing thing because it's easier to justify with a revenue number or something.

RD Nobody's ever complained about being on a lower level of Java. I've seen companies avoid that and stick with Java 8 when Java 13 was out.

JB The customers may not complain about it, but at some point it's going to stop you doing the next thing, and that's what we've seen. On the occasions where we haven't upgraded things or maintained them, you then come to build that next feature or do that next change, and all of a sudden the scope of that initiative or project work expands rapidly and the deadlines move way into the future because, “Oh, we didn't do this and we didn't do that, so we have to do that now because the technology won't let us ignore those problems or ignore that upgrade.”

RD So that's an interesting sort of side corollary– the scope creep of requirements and features over time. You make a decision with your old code and you realize, “Oh, we actually have to do this other feature. It might not be compatible. We might have to write some middleware,” and then eventually that middleware starts being a problem. Do you have instances of that tech debt where it sort of became a huge problem?

JB Yeah. There was some relatively recent work we did. The product is called ScriptRunner. We maybe unsurprisingly allow our customers to run scripts, and there's a scripting editor and we want to give customers feedback on, “This is a compile error,” or that kind of stuff while they're authoring their code. And we were improving that area of the code base and we needed to upgrade the version of Groovy that we were working with. So historically, customers write scripts in Groovy, and no customers were asking for the latest version of Groovy, and so we hadn't really upgraded. And then we wanted to introduce this new feature to significantly improve the user experience, and for reasons, we needed to use the latest version of Groovy. So all of a sudden, it was like, “Okay, so how do we do this? There's incompatibilities between the different versions. There’s backwards-incompatible changes that introduce bugs and we have to catch them. This is a SaaS product that's running 24/7. We can't just upgrade Groovy one afternoon and hope nobody’s scripts break. We need to anticipate that.” So that was an interesting challenge to work through for sure.

RD It always surprises me how time consuming a lot of library updates can be. When I was at my previous job, they converted from AngularJS to Angular, and that is a huge, huge undertaking because it goes from JavaScript to TypeScript and they had to backport all the code to an abstract syntax tree and sort of re-figure out how the code worked. And then, of course, I think they moved to React at some point, so they’ve got to do the whole thing again.

JB So one of the projects I'm working on at the moment is moving away from the web server technology we chose years ago, which is called Ratpack, which is this great non-blocking Java web framework based on Netty. We're moving away from that now and we want to be running on the latest Java versions to get the benefits of virtual threads, and we then realized, “Oh, we need to update Gradle as well and Groovy again because they don't have compatibility for Java 21 unless you use a later version.” And so you go through that process and then, “Oh, this Gradle plugin also needs upgrading and updating.” It's this kind of never-ending wheel of things you have to upgrade and maintain and look after, which is hard to drip feed through your sprint work or your Kanban backlog or however you're doing things. We've got to the point where we need a dedicated team to do some of this stuff. Some of it we can outsource within our organization to teams that have specializations in this area. Other stuff we need to own as the engineering teams working with this technology day in, day out. But at some point this amount of work, this supporting work becomes so much and so consistent that you may as well have a dedicated team to do a bunch of it.

RD So how do you argue for something like that? How do you convince the folks in charge of the purse to give you the man hours to sit down and upgrade stuff?

JB That's the challenge for everybody. I know you wrote on the Stack Overflow blog recently about measuring the amount of this work and the impact that it has, and there's a significant element of that that's required but I think it doesn't necessarily have to be that every engineer creates a Jira ticket or something every time they come across some technical debt and you just add up the story points. You can have a conversation about how this took three times longer to deliver because we didn't do this work, or kind of anecdotally, “All the engineers are quite frustrated with this technology choice and it's slowing us down.” Or, “We actually have this roadmap of purely technology changes that we either need to make or want to make for these reasons that match these business requirements.” And then take all of that evidence if you want to the managers and say, “Look, this is why we want to do it. This is the benefit. We think it's not just because we like it that way, but because it unlocks this capability to work faster or more reliably to provide these kinds of features that we couldn't provide if we didn't redesign this or rearchitect this.” We've tried different mechanisms in the past. We've done the drip feeding it through your regular weekly work. We've done weeks of tech debt-only work and sometimes that's enough and sometimes it isn't. Some tech debt removal projects have completed successfully and it's been fantastic and others get 80/90 percent of the way through and then there's these sharp edges of the code base nobody wants to work with. That has a challenge for onboarding new engineers because they may see some code and be like, “Oh, that's how we do it here,” and then all their new code is like, “No, no, no. You're introducing even more tech debt. Don't do it that way. We haven't finished replacing that part of the code base.” So it's a big challenge to solve and I think it requires, like you were saying, that you've got to be very clear about it. 
You've got to use metaphors that people understand. You've got to make it quantifiable. You've got to make it match up with the business objectives and the roadmap for what the team is doing.

RD And I think that's the original intent of the metaphor– to put it in financial business terms that this is stuff that's a drag on our ledger. But it's interesting that you mentioned earlier that code is also an asset, to extend the accounting metaphor of it. Do you think there is a way that code can sort of counteract it? Can it appreciate in value or is it always depreciating in value?

JB That's a great question. I want to say, yes, it can appreciate. My gut feeling is, yes, this code can be more valuable in the future when more customers are using it, potentially. One of the things we've done actually is look at which features our customers use, and then you could kind of proportion out how much revenue we get to each feature. And then you could say, “Well, these lines of code that support that feature have this dollar value or pound value or euro value,” and that would go up as your customer base grows and that's good. We were having a conversation internally and somebody was asking what's the best thing about the product, what worries you, and something else, and I was like, “I think we forget to talk about the good.” We're so focused on the bugs and the problems and the negative aspects of that, and I wonder whether there's an element of critical thinking as a software engineer that tends to lead us in that direction of looking for the problems. But actually there's some really good stuff and we should celebrate that.
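
[Editor's note: The apportioning idea Jon describes can be sketched with a little arithmetic. This is a minimal, hypothetical illustration only. The feature names, usage counts, and revenue figure are invented, and splitting revenue in proportion to usage is one assumed weighting, not a description of how ScriptRunner actually does it.]

```python
def apportion_revenue(total_revenue, usage_by_feature):
    """Split total_revenue across features in proportion to their usage counts."""
    total_usage = sum(usage_by_feature.values())
    if total_usage == 0:
        # No usage data at all: attribute nothing rather than divide by zero.
        return {feature: 0.0 for feature in usage_by_feature}
    return {
        feature: total_revenue * count / total_usage
        for feature, count in usage_by_feature.items()
    }


# Hypothetical figures, invented purely for illustration.
usage = {"feature_a": 600, "feature_b": 300, "feature_c": 100}
shares = apportion_revenue(100_000, usage)

for feature, value in shares.items():
    print(f"{feature}: {value:,.0f}")
```

[With these made-up numbers, a feature that accounts for 60 percent of usage is attributed 60 percent of revenue. If revenue grows with the customer base while the usage mix stays the same, the value attributed to each feature's code grows with it, which is the sense in which the code "appreciates."]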

RD Absolutely. You're solving problems, so you're trained to sort of look for problems.

JB Exactly. And I think that lends us to this position where we forget actually that this technology is great and solving really important and valuable problems for our customers and it's made their lives better. And it's easy to forget about that and to forget that that's a really valuable aspect of this. And if you invest in it and the quality of the software, then the customers' lives are better or more reliably better or more predictably better because you've built something high quality and you're maintaining it well, and it's not just going to break one day because you didn't upgrade this thing or whatever it was.

RD Are there pieces of code or libraries or functions that you were impressed with their resiliency and their ability to hold up over time?

JB I think I mentioned the Ratpack web framework we started using. It's been fantastic over the– I think I said eight years earlier. It takes a little while to get familiar with because of its asynchronous programming style, but it's been very high performant. It's been very effective at its job. It's just not very well maintained which is why we're moving away from it. I think React might be the other one. At the beginning of the product's life, we were using jQuery which was a decent choice for the time, and I don't want to knock jQuery. It's great for some things, but I remember when we became aware of React and what React could do and it was like, “This is the future. Let's move. Let's rewrite all of our front end code into React.” And for the most part, that's scaled really well and really effectively. So those have been two good choices, one of which we're keeping with and one of which we're not because there's that maintenance aspect of it.

RD Every piece of dependency, every framework, every library you take on, you take on its bad parts, too. You sort of commit to maintenance of it. And I think a lot of folks think of dependencies as this thing that works, exists over here separate from the software that we're building, but you have to maintain that, too. Every piece of that is code in your software.

JB And any given dependency can fundamentally change your code and fundamentally change the architecture of what you're building. Early in our journey with React, we used– was it Redux-Saga? I think that was the name of the library which heavily relied on– is it the yield keyword in JavaScript? We had a guy working for us and that library had worked really well in his previous job so we tried it and it was mostly good except for the testing was horrendous and nobody really understood how it worked under the hood, and so after a period of time we were like, “Nope, this sucks. Let's get rid of it and go back to Redux,” or whatever it was. We had a guy working for us who loved functional programming and Haskell and stuff and he introduced another front-end library, fp-ts, and it solves some real world problems, but it solves it in a way that most of the engineers I've worked with have no comprehension of how this hangs together. And most of us didn't want to understand semigroups and monoids and whatever all that terminology is, and we kind of introduced it and tried it, and again, were like, “Nope, this sucks. Let's get rid of it.”

RD So that's an interesting source of potential tech debt– individual engineer idiosyncrasies. What are the things that this person likes? Like the one Haskell person coming in and being like, “Oh, we should make everything Haskell,” and then everybody else has to learn Haskell and maintain it.

JB As an engineering leader/engineering manager, it's a really interesting path to how do we be open to new ideas and how do we allow people to pursue what they're interested in whilst making sure everybody goes on the same journey and everybody can be equally as productive and we don't leave people behind or accidentally make this unmaintainable or whatever? And I've definitely made mistakes along the way. Some of these libraries should have been a hard no, but you don't know. You don't know until you've used it or until you've seen it a few times and you can spot the pattern and say, “No, this isn't going to work for us. This isn't going to work for the people in this team and their level of experience and their expertise and what they're good at. For another team it might be great and that's fine. But for us, it's a no.” So that's been a journey.

RD Sometimes you’ve got to get the boat in the water to see if it floats. So that is an interesting question. How do you filter out? How do you say no? How do you say yes? And how do you adjust based on those? Like you said, those should have been hard no’s.

JB I've learned to trust my gut a little bit more. I've been doing professional software engineering for 12, 13, 14 years now. I guess I've got some level of perspective on this stuff now and I'm learning to trust my gut instinct. One of the things I learned with some of the mistakes I made is that the psychological safety of a team is really, really important to allow people to really genuinely vocalize, “No, I hate this. Please don't make me work with it.” And at least in my experience, you need the team to have been together for 6, 9, 12 months before they understand the culture and they feel safe and they can challenge their boss or the loud person in the team or the opinionated person in the team without fearing that they're going to lose their job because they don't want to work with this technology that you're forcing on them or that you're exploring. And so building that psychological safety in the team is, I think, one of my biggest takeaways. That underpins everything else. If you can't get people's real opinions on the technology you're using and the problems that they are experiencing on a day to day basis, then they're either going to be really unhappy working with stuff they don't like working with, or they're going to leave. Either way that sucks. And if they're not telling you that there's problems that they're encountering on a day to day basis, then you're operating in this model that everything's fine and it's a false pretense. It's a false foundation that you're operating under. And it's really valuable speaking to each of the individual engineers and giving them the space to say, “What slows you down each day or each week? If we could speed one thing up or replace one thing, what would it be? If you were in charge, what changes would you make if you could change anything?” I'm really listening. And then for some engineers I've spoken to, they're like, “I don't want to have to lobby for these changes. 
This is difficult and painful, but I can't keep bringing it up.” That needs to be somebody else's job to champion some of these decisions and some of these technology choices. Other people get used to it and they're like, “I just thought this is how we did things here,” and it’s like, “Oh, no, we can make it better. Let's make it better together.” Ultimately, this impacts the customer and what we can deliver to the customer, so let's make our technology choices better. Let's make the platform better. Let's make the developer experience better, and then the consequence of that is the customer gets a better product at the end of the day.

RD Right. Technology at its base is a people problem, right?

JB For sure.

[music plays]

RD Well, as we do at the end of every show, we'd like to shout out a user that came on, dropped a little knowledge, asked an interesting question. Today we’re shouting out a Lifeboat Badge, awarded to Nikoksr for answering, “How to use Math.Pow with integers in Go.” If you're looking for an answer for how to use Math.Pow, we have it. I've been Ryan Donovan. I edit the blog here at Stack Overflow. You can find it at stackoverflow.blog. If you like what you heard today, leave a rating and review. It really helps. And if you want to reach out to me, you can find me on Twitter @RThorDonovan.

JB I'm Jon. I work for ScriptRunner, part of the Adaptavist Group. You can find me on X @JDBevan.

RD All right, everybody. Thank you very much, and we'll see you next time.

[outro music plays]