The Stack Overflow Podcast

The open-source ecosystem built to reduce tech debt

Episode Summary

Today’s guest is Jonathan Schneider, co-founder and CEO of Moderne and creator of OpenRewrite, an open-source automated refactoring ecosystem for source code built to help developers eliminate tech debt. He tells Ben and Ryan about the challenges of automatic refactoring, how Java continues to evolve, and what kind of impact tech debt has on software development. Jonathan also describes the transition from open-source project to startup, why clean code is so important, and the role AI plays for developers right now.

Episode Notes

Moderne is an open-source company building automated source code transformations for framework migrations, vulnerability patches, and API migrations. Explore the platform here.

OpenRewrite is a community-driven open source project that consists of an auto-refactoring engine that runs prepackaged refactoring recipes for common framework migrations, security fixes, and stylistic consistency tasks.

Connect with Jonathan on LinkedIn.

Props to Stack Overflow user Benjamin Atkin, who earned a Populist badge by offering up some wisdom on Rails - How to refresh an association after a save.

Episode Transcription

[intro music plays]

Ben Popper Ready to transform the way you manage agreements? Unlock higher efficiency and performance at DocuSign Discover on November 20th. APIs and tools that span the entire contract lifecycle, DocuSign Discover equips you to integrate, automate, and optimize your agreements. Register for free at developers.docusign.com

BP Hello, everybody. Welcome back to the Stack Overflow Podcast, a place to talk all things software and technology. I am Ben Popper, the Head of Content here at Stack Overflow, joined as I often am by my colleague and collaborator, Ryan Donovan, Editor of our blog, veteran technical writer.

Ryan Donovan Hey, Ben.

BP Ryan, tell me a little bit about who we're going to be chatting with today.

RD Today, we'll be chatting with Jonathan Schneider, co-founder of Moderne, who also worked on the OpenRewrite project at Netflix.

BP Nice.

RD And Moderne is an extension of that. So we're going to be talking about automatic refactoring, particularly for Java, I believe.

BP Very cool. Jonathan, welcome to the Stack Overflow Podcast.

Jonathan Schneider Thank you, Ben and Ryan. Pleasure to be here.

BP So we always ask folks first, give us a quick flyover. What was the first computer you ever used? What got you into writing code, and how did your journey within the world of software and technology lead you to the role you're at today?

JS I grew up in a very rural area in Missouri. My first computer came fairly late to me. It was a Pentium II, I think. It was my junior year of high school, so I think I was a bit late to the game just based on where I lived. But I took a computer science class with a friend in high school, an AP computer science class, and the teacher was ideal. He dropped a C++ textbook on our desks and said, “Let me know if you have any questions,” and went and sat down at his desk, and that's basically how he taught. So just going through the book, writing one thing after another, and if you have any questions, go up and ask him. I always felt like software development was like free engineering. It's supply-free at least, you can just keep building and building and building and so it really attracted me.

RD So how did you get involved in the OpenRewrite project?

JS I had joined Netflix back in 2014, roughly something like that. So I was working for engineering tools, a central team. We were, like it or not, responsible in some ways for moving the company forward from Java 6 to 7 at the time, 7 to 8, trying to retire old libraries, security vulnerability repair. And at the time, Netflix had a special freedom and responsibility culture. It was part of their culture deck, which meant a central team could impose no constraints on what product engineers did. So we spent a lot of time doing reporting, trying to provide context to product teams like why they weren't where we were wanting them to be, and it resulted in pretty much no action on the part of development teams. So we'd go around asking, “What could I do? What could I do to make you move forward?” and they'd say, “Well do it for me, otherwise we’ve got something else to do.” It's kind of an almost sarcastic way, but I think it's representative of how a lot of product teams are now. They're just kind of groaning under the pressure of a lot of feature work required of them, and at the same time trying to deal with modernization or security vulnerability repair.

RD You talked about moving from the various Java releases. I remember any company that I worked at that had to migrate from Java was three or four full releases behind because it was such a bear. Why is that such an issue for teams?

JS It's just a ton of manual work, ultimately, and there can be just a lot of context necessary as well to accomplish it. With OpenRewrite, we've written what we call ‘recipes,’ which perform individual units of transformation. It could be something as small as changing a method to something as large as a whole library or a language-level version update. And if I take one of those, for example, like a Spring Boot 2 to 3 recipe that's in open source right now, it has more than 2,300 steps in that migration.

BP Yikes.

JS So we're effectively asking developers to know all of those things, and oh, by the way, if you miss one of those things, you broke it, what's wrong with you? So it's hard.

BP Can you step back for a second and just sort of say what is the genesis of OpenRewrite as a project with a name and with a working group behind it?

JS So the problem was clear enough to us. We were trying to get Java 6 to 7, et cetera, and we started looking around at what refactoring technologies existed that were available at the time. So Uber had worked on something called Piranha, Google had worked on something called Error Prone which had a refactoring component called Refaster, and most of the solutions out there had some, to us, unacceptable constraint. In the case of Uber's Piranha, it saw the syntax tree but it didn't know everything the compiler knew about a particular method call, and that wound up being important when we were doing logging library migrations. A lot of logging libraries look very similar in syntax, but they're not the same. You want to narrowly identify a particular library when you're making a change. And when Google builds a technology like this, Google is a very controlled environment. They have google-java-format, a monorepo, everything kind of looks the same, and so it's sufficient to make a tree-level change and then print it back out to text and it's always going to look the same. At Netflix, because of that freedom and responsibility culture, it was absolute chaos. There were a lot of different styles of writing code, the tooling stack wasn't consistent. And so we had some sort of base principles that were important to us, like that the change must look idiomatically consistent in the context of the repository it's being inserted into, and there can be no false positives. And anything short of those two things, developers will just reject the change.
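The logging point can be sketched in plain Java. This is only an illustration under stated assumptions: `AlphaLogger`, `BetaLogger`, and the service classes are hypothetical stand-ins for real logging libraries, and reflection is used here as a rough substitute for the type information a compiler attaches to a method call. It is not how OpenRewrite itself is implemented.

```java
import java.lang.reflect.Field;

class LoggerAmbiguity {
    // Two hypothetical logging libraries whose call syntax is identical.
    static class AlphaLogger { void info(String message) { } }
    static class BetaLogger { void info(String message) { } }

    static class BaseService {
        protected final AlphaLogger log = new AlphaLogger();
    }

    // This source text never names AlphaLogger; the field is inherited.
    static class OrderService extends BaseService {
        void placeOrder() { log.info("order placed"); }
    }

    static class BillingService {
        private final BetaLogger log = new BetaLogger();
        void charge() { log.info("charged"); }
    }

    // Stand-in for type attribution: find the declared type of the `log`
    // field, walking up the superclass chain until one is found.
    static Class<?> loggerTypeOf(Class<?> service) {
        for (Class<?> c = service; c != null; c = c.getSuperclass()) {
            for (Field field : c.getDeclaredFields()) {
                if (field.getName().equals("log")) {
                    return field.getType();
                }
            }
        }
        throw new IllegalArgumentException("no log field on " + service);
    }

    public static void main(String[] args) {
        // Both call sites read `log.info(...)`, yet the resolved types differ.
        System.out.println(loggerTypeOf(OrderService.class).getSimpleName());
        System.out.println(loggerTypeOf(BillingService.class).getSimpleName());
    }
}
```

A text or syntax-only match on `log.info(` would treat both services the same; only the resolved type tells a migration which library it is looking at.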

RD So were you the creator of the OpenRewrite?

JS Yes. I started that project back then in 2015, something like that. At that point we had identified the problem and started working on that solution. So recognizing that we needed everything the compiler knew about, we built really by integrating directly with the compiler. The compiler produces an abstract syntax tree and then type attributes it, and so we just kind of used that data and started layering formatting on top of it and other things in order to produce accurate change.

BP Stepping back for a second, Ryan and I have been on many podcasts and been around for many developer surveys. What is your perspective on the hate that Java receives? Does it deserve that hate, and is some of working with it in a way that's more pleasant sort of integral to what you're doing here– finding this pain point, being able to solve it, doing it in a community-driven way? Because Java has obviously been around forever and continues to be in an enormous number of applications at enormous companies, so it's not going anywhere.

JS What are some of the common hate themes that you hear if you had to summarize?

RD There's the builder-builder, factory-factory, builder-factory.

JS Ah, yes.

RD And then I think some people complain about being forced into object-oriented, as hard as it is. And there's some folks who see a 20 year old language and are like, “It's old.”

JS Yeah. I suppose JavaScript is too at this point. I think it very much depends on the frameworks you're using. So the long naming convention was typical of older versions of Spring and so forth and I think there's always new frameworks coming out that challenge those assumptions. I think the language has evolved with the times somewhat conservatively. It's never the first language to introduce a new feature. It tends to kind of sit and wait and see a pattern sort of fully mature before adopting it, which I think has led to a long history of backwards-compatible changes to syntax and so forth. In terms of market, I think, first of all, at the time, most of Netflix’s stack was built on Java, at least the back end, obviously. And I think in enterprise mid and large markets, C# and Java still dominate the back end. So when we went to build something for maximal use, we started with something that had a good market share and then kind of moved from there to other languages.

RD So I want to ask about the transition between working on this open source project at a big company and then using that as the basis for a separate company.

JS So I had actually left Netflix somewhere around 2017 and joined the Spring team, so I was part of the problem in that respect, I guess, making the breaking changes. I started a project called Micrometer for application metrics instrumentation and then was sort of working up the value stack, and with the Cloud Foundry portfolio we were trying to add a product called Spinnaker, which was an open source multi-cloud continuous delivery solution from Netflix and Google and others, trying to add that into the Pivotal suite, to its PaaS, I guess. We were working with the Moderne co-founder. We were working as kind of the technical and product leads of this at large enterprises. JPMorgan we were working with at the time, Home Depot, Fidelity, these kinds of organizations, trying to try out the idea of Spinnaker with them. And what we kept hearing as we were teaching advanced continuous delivery concepts was, “Well, talk to me in a year when I'm done moving Spring Boot 1 to 2,” or this to that. There was always some migration or some vulnerability repair that they had to deal with and it basically sucked up all their time. So having seen that enough, we thought maybe we should go take that technology from Netflix and go do this instead, because it's the same problem we heard over and over again.

BP Part of what I was excited to chat about in this is that it originated in Netflix, and as you mentioned, a lot of large enterprises do use Java. So last November, Amazon announced Amazon Q Code, which integrates OpenRewrite. And Andy Jassy, who's now the CEO of Amazon, posted on LinkedIn that this enables them to perform Java 17 upgrades in record time. I could list out all the numbers here in terms of the estimated savings in the hundreds of millions. How much of this is promotional, how much of this is real, and how much of this can be applied at other large institutions?

JS I think the change that was produced by Amazon Q Code transformer was all OpenRewrite. So it's a rule-based system underlying that. I think Amazon was able to do that so quickly because that open source Java 8 to 17 recipe has been so battle-tested at so many different organizations over the years. That's another one of those recipes that's got hundreds of steps and so has accrued a lot of edge cases over the years and has made it super reliable. And so part of a recipe definition is how much time savings would be realized by a single application of that recipe in the code. The numbers we see are usually something in the several hundred to a thousand hours.

RD So we've been talking a lot about tech debt and this is obviously a huge source of tech debt, these upgrades.

JS Absolutely.

RD Do you think, best case scenario, you're able to get recipes for every upgrade, every security fix. Does tech debt still exist?

JS Well, tech debt itself is an interesting phrase. We used to think of tech debt as something we imposed upon ourselves. We took some sort of shortcut architecturally or otherwise and so we're going to have to pay that down later. There's this other activity, which I think is that we are all relying on third party and open source components that evolve at their own pace, and if you don't keep up, then something breaks. I think from the perspective of the business, they don't see those as distinct oftentimes, because the request is the same from the engineer. The engineer is saying, “Hey, I need to take time doing non-feature development in order to do X,” and I don't really care whose fault it is, all I hear is, “I'm not doing feature development.” So I do think the vast majority of technical debt, if you lump those together, is coming from this third party and open source. Keep up or your app breaks, ultimately. That stuff I feel like is solvable if we ensure that framework authors, as they make breaking changes, provide recipes that help their downstream consumers adopt them more quickly. Probably the most mature engineering organization I've seen in terms of dealing with this is actually a small insurance company. They called it maintenance as opposed to technical debt, so there were technical debt tags and there were maintenance flags or tags on their issues. And whenever something was tagged technical debt, that was something I imposed upon myself. Maintenance was something that was being imposed upon me from the outside. It's more intuitive, I think, to the business too. Obviously it's something you have to do with your car, your home. It's an asset you own, it needs to be maintained.

BP Technical debt sounds like you overextended yourself, you got into trouble, like, “Why are we in debt here?” whereas maintenance sounds like everybody does maintenance, this is an understandable cost. And it's interesting, in the dev survey, there's sort of these two contrasting data points. The thing that developers find most frustrating is technical debt and the thing that makes developers happiest is clean code and environment, and it's sort of like, man, if only we could find a way to square these two. But that idea of maintenance and the broken windows theory and just getting everybody to kind of bake a little bit of this into every single day does seem to be more effective than turning technical debt into a once a quarter sprint or whatever.

JS Right. The other element to that too is, if you were doing it by hand, if I were doing say that Java 8 to 17 migration by hand, I would do the bare minimum of what was necessary in order to make the app run on the next runtime, but I wouldn't necessarily go back to every if statement with an instanceof that then casts behind it and use the new idiom, which is, “I don't need to do the subsequent cast anymore,” or “There's a better string format method now.” But when you have a recipe to do not only the bare necessities but also adopt the more common or more modern idioms, suddenly, and maybe that goes back to the point of hating Java, it looks better. It looks more modern, it looks less archaic ultimately, and you don't have to sort of choose between the optional things and just the required things.
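For readers who have not seen the two idioms Jonathan mentions, here is a small before-and-after sketch. Pattern matching for `instanceof` became a standard feature in Java 16 and `String.formatted` was added in Java 15; the class and method names below are made up for illustration, and the code is hand-written rather than actual recipe output.

```java
class ModernIdioms {
    // Java 8 style: an instanceof check followed by an explicit cast.
    public static String describeOld(Object value) {
        if (value instanceof String) {
            String s = (String) value;
            return String.format("string of length %d", s.length());
        }
        return "not a string";
    }

    // Java 17 style: the pattern variable `s` removes the cast, and
    // the instance method formatted() replaces static String.format.
    public static String describeNew(Object value) {
        if (value instanceof String s) {
            return "string of length %d".formatted(s.length());
        }
        return "not a string";
    }

    public static void main(String[] args) {
        System.out.println(describeOld("hello"));
        System.out.println(describeNew("hello"));
    }
}
```

Both methods behave the same; the second is simply the optional cleanup that tends to get skipped when a migration is done by hand.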

BP How much of hating Java is just the aesthetics?

JS It looks bad. The old stuff looks really bad. It looks better now.

RD I took some of my early computer science classes in Java 1, and back then it was a lot of interesting features, but obviously they've kept adding interesting features. And you say sometimes it'll convert it to a more idiomatic, more updated string builder or whatever. Do people ever get mad about that?

JS Absolutely. So if I just try to do Java 8 to 17 and said that the way I'm going to do this is I'm going to mass PR to the whole organization, and everybody should accept this PR, I think PRs tend to be viewed from product teams as unwelcome advice coming from an in-law or something like that. It could be good advice, but you're just going to look for a reason to reject it, you're just going to find a way. And the things people often pick on will be kind of stupid things like, “Oh, I don't like that string format thing.” I heard one product team really, really pushing back hard on the fact that they could no longer use the constructor for a BigInteger. You had to use BigInteger.valueOf. It's like, “Guys, this is deprecated. It's going away. You don't have a choice. This is going to be.” But because the recipe is composite, you can just pull out that one recipe, run the rest of it, get everything merged, we'll come back to this point later on, and so moving product team by product team and asking them to pull this change as opposed to it being pushed to them through mass PRs winds up being the thing that gets it over the line in the end. It's a social engineering problem more than a technical one, ultimately.

RD You mentioned moving from Java 8 to 17. Have you done any studies on how much code has changed between a massive jump like that?

JS I see it all the time. So recipes, they also emit structured data in addition to making changes, and so we can see every file that was changed. There was one banking application we moved that touched 19,000 files, and that's just one. So there was, I think, another 60,000 repositories after that. I can't have a flashcard experience where I'm like, “What do you think about this change? What do you think about this change? What do you think about this?” times 19,000, it just doesn't make any sense. And so that's where it becomes essential ultimately that I think a provably accurate system is the one making the change at the end of the day, however that original recipe was developed.

RD So is this just Java-based right now?

JS We started in the JVM obviously, Java, and then added Groovy and Kotlin, then added more infrastructure as code type things. So YAML, JSON, properties, XML, Terraform. Terraform actually wound up being a very difficult grammar, as it turns out. We went back and added COBOL, JCL, copybooks, about 31 related z/OS technologies that wound up being a lot harder than I expected as well. But then as we kind of came back to the more modern languages, we wanted to add C#, Python, JavaScript, Ruby, et cetera. What we found is that these, I guess what you could generally call C family languages, shared most of the same LST structure, which is both intuitive and counterintuitive, I guess. They all have if statements, they all have for-loops, they all have method declarations. The structure is very similar for the most part. They look very different in text in terms of how those constructs are printed out, but structurally they're very similar. And so when we wrote the first C family extension for C#, the C# LST, the lossless semantic tree model, actually extends from the base J one. So the significance of that is that many of the recipes we originally wrote for Java just automatically work on C#. So Boolean simplification, change method name, a lot of these core building blocks actually wound up working on both sides. We wanted to add a recipe authorship experience in each language as we moved along as well, and so we wrote this technology that allows us to transfer parts of the LST over the wire so that a C# developer can write a recipe in C#, a Java developer writes a recipe in Java, et cetera. At this point, we finished Python a month ago, C# a couple months ago, we're just about done with JavaScript and TypeScript. Each of those extends from J so that there's this cross-language reuse. I haven't seen that much before in my history in software, so it's kind of an exciting, unexpected benefit.
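The cross-language reuse described here can be illustrated with a toy tree model. Everything below is a hypothetical sketch: `Expr`, `Not`, and `PyNot` are invented names, not OpenRewrite's actual LST types. The assumption is only that a language-specific node extends a shared base node and changes how it prints, so a transformation written against the base type applies to both.

```java
class SharedTreeSketch {
    interface Expr { String print(); }

    record Ident(String name) implements Expr {
        public String print() { return name; }
    }

    // Base node shared by C-family languages: logical negation,
    // printed the Java way.
    static class Not implements Expr {
        final Expr operand;
        Not(Expr operand) { this.operand = operand; }
        public String print() { return "!" + operand.print(); }
    }

    // A Python-flavored node keeps the structure and only changes printing.
    static class PyNot extends Not {
        PyNot(Expr operand) { super(operand); }
        @Override public String print() { return "not " + operand.print(); }
    }

    // A "recipe" written once against the base type: drop double negation.
    static Expr simplify(Expr expr) {
        if (expr instanceof Not outer && outer.operand instanceof Not inner) {
            return simplify(inner.operand);
        }
        return expr;
    }

    public static void main(String[] args) {
        Expr java = new Not(new Not(new Ident("ready")));
        Expr python = new PyNot(new PyNot(new Ident("ready")));
        System.out.println(java.print() + " becomes " + simplify(java).print());
        System.out.println(python.print() + " becomes " + simplify(python).print());
    }
}
```

The `simplify` method never mentions `PyNot`, yet it handles the Python-style tree because the subclass is still a `Not`.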

RD Do you ever run into sort of conflicting recipes when somebody is upgrading Java and Spring?

JS Conflicting in the sense that they would make incompatible changes somehow?

RD Yeah, or they're both trying to change the same things or change them in different ways.

JS The great thing about a recipe is, if something beat me to a change, then by the time I see it, I just have nothing to do so I just don't do anything. So we'll actually chain these recipes often. The Spring Boot 3.3 recipe chains to the 3.2 which chains to a 3.1, 3.0, 2.7, 2.6, et cetera. So it doesn't really matter whether an application portfolio is starting on 2.6 or they're starting on 3.1, because everything below the version I’m on just doesn't do anything, so you can kind of take a heterogeneous set and move it forward together.
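The chaining idea can be sketched as a list of steps that each do nothing when their target is already gone. This is a toy model under loud assumptions: the step names mirror the version hops mentioned above, but the rewritten call names (`legacyInit()`, `oldConfig()`, and so on) are invented, and plain string replacement stands in for a real type-aware transformation.

```java
import java.util.ArrayList;
import java.util.List;

class ChainedRecipes {
    // One hypothetical migration step: rewrite an old call to its replacement.
    record Step(String name, String oldCall, String newCall) {
        String apply(String source) {
            // If the change was already made, the old call is absent and
            // the source comes back unchanged.
            return source.replace(oldCall, newCall);
        }
    }

    record Result(String source, List<String> changedBy) {}

    // Ordered oldest first, the way a newer recipe chains back to older ones.
    static final List<Step> CHAIN = List.of(
        new Step("2.6 -> 2.7", "legacyInit()", "init()"),
        new Step("2.7 -> 3.0", "init()", "start()"),
        new Step("3.0 -> 3.1", "oldConfig()", "config()"));

    // Run every step in order and record which ones actually changed something.
    static Result migrate(String source) {
        List<String> changedBy = new ArrayList<>();
        for (Step step : CHAIN) {
            String next = step.apply(source);
            if (!next.equals(source)) {
                changedBy.add(step.name());
            }
            source = next;
        }
        return new Result(source, changedBy);
    }

    public static void main(String[] args) {
        System.out.println(migrate("legacyInit(); oldConfig();"));
        System.out.println(migrate("start(); oldConfig();"));
    }
}
```

An application that starts from the oldest version and one that starts partway through both end up with the same source; the second simply has fewer steps with anything to do.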

BP What are your thoughts on capabilities from Gen AI, which now writes, reviews, debugs, documents code, to play a role in making some of what you're doing easier to help alleviate some of these pain points to be a complement to basically what you built?

JS I think the first and most important point is, and I'll state my belief on this and time may prove me wrong, but I think that the code as text, or even as AST– and so AST means lacking type attribution that the compiler knows about– is an insufficient set of data for a model to do large scale impact analyses or transformation. And the easy example of this I think is logging libraries. We go back to this. There's five Java logging libraries. They each look very similar: log.info, log.warn, etc. How do I know which logging library I'm looking at? I may have inherited that log field from a base class that comes from a binary dependency, so nowhere in the text of the code is there any reference to what library I'm using right now. So I think for any sort of large scale change, a model needs to know things that the compiler knows. So it's a data problem I think more than anything. I think models will always be very data hungry, and to me, I feel good about our long term prospects because I feel like the LST is the data that it needs to make these sort of decisions. There's been another evolution, though, or another change recently in models that I find very interesting, and honestly, very easy for us to integrate, which is that I feel like there was this RAG paradigm– retrieval augmented generation. It's effectively prompt stuffing– stuff it into the prompt. This tool function calling paradigm which has arisen in the large foundation models in the last several months inverts that responsibility. So as I'm asking a question, I say to the model, “Hey, here's a list of tools and all their descriptions, and when you want to ask a question about binary dependencies, ask me. When you want to ask a question about deprecated methods, ask me,” and so on and so forth. The model is the one that takes a human-provided prompt and decides when to call various tools in order to supply it data, and interestingly, we found that recipes that produce those data tables are sort of trivial tools to bolt onto a model for it to answer questions about the code pretty deeply. I think that what will continue to be true is that data will be king and however you provide tools that help the model infer from that data will be the interesting solutions. I expect that the models themselves are interchangeable going forward and even today, but what do you see? I'm curious. Does that align with what you're thinking or seeing?
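The inversion described here, where the model asks the host for data instead of having it stuffed into the prompt, can be sketched as a small tool registry. This is a minimal illustration with no real model or vendor API involved: the tool names echo the two examples in the conversation, while the `Tool` record, the `dispatch` method, and the returned strings are placeholders invented for this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

class ToolCallingSketch {
    // A tool is a description the model can read plus a function the host runs.
    record Tool(String description, Function<String, String> handler) {}

    static final Map<String, Tool> TOOLS = new LinkedHashMap<>();
    static {
        // Placeholder handlers; a real setup would query recipe-produced
        // data tables instead of returning canned strings.
        TOOLS.put("binary_dependencies", new Tool(
            "Lists the binary dependencies of a repository",
            repo -> repo + ": example-lib 1.0"));
        TOOLS.put("deprecated_methods", new Tool(
            "Lists calls to deprecated methods in a repository",
            repo -> repo + ": 0 deprecated calls found"));
    }

    // What accompanies the prompt: names and descriptions only, no data.
    static String toolCatalog() {
        StringBuilder catalog = new StringBuilder();
        TOOLS.forEach((name, tool) ->
            catalog.append(name).append(": ").append(tool.description()).append('\n'));
        return catalog.toString();
    }

    // Invoked when the model replies with a tool request instead of an answer.
    static String dispatch(String toolName, String argument) {
        Tool tool = TOOLS.get(toolName);
        if (tool == null) {
            return "unknown tool: " + toolName;
        }
        return tool.handler().apply(argument);
    }

    public static void main(String[] args) {
        System.out.print(toolCatalog());
        System.out.println(dispatch("binary_dependencies", "repo-a"));
    }
}
```

The design point is that the catalog is cheap to send, and data is fetched only for the questions the model decides to ask.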

BP I would agree that for the kind of large scale organization-wide transformation you're talking about, people are not trusting these models to do that, and that the more context they have on your code base and your documentation and your whatever you want to call it, knowledge base of your company, the easier it is for these models to be a benefit as opposed to writing code that seems good at first, but takes just as much time to unwrite or debug later, and that their central application over the next 12 to 24 months will be removing the toil work that developers have to do in addition to writing code that is pretty simple, whether that's writing unit tests or doing the documentation or doing the things like that. But Ryan and I have talked about this– what do developers want? Well, they want to maintain agency and they want to get into a flow state and write good code. They don't want to be a copy editor for an AI agent, but will AI agents get to a point where they're good enough that people want to replace developers? I hope not, but I can't say, because they're getting better all the time.

JS I think when you provide those examples like unit test writing, documentation, certainly the kinds of things that a GitHub Copilot does for me in the IDE, all of those I feel like are authorship experiences on some level or another. It's kind of synthesizing that new code. And I use Copilot every day. It's been my experience that it makes me, I don't know, 10 percent better, 15 percent, whatever number you want to throw at it. The IDE rule-based refactorings made me dramatically better as well when they kind of arose, and so it's in some ways a way of writing more net new code that needs to be maintained on the back end, and if anything, accelerates this existing problem that I unwittingly set out to solve.

BP I've written about this and I like the way you phrase it, which is that more net new code is not necessarily a good thing. Maybe it is, maybe it's not. Cleaner, more robust code is kind of net new a good thing and so better documentation is net new a good thing. And so if you can apply it in those areas, you can feel confident for the time being that you're getting an ROI, whereas net new code, we can't be certain yet. And I'm sure there'll be tons of research and academic research that maybe we can trust and research from different folks in industry that'll be a little biased towards whatever their solution is.

JS I heard something interesting as well last week, and this was at a large financial tech innovation forum. And privately, there was a lot of talk about developer productivity code assistants, and one of the executives reacted really badly. He said, “I don't want to talk about developer productivity. If you give my developer back two hours of the week, do I believe they'll spend that on themselves or on the business?” And I thought, “Well, that's a really dark view,” but there's some truth to that. It's very hard to measure value to the business when it comes to that kind of thing, whereas I think in some respects, a large scale transformation like, “I'm replacing Oracle with Postgres,” that's easier for them to understand the value of.

RD But isn't that that version of productivity where the developer gets two hours back and they go play Call of Duty for a little bit and refresh their brain?

JS I know it's a very dark view, but at the same time…

RD But I think that's just as valuable. They get a break, they come back refreshed, they're able to think about these hard problems.

JS Absolutely.

BP This is where you need a great CTO or CPO or CIO on the senior leadership team to make the opposite case to this person who has a dark view to give that glass half full version of what that means for their developers.

[music plays]

BP All right, everybody. It is that time of the show. Let's shout out someone who came on Stack Overflow and shared a little bit of knowledge or curiosity, in doing so, helped the whole community. Awarded four hours ago to Benjamin Atkin, a Populist Badge for answering the question: “How to refresh an association after a save in Rails.” Benjamin's answer was so good that it was given more upvotes than the accepted answer, and hence the Populist Badge. Shout out to Benjamin, and thanks for contributing some knowledge. I am Benjamin Popper, not Benjamin Atkin, and I'm the Director of Content here at Stack Overflow. You can always find me on X @BenPopper. If you have questions or suggestions for the show, if you want to come on as a guest or hear us talk about a particular topic, email us, podcast@stackoverflow.com. And if you enjoyed today's episode, the kindest thing you could do would be to subscribe and leave us a rating and a review.

RD I'm Ryan Donovan. I'm definitely achin’. I edit the blog here at Stack Overflow. You can find the blog at stackoverflow.blog. And if you want to reach out to me with comments, suggestions, hot tips, you can find me on LinkedIn.

JS I'm Jonathan Schneider, co-founder at Moderne. You can find us at moderne.ai. I'm easy to find on LinkedIn and Twitter. If you haven't heard of OpenRewrite, give it a try.

BP Very cool. All right, everybody. We'll put those links in the show notes so you can check them out. Thanks for listening, and we will talk to you soon.

[outro music plays]