The Stack Overflow Podcast

Commit to something big: all about monorepos

Episode Summary

Juri Strumpflohner of Nrwl joins the home team to talk all things monorepos, how he balances his roles as Director of Developer Experience (Global) and Director of Engineering (Europe), and the perennial question of how to monetize open source.

Episode Notes

Juri is currently Director of Developer Experience (Global) and Director of Engineering (Europe) at Nrwl, founded by former Googlers/Angular core team members Jeff Cross and Victor Savkin.

Nrwl has compiled everything you need to know about monorepos, plus the tools to build them, here.

Connect with Juri on LinkedIn or explore his website.

Shoutout to Lifeboat badge winner penguin2718 for their answer to Storing loop output in a dataframe in R.

Episode Transcription

[intro music plays]

Ben Popper Hello, everybody. Welcome back to the Stack Overflow Podcast, a place to talk about all things software and technology. I'm your host, Ben Popper, Director of Content here at Stack Overflow, joined as I often am by my wonderful colleague, Matt Kiernander. Hey, Matt. 

Matt Kiernander Hello! How’re you doing? 

BP I'm good. So we have a great guest coming on today, Juri from Italy, and we're going to be talking about building software in a better way, which is probably pretty interesting to the folks who listen to this show, at least I hope so. Juri, welcome to the Stack Overflow Podcast. 

Juri Strumpflohner Hey, hey. Thanks for having me. 

BP So for people who are listening and don't know, tell them a little bit about yourself. Where is it that you work and how did you get into this world of software and development? 

JS Yeah, sure. So my name is Juri. I'm from the very northern part of Italy, middle of the Alps. We expect snow tomorrow apparently. It has been pretty warm recently, so we will see what happens. It has been more summer-like in the recent days. But yeah, I've been in software development for over 10 years now. I started on the backend side, kind of in .NET and Java environments, then came over to the more front end part of things as that became more popular and more in demand. And so ultimately now I'm currently in the role of the Director of Developer Experience at a company called NRWL. And we develop tooling basically for front end developers, though not only for them, and we are mostly in the JavaScript space, focusing on tooling for monorepos, developer tooling for JavaScript folks in general. So that's what I'm currently busy with. 

BP And how did NRWL come about? What was the origin of that company? 

JS Oh, the company itself was basically founded by two ex-Googlers. They started as part of the Angular team, and they then transitioned out of Google. They started first mostly consulting in the Angular space, but their main interest was to build an equivalent solution for handling larger monorepos outside of Google. Because Google is obviously known for monorepos, they have their own kind of stack with Blaze, however it's called internally. But then, it is hardly adoptable outside of the Google environment. It's not that easy to set up. And so they wanted to have some similar solution, but fully open sourced, more dedicated towards JavaScript folks. So yeah, that's how it got started and that was almost five years ago now, so it's been quite a while. 

MK Before we get started talking about monorepos, would you be able to give a quick definition for those who are unfamiliar? 

JS Yeah, sure. So at the super high level basically, the term monorepo can be kind of misleading sometimes. I often like more the term multi-project repository because it literally is simply a Git repository usually, hosting multiple projects. Now those projects might be related, they usually are. So at least that's when you get some benefits out of it. So there's mostly areas which are kind of related, projects that are related, and they basically are grouped within a single Git repository, versus what is called often a polyrepo or single project repository which is more if you want the standard kind of starter use case, which most folks are aware of. I like to define it like that basically at a high level. 

MK And what's the relationship between what NRWL is trying to do and monorepos? 

JS The thing is mostly that, first of all, when NRWL started there was also Lerna out. Lerna is a popular tool as well for monorepos now. NRWL and Nx recently took over the stewardship of it and kind of helped that community push that forward. But the main thing about monorepos is mostly, at some point, you can start really on your own in a sense that you set up everything on your own, have a couple of scripts that bundle stuff together. But at a certain scale, at some point you need some sort of tooling to help you, whether that is to more efficiently run tasks across the repository, across multiple projects. Let's say you want to build all the projects. It might not be feasible at some point because there’s simply too much, too many of those projects. And so there’s things like tooling that helps you more efficiently run those tasks, and even avoid running certain tasks for projects which haven't changed and stuff. So it's mostly that type of aspect. That is one thing that we help developers kind of improve, and we help their lives basically in managing those types of monorepos. But also things around the whole DX. If you need to integrate a new project into a monorepo or you need to set up common tooling in a proper way that works. Let's say you have a TypeScript setup and you want to make that work in a monorepo setup, it might be different so you might need to kind of share stuff. And so that's where we kind of jump in and help set that up already in a preconfigured way.

BP And so would you say that a lot of the work you do is for companies that are making an architectural transition or evolving with the times? Their developers are saying, “Listen, we’ve got to get on TypeScript. We think it's better. It's going to make it easier to hire and take our company in the right direction.” Or are you equally doing this with companies that are starting from scratch, looking to scale and thinking about how to build something for the long term? 

JS It's really on both sides actually, because there are companies that are in a situation where they have already some sort of monorepo, maybe with Yarn workspaces or npm workspaces or something like that, or even Lerna. And they come actually to that point which I mentioned where they're like, “Well, it doesn't scale anymore so we need some sort of help or tooling.” Because getting started is really easy, but at some point it can become really painful as well, not just beneficial. And so we have those kinds of situations where companies reach out, or even they just go and use our tools because they're open source, so you can really just grab them, set them up, follow some of our tutorials and get going. But very often they pull us in then and we kind of help them push forward. And so we have those situations, but we also have the other type of situations where people are like, “Okay, we know we have those related business areas that we definitely want to have in one repo.” They see the benefit because they maybe have already tried it out or experimented with it in the past, and they want us basically helping them set things up from the ground up, in a way that scales and is kind of future-proof. 

MK So your role at NRWL is Director of Engineering and Director of Developer Experience. How are you managing to facilitate two very kind of time intensive, and almost separate but I guess they're also very intertwined roles. What are you doing at NRWL? 

BP Yeah, exactly. You have to get people to get their work done on time and you have to keep them happy. They're mutually exclusive.

JS Yeah, actually that's how the role started basically, because I mostly helped out manage European folks, and I still do manage European folks when they work with clients and help them basically get onboard in those client projects and reach out to clients. But most recently, as you can probably guess, even the DX part of the whole thing is kind of intensive. I do mostly the DX part and on the side help out basically with the client work that goes more into the engineering part of the whole role. So right now you could probably say it's more the Director of Developer Experience. Also because this year specifically for Nx, which is one of our open source monorepo tooling projects, that really blew up in the sense of usage. And so we were like, “Okay, let's focus on that.” So there needs to be one person at least that is fully focused on that and pushes that forward in terms of community work and making sure that the DX part of that is fine. And so that's how basically I started on those roles, but then quickly saw, “Oh, this is actually going really, really fast and growing fast, so let's shift gears and focus mostly on that side of things.”

BP Do you attribute that sudden growth to something in particular? Was it picked up by an influencer? Was it surge-driven by remote work? Was it just hitting a critical inflection point of the technology? What would turn the tide like that? 

JS Yeah, I think that it was basically that monorepos have been around for quite a while. As I mentioned before, it’s been around five years since we’ve been in that space. But in the beginning of this year, or even probably October or November of last year, more of the JavaScript folks kind of turned their attention to that monorepo space. They were like, “Oh, I have the same situation. Let me actually look up what solutions are out there.” And that kind of started, especially in the Twitter space, you could see a lot of folks being interested and jumping onto that. And so I think from that it kind of started to kick off much more. Also, I don't remember if it was like January or something, Vercel for instance also entered the monorepo space by acquiring Turborepo, which is a new solution to that problem as well. And so people at that point were like, “Oh, wait a minute. What is that monorepo? Maybe I need one?” And so I feel like there's a lot of interest in general that grew in the entire community at that point. And so from there on it kind of grew much, much faster than before. 

BP That's interesting. So yeah, you think there may have been some industry shift. Did you follow along with the Turbopack news? 

JS Oh, yeah. Totally. 

BP That's what came out of acquiring Turborepo. 

JS Yeah. I mean, Turbopack is a different approach even more, because that's even a lower level part. For instance, Nx doesn't really care what you use underneath. At a very bare bones kind of level, it's more like a fast task executor. So whether you use Webpack or maybe tomorrow Turbopack, well, even better. If you have a faster bundler, the whole monorepo would be faster as well. From the perspective of monorepos, Nx doesn't really care. So it will be interesting how they bundle things up, because they mentioned some sort of maybe monorepo solution and bundler kind of merging together or something. But I feel it's super early to even say that. 

BP Alpha only. 

JS Yeah. Really Alpha, yeah. 

MK I have potentially more of a stupid question when it comes to monorepos. It's just not something that I've worked with extensively in the past. So you're saying things like, obviously having a faster bundler is going to increase the speed. From my understanding, a monorepo is essentially for example like a GitHub repository that has a number of different projects kind of bundled into one. So with the kind of work that you are doing, is this more around the orchestration of how you kind of run and manage all those different projects within the monorepo? Or what other technology goes into a monorepo apart from just a GitHub repository?

JS Yeah, it's a good question. So from the Git point of view, it's really just one repo. And you can imagine there are a couple of folders where each folder usually has its own responsibility. Like there's a bunch of libraries, maybe potentially some applications in there, which then link those libraries and bundle them together. So how we approach the monorepo part is mostly telling people that instead of having a monolithic setup of your application where you maybe have Next.js or React or whatever you are using, and have just a couple of folders which define your feature areas, rather than having those folders, just move them out into libraries. So there can even be very app-specific libraries, but you kind of restructure your code into having a very thin app and a couple of libraries in there, a couple of packages. And then you can host multiple apps even in there, which then link different parts of those packages, which makes things very flexible. So why do I mention that? Basically because in the end, it's not that you have just one app and everything bundles into one big thing that you deploy, but you can have multiple applications which you even deploy individually. It's more about the code organization aspect that the monorepo kind of takes part there. And so why does a fast bundler speed up things? Well, because in a monorepo then obviously it might be that you bundle the apps, you build the app itself, you build also the individual packages, and so if each of these pieces builds faster, the whole thing will be building faster, the sum of them basically. But still, at some point that's where the kind of intelligent task orchestration and also things like computation caching that Nx and all monorepo solutions provide come in. Because even if you have fast single project builds, as you add projects to the monorepo, and usually people start small and then they pull in projects and projects and projects, that will sum up inevitably over time. And so you start from a five minute CI and it quickly goes to a 30 minute CI, which is obviously not something you want to have, because you ultimately start with the monorepo for being quicker in shipping stuff because you can share code, but then if it slows down on CI and PR merging time, then again you're back to the old days. And so that's where task scheduling comes in, orchestration, things like that. 

MK In terms of scale then, some of the kind of more notorious monorepo companies, what happens when those monorepos get to the point where they do become massive, they have thousands of employees kind of working on these monorepos. What is the kind of impact in terms of those CI pipelines? Do they typically split out into other monorepos? What are the kinds of solutions that are in place to kind of solve those issues? 

JS The solution is usually, and that's for instance the core features that Nx provides (our tool specifically, but also others), that you need to apply some sort of strategy then. So you cannot really for each PR build all the projects, run tests for all the projects, stuff like that. But what we usually do is you apply different levels of optimization. So the first level being automatically detecting what changed. So if you have a library in the middle of the monorepo, those solutions, Nx included, kind of build up a graph of those dependencies. So having a graph of the whole situation of the monorepo, it kind of is able to understand if a library in the middle changed, it just follows that graph of dependencies up to the app level potentially. And then it knows, “Well, these are the group of nodes, of projects, that need to be tested, built, linted,” whatever you need to run really in terms of process and tasks. And so you can already imagine that would then include only building a subset of projects rather than an entire monorepo. And so again, contributing then to having faster CI times. So that's usually the first layer that you absolutely need to apply, because otherwise, and that's also the reason why a bare bones npm workspace or something like that at some point doesn't scale anymore, you in theory always build everything. And the next level is the caching that definitely kicks in. Similar to what they have announced with Turbopack, but they just do it at the bundler level, so lower level even more. What we do at Nx with the monorepo solution is that we do it at the project level. So each task that you run, each build, each test, we compute a hash out of the things that go into that build and we store it into a cache which can be distributed, and the next time you run it on your CI or your local machine, we just compare it against those hashes. So that way we can really not run a whole lot of things at all because we know nothing really changed so there's no point of rerunning that same computation, basically. But that's the kind of things that you need to apply at a certain scale, for sure.
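
To illustrate the first layer Juri describes, here is a minimal sketch of "affected" detection over a project dependency graph. It is not Nx's implementation; the project names, the `ProjectGraph` type, and the `affectedProjects` function are assumptions made up for this example.

```typescript
// Hypothetical project graph: each project lists the projects it depends on.
type ProjectGraph = Record<string, string[]>;

const graph: ProjectGraph = {
  "shop-app": ["feature-cart", "ui-kit"],
  "admin-app": ["ui-kit", "auth-util"],
  "feature-cart": ["auth-util"],
  "ui-kit": [],
  "auth-util": [],
};

// Invert the edges so we can walk from a changed project up to its dependents.
function buildDependents(g: ProjectGraph): Record<string, string[]> {
  const dependents: Record<string, string[]> = {};
  for (const project of Object.keys(g)) {
    dependents[project] = dependents[project] || [];
    for (const dep of g[project] || []) {
      const list = dependents[dep] || [];
      list.push(project);
      dependents[dep] = list;
    }
  }
  return dependents;
}

// Return the changed projects plus everything that transitively depends on them.
function affectedProjects(g: ProjectGraph, changed: string[]): string[] {
  const dependents = buildDependents(g);
  const seen: Record<string, boolean> = {};
  const queue = changed.slice();
  while (queue.length > 0) {
    const current = queue.shift()!;
    if (seen[current]) continue;
    seen[current] = true;
    for (const parent of dependents[current] || []) queue.push(parent);
  }
  return Object.keys(seen).sort();
}

// A change in a mid-level library only touches that library and the app above it.
console.log(affectedProjects(graph, ["feature-cart"])); // ["feature-cart", "shop-app"]
```

In this sketch, a CI job would then run build, test, and lint only for the returned subset instead of for every project in the repository.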

BP Do you find the debate over the mono versus poly to be a religious one among developers? Kind of like spaces and tabs, and people just are fixed in their ways? Or do you think people can be flexible at this and are open to hearing about the solutions that are being developed and are evolving with time as new technologies come along? 

JS Totally. I think the main thing there is that you need to be flexible, because a common misconception that people have is that if I as a company am going in a monorepo direction, we are going to have one single corporate-wide monorepo. But usually the companies that I see, which are even big companies, like Fortune 500 companies, they have multiple monorepos usually. So what they have is a situation where there are a couple of related business areas and entities. They make sense because they collaborate a lot. Let's put those developer teams into a monorepo. And then some parts of that monorepo might even be published through some registry as packages and they're consumed by some other monorepos or polyrepos out there. So it's not just one, as always. It's not like the golden hammer, like, “Let's go full in.” 

BP Yeah, it depends. Of course.

JS Yeah, exactly. You kind of adjust it, and it also doesn't make sense for you to pull in everything. Because in a monorepo, you usually just get a lot of benefits out of it if you kind of share code between stuff, if there are relationships. If it's just non-related products you throw into one Git repository that never really touch each other or have no overlap at all, it doesn't really make a lot of sense. 

BP And what about from the perspective, again, to get back to a big enterprise company. What about from the perspective of security and compliance and things like that? Is one more challenging than the other? I guess to have it all in one place you'd think would be easier as opposed to this polyrepo, but from your perspective having seen that, does one create more overhead for those kinds of considerations? 

JS Yeah, as mentioned before, you definitely need tooling in place, of course, in the sense that that helps you manage it over time. Because again, the setup thing is usually pretty straightforward when you get started. The challenging thing is more the maintainability in the long run. And so there's first of all, obviously speed, which we mentioned before, where you need to apply some strategies. But also kind of the maintenance aspect. If you have so many packages in a monorepo, who can depend on which packages, stuff like that. At some point you need to have some guardrails in place, and that's for instance another feature at Nx we introduced from the very beginning. Because Nx grew out of our consulting business of implementing monorepos in large companies. Usually what happens is we see the problem, go back, and implement it in an open source solution. And so one of the earliest features that we already had in place there is custom linting rules where you can define package dependencies. You can imagine it as a tagging system that you give to projects, and at the higher level, you define, “Well, these types of tags can only depend on those other types of tags.” So maybe everyone can kind of depend on a generic authentication utility kind of library, but the sales part should not depend on the specific authentication library deeper down or something. And so those maintainability aspects can definitely be challenging if you just naively jump into the thing, so you need to have some guardrails in place. And I know obviously code merging wise, GitHub and GitLab and stuff, they have the CODEOWNERS files so they work pretty well where you can, at the folder level, define who needs to approve certain merges and stuff. But obviously if you need also view restrictions, well then that's more difficult obviously. So the monorepo is really meant so everyone can read and see all the code, but not necessarily write.
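
The tagging idea Juri describes can be sketched as a small checker. This is an illustration only, not Nx's actual rule or configuration format; the `Project` and `Constraint` shapes, the tag names, and `findViolations` are all hypothetical.

```typescript
// Hypothetical shapes: every project carries tags and a list of dependencies.
interface Project {
  name: string;
  tags: string[];
  dependsOn: string[];
}

// "Projects tagged sourceTag may only depend on projects carrying one of these tags."
interface Constraint {
  sourceTag: string;
  onlyDependOnTags: string[];
}

const projects: Project[] = [
  { name: "sales-app", tags: ["scope:sales"], dependsOn: ["auth-util", "auth-internal"] },
  { name: "auth-util", tags: ["scope:shared"], dependsOn: [] },
  { name: "auth-internal", tags: ["scope:auth"], dependsOn: ["auth-util"] },
];

const constraints: Constraint[] = [
  { sourceTag: "scope:sales", onlyDependOnTags: ["scope:sales", "scope:shared"] },
  { sourceTag: "scope:auth", onlyDependOnTags: ["scope:auth", "scope:shared"] },
  { sourceTag: "scope:shared", onlyDependOnTags: ["scope:shared"] },
];

// Report every dependency edge that breaks a constraint, as "source -> target".
function findViolations(all: Project[], rules: Constraint[]): string[] {
  const byName: Record<string, Project> = {};
  for (const p of all) byName[p.name] = p;

  const violations: string[] = [];
  for (const p of all) {
    for (const rule of rules) {
      if (p.tags.indexOf(rule.sourceTag) === -1) continue; // rule does not apply
      for (const depName of p.dependsOn) {
        const dep = byName[depName];
        if (!dep) continue; // unknown dependency, out of scope for this sketch
        const allowed = dep.tags.some((t) => rule.onlyDependOnTags.indexOf(t) !== -1);
        if (!allowed) violations.push(p.name + " -> " + dep.name);
      }
    }
  }
  return violations;
}

// The sales app may use the shared auth utility but not the auth-internal library.
console.log(findViolations(projects, constraints)); // ["sales-app -> auth-internal"]
```

Running a check like this as a lint step is one way such guardrails could be enforced on every pull request, matching the example of the sales part not being allowed to reach into the deeper authentication library.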

BP For top secret projects at a big company. 

JS That might not be the solution. 

MK So you kind of touched on this in your last answer, but this is something that we always kind of have to ask projects that depend on open source, and that is monetization. Can you kind of briefly describe how NRWL's business model operates and how you guys actually manage to pay your employees and grow? 

JS Yeah, so a lot of the stuff until now has basically mostly been on top of consulting. So we actually developed the Nx open source monorepo tool as our 20% time. So mostly once a week, maybe two days a week, kind of stuff. The rest of it was using it actually for consulting in companies. Nowadays, what we have done and what grew quite big also in the last year, is that the caching mechanism that I mentioned before, we have a mechanism where you can distribute that cache into the cloud and therefore easily share it across different machines and even run tasks and distribute them dynamically across machines so that they can be run in parallel, dynamically distributed based on historical data and stuff. So we kind of provide that type of cloud infrastructure, which again, for open source projects, we give it for free so people can try it out and play around with it. But then companies usually need on-prem installations of that, because they want to have the data in-house in their own kind of infrastructure and local cloud services. So that's when we charge, actually. And so that's when they either just use it and pay for it, or we come in and help them set it up and accompany them in the first couple months and help them get going quickly. 
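
The caching mechanism Juri refers to, where the inputs of a task are hashed and a stored result is reused when the hash matches, can be sketched roughly as follows. Everything here is an assumption for illustration: `TaskCache`, `hashInputs`, and the in-memory `Map` stand in for whatever a real tool does, and the FNV-1a function is a simple stand-in for a proper cryptographic hash. A shared or remote cache would replace the `Map` with storage reachable from several machines.

```typescript
// Stand-in hash (32-bit FNV-1a); a real tool would use a cryptographic hash.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

// Hypothetical description of everything that goes into one task run.
interface TaskInputs {
  project: string;
  target: string; // e.g. "build" or "test"
  files: Record<string, string>; // file name -> file content
}

// Hash the inputs in a stable order so the same inputs always give the same key.
function hashInputs(inputs: TaskInputs): string {
  const parts = [inputs.project, inputs.target];
  for (const name of Object.keys(inputs.files).sort()) {
    parts.push(name, String(inputs.files[name]));
  }
  return fnv1a(parts.join("\n"));
}

class TaskCache {
  private store = new Map<string, string>();
  runs = 0; // how many times a task was actually executed

  run(inputs: TaskInputs, task: () => string): string {
    const key = hashInputs(inputs);
    const cached = this.store.get(key);
    if (cached !== undefined) return cached; // nothing changed: skip the work
    this.runs++;
    const output = task();
    this.store.set(key, output);
    return output;
  }
}

const cache = new TaskCache();
const inputs: TaskInputs = {
  project: "ui-kit",
  target: "build",
  files: { "button.ts": "export const a = 1;" },
};

cache.run(inputs, () => "built ui-kit"); // executes the task
cache.run(inputs, () => "built ui-kit"); // same hash, served from the cache
cache.run({ ...inputs, files: { "button.ts": "export const a = 2;" } }, () => "built ui-kit"); // changed file, executes again
console.log(cache.runs); // 2
```

The design choice worth noting is that the key depends only on the inputs, so any machine that computes the same hash can reuse a result produced elsewhere.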

BP Right. I guess I wanted to follow up on that– how do you balance your job and your open source and consulting and the line between all of those things? For developers who are listening who are interested in taking a 20% project and turning it into something bigger, how do you approach that? Do you have open dialogue with your employer and with the folks that you contract with?

JS Yeah, no, totally. For us internally, it can be challenging, but at the same time it's kind of a big benefit. Because consulting work obviously is one type of work where you kind of have certain tasks, you help the client to get forward and then help them with that. But then the open source work is kind of more where a person is feeling more freed up, where you can hack on the tool. The interesting part for us is that we hack on the same tool that we then use for our own work. It's kind of dogfooding and that helps a bit. It can be challenging. For us, the main rule was always that you have that one day a week. You need to plan your own time very precisely, because one day is kind of not a whole lot. Especially when you develop the larger features, they might even span across multiple weeks. We've been doing pretty well, I feel, and luckily what we are doing right now is basically shifting more away from consulting and moving more folks over. As our cloud infrastructure gets more profitable, we move over more folks to full-time open source. So luckily we have to deal less with that kind of balance of jumping to consulting or not. It can be challenging, but at the same time, interesting. 

MK So with an open source project that obviously NRWL is making money, has employees, all that kind of thing, managing the community aspect of that and all the pull requests that are coming through and the requests from the community and all that kind of side of things, what is your approach to that? Do you have a dedicated team set up for managing those requests, making sure that the community is kind of heard and feeding that information through to the rest of the team and the company and the product and roadmap and all that kind of stuff? Can you talk to some of the challenges that you've had working with the community and how that all operates?

JS Yeah, that's a very good point. So usually what we do is, in general, we have half year major release cycles. So that's kind of where we do kind of potentially breaking changes, although we have some automated migrations as well so usually it's not super breaking. But that half year release cycle also goes together with a half year roadmap that we usually publish at the same time. So once the major release is out there will be a roadmap for the next half year. So I think that helps a lot, especially when you collaborate with the community to kind of communicate, “Okay, these are the big chunks of focus that we are going to look at,” so whoever wants to contribute can chime in. And for those folks we then have obviously the GitHub discussions that we use. We have a GitLab chat, a Slack channel chat where people can chime in which is quite active. I feel like we are lucky since Nx grew over five years, more or less, initially kind of slowly and organically alongside the consulting business. It allowed us also to kind of grow that community alongside, but not super fast, so we had time basically to build that up. But having those channels of Slack, GitHub discussions is very useful. We also are very vocal on other channels like Twitter and YouTube where we kind of pull in people, but that is more the educational part of it so that people can understand how things work, then can chime in to contribute. But those are the main mechanisms. And then we have rotations internally. So there's always a couple of people dedicated for the full week that mostly focus on looking at open pull requests, looking at issues, assigning those issues, making sure they get addressed. And to some degree we also at some point added automation, in the sense that for issues that are kind of going stale, there are bots. I know it's not always nice, but after I think six or eight months or so in which no one actually interacts actively with an issue, we're going to close it, because those are simply some necessary mechanisms.

BP No, just turn up the bounty. You need a bigger bounty. That's all. 

JS Yeah, exactly. No, but again, it’s like that. People will get a notice that it’s going to be closed, and usually then those folks, if they're still interested, they will reach out. And so we'll keep it open, and at that point you will get a notification as well and be like, “Oh, right. There was that feature. Let's have a look and let's start a discussion again maybe,” or something like that. So we right now have I think 2.7 million downloads a week, so at a certain volume you need some sort of automation to kind of handle those issues. 

BP Yeah, for sure, for sure. And a full-time community manager, if you're lucky. 

JS Oh, yeah. If you're lucky. That might be the next one.

[music plays]

BP All right, everybody. It is that time of the show. I want to shout out someone from the community who came on and helped contribute some knowledge. Awarded yesterday to penguin2718, a Lifeboat badge for coming in and dropping an answer on a question that was about to be in the dustbin of history. “How do you store a loop output in a data frame in R?” penguin2718 has the answer for you, and you can find it in the show notes. I am Ben Popper. I am the Director of Content here at Stack Overflow. You can always find me on Twitter, if that site continues to exist, @BenPopper. If not, you'll find me on Mastodon, I guess. I don't know, we'll see. You can always email us with questions or suggestions. And if you enjoy what you hear, just leave us a rating and a review. It really helps. 

MK And I’m Matt Kiernander. I'm a Developer Advocate here at Stack Overflow. You can find me online, Twitter @MattKander, and YouTube as well. 

JS And my name is Juri Strumpflohner. I'm the Director of Developer Experience at Nx, and you can find me mostly on Twitter, @juristr.

BP All right. And if we don't have a blue check mark it's not because we're cheap. It's not our fault.

[outro music plays]