The Stack Overflow Podcast

Balancing business and open source in 2024

Episode Summary

During the holidays, we’re releasing some highlights from a year full of conversations with developers and technologists. Enjoy! We’ll see you in 2025.

Episode Notes

In this episode: The birth of React, how the world’s biggest open-source business is leveraging LLMs, the creator of Jenkins on CI/CD, the creator of Node.js and Deno on open-source evolution, and an open-source development paradigm.

Listen to the full versions:

Episode Transcription

[intro music plays]

Ryan Donovan Hello, everyone, and welcome to a special holiday episode of the Stack Overflow Podcast. We are taking some much needed rest over these holidays, spending time with our families, opening presents, and we've decided to put together a couple episodes that cover some of the great conversations we've had with developers and technologists over the year. This episode is about open source, and today we're talking to five great developers from five open source projects about how they are using open source and building a business on open source products. Our first interview is with Tom Occhino, who is now Chief Product Officer at Vercel, but he's one of the creators of React over at Facebook. He's going to talk about how they moved React to open source, how contentious that decision was, and how Vercel is building a business on both that open source project and open sourcing its own project.

[music plays]

Tom Occhino This young enterprising engineer named Jordan Walke kind of had a better idea for how to do things. And you can watch the React documentary if you want the full story of the background there, it's a really good creation story, but eventually this thing that would be called React was born, and my team was the place where we ended up building up a team and developing it. Once it started to pick up traction internally and the team got bigger, I eventually grew that team into an org. I think by the time I left, the React org was somewhere around maybe just north of a hundred people working on everything from our web technologies to JavaScript frameworks and things like that.

Ben Popper Can you talk a little bit about how it transitioned or if it began originally from something that was addressing needs in-house and was the brainchild of an engineer with some ideas that they wanted to be ambitious about, to something that is open to the public community and how you share both the responsibility of guiding it internally with meeting the community's needs, listening to them, and letting them also accelerate its development?

TO I love that. I think the reasons that we were excited to invest in it in the beginning became very different from the reasons we decided to continue investing in it over the long run. So in the very beginning, we saw something that made it easier and more fun to manage some of the larger applications we were building. So the classic problem with a big application is that you have lots of models and lots of controllers and these things fit together in weird ways, and as a single engineer you can't keep the whole system in your head for very long. So React introduced this idea that the component is the right sort of boundary for concerns. I can handle everything inside of my component, and we could grow our teams more easily. We were really excited about the way that this thing was scaling. But the reason that we kept investing in it, and that we sort of almost doubled down, was that our code was higher quality when we were building it this way. Once you have engineers focusing on doing fewer things better, because they're contained inside of this component boundary, we saw that we were iterating faster, we were shipping with higher quality, and there was higher satisfaction. Then there was a whole other shift about whether we should even make this thing open source. Initially, there was a lot of opposition internally to open sourcing this. We had a bunch of former Microsoft folks who really didn't believe in open source, and we almost had to go on this sort of crusade internally to say, “No, we have to share this with the world.” And then, spoiler alert, when we ultimately did share it with the world, I think our initial messaging was just not very good. People didn't really get it, and I don't want to say they booed us off stage, but it took some time. We had to give it time and a couple more conference talks before it started to pick up.
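
Occhino's point about components as the boundary for concerns can be sketched very loosely with plain functions. This is a hypothetical toy that returns HTML strings directly; it is not React's actual API, and the `Greeting` and `Page` names are invented for illustration:

```typescript
// A component owns everything inside its boundary: it takes props in and
// returns rendered output, with no shared models or controllers to track.
type Props = { name: string };

function Greeting({ name }: Props): string {
  return `<p>Hello, ${name}!</p>`;
}

// Composition: a parent only needs each child's props contract, so teams
// can work on separate components without holding the whole app in mind.
function Page(): string {
  return `<div>${Greeting({ name: "Ada" })}${Greeting({ name: "Lin" })}</div>`;
}

console.log(Page());
```

Real React adds a virtual DOM and state management on top, but the basic contract, props in and markup out, is the scaling mechanism he describes.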

BP There were two sort of lines that were sent to me to spark this conversation, and I think both of them get to what you're saying. I'd like you to maybe elaborate on them a little, and they also get to that heart of commerce and commercial business versus open source. “We don't build Next.js to make money, we make money to keep building Next.js.” Okay. And then, “Leverage Vercel infrastructure as a testbed for new innovation, but make that innovation available to the rest of the world.” And that goes right back to what you said about how at Facebook it was specific to this infrastructure, which isn't necessarily applicable to many people or shared in a way that other people can pick up on, although they have open sourced many things over time in the AI world and in the data center world and in other places. But just talk through those two things and I'll play the skeptic here. You're not in it to make money– sure, sure. But if that's the mission and the maxim, how do you do it in a way that allows you to do both– continue growing the company and thriving with the community, and allowing the technology to be agnostic, open to everyone? 

TO So I'll take the second piece of that first. I think that we have this concept of framework-defined infrastructure, and the whole idea behind this is that I believe that most of the time that humans spend provisioning infrastructure– provisioning compute, scaling their systems– I believe that most of that work is undifferentiated. Around here, I call it ‘undifferentiated heat loss.’ And it's very hard to do this, but if you can change developer behavior subtly such that you have the ability to deploy based on the code that's written, we can just obviate the need for all of that manual orchestration and configuration. So I love to live in a world where because I've written my code a certain way, infrastructure can be automatically provisioned for me. I don't want to think about that. I'm a front end person, I like to focus on my end user experience, etc. So that's kind of part one. In order to enable that, we actually really do need to have an influence and strong opinions about how the infrastructure should be provisioned. What do I see on the front end? What do I see from the framework that enables me to provision infrastructure in a certain way? But it turns out that in building the primitives that can automatically scale and can automatically deploy the right types of compute in the right regions no matter where your customers are and can cache as close as possible, in building a lot of this stuff we ended up building a bunch of general purpose primitives that end up working really well for any workload. So I think Vercel is probably the best place to host Remix apps right now and a bunch of other frameworks. And if you want to get whatever you've built out to the world, you've got to put it somewhere, so it ends up being a pretty good business to be able to serve and render and compute even these really sophisticated workloads. 
So that generates the sort of business side, and that then creates this sort of flywheel of investment back into the framework that ends up driving more innovation into the infrastructure and the framework itself, which then begets more. So we've created this sort of flywheel. And this is adapted from, ‘we don't build products to make money, we make money to keep building better products,’ but I think the same is true about Next.js. We're not building Next.js explicitly to make money. We're running this business in order to keep funding Next.js, and that'll become more and more true in the fullness of time. 
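
The framework-defined infrastructure idea, deriving the provisioning from the code rather than from hand-written config, can be sketched roughly like this. The `routes` table, `provision` function, and `lambda:` labels are all invented for illustration and are not Vercel's actual mechanism:

```typescript
// What a developer writes: just handlers keyed by route, no infra config.
type Handler = () => string;

const routes: Record<string, Handler> = {
  "/api/hello": () => "hello",
  "/blog/[slug]": () => "post",
};

// What a platform could infer from the code alone: one deployable unit
// per route. A real system would also pick regions, caching behavior, and
// compute types based on how each handler is written.
function provision(rs: Record<string, Handler>): string[] {
  return Object.keys(rs).map((path) => `lambda:${path}`);
}

console.log(provision(routes));
```

The point of the sketch is the direction of the dependency: the infrastructure is a function of the code, so the developer never writes orchestration by hand.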

[music plays]

R Donovan Our second interview is with Scott McCarty, Global Senior Principal Product Manager for Red Hat Enterprise Linux. And we're going to be talking about open source LLMs, how open source are they, and how could they be more open source? And frankly, how open source do you want them?

[music plays]

R Donovan Speaking of counterculture becoming the culture, open source has taken over the software industry, and at the same time, we’ve got the LLMs and generative AI coming in, and I hear a lot about the open source LLMs and I wonder how open source are they? 

Scott McCarty Obviously everything has become open source. Whether it is or not, it's always open source. It's open source-washed no matter what, much as things are AI-washed no matter what. I joked about how AI ops and ML ops have pivoted in meaning within the last five years. You're like, “It was that, now it's this. It's fine, we just changed the meaning.” I think with open source LLMs, there are some. I'd argue the way, for example, Ansible Lightspeed does things: if you generate a piece of configuration, it will actually cite what work it built off of. You're like, “Okay, that's pretty cool.” The Granite models cite all the data, that's cool. That feels like something I'd want in an open source model. But even at best, I'd say the popular ones are more read-only open source, similar to an Android versus a Linux. In open source, we've always had this challenge of, is it community-driven open source where it's read/write, or is it just read-only? Download and use it, but good luck trying to change it. 

BP I think you make a good point. If you were to go onto Hugging Face, I'm sure you'd find some that are read/write. Are they necessarily the biggest ones, or can they keep up with the ones that are getting a lot of investment and staff? And to your point, some of the models that are coming out open source are definitely more, “You can use this model and maybe we will share some of the details of the training or the weights,” but you're not able to get involved in the building of the next version necessarily. It's interesting to hear you talk about attribution. That's one of the things we're trying to figure out at Stack Overflow, and this was something we announced when we talked about our API partnership. If the AI trains on our data, then when you ask a question, we want it to offer citations back to the Stack Overflow questions, and hopefully that will encourage more people to create knowledge in the future. It's still something that's in motion. I don't know if that's something that's been totally put out into production, but I definitely agree with you on that being a key vector to think about. 

SM Yeah, I agree. 

R Donovan You talk about a lot of them being read-only open source. Is it a good idea to have LLMs write open source? Do you want somebody coming in and fiddling with weights and biases? 

SM That is the challenge. I've heard people describe it different ways. I think I've heard Jeremy Eder say, “Let's get rid of the forks.” But forking is good. A lot of these are read-only, and so that opens you up to doing a fork, which is cool. You could take the weights, and maybe even the data sometimes, and then fork it or fine-tune it for whatever you want to do. But you're right– the question is, can I go back and change the original weights in the base model? I've had a lot of people argue, “Is that a good idea?” One, is it even possible, because it's so expensive? And so the argument is, sure, I think a Facebook could figure that out. They could probably create some kind of read/write GitHub-type community to feed into the next model, but that is pretty terrifying for them, because they're basically going to burn all those resources, millions and millions of dollars in resources, to train the next base model. So I definitely understand the paranoia about doing that, not to mention the safety aspects. I feel a little more strongly about the safety aspects. I actually think the argument against that being open source is not well thought through. We're pretty good at this in the Kubernetes community. We know how to analyze code that's coming in and we know how to set up systems. I think a community does that better than a single vendor behind their closed wall. I have to trust whoever to do that right, with internal people and closed discussions that I have no visibility into, whereas if it's in the community, at least I have input, and at least I can participate at the level that I want to. I can just watch it, and if I see something that really irks me, I can chime in. I feel like the safety aspect would be a lot better in an open, community-driven model.

BP You mentioned the security issue and the idea that many eyes make for small bugs, and I do recall that that's something that Mark Zuckerberg brought up in making his pitch. Now, to your point, how truly open is Meta AI in comparison to a Linux or a read/write world, we could have a longer discussion about that, but certainly they are pushing in that direction compared to some others. In his open letter, I think one of the things he referenced was that open source tends to be more secure because it's developed so transparently.

SM I saw his recent interview on Bloomberg. It was interesting to see his views on this. And I do agree– even internally at a big company like that, and I see it at Red Hat, it is open source to an extent. It's not publicly open source, but there's such a wide array of opinions that you get an approximation of the reality of the global world. But you do have to be careful, because a company like Red Hat often gets blowback when we develop something and go, “Oh yeah, this is probably about right. We have a bunch of people that disagree and we've had all kinds of bloody debates internally, so people should just trust us.” And that typically goes over like a lead zeppelin, is the problem. I do truly agree with you, though. It is actually still approximately open source. 

R Donovan It's like horseshoes and hand grenades. ‘Approximately’ probably doesn't cut it for the open source purists.

SM Yeah. And it is kind of the checks and balances of it all. I guess it is like the constitutional government with the checks and balances. You're like, “All right, we do kind of need to do this right.” I do tend to believe that Meta probably and Zuckerberg does want to do it. I suspect they don't know how to do it all the way yet. And that's fine, I think that's a fair argument.

BP We talked about to what degree is a lot of the progress in the really big foundational models happening in an open source way, to what degree is that lip service? How important is it that this sort of brand new technology, which is obviously taking a big role in how many companies think about building for the future, to what degree are we going to be able to have that in a read/write fashion? Do you also, though, just feel energized? I feel like when I go on X or look at Hugging Face, it seems like people are extremely energized by their ability to share and build new things, and that even if the foundation models when they come out are not super open, like Ryan was saying, they're kind of the invisible engine and then you get to put lots of stuff on top of that. 

SM I am still excited, to your point. Even if they're not open source yet, I do think there are some business dynamics happening that make people realize that the base models will probably eventually be really tough to monetize anyway, so you may as well make them open source, which arguably is what Meta is doing. They kind of know that, and they know they can throw that monkey wrench into the machine without actually making it read/write open source. But I suspect that 5-10 years from now you'll have open source base models, because they're not that monetizable anyway. And to your point, the innovation probably happens at a layer higher than that. I think that's probably well accepted and well understood in Silicon Valley. In fact, they're not even investing in new base models at this point because of that problem. 

[music plays]

R Donovan Our third interview is with the creator of Jenkins, Kohsuke Kawaguchi, who's now at CloudBees building a business on Jenkins. We talk about what it was like to build an open source project that solved his own problems and how Jenkins was almost too good to build a business around.

[music plays]

BP What was it like in the open source world that you were working in from 2000 to 2004? It was very different from what people imagine today where they throw up a public repo quickly and people can come in and look and then contribute or fork. You had to what– communicate with people on bulletin boards and email and hash out what was going to happen? How did it work? 

Kohsuke Kawaguchi It's a crazy idea that people would have to communicate to collaborate. This was one of the key phases in the life of open source, because this was around the time that companies started taking on open source. I remember Sun and IBM were in a competitive relationship, but at one point they decided to work together, and the vehicle they chose to do so was an open source project under Apache. I was sort of on the periphery of that effort, which is why I was able to come from Japan to the US. So in some sense, the reason I got noticed was because of this, my side open source gig. So for me, open source was pretty big in my heart. These were the times when there were very prestigious software foundations, most notably the Apache Foundation and later Eclipse and some others, so people were trying to become a “committer,” which is a privileged status that lets you push changes directly. And in order to do so, you kind of have to prove that you're trustworthy, that you can behave, that you're one of them. So you often see people start by contributing patches, which today would be a pull request, and then gradually show that your work is valuable and that you know how the community works. There's a lot of upfront investment necessary. And then I started seeing some prestigious individuals publish open source projects that were very influential, like new web frameworks, and Codehaus, another hosting platform back then, was also seen as highly prestigious. That's different from, let's say, the likes of SourceForge, which was the GitHub of that day. Anybody could start a project, but what that meant was too much noise. It was really an environment in which great programmers could kind of make their names out of their individual work. That was an exciting context for individuals. 

R Donovan One of the things we come across in open source is funding that open source. And when you originally went to CloudBees, you were building the enterprise version of Jenkins. What was it like building that as a paid version of an open source project?

KK So CloudBees was a startup founded by Sacha Labourey. He was already a well-known figure in the open source space, and he was successful enough that he could fund the next startup. In a strange turn of events, I got in touch with him and he was trying to do originally a platform as a service, so like a Heroku for Java. And then in that process, he had the right vision that the delivery pipeline is an integral part of this platform as a service and so that's where I kind of came in. In that effort, we created the enterprise version of Jenkins. By that point, Jenkins was pretty popular. It just so happens that Sun was going down the tubes. So if that didn't happen, I was probably pretty happy working as a software engineer at Sun. It's a comfortable place and the work isn't too busy and we get all of the spotlights in the world, but alas, nothing in this world stays the same. So suddenly Oracle happened and then I started thinking, “Well maybe this software is popular enough. Perhaps I could do something about it.” So CloudBees was a great vehicle. And then as I started doing enterprise versions of Jenkins, the Jenkins community was so good that it was kind of hard to come up with things that you can charge for. We are selling to the people who can build their own stuff, so if something is too valuable then they'd be able to do so on their own. So initially it wasn't easy. I remember the sales rep was complaining to us.

R Donovan So I wrote an article a few years back about open source governance, and you were a benevolent dictator for life for Jenkins, perhaps still are. Do you think that governance style has its benefits? Obviously, you were a benevolent dictator. Do you think that is the better way than having a lot of cooks in the kitchen?

KK I think the key innovation in the Jenkins project that I touched upon a little bit is that it's kind of both at the same time. You described too many cooks in the kitchen. The idea is that every cook, in this case every volunteer contributor, wants to do some things. The metaphor of too many cooks spoiling the soup assumes they only have one thing to work on, therefore they need to collaborate and then kind of compromise and do all of the painful things. So in the Jenkins project, the way we solved it is by coming up with this notion of plug-ins– it's plug-ins all the way down. We certainly didn't invent that, but we applied it to the web app. So what that meant is that if people have different ideas, like if one guy wants to paint the shed blue and the other guy wants to paint it red, go ahead. They can both do it. So I think that's really key to unlocking the innovations, because there's only so much one person can do, and collaboration is a significant friction, especially when you're talking about a global, volunteer, part-time community. Engineering-wise, I think you can avoid forcing certain decisions, but in the social structures, as a kind of backstop, you do need some more corporate-like decision-making scheme. So the notion of a benevolent dictator, or some sort of decision-making system like a committee of respected village elders or whatever, is really useful. So I think it was natural, because there are certain things that only the founder of the project can do, since they have outsized influence. That influence is always there whether you name it or not, and to give it a name makes it a little easier for people to understand how this community rolls. 
And then the transparency, since you talked about the governance– transparency in the governance, I think, is actually pretty key in my mind to growing the project, as opposed to people showing up to the project and having to figure out how this community does decision making. So I think that's beneficial. Different projects have different needs.
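
The "plug-ins all the way down" idea, a core that defines extension points so contributors with conflicting ideas can each ship their own plugin, can be sketched like this. The `BuildStep` interface and `register` function are illustrative inventions, not Jenkins' real plugin API:

```typescript
// The core defines only the contract that plug-ins must satisfy.
interface BuildStep {
  name: string;
  run(workspace: string): string;
}

const registry: BuildStep[] = [];

function register(step: BuildStep): void {
  registry.push(step);
}

// One contributor's plug-in...
register({ name: "compile", run: (ws) => `compiled ${ws}` });
// ...and another's, added without either needing to coordinate or compromise.
register({ name: "test", run: (ws) => `tested ${ws}` });

// The core just runs whatever was plugged in.
const results = registry.map((s) => s.run("my-project"));
console.log(results);
```

Because each plug-in lives behind the same small contract, the blue shed and the red shed can coexist; the core never has to pick a winner.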

[music plays]

R Donovan Our fourth interview is with Ryan Dahl of Deno. He talks about how he created Deno 2 to solve additional customer problems, and how he's been working on making his open source Deno 2 enterprise-ready: building support tools around it, adding safety, and making it scalable. He also talks about his love of Rust, another open source project.

[music plays]

R Donovan Were there specific customer needs and incidents that led you to do this, or was this part of your 10 Things I Hate About Node just sort of cascading down across projects? 

Ryan Dahl I think it's just the maturity of the software. When you start a software project, you can change everything, everything's kind of crazy, and Node has definitely gone through this cycle as well. In the early days, everything was changing all the time, and bit by bit, month by month, it starts kind of getting set in stone. And with Deno, that is important because not everybody cares about how different functions are named or kind of the aesthetics of command line flags and stuff. We're just kind of not changing things arbitrarily anymore, and that is important for people to be able to build on stuff. People complain about things that change too much. 

R Donovan Breaking changes always make people mad. Stack Overflow would thank you for any breaking changes. We get plenty of traffic for it, but your customers would prefer not to, I'm sure. You mentioned you're adding NPM support, is that right? 

R Dahl NPM support has been added. Maybe 18 months ago we started work on this, but NPM support is not a feature that you implement in a month or something. You have no idea how much complexity goes into being able to run arbitrary NPM packages. You just have to support intricate built-in Node APIs. And it's just really good now. There's always going to be a long tail of incompatibilities, and getting to 100% is essentially impossible and not necessarily the goal. It's hard to put a number on it, but we're at like 98% compatibility these days. Most packages you can run, most of the ones that you care about. You will probably still find things, and there are issues every day, but we've been plugging away at this for almost two years now.

R Donovan Rust ends up being our most admired language in our annual survey almost every year, for five or six years now. Were you a fan of Rust before, and are you still a fan after working with it in a massive code base? 

R Dahl So the original prototype of Deno was written in Go in 2018, and once we decided that we were going to commit to this project and move beyond a demo, we quickly were like, “Go is not a great choice,” because Go has a garbage collector, and V8 has a garbage collector, and those things potentially are going to interact in unholy, unnice ways. It was pretty clear that we needed some native code, and the choice was essentially C++ or Rust. And at the time, Rust was pretty new, and the choice to jump over and use Rust was pretty risky in my mind. I didn't know too much about it. Nobody I knew had used it. I know the creator of Rust, Graydon, so I sent some emails to him asking, “Is this a good choice? I don't know,” and basically just kind of jumped in with it. That turned out to be one of the best choices that we've made in Deno. It is fantastic. Rust scales well. In some ways, you can think of a system like Deno as kind of like a metropolis. It's like a city, and there's roads connecting it to other systems, and building those roads is a lot of what we're doing. But let me not use the city analogy– the Linux distribution analogy is also appropriate. In some ways, Deno and Node are two different distributions of V8, in the same way that Ubuntu is a distribution of the Linux kernel. A lot of what we're doing is– we're not programming JavaScript itself, we are making connections, we're making APIs that allow JavaScript to call AWS, for example, to talk to some AWS API. And being able to use the Rust ecosystem and pull in infrastructure from crates.io is incredibly powerful and useful in ways that were just inconceivable when building Node. Any kind of third party dependency in Node has to be carefully, carefully integrated into the build system and thought out very deeply. You don't just go willy-nilly pulling in random dependencies into the Node code base. It's very slow. 
But in Deno, we can really reach for the stars and move very quickly because of Rust and the ecosystem there, and it all compiles down to native code and we have very deep control over what actually gets executed on the CPU. So suffice to say, I'm a big fan of Rust, and in terms of building a company, I think it's also been fantastic because there's a lot of people out there who like programming Rust and find it fun, and just generally makes building a project like Deno much more enjoyable versus say working in C++ which is just terrible, just terrible.

[music plays]

R Donovan Our final interview for this rehash is with the co-founder and CTO of Temporal, Maxim Fateev. He talks about what it was like to build a business on an open source project started at another company, as well as the various additional services he provides to make that open source project a viable business.

[music plays]

BP And so it's interesting that you mentioned this was a problem you encountered and worked on within a company, created a solution that was open source within that company, and then stepped outside to sort of replicate this product market fit. Was there an issue of IP there? How did you go about recreating the same solution that you had done internally, externally? 

Maxim Fateev We didn't recreate it, we actually forked it. It was an open source project from the beginning under MIT license, which practically allows you to do these things. 

BP Gotcha. 

MF We forked the project. So the code base which we actually run with now is not 4.5 years old. It is over seven years old, almost eight years old because we started that at Uber. 

BP This is a strategy that more engineers need to employ. Build your startup inside of another company and convince them to open source it, and then once you've realized how well it's working, then leave and continue working on it. That's pretty brilliant. 

R Donovan I think Uber was actually pretty good about that. We've talked to other companies that were built off of Uber open source projects. I think Chronosphere is one of them. 

MF Uber might be a different company now, but back then Uber was extremely friendly to open source development. But at the same time, I don't think it is like Uber losing something, because first, I would never have joined Uber unless they promised me I could work on an open source project. I had a very good offer at Amazon, and realistically I lost a lot of money joining Uber versus Amazon, because Amazon stock grew multiple times while I was at Uber. But I joined Uber because they let me work on open source projects. And at the same time, my personal opinion is that any infrastructure-level complex project, if it's company-specific, will die one day. There is no way around it. I witnessed at Amazon so many cool projects which were 10 years ahead of practically the whole industry, but they never got open sourced, and they ended up being deprecated 10-15 years later because open source analogs appeared. The Pub/Sub system we created at Amazon almost 20 years ago was super powerful. It was a replicated storage, and Kafka wasn't even conceived yet. We absolutely could have taken over the market if we had open sourced that and created a public project, but it's still inside of Amazon. I don't know if it's still used, but my point is that any infrastructure software should be open source. And if you are building infrastructure inside of your company, especially things like databases, queuing systems, workflow engines, state management, anything like that, and you're building it custom, you're doing it for the fun of your developers. As a long-term strategy for your company, it's a losing proposition, because at some point this team will build it and leave, and the company will have to deal with that legacy. The only real path is open sourcing it. 

R Donovan Going from an open source product to a cloud-hosted service, what was the biggest technical challenge y'all faced there? 

MF Well, multiple things. Obviously just the infrastructure itself, because my co-founder and I were always focused on the actual software, and we always ran this inside of a large organization like Amazon or Uber or Microsoft, where there is always existing infrastructure which is very specific to the company. So we had to hire a pretty strong team of infrastructure engineers who would help us run it on AWS, and now we've just added GCP, and we will add other clouds in the future. So that was the first part, just how we mapped it to the actual Amazon infrastructure. And then creating a control plane. The good news is that we have the best tool to create control planes, which is called Temporal. If you think about it, what is our control plane? It's just a bunch of these durable execution processes, where everything you do is durable and guaranteed to execute, because it's just a bunch of what we call ‘workflows.’ That made our life much simpler, but that was most of it. And certainly a lot of the usual things about quotas, about provisioning, and so on. It's just a lot of things. One thing we realized is that, to have a real cloud service, first, it should be fully automated. You cannot have manual operations there. It's built across 15 AWS regions in a reliable manner for thousands of customers. Without automation, it doesn't work. And then the other part is that the table stakes alone are a lot of stuff: “Oh, we need billing. We need metering. We need this and that. We need security, and we have like 15 ways to integrate with every company's security and IAM.” Just the sheer size of that was hard, and we're still working on a lot of it. Now our cloud offering is pretty solid, but we're just adding more and more features. 
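
The durable execution idea Fateev describes, where each workflow step's result is recorded so a crashed workflow can replay to where it left off instead of redoing work, can be sketched as a toy replay loop. The `step` and `provisionWorkflow` names are hypothetical; this is not Temporal's actual SDK:

```typescript
// Completed step results are recorded here; in a real system this would
// be a durable event history in storage, not an in-memory map.
type History = Map<string, unknown>;

async function step<T>(
  history: History,
  id: string,
  fn: () => Promise<T>,
): Promise<T> {
  // On replay, a step that already completed returns its recorded result
  // instead of re-executing its side effects.
  if (history.has(id)) return history.get(id) as T;
  const result = await fn();
  history.set(id, result);
  return result;
}

async function provisionWorkflow(history: History): Promise<string> {
  const vm = await step(history, "create-vm", async () => "vm-123");
  // If the process dies here, a re-run with the same history skips
  // "create-vm" and resumes at the next step.
  return step(history, "create-dns", async () => `dns-for-${vm}`);
}

provisionWorkflow(new Map()).then(console.log);
```

A control plane built this way gets crash recovery for free: every provisioning operation is just another workflow whose progress is guaranteed to survive restarts.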

[music plays]

R Donovan Well, thank you very much, everyone. Hope you enjoy your holiday season. If you have stories, ideas, anything you would like to talk about related to open source, feel free to email us at podcast@stackoverflow.com or drop a comment on this episode. We love talking about open source. We think open source has eaten the software world. So if you would like to talk about that, let us know, and have a happy holidays.

[outro music plays]