The Stack Overflow Podcast

Trust as a service for validating OSS dependencies

Episode Summary

This is part two of our conversation with Kubernetes project cofounder Craig McLuckie, whose new company helps developers build safer software by validating where code came from and that it’s been properly maintained.

Episode Notes

ICYMI, listen to part one of this conversation.

Craig is the cofounder and CEO of Stacklok, which helps developers and open-source communities build safer software, secure the supply chain, and choose safer dependencies. Stacklok’s free-to-use service, Trusty, employs a statistical analysis of author/repo activity and a package’s source of origin to assess its trustworthiness.

Craig cofounded the Kubernetes project, an open-source system for automating deployment, scaling, and management of containerized applications.

Craig is on LinkedIn.

Stack Overflow user mprivat earned a well-deserved Lifeboat badge by answering Abstract class extending concrete classes.

Episode Transcription

[intro music plays]

Ben Popper Hello, everybody. Welcome back to the Stack Overflow Podcast. Just a heads up: you are listening to Part 2 of our interview with Craig McLuckie, one of the cofounders of the Kubernetes project. He’s done a lot of incredible work that has really altered the landscape of software development, and he now has some really interesting stuff cooking in the area of securing the software supply chain. If you want to catch the first half of this episode, we’ll put it in the show notes. Otherwise, enjoy.

[music plays]

BP So what was the pain point that Sigstore was created to address, and from there, how and why did you team up with this person to now create a new company with aspirations that build on top of that?

Craig McLuckie Well for me, I think it came down to this. Originally with Sigstore, from an origin perspective, if you look at the broad ecosystem out there, not all packaging systems, not all language environments are equal. If you look at what Golang has done, there’s some very clever technology supporting dependency management in the Golang ecosystem. In fact, I think Golang in some ways predated Sigstore in terms of having a Merkle tree and being able to actually, as you start to vendor in dependencies, generate checks right in the Merkle tree. It's a very robust ecosystem, but that doesn't exist for a lot of other ecosystems. Heaven knows if you look at even the Java ecosystem, a lot of these Java packages are being signed, but the public keys associated with it are not even being published. And so Sigstore was really intended to make it trivially easy to sign and publish provenance information associated with a package so that when you're looking at a package you could get a deterministic view of who produced it, where it was produced, and some context around the production of that package and what source was associated with that. And my story was that I wasn't as brilliant as Luke. I was thinking about this since well before the SolarWinds incident. I've been kind of obsessed about this for quite a while and tried to do some work in the space, but I just didn't get that kind of critical mass going. And then when Luke produced Sigstore, it really captured my attention. I was like, “Oh wow, this is really cool.” It's solving a very key problem, but it's just a piece of the story, it's not the totality of the story. We still need more capabilities to help people understand and create incentives for communities to actually start adopting the Sigstore capabilities to demonstrate proof of origin, and once you've actually got that, help them make informed choices about the technology that's being produced. 
And then on the other hand, we also need to make it much easier for these communities, which are often made up of volunteers. They're doing this for love; often they're not doing it for money. And it's just mean-spirited to approach a community that's doing something out of the goodness of their heart, because they're passionate about it and not necessarily because they're even being paid to do it, and expect them to jump through hoops for you. That's just mean-spirited. And so in addition to helping people get better insight using the Sigstore-based provenance, we also owe it to the communities to help them produce this in a simpler and more accessible way. And so that's really what captured my attention, and that's what we've been working on with Stacklok over the last several months as we lead up to our first public release.
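
To make the Merkle tree idea mentioned above concrete, here is a minimal sketch of how a single root hash can commit to a whole list of dependency checksums, so that changing any one entry changes the root. It is a simplified illustration, not the actual layout used by the Go checksum database or by Sigstore; the function names and the example entries are hypothetical.

```python
import hashlib
from typing import List


def _sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: List[bytes]) -> bytes:
    """Return a root hash that commits to every leaf, in order.

    Simplified scheme (an assumption for illustration): leaves and interior
    nodes are hashed with different prefixes, and an unpaired node at the
    end of a level is promoted to the next level unchanged.
    """
    if not leaves:
        return _sha256(b"")
    # Prefix 0x00 marks a leaf hash, 0x01 marks an interior node hash.
    level = [_sha256(b"\x00" + leaf) for leaf in leaves]
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                next_level.append(_sha256(b"\x01" + level[i] + level[i + 1]))
            else:
                next_level.append(level[i])  # odd node out: carry it up
        level = next_level
    return level[0]


# Hypothetical "module version checksum" entries, made up for this sketch.
entries = [
    b"example.com/alpha v1.0.0 h1:aaaa",
    b"example.com/beta v2.3.1 h1:bbbb",
    b"example.com/gamma v0.9.0 h1:cccc",
]
root = merkle_root(entries)
print(root.hex())
```

Anyone who holds the root can detect tampering: recomputing it over a list in which one checksum was altered yields a different value.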

Ryan Donovan Yeah, open source dependencies have shown their vulnerabilities with things like the Log4j incident, and with creators putting something in there that, if not malicious, has an intent other than the original one. So do you think there is a risk to open source because of that?

CM I think there is. So let me frame it differently: we don't have a choice but to make open source work. It's just too critical to the human condition. We are going to navigate these complexities, we are going to figure it out as a species, because we have to. The question is just how. So absolutely, there's a huge risk to the supply chain. I think there's a sea change in the way that malicious actors are operating. It used to be that they were kind of like burglars creeping through your neighborhood looking for an open window or an open door, and so you could be safe by just making sure your doors were locked, meaning CVEs were a currency that made sense. Something with a CVE has a known vulnerability. If I update it, that window is locked. No one is going to come in through the window. Now what we're seeing is these hostile, very sophisticated individuals working to undermine the integrity of the doors and windows themselves, so you're installing a window that simply can't be locked. And that's a sea change; that's a fundamentally different world that we now live in and it's going to require a fundamentally different approach to security. The CVE is not a good currency anymore for two reasons. One is that the aggregate quality of CVEs has gone down and it's overwhelmed developers because the signal-to-noise ratio is just not there. There might be a critical CVE in something in your dependency chain, but in the context that you're using it, it's not reachable. And you go to an open source developer who's doing this for love, not for money, and tell them, “Hey, you have to update this thing because it has a CVE,” and they look at it and they're like, “This is not reachable. Why do I care? I'm doing this for love, not money. Go pound sand.” So that's not working. But the other thing that's even worse is the absence of a CVE doesn't tell you anything either. 
The absence of a CVE might mean, “Oh, it's a perfect package.” It might also mean that it's simply a neglected package and no one has actually bothered to look. Or even worse, it could mean that it's a perfectly malicious package– that someone's taken something good, forked it, introduced something really bad, published it, and drawn a connection back to the original source of content. So we have to do better, we as a community have to do better. And Sigstore is offering a very practical solution to connecting source of origin to artifact, and once we’ve got that bridge, that's the start and now we can kind of enrich that bridge. We can start to generate much richer context. So if you look at the work that we're doing at Stacklok with this technology called Trusty, it's helping people get a better opinion about a package. And there's a lot of different ways of thinking about this, and we're not the first people to do this, but we are bringing pretty sophisticated data science. So we look at who's contributing to this package, we look at the source itself, we look at a bunch of different activity heuristics, we do some principal component analysis, and we go figure out whether this thing looks good or not. And if it doesn't look good, I'm going to tell you. If it does look good, you're fine. And it's pretty much exactly the thing that we would be doing anyway as developers if we have the time. We'll go and look something up, we'll go to the community, we'll go poke around, we'll go figure out whether they're burning down their issues, whether they actually have an effective release cadence, we'll figure out who's actually contributing, what else they've contributed to. We've just used data science to replicate that process and make it immediately accessible to developers.
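
As a rough illustration of the kind of manual check Craig describes (are issues being closed, is there a recent release, is more than one person contributing), the sketch below folds a few activity signals into a single score. It is not Trusty's algorithm, which the conversation describes only as involving activity heuristics and principal component analysis; the field names, weights, and thresholds here are assumptions picked for readability.

```python
from dataclasses import dataclass


@dataclass
class RepoActivity:
    """Hypothetical activity signals for one package's source repository."""
    open_issues: int
    closed_issues: int
    days_since_last_release: int
    active_contributors: int


def activity_score(a: RepoActivity) -> float:
    """Combine the signals into a 0-10 score (higher looks healthier).

    The weights are arbitrary assumptions for this sketch.
    """
    total_issues = a.open_issues + a.closed_issues
    # Share of issues that have been closed ("burning down their issues").
    issue_ratio = a.closed_issues / total_issues if total_issues else 0.0
    # 1.0 for a release today, falling to 0.0 at one year or older.
    release_freshness = max(0.0, 1.0 - a.days_since_last_release / 365)
    # Saturates at 10 contributors so huge projects do not dominate.
    contributor_signal = min(a.active_contributors, 10) / 10
    combined = 0.4 * issue_ratio + 0.3 * release_freshness + 0.3 * contributor_signal
    return round(10 * combined, 2)


maintained = RepoActivity(open_issues=10, closed_issues=90,
                          days_since_last_release=30, active_contributors=12)
neglected = RepoActivity(open_issues=200, closed_issues=20,
                         days_since_last_release=900, active_contributors=1)
print(activity_score(maintained), activity_score(neglected))
```

A hand-weighted sum like this is only a starting point; the appeal of a statistical approach is that the weighting is learned from many repositories instead of being guessed.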

RD So do you want to talk about the new project you're working on?

CM So there's really two parts to this, and I've already mentioned one part of it, which is Trusty, a more sophisticated package explorer that uses data science and AI/ML to generate unique insights. It's something that I think will be familiar to people. We just like it a lot because it's really using that Sigstore bridge and it's drawing attention to the Sigstore bridge so you can kind of depend on it more. It's not vulnerable to things like typosquatting or starjacking. It's actually seeking out those patterns and helping raise awareness of them, and it's intended to really just showcase what's out there. The other half of it is, like I said, it's just mean-spirited to approach an open source community and say, “Hey, you have to do something” when they're already overburdened, when they're already doing everything they can. And so the other part of what we've done is we've built a technology called Minder, which is an open source project, and it's intended to just help. It's available in an open source form factor. We're also delivering it as a SaaS. It's free to use, we're not charging for it. We want to help communities achieve better security posture. We want to help them adopt the Sigstore capabilities. We want them to make sure they've got all of their GitHub settings set up just right. GitHub is obviously a critical part of the infrastructure, it's got some amazing security features, but if you're running a project with 500 repos, and some of the people we're talking to are, it's hard. There's a lot of surface area. Let's just work to make sure that it's all set up just right and let's look at what else we can do. And so it's really built around what we think of as provenance generators, where it integrates with a portion of your software development lifecycle. In this case, GitHub is our starting point. We'll also be integrating with build environments and production environments and a variety of different destinations. 
And it runs control loops, just the way that Kubernetes runs control loops. So if one of the things you want to make sure you do is have all of your branch controls just right across 500 repos, connect it, it'll watch them, and if something drifts out of sync, it'll start to spin up a control loop and work with you to move things back into sync. If you submit something and you say, “Hey, we want everything we submit to be at a certain level of trust based on the heuristics that we use,” we'll provide a control loop. And if something comes in and there's an update available, we'll submit a merge request, kind of like Dependabot, but we'll also suggest alternatives using generative AI. If we see someone using something where there's a more modern alternative or a better version or what have you, we'll make that recommendation as well.
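
The control loop pattern described above can be sketched in a few lines: declare the desired settings once, compare them with what each repository currently reports, and act only on the differences. This is a minimal in-memory sketch, not Minder's implementation and not the GitHub API; the setting names and repository names are hypothetical, and a real system would fetch and apply settings over an API and might propose changes for review instead of applying them directly.

```python
from typing import Dict, List, Tuple

Settings = Dict[str, object]


def find_drift(desired: Settings, observed: Settings) -> Settings:
    """Return the desired settings whose observed value differs or is missing."""
    return {key: value for key, value in desired.items()
            if observed.get(key) != value}


def reconcile_once(desired: Settings,
                   repos: Dict[str, Settings]) -> List[Tuple[str, Settings]]:
    """Run one pass of the loop and report which repos were corrected.

    Here the "fix" mutates the in-memory dict; a real loop would call out to
    the hosting platform and would run repeatedly, on a timer or on events.
    """
    actions = []
    for name in sorted(repos):
        drift = find_drift(desired, repos[name])
        if drift:
            repos[name].update(drift)  # move observed state back to desired
            actions.append((name, drift))
    return actions


# Hypothetical branch protection policy and two repositories.
policy = {"require_pull_request": True, "required_approvals": 2}
repositories = {
    "repo-a": {"require_pull_request": True, "required_approvals": 2},
    "repo-b": {"require_pull_request": False, "required_approvals": 2},
}
print(reconcile_once(policy, repositories))
print(reconcile_once(policy, repositories))
```

The second pass finds nothing to do, which is the property that makes the loop safe to run continuously across hundreds of repositories.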

BP Using generative AI in what sense there? How are you using it?

CM Well, so let's say you were using a package and it's abandonware. We would identify that that package is abandonware. By that I mean that two maintainers worked on something, it got very popular, and something like Copilot was trained on a dataset where this was the hotness. Those two maintainers just got fed up and decided to move on, they just can't maintain it anymore, and that branch is now end of life, but there's another version, maybe a community-maintained one under a different name, that is findable and is API compatible. What we're doing is using large language code models to generate a view of alternatives, and then we're retraining on our data science heuristics so we're not going to recommend something that has a relatively low point of merit. And this is important. It's kind of funny, it shows up even internally through our dogfooding of these tools. So no, I'm with you. And look, we just want to help consumers, so we want to point it out when you're going to get hurt, but more importantly, I think we also want to help the producers, the people that are building this, just show their work, make it easy to adopt security practices, show that they're adopting security practices, and make that obvious to the consumer via that Sigstore bridge.

BP Very cool.

[music plays]

BP All right, everybody. As always at this time of the show, let us shout out the winner of a Lifeboat Badge. It goes to mprivat for answering the question, “How do I take an abstract class and extend it into a concrete class? I have an abstract class that extends a concrete class, but I'm confused about how to use it.” Well, if you were confused, mprivat has an answer for you and has helped over 18,000 people over the years and earned himself a Lifeboat Badge. As always, I am Ben Popper. You can find me on X @BenPopper. You can email us with questions or suggestions for the show at podcast@stackoverflow.com. And if you like the show, leave us a rating and a review. It really helps.

RD I'm Ryan Donovan. I edit the blog here at Stack Overflow. You can find it at stackoverflow.blog. And you can find me on X @RThorDonovan.

CM I'm Craig McLuckie, the founder and CEO of Stacklok. You can find me on X @CMcLuck. And I invite you to come check us out at stacklok.com and kick the tires on some of these open source and free to use SaaS offerings that we've built.

BP Wonderful. All right, everybody. Thanks for listening, and we will talk to you soon.

[outro music plays]