The Stack Overflow Podcast

Building out a managed Kubernetes service is a bigger job than you think

Episode Summary

On this sponsored episode of the Stack Overflow podcast, Ben and Ryan talk with David Dymko and Walt Ribeiro of Vultr about what they went through to build their managed Kubernetes service as a cloud offering. It was a journey that ended not just with a managed K8s service, but also with a wealth of additional tooling, upgrades, and open sourcing.

Episode Notes

You may be running your code in containers. You might even have taken the plunge and orchestrated it all with Kubernetes through YAML manifests. But infrastructure as code reaches a whole new level of complexity when you're the one building a managed Kubernetes service. 

On this sponsored episode of the Stack Overflow podcast, Ben and Ryan talk with David Dymko and Walt Ribeiro of Vultr about what they went through to build their managed Kubernetes service as a cloud offering. It was a journey that ended not just with a managed K8s service, but also with a wealth of additional tooling, upgrades, and open sourcing. 

When building out a Kubernetes implementation, you can abstract away some of the complexity, especially if you use popular tools like kubeadm or Kubespray. But when using a managed service, you want to be able to focus on your workloads and only your workloads, which means taking the control plane out of the user's hands. The user doesn't need to care about the underlying infrastructure, but for those designing the service, the hidden control plane opens up a whole heap of problems. 

Once you remove this abstraction, your cloud cluster is treated as a single unit of compute. But then how do you do upgrades? How do you maintain X.509 certificates for HTTPS calls? How do you get metrics? Without direct access to the control plane, Vultr needed a way for it to communicate with the Kubernetes worker nodes through API calls. And wouldn't you know it: that path isn't all that well-documented. 

They took it back to bare necessities: the MVP feature set of their K8s cloud service. They'd need the Cloud Controller Manager (CCM) and the Container Storage Interface (CSI) as core components to make Vultr a first-class citizen on a Kubernetes cluster. They built a Go client to interface with those components and figured, hey, why not open source this? That led to a few other open-source projects, like a Terraform integration and a command-line interface. 

This was the start of a two-year journey connecting all the dots that this project required. They needed a managed load balancer that could work without the control plane or any of the tools that interfaced with it. They built it. They needed a quality-of-life update to their API to catch up with everything that today’s developer expects: modern CRUD actions, REST best practices, and pagination. All the while, they kept listening to their customers to make sure they didn’t stray too far from the original product. 

To see the results of their journey, listen to the podcast and check out Vultr.com for all of their cloud offerings, available in 25 locations worldwide.

Episode Transcription

David Dymko And once we did all that, it doesn't sound like a lot, right? Like, "All right, we did a client, we did load balancers. We did a few things here and there." 

Ben Popper Sounds like a lot.

DD I mean, some people are just like, "Oh, okay." I guess a developer will appreciate it because it is a lot of work. And then once we did all that, we were like, "All right, we're ready for Kubernetes." So there was like this two year journey of Kubernetes just in reach, but we kind of need to do all these steps to get there. And then last year in August is when we released the beta of our Kubernetes engine.

[intro music plays]

BP Hello, everybody. Welcome back to the Stack Overflow Podcast, a place to talk all things software and technology. I am joined today as I often am, by my colleague, Ryan Donovan. Ryan, how're you doing today? 

Ryan Donovan Oh, I'm doing all right. How're you doing, Ben? 

BP Not bad. Spring has sprung where I am. I've had many animal interactions today. I was late to my last meeting because I was rescuing a turtle that was crossing the road, so I've done my good deed for the day. 

RD Good Samaritan badge for the day. 

BP We are going to be having a chat today about Kubernetes. We're going to be talking with some fine folks at Vultr who are sponsoring this episode. Ryan, you and I have talked about K8s many times. I feel like the adoption and the usage must be increasing because it comes up in a lot of the editorial we do as well as the inbound pitches. 

RD Yeah. I think with how much cloud adoption there is and how much sort of virtual infrastructure, this is becoming more and more an easy way just to handle that infrastructure.

BP Yeah. Especially the lift and shift. All right. So we have two great guests today, Walt and David. I'm going to let them introduce themselves and give you a quick flyover of who they are, and then we'll dive into the conversation. So Walt, David, welcome to the Stack Overflow Podcast. 

Walt Ribeiro Hi!

DD Thanks for having us. 

BP My pleasure. Walt, why don't we start with you. Tell folks quickly how it is you got into software and tech and what it is you do over at Vultr. 

WR All right. So, hello everyone. My name is Walt Ribeiro and I am the Developer Advocate at Vultr. The way that I got started was probably back in 2010. I started running C++ on my Arduino. I was trying to just write some code with a bunch of pianos and just trying to MIDI hack them. Then I started doing a lot of music production. And then when I came back around, I'd say in maybe about 2018-19, at the height of the pandemic maybe about two years ago, I just doubled down and started writing more and more code because it was pretty much the only thing to do. 

BP Stuck at home being a creative technologist. I like it.

WR Yeah. Then I was the Developer Advocate at Linode and I jumped around and I did a lot of software stuff for different companies. And then Vultr reached out to me and I've been here for about a year now and it's been awesome. I love the stack; I love the people here. I think that where the company is headed and what they've built, even the things that David's built, I'm super bullish on. That's pretty much where I am today. I just do a lot of audio, a lot of video production work, and a lot of coding. That's pretty much it.

BP David, welcome to the program. Tell us a little bit about yourself, how you got into the world of software, and what it is you find yourself doing day to day at Vultr. 

DD Sure. So, hi everyone. My name's David Dymko. So how did I get into programming? I think it started in the late '90s, early 2000s. I kind of became obsessed with computer games. It was just basically me being obsessed with Counter-Strike and starting to mess around with putting graphics cards in my computer trying to get better graphics, stuff like that. And then that kind of led me down into, "I love Counter-Strike so I want to make a clan. I need a website." So it was just kind of this really nerdy, "Oh, this is a computer and I can do things with it," and it just evolved from there. 

RD We get that a lot from guests.

BP Yeah, that's a story we hear a lot. Folks who have been on the podcast, they built forums for Tony Hawk Pro Skater 6, they were into Neopets, they were into Myspace, and they wanted to customize it, to change it, to be part of a group, to organize a community, and that's kind of what led them on the path. So I love hearing that story.

DD Yeah. So it was mostly around computers and Counter-Strike. Then I kind of fell out of it around high school. I don't know, I just kind of fell out of computers. But then when I went to college I kind of got back into it. Computer Science degree, traditional type of path there. And after that, I kind of jumped around from a few organizations here and there. A more notable company was Vonage. And that's kind of where I got exposed to a lot of API design and cloud native type of work. So after a while, being at Vonage, kind of honing my skills there, I eventually came over to Vultr with a clear vision to focus on building out a managed Kubernetes product for Vultr. 

BP Just quickly for listeners who are not familiar, can you give me a couple of sentences on what is Vultr? How would you describe it? And where does it fit into the marketplace for what's being offered to developers these days? 

WR So where we fit is that Vultr wants to be pretty much the best of services for all these different developers. So we have our own block storage, but also we have a lot of partners that have block storage too that we actually work with. We have a ton of great locations, too. We have our custom ISO. So there are a lot of unique features that Vultr has that others don't. We just like to be the best of breed for people. I mean, even our marketplace is filled with a lot of great apps and on top of the things that David's built, like our API, and our open source projects, and our load balancers. So that's pretty much where Vultr stands, because in some sense it's sort of a commodity, but there are some unique prospects that we have that people depend on and that we just like to be the best of breed for all of those different products. 

BP You mentioned sort of a wide spectrum of things and being best of breed, but just briefly in one sentence, what are people coming to you for top-level?

WR Yeah. So in terms of just a quick one sentence, we have the best price to performance for all of our product offerings.

RD So we opened talking about how a lot of people are getting more and more familiar with Kubernetes, but you're building out a managed Kubernetes at scale. What's the difference there? 

DD That's a good question because that's something we kind of learned the hard way, and I think every cloud provider kind of learned this. So, me coming to Vultr, I knew Kubernetes, I knew how to work with it. So I was pretty confident. But once you need to abstract the complexity and offer a managed product that removes, I don't want to say all the headache because there's still YAML, but a big chunk of it, that's a very unique problem that we had to solve for. For example, a lot of tools that people are used to are like kubeadm or Kubespray, where they're kind of building these things and you kind of have access to everything. But on a managed platform, we remove the control plane. You don't have access to it. You don't even know how it runs, where it runs, and you're not really concerned about it. It gives you that freedom and flexibility to kind of focus on your workloads and only your workloads, and you don't have to care about the underlying infrastructure. So there were a lot of challenges and interesting problems that we faced: "Okay. How do we remove that abstraction? How do we do upgrades? How do we maintain your X.509 certs so you can have HTTPS on your cluster?" So there were a lot of unique problems that I guess I didn't account for, because I'm like, "Oh, I know Kubernetes, I use it every day." And there's a huge difference between managing it, using it, and then, as a cloud provider, offering it in a way that abstracts all of those pain points. So the journey there was quite interesting. 

RD Kubernetes seems like it's already abstracting the infrastructure. This seems like it's another level up, right? 

DD Yeah. So with Kubernetes, you're kind of treating your entire cloud infrastructure as one solid compute. So you write YAML and you can interact, you just don't care how many instances you have. Or if you want block, you just tell Kubernetes you want block and it does all that for you. But for us, we had to integrate with a lot of these solutions. So for example for block storage, we had to integrate with the Container Storage Interface, and that's a plugin that allows Kubernetes to work with Vultr as a first-class citizen in regards to block. The same thing for load balancers or worker nodes. We had to integrate with the CCM so that Kubernetes knows that it's running on Vultr and it knows how to interact with these resources. Now the biggest abstraction we had to do was the control plane, because you're not concerned about where the API server is or where the scheduler is running or the controller manager. That's all abstracted from you; you don't even have access to it. So when you're executing these kubectl commands, we supply a kubeconfig but it's going to a black box inherently. So some interesting problems there were that if you don't have access to that box, how does something like kube-proxy, which is what handles the IP rules under the hood, work? Because in our current setup, we're not running these as containerized services on the control plane. And that was an interesting problem because you look online for kubeadm or all these tools and everyone's like, "Use kube-proxy. It runs as a container." And we're solving for a different problem. So there were these tools that I'd never heard of, something like Konnectivity, which allows you to kind of proxy to these containers so if you want to use something like kube metrics or extend the API server, the control plane, which is this isolated service, has to be able to communicate back to the worker nodes through these API calls. And that was a very interesting problem to solve for because it's almost undocumented. 
Everyone's just like, "You use kube-proxy," and it's like, "But we can't." So that was one of the bigger ones we solved along with like X.509 certs and etcd backups, stuff like that. 

BP You had mentioned to me, David, in an earlier conversation that on this journey to sort of bring Kubernetes and manage Kubernetes at scale to Vultr, you ended up doing quite a lot of work on API and V2 of the API. Can you talk a little bit about how that journey went? 

DD Yeah. So as I mentioned earlier, I came on board and the whole reason was like, "We want managed Kubernetes." I'm like, "Okay. That's great." So we started looking into it and we boiled it down to what's the bare minimum we need to start with. And we kind of looked at it and were like, "Okay, we need the CCM and we need the CSI as core components to even consider having Vultr be a first-class citizen on a Kubernetes cluster." So we built a Go client, and, hey, we have a Go client, why don't we open source this? And then that kind of snowballed from there because we had this Go client and it's like, "Hey, you know what? We can start integrating with Terraform and Packer and write our own CLI and start integrating and basically adopting and embracing open source," which we didn't do before that. So that was step one in the journey. We took a detour to build out our open source, and one reason for that too is, as a cloud provider, we're offering infrastructure and we're offering these abstractions, but we also want to allow any user to interact with us in as many ways as possible. So if you want to use the API, great. If you want to use the UI, awesome. But maybe you want to automate everything with Terraform or other kinds of tools. So that was a really good stopgap because it also allowed us to kind of give back to open source, but also open these avenues up for developers to interact with Vultr in more meaningful ways for them. After that, the next step was that we needed a load balancer. One of the things with Kubernetes is you need to have some kind of IP address that allows for Ingress. You can use node ports or other solutions but they're not that great for scaling. So we needed a load balancer and we didn't offer managed load balancers, so that kind of led us down another detour where, "Hey, we need managed load balancers, let's build this product. We need it for Kubernetes and load balancers are great." And similar issues there. 
I've used load balancers in the past, and I think one interesting thing as a cloud provider is, we've all used cloud providers, and you don't think about how a load balancer works. You just deploy one, you hook it up, and you're good to go. And it's completely different when you're a cloud provider and you're offering it because it's like, "How does this work?" You have to go back to the basics of how load balancing works and offer that. After that, we integrated with the CCM and the CSI, and at this point, with all these integrations, one thing that always stuck with me and kind of bothered me a bit was our API, our V1. It was a product of its time. It worked, but there were a lot of things left to be desired. So after that, we decided that before we go and start doing managed Kubernetes, we should probably overhaul our API to a V2 and modernize it a bit with modern CRUD actions, pagination, a RESTful design, something that a developer who would use an API would come to expect in this day and age. So we did that, and then, in another detour, we had to update all of our tooling to adopt this new API and have that be the basis for everything moving forward. And once we did all that, it doesn't sound like a lot, right? Like, "All right. We did a client. We did load balancers. We did a few things here and there." 

BP Sounds like a lot. 

DD I mean, some people are just like, "Oh, okay." I guess a developer will appreciate it because it is a lot of work. And then once we did all that, we were like, "All right, we're ready for Kubernetes." So there was like this two year journey of Kubernetes just in reach, but we kind of need to do all these steps to get there. And then last year in August is when we released the beta of our Kubernetes engine. 

RD It's funny, we've done posts about scope creep and this seems like not scope creep, but realizing what the scope actually is.

BP Yeah. David, were you getting feedback internally from other developers who, like you, had ideas of what it would be like to work on? Was it coming from customers? You're describing, as Ryan said, starting out with an idea and realizing all of the different features or components or approaches or technologies that you'd need to build in for this to be a great experience when it finally comes to market. So where was all that input coming from and how are you making the decisions about what to build and what not to? 

DD It was a bit of both. Internally, we saw the need, for example, like Terraform. We saw that Infrastructure as Code and this Terraform thing is picking up speed and we probably should do it. So it was a bit of that where we were kind of realizing, "Oh, if we have this Go client then we can also do these three things." It was just this realization of us connecting the dots like, "We need this. Oh, but wait. If we're going to do this, it kind of hooks it all up." And it was also customer feedback. Our customer support is pretty good because we encourage Vultr users to give us feedback. And that's one thing that I think is really good, because in any of our product launches we always release it as a beta and encourage users to try it and give us feedback. That way, we can get really good feedback and have this iterative loop of what users actually want, because sometimes it's hard to gauge if they want a specific feature for a specific product, or if something we implemented just isn't working the way we thought it would. So it's a blend of both, really. And even now with something like Kubernetes, I don't have all the answers and especially now in a cloud native-y space, users' workloads are so varied. So it's really good to have these open channels of communication. We try to dogfood as much of our own product as we can, because that definitely shows us any pain points. So if I'm using a load balancer VKE and there's just something that's just really getting under my skin, chances are it's not just me. So having that feedback loop is definitely internal and external for us. 

BP Very cool. I think what you're saying makes a lot of sense. It's an interesting place to be in and it feels like we end up here a lot because of the nature of our clientele, which is like folks who are themselves developers building for other developers, or people who need great software products. But it can be challenging to dogfood. It can also be, I don't know if you've heard of this little company called AWS, but sometimes the thing you build internally ends up being quite amazing in the marketplace itself. So let's move on a little bit. What's fascinating to me about this story is the evolution and the paths that led you to, you mentioned the API and what you learned about that. For open source, you mentioned going more in that direction. What drives that? Is that understanding that you're going to get more developers and momentum and attention? Is that flexibility, or hope that it will improve the speed of innovation and evolution? When you look at open source and considering whether or not that belongs in your organization, what are the pros and cons, and what ultimately did you decide?

DD I think it boils down to, if I take a step back, I am kind of approaching this with a developer mindset, right? So if I'm a developer, and there's a new tool like Terraform or Packer or CLI, I want those tools at my disposal, and selfishly, if sometimes there's a tool I want to use, or it doesn't have a good open source project or a client, I'm inclined not to really use it. Open source allows developers to become more creative and it also allows that flexibility. Because we have a Node.js client alongside a Go client. So if we have a JavaScript developer who wants to get really creative with our API, they don't have to make these direct API calls. They can just use our JavaScript client. Or, if I'm building some kind of automation tool that's cross-cloud, then having Terraform or these Infrastructure as Code tools available to them, to me, it allows developers to become more creative and open up avenues to interact with us. If we just have the UI, that's good for a large use case, but having that API and a lot of these open source tools just allows for flexibility and integrations that may otherwise not have been available. 

RD We've talked to other folks who have open sourced from within companies. Did you run into roadblocks? Any people saying, "Let's not share our secrets?"

DD Not really. We haven't really encountered that yet. We have an example of this one tool that we used internally and ended up open sourcing, although now there's no need for it because of our managed Kubernetes engine. But at the time we really wanted to test our CCM and our CSI version updates, just to make it easy. And spinning up a Kubernetes cluster by hand, I don't think that's fun. I don't think anyone enjoys that. So we built this Terraform module called Condor (the pun is intended) where you could just define a Kubernetes cluster and it would build it on Vultr. That was fun to do because it was a good stopgap: we were building managed Kubernetes but we weren't there yet. But we had Terraform, and we built this open source Terraform module that would spin up a Kubernetes cluster for you right then on Vultr while you were waiting for the managed solution. So that's kind of dead now because of managed Kubernetes, but at the time it was fun because it was something we built internally and open sourced for the users. 

BP I think what you said to me also stood out as the engineering mindset. If you buy a car or a computer, as you got started with the graphics cards, you want to be able to open it up and tweak it if you decide to go in a certain direction, and you want that feedback loop to exist between you and the product where if you need to build an integration or you feel like there's something you can improve, that's where open source really empowers the developer and so they're more likely to adopt those tools. And it feels like, increasingly, that plays a key role in what wins out in the marketplace, all other things, pricing, availability, performance being equal. 

DD Definitely. 

RD So you built out the huge backend stuff, the load balancer, the managed Kubernetes. Once you had it on the market, what things did customers drive forward?

DD Again, this is just the difference between using it and offering it. So for our first beta launch of VKE, users quickly realized like, "Hey, you're missing this crucial aggregation layer." And I'm just like, "We're what?" 

RD That's why you have a beta.

DD Yeah, so I was freaking out because I was just like, "Oh my God. We forgot the aggregation layer." So, the clusters worked, except if you wanted to install any kind of API extension tooling, it just didn't work. So that was the first thing that, if we didn't have that feedback loop, I probably would've realized way after the fact. But after that, we were still getting a lot of feedback, like, "We want more specific regions," because when we originally launched we only had a few. As of today, we're in almost every region. And then it's a lot of feature requests. Like, "Hey, you don't have a cluster autoscaler." So, "Okay. We see the need for us to integrate with the Kubernetes autoscaler." Or, and this is a big pain point, "How can I upgrade my cluster?" "Well, you're going to have to deploy a new one and port over your block storage." So a lot of the things we got back were mostly around maintenance and how a user can keep their cluster going without having to spin up a new one. So availability and upgrades are two of the big ones we're focused on getting out right now. 

WR In terms of that whole feedback loop, there's a lot of social media feedback that I get that I pass on to him and his team. I mean, it's one thing to get the feedback internally or from the actual developers, but then there's just some talk that I might see and then I'll just pass it on. When I first started at Vultr, it was my first week and I just turned and said, "So who's this huge team that's building our whole Kubernetes engine and our API?" And then it turns out it was just David and like two other people. I was like, "Oh my gosh, this is crazy." So I'm impressed both from the outside looking in, and then now that I'm here to see that it was a small team, I'm just used to seeing a bunch of pair programming kind of things within different companies. So he does a lot. And in terms of like our API or our Kubernetes engine or our open source stuff, I think that having that whole feedback loop, and the fact that it's a smaller team, is actually what helps us out a lot. It keeps us quick.

BP David, Walt, what are you excited for on the roadmap? We talked about what customers are asking for, the things that you've built. To the degree that you can let us know things that are coming or things you've announced that you're excited about, what is 2022 going to hold for Vultr?

WR The big thing on my part is location, location, location. When I first started we had 16 locations. We're now at 25. And as David said, we're in pretty much every region. We just opened up a location in Mumbai last week and we have no plans to stop. So we're at 25 right now and we'll continue growing that part of it too. Because the closer we can get to the actual fingertips of our users, just to lower the ping time or just to have a better feature set, is a big part of what I think differentiates us from many others. And so of course there are our products too, but in terms of how I feel about where we're headed for 2022 and 2023, our locations are a big part of it. 

BP David, anything you want to shout out before I head us out to the outros?

DD So, as I mentioned earlier, as of actually yesterday, we enabled our Kubernetes engine in 22 locations from the original 6 or 7 we had. So people will be pleased to see that. And in the coming weeks, we have been kind of teasing it– upgrades are coming. Users won't have to be deploying duplicate clusters anymore. So VKE upgrades for Kubernetes versions is coming out soon so I'm sure that will be a great hit.

[music plays]

BP All right, everybody. It is that time of the show. We are going to give a shout out to the winner of a lifeboat badge. That's somebody who came on Stack Overflow, they went to a question that had a score of -3 or less. They gave it an answer, and the answer now has a score of 20 or more, and that question has a score of 3 or more. So they saved some knowledge from the dustbin of history. Awarded yesterday to GIVI, "How can I use the Singleton pattern in an Android development project?" All right, if you're curious to know more about Android development and Singletons, we have some knowledge for you. Thanks for listening, everybody. I am Ben Popper. I am the Director of Content here at Stack Overflow. You can always find me on Twitter @BenPopper. If you have questions or suggestions about the programming on the podcast, shoot us an email, podcast@stackoverflow.com. And if you like what you hear, please leave us a rating and a review. It really helps. 

RD I'm Ryan Donovan. I edit the blog here at Stack Overflow. You can find me on Twitter @RThorDonovan. And if you have a great idea for a blog post, please email me at pitches@stackoverflow.com. 

WR So my name is Walt Ribeiro again, and you can find me on Twitter @WaltRib because no one knows how to spell my last name. I used to have the @WaltRibeiro handle and then I changed it to WaltRib. And just to find out more about what we offer, you can just go to Vultr.com. You can find all of our products and our offerings there. 

DD My name is David Dymko. You can find me on Twitter @DDymko. And if you have any questions or concerns about Kubernetes, feel free to tweet at me or just drop a support ticket, I guess. Thanks.

BP Yes, everybody who listens to this show, please drop David a support ticket. All right, everybody. Thank you for listening and we will talk to you soon.

[outro music plays]