The Stack Overflow Podcast

A conversation with Spencer Kimball, creator of GIMP and CockroachDB

Episode Summary

Spencer Kimball, cofounder and CEO of Cockroach Labs and co-creator of the GIMP image editor, tells Ryan and Ceora about how database technology has evolved to handle massive data volumes, how Cockroach Labs came to focus on solving latency issues through serverless technology, and his “relatively gentle” transition from engineer to CEO.

Episode Notes

Spencer was one of the original creators of open-source, cross-platform image editing software GIMP (GNU Image Manipulation Program), authored while he was still in college. He went on to spend a decade at Google, plus two years as CTO of Viewfinder, later acquired by Square.

In 2014, he cofounded Cockroach Labs to back his creation CockroachDB, a cloud-native distributed SQL database.

Database sharding is essential for CockroachDB: “a critical part of how Cockroach achieves virtually everything,” says Spencer. Read up on how sharding a database can make it faster.

Like many engineers who find themselves in the C-suite, Spencer went from full-time programmer to full-time CEO. He says it’s been a “relatively gentle” evolution, but he can always go back.

Like lots of you out there, Spencer started programming on a TI-99/4, the world’s first 16-bit home computer.

Connect with Spencer on LinkedIn or learn more about him.

Today’s Lifeboat badge goes to user Hugues M. for their answer to the question Multiple keys pointing to a single value in Redis (Cache) with Java.

Episode Transcription

[intro music plays]

Ryan Donovan Hi, everyone. Welcome to the Stack Overflow Podcast. I'm Ryan Donovan. I run the blog here at Stack Overflow. This is a place to talk all things technology and programming. Today, I'm joined by my co-host Ceora Ford.

Ceora Ford Hi, everyone!

RD You're a little bit under the weather.

CF Yeah. COVID is still a thing for too many of us unfortunately, but thankfully this time around it's pretty mild.

RD Hopefully, you'll be back on camera soon enough. We have a special guest today, Spencer Kimball. He's the CEO of Cockroach Labs. Very excited. He's also the co-creator of GIMP. We're going to talk about where he came from, where he's going, and all things database and cloud. Hi, Spencer. How are you doing?

Spencer Kimball Hi, Ryan. Yeah, thank you for having me. It's a pleasure to be part of the podcast.

RD We're glad to have you here. For a lot of these podcasts we like to talk to our guests and see where they came from, how they got started in technology, your basic origin story.

SK Happy to share it. It goes back a long way, actually, all the way to the TI-99 Texas Instruments. It was a computer that allowed you to store your data on a cassette drive, so you’d use magnetic tapes. But that was my introduction and that's where I started programming and never really stopped interestingly until I became a CEO. And even then it took some time, but these days I don't program much, but I miss it.

RD I also got started on a tape drive on a C-64.

SK Yeah, the journey's been an interesting one, but I would say that databases are relatively recent over the whole scope of my career. I never was very interested in them and didn't see that they were particularly germane to the things I wanted to do. I was much more interested in graphics, so we could touch on GIMP, but it wasn't until I really graduated university that databases were a concern, and boy did they become a concern so they really dominated my professional career. Before that, graphics was my passion.

CF I want to hear more about how they became a concern. I'm interested in you telling that story because I'm sure that probably ties into eventually your journey starting with Cockroach Labs as well.

SK Well, it turns out that databases are fundamental to essentially every product or service or application that isn't single-user. There are some exceptions to it, but as soon as I left university and started working I did a dot-com startup. Immediately of course, data and actually large amounts of data became a critical aspect of anything that we were trying to build. And it turns out that databases are really hard. It's probably one of the harder things to build in the world of infrastructure, and it just gets harder. So every time databases create a sort of new generation of capabilities, those are quickly used and exploited by the application ecosystem and then that process suggests the next generation of database capabilities and I've just seen numerous generations over my career. I graduated in ’97, so it's been a full 25 years and in that time we witnessed, “Okay, we have to manually shard,” we had the open source databases that started to take market share away from the big monolithic commercial databases like Oracle and SQL Server and Db2. And then you had Bigtable and Mongo and all the things that followed in the NoSQL movement and then there was NewSQL. And meanwhile, now there's serverless databases, and of course the cloud databases delivered as a service, the consumption models are all changing. The list of features, is it multi-model, is it like HTAP so it does transactions and analytics, there's many, many flavors of databases, probably tens or even hundreds of distinct flavors and then many, many more different products within each one of those sort of subcategories. And it's hard to build them and it's actually hard to consume them. So you can kind of see how since they're central to virtually everything that's built they are everyone's concern and so they quickly became mine. And it's a good thing to have as a concern because the depth to which you can go in terms of finding innovative solutions that add value is almost limitless.

RD Yeah. I think it's been a really interesting last few years actually seeing how much differentiation there is in databases between the sort of financial transaction ones which need the reliability, and then the sort of real time data analytics and more. It seems that because we have so much data going around, there's more databases to accommodate it.

SK Absolutely. One thing I've realized in the seven and a half-eight years I've been working on Cockroach –actually it’s more like eight and a half if you take the pre-Cockroach Labs project on GitHub– we have seen a pretty large increase in the surface area that Cockroach is addressing. I'll just give you two really recent examples. One is absolutely embracing the idea of multi-region. So this could be useful for saying that you want to survive a region disappearing, like the east coast is offline or something like that. Doesn't happen often, but sometimes that becomes a business requirement if you have very valuable data like financial services. But I think more germane to the larger ecosystem is multi-region in service of customers wherever the customers are. And the smallest startup can have customers in South America or Australia even though they're based in the United States or Europe let's say. Servicing those customers in a way that puts their data close to them, both for data sovereignty, legal jurisdiction reasons, but also I think especially if you're trying to build a game or any kind of interactive capability where the real time feel of the application actually matters to the end user.

RD Right. Something where latency matters.

SK Then you're dealing with the speed of light, so you have to think in terms of “This application needs to be local to the user,” and that's not so hard to do if you're talking about the application logic. You can just run a different application server, run lambda functions in these different regions. But to have a database that creates a sort of consistent way to access data no matter where the data happens to be, let's say you have a game and someone in Australia wants to play with someone in the UK. You kind of want to make that possible. If you try to build a completely different service for your Australian users, a different service for your UK users, you create a lot of complexity and then the thing doesn't work well together. Think about Uber, right? You want Uber Global, you don't want Uber France and Uber UK and Uber United States, right? That's a pretty suboptimal way to design and the big companies don't design that way. So that's one that came up really recently, and it’s, as you can imagine, a rich vein really to mine. I mean, there's endless amounts that we can do to make that first, work, and then work better and have less complexity for the application developers. And then also we've been exploring serverless recently, which is just a really fundamental shift in how databases can be consumed that provides incredible efficiencies. What we're already doing is we're providing relational databases to anyone that wants them for free. And it's not just for free for 30 days or even a year, it's free perpetually. And part of what makes that possible is serverless. I mean, I can go into what exactly serverless means in a database context, but I won't use that much time to go into detail, but effectively it allows us to be more efficient, like a lot more efficient. Orders of magnitude in some cases.
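
Editor's note: to put rough numbers on the speed-of-light point above, here is a minimal sketch of the best-case round trip between two cities over optical fiber. The distances and the 2/3 fiber factor are approximate assumptions chosen for illustration, and real networks add routing and processing delay on top of this floor.

```python
# Rough lower bound on round-trip time imposed by the speed of light in fiber.
# All figures are approximations chosen for illustration.

SPEED_OF_LIGHT_KM_S = 299_792   # speed of light in vacuum
FIBER_FACTOR = 2 / 3            # light in optical fiber travels at roughly 2/3 of c


def min_round_trip_ms(distance_km: float) -> float:
    """Best-case round trip over fiber, ignoring routing, queuing, and processing."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_S * FIBER_FACTOR)
    return 2 * one_way_s * 1000


# Approximate great-circle distances (assumed values).
SYDNEY_TO_LONDON_KM = 17_000
SYDNEY_TO_MELBOURNE_KM = 700

far = min_round_trip_ms(SYDNEY_TO_LONDON_KM)
near = min_round_trip_ms(SYDNEY_TO_MELBOURNE_KM)
print(f"Sydney <-> London:    at least {far:.0f} ms per round trip")
print(f"Sydney <-> Melbourne: at least {near:.0f} ms per round trip")
```

Under these assumptions, a query that must cross the globe pays well over a hundred milliseconds before any work is done, while data kept in a nearby region costs only a few milliseconds. That gap is the motivation for placing each user's data close to that user.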

RD Right, because there are actually servers behind it, right?

SK There are definitely servers behind it. The beauty is that we're abstracting that. Just kind of like if you think about VMware. They probably didn't invent it, but they really brought to market the idea of a virtual machine. And yeah, there's a physical machine behind it, but now with a virtual machine you can say, “Hey, I've got a hundred of these things,” and it might just be one big physical machine, but now you're parceling that up into fine enough granularity that it can be used for a hundred different use cases and they're all sharing the machine. Before that, you would have one use case on that machine and it might be only using 10% of its capacity or 5% of its capacity. I think the average in a fleet before VMware was abysmal, it was like less than 15% and all of a sudden you can get it up to 80%. So it's this idea of “Take away the complexity of dealing with nodes and things like that with the database, how big the nodes are.” It's really just, you get a database, it can be incredibly small and you only get charged for that, could be in the free tier because it's so small, or you start to scale up and whatever you use is what you pay for and you can get to any size with very fine granularity. So that's a really great efficiency that's gained that really does allow us to change the nature of consuming databases.
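
Editor's note: the utilization argument above can be sketched with a little arithmetic. This is an illustration only; the 100 workloads, the 10% average load, and the 80% target are round figures taken from the conversation, and the function name is hypothetical.

```python
import math


def machines_needed(workloads: int, avg_load_pct: int, target_utilization_pct: int) -> int:
    """Machines required when workloads share hardware up to a target utilization.

    avg_load_pct is the percentage of one machine that each workload actually uses.
    Integer percentages are used to avoid floating-point rounding surprises.
    """
    total_load_pct = workloads * avg_load_pct
    return math.ceil(total_load_pct / target_utilization_pct)


workloads = 100
avg_load_pct = 10  # each workload uses about 10% of a machine

dedicated = workloads  # one mostly idle machine per workload
shared = machines_needed(workloads, avg_load_pct, target_utilization_pct=80)

print(f"Dedicated machines: {dedicated}")
print(f"Shared machines at 80% utilization: {shared}")
```

With these assumed figures, packing the same work onto shared machines needs roughly an eighth of the hardware, which is the kind of efficiency that makes a perpetual free tier and pay-for-what-you-use pricing plausible.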

CF I'm interested in hearing more about, I think a lot of companies are addressing issues with latency and a lot of people are trying to solve that through serverless technology. That's a fairly recent change, right? That's a fairly recent trend in technology just in general. I want to know how Cockroach Labs got to that point. I'm sure you started out and maybe that wasn't the initial problem you were addressing, so what was the journey like up until now, and then how did you deal with addressing new use cases and new customer needs, just like pivoting to focusing on serverless and things like that?

SK That's a good question, because we've seen quite a bit of evolution. We started off, as the name would suggest, really thinking about how this has to be a very resilient database, and so we do consensus based replication which is pretty common now in the database world. And we also said that this needs to just exploit the resources you give it and get as big as it needs to get, because use cases need more data and the ones that can accommodate more data can do more things and they can do them better. And so that's where everything's moving and wow, we've definitely seen that come to pass. But you start to build that resilience and the scale for your end users and you get committed users that are doing cool things and you start to see what the new opportunities are. And this idea of multi-region is one that's been born from looking at our customers and what they ultimately want to do. So we sort of follow those signals and those suggest the avenues by which we can further improve the database. And I think it's a really important thought experiment and illuminating to have a good idea of where you think the industry's going and whether something like multi-region is going to be something that goes from a niche requirement to a mainstream requirement, and that is what do the big players do? Think about Google when they launch something, or Facebook when they launch something, or Amazon when they launch something, or Apple when they launch something. All of these companies are dealing with the fact that users are all over the world. And they spend enormous amounts of resources in order to achieve those kinds of architectures. And why do they do it? Well, because it's a very superior user experience, and they can do it. Now, if you're a Fortune 500 company you might have lots of resources, but do you have the same kinds of engineers that Google hires? 
Probably not, but you would probably like to start doing some of the things that Google's doing for the exact same reason Google's doing them. So how do you go from an R&D capability that's much more focused on your lines of business and not just general cloud infrastructure, to an organization that uses the best cloud infrastructure available in terms of state of the art, but also continue to focus on your lines of business? Building financial services products, you’re a bank, whatever, you don't want to become Google. It's just not in the cards. So that's where Cockroach can help bridge the gap and that's really why we're extending the database in these directions. How do we not just bring to Fortune 500, but bring to every new startup so that they can build the way Google builds? Because Google spends let's say hundreds of millions of dollars in these directions on R&D and then of course on all the maintenance of that in order to improve their business. It stands to reason that if you can make that accessible and inexpensive enough, especially with things like serverless where you really can make the price very proportional to the utilization, then you can bring what is currently a very luxury niche capability into the mainstream and really help gain a lot of market share because you're allowing your users to do what the big dogs do, which I think everyone eventually will do.

RD One of the things I've learned about databases recently is it's not always in one place. You talk about consensus based replication, the sort of co-located databases, and you also talk about multi-region, so I assume you do some sort of sharding as well?

SK Yes. I mean, sharding’s a critical part of how Cockroach achieves virtually everything it does do. And there's some interesting nuance here. We're often compared, especially when you're just starting off, if you say, “Hey, if I need a Cockroach database cluster, or I could just keep using Aurora or RDS or I'm going to manage my own Postgres or MySQL instance,” the comparison on those things is really apples and oranges. We definitely do see that when you want the capabilities that Cockroach brings to market, it can get very, very, very large. It can go multi-region. It is balancing your data across availability zones or regions. In order to do that, you need to think about sharding at a very fundamental level, it's integral to the whole architecture of the database. That has a cost. It puts us in a situation where we have to find interesting ways to appeal to the new starts, the companies that don't have massive data scale yet, aren't trying to really embrace a global customer base even though they want both of those things to be true in the future. Like Cockroach, you have to pay a cost up front in order to lock in an architecture that can scale to the big leagues. So there is an interesting sort of process by which you bootstrap that where you really want to get ubiquity not just at the high end where people very clearly see why they need Cockroach, but also at the low end so it's actually easier and hopefully less expensive to start with Cockroach, and that's really where we're going with serverless. So we really want that to appeal to everyone. Now, the sharding stuff is a cost we pay, but it is what unlocks all of that capability.
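
Editor's note: for readers new to the term, here is a toy sketch of range-based sharding in general. It is not CockroachDB's actual implementation; the class name, the split points, the node names, and the round-robin placement are all hypothetical choices made for illustration.

```python
import bisect


class RangeShardedStore:
    """Toy key-value store that splits a sorted keyspace into contiguous ranges."""

    def __init__(self, split_points, nodes):
        # Sorted boundaries between ranges; N boundaries give N + 1 ranges.
        self.split_points = sorted(split_points)
        num_ranges = len(self.split_points) + 1
        # Hypothetical placement policy: assign ranges to nodes round-robin.
        self.range_to_node = [nodes[i % len(nodes)] for i in range(num_ranges)]
        # One dictionary per range stands in for that range's storage.
        self.data = [dict() for _ in range(num_ranges)]

    def range_for(self, key):
        # Binary search for the range whose boundaries contain the key.
        # A key equal to a split point belongs to the range on its right.
        return bisect.bisect_right(self.split_points, key)

    def put(self, key, value):
        self.data[self.range_for(key)][key] = value

    def get(self, key):
        return self.data[self.range_for(key)].get(key)

    def node_for(self, key):
        return self.range_to_node[self.range_for(key)]


store = RangeShardedStore(split_points=["g", "p"], nodes=["node-a", "node-b", "node-c"])
for user in ["alice", "heidi", "zed"]:
    store.put(user, {"name": user})
    print(user, "->", store.node_for(user))
```

The caller reads and writes by key and never names a node, which is the property that lets a sharded database rebalance or relocate ranges behind the scenes. The up-front cost Spencer mentions comes from doing this routing, plus replication, even when the data would fit on one machine.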

CF I wanted to kind of touch on the fact that I think based off of this conversation it's pretty clear to anyone listening that you're very knowledgeable about databases and probably technology and software in general. And I heard you allude to this earlier that you went from being more engineering focused to now being in a role as a CEO, which probably means that you're not as hands on with coding and things like that. So I want to hear more about what that transition is like. I can see that you're still very passionate about databases and discussing some of the more technical sides of it, so tell us about what it's been like being a CEO and having to step back from doing all that kind of stuff.

SK Yeah. It's a work in progress, I'll say that. So I can’t tell you how I'm going to end, but the evolution has been relatively gentle.

CF Oh, good.

SK There was a time where we didn't have customers, we were just trying to build a product, and that was actually several years in the beginning. It stretched on for a while because we're building a very difficult product to build and it took time. And during that stretch I was both the CEO, but it was really a CEO of an R&D organization, and I was also developing quite a bit and I loved that. Eventually though, in fact our Chief People Officer who’s a friend from Google and has been with us since the beginning at Cockroach, she told me, “Spencer, stop programming.” So eventually it got to the point where I knew I shouldn't be either because it was just too distracting because when you're programming you can put a fundamentally endless number of hours into it. You're never going to be finished. There's always something cool to keep doing and I just become a real perfectionist when I'm coding. The problem is that the duties of a CEO involve lots of meetings and a lot of things that are scheduled, and as that workload increased, trying to context switch between the sort of endless input I could put into the programming process and these very regimented meetings where I had to context switch between all kinds of different things, it kind of made me a little bit sour I'll be honest. I'd sit in a meeting and I'd just been coding and my head was totally in the coding space and I'd think to myself, “What am I doing in this meeting? This meeting is not very important,” and I'd just be thinking the whole time about the coding. And at some point that became untenable. So it was a gradual process and I let go of the coding slowly. But it was interesting, one of the last things I did in the codebase was pretty crucial. I mean, we still are kind of struggling with the problem and I thought I did a decent job of solving it. 
But the engineering organization told me, “We're not going to take your code right now because it's really complicated, nobody knows what you did, and we can't support it, it's not well tested.” And that was the sign to me, like, “Okay, well I'm not actually helping anymore here. I need to actually focus on what my real job is.” And since then I've really grown accustomed to the new set of challenges which has been good for me. Being a CEO is a lot of things that are very different from being a programmer, but in some ways it is similar and you can find the same kind of challenges, which is, how do you build a big complex system that works together where you have to debug things continuously? And you're ultimately going for performance and efficiency and elegance and something that's functional fundamentally.

CF Yeah, that makes sense in my head. I want to know too, do you still ever have chances to code, maybe outside, like not on the actual codebase but little side projects or something like that? I can tell that you still really enjoy it so I was wondering if you ever get the time to do it on maybe a hobby basis.

SK I sometimes do. The last thing I did was I felt people were spending too much time in meetings so I went and wrote this thing that went and analyzed how much time and how it was happening based on different cohorts of employees and what their responsibilities were. And actually I found out that people weren't spending too much time in meetings. It was my intuition they were and I was wrong, but it was good to be able to actually show that.

CF Yeah, awesome. That's good to hear. I kind of wanted to discuss where you see Cockroach Labs going and evolving in the future.

SK That's a good question. A lot of people have suggested, “You're doing a serverless database, perhaps you should also be doing serverless execution and maybe even put it in the database,” like think of it as stored procedures if you know what those are, but reimagined for the world of the global public cloud. So these distributed cloud data functions that are very close to the data and if you've got multi-region, you need to run the execution logic next to the data, nobody knows where the data is better than the database. And then there’s, “Hey, we should do a stronger analytics capability in Cockroach so that it can compete with Snowflake,” and by the way, Snowflake's building something that is encroaching on the more operational database side. I mean, all of these are possibilities for the future. You're expanding the scope of what your initial product does in order to just create a larger addressable market for your increasingly large suite of products. And I will just say this, the relational database market is so big and so challenging but so rewarding if you can crack into it properly that I still think we have our work cut out for us. So, right now we're remaining very focused on just building the right next generation operational database is kind of how we think of ourselves. So that's distinct from something like Snowflake or Databricks. It's really about where do you store all of your metadata for your operational use cases, every single product, every order, every line item, all of that sort of metadata that describes the reality of a business, that's really what we want to become the best at. And if we are the best at that and we win more than our fair share of the market, the sky's the limit on that product alone just because it's the largest market in software and it's growing at 17% compound annual. So in 10 years it will have gone from about $65 billion to a quarter of a trillion. And so that's just enormous. 
So what we want to do is find our slice of that, and probably it's going to be the world's really big, fast growing companies that use Cockroach because that's really what we're suited to support. And when we think about the new starts, all of those companies that definitely aren't at that crazy level of scale yet, we want to find the ones that will be at that crazy level of scale. And so our job isn't to say Cockroach is the database for everyone, every developer that's building anything. That's just too wide an audience and it doesn't necessarily speak to what we're best at. What we try to say is it’s definitely the database for the most aggressive, fast growing, massively scaled, global businesses out there today, but also how do we find the folks that are trying to build those kinds of businesses as well and care about our differentiators from the perspective of, “This is our ambition. This is what we need to accomplish. We should start with Cockroach because we realize that it's easy enough right now,” and that's kind of part of what we're building with serverless. And it's completely justified in terms of what we're going to get out of it as we succeed.
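
Editor's note: the market figures above are round numbers quoted in conversation, so the following is only a sanity check of how compound growth works, not a forecast. It assumes a starting value of $65 billion and a constant 17% annual growth rate; the function name is hypothetical.

```python
def project(value: float, annual_growth: float, years: int) -> float:
    """Value after compounding at a constant annual growth rate."""
    return value * (1 + annual_growth) ** years


start_billion = 65

# Ten years of 17% compound annual growth.
projected = project(start_billion, 0.17, 10)

# Constant annual rate that would turn 65 into exactly 250 over ten years.
implied_rate = (250 / start_billion) ** (1 / 10) - 1

print(f"17% for 10 years: about ${projected:.0f} billion")
print(f"Rate implied by reaching $250 billion in 10 years: {implied_rate:.1%}")
```

Taken literally, 17% compounding for a decade lands somewhat above a quarter of a trillion dollars, and a quarter of a trillion corresponds to a rate between 14% and 15%. Either way, the market roughly quadruples, which is the point being made.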

RD So my last question, you talked about moving into spaces with big market potential but also a very sort of mature market. Do you feel like that's a big risk to jump in there or is it more of a risk to just kind of stay with the thing that works?

SK That's a really good question. I don't know the answer. I'm sure it's a big risk but it's also a big risk to create your own new product category because then you have to stake it out, you have to convince people that it makes sense, you have to educate everyone. And often what happens when you're generating a completely new idea is that every new customer you get is like, “This is cool, but I need this and this and we've thought about this thing,” and so you get a big customer and they start becoming the tail that wags the dog. We have an interesting alternative to that, which is, it's a SQL database, right? So SQL is a standard that's existed for quite some time. When we have a customer that asks us for something, it's typically something that a lot of other customers want. So that part's simpler and it's a huge, huge, huge market that essentially powers every new thing that's being built. So all of the use cases that are going to be built in the next 10 years will exceed every single legacy use case ever built. That's just how fast things are growing in the ecosystem. All of those will have a relational database. So when you think about breaking into the market, it's not so much, “Oh, we're going to take away things that are running on Oracle and they're going to migrate to Cockroach.” That happens, but that's not the business. The business is, how do you win all the new things? Folks as they're building new things within an organization that has many patterns they've used in the past all the way back to mainframes or client-server, all these things that existed in the ’90s and the ’80s, well, they've also embraced every other paradigm along the way. All those things exist. What we want to do is we want to win for the new platforms, the new paradigms that are being operated on. 
So from that perspective, I think it's not necessarily a big risk to enter a market that has a lot of competition as long as you feel that you have differentiation in ways that's going to unlock a reasonable portion of that addressable market.

RD GIMP is something I still use today, so I'm curious how did that come about for you?

SK Actually, I'll mention that it wasn't just me that wrote GIMP as an original author, it was also Peter Mattis who's one of the co-founders at Cockroach. He’s who recruited me to Google when I was there and we were both there for 10 years and then we did Viewfinder, our startup after Google, and we went to Square together. We were roommates at Berkeley, so we actually decided to write the GIMP, because this was 1992-93, I guess I met him in ‘93. We had both arrived at Berkeley kind of from Mac and Windows backgrounds because that's what you had. And all of a sudden we had these HP workstations and Sun workstations and SGI workstations if people remember that company.

RD Big Solaris boxes.

SK Yeah. And we looked at these things and we were incredibly impressed. I was coming from Windows so I think the gap was truly enormous. And then we saw open source and like Emacs and GCC and I couldn't believe it because I remember using Turbo Pascal and it was all closed source and it was so painful and I didn't have money to buy the things, we were trying to pirate the software and you didn’t have the manual.

RD Shhh!

SK I know. Let’s say that it’s 30 years ago so I'll probably be forgiven.

RD The statute of limitations has expired.

SK Yeah, something like that. But then you saw the power of GCC and how many different platforms it compiled for, and it was all open and you could just dig into the code and it just worked better than anything I'd ever seen before. So that was inspirational, and both of us as I mentioned, my huge passion and it was true for Peter as well, was graphics when we were at Berkeley. And the problem with that open source, let's call it Unix and Linux and FreeBSD and all of the commercial operating systems that were Unix based, those all had a pretty weak suite of applications that were really for graphics professionals. You could get Photoshop on Solaris, but again, it was way out of our price range. There was XV for viewing images, there was XPaint where you could do pixel by pixel painting, and compared to Photoshop, it was a pretty wretched situation. So that's really why we decided on the GIMP. I remember we sat down and wrote some manifesto. Originally the GIMP was going to be really command line based. You could just run filters, you could send images, kind of like those Netpbm, I can't remember what they're even called, but there were sort of more Unix filter based things that you could just do on the command line to do image processing and that was part of our inspiration. But as you probably are aware, we went much more into the GUI route. And in fact, Peter wrote a GUI because he was so tired of using Motif, which is what we started using, which is sort of a more commercial and also some sort of open source-y consortium type thing, but it didn't work very well so Peter wrote GTK which I think is still in use in a number of different developer platforms and things like that. So it was really scratching our own itch, and I've found that over my entire career, if that's what you can do, you at least have a leg up because you understand the customer's perspective because you are a customer.

RD Right. Solve your own problems.

SK Exactly.

RD I'm glad that in solving your own problems you solved mine too.

SK It's always good to hear that, even this many years after. It's part of why we decided to build Cockroach. We just felt like, “You know what? This is going to be useful. Let’s do it so that everyone can use it in open source,” and the company came after the project by quite some time. We didn't really realize it would be a company until we got into it.

[music plays]

RD As we do in many of these episodes, I'm going to shout out a lifeboat badge winner, somebody who got an answer score of 20 or more to a question of -3 or less that goes on to receive a score of 3 or more, somebody saving a question from the dustbin of history. And today's badge, awarded two days ago, goes to Hugues M. for “Multiple keys pointing to a single value in Redis (Cache) with Java.” So if you're wondering how to get multiple keys in Redis with Java, another good database question, check it out. We'll put it in the show notes. I'm Ryan Donovan, I edit the blog here at Stack Overflow. And you can find me on Twitter @RThorDonovan. And if you have an idea for a blog post, please email me at pitches@stackoverflow.com.

CF My name is Ceora Ford and I'm a Developer Advocate at Auth0. You can find me on Twitter. My username there is @Ceeoreo_.

RD Spencer, tell them who you are and where you can be found.

SK Well our Twitter handle is @CockroachDB. Of course, the webpage is the place to start, and you can start on serverless for free which is I think a great resource for any developer out there. And if you want to reach out to me directly, it’s spencer@cockroachlabs.com.

RD Alright. Well, thank you for listening. Like, subscribe, all that really helps. And we'll talk to you next time. Bye, everyone.

[outro music plays]