The Stack Overflow Podcast

What's the blast radius when your database goes down?

Episode Summary

We chat with Mark Porter, CTO of MongoDB, about his history using and building some of the foundational databases of the last four decades. We walk through his time at Oracle and AWS, then delve into the trends he's focused on at MongoDB, such as time series capabilities and serverless instances.

Episode Notes

Mark started out on a 4K TRS-80. He had to program it in assembly language, as there wasn't enough memory to use the local BASIC copy.

Throughout his career, he's oscillated between using databases and building databases. He started at Caltech and NASA, using databases to store and organize space data and chip data. Then he built databases at Oracle, including versions 5, 6, 7, and 8.

After that it was back to using databases at NewsCorp for huge student data systems. 

He built databases at AWS with Amazon RDS, then moved to Grab Taxi, the Uber of Southeast Asia, and finally back to MongoDB, where he is building again.

You can find Mark on Twitter here.

This week's lifeboat badge goes to Erik Kalkoken, who answered the question: In a Slack, is there a way to see all the members that is part of that channel? 

Episode Transcription

Mark Porter Developers love databases. But developers do so much more than just store and retrieve data. Developers want to do graphs, developers want to do analytics, developers want to have a connection to their mobile device. They want to do all this. So what we're doing at MongoDB, and sorry for the brand plug, but I'm pretty passionate about it, is we're building an application data platform, where the correspondence between what we produce and our main persona, the developer, we're trying to get to 100%.

[intro music]

Ben Popper What you build shouldn't be limited by your database. CockroachDB, the most highly evolved SQL database on the planet, lets you build without worrying about scale, operations or uptime. Spin up a free cluster and learn more at cockroachlabs.com/stackoverflow.

BP Hello, everybody! Welcome back to the Stack Overflow Podcast, a place to talk all things software and technology. I am Ben Popper, Director of Content here at Stack Overflow. And I am joined today by my colleague, Ryan Donovan. Hey, Ryan. 

Ryan Donovan Hey Ben, what's the news? 

BP So we do a Dev Survey every year and we ask developers what are their most loved and wanted and dreaded languages. And when it comes to databases, we often hear about a little company called MongoDB. So I guess from your perspective, looking at the big world, it seems like this is something that doesn't change. It's remained popular, quite popular, for a number of years, so we thought it would be interesting to chat with them. We get a lot of pitches even, I think, you know, sort of in this very area, tell people, you handle the pitches. So I'll let you speak to that.

RD Yeah, I mean, I think Mongo has done a really good job of kind of being the primary NoSQL database in people's minds. We got a draft in about NoSQL, and it was saying NoSQL databases are document stores, which is what Mongo databases are. It's kind of taken over that brainspace.

BP Yeah, yeah. Kind of defines that category. I think that's right. Well, we have a great guest today, Mark Porter, the CTO at Mongo. So Mark, welcome to the show.

MP Well, thank you very much for having me today. I'm excited to talk about all the stuff you guys have already been talking about. [Ben & Ryan laugh]

BP Yeah, we're trying to lead it in there a little, give a little bit of lead. But Mark, you have a fascinating history in the world of tech and software. So yeah, just for people, give them, you know, a quick backstory on who you are and how you got into the world of programming and tech.

MP So who I am, I am a relentless tech geek. I've loved tech my whole life. In fact, my Twitter handle is @marklovestech. I have used databases since I was 14, with some really ancient technologies. Started out on a 4K TRS-80 Model I computer; we had to program in assembly language because there wasn't enough memory to use the local BASIC copy. And I very quickly got into databases, and I was talking to someone the other day, and he pointed out something I'd never noticed, which is I've oscillated between using databases and building databases. So I started out at Caltech and NASA using databases for space data and chip data. And then I built databases at Oracle, versions 5, 6, 7, 8, for about 13 years. And then I used databases at NewsCorp for huge student data systems. And then I built databases at Amazon with Amazon RDS and Aurora. Then I moved to Grab Taxi, which is the Uber of Southeast Asia, and used databases to deliver 15 million rides and meals a day. And then came back to MongoDB. And here I am building databases again. I frankly can't get away from them. [Ben laughs]

BP I love that story. I wonder, does that mean that, you know, at each point, you had some sort of frustration or saw some sort of like opportunity for innovation, you know, you kind of would build something, then you'd be the user of it, then you'd realize that like the next sort of turn of the wheel was coming, as you move between those jobs, were new paradigms and databases emerging?

MP Yeah, it's been really interesting. Half my career, I've been in the bow and half my career, I've been the target. And I gotta tell you that sometimes as a customer, you're not really happy being the target of what has been produced. Look, the reality is, is relational databases have been the modus operandi since 1970, when Codd first did his paper. And then Oracle was the first company that released them in 1979. They were actually known as Relational Technology back then, and then changed their name later to Oracle. So the mission criticality of databases has never been in doubt. What has changed is the amount of data, the way we process that data, and what's really, really important. And it used to be duplication of data was important and things like that. And while that's still important, what's really important now is developer productivity. Bar none. That is job one for any mission critical software company: developer productivity and innovation.

BP Yeah, that makes a lot of sense. It does seem like data has become almost this overwhelming force for some companies. Ryan, I don't know if you have experience with this, but I've been getting a lot of pitches and in talking with folks on the podcast, and you know, it's gone from we're using data to we have data lakes, and there's a data iceberg. And, you know, we're only sort of scratching the surface of what we might be able to do with this sort of endless flow of unstructured data that we're collecting. And as you mentioned, yeah, a lot of times what they're looking to do is understand it in a way that allows them to enhance productivity or automate certain processes, which right now are very time labor intensive.

RD Yeah, at my previous job, I worked on an article about data pipelines, you know, ETL processes and that. Like there's becoming a separation, I think, between your production database and the database you use to gain insights. Right, the production database has to be fast. But the insight database, it can be a little more, you know, flexible in how it produces data, right?

MP So we think about systems of record, we think about systems of insight. And, yeah, I mean, definitely, different people want to do different things with the databases. And so what we do is we think about personas. Are you an analyst? Are you a developer? Are you an AI/ML engineer? Are you a PhD data scientist? We always try to come at it from the customer, and what they want to accomplish.

BP Yeah, I think that's so interesting, because as you said, obviously, databases have always been part of working in the world of software and computers. But increasingly, there are these specialties that are very important, and which are producing these really interesting results that themselves are devoted to data, as opposed to it being something that, you know, needs to be part of the larger process. And so Mark, I wanted to touch on something, which is that you had part of your career at AWS, which now you know, has grown into quite quite a behemoth. And yeah, just wondering if you can talk to us a little bit about what you learned there, and maybe how some of that applies to the role you have at MongoDB today.

MP Yeah, so I joined AWS as the general manager of AWS RDS, which at that time was probably the largest fleet of databases in the world. And that fleet grew just tremendously while I was there, it was amazing. You know, just showing that it's not just databases, it was managed databases that mattered. So RDS did not build any of its own databases. RDS vended, by the time I left, over a million, significantly more than a million, Postgres, MySQL, MariaDB, Oracle, and SQL Server databases. And so the product that we produced was managing those databases. And people love it when their database stays up, when the backups and restores work, when you can change parameters, when failover works and all those things. However, over time, as much as I loved running those databases, I became frustrated with how they were shackles almost on customer innovation and customer operability. And so we developed this system called Amazon Aurora, which changed out the storage system underneath Postgres and MySQL. Obviously, we couldn't do that for the commercial databases. And we made those databases so much more resilient, so much more durable, so much more available. But we kept running into the fundamental limits of a rigid architecture, of high failover times and a single primary architecture, which meant that the blast radius of a system going down, or a plan changing in an Oracle database, I mean, takes down a whole company. [Ben laughs] And I can talk more about availability. In fact, you'll have trouble stopping me talking about availability, if you get me started.

RD I mean, that's the big thing about NoSQL, is availability, right? Replicability, the speed of access?

BP Yeah, for folks who don't know, let's lay out the value prop here. Like, what is sort of the difference between the two? And why would you prefer one over the other? You know, you mentioned shackles, I love that word. But yeah, you know, what are the limitations that it allows you to avoid when you move to NoSQL? And I guess, you know, to the degree that it makes sense, yeah, talk a little bit about availability, or I guess, you know, what I would say is almost like how robust your system can be.

MP So I do think availability is really important. But just from a value prop point of view, the main reason that NoSQL was started was multiple things. Number one was this platform reliability. I actually think you guys had a podcast with Eliot about a year and a half ago, where he talked about the founding of MongoDB. And I will give a shameless plug for one of your other podcasts, which is a great podcast that Eliot did. And, you know, in it, he talked about the fact that they wanted to do 400,000 transactions per second, and there's no way they could do it. But along the way, they did something even more important, which is they developed the document model. And the document model is just a natural way to program. When you want to add a field to a NoSQL application that you're writing, you just add it in your code, in your struct or in your structure in Java or Go or Rust or whatever, and the database automatically starts having that field. So it's not just about availability. Now, to get to your point about availability, MongoDB uses what's called a sharding architecture or replica set architecture, where you can't actually configure a MongoDB that doesn't have three nodes. And those nodes automatically do elections and they automatically start up. And as opposed to relational databases, where failover is measured in 30 seconds, 60 seconds, 90 minutes, 10 seconds, failover in MongoDB is measured in single-digit seconds. Our P99.9 election time in our Atlas service is less than seven seconds. And why is that important? Because when an app is down for three to five to seven seconds, people go 'Hmm, what happened, what's going on on my phone?' When it's down for 60 seconds, they've already visited another website to complete their purchase. And so there's a fundamental difference. So the ability to stay up, and the ability to be available, is one.
The second ability is the ability to scale without limit. We have customers running a petabyte in MongoDB clusters, and with over 1000 nodes. You just can't do that with relational. Even Aurora, which I just got to tell you I love deeply, because I helped architect it, you have one writable master or primary and up to 15 read replicas. And if you run out of the ability of that master or primary to take writes, you're done. You've now got to split your database and do crazy stuff. So those were the fundamental premises of databases. But the thing that's really missing there is that developers love databases. But developers do so much more than just store and retrieve data. Developers want to do graphs, developers want to do analytics, developers want to have a connection to their mobile device. They want to do all this. So what we're doing at MongoDB, and sorry for the brand plug, but I'm pretty passionate about it, is we're building an application data platform, where the correspondence between what we produce and our main persona, the developer, we're trying to get to 100%. So let me tell you a funny story. Before I started at MongoDB, as a board member, I wanted to know what this product was. And I was sitting in a bathtub in Mexico drinking a margarita, and I got my iPad out and I said, I don't know, what is this thing? Why am I joining this board? And I spun up a MongoDB database, I loaded 350 meg of application data, I built an aggregation pipeline. And I built a chart. And I did that all sitting in a bathtub on my iPad. I gotta tell you, I was sold. This felt like the most developer focused database I had ever used. And I gotta tell you, I wasn't actually that sober. And so if that tells you something about the ease of use, there's another picture for ya. [Ryan laughs]
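
The elections Mark mentions rest on simple majority arithmetic: a replica set can only elect a new primary while more than half of its voting members can still reach each other. The sketch below illustrates that arithmetic and is not MongoDB code; the function names `majority` and `tolerated_failures` are hypothetical, chosen only for this example.

```python
def majority(voting_members: int) -> int:
    """Smallest number of votes that is strictly more than half the members."""
    return voting_members // 2 + 1


def tolerated_failures(voting_members: int) -> int:
    """How many members can be lost while a majority can still be formed."""
    return voting_members - majority(voting_members)


if __name__ == "__main__":
    # Compare a few replica set sizes (illustrative values only).
    for members in (3, 4, 5, 7):
        print(members, majority(members), tolerated_failures(members))
```

One thing the arithmetic makes visible is that a three-node set survives the loss of one node, while adding a fourth node does not raise that number; that is one reason odd-sized sets are the usual choice.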

BP Yeah, that's interesting to hear you say like, right, the ease of using it or even the ease, you know, of thinking about how you're doing when you're writing the code. I do remember Eliot Horowitz, who was on, you know, a year and a half ago when he was CTO, talking about how a lot of it grew out of, right, his own frustration as a developer working with databases. And you, I guess, have kind of a unique perspective, having built them and used them in equal parts. But he was definitely coming from, as you said, that persona of like, I'm a developer who's frustrated with this stuff. He built, what was it, DoubleClick, you know, a huge online ad business, but fundamentally felt like what it felt like to work with databases was too frustrating. And there had to be a better way. So that was kind of cool, the genesis of it, the inspiration of it was neat, because it's very much building the product you yourself would want to use.

MP Yeah, exactly. I think that building a product you yourself would want to use, we still do that at MongoDB today. We have developers actually, you know, work with customers, and come back to the team and go, yeah, I think we could make that thing easier to use. And so we really work hard to keep that connection all the way from our customer success teams to our support teams, right back into core engineering.

RD So what are customers asking for? What's the new things that developers are pushing against with the database world?

MP So that's a great question. I mean, developers, we're kind of a picky bunch. And you know, the first thing we want is we want to be able to sit in our IDE, and we want to be able to do our job. So we have a really awesome VS Code plugin, which just lets you do everything you want to do with MongoDB, including prototyping, including data manipulation, right in VS Code. The other thing people want to do is they really, really, really want to not stand up more and more and more infrastructure. So they don't even want to know how big their machine is anymore. And so we just launched two weeks ago, at our .Live conference, we launched Serverless, where now you just get that magic endpoint. And then second, about a year and a half ago, we launched Search. We have over 1000 customers who use Search. But for you developers out there, we did something different than anybody else. And this was an idea before my time, but I can still be proud of how clever it was. We took the Lucene search engine. And rather than standing up a different set of nodes and a different set of clusters, we sat it right beside the WiredTiger storage engine and MongoDB running on the database nodes. And so there's no duplication of data. And there's no delay. And there's no ETL between search and your database. And so literally, you want to stand up that search bar, we did a demo at .Live, where one of our developer advocates showed that it takes less than five minutes to turn on search, put your search bar in your app, and start searching your MongoDB data with real time text search in five minutes or less. And so that's what developers want to do. Developers just want to crank on their apps, they want to do it sitting in their editor. You know, I love all these people who talked about all these cool things they have. But developers today still have VS Code, or Eclipse or IntelliJ, or whatever their app is. And then they have 20 terminal windows open.
And so the other thing we launched recently is Mongo Shell, and Atlas API, where you can actually now provision and control your Atlas instances, right from your terminal window. You don't have to go to a web page and, and control everything. 

BP Yeah, I want to talk a little bit about some of the new stuff that you're announcing this year, because I know that by the time this podcast comes out, you'll just have had another big event. I guess, you know, like, over the years, we've seen Mongo grow in popularity, obviously, we've seen it being extremely popular year after year on the Dev Survey. When I see people, you know, sort of trying to argue the opposite, often they talk about a few things, you know, the loss of the ability to do transactions or multiple transactions between different sets of applications, you know, and some of the, like, flexibility. This always happens within development. You know, the contrary to that would be, oh, well, you've lost some of the integrity of the data structure. You know, maybe we did want a strong schema, after all. So I guess yeah, you know, how would you respond to that sort of, like, the blowback to Mongo's obviously sort of meteoric rise in popularity? Is some of the new stuff that you're introducing meant to address those? Do you think that, you know, it's really more about, like, understanding the trade offs and leaning on Mongo's strengths?

MP Yeah, those are great questions. Thank you. So the first thing I'll say is, I've been in databases for 35 years, and I retired in 2000. I don't need to work. I came to MongoDB because I literally think it is the capstone of my career to build a database endpoint that protects your data, is easy to program against, doesn't require a bunch of DBAs dancing around a database chanting spells to keep it available. So that's the passion that we talked about at the beginning of the call. I would not have come to a company that didn't have transactions and schema enforcement. So MongoDB launched full ACID transactions in 2018. We've been certified by Jepsen on those transactions. We actually have always had atomic transactions inside documents, then we launched inside collections. And now you have full ACID transactions across massive clusters. So that's thing one. Now, to tell you the truth, a lot of things don't need a transaction. If I'm just recording a web click in a document, that's been atomic since day one with MongoDB. But if I'm recording transferring money from one account to another account, and I'm tying that together with a log transaction in another collection, yeah, you should do an ACID transaction with that. Now, I asked my Postgres Aurora team to give me flexible transaction consistency. And they came back and said they couldn't. They said that with relational databases, you just get perfect transaction consistency. So I couldn't back down to get unbelievable ingestion speed without transactional consistency. On MongoDB, you get to choose whatever you want. You can choose low consistency, you can choose high consistency like ACID. And let me tell you something else you can do with MongoDB that you can't do with any other database out there. We have consistency models where you can say, I want this to hit two data centers. We have multi region and multi cloud databases.
And you can literally set your consistency that this must hit two regions, or it must hit two cloud providers. You can't do that with anybody else. Now to your next question on flexibility. Yeah, I mean, I gotta tell you, I love standing up my application and just inserting fields and doing all that. And that's great for POC in an application, proof of concept. But yeah, I mean, MongoDB is not a schemaless database. That's been, I just gotta tell you, I think it's a horrific mistake, that use of words. In some ways with MongoDB, you think more about how your application uses data, and you structure your schema. And then you can turn on JSON schema enforcement, which is a standard, which we abide by, where you can get warnings if people do things that are against your defined schema, you can scan your schema for things that don't obey the schema definition, or you can actually enforce schema just like a normal database. So you ended your question by saying, hey, what are the trade offs? Are you willing to make the trade offs? I'm gonna tell you right now you don't have to make the trade offs. [Ben & Ryan laugh] I'll stand by that statement in the comment stream in this podcast.
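
As a rough illustration of the rule Mark describes, where a write only counts as durable once it has reached two regions, the sketch below models acknowledgements in plain Python. It is a toy under stated assumptions rather than MongoDB's write concern API: the `Ack` type, the `is_durable` function, and the node and region names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Ack:
    """One node's acknowledgement of a write (hypothetical model)."""
    node: str
    region: str


def is_durable(acks: Iterable[Ack], min_regions: int) -> bool:
    """True once acknowledgements cover at least `min_regions` distinct regions."""
    return len({ack.region for ack in acks}) >= min_regions


if __name__ == "__main__":
    # Two acknowledgements from the same region do not satisfy a two-region rule.
    same_region = [Ack("node-a", "us-east"), Ack("node-b", "us-east")]
    two_regions = same_region + [Ack("node-c", "eu-west")]
    print(is_durable(same_region, min_regions=2))
    print(is_durable(two_regions, min_regions=2))
```

The point of the sketch is that the rule counts distinct regions, not nodes, so adding more nodes in the same region never satisfies it.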

BP Yeah, I'll admit that I'm not well versed enough in this to argue. Again, I can only raise the questions, but we'll let the commenters weigh in. And we can share them with you after. Yeah, we'll have to dive into the comments section after the podcast comes out.

MP Absolutely.

BP So just a few more questions I wanted to ask one was, I know, yeah, you had a couple announcements at MongoDB 5.0, the MongoDB Live. So some of this was about time series capabilities. And then Serverless instances and Serverless is just a trend I've been hearing more and more about from what we've got on the blog and people who are coming on the podcast. So can you speak to those two announcements just sort of quickly? And I guess, yeah, sort of say like, you know, what it's about and then also how it ties into maybe some of the larger technology trends we're seeing play out across the ecosystem?

MP Yeah, so time series is great. Time series lets you load in data that is organized by time. And because we now have collections that do it right beside your other collections, special time series indexes, special time series operators, people can just have that data in one data store rather than standing up a purpose built data store. And so that's thing one. Thing two is, yeah, you talked about serverless. Serverless is just, you know, it's so powerful for a developer to just get the endpoint, and not worry about scaling it up or scaling it down. We are starting with, you know, applications where people just want to get going. We're going to be adding more and more features to serverless over time. And the other thing is, it's cost sensitive. Everybody knows that you want your database to scale down when you're not using it and scale up when you are using it. And while MongoDB currently does that, with instance sizes that will scale up and down, Serverless is just so much more granular, and such a better experience for developers. So now you get a choice of choosing your machine, scaling up and down your machine, or not even really caring that there's a machine.
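
To make "data that is organized by time" concrete, here is a minimal Python sketch that groups sensor readings into hourly buckets per sensor. It shows the general idea of time-based grouping only and is not a description of how MongoDB stores time series collections; the `bucket_by_hour` function and the `sensor`, `ts`, and `value` field names are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime


def bucket_by_hour(measurements):
    """Group readings by (sensor, hour), keeping values in arrival order."""
    buckets = defaultdict(list)
    for reading in measurements:
        # Truncate the timestamp to the start of its hour.
        hour = reading["ts"].replace(minute=0, second=0, microsecond=0)
        buckets[(reading["sensor"], hour)].append(reading["value"])
    return dict(buckets)


if __name__ == "__main__":
    readings = [
        {"sensor": "a", "ts": datetime(2021, 7, 1, 10, 5), "value": 1.0},
        {"sensor": "a", "ts": datetime(2021, 7, 1, 10, 55), "value": 2.0},
        {"sensor": "a", "ts": datetime(2021, 7, 1, 11, 0), "value": 3.0},
    ]
    for key, values in bucket_by_hour(readings).items():
        print(key, values)
```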

RD We've been talking a lot about Kubernetes and other infrastructure as code. And I think having a database like Serverless makes a lot of sense for any kind of network application. Like, you know, I know database engineers who muck around making sure, you know, it's replicating right across data centers, or everybody's up at the same time. Just having that endpoint and being like, give me the data, I don't have to mess with the rest of it.

BP Yeah, I think one of the things that, you know, was pretty inspiring to me was talking to people who were themselves sort of developers, but also founders, entrepreneurs, and how these things that we're talking about, this flexibility with these cloud services, allows you to scale up and down in a way that really offers you a lot more runway for your business. You know, when there's a big surge in demand, you can handle it. And when there's not, you're not spending nearly as much on your overhead and your infrastructure. So they had mentioned, oh, A Cloud Guru, which does a lot of cool, like, online training and certification. And like, you know, when there was a big surge in something, they scaled up to meet that demand for the lessons, and when there was a lull, I don't know, summer vacation, you know, they didn't have the same overhead on their bills. So I actually think that's kind of cool, beyond just the coding aspect of it, but also how it affects, you know, the ability of entrepreneurs.

MP You know, one of the things that I think I'm really excited about is just the way innovation is now what's really important. And so, with MongoDB, because you just stand up in your editor and start coding, you can do that. With Realm, you can start coding a mobile application. And you can bring real time insights with, you know, data lake and analytics and merging it all together. And so what we're seeing customers do is tear down their old relational monoliths, and build them in a new modern infrastructure, which just lets them innovate faster. And I mean, that's what customers want to do today, is they want to innovate faster. Like I said at the beginning of the podcast, you know, the constraints that used to be the constraints in the 70s and 80s are no longer the constraints. And the constraints now are how fast can you innovate in the marketplace?

BP Right. And can you keep up with the competition? [Ben laughs]

MP Can you keep up with the competition when all they have to do is go to a different URL.

[music]

BP I will end the episode, as I always do, I will shout out the winner of a lifeboat badge, somebody who came on Stack Overflow and found a question with a score of negative three or less, gave an answer and got it up to a score of three or more and got themselves an answer score of 20 or more. Today, I really, I don't know how this question lasted for four years. 'In a Slack, is there a way to see all members that is part of a channel?' Well, I have to say thank you to Erik Kalkoken for answering the question, but of course, it was closed. Still, if you need that knowledge, it's there on Stack Overflow. I am Ben Popper, Director of Content here at Stack Overflow. You can always find me @BenPopper on Twitter, email us podcast@stackoverflow.com. And if you liked the show, go ahead and leave a rating and review. It really helps. Ryan, tell people who you are and where you can be found.

RD I'm Ryan Donovan. I lurk on Twitter @RThorDonovan. And if you have an idea for a blog post, please email me at pitches@stackoverflow.com.

MP Hey everybody, I'm Mark Porter and I'm CTO of MongoDB. And you can find me at @marklovestech on Twitter or on LinkedIn at just Mark Porter. Pretty easy to find.

 

[outro music]