The Stack Overflow Podcast

Moving up a level of abstraction with serverless on MongoDB Atlas and AWS

Episode Notes

The history of computing has been a story of moving up levels of abstraction: from hard-coding algorithms and directly manipulating memory addresses with assembly languages to using more natural language constructs in high-level general purpose languages to abstracting the hardware of the computer in cloud compute. Now serverless functions take that abstraction even further. We’ve made the algorithms that process data simple and natural; MongoDB wants to do the same for how we persist data.

On this sponsored episode of the podcast, we chat with Andrew Davidson, SVP Products at MongoDB, about how they’re turning a database into a fully-managed service that developers can use in a more natural way. Along the way, we discuss how the cost bottleneck has moved from the storage media to developers’ minds, how greater abstractions can enable developers, and how to get insights from production data faster.

Episode notes

Try MongoDB Atlas on AWS for free.

You can get started with MongoDB Atlas directly from the AWS Marketplace.

If you’re at a startup, you can take advantage of their special offer for startups.

The community edition of their classic database is available to download as well.

If you’re looking to learn a thing or two before diving in, check out MongoDB University.

Our thanks to Great Question badge winner Derek 朕會功夫 for asking "How can I reverse an array in JavaScript without using libraries?" You know the rarest kung fu of all: asking great questions.

Episode Transcription

[intro music plays]

Ben Popper Hello, everybody. Welcome back to the Stack Overflow Podcast, a place to talk all things software and technology. I'm your host, Ben Popper, Director of Content here at Stack Overflow, joined as I often am by my wonderful colleague and collaborator, our blog editor and newsletter empresario, Ryan Donovan. Hey, Ryan.

Ryan Donovan Ooh, empresario. That's a new one.

BP So today we have a sponsored podcast from the fine folks at MongoDB. Ryan, what are we going to be chit-chatting about today?

RD We are going to be talking about databases in the cloud and how they specifically affect serverless computing.

BP Nice. I think that Mongo is one of the names that comes up most often when we talk about databases, and we've definitely had a lot of folks on the show from that organization over the years, but this sounds fresh to me. I'm not sure we've ever gone into necessarily the relation to the cloud and especially the serverless, which is a topic that comes up.

RD We just posted a blog post with them.

BP Oh, we did? Nice. All right. Well folks, if you're listening be sure to check that out. We've got a blog post up that dives a little deeper into this stuff. So without further ado, we'd like to welcome Andrew Davidson, SVP of Products at MongoDB, to the program. Andrew, hello.

Andrew Davidson Thanks so much, Ben, Ryan. Great to be here. Longtime Stack Overflow user and advocate. I can't tell you how many times you got me out of a tough corner when I was trying to write that code that I couldn't get working, so this is a real honor to be here and certainly to talk in particular about what we're seeing with developers today and how we conceptualize MongoDB Atlas and our journey to the cloud and where we're going from here. There's so much to discuss.

BP Great. So for folks who are listening, just give us a quick flyover. How'd you get into the world of software and what is it you do day-to-day in your role?

AD That's a great question. I actually originally grew up in Silicon Valley, and I spent let's just say the earliest part of my career at the lowest part of the stack down in the physical layer, if you will. And what I realized over time was that software's eating the world, so I, like all these other folks, have to get into this software game. And when I started looking at the software ecosystem I was frankly very lucky to find a database company at the same time as I moved to New York City. Who would've expected that? I mean, MongoDB was this incredible software company building a database in New York City of all places. And when I moved to New York it was such a wonderful fit. And going back to that low down in the stack sort of physical connection, I realized databases are intuitive to reason about because state has so much gravity to it, state is so challenging, state is essentially a physical construct so for me it's always been an intuitive area to think about. And over the years what I've come to realize is, if you look at where developers spend a lot of their time, it's in wrestling with state. So the core spirit I think of MongoDB from the very beginning has been about how do we enable developers to have a more elegant, integrated way of dealing with state data in a way that doesn't feel so unnatural. And if you look at kind of the history of that unnatural way we've interfaced with data, it goes back actually long before cloud to the 1970s when the key cost bottleneck in computing was the cost of storage. And I often ask people, “What do you think the key cost bottleneck of computing today is?” And I would argue it's actually developers’ minds. And so everything has changed, everything has flipped. And so I've been having a lot of fun with MongoDB over the last 10 years doing this. And MongoDB Atlas, about six and a half years ago we launched it, and it's by far the majority of our company revenue at this point. 
We get over 150,000 people signing up every month which is just an incredible community of people continuously coming in and learning to build on our developer data platform.

RD It's interesting that you talk about databases holding state when cloud computing, which is super popular right now, is kind of an ephemeral bit of computing, especially in the serverless sphere. Can you talk about the sort of issues, concerns, and needs that come up in merging those two fields?

AD Yeah, that's a great question. The general shift for developers and software development over time as well as with cloud computing has been one of moving up levels of abstraction. If we think about the earliest days of, for example, AWS and the emergent moment of IaaS, where essentially for the very first time you had commodity hardware available as a service, in those early days, MongoDB, for example, was software that essentially allowed you to democratize that distributed compute into a distributed system that made it accessible to developers through that developer API, that document data model, and that was really cool and interesting. But over time it wasn't enough to just have a bunch of VMs that expressed themselves through this wonderful developer document API. You needed to kind of continue going up a level of abstraction. That's really where we felt that pressure from our customers. They were just saying, “Look, I love what you're doing. I love this idea, but the truth of the matter is having you expressed as a fully managed service that I don't even have to worry about, that I can just think of as a deployment endpoint in the cloud, that's going to become more important potentially than even the power of what you could express inside the database.” And so we realized we’ve got to go all in on that. That's why we went all in to build Atlas whenever we started seven and a half years ago. But as we've seen, the expectations continue to rise. It wasn't even enough to just say, “Snap my fingers, deploy a database cluster somewhere in the world.” We're now at the point where folks don't want to have to think at all about sizing. Just as there might have been a time in which you thought about VMs, then you moved maybe into containerization, and then maybe you moved into serverless functional compute in Lambda or Fargate EKS. 
What we're seeing now, especially in AWS there are incredible thought leaders on this, is of course that you want to have that same ease of use, that same level of abstraction in the stateful layer that sits behind those serverless functions. So if you're using Lambda, having MongoDB Atlas have a serverless variant that can keep up with those demands, those spikes, those unexpected bursts without you having to do any sizing and worrying about it. Now I want to be intellectually honest that the reality of serverless computing is one of optimization necessary when you do have very large scale workloads or when you're optimizing for cost, for example. By no means are we in some magic land now where you can just write some code and everything will magically work. But the amazing thing is that we're at that point for a lot of early scale small to medium size use cases, and I think that the scale of which you can think that way is only going to rise. So 10 years from now, I think serverless is totally the default.

BP Sweet. So for folks who are listening, can you describe, at least in your opinion, are there benefits using databases that aren't restricted by that relational SQL architecture? I know that's something Mongo is quite famous for. So from your perspective and as it relates to the cloud, what are the benefits there?

AD Well yeah, the core thesis from the very beginning for us has always been anchored in this idea that if you think about how you write code, you're mostly thinking in terms of objects. And those objects have this rich structure to them, they have embedded documents and arrays in them, and that's a first class way that you think as a developer, always. And the idea that you should have to sort of fracture that object across a bunch of tables leads to two fundamental challenges that really are the core thesis for why MongoDB exists. The first challenge is that it's hard to reason about, hard to make changes, it's too brittle, it's inflexible essentially, once you've gone down that road, because you don't just think about the way your data looks, the way it's stored, you're actually having to introduce this whole impedance mismatch. But perhaps more importantly, it's non-performant as well. It's non-scalable, it's not built for continuous availability and up times that we expect today, which on the backend requires distributed systems that can handle limitless scale. So when you have this kind of double whammy of being able to actually shape your data and documents that reflect the objects that you naturally write code in every day, and you have the scalability, uptime, and performance benefits that come from that, it's that productivity and flexibility, the ability to evolve your application, combined with that scalability and performance, it's that double whammy that is just so profound. But in the earliest days, MongoDB was being pulled into use cases that we certainly weren't yet ready for, to be honest with you. And I think that just showed how much thirst there was to be able to break away from those paradigms that had so much complexity to them. 
Luckily we've, over the last 14 some odd years, been able to continue to invest, first with critical capabilities in line with the mobile revolution around geospatial and later critical capabilities with the new storage engine and multi-document ACID transactions. And of course, with the rise of what we can do in Atlas more quickly, the power of full text search inside the database, powered on the backend by Lucene and Atlas Search, and so many other things that we were able to rapidly bring to market that are such a great combination with so many of the building blocks on the AWS side, ranging from all the app-tier building blocks I was mentioning before like Lambda and EKS to SageMaker for building machine learning models and many more.
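
The point about fracturing one object across a bunch of tables can be made concrete with a minimal sketch in plain Python. It is only an illustration: the order, its field names, and the three-table split are all hypothetical, and no database is involved. It shows one nested object as the application sees it, the same data flattened into rows, and the join-like work needed to put it back together.

```python
# One application object, shaped the way the code already thinks about it.
# Every field name here is hypothetical.
order_document = {
    "_id": 1001,
    "customer": {"name": "Ada", "email": "ada@example.com"},
    "items": [
        {"sku": "MUG-01", "qty": 2, "price": 12.50},
        {"sku": "PRINT-07", "qty": 1, "price": 4.00},
    ],
}


def to_relational_rows(doc):
    """Split the nested object into flat rows, one list per (hypothetical) table."""
    orders = [{"order_id": doc["_id"], "customer_email": doc["customer"]["email"]}]
    customers = [dict(doc["customer"])]
    order_items = [{"order_id": doc["_id"], **item} for item in doc["items"]]
    return {"orders": orders, "customers": customers, "order_items": order_items}


def from_relational_rows(tables, order_id):
    """Reassemble the object: the application-side equivalent of a join."""
    order = next(o for o in tables["orders"] if o["order_id"] == order_id)
    customer = next(
        c for c in tables["customers"] if c["email"] == order["customer_email"]
    )
    items = [
        {key: value for key, value in row.items() if key != "order_id"}
        for row in tables["order_items"]
        if row["order_id"] == order_id
    ]
    return {"_id": order_id, "customer": customer, "items": items}


tables = to_relational_rows(order_document)
rebuilt = from_relational_rows(tables, 1001)
```

In the document shape, the embedded customer and the items array travel with the order. In the flattened shape, the application has to carry the mapping code in both directions, which is one way to read the "impedance mismatch" described above.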

RD I mean, it's almost like you said, that next level of abstraction. You don't have to manage the tables and the rows, you just throw data at it.

AD It's funny you say that because, yes, but you also need to think about things like indexing and schema. And that's where I love this idea of sort of respecting the developer. Developers are people who can figure these types of things out, and in many ways MongoDB does put more onus on the developer on the application itself. But in the context of doing so, it frees the developer and it frees the application to move much faster and to do so much more.

RD So how easy is this for something like a small startup to take advantage of?

AD That's a great question. I was mentioning that incredible number of people signing up for Atlas every month– 150,000. And if I think about who those people are, it's everyone ranging from someone who's just learning to code for the first time all over the world in Vietnam or India, to folks in some of the most sophisticated enterprises in the world building hardcore, large scale mission critical applications in banking and large scale games and in government. And so I think the key is that this level of abstraction, the document model fully managed as a service on MongoDB Atlas, in combination with the incredible building blocks that AWS provides, allows small teams to move faster than ever before. And so many world-changing upstarts are being born all the time on this platform. If you think about some of the largest digital native challengers that have emerged in the last decade, many of them were built on this platform. I think many of the ones that'll emerge in the next decade will be the same. And of course many of the enterprises that we see that are very quickly retooling and responding to threats and are able to really sort of understand that increasingly they are at their core software companies, almost every type of enterprise that that shift has happened, developers being the sort of artisans of our time crafting this digital economy that is the lifeblood of the entire economy now, I think we see this general shift that small teams in every size of company can do so much now, and that's what's so empowering about the whole thing.

BP Cool. I like that you talked about both large and small companies there, and yes, that's something we see at Stack Overflow all the time: clients arriving from all kinds of industries, finance to clean technology to law, that have large and complex software teams and software stacks. So I guess, for our listeners, talk about some of the use cases for Mongo and AWS that you find most interesting. If you were to sit down with some engineers at a bar and just sort of relate what has impressed you recently, what kind of use cases or examples would you bring up?

AD I mean, it's hard to even know where to begin. One thing that I even talk to my team about is, when you're in a fundamental platform layer of the economy that's so general purpose, you're sort of not focusing on particular verticals or particular use cases. You have to focus on enabling all of the above. So if I think about mobility or context in which mobile or device use cases are proliferating, think obviously the phone in your pocket, but also kiosks that you interact with. Sometimes you don't even think about it: the airport or in the brick and mortar store or on devices throughout your day, powering the backend of all of this mobile and edge context computing is a major focus for us and it's some of that hidden operational data that's just behind the scenes making it all possible. I think about what Verizon is doing in partnership with us in that area, taking advantage of our edge building blocks to power their 5G edge networks. I think about what people are doing in cutting edge software companies. We see this shift towards low-code and no-code where basically the citizen developer, or the person who might've previously been maybe a business analyst or a business operations person, can now actually unlock agility for their business by building software without having to write code. And of course, while we don't directly generally interface with those types of people because MongoDB is all about code, so many of our customers are building platforms that enable those things. So a great example of this is Unqork, for example. They've revolutionized first the insurance industry and are expanding widely across enterprises and financial services as a leader in this low-code space, dragging Atlas along and unlocking and showing what can be done in that software segment. I think e-commerce, great companies like Shutterfly. 
I've got a baby at home and just having all those mugs that touch our lives with my baby and my family photos on it, and just thinking about those types of companies that basically interface with your lives across many form factors, very much e-commerce-forward revolutionary. These types are great. And of course financial services all over the world. FinTech, there's a great company out of Brazil building on MongoDB Atlas and AWS called PicPay, a leading payment processor down there. Brazil, if folks don't know, is a complicated payments market, let's just put it that way. And for them to have emerged as this fast mover there, it's just an incredible success story. And there's so many more, massive games, crypto banks, and chatbots now. I mean, it's everything.

BP Yeah, today my favorite piece of news to flow across the wire was an agreement between a big international law firm and an AI bot called Harvey. They announced that it's been working on their cases for the last seven months. They really like what it's done, and they're going to bring it on board to help all their 3,500 lawyers around the world do their research and their legal arguments. So it's kind of a wild time we're living in.

AD I love it. And guess what? Every one of these applications, what do they have at their core? Operational data. Operational data or transactional data, data that is changing, that is making it possible to back end these applications that we're all interfacing with. That's really the area that is sort of the unsung hero. It's funny, we live in a time in which everyone talks about analytical data. I think it's because it's sort of more intuitive for the person on the street to think about a big spreadsheet, if you know what I'm saying. We all get that. Deriving insights from data, that's important. But it's funny, this operational or transactional segment that we sit in which is the center of spend traditionally in the data market, it's the center of what powers every kind of application in business, because it can never go down, it is the lifeblood of the economy. It's sort of a mystery to people, and I've been scratching my head wondering, “Why is operational data such a mystery?” And I realized it's really because it's abstracted away by the wonderful applications we all use. We don't have to worry about it. I'm just ordering a rideshare car to my home or a delivery grocery experience or ordering something from Shutterfly. I'm not thinking about operational data and it just happens and that's the magic of it. So it is funny to be kind of behind the scenes like that with AWS as well.

RD Right. And that is a huge pile of production data there. That's got to be available on demand for whatever traffic that comes at it. Can you talk a little bit about the sort of auto-scaling that happens for those e-commerce rushes?

AD Totally, yeah. So with MongoDB Atlas for example, the data portion of the estate is always frankly the hardest part to scale. When you have a stateless application tier, that's much easier to build autoscaling around and of course that's a pretty well understood space. But with a database service like in MongoDB Atlas, when it comes to autoscaling, this is a critical part of what we do to respond to those changes and we have sort of many different ways of doing autoscaling. There's many different layers of it from storage to compute, vertical and horizontal capabilities that you can opt into when you want to take advantage of that. And it's also a major focus for us of course in the context of our serverless database offering, which is an option for us where essentially you worry less about the autoscaling details and it's fully just abstracted away, and we do anything in our power to give you the ability to respond that much more quickly to changes in the workload demand. It's one of those things where there's always going to be some level of back pressure if you're seeing orders of magnitude level changes in workload demand, but it's kind of fun to try and build a database platform for this massive scale across this massive fleet and optimize it so that, say, on the order of a single order of magnitude level scale need you're ready to go and can respond extremely quickly. And if it goes beyond that, you do everything in your power to move fast to satisfy it as well.

BP Yeah, that makes a lot of sense. You were just talking about how you keep up with demand, and earlier on you talked about things like being able to burst from the cloud side as needed. Ryan's example was e-commerce. What do you think are sort of the key elements of deployment, flexibility, and performance tooling? If you had some developers who were sitting down and considering what tools to use or what approach, how would you describe some of the advantages from your side of the house?

AD That's a great question. For something as critical as a developer data platform like MongoDB Atlas really powering that operational data layer of an application, it’s one of those things where you don't want to have to become an expert in using lots of different ones of those, frankly. But at the same time, you don't want to have to write all your applications the exact same way. You're going to have different needs, different use cases, different scenarios, different skill sets. So how can both of these be a reality? Well, that's where I think we obsess so much about ensuring that the experience of MongoDB, MongoDB Atlas in particular, as expressed in the developer layer, is something that's ubiquitous and idiomatic in the context that people need them to be in. So for example if you're a Python developer and you're using PyMongo and you're writing a dictionary straight into the database, the database is returning a dictionary back to you. You don't have to think about it really as something that's kind of breaking with what feels Pythonic. And the same goes of course with Node.js, JavaScript obviously being in many ways traditionally kind of the center of a lot of the early MongoDB love around the MERN stack and everything. And the same with Java, Spring Boot, et cetera, C#, .NET, Golang, and Rust. All these languages, what we'd think about is, “How do we figure out how to make sure there's an amazing stack, an amazing framework in every popular language and even emerging stacks and languages, and make sure that every team that's trying to build in the ways that reflect that team skillset can all take advantage of this operational data standard behind their code that sits underneath it so that they're not having to learn how to operate lots of different operational data engines?” Because that's a big challenge. 
And I think the other part of making that a possibility is to be able to express the vast majority of the stateful needs that you need from those applications. When we kind of looked out at the sea of alternatives, it wasn't just that relational and tables existed. We saw kind of a different trend play out whereby folks would typically run into the limits of what they could do with those tabular relational models and so they would layer in a new system, maybe a key value store, to kind of enable certain features in their applications that required continuous availability or scale. And then we saw this trend of, okay, now I've got a key value store in the mix that has limited query ability, so I'm going to augment this gap with a search engine, perhaps. And now my app has to sprawl across three different operational engines: a key value store, a search engine, and a relational database. And all of the sudden everything gets slow and so I'm going to put a fourth layer in, maybe a caching layer. We saw this kind of dynamic of the four different engines powering apps so frequently that in many ways I think that was a core thesis for us to flip it on its head and say, “What if I could have the strong transactional general purpose capabilities and secondary indexes that I knew and loved from relational databases, and what if I could have the uptime and scalability that I might have gotten from key value store, and what if I could get the rich query ability of a search engine and the performance of a cache through one API, through one interface that I as a developer from any idiomatic language of my choice or framework could interface with?” This idea is kind of a groundbreaking one that allows for standardization but still allows all the teams to express themselves and build the way they love.
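
The "write a dictionary, get a dictionary back" experience described above for Python can be sketched roughly as follows. To keep the example runnable without a server, it uses a small in-memory stand-in that imitates just two PyMongo `Collection` methods, `insert_one` and `find_one`. The stand-in class, the helper function, and the sample profile are all hypothetical, and the stand-in handles only exact-match filters.

```python
import copy
from types import SimpleNamespace


class InMemoryCollection:
    """Hypothetical stand-in imitating two PyMongo Collection methods,
    so this sketch runs without a database server."""

    def __init__(self):
        self._docs = {}
        self._next_id = 1

    def insert_one(self, document):
        # Like PyMongo, add an _id to the caller's dict if it lacks one.
        # (PyMongo would generate an ObjectId; a counter is used here.)
        if "_id" not in document:
            document["_id"] = self._next_id
            self._next_id += 1
        self._docs[document["_id"]] = copy.deepcopy(document)
        return SimpleNamespace(inserted_id=document["_id"])

    def find_one(self, filter):
        # Exact-match filters only, which is enough for this sketch.
        for doc in self._docs.values():
            if all(doc.get(key) == value for key, value in filter.items()):
                return copy.deepcopy(doc)
        return None


def save_and_load(collection, document):
    """Write a plain dict, then read it back by its _id as a plain dict."""
    result = collection.insert_one(document)
    return collection.find_one({"_id": result.inserted_id})


profile = {
    "name": "Ada",
    "languages": ["Python", "Rust"],
    "address": {"city": "New York"},
}
loaded = save_and_load(InMemoryCollection(), profile)
```

With PyMongo installed and a reachable deployment, a real collection such as `MongoClient(uri)["app"]["profiles"]` could be passed to `save_and_load` in place of the stand-in. The connection string, database name, and collection name there are placeholders, not values from the episode.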

RD So are you all trying to put an end to ETL pipelines? Death to data lakes and that?

AD Well, it's funny because it's like when you create a larger highway in your city thinking it's going to reduce traffic and in the end it just causes more proliferation of neighborhoods such that in the end there's going to be more traffic. I think the reality is, by lowering the barrier to building applications, which is in many ways what's happening here, smaller teams doing more, what that means is there's more applications, more proliferation, more need to interact across application boundaries and microservice boundaries and desynchronize them, and so there's inherently going to be a data lifecycle, both in the operational layer as well as a data lifecycle. Once you kind of go beyond the operational lifecycle downstream into your data lake or your machine learning data science, you're going to have to close the loop and operationalize those models back in your applications, all that stuff is still happening and in a big way. But yes, in many ways not having to have the builder of the application go do a bunch of plumbing to ETL data from an operational store to another operational store to another operational store just to make the app work, that is a game changer and is in many ways the core focus for us to get across. Like, “Wow, what if I could push that down to the provider.” In our case, we’ve got to do all that heavy lifting for you, which is a lot on us but that's what we're here for. That's our job.

BP Cool.

[music plays]

BP All right, everybody. Thank you so much for listening. Hope you enjoyed the program. As always, we want to give a shout out to someone on Stack Overflow who came and spread a little bit of knowledge around the community. Today, thanks to Derek, who was awarded the Great Question Badge: a question with a score of 100 or more. Thanks, Derek. “How can I reverse an array in JavaScript without using libraries?” We've got an answer for you. I am Ben Popper. I am the Director of Content here at Stack Overflow. You can always find me on Twitter at @BenPopper. Email us with any questions or suggestions about the podcast, podcast@stackoverflow.com. And if you like what you hear, why don't you leave us a rating and a review? It really helps.

RD I'm Ryan Donovan. I edit the blog here at Stack Overflow. That's stackoverflow.blog. And if you want to reach out to me, you can find me on Twitter @RThorDonovan.

AD And I'm Andrew Davidson, SVP of Products here at MongoDB, and it's been a real pleasure to be with you today. You can find me on LinkedIn. I'll also just encourage anyone who's potentially new to MongoDB, we have a free to get started, free forever, free tier on MongoDB Atlas. You can also download our community edition of the software. We also have massive Coursera-like courses on our MongoDB University. And you can get started as well on the AWS Marketplace or check out our MongoDB for Startups program which has a special coordination in partnership with the AWS Activate program. There's so many ways to get started in our community and we look forward to seeing what you're going to build next.

BP Very cool. All right, everybody. We will throw those links in the show notes so be sure to look for them there. And as always, thanks for listening. We'll talk to you soon.

[outro music plays]