The Stack Overflow Podcast

Supporting the world’s most-used database engine through 2050

Episode Summary

Dr. Richard Hipp, creator of SQLite, shares how he taught himself to program, the challenges he faced in creating SQLite, and the importance of testing and maintaining the software for long-term support.

Episode Notes

SQLite is the most used database engine in the world.

Stop by the forum or explore the docs.

Devs have pledged to support SQLite through the year 2050.

The developer with a team of three, all of them himself at different points in time, was Tarn Adams of Dwarf Fortress fame.

On Stack Overflow, 1.2 million people have found the answer to How can I list the tables in a SQLite database file that was opened with ATTACH?.

Episode Transcription

[intro music plays]

Ryan Donovan Hello, everyone, and welcome to the Stack Overflow Podcast, a place to talk about all things software and technology. I am Ryan Donovan, I edit the blog here at Stack Overflow, and today we'll be talking to Dr. Richard Hipp, creator of possibly the most widely deployed and used database engine, SQLite. We're going to be talking about why it's so deployed, what it's like running a small company, and sort of the ins and outs of a small database. So Dr. Hipp, welcome to the show.

Richard Hipp Thank you for having me. It's a pleasure to be here, Ryan.

RD So at the beginning of the show, we like to ask all of our guests how did you get into software and technology? What's your origin story? How did you get to where you are today?

RH Well, that goes way back, doesn't it? So when I was in high school, I started programming computers and they were really hard to come by back then. And then later on in high school, I actually made a little bit of money and I was able to buy one of the very first Apple II computers when they were still being hand-assembled. I had serial number 508 and it came with 4K of memory, and that included the video RAM. And so I taught myself to program, and I got the motherboard, I had to put everything else together myself, and I really got into it and enjoyed it. But when I went to university, I actually majored in electrical engineering because I thought, “Computers, I can do that now.” And took a master's degree and then worked for a few years at Bell Labs, and at Bell Labs had discovered Unix and I thought, “Oh, this is really cool,” and went back and got a PhD in computer science. And coming out of there, I realized that that was 1992 and there was a lot of people graduating back then and you couldn't get a tenure track position anywhere and so I didn't even try. There were 500 applicants for every open position and I was way down on the bottom of that stack. So I just started a company and I was doing freelance software development, had a great time doing that, and in the course of doing that one time during a downtime and didn't have any business, I wrote SQLite and put it out there. Other open source software that I had written before I'd put on the website and it’d get 5-10 downloads per year, something like that, and I figured this would be the same. But it seemed to resonate and it really picked up, and before long it became a business.

RD Amazing. So you created this most widely deployed database engine because you were bored?

RH Well, no. I had been working on a project and I never studied database technology before. I've never had a course in databases, and just in the course of doing projects for companies, I had to deal with databases from time to time. And in the project I was working on, they used an Informix database, and it worked fine, but sometimes it wouldn't come up when they rebooted the machine and then they'd try and launch my application and it would bring up a dialog box that says, “Can't connect the database.” And I’d get the bug report because it was my application that painted the dialogue box, even though I had absolutely nothing to do with the database installation. So I thought, “Wouldn't it be cool if we had an SQL database that could just read the files directly off the disk and not have to go through a server?” And then when this sort of thing came up, I wouldn't have to depend on the server coming up and it would just work. And I looked around and there was nothing like that, and there was a funding lapse in that project and so I had a little bit of time on my hands and said, “I'll just write my own database engine. How hard can that be?” Well, turns out a little bit more challenging than I thought. If I'd studied database technology beforehand, I probably would’ve said, “No, that's too hard. I can't do that.” Fortunately, I didn't know better.

RD Yeah, fortunately. So were there significant changes over the course of it like things that you learned that you were like, “Oh, I have to make it ACID compliant,” or something like that?

RH It was a definite learning experience for me. Version one of SQLite, SQLite 1 used the GDBM library as its storage engine. Which is a GPL library, so that means that version one of SQLite was GPL2. But that's a hash storage engine, it doesn't do a range search. So I thought, “Oh, I need a B-tree interface of some sort.” And I looked around, “What can I use, what can I use?” And I looked at, from Sleepycat, Berkeley DB. They had a lot of documentation, but I looked at the documentation, read through it multiple times, and I thought the only way I'm going to figure out the details of this is to actually write test programs, and I thought, “I'll just write my own.” So I wrote my own B-tree layer, and that was version 2, but that was still only ASCII text, it didn't do blobs. And then at that point it started to get really popular, and a number of companies approached me and said, “Hey, we need binary stuff. We'll pay you if you'll add binary support.” I forget if that was AOL or it might have been Motorola that paid for that, and that was version 3 and that's what we've been using since 2004. And the file format has not changed. It's been extended, but it's not changed since 2004. But I had to learn a lot along the way. I remember I was invited to a PHP conference in Germany somewhere and David Axmark from MySQL was there and he spoke and he was talking about covering indexes where you'd pull the data directly out of the index. The thought had never occurred to me and I thought, “Wow, that's a really cool idea.” So on the plane ride back home across the Atlantic, I implemented covering indexes in SQLite. So things like that come up. I'll see an idea or somebody will tell me about something and I’ll think, “Oh yeah, that is kind of clever. I think I'll do that.”

RD I think that's the great thing about open source and the sort of collaborative sharing culture that software development has gotten into. You see this thing you're like, “Oh, right. I can add that and I can make this better.”

RH Absolutely.

RD So what was the biggest surprise that you learned? What was the biggest thing that you were like, “Oh, that I would have never thought of.”

RH What was the biggest thing I ever did? Well one big thing that really surprised me was what a useful concept SQL is. I hear lots of people, even today, saying, “Oh, I don't need SQL. I've got my key value store here.” And I was in that camp myself. I thought, “Why do I need all this complex SQL language stuff?” Now that I've written an SQL database engine, my views have shifted. Well, more than that, I wrote my own version control system for SQLite called Fossil. We started out using CVS. CVS was the venerable version control system, and a lot of people badmouthed it. I will never badmouth it because I had to use some of the things that came before CVS and so CVS was amazing. But it does have its limitations, obviously, and so I wanted to do something like that. And I looked at Git. Git was relatively new. I looked at Mercurial– that was brand new too. For both of those, the idea kind of came from Monotone. And Monotone gave me the idea of storing the content in a SQLite database, and so I thought, “I'm just going to write my own.” And I did and we called it Fossil. And that has been an amazing boon for the SQLite project, because when I'm working on Fossil, I'm using SQLite from a totally different perspective. I'm seeing SQLite from the point of view of the developer.

RD You're dogfooding.

RH I'm dogfooding, exactly. And that has really helped me to empathize with the people who have to use SQLite. And also writing Fossil has shown me how incredibly powerful an SQLite database is as an application file format.

RD What makes it so powerful?

RH Well, people have feature requests. Can you do this report or that report, and a lot of times it's just an SQL query and then a little code around it to transform the results into HTML. It's very easy to add capabilities. Because the file format is an ACID database, we don't have to worry about corrupting the repository if we lose power in the middle of a commit or anything like that. I know Git says, “Well, we don't do that,” but they have to be very careful about the commit process to make sure that they don't do that. We don't care, the database engine takes care of it for us.

RD That's nice.

RH So it's been a real learning experience for me. And now I'm definitely an evangelist to say that you really should be using SQL of some sort, even if you decide to use one of the client server database engines out there. SQL is a very powerful mechanism.

RD And I think a lot of folks, even if they have key value stores or object stores, they'll have some sort of like Presto SQL on top or something, some sort of SQL search.

RH What I see is that a lot of people think of SQL as just a complex wire protocol to talk to your database, but it's way more than that. It removes you from having to worry about the algorithms. Each statement of SQL is a small program. But the really cool thing is that with most programming languages– Python, C, whatever– you have to specify the algorithm, but with SQL, you tell it what you want and the database engine figures out the algorithm for you. And so you're able to keep your mind more in the space of the users who're going to be using your program. You don't have to worry about all these low level details of where the data is coming from and how it's formatted and how you're going to decode it. It's an enormous time savings.

RD So there's so many database options out there. What's the ‘lite’ part of SQLite?

RH The ‘lite’ part is that we do not have a server. Every other database engine that I know of, the application that's using the database constructs a message and sends it to the server which is running as a separate process usually, often on a different machine, it goes over a network, and then the server computes the answer and sends the result back. Now, I know there are other SQL database engines that claim to be embedded. Is MySQL embedded still a thing? I haven't heard about it for a while, it used to be. There's H2 which claims to be embedded, but even these other embedded ones still have a server. It's just that the server is running in a separate thread rather than in a separate process. As far as I know, SQLite is unique in that calls to SQLite are just subroutine calls. There's no separate thread, it runs in the same address space as your application, and there are advantages and disadvantages to that. A big advantage is, one, it's very small and very lightweight. It just binds with your application. It's talking directly to the disk drive and it works around that problem I had which inspired this. When your application comes up, if the machine is healthy enough to read from the file system, you can access your data. You don't have to have a database infrastructure up and running too. There's no need for a DBMA. Another positive side before I go on is that there's very low latency. An SQL query is just a function call. You don't have to marshal parameters, send it to another process possibly on a different machine, then wait for the reply. And so we can easily do thousands of SQL queries per second, whereas on a client server system, you don't want to do that because of network latency. It'll really slow you down. The downside is that because we don't have a single server running and acting as a traffic cop, it limits our concurrency possibilities. The only way to manage concurrency is with file system locks, and if a process is holding a lock and it goes down, those locks automatically vanish and everybody else that's reading the database needs to be prepared for that. And so SQLite can do multiple simultaneous reads, but there can only be one writer at a time. Writers have to take turns. So there is the concurrency issue. Now, that's not as bad as a lot of people make it out to be. There's a lot of applications in the world that do not need high concurrency, and when you really get right down to it, none of your machines actually do concurrent writes. There's exactly one wire going between your CPU and your mass storage, so the data has to be serialized over that wire. So it's not as bad as you'd think it would be, but that's the usual complaint when people think, “Oh, we can't use SQLite.” The other thing is that we designed SQLite, and this is either an advantage or disadvantage depending on your point of view, that the entire database is a single file on disk, rather than being spread across a bunch of different files. And that's really cool for an application file format, because you can take your whole database and you can send it to somebody as an email attachment. The downside is that there are file size limits, there are limits on the maximum size of a file on a lot of operating systems, and the entire database has to fit on one volume. So really, really enormously massive databases do not work with SQLite because, in a client server arrangement, they would of course mount multiple volumes and spread the data across them. You can't do that with SQLite because it's all in one file.

RD Right, it seems like it's a very like single-user bare metal database and not a big network with a thousand concurrent users sort of thing.

RH Right. This is not the thing you want to run Facebook on. And it's never intended for that, but most applications don't need that, especially on your phone, on your desktop. It's the ideal database engine for your phone. How many concurrent users are there on your phone, really?

RD Just me, I hope.

RH It's the ideal database to store the bookmarks in your web browser. How many simultaneous users are using your web browser? It's ideal for this sort of thing. For most desktop applications, it's the ideal database because you only have one person using it at a time.

RD And do you think that that simplicity has made it so easy to adopt so widely and be adopted by every Android and iPhone device and every Mac, every Windows 10 machine, et cetera, et cetera?

RH I'm still amazed by that, that I wrote something that's on all those devices. I'm just waking up every morning, “How could this have possibly happened?”

RD It's a little unnerving, a little intimidating.

RH It is. It's scary sometimes. I was on public transit the other day and was looking around and everybody's working on their phones and I thought to myself, “All of these people are using my software and none of them know that I'm sitting in their midst.”

RD That's right. You're secretly famous.

RH It will really play with your head. It will really play with your head.

[music plays]

Ben Popper Develop skills to build accurate, explainable Gen AI apps with online courses at Neo4j Graph Academy. You'll learn to ground LLMs with knowledge graphs and how to develop a reliable chatbot. Start today at neo4j.com/LLMs. That’s neo4j.com/LLMs.

[music plays]

RD So you have a lot of things depending on this software. Do you ever get support calls from some OS team saying, “Hey, we need this feature immediately.”

RH That's how we pay for this project. This is a full time job for two of us and a part time for three others. And of course the software, we put it in the public domain. It's free for anybody to download and do whatever they want with. Now, we do hold a trademark on the name, but you could make a copy and rebrand it as ‘RyanDB’ and start selling it if you want to. There's no restrictions on its use. So how do we pay for this? There are a lot of big users of it and they want to protect their supply chain and have access to the maintainers and so they pay us an annual fee, which is a round off error for them, but it's a comfortable living for us and that's how we keep this project going. And so there's a lot of that going on behind the scenes. I often get the question, “Well SQLite has been around for a while. What do you guys do all day long?” What do we sit around and play solitaire? No. It's actually very busy, a very hectic project. And maybe I stress over it more than I should, but anytime there's the least little malfunction, I don't sleep until it's fixed.

RD I think that is a very compelling feature of the company. You want somebody who cares. And we've had so many discussions about the internet resting on this small piece of software with open source human maintainers not being paid. We talked about the XZ compression library. Somebody put a backdoor in it just because there was nobody else pushing commits.

RH It's a great story for selling my customers on renewing their contracts, the XZ thing. But the other thing to keep in mind is that we do make money with SQLite, but we do SQLite for the love of the art. There are people who go into computer science in order to make money. We make money so that we can program computers. Does that make sense?

RD Oh, yeah.

RH So we're doing what we love. Even though it's stressful at times, and I'm speaking for everybody on my team and maybe I shouldn't do this, but we really love what we do and we would do it even if we weren't paid. It's just that because we are paid, we're able to spend more time doing it. Does that make sense?

RD Yeah, absolutely

RH And so it's been such an enormous blessing that SQLite has been successful and we've been able to work on it fulltime and do what we love doing and not have to worry about keeping the lights on at the same time.

RD You're not beholden to investors. Nobody's pushing you to IPO.

RH No. Of course we get the spam emails daily from the venture capitalists trying to take over, but I just press the spam button and move on.

RD But that's nice to be able to have that kind of freedom, and I think there's a lot of folks writing software who want that kind of freedom and they don't get it in software development so they try to go to woodworking or something.

RH Oh, I would have done that long ago, absolutely. I crave that freedom. I have worked as an employee for other companies and I got along okay, but I have been freelance for so long now that I don't think I would be a very good employee anymore. I think that I would try and be good, I would try and be nice, but I think I'd probably get myself fired pretty quick. It's just hard. It's hard. I want to work for myself.

RD You're both the boss and I assume you're still writing code over there.

RH Absolutely, yes. It's funny, I was talking to a young computer science graduate who'd recently graduated and knew who I was and how I was kind of famous for SQLite and stuff, and I asked her what she did and she said, “Oh, I'm just a tester.” And I thought for a second, “That's what I do all day long– test.” I do write code, but most of my time is spent testing code that's already been written. That's what we do. That's how we spend all of our time, chasing bugs and writing tests.

RD A few years ago I talked to another person who had a solo project that caught wind and they've been writing code for 10 years on this project. Someone said there's three people who work on that project. There's him from the past, him from the present, and him from the future, and they don't always work well together.

RH Oh, that's brilliant, and it's so true as well. And I have a similar idea myself in that I insist on very detailed comments in the code because I know that I will look at my own code that I wrote two years ago and think, “What idiot wrote this? Why did he do that?” And, oh, I wrote it. Oh, sorry. It's very important to document why you did things, and I'm not talking about comments that say what the code does. I can read the code and tell what it does. I want to know why it does that, what are its constraints, and so forth. And so I'm kind of fussy about the commenting in the code and the people that work on it with me, they've kind of learned my ways and they go along. And that's very important for that very reason, because our goal, my goal, is to support this thing through the year 2050. We're not even halfway there yet. We've been going since 2000 and we'll reach the halfway point in a little over a year from now. So that's a long time for a piece of software.

RD That is, yeah.

RH And it really informs the decisions that I have to make. One thing that I've really learned is that one of the most difficult and important things for a project leader to do is to say no. People come to you, “Oh, wouldn't it be cool to add this feature? Wouldn't it be cool to do this?” And it's very important to say no, and I have not said no nearly as much as I should have. And I live with the albatrosses of me not saying no to this day, but you live and learn. You make mistakes and you move on.

RD So to get to 2050, what's the thing you're doing now to build that? What are you anticipating in the future?

RH I'm not trying to make SQLite the cutting edge go-to technology in the year 2050. I'm trying to make sure that it continues to function on all platforms for the year 2050.

RD So you're not porting it to quantum computers yet?

RH No, not yet. But we do try and make it run on all available platforms. Port early and often, that's important. We do very, very extensive testing. We put way more effort into testing than we ought really, but testing is very important. And in design decisions, I'm just very hesitant to take on new features and new code. People give us a hard time sometimes because they say that SQLite is not open contribution. We don't take drive-by pull requests. And there are several reasons for that. One of them of course is that it's public domain, and if somebody just throws some code over the wall, they own the copyright to that code, and if we incorporated it, then it's no longer public domain, and there's an elaborate legal process that we have to go through in order for them to do that release. But the other thing is that if you're going to contribute to SQLite, I expect you to be around for the next 25 years to take care of your contribution and bring it to maturity. And that is not commonly the case with a drive-by pull request. And people don't understand this. They come to you and say, “Well, I've got this cool feature. Why don't you just take it? I don't understand.” What they're really saying to me is, “Hey, I made this hack that's useful to me. Why don't you take it, test it, and maintain it for me for 25 years for free?”

RD They're adding to your workload.

RH You just added to my workload. You're not helping me.

RD And if they want to add that feature, they can fork it. They can have their own version.

RH Absolutely. And I listen to these things and sometimes people do have good ideas and they'll send us, “Hey, here's a prototype implementation.” And just to work around the copyright issues, we'll reimplement it from scratch. So we do listen to these things, but we don't often just accept the code wholesale.

RD So if you've got a contribution rejected from open source, don't have your feelings hurt. It’s just that you're not going to be able to be there to maintain it.

RH Right. Don't take it personally.

[music plays]

RD It's that time of the show where we shout out a question on the Stack Overflow community. We don't have any new Lifeboat Badges, so I'm going to shout out a very popular SQLite question. “How can I list the tables in an SQLite database that was opened with ATTACH?” And that was asked 15 years ago by IZB, and seen by 1.2 million people. So we have an answer. I don’t know if Dr. Hipp wants to correct it or verify it.

RH What is the answer? I'll tell you if I agree.

RD There are a few steps to see the tables in an SQLite database. List the tables in your database– .tables. List how the table looks– .schema tablename. Print the entire table– SELECT * FROM tablename;. And list all of the available SQLite prompt commands– .help.

RH Every one of those work great. I think that's an excellent answer. Stack Overflow comes to the rescue.

RD You heard it here first. I've been Ryan Donovan. I edit the blog here at Stack Overflow. You can find it at stackoverflow.blog. And if you want to reach out to me with podcast or blog article ideas, you can find me on X @RThorDonovan.

RH I'm Richard Hipp. I am the creator of SQLite, and you can find me on the SQLite Forum. That's sqlite.org/forum.

RD All right, everybody. We'll talk to you next time.

[outro music plays]