The Stack Overflow Podcast

How to keep the servers running when your Mastodon goes viral

Episode Summary

Kris Nóva, a Principal Engineer at GitHub who runs the tech-centric Mastodon instance Hachyderm, joins the home team to talk about the challenges of building, scaling, and moderating decentralized platforms.

Episode Notes

A Principal Engineer at GitHub, Kris is President of the Nivenly Foundation and an admin at Hachyderm, an instance of the decentralized social network powered by Mastodon.

The ongoing changes at Twitter have fueled interest in alternative platforms like Mastodon and Discord.

Read Leaving the Basement, Kris’s post about scaling and migrating Hachyderm out of her basement.

Watch Kris’s conversation with DigitalOcean Chief Product Officer Gabe Monroy about building decentralized IT platforms.

Find Kris on Twitter, GitHub, Twitch, or YouTube.

Congrats to Lifeboat badge winner metakeule for answering How can I get an error message in a string in Go?

Episode Transcription

[intro music plays]

Ben Popper Hello, everybody. Welcome back to the Stack Overflow Podcast, a place to talk all things software and technology. I'm your host, Ben Popper, Director of Content here at Stack Overflow, joined as I often am by my wonderful collaborator and co-host, Ceora Ford. Hi, Ceora. 

Ceora Ford Hello! 

BP So if people listen to the show I think they know one thing, which is that you are extremely online and enjoy time on Twitter, especially following all your K-pop accounts, but one of the interesting trends I think that's been emerging, especially in the tech and software space over the last few months, is folks moving to new platforms, and Mastodon is one of the most popular that I see coming up in my feed about where people are migrating to for various reasons. They want to be in a more federated space that they own; Mastodon has been around for a long time and has seen waves of popularity. The recent one seems to be maybe the most significant to date. Ceora, have you dabbled in this area at all or know folks who have made the transition? 

CF I haven't. I know a lot of people who have made the transition and a lot of people were saying they've moved to LinkedIn or their email newsletter or Mastodon. And the first time I was hearing about Mastodon a few months ago, I was like, “Eh, I don't know if it'll really stick,” but it seems like it actually has a substantial hold on the community so I am heavily considering trying to figure out all the Mastodon stuff and hopping on that train too.

BP Well, if that is the case I think we have a great guest today: Nova, who is the President of the Nivenly Foundation, the foundation that oversees Hachyderm, which is one of the bigger sort of Mastodon instances where a lot of tech people have gone. Nova also does stuff over at GitHub and is here today to tell us a story about helping to scale and support some of the influx of new folks who are using Mastodon. So Nova, welcome to the Stack Overflow Podcast. 

Kris Nova Awesome. Great to meet you, Benjamin, and it's really lovely to meet you, Ceora. 

CF Likewise. 

BP So say it for me one more time and I'll say it back to you. What is the name of the foundation? 

KN So it's a nonprofit foundation called the Nivenly Foundation that is the overarching governing body that is basically the legal entity that is the front for Hachyderm, the Mastodon server. 

BP So I think right within that we get this sense that this is much more of an open source project, that this is distributed in a way that is different from maybe a standard corporation that owns a social media site. For folks who aren't familiar, what is Mastodon and why do these separate entities exist in order to govern it as opposed to your traditional corporate organization? 

KN I mean, I think decentralized is the key word there. And Ceora, I would be interested in your experience on Twitter. I feel like the word ‘decentralized’ means a lot of different things to a lot of different people, and I definitely struggle saying the word myself and also just understanding what people mean when they use the word. I think in this context, what it really means to me anyway, is that we have a lot of small communities that can federate with each other or basically can engage with each other and they all speak the same protocol. So it's small groups of folks that exist on the same Mastodon instance, and if you're on the same instance as someone, you share a lot of the same home timeline and your home feed and your content. And so we're seeing a lot of folks group together based off of their interests or their core values or their beliefs or the way that they want their social media to be governed is important to some folks. Other folks have a strong interest in data privacy for example, and they definitely don't want their data to go anywhere that they don't know about so they're focusing on these strongly-typed private instances. And so anyway, there's all these instances in the world and there's hundreds of them out there. Some of them are big and have hundreds of thousands of people, and some of them have only one person on them, and all of them federate with each other. And so instead of there being a large centralized corporation like Twitter that tries to do all of the content moderation and set all of the privacy policies, you actually have an entire menu to pick from, and you can change too. So if you start on one server and you don't like it or you want to go try something else, you can move to another one. And now anyway, when I think of decentralized, that sort of model of a lot of different sized servers all sending messages back and forth really comes to mind.

CF The thing that I can most closely relate this to, to understand it, is probably Discord, even though they're not exact copies of each other. But Discord also has the separate servers where each server has their own rules and moderators and all that kind of stuff, so it's not necessarily like you're scrolling through a huge Discord chat with everybody getting the same things, but you can choose which server you want to be on based off of the standards the server implements and the subject or interests of the people in the server. 

BP Right. So Nova, you mentioned there being these servers in different parts of the world or different communities with different standards. So who is responsible for spinning up a server and maintaining it if the community wants new features, or lots of people are joining and suddenly you have a lot more data or activity to deal with? Again, within a corporation I have a fairly clear idea of how that gets done. How does that happen in a more federated, distributed system like a Mastodon?

KN This is a great question, and this is exactly why we started this conversation off with that there's a nonprofit that you need to know about in order for us to begin to have the other discussion about Hachyderm, which is just the social media instance. And so I think I would say that it's very similar, like you said, Ceora, to Discord, where you can have a lot of instances. And I also think it shares some similarities with IRC or even email to some extent. You can have your email client, you can email anyone in the world, and for all intents and purposes that's exactly how Mastodon works. If there's another server online, you theoretically could go and find them and send them a message and it's public and it looks and feels an awful lot like a tweet. So to answer your question– who is responsible for maintaining this– it very much reminds me of early 1990s IRC. It's kind of just like servers show up, they're not necessarily coupled to a corporation, maybe it's just some person who set one up in their basement. And then I think what I'm noticing is as servers grow, they all start to solve these really interesting problems and there's three or four main problems every server kind of has to start to take seriously as they start to get a decent size and I think you see folks respond to it very differently. Some folks ask for donations and it just becomes that there's a single person who has the whole keys to the kingdom and they just do a good job and they run the whole thing off donations. I think we're taking a more structured approach and trying to really get a good, what I call a medium-sized community. So we'll have a small team of folks and we have some ideas of how we want to try to lift marginalized people up in the industry based off of our small teams that we're trying to fund with donations for operating our infrastructure. It's a good opportunity for folks like myself who is a principal engineer at GitHub to partner and pair with folks who have maybe never operated production infrastructure before. So we're really trying to take our opportunity of doing open source operations seriously, and in my opinion anyway, you don't see that with Discord. There isn't a server that you need to go keep online and roll out kernel updates to and actually go and manage security keys to the thing. And we have public graphs online and there's a lot that goes into operating it, to answer your question, Benjamin. 

BP Yeah. I think one of the things that holds a lot of people back from joining a service like Mastodon versus a more traditional corporate-owned social network is that they feel there's a higher bar to entry, more complexity in terms of what the user needs to manage, and then obviously, let's say you wanted to start a community or wanted to help grow it, there becomes quite a bit of backend complexity. Can you talk to us a little bit about this great blog post you wrote? It's called Leaving the Basement, and this deals with some of the challenges you went through and how you solved them moving from a small to medium size instance, and as you mentioned, doing this in a very public way, trying to partner with folks in an open source way, and having to solve problems not in sort of a profit-driven way but in a community-driven way. Where can we find the resources and do that together? 

KN Yeah, so I think the important takeaway here to kind of start the conversation off is that we definitely did not get started with any intention of this thing blowing up. I think honestly, I felt very similar to what you just said, Ceora, which was, “Oh yeah, I've heard about this Mastodon thing. I'm kind of considering checking it out. I don't really know what it's going to look like or feel like,” and I certainly had a large amount of doubt and some reservations about the thing. It just felt weird and the first time I looked at it I was like, “Wow, this UI is from like 1995.” And it started to kind of trigger some of those nostalgia feelings of my early internet childhood. It felt kind of good but it was also a little unsettling at times. And so it started off with my Twitch stream. So I have a small Twitch stream that I use to kind of escape from work, where I can go and have my community and we do nerdy computer science things. And we wanted to set up a little Discord for ourselves and that little Discord turned into a little Mastodon server. And I think for the first eight or nine months there was less than a hundred people on our server. It was close friends and family and we were sharing cat pics with each other and that was about the extent of it. And then everything started to happen at Twitter and it turns out one of my friends turned into two of my friends turned into 5, 20, 100. And I have some medium-sized names in tech that I've had the privilege of investing into a relationship with and they joined and they were public about it. And then next thing you know, we have 20,000 people joining this thing and it's running on this little experimental server in my basement and we had a real production crisis. We had a real infrastructure need and I had to kind of be like, “Well, good thing I know how to solve this problem because I do this for my day job, but this is no longer a hobby project.” We're doing hundreds of requests per second and we have people's data here and their passwords and we have to start taking this seriously. And this journey has been really wild to get to where we are today.

CF Yeah, I can imagine. It sounds like it. Actually, I can't imagine being in that position of having to put on your infrastructure hat all of a sudden when you were just chilling on your Mastodon server and everything blows up all of a sudden. You had that level of expertise of knowing how to deal with these kind of infrastructure issues, and I'm wondering, for someone who doesn't have that kind of knowledge and background, how could they navigate an issue like this? 

KN That's a very good question. So Gabe, who was unable to join today, has a really good answer to this. So DigitalOcean has learned from our experience. If you go read the blog you can see that my relationship with Gabe is what really helped to get Hachyderm specifically to a point of where it is today. He was a user of the service and offered some of DigitalOcean’s services. And I think that in the past six months alone we've seen cloud providers take an interest in Mastodon in giving folks the button click experience, which is one way to go about doing it. I think we're seeing the open source community do what the open source community always does in situations like this. Blogs are coming out, people are forming Discord channels that are all about just operating Mastodon. Small micro communities are forming, and we're seeing even in those communities, sub-niches form of, “Let's go operate Mastodon on Arch Linux,” and, “Let's go operate Mastodon on Kubernetes,” and, “Let's talk about the trade-offs therein.” So I think that this is one of the reasons why I love open source, because you realistically can have an opportunity to go and work side by side with folks who have done this before and you get a chance to kind of learn via osmosis from the different infrastructure patterns and different things and tools that folks are using. So I think there's a lot of options available from cloud providers to getting yourself involved with a community that can help you learn more and help walk you through the process.

[music plays]

BP Alright, everybody. Today’s episode has a very special sponsor– yours truly, Stack Overflow. Now, we all know the frustration of having to search for answers on internal wikis that have gone stale, or trying to find that one email or chat thread from months ago with information you need to get unblocked now. There is a better way. Stack Overflow for Teams is a knowledge base that has all the features you already know from StackOverflow.com but reimagined for your organization so you and your teammates can collaborate, quickly find solutions, and be more productive. It’s like a private Stack Overflow for your organization’s internal knowledge and documentation, and it’s used by companies like Microsoft and Bloomberg and Dropbox and many, many more. You can always try it out; we have a freemium version, the first 50 seats are free. If you’re interested in that, head on over to s.tk/teams-podcast. Let them know the show sent you. And if you’re listening to this and thinking, “50 seats won’t cut it. I’ve got more customers than that,” I’ve got some good news for you. Head on over to stackoverflow.co/teams and if you use the promo code ‘teamswin’ you can sign up for a basic or business account and get 30% off. New customers can get 30% off, that is pretty sweet. Alright, everybody. Please enjoy the discount. Enough spiel, let’s get on with the show.

[music plays]

BP I want to touch on one particular part of this. When this is made into a movie for television, this is the most dramatic part. You connected, as you mentioned, with Gabe over at DigitalOcean, and were reaching a point where you were really worried about your sort of initial home infrastructure failing, so you needed a way to get a terabyte of data to DigitalOcean off of disks that were already failing, and you came up with –and I suppose this is apropos of the Mastodon sort of mission statement– a way to do that with the help of your users and their use of the site. Can you walk folks through this sort of clever solution here? 

KN Yeah, so we had roughly a terabyte of data, right back there, and it very much started out as a few cat pictures and then a few cat pictures turned into more cat pictures, and then it turned into videos, and then it turned into more videos. And then Gabe joins and Gabe has a farm and so, I guess the story goes, he was trying to upload a rooster video and he got a 500 and that was kind of how this whole thing started. And if you read the blog post, it goes into all the technical detail of where we started to see IOPS start to lag on disk and the cascading failure it caused at the edge, and how all of our systems were loosely choreographed and interdependent with each other, and that's just the Mastodon architecture in general. And so what the problem was was we had the terabyte of data back here. We had all of these different services running on different computers, and we were moving them around and trying to massage data in different ways to just get a decent overview. At this point, all of the normal performance engineering big observability tasks were kind of off the table because it was so new and all of that requires a lot of time to set up so we were just trying to rough something in. And so what we ended up discovering was that it was actually serving the data that was causing the problem. So when somebody would request one of these cat videos on our social media server, it would have to go and fetch it from these discs behind me and it would have to go and propagate out at the edge and it would ideally sit in a cache out at the edge where then if folks referenced it again in the near future it would just serve from the edge and not from here at the core infrastructure. And so what we did is we used this Nginx primitive called ‘try_files’ and we put the try_files behind a reverse proxy. And what we first tried to do is we would try to receive the files from DigitalOcean, and if those files were not there, we would then go and we would get them from back here and we would serve them up at the edge layer. We then had configured Mastodon to write to DigitalOcean using some DNS tricks where we used one DNS name to read and another one to write, and then that DNS name would resolve to the reverse proxy and route accordingly. And so basically every time somebody loaded a cat video on their phone, it would download the video from the server and put it out on the edge somewhere and serve the traffic from there for up to 24 hours. And then every time somebody would try to reference it again, it would go and propagate back to DigitalOcean from the edge. And then the next time somebody tried to view the cat picture it would actually receive from DigitalOcean instead of from our server here. So the more people used our service, the faster we got our data off of the racks and into DigitalOcean and the more relevant the data was. So one of the problems we had with moving that much data was we could just start at the top and work our way down, but who knows if we're going to spend the first three days transferring all of this data that nobody ever looks at. So we were very much transferring the hot data that people were actively looking at. So as topics are trending and people are looking at different things, that's the data that was getting moved to DigitalOcean first. I just want to call out, there's a guy from Germany who's a volunteer in our community. His name is Malte. He was the one who had put together the original plan for this so I definitely don't want to take credit for his work. 
It is a very fascinating Nginx trick that I had never seen before and I learned a lot just from seeing how he laid this out and how he had set everything up. 
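
For readers who want to see the shape of that fallback, here is a minimal, hypothetical sketch of the read path Kris describes, written as a small Go handler purely for illustration. The real setup used Nginx's try_files directive behind a reverse proxy with its own caching; every hostname, bucket name, and path below is a placeholder, and the write-back of media into DigitalOcean Spaces is not shown.

```go
// Hypothetical sketch of the read-side fallback described above, in Go only to
// illustrate the flow; the actual Hachyderm setup used Nginx's try_files behind
// a reverse proxy. Hostnames, bucket names, and paths are placeholders.
package main

import (
	"io"
	"log"
	"net/http"
	"path/filepath"
)

const (
	spacesBase = "https://example-bucket.ams3.digitaloceanspaces.com" // object storage (placeholder)
	localRoot  = "/var/lib/mastodon/public"                           // media still on the original disks
)

func mediaHandler(w http.ResponseWriter, r *http.Request) {
	// First, ask object storage for the file.
	resp, err := http.Get(spacesBase + r.URL.Path)
	if err == nil && resp.StatusCode == http.StatusOK {
		defer resp.Body.Close()
		// Let the edge cache hold the object for up to a day, as in the transcript.
		w.Header().Set("Cache-Control", "public, max-age=86400")
		io.Copy(w, resp.Body)
		return
	}
	if resp != nil {
		resp.Body.Close()
	}
	// Not in object storage yet: fall back to the copy on the local disks.
	http.ServeFile(w, r, filepath.Join(localRoot, filepath.Clean(r.URL.Path)))
}

func main() {
	http.HandleFunc("/system/", mediaHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

The design point is the one Kris makes above: because user requests drive the migration, the hottest media gets moved and cached first, while data nobody is looking at can trail behind.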

CF Yeah, I have some experience with cloud infrastructure and cloud engineering and all that kind of stuff. I would've never ever imagined this was the way to solve this problem, especially with the DNS trick. That is so clever, I would've never thought of that so that's really cool. How long did it take to come up with that solution? 

KN So I think he had gotten some help from the Mastodon blog that talks about the try_files directive in general, and we were already kind of experimenting over the course of several days with how we wanted to do this and we were trying to Rclone data up in the background and serve traffic over here and configure these caches different ways. So it was an ongoing effort, and I think at this point in time everybody who was volunteering, it was me and six other folks, we were working around the clock quite literally. I would go to sleep and right before bed I would hand off my work to volunteers in the UK and Germany and they would work until I woke up, and then I would work until they woke up. And this was just an ongoing process just keeping the servers online while also trying to come up with a plan of what we want to do. I think this is the plan that we're talking about with the reverse proxy at the edge that went right back to DigitalOcean. This was probably a consequence of all of our work and infrastructure we had set up, and if we didn't have servers online in the right place which took several days to get online, I don't think any of this would've been possible. So I think in hindsight, it took us about 30 days when everything was said and done with to go from, “We have a small server that's running on hardware no bigger than a Raspberry Pi,” to, “We have a data center in Germany and we've reached out to an attorney.” In 30 days, that was around the clock with six to ten people working 24/7.

BP Amazing. I loved what you said about the more people used it, the more you were able to complete this migration, as well as understand how to do it efficiently, the data that needed to be moved first. It almost feels like the power of a peer-to-peer file sharing system or something like that. The more people are online with a certain file to share, the faster things can go, that's what it reminds me of. So where are you hoping to go from here? Are you going to continue I guess sort of migrating away from home systems towards more professional data centers? And are you considering what the goals are for the size and scope of the community in the coming year? 

KN I think at this point if I was to give a quick state of Hachyderm, I would say that our infrastructure problem is, I wouldn't say solved, but it's stable. And so we have a team of volunteers who are working on our infrastructure and they're rolling out changes constantly, but it's no longer disruptive. People can casually work a few hours here, a few hours there. It's not a big event like it was. There is a much more important discussion that I certainly was not prepared for that I'm kind of considering the day two, the second day Mastodon discussion, which is, “We have a social media service and people use it and it's online and we have a tremendous amount of content moderation that we have to do,” and that is a very difficult problem to solve. And I'm very proud of our team of mods who have taken this very seriously, and we have yet another completely independent group of volunteers that is about four times the size of our infrastructure group that their entire job is just helping us with content moderation and dealing with, how do we be respectful of different communities? How do we manage these reports that come in? How do we communicate? What actions do we take? How do we take actions? And what judgment calls do we make? And it's pretty difficult there and so I would say that we definitely have an interesting set of technical challenges just keeping the servers online, but now we have all the major problems of a major social media service and those legal problems in there as well. 

CF That was the thing I was thinking about the most with Mastodon, because each instance is governed by a separate group of people. I was wondering how moderation and conflict resolution and rules and regulations will be put in place. So yeah, I'm not surprised that that's a pretty tricky issue to deal with. 

KN Yeah.

BP I think there's court cases that are happening right now that have a lot of implications for how social media sites need to govern themselves or what liability sites with user-generated content assume. And you make a great point, which is that the most difficult part of any of these is not necessarily the engineering –although that part is difficult– but ultimately the human, political, cultural, and sort of free speech questions that arise. We'll have to check back in a year and see how it's going. Do you think that that's an area where you are going to lean on some sort of best practices from existing social networks or maybe the old IRC days? Or do you think that there's new paradigms, new ideas that need to be tried out here? 

KN So what we're doing is, we've established a nonprofit called the Nivenly Foundation that exists above Hachyderm, and Hachyderm is one of the foundation's projects, and that foundation is set on a precedent of my personal experience of managing and maintaining open source software over the past decade. So working on various large open source projects, my work at GitHub, and just being involved either as a contributor or a maintainer of different open source efforts around the world. So we've learned a lot as far as the importance of a code of conduct and the rules of engagement and, more importantly, how to respond to it. What are the rules of how does one go and first make the decision on if a code of conduct violation has happened, but then what do you do about it afterwards? And do you forgive people? Do you let them back in? People make mistakes, and there's a lot to unpack here. So we brought in a fair amount of history from just open source projects in general. I think we're also bringing in a fair amount of history of online community management. So Quintessence, our Executive Director, she has a tremendous amount of experience with the developer advocacy community. There's a developer advocacy Slack and she's a moderator there as well, and similar problems have come up where you have to deal with people and it's very judgment based and it's very perceptive, and what did they mean versus what did they say? And so we're taking some lessons there and we're putting together a nonprofit-wide, what we call the Nivenly Covenant, which is basically our global code of conduct for all of our projects, and Hachyderm is just one of the projects that, I wouldn't say enforces, but follows the code of conduct and contributes back to it.

[music plays]

BP All right, everybody. Thank you so much for listening. As we do this time of the show, we want to shout out someone who came on Stack Overflow and spread a little knowledge, helped to save a question from the dustbin of history. A Lifeboat Badge was awarded to metakeule for answering, “How to get an error message in a string in Go Lang.” If you've been curious, well, we have an answer for you, and it's helped over 167,000 people over the years, so a question that a lot of folks had themselves. I am Ben Popper. I'm the Director of Content here at Stack Overflow. You can always find me on Twitter @BenPopper. You can always email us with questions or suggestions, podcast@stackoverflow.com. And if you enjoy the program, leave us a rating and a review. It really helps.
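
For the curious, the gist of that answer is that Go error values expose their message through the Error() method (or via fmt's formatting verbs). A tiny hypothetical example, not the accepted answer verbatim:

```go
// Minimal illustration of getting an error message as a string in Go.
package main

import (
	"errors"
	"fmt"
)

func main() {
	err := errors.New("something went wrong")

	msg := err.Error()                // the error's message as a plain string
	alsoMsg := fmt.Sprintf("%v", err) // equivalent, via fmt's verbs

	fmt.Println(msg, alsoMsg)
}
```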

CF And my name is Ceora Ford. I am a Developer Advocate at Auth0 by Okta. You can find me on Twitter. My username there is @Ceeoreo_ and my blog is ceora.dev. 

KN Thanks for having me. It was great to meet you both. My name is Kris Nova. I am the President of the Nivenly Foundation, the infrastructure admin for the Hachyderm social media site, and a principal engineer at GitHub. If you want to reach out to me, you should follow me on Mastodon and shoot me a message there. If not, you can check me out on Twitch at twitch.tv/krisnova, and you can also find out more about what we're doing at nivenly.org.

BP All right, everybody. Thanks for listening and we'll talk to you soon.

[outro music plays]