The Stack Overflow Podcast

A developer works to balance the data center boom with his climate change battle

Episode Summary

On today’s episode we chat with David Mytton, CEO of Arcjet and co-founder of Console.dev. We discuss his early work in cloud monitoring, his passion for the environment, and his love for sharing great developer tools.

Episode Notes

You can find David on LinkedIn.

You can learn more about Arcjet here.

You can subscribe to the console.dev newsletter and podcast here.

Congrats to Stack Overflow user Greg Hewgill who earned a Populist badge for his answer to the question:

What’s a good tool to determine the lowest version of Python required?

Greg is getting close to the magic one million rep mark!

Episode Transcription

[intro music plays]

Ben Popper Hello, everybody. Welcome back to the Stack Overflow Podcast, a place to talk all things software and technology. I'm your host, Ben Popper, Director of Content here at Stack Overflow. Today, my guest is David Mytton. He is the founder and CEO of Arcjet, the co-founder of Console, founded some companies in the past that were acquired that we can chit-chat about, and is doing some really interesting research around sustainable computing. He’s also the host of the Console Dev Tools Podcast– be sure to check that out– and an active investor in some early stage startups with developers as users. So without further ado, David, welcome to the Stack Overflow Podcast.

David Mytton Thanks for having me.

BP Tell us a little bit about how you got into the world of software and technology. And do you remember the first time you had to hit up Stack Overflow and what you were stuck on?

DM I've been using Stack Overflow since the beta– I've got the Beta Badge. I was just looking before this call just to see how far back it went.

BP Nice, an OG user. That's great, glad to hear it.

DM Yeah. UserID 2100 and something, which is pretty cool to see. But that was a long time ago. I got into software in school and built a few things just as side projects and then picked up a full-time job, built a few things through university, and that's what led into building my first company, Server Density, which was a cloud monitoring tool, building a monitoring agent in Python. I built that up and then sold it in 2018. I took a bit of a break, and then as part of that break is when I got into sustainable computing and researching the environmental impacts of data centers, the cloud, and now AI.

BP I've been on this podcast, I think we're on episode 700 and something, and many, many of the calls have been about how everybody has got a lot of work going on in the cloud, everybody's got a lot of different microservices, and observability and the need to control costs is essential. So what was your play in there and why did you choose Python?

DM Cloud monitoring was very new at the time. So this is 2009, so you had a few open source options and that was it. This is pre-Datadog, pre-New Relic, and there just wasn't anything. And actually one of my first questions on Stack Overflow was how do I set up a Python process as a daemon? How do I get it running all the time and querying the server? And other languages you might choose today like Go or Rust didn't exist and so I wanted to pick what was as close as possible to what I thought was a systems programming language, not PHP, not ASP, it was going to be Python. And so that was what I decided to build the monitoring agent in.

BP Back in 2009, even AWS wasn't that old or as ubiquitous and universal as it is now. Were the problems the same or were the problems different that you were encountering with customers?

DM At that time it was very much about what are the servers doing? EC2 is the equivalent of what we had back then. EC2 was around, but it was very, very early. You had S3 and EC2 and that was kind of it. And most of the time people were deploying on real servers, maybe VPSs in some cases, but probably real servers.

BP Real servers, get out of here. Get out of here with this hardware.

DM I know, right? Actual metal. And so it was a question of what's the CPU usage? When is the disk space going to run out? And so these were the questions. Fast forward to today, you can group that under observability as a category, which is about traces, what's your code doing or your logs, what's happening behind the scenes, profiling as code is running, and the server monitoring bit is either nonexistent because you're on a serverless platform or maybe you just delegated it all to Kubernetes and it deals with it for you. The need isn't as great.
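The "when is the disk space going to run out?" question above can be sketched in a few lines of Python. This is an illustrative sketch only, not the Server Density agent: the names `DiskSample`, `sample_disk`, and `seconds_until_full` are hypothetical, and the forecast assumes usage keeps growing at the average rate seen between the first and last sample.

```python
import shutil
from dataclasses import dataclass


@dataclass
class DiskSample:
    timestamp: float  # seconds since some reference point
    used_bytes: int   # bytes in use at that moment


def sample_disk(path: str, now: float) -> DiskSample:
    """Take one reading of how much space is used on the disk holding `path`."""
    usage = shutil.disk_usage(path)
    return DiskSample(timestamp=now, used_bytes=usage.used)


def seconds_until_full(samples, total_bytes):
    """Extrapolate linearly from the first and last sample.

    Returns None when there is not enough data or usage is not growing.
    """
    if len(samples) < 2:
        return None
    first, last = samples[0], samples[-1]
    elapsed = last.timestamp - first.timestamp
    growth = last.used_bytes - first.used_bytes
    if elapsed <= 0 or growth <= 0:
        return None
    rate = growth / elapsed  # bytes per second
    remaining = max(total_bytes - last.used_bytes, 0)
    return remaining / rate
```

An agent built along these lines would call `sample_disk` on a timer, keep a rolling window of samples, and alert when the forecast drops below some chosen lead time. A straight-line forecast is a deliberate simplification; bursty workloads would need something more robust.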

BP All right, so the company was acquired. Did that put you in a position where you could then say, “Hey, I can decide to retire. I can decide to focus on my passion, or I feel compelled to work on X, Y, and Z.” What did the acquisition mean for you?

DM So that prompted the question of what are the important problems that I should try and spend my time on? And I thought climate change and environmental impacts of society broadly was an important topic, but I didn't know anything about it. I'd done a law degree. I taught myself programming, I spent almost 10 years in tech and software, and I wanted to bootstrap my knowledge in environmental science. So I did a master's degree here in London at Imperial College in a course called Environmental Technology, which is the application of technology to environmental problems. We covered everything from fisheries management through to disease vectors and mosquitoes, but the area that I found most interesting was energy– generation of energy and how energy systems work. And as part of that, I thought it would be interesting to combine my knowledge and understanding of computing with the understanding of energy science, because most of the research at that time was coming from the opposite side. It was scientists coming into computing with less understanding of how the cloud works and how software is built, and I thought I could flip that and bring the software side of things and the software understanding into the environment. So I published a few papers and now I'm doing a part-time PhD at Oxford in environmental engineering, which is focused specifically on the environmental impact of computing and data centers.

BP So I had a discussion recently with a friend who's in the climate space, and he was saying that there is some established kind of framework in terms of how people think about distributed computing and how you might lower the environmental impact by drawing on compute from places where there's excess energy at a certain time and moving to draw compute from another place overnight. And you're always taking advantage, let's say, of renewable energy when it's in excess and that might be a good thing. But as you were pointing out, your evolution of thinking about this has gone from cloud and other things now to AI, and so what he was getting to was the paradox of many, many large tech companies that have commitments to being carbon neutral or negative by a certain year, and many which even these days make claims about how they mitigate their environmental impact but which are also racing to build the world's largest and most energy-hungry data centers in order to stay at the frontier of AI development. And so then the question becomes, is there some way to create a virtuous partnership between environmental work and this AI work?

DM So my goal with the research is to allow increasing usage of technology whilst reducing the environmental impact.

BP Right. Win-win. We're looking for the win-win.

DM Exactly. So that's what people want. And I think behavior change is impossible and so you have to have systems change so that regardless of the behavior of the individual, they can benefit from broad improvements across the system, because I just don't think people change their behavior. And so that means we need to reduce the environmental impact of computing infrastructure so that people can use it more or continue to use it in the same way. The idea of environmental impact is pretty broad, but most people when they're talking about it generally mean carbon emissions, which usually means energy consumption, but it's much broader than that. It's water consumption, it's materials, recycling, all those kinds of things go into it. And actually the largest impact that an individual has is when they buy a new device, a new phone or a new laptop. The majority of their environmental impact is in the manufacturing of that device, not in the usage of it. The opposite is true of servers where the majority of the environmental impact is in the usage just because they have a long shelf life and they're using a lot of energy. And so we need to think about this as a systems problem of what is powering the data centers where the servers are located, or what's powering the manufacturing of the facilities where your laptops and phones are produced. And so this is what the big tech companies are thinking about when they are pushing for firstly, 100% renewable energy, and then getting towards this idea of 24/7 renewable energy, and ultimately net zero, and those are kind of different goals.

BP So here's the rub that we were discussing, and I'd love to get your take on it. So we're reaching a point already in the United States, and maybe you can tell me about the situation in Europe, where many renewable energy projects that would be profitable and have the land to be deployed in solar and wind are on hold because the grid cannot deal with the energy that they would be passing in and our ability to store and deploy it usefully has sort of capped out. And so this confronts us with a difficult problem because even if the tech companies came and said, “We have the money to redo this,” you run into federal and state and city-level bureaucracy that means it's not just a shovel-ready project. It's not like if somebody showed up with the money, all of a sudden you could get it done quickly. So the suggestion was, how do we, like you said, make this a win-win. The idea being that if tech companies went out and built huge solar and wind farms that were only for powering their data centers, that in some way might feel like a violation of the social contract– that they aren't doing anything really to help people outside of their organization. What if they paid it forward by paying to unlock the renewable projects that are currently on hold because there's no place to put the energy, and for a certain number of years, that energy goes to the data centers, which they want now, and over the next period of years, hopefully with help from those companies, we do the upgrades to the grid and so then after a certain number of years, that energy can now flow back and be a public good again. So that was the pitch. I'd be curious to get your take.

DM This gets to the heart of why I ultimately started Arcjet as a security product for developers rather than going into trying to solve environmental challenges as a full-time job. We already have the technological solutions to climate change, it's the political system that is slowing things down. And I broadly mean planning and how the grid is designed as well as politics specifically, but the involvement of people in blocking planning decisions, objecting to new construction, and the fact that we've had an amazingly reliable and safe source of generating zero carbon energy for many decades, and that is nuclear energy. The challenge with wind and solar is that it's intermittent, and the key thing that data centers need is reliability. That's why they don't participate generally in demand response programs. Often you see as a solution in the academic literature that maybe data centers could turn off or they could pay to put electricity back into the grid when the grid needs more, and that just doesn't align with the incentives of data centers. They need to have reliable energy because the whole point of them is to make sure that the services are up and running. And they put significant effort into building batteries and diesel and all that stuff.

BP One nitpick here maybe just to pick at is there was one story about this, I don't know how fallacious this was or if this could be repeated, but about a big tech company buying up space to fire back up an abandoned nuclear power plant. You are of the opinion that nuclear actually can be a very climate friendly option. Nuclear has a lot of baggage that comes with it and people think of toxic waste and yada, yada, yada. But from a climate perspective, which maybe is our most pressing existential problem, a lot of smart people say that nuclear power is actually fundamentally the best solution. So what if the negotiation between capitalism and climate change mitigation was, if a large company wants to come along and put the money into firing this plant back up and indemnify the local community from any damage, they can do so and therefore we get a lot more nuclear energy and that's a good thing.

DM The challenge there is a question around, well, why has it been shut down, because it may be obsolete technology. And the challenge with nuclear is just that the regulations have made it so expensive to build that the timelines are way outside of commercial abilities to fund those projects. So that's why government has to come in, they often have cost overruns because they have to keep going back, and the idea of modular nuclear energy is based around the idea that you'd have one design and you'd build it loads of times because then you’d get the learning curve effects, you’d get the cost reduction effects, but that just never happened. You've only ever had one, maybe two of the same designs built in any one instance, and that just completely destroys the economic model.

BP Gotcha. Let me ask you a slightly different question, which is, have you gotten deep into the details of these very expensive, very energy-hungry frontier training runs, and can you help me understand exactly what goes into that? I understand it at a high level, but maybe you can help me pick apart the details. You are trying to push out your next frontier model, whatever your company is, three, four, five, and right now we're in a phase where every three to six months the company that gets out their new frontier model now scores the best on the benchmarks and maybe there's a slight turn of the wheel or whatever. We know they have these huge GPU clusters and that's what's so energy intensive, and there's this famous quote of an engineer saying that we couldn't do it with just one data center in one state because we would have brought the grid down. Google talked about how one of the things that they felt was a breakthrough for I think it was the second wave of Gemini was that they managed to do this in a distributed way with different data centers around different areas. What is it that makes it so that that training run is such a singular event where it feels like it has to be run once. It requires an intense amount of coordination, intense amount of hardware, an intense amount of power, an intense amount of money, and then if something goes wrong, one GPU cluster overheats or a cosmic ray flips a bit or the power goes out, you've blown the training run, you've wasted the money, you’ve got to start all over. Why is it such a fragile process and how do they even know when something's gone wrong? When they get to the end, do they check the model or does the process stop in the middle? Do you understand how this works?

DM The specifics I don't know, but I could guess that in reality, they're kind of breaking it up into different chunks, and it's not like you just have to run the script and hope that it completes after three months or whatever. I would expect that they're able to break it up into chunks, run training in parallel, and then combine them at the end. Otherwise, if that's not the case, then like you said, that's very inefficient. The whole process and all the discussion around AI energy really reminds me of 10-15 years ago when people were talking about the growth in data centers. There were quite a few academic articles predicting that the energy consumption of data centers globally was going to grow exponentially and hit 20-30% of all global electricity demand by 2015 and then 2020 and then 2025. And the common theme with all of these research papers is that it's always in the future, and then when you come to that date, it's never actually anywhere near the number that they've predicted. So that's not to say that AI energy consumption is not an important challenge, but it seems so similar to me to that past near-panic that I'm skeptical of this ultimate end goal of having such a massive environmental impact. Because ultimately the incentive for all of the big tech companies is economic; it's not to have this ever-growing cost of energy, because they're providing these services for free in many cases. It's to reduce the cost and to provide it to more users, and unless they're able to do that, then it's not going to be a viable technology.

BP All right, so you said one of the reasons that you didn't pursue this was because the politics seem kind of untenable, but you did say that you're looking for ideas to create a win-win between capitalism and climate change mitigation. What are you most interested in now as a potential? What are you gravitating towards or what ideas excite you or seem to hold some possibility? Let's cover off that and then we'll move to Arcjet.

DM So I think the trajectory on transition to clean energy– not just renewable, but clean energy– is good, and we have a good path ahead of us because it's being pushed aggressively by the tech companies. You're seeing them compete directly on the metrics and on their reporting. And we've got this unusual AI kind of wild card that's appeared and so it's going to be interesting, as you said, to see how they mitigate that. And so the market is in effect there and it's working and I think that can continue. What gets very little discussion is the environmental impacts of everything else– so water and materials, and there are very few companies talking about this. Water in particular has started to come up and some mainstream journalism has been covering it because it was two years ago that we had the heat wave in Western Europe and all the nuclear power stations in France closed down because the rivers almost ran dry and there was a big outage in a cloud region in London because of the heat wave affecting the energy grid here and the water access for that data center came up. And towns and areas that are in water-stressed areas are starting to protest against the development of new data centers, so that's starting to come up in the mainstream, but it's still underreported. And the environmental impact of the materials– just where are the materials mined, what is the manufacturing process, that has very little attention. And so that's where I would want to focus my time because I think it's not that the energy question and the renewables question is solved, but I think it's on the right path and enough people are working on it. I just don't think that's the case with everything else.

BP All right, so this is something that you work on. This is your 20% project, obviously you're passionate about it. You're pursuing a PhD, you're writing articles and stuff like that, but you wanted to found another company, I guess you have the entrepreneurial bug, and that is Arcjet. So tell me a little bit about what that is.

DM So that came out of writing the console.dev newsletter every week, where since I sold my company and finished the master's degree, I've just been playing with dev tools all day every day and writing short reviews in the newsletter. And just for fun, I was getting into security and doing hacking challenges and understanding how vulnerabilities come about and thought that there just aren't any good security tools for developers once you go into production. There's a lot of tooling when you're writing code to help you spot security errors or deal with vulnerabilities in your dependencies, but when you push your code into production, there are network-level products which provide a generic firewall essentially, for things like volumetric DDoS attacks, but it's not in the workflow that developers are used to. It's not in their code, you can't inspect the decisions, and they just don't work in modern platforms with modern frameworks that you might be developing for. And so I thought there was an opportunity to take a maturing technology– WebAssembly– and build that into an SDK and help developers craft the rules in their code. You can test it locally so it's the same thing that is executed when you deploy it to production, and just really improve this area of application security for things like bot detection and rate limiting, attack detection, email validation, those kinds of problems that have a security solution, but developers are thinking about them as problems.

BP So this is kind of the shift left mentality of rather than building it and then passing it to the security team or building it and then trying to go back and figure out security, you're able to present them with solutions depending on the tech stack they're using, and they can say, “All right, I'd like to add rate limiting, bot protection, email validation,” and then it just sort of suggests how they do that within the code or generates code for them?

DM You define the rules in your code and it runs kind of like middleware or before your main code runs so that we can take a decision with the full context of the request. For example, if you have a free user, you could give them a different quota or apply different rules compared to your biggest enterprise user. And then when we return a decision, you can build that logic into your code, so rather than showing a generic block message, you can return the correct JSON response for the schema of your API, or you could flag a user for review, or ask them to re-auth. It's about building it into the code so the developer can build it as just another part of their application and it's not a separate team. I think the idea of shift left has worked in DevOps and we've then created the discipline of SRE (site reliability engineering) to provide specialist tooling and support to the teams as systems scale. And that's worked really well, but it hasn't worked in security. You've got an entirely separate security team that has very different incentives to the development team. So security is about breaking things and mitigating risk and developers are about building things and solving customer problems.
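The pattern described here (rules defined in code, a different quota per user tier, and a decision the application turns into its own response) can be sketched generically. This is not the Arcjet SDK or its API. Every name below (`QUOTAS`, `Decision`, `FixedWindowLimiter`, `handle_request`) is hypothetical, the quotas are made-up numbers, and a simple fixed-window counter stands in for whatever algorithm a real product uses.

```python
from dataclasses import dataclass, field

# Hypothetical per-plan quotas: requests allowed per window.
QUOTAS = {"free": 3, "enterprise": 100}
WINDOW_SECONDS = 60


@dataclass
class Decision:
    allowed: bool
    reason: str
    remaining: int


@dataclass
class FixedWindowLimiter:
    # Maps (user_id, window index) to requests seen. A real service would
    # expire old windows; this sketch lets the dict grow.
    counts: dict = field(default_factory=dict)

    def check(self, user_id: str, plan: str, now: float) -> Decision:
        limit = QUOTAS.get(plan, QUOTAS["free"])
        key = (user_id, int(now // WINDOW_SECONDS))
        used = self.counts.get(key, 0)
        if used >= limit:
            return Decision(False, "rate_limited", 0)
        self.counts[key] = used + 1
        return Decision(True, "ok", limit - used - 1)


def handle_request(limiter, user_id, plan, now):
    """Run the rule before the main handler and shape the response ourselves."""
    decision = limiter.check(user_id, plan, now)
    if not decision.allowed:
        # Return an error in the API's own JSON shape, not a generic block page.
        retry_after = WINDOW_SECONDS - int(now % WINDOW_SECONDS)
        return 429, {"error": decision.reason, "retry_after_seconds": retry_after}
    return 200, {"data": "hello", "remaining": decision.remaining}
```

Because the decision comes back to application code as an ordinary value, the same check could just as easily flag the user for review or trigger re-authentication instead of returning a 429.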

BP It also occurs to me that people might be working at very small companies, in which case they may not have teams. They may have just a small handful of developers, some of whom may have these specialties or not, and this would enable them to sort of build a more robust security layer without having to bring on that overhead cost if they were a startup of hiring that one security engineer where that's their role inside of the company. So on the site we discuss some things that I can understand objectively: protect a signup form, email validation, bot protection, rate limiting. What is Arcjet Shield? That's the one that's got the name of the company in it. So is that a higher level or a more broad-reaching set of defenses?

DM Right. So that's similar to a web application firewall, and it looks at every request that comes into your application and detects things like SQL injection, cross-site scripting, path traversal vulnerabilities. And the principle behind that, unlike a traditional WAF which would probably block immediately, the Arcjet Shield product is analyzing requests behind the scenes, and then once we hit a certain threshold, then it determines that a client is suspicious and then blocks it. Because the last thing that you want is to block a legitimate request, and most attacks don't succeed on the very first request that's made to your application. Normally you're probing for a vulnerability, you're trying different things, and then if there is one, you might discover it after 20 or 30 attempts. We can detect that probing and block proactively rather than annoying your users. But then, of course, as a developer, you can get that response back into your code, so if you see that Arcjet has said that this request is suspicious but you know that it's actually your most trusted user that's logged in, maybe you just ask them to reauthenticate or provide a second factor just to confirm what they're doing rather than block them entirely.
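The threshold idea described above (watch for probing across several requests, then block the client, rather than blocking on the first match) might look like the following. This is a toy illustration, not how Arcjet Shield is implemented: the patterns are deliberately crude, and `ProbeTracker`, `SUSPICIOUS_PATTERNS`, and the threshold of 3 are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Crude, hypothetical signatures for SQL injection, cross-site scripting,
# and path traversal. Real detection is far more sophisticated.
SUSPICIOUS_PATTERNS = [
    re.compile(r"(?i)\bunion\s+select\b"),
    re.compile(r"(?i)<script\b"),
    re.compile(r"\.\./"),
]
BLOCK_THRESHOLD = 3  # assumed number of suspicious requests before blocking


class ProbeTracker:
    def __init__(self, threshold=BLOCK_THRESHOLD):
        self.threshold = threshold
        self.strikes = defaultdict(int)  # client_id -> suspicious request count

    def inspect(self, client_id, request_text):
        """Return 'allow', 'suspicious', or 'block' for this request."""
        matched = any(p.search(request_text) for p in SUSPICIOUS_PATTERNS)
        if matched:
            self.strikes[client_id] += 1
        if self.strikes.get(client_id, 0) >= self.threshold:
            return "block"
        return "suspicious" if matched else "allow"
```

Returning a label rather than rejecting the request outright leaves the final call to the application, which could ask a trusted, logged-in user for a second factor instead of blocking them.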

BP Let's just jump over for one second to talk about your last thing– Console. This is sort of a little bit of a media property that you run. It's got a podcast, it's got a newsletter, it's got profiles and interviews. You're doing this as well as a passion project?

DM That's right, yes. So I was doing it full-time after the master’s. It's been about four years now and it came out of, as an experienced engineer, not knowing where to go to find interesting tools. They come up on Reddit, on Hacker News, on other social sites, but that's not what they're for, and there was nowhere to go with a curated list of interesting tools to try out each week. Developers love trying new stuff, and so the idea behind Console is two to three short reviews, 300 characters of what I like, but also 300 characters of what I don't like about each tool. But the key is that I'm only putting things in there that I think are worth your time. And so I've built that up to 30,000 subscribers and every week I'm just putting interesting dev tools in there that people should check out.

[music plays]

BP All right, everybody. It is that time of the show. I want to shout out somebody who came on Stack Overflow and contributed a little bit of knowledge, or maybe they just brought some curiosity that helped folks out. Awarded 10 hours ago to Greg Hewgill. “What is a good tool to determine the lowest version of Python required?” Greg had a great answer, so good that people liked it better than the accepted answer, so thanks, Greg, for sharing your knowledge. And I should say, Greg is almost at a million rep on Stack Overflow, so getting close to quite a milestone. Thanks, Greg, and congrats on your badge. As always, I am Ben Popper. You can find me on X @BenPopper. If you have questions or suggestions for the show, if you want to come on as a guest or listen to us talk about something specific, email us, podcast@stackoverflow.com and we'll hear from you. And if you enjoyed today's show, do me a favor, subscribe, and then you can get us in the future.

DM I'm David Mytton. I'm the founder of Arcjet, which is helping developers protect their apps in production. You can find out more on arcjet.com and subscribe to the Console Dev Tools newsletter at console.dev.

BP Wonderful. We'll put those links in the show notes. All right, everybody. Thanks for listening, and we will talk to you soon.

[outro music plays]