The Stack Overflow Podcast

Safety in numbers: crowdsourcing data on nefarious IP addresses

Episode Summary

We chat with Philippe Humeau, founder and CEO of CrowdSec, which bills itself as an open-source & collaborative IPS (intrusion prevention system). He explains why hackers are always the first to industrialize new DevOps practices and the security scenarios that keep him up at night.

Episode Notes

You can find Philippe on Twitter here and learn more about CrowdSec here.

They recently put together a list of the IP addresses trying to exploit the new Log4j vulnerability.

For a prescient view of today's cybersecurity challenges, Humeau recommends John Brunner's classic 1975 sci-fi novel, The Shockwave Rider.

Episode Transcription

Philippe Humeau People think, you know, one IP, one person is enough to hack into a big group. It's not the way it works. Like, it's definitely not the way it works in the modern world. DevOps are just the late kid in the room compared to hackers. They were doing industrialization a decade earlier; the industry has everything a decade earlier than we do. So they have everything they need to deploy, to redeploy, to move, to change IPs on the fly, to change domain names on the fly, and so on and so forth. We have to be faster in the processing and subtler in the sensitivity, right? So that's why the network size matters. The bigger the network, the more accurate and the faster the detection will be.

[intro music]

Ben Popper Stop overpaying for big tech cloud! Vultr offers powerful cloud compute and bare metal at a fraction of the cost. Visit vultr.com/stack to redeem $100 in credit today.

BP Hello everybody. Good morning, and welcome to the Stack Overflow Podcast, a place to talk about all things software and technology. I am Ben Popper, the Director of Content here at Stack Overflow. And I am joined today, as I often am, by my wonderful crew of co-hosts, Cassidy Williams, Ceora Ford and Ryan Donovan. Hey, y'all.

Ceora Ford Hey!

Cassidy Williams Hello! 

Ryan Donovan Hey.

BP So today we are going to be talking about security, crowd security, safety in numbers. And we are going to be chatting with Philippe, who is the founder and CEO of CrowdSec. So, Philippe, welcome to the Stack Overflow Podcast.

PH Thank you, it's a privilege to be here.

BP Tell this group of folks at a very high level, if you had to explain it to a layperson or to a software developer who's not that familiar with security, what does CrowdSec do and why is that interesting to a developer, to anyone?

PH Yeah, sure. We consider ourselves somehow the Waze of cybersecurity. You know, Waze changed the way we drive on the road, because we see what's happening in front of us thanks to the number of people using Waze, right? Waze is taking your heading, your position, your speed, and stuff like that, and telling you, okay, ahead is a roadblock or a problem or whatever. We do the same stuff, more or less, but we do it in cybersecurity. So if someone is attacking a machine, the software detects the aggression, but it also shares the IPs with the rest of the world, with the rest of the network. So we are defending each other and having each other's back.

CW It's like crowdfunding, but it's security, not funding. But you know, supporting and stuff. I like the group effort concept.

RD The collaboration of it.

PH The safety in numbers, you know. And what we're doing is not so revolutionary, in the sense that people do it on a daily basis. Like, why are we talking together? Because Stack Overflow is super well known, right? You're reputable, everything's fine. People love you. And that's your reputation, okay? But what we also want to know is what the behavior is. So if both your behavior and your reputation are okay, well, maybe Cassidy will invite you to her home, because it's a private place, and it might be risky if you had either a bad reputation or potentially bad behavior, right? It's the same stuff we do at nightclubs or things like this; we do this online. So for developers, for example, if you see something bad happening in your logs, it's a behavior thing that you're triggering on: you're checking the behavior of the person interacting with the software. And if you don't like what's happening, you can block it easily with the provided components. It's all free, and it's all easy to set up, right? But when you know that the same IP has been doing crap all over the world, this is not a problem of behavior anymore, it's a problem of reputation. That's why the security component we're offering is fed both by the behavior seen in your logs and by the reputation seen at our network level.

BP And so on Stack Overflow, in order to do a number of things, you know, ask a question, answer, upvote, edit, you need to earn reputation, and then you get those privileges. Is it the same on your network? Like, how do you know that you can trust the people who are reporting, and that they're not doing that to punish someone they don't like or as a way of doing some kind of other malicious act?

PH You know, this is what I love in the way you think, because you have the same problem, right? Who is legitimate or not to come in, who is not trying to trick people into thinking this is a good answer or whatever? Well, we have the same problem. We don't want someone to trick us into thinking that Ryan's IP is a bad IP when it's just a personal vendetta. First of all, we quarantine every newcomer to the network. It's a cold welcome, right? But it's for your own sake, actually. So for the first six months, we're not listening to your signals at all. We're just checking your signals to see if they're accurate compared to what we're seeing from the rest of the people using the network, but we will not take them into consideration. The second thing: after six months, you're legitimate enough to interact with us, right, and with the others. But if you have, say, I don't know, 3000 machines reporting to us, you can puppet them, right? You can make them all vote against someone, which would be unfair, you know. But as a matter of fact, we also have ways of identifying if those 3000 machines are under the custody of one person only, which would bring you just one vote at the table, not 3000. And the other thing we're doing is we are confronting the signals with the rest of the network, and also with our own network of trust: we run some honeypot machines to double-check the signals. And you're not at liberty to block stuff like, I don't know, Microsoft Update or Googlebot, or things that are very critical for the global infrastructure of the internet. There's no way of banning them. If you did so, you would do more harm than good, so we prevent you from doing so. And the last thing we do... I know it's a complicated answer for a simple question. But it's not so easy to trust people, right?
The last thing we do is we have an algorithm that is checking for low signal-to-noise ratio. So for example, say someone is scanning the whole internet with a super large number of machines, say 65,000, and each of them scans only one port. Well, it would be a very low signal, right? The threshold would have to be extremely low to listen to this background noise. But since there would be 65,000 machines involved in this, we would see that through our honeypot network. So this is how we try to catch people that are trying to evade our techniques.
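The trust rules Philippe lists (a six-month quarantine for newcomers, one vote per owner however many machines they run, and a protected list of critical services) can be sketched roughly as below. This is only an illustration under stated assumptions, not CrowdSec's actual algorithm: the `Report` fields, the `owner_id` notion, the 182-day approximation of "six months", and the allowlisted address (taken from a documentation-only IP range) are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumption: "six months" approximated as 182 days.
QUARANTINE = timedelta(days=182)

# Hypothetical protected address (documentation-only range), standing in for
# critical services such as update servers or search crawlers.
ALLOWLIST = {"203.0.113.10"}


@dataclass(frozen=True)
class Report:
    reporter_id: str   # machine that sent the signal
    owner_id: str      # hypothetical: whoever controls that machine
    joined: datetime   # when the reporter joined the network
    target_ip: str     # IP address being accused


def count_votes(reports, target_ip, now):
    """Count independent votes against target_ip.

    Reports from quarantined newcomers are ignored, all machines under one
    owner collapse into a single vote, and allowlisted IPs get no votes.
    """
    if target_ip in ALLOWLIST:
        return 0
    owners = set()
    for report in reports:
        if report.target_ip != target_ip:
            continue
        if now - report.joined < QUARANTINE:
            continue  # newcomer: signals are observed but not counted
        owners.add(report.owner_id)
    return len(owners)
```

With this sketch, 3000 machines that share one `owner_id` contribute a single vote, which is the puppet-voting case described above.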

RD How do you deal with the sort of malice and deception from the other side? If somebody is setting up a botnet on computers or you know, smart fridges or whatever? How do you deal with those IPs getting banned that might have a useful purpose?

PH Yeah, well, that's another facet of the problem indeed. So we're speaking here about IPv4 addresses, which we don't have enough of, let's put it like this, right? So we use NAT techniques, so several people can be behind the same IP address, for example. And it's exactly what's happening when you use your phone on a daily basis on a 4G network: there are not enough addresses for every phone on Earth to get one. So what we do is use a bunch of addresses and hide everyone behind them. And each of these addresses has a lot of ports that it can use, you know, to translate your requests over the internet and get a response back. Those types of addresses, we cannot do much about them. But anyway, the hackers cannot do much with them either, because, you know, there is no symmetric conversation possible through those. So they're pretty much useless for both ends, you know, for the good guys and the bad guys. Now, let's say we are in a company. I like to use JPMorgan as an example for everything. Let's say they have like four exit points and 35,000 employees, right? If one of those exit points is exhibiting bad behavior, we might block it at some point. So what we say to our users is the following: first, use the minimum remediation. The goal is not to ban everything and everyone. It's not the point. If you can just, say, send a CAPTCHA, for example, it will probably deter most of the attacks on the web layer; there is no need to go further than that. If you have a doubt, sending multi-factor authentication is very clean, right? Or if you have further doubts, you can, I don't know, script your own check, because you're probably all coders if you're listening to the Stack Overflow Podcast. So you know, you can write your own script and check manually if this kind of threat is bigger than another. The other thing we're doing is we're clearing every IP every 72 hours if it has not done any more shenanigans. So a probation period, if you want.
And if an IP has been caught and then doesn't exhibit any aggressive behavior for 72 hours, it's automatically removed. We have other mechanisms to clean those IPs easily. And obviously, you can remove your own IP from the consensus if you feel like it.
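The two ideas in this answer, escalating from the mildest remediation and dropping an IP after 72 quiet hours, could look something like the following. It is a minimal sketch under assumptions: the remediation labels, the integer "doubt level", and the `last_seen` mapping are invented for illustration and do not come from CrowdSec's software.

```python
from datetime import datetime, timedelta

# 72-hour probation period mentioned in the conversation.
PROBATION = timedelta(hours=72)

# Hypothetical labels, ordered from mildest to harshest.
REMEDIATIONS = ["captcha", "mfa", "custom_script", "ban"]


def pick_remediation(doubt_level):
    """Return the mildest remediation matching a doubt level (0 = lowest).

    Out-of-range levels are clamped to the ends of the ladder.
    """
    index = max(0, min(doubt_level, len(REMEDIATIONS) - 1))
    return REMEDIATIONS[index]


def expire_quiet_ips(last_seen, now):
    """Keep only IPs whose last bad behavior is less than 72 hours old.

    last_seen maps an IP string to the datetime of its latest offense.
    """
    return {ip: seen for ip, seen in last_seen.items() if now - seen < PROBATION}
```

Starting at index 0 reflects the advice to reach for a CAPTCHA first and only escalate when doubts remain.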

CF What I'm interested in hearing is like what led you to create CrowdSec and what led you to create a product that works like this? Kind of what that whole process was like for you?

PH Yeah, it's an excellent question. And you know, I really think the best companies are born from a real-world need, you know? Not something just drawn on a table or on a wall because, you know, it might be great one day. When Elon Musk created SpaceX, it's because we needed to send a lot of tests into space, and we just could not afford to toss all of those devices away after one use. That didn't make sense. Real-world use case, even though we were sending rockets into space way before he started SpaceX. So the way we created this is the following. I founded another company before; it was a hosting company. And this company had a customer, a very well-known one actually, doing like sportswear and stuff, you know, everything related to sport. And those guys are kind of meaningful, and during the Olympics back in 2018, I think, they were under attack from someone that wanted to blackmail them at the worst possible period of time, because, you know, during the Olympics people feel like doing sports, so they sell a lot. And the attackers were scanning their website, you know, constantly trying to find something weak and break the website. They were hosted in a very high-security facility and environment we had, very efficient, but it was like tons of pain to create one customer environment because, you know, compliance settings and stuff; it was really complicated, but it worked. And we blocked 3000 IPs in a matter of, like, two hours. So the guy used literally 3000 different IP addresses to try to hack into the website, and could not. And we were like, okay, what do we do with these 3000 IPs? You know, this is somehow fresh intel; we know that those guys are using those IPs. And so if we broadcast it to the world, we can maybe help others to defend themselves. But as a matter of fact, it's more complicated than this. First of all, an IP address, if you're in Europe, is considered personal data or private data.
So you're not allowed to expose it, you know, without some context and framework. The second thing is, we had no communication channel that people would listen to. Okay, we were in the industry, but why would they integrate this into their firewall rather than something else? You know, it's a big decision to take. So we thought to ourselves, okay, how do we bring to the table the same value, or more or less the same value, as this high-security environment? And how do we broadcast this information we have all together? How do we hold hands together against cyber criminals? Because honestly, the ratio is what, one to 1000? Like one cyber criminal for 1000 people willing to use the internet in a proper way. So the numbers matter here. If we join together, we can tackle this problem on a large scale. And the idea was to join together to make a crowdsourced security system.

BP Looking at it from your perspective, having been in the world of security, and now doing it in sort of a fresh way. Do you see things headed in the right direction? From my perspective, as someone who talks a lot with software developers, but also just reads the news, it sounds like things like ransomware are getting worse, and the attack surface has gotten broader. They're going after machines and hospitals and gas pumps, and just pieces of infrastructure get taken offline. And it doesn't feel like most organizations are able to keep up. The very wealthy organizations, the very tech-savvy organizations can keep up; they pay the best people and employ the best tools. I'm sure some of them are your clients. But the average rural hospital doesn't have a clue, unfortunately, how to protect its machines. And sometimes those machines are keeping people alive, for example. So I guess what's interesting about your approach is that it's sort of like you're saying, if there's 1000 good actors for every bad one, we can band together and, you know, stop them. But when you look out more broadly, what do you see in terms of what's happening in the world of security, cybersecurity, on sort of a macro scale?

PH Sure. Well, first of all, there's a shortage that is crazy. I mean, you guys knew about it way before us. When I say you and us, it's you, the developer community and the DevOps community, and us, the SecOps community. You went through this shortage way before us, right? So it took time for the schools and various facilities to train people and be able to cope with the demand, which is more or less okay now. You know, developers are super well paid, and that's fine by me, but at least we can do the projects we have to do. In the SecOps world, we are still, you know, at day one. So we are missing roughly 50% or 70% of the headcount we should have. So that's a real problem. So how do we make it a better place? Well, first of all, by equipping them better, you know, because they have what we call alert fatigue. They get a lot of signals, a lot of insights, and they don't have the bandwidth to treat all of them. So our goal is also to alleviate this alert fatigue they're victims of, because if we just make them more productive, more efficient, that's already a good thing. The second thing is, protection is in the numbers. But to get the numbers, you have to make it free. Because if you want the hospital to adopt a solution, you don't want the NHS or whatever local infrastructure you have for health care to be spending a lot, because it will be an adoption problem, right? There will be friction for adoption. So that's why we think the value here has to be free. Every participant is making us richer by giving signals, right? So in return, we can make the tool free. So that's already one thing. The other thing you were discussing is specifically ransomware. So I'd like to take the opportunity to insist on the fact that ransomware is absolutely not a new technique, right? It's never been a new technique. It's a new monetization, which is way different.
They're still breaking in through, like, a weak password or SQL injection or cross-site scripting or, you name it, you know, there are tons of ways of breaking into a system. But the big difference is now they have an easy way of making people pay for not spreading their secrets or whatever, which is very, very bad. Say you take the Colonial Pipeline problem, right? The guy will be able to remove the amount of the ransom from the taxes. Which means, like, somehow the government is kind of paying for that. You know, how do you get rid of this problem by removing it from the taxes? It's just not possible. It's just an incentive for the other groups, you know. And the thing also is, there's a whole industry being built around it. So there are software developers, there are new products coming onto the market, there are people negotiating your ransom for you, there are companies specializing in insurance, and so on. It's not going to disappear, because it's a whole industry being created out of it. So this is where I'm worried, you know, and one day some guy will be paid to plant malware in your system, if it's not happening already, even though you're extremely well protected. Well, what do you do if an insider is being paid 100,000 or a million dollars to plug in a rogue USB key or plant malware in your system? You're screwed; there's not much you can do against that. So we need to be able to tackle the only weak link in this industry, the cybercriminal industry. And this is the IP addresses. People think they have a lot of them, way more than they need, but it's not true. And on top of that, they need to harvest them or rent them, maintain them, and so on. Every time we burn some of them, we hurt the business. And there are not many ways of hurting those economics, you know, on the macro scale. The IP address is a weak link.

RD And a lot of them get through just by social engineering techniques, by sending phishing emails. And those are hard to defend against.

PH Yeah, but they are still using IP addresses, you see: the IP the email has been sent from, the IP this malware is going to point to, the command and control centers, the scanning machines. Even though sometimes, you're right, it's social engineering, all of them have one common point: an IP address. And as a matter of fact, it's not an infinite number. So this is a point that we should put pressure on.

BP So like, yeah, with an IP-based approach, which it sounds like is what CrowdSec focuses on, you're saying, sort of like, either winnow down or really keep a tight eye on who the good and bad actors are. And if you had real-time, extremely accurate information, then you could be blocking, you know, bad actors more quickly, and sort of narrowing the window that the ransomware industry can operate in.

PH People think, you know, one IP, one person is enough to hack into a big group. It's not the way it works. Like, it's definitely not the way it works in the modern world. DevOps are just the late kid in the room compared to hackers. They were doing industrialization a decade earlier; the industry has everything a decade earlier than we do. So they have everything they need to deploy, to redeploy, to move, to change IPs on the fly, to change domain names on the fly, and so on and so forth. We have to be faster in the processing and subtler in the sensitivity, right? So that's why the network size matters: the bigger the network, the more accurate and the faster the detection will be. So yes, they are extremely well organized and automated, but we can beat them at this game. Because once again, we are more people willing to use the internet in the proper way. Let me give you an example. If we take another problem we saw lately, it's like this MikroTik one, but there have been tons of examples. These IoT routers, cameras, whatever connected devices, they have barely any spare budget in CPU cycles and RAM to do any smart security, right? Because they are dedicated to doing routing, to being a network camera, whatever they have to do, right? So how do you defend them? It's complicated, in a sense, right? What we can do, though, is that before accepting a new connection on your IoT device, you check the IP that is connecting to you against our database through an API call. Costs nothing, right? Any device in the world for the last 50 years knows how to make an HTTP call. Costs nothing. And just before accepting an admin connection or whatever, check if this IP is legitimate. You will foil 90% of the attacks on the IoT surface just by doing this. Again, the IP is the unit of account here.
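The "check the IP before accepting the connection" idea could be sketched as follows. Everything specific here is an assumption for illustration: the endpoint URL is a placeholder rather than a real CrowdSec API, the JSON field `malicious` is invented, and failing open when the lookup is unreachable is a design choice a real device might reverse.

```python
import json
import urllib.request

# Hypothetical endpoint; replace with whatever reputation service you use.
REPUTATION_URL = "https://reputation.example.invalid/v1/ip/{ip}"


def http_lookup(ip, timeout=2.0):
    """Fetch a JSON verdict for an IP from the (hypothetical) reputation service."""
    with urllib.request.urlopen(REPUTATION_URL.format(ip=ip), timeout=timeout) as resp:
        return json.load(resp)


def should_accept(ip, lookup=http_lookup):
    """Decide whether to accept a new connection from ip.

    lookup is injectable so the decision logic can be tested offline.
    Assumption: if the service cannot be reached or returns bad JSON,
    the device fails open and accepts the connection.
    """
    try:
        verdict = lookup(ip)
    except (OSError, ValueError):
        return True
    return not verdict.get("malicious", False)
```

A device would call `should_accept(peer_ip)` once per new connection; a single short HTTP request is the only cost, which fits the constrained CPU and RAM budget described above.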

RD So what are the hard edge cases in the sector? What are the things that are really difficult to fix?

BP What keeps you up at night? Ryan's asking. 

PH Well, phones. Phones keep me up at night, because I think we have a huge risk here that is a vastly underestimated one. And I'm not a doomsday preacher or whatever. But let's imagine the following: someone has a zero-day, meaning an unknown exploit, on Android, say, right? It's doable. I really think it's doable. It requires a lot of work, definitely, but it's doable. So one day you've got someone that can create a worm. You release it at JFK. How many people per day are transiting? Quite a few, right? How many people connect to the Wi-Fi in the plane or in an airport and so on? We saw that COVID can spread extremely fast, okay, across a whole continent. If a worm is doing the same stuff, smartly enough to stay, you know, hidden for a while, and infects something like hundreds of millions of devices, how do we deal with this? Now me as CrowdSec, and the team at CrowdSec, we will have a problem with the number of IPs to deal with. Sure. But the whole of mankind has a problem with the internet, because if the guy presses the off button, the whole thing is screwed, instantly. And he will be able to ask for a ransom of billions of dollars. And we won't be able to patch that software, those machines, remotely and easily. And funnily enough, there's a guy called Brunner who wrote a book in 70-something, 76 or something like this. Okay, John Brunner. And this guy theorized exactly this: that by then there would be a global network that would be super critical to all mankind, and that a hacker would take control of it and ask for ransom. It's funny, more than what? 50 years ago?

CW I don't know if funny is the word. [Philippe & Ben laugh]

CF This kind of stuff makes me nervous. 

PH It makes me nervous as well.

CF I've said this a few times before on the podcast, but I'm one of those people, I'm one of those developers that, like, doesn't know enough about cybersecurity. And so hearing things like this makes me feel like I need to watch my back more. Like, I do have a friend who, before she got into software development, did InfoSec. And she was saying how, like, there's so much risk in the things that just everyday people do. You go to the cafe, and you use their, like, public Wi-Fi, and you have no idea how, like, insecure that really is. And I just think about these things. And I'm sure that if I actually had the chance to, like, sit down and have a conversation with someone like you who really knows cybersecurity, you would probably be tearing your hair out at some of this stuff.

PH I never ever connect to any Wi-Fi but my own. Never.

CF See what I mean?

PH Yeah, you have a data package, you use 4G for that. It's way safer for you. By the way, for the book, if you guys and gals are interested in reading it: it's by John Brunner, and it's called The Shockwave Rider. It's really an excellent book, and you think the guy's a time traveler when you read it.

RD Maybe he's warning us.

CF I'm interested in hearing, we just talked about something like a potential risk that could probably happen in the future, that would be a big deal for your company. How do you see your company moving ahead with things like that? Or just in general, how do you see your company and the way it works evolving, to be able to deal with things like that?

PH Well, the size is the key to whatever we're doing here. So I'm hoping that in, sorry, three years from now, we will have 1 million machines reporting the aggressions they face in real time into the network. That would make it the biggest value ever, I think, on the CTI front. So knowing everything about cyber threats in real time. And it's like Waze nowadays: you wouldn't think about driving into any major city you don't know without Waze, right? It's exactly the same thing. It will be the de facto standard, open and free for everyone, to know if there's a risk ahead and if their infrastructure is at risk. So our future is really about, like, harnessing this community, getting people to get to know us, to know that we're not the bad guys, that we're not about to steal their data or resell it or whatever. It's not the point of the project. Any VC, honestly, any VC lately would be super happy to toss money at us, because they know we're onto something different. And it's not so often you can hear that in the space. So you know, we have no funding problem as such. We don't have to focus on monthly recurring revenue every other day; we can focus on growing the community as best we can. And yes, there's a business model behind it. You know, the business model is like, if you don't partake in this network but you want to benefit from it, say in the case of IP cameras, for example, you want to defend your IP cameras or routers but you cannot possibly afford to join the network, or you're a bank and you're not allowed by whatever regulation to share whatever, fine. You can still use what's now a goldmine, but you would have to pay to access it. On the other hand, if you're part of healthcare, justice, police forces and all, we also open our books to you for free. So if you have a question, just ask us, because we think it's a responsibility we have, I guess.

[music]

BP Alright, well, it is that time of the show. Normally I shout out a lifeboat badge winner, but this is a security episode. So, asked seven days ago on Information Security: 'why use random characters in passwords?' We've got an answer for you. It seems like it's a simple thing, but it's actually quite interesting. So we'll put that in the show notes. I am Ben Popper. I am the Director of Content Marketing here at Stack Overflow. You can always find me on Twitter @BenPopper, email us podcast@StackOverflow. And if you liked the show, leave us a rating and review. It really helps.

RD I'm Ryan Donovan. I edit the blog here at Stack Overflow. You can find me on Twitter @RThorDonovan. If you have a great idea for a blog post, please email me at pitches@stackoverflow.com.

CW I'm Cassidy Williams, you can find me @cassidoo on most things.

CF And I'm Ceora Ford. I'm a developer advocate at Apollo GraphQL. And you can find me on Twitter. My username there is @ceeoreo_.

PH And I'm Philippe from CrowdSec. And don't forget it's a C at the end and not an X. [Cassidy & Ceora laugh]

[outro music]